PANDAseq: paired-end assembler for illumina sequences


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2012-02-14

AUTHORS

Andre P Masella, Andrea K Bartram, Jakub M Truszkowski, Daniel G Brown, Josh D Neufeld

ABSTRACT

Background: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information.

Results: PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods.

Conclusions: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naïve assembly with negligible loss of "good" sequence.

PAGES

31

References to SciGraph publications

  • 2010-10-21. BIPES, a cost-effective high-throughput method for assessing microbial diversity in THE ISME JOURNAL: MULTIDISCIPLINARY JOURNAL OF MICROBIAL ECOLOGY
  • 2011-06-16. Illumina-based analysis of microbial community diversity in THE ISME JOURNAL: MULTIDISCIPLINARY JOURNAL OF MICROBIAL ECOLOGY
Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1186/1471-2105-13-31

    DOI

    http://dx.doi.org/10.1186/1471-2105-13-31

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1047017534

    PUBMED

    https://www.ncbi.nlm.nih.gov/pubmed/22333067


    JSON-LD is the canonical representation for SciGraph data.

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Biological Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Genetics", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Bacteria", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Metagenomics", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "RNA, Bacterial", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "RNA, Ribosomal, 16S", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Software", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Department of Biology, University of Waterloo, Waterloo, Ontario, Canada", 
              "id": "http://www.grid.ac/institutes/grid.46078.3d", 
              "name": [
                "Department of Biology, University of Waterloo, Waterloo, Ontario, Canada"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Masella", 
            "givenName": "Andre P", 
            "id": "sg:person.0770612035.54", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0770612035.54"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Department of Biology, University of Waterloo, Waterloo, Ontario, Canada", 
              "id": "http://www.grid.ac/institutes/grid.46078.3d", 
              "name": [
                "Department of Biology, University of Waterloo, Waterloo, Ontario, Canada"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Bartram", 
            "givenName": "Andrea K", 
            "id": "sg:person.0654641046.01", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0654641046.01"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada", 
              "id": "http://www.grid.ac/institutes/grid.46078.3d", 
              "name": [
                "David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Truszkowski", 
            "givenName": "Jakub M", 
            "id": "sg:person.01320220640.40", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01320220640.40"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada", 
              "id": "http://www.grid.ac/institutes/grid.46078.3d", 
              "name": [
                "David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Brown", 
            "givenName": "Daniel G", 
            "id": "sg:person.0642727740.54", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Department of Biology, University of Waterloo, Waterloo, Ontario, Canada", 
              "id": "http://www.grid.ac/institutes/grid.46078.3d", 
              "name": [
                "Department of Biology, University of Waterloo, Waterloo, Ontario, Canada"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Neufeld", 
            "givenName": "Josh D", 
            "id": "sg:person.01030400146.17", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01030400146.17"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1038/ismej.2010.160", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1048299429", 
              "https://doi.org/10.1038/ismej.2010.160"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/ismej.2011.74", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1012050464", 
              "https://doi.org/10.1038/ismej.2011.74"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2012-02-14", 
        "datePublishedReg": "2012-02-14", 
        "description": "BackgroundIllumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information.ResultsPANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods.ConclusionsPANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over na\u00efve assembly with negligible loss of \"good\" sequence.", 
        "genre": "article", 
        "id": "sg:pub.10.1186/1471-2105-13-31", 
        "isAccessibleForFree": true, 
        "isPartOf": [
          {
            "id": "sg:journal.1023786", 
            "issn": [
              "1471-2105"
            ], 
            "name": "BMC Bioinformatics", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "13"
          }
        ], 
        "keywords": [
          "upstream processing", 
          "error correction", 
          "negligible loss", 
          "error mask", 
          "template", 
          "genomic DNA", 
          "source template", 
          "assembly", 
          "control library", 
          "alternative method", 
          "quality information", 
          "mask", 
          "amplicons", 
          "billions", 
          "sequence yields", 
          "microbial communities", 
          "rRNA gene", 
          "incorporation", 
          "reads", 
          "mismatch", 
          "DNA", 
          "processing", 
          "yield", 
          "sequence", 
          "method", 
          "library", 
          "loss", 
          "available tools", 
          "assemblers", 
          "genes", 
          "organisms", 
          "tool", 
          "error", 
          "benchmarks", 
          "information", 
          "increase", 
          "correction", 
          "scale", 
          "number", 
          "basis", 
          "paired-end reads", 
          "data", 
          "community", 
          "most errors", 
          "Illumina sequences", 
          "low-quality bases"
        ], 
        "name": "PANDAseq: paired-end assembler for illumina sequences", 
        "pagination": "31", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1047017534"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1186/1471-2105-13-31"
            ]
          }, 
          {
            "name": "pubmed_id", 
            "type": "PropertyValue", 
            "value": [
              "22333067"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1186/1471-2105-13-31", 
          "https://app.dimensions.ai/details/publication/pub.1047017534"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-08-04T17:00", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20220804/entities/gbq_results/article/article_575.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1186/1471-2105-13-31"
      }
    ]
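The JSON-LD record above is ordinary JSON, so the fields most readers want (title, author order, DOI, journal) can be pulled out with a standard parser. The following is a minimal sketch, not part of SciGraph itself: the function name `summarize_record` and the trimmed `SAMPLE` string are illustrative assumptions, and `SAMPLE` keeps only the keys the function reads.

```python
import json

# Trimmed copy of the record above; only the keys read below are kept.
SAMPLE = """
[
  {
    "author": [
      {"familyName": "Masella", "givenName": "Andre P"},
      {"familyName": "Bartram", "givenName": "Andrea K"},
      {"familyName": "Truszkowski", "givenName": "Jakub M"},
      {"familyName": "Brown", "givenName": "Daniel G"},
      {"familyName": "Neufeld", "givenName": "Josh D"}
    ],
    "datePublished": "2012-02-14",
    "isPartOf": [
      {"name": "BMC Bioinformatics", "type": "Periodical"},
      {"issueNumber": "1", "type": "PublicationIssue"},
      {"type": "PublicationVolume", "volumeNumber": "13"}
    ],
    "name": "PANDAseq: paired-end assembler for illumina sequences",
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/1471-2105-13-31"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["22333067"]}
    ]
  }
]
"""


def summarize_record(jsonld_text):
    """Extract a few citation fields from a SciGraph-style JSON-LD string."""
    record = json.loads(jsonld_text)[0]  # the document is a one-element list
    authors = [
        f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])
    ]
    # productId is a list of {name, value: [..]} pairs; flatten it to a dict.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    # isPartOf mixes journal, issue and volume entries; pick each by its type.
    parts = {p["type"]: p for p in record.get("isPartOf", [])}
    return {
        "title": record.get("name"),
        "authors": authors,
        "date": record.get("datePublished"),
        "doi": ids.get("doi"),
        "pubmed_id": ids.get("pubmed_id"),
        "journal": parts.get("Periodical", {}).get("name"),
        "volume": parts.get("PublicationVolume", {}).get("volumeNumber"),
    }


if __name__ == "__main__":
    for key, value in summarize_record(SAMPLE).items():
        print(f"{key}: {value}")
```

In JSON-LD the `author` array preserves author order directly, which is why no extra sorting is needed here.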
     

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-31'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-31'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-31'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-31'
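The four curl commands differ only in the `Accept` header, so the same content negotiation can be scripted. Here is a minimal sketch using Python's standard library; the helper names `build_request` and `fetch` and the short format keys are assumptions made for illustration, and the record itself does not confirm that the endpoint still responds.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/"

# Media types taken from the curl examples above; the short keys are our own.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id, fmt="json-ld"):
    """Build (but do not send) a GET request for one serialization."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(ACCEPT_TYPES)}")
    return urllib.request.Request(
        BASE_URL + record_id,
        headers={"Accept": ACCEPT_TYPES[fmt]},
    )


def fetch(record_id, fmt="json-ld", timeout=30):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Performs a real network call; it may fail if the service is unavailable.
    print(fetch("pub.10.1186/1471-2105-13-31", "turtle")[:500])
```

Building the request separately from sending it makes the header logic easy to check without touching the network.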


     

    This table displays all metadata directly associated with this object as RDF triples.

    164 TRIPLES      21 PREDICATES      78 URIs      68 LITERALS      12 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1186/1471-2105-13-31 schema:about N3c6d401f05744830a932b61c9b734e80
    2 N3f9acf046f2946ce8a80016f47296b3c
    3 N4ba6088d51de4467bfd2041d99d15a8f
    4 Na35a0e2312b44878a84b582212b02ba2
    5 Ne822c20ee9684df1bff785e81f6f4b35
    6 anzsrc-for:06
    7 anzsrc-for:0604
    8 schema:author N8bfa350a592f427f8c355e883a724f80
    9 schema:citation sg:pub.10.1038/ismej.2010.160
    10 sg:pub.10.1038/ismej.2011.74
    11 schema:datePublished 2012-02-14
    12 schema:datePublishedReg 2012-02-14
    13 schema:description BackgroundIllumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information.ResultsPANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods.ConclusionsPANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naïve assembly with negligible loss of "good" sequence.
    14 schema:genre article
    15 schema:isAccessibleForFree true
    16 schema:isPartOf N3945ebcc45f145e3ad53209ae788f095
    17 Nd6b7209b5c9d48d59466b0882fc03321
    18 sg:journal.1023786
    19 schema:keywords DNA
    20 Illumina sequences
    21 alternative method
    22 amplicons
    23 assemblers
    24 assembly
    25 available tools
    26 basis
    27 benchmarks
    28 billions
    29 community
    30 control library
    31 correction
    32 data
    33 error
    34 error correction
    35 error mask
    36 genes
    37 genomic DNA
    38 incorporation
    39 increase
    40 information
    41 library
    42 loss
    43 low-quality bases
    44 mask
    45 method
    46 microbial communities
    47 mismatch
    48 most errors
    49 negligible loss
    50 number
    51 organisms
    52 paired-end reads
    53 processing
    54 quality information
    55 rRNA gene
    56 reads
    57 scale
    58 sequence
    59 sequence yields
    60 source template
    61 template
    62 tool
    63 upstream processing
    64 yield
    65 schema:name PANDAseq: paired-end assembler for illumina sequences
    66 schema:pagination 31
    67 schema:productId N161894d41eb34f0a8cb0b032b3a1dff8
    68 N68fb8e2389cd464f974a123c64c1b034
    69 Nc12143bae39b42078695169d64188f62
    70 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047017534
    71 https://doi.org/10.1186/1471-2105-13-31
    72 schema:sdDatePublished 2022-08-04T17:00
    73 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    74 schema:sdPublisher Na573b4fb483946e997134f3c3c39d8ff
    75 schema:url https://doi.org/10.1186/1471-2105-13-31
    76 sgo:license sg:explorer/license/
    77 sgo:sdDataset articles
    78 rdf:type schema:ScholarlyArticle
    79 N161894d41eb34f0a8cb0b032b3a1dff8 schema:name pubmed_id
    80 schema:value 22333067
    81 rdf:type schema:PropertyValue
    82 N3945ebcc45f145e3ad53209ae788f095 schema:volumeNumber 13
    83 rdf:type schema:PublicationVolume
    84 N3c6d401f05744830a932b61c9b734e80 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    85 schema:name Software
    86 rdf:type schema:DefinedTerm
    87 N3f9acf046f2946ce8a80016f47296b3c schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    88 schema:name Metagenomics
    89 rdf:type schema:DefinedTerm
    90 N4ba6088d51de4467bfd2041d99d15a8f schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    91 schema:name Bacteria
    92 rdf:type schema:DefinedTerm
    93 N68fb8e2389cd464f974a123c64c1b034 schema:name doi
    94 schema:value 10.1186/1471-2105-13-31
    95 rdf:type schema:PropertyValue
    96 N8bfa350a592f427f8c355e883a724f80 rdf:first sg:person.0770612035.54
    97 rdf:rest Nb5759ce644454686aeb0359c1ab13e50
    98 N95d8dd5f717f4d928d3282e7de8ccba4 rdf:first sg:person.01320220640.40
    99 rdf:rest N98dcbc488c344cdcba649753c775561d
    100 N98dcbc488c344cdcba649753c775561d rdf:first sg:person.0642727740.54
    101 rdf:rest Nd435829124404071900aab07f5c6b46b
    102 Na35a0e2312b44878a84b582212b02ba2 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    103 schema:name RNA, Ribosomal, 16S
    104 rdf:type schema:DefinedTerm
    105 Na573b4fb483946e997134f3c3c39d8ff schema:name Springer Nature - SN SciGraph project
    106 rdf:type schema:Organization
    107 Nb5759ce644454686aeb0359c1ab13e50 rdf:first sg:person.0654641046.01
    108 rdf:rest N95d8dd5f717f4d928d3282e7de8ccba4
    109 Nc12143bae39b42078695169d64188f62 schema:name dimensions_id
    110 schema:value pub.1047017534
    111 rdf:type schema:PropertyValue
    112 Nd435829124404071900aab07f5c6b46b rdf:first sg:person.01030400146.17
    113 rdf:rest rdf:nil
    114 Nd6b7209b5c9d48d59466b0882fc03321 schema:issueNumber 1
    115 rdf:type schema:PublicationIssue
    116 Ne822c20ee9684df1bff785e81f6f4b35 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    117 schema:name RNA, Bacterial
    118 rdf:type schema:DefinedTerm
    119 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
    120 schema:name Biological Sciences
    121 rdf:type schema:DefinedTerm
    122 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
    123 schema:name Genetics
    124 rdf:type schema:DefinedTerm
    125 sg:journal.1023786 schema:issn 1471-2105
    126 schema:name BMC Bioinformatics
    127 schema:publisher Springer Nature
    128 rdf:type schema:Periodical
    129 sg:person.01030400146.17 schema:affiliation grid-institutes:grid.46078.3d
    130 schema:familyName Neufeld
    131 schema:givenName Josh D
    132 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01030400146.17
    133 rdf:type schema:Person
    134 sg:person.01320220640.40 schema:affiliation grid-institutes:grid.46078.3d
    135 schema:familyName Truszkowski
    136 schema:givenName Jakub M
    137 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01320220640.40
    138 rdf:type schema:Person
    139 sg:person.0642727740.54 schema:affiliation grid-institutes:grid.46078.3d
    140 schema:familyName Brown
    141 schema:givenName Daniel G
    142 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54
    143 rdf:type schema:Person
    144 sg:person.0654641046.01 schema:affiliation grid-institutes:grid.46078.3d
    145 schema:familyName Bartram
    146 schema:givenName Andrea K
    147 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0654641046.01
    148 rdf:type schema:Person
    149 sg:person.0770612035.54 schema:affiliation grid-institutes:grid.46078.3d
    150 schema:familyName Masella
    151 schema:givenName Andre P
    152 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0770612035.54
    153 rdf:type schema:Person
    154 sg:pub.10.1038/ismej.2010.160 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048299429
    155 https://doi.org/10.1038/ismej.2010.160
    156 rdf:type schema:CreativeWork
    157 sg:pub.10.1038/ismej.2011.74 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012050464
    158 https://doi.org/10.1038/ismej.2011.74
    159 rdf:type schema:CreativeWork
    160 grid-institutes:grid.46078.3d schema:alternateName David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
    161 Department of Biology, University of Waterloo, Waterloo, Ontario, Canada
    162 schema:name David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
    163 Department of Biology, University of Waterloo, Waterloo, Ontario, Canada
    164 rdf:type schema:Organization
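In the triple table, author order is not stored as an array: `schema:author` points at a blank node, and the order is encoded as an RDF collection chained through `rdf:first` and `rdf:rest` until `rdf:nil`. The sketch below walks that chain using the blank-node identifiers from rows 8 and 96-113. The function name `walk_rdf_list` and the tuple representation are illustrative assumptions, not a SciGraph API.

```python
RDF_NIL = "rdf:nil"

# (subject, predicate, object) rows copied from the table above.
TRIPLES = [
    ("N8bfa350a592f427f8c355e883a724f80", "rdf:first", "sg:person.0770612035.54"),
    ("N8bfa350a592f427f8c355e883a724f80", "rdf:rest", "Nb5759ce644454686aeb0359c1ab13e50"),
    ("N95d8dd5f717f4d928d3282e7de8ccba4", "rdf:first", "sg:person.01320220640.40"),
    ("N95d8dd5f717f4d928d3282e7de8ccba4", "rdf:rest", "N98dcbc488c344cdcba649753c775561d"),
    ("N98dcbc488c344cdcba649753c775561d", "rdf:first", "sg:person.0642727740.54"),
    ("N98dcbc488c344cdcba649753c775561d", "rdf:rest", "Nd435829124404071900aab07f5c6b46b"),
    ("Nb5759ce644454686aeb0359c1ab13e50", "rdf:first", "sg:person.0654641046.01"),
    ("Nb5759ce644454686aeb0359c1ab13e50", "rdf:rest", "N95d8dd5f717f4d928d3282e7de8ccba4"),
    ("Nd435829124404071900aab07f5c6b46b", "rdf:first", "sg:person.01030400146.17"),
    ("Nd435829124404071900aab07f5c6b46b", "rdf:rest", RDF_NIL),
]

# Family names from the schema:familyName rows of the same table.
FAMILY_NAMES = {
    "sg:person.0770612035.54": "Masella",
    "sg:person.0654641046.01": "Bartram",
    "sg:person.01320220640.40": "Truszkowski",
    "sg:person.0642727740.54": "Brown",
    "sg:person.01030400146.17": "Neufeld",
}


def walk_rdf_list(head, triples):
    """Return the items of an RDF collection in order, starting at `head`."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:  # guard against malformed, cyclic lists
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


if __name__ == "__main__":
    # Row 8 of the table: schema:author points at this blank node.
    head = "N8bfa350a592f427f8c355e883a724f80"
    print([FAMILY_NAMES[p] for p in walk_rdf_list(head, TRIPLES)])
```

The table lists blank nodes in identifier order rather than list order, so following `rdf:rest` is the only reliable way to recover the author sequence from the triples.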
     



