PANDAseq: paired-end assembler for illumina sequences


Ontology type: schema:ScholarlyArticle
Open Access: True


Article Info

DATE

2012-12

AUTHORS

Andre P Masella, Andrea K Bartram, Jakub M Truszkowski, Daniel G Brown, Josh D Neufeld

ABSTRACT

BACKGROUND: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information. RESULTS: PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods. CONCLUSIONS: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naïve assembly with negligible loss of "good" sequence.

PAGES

31


Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/1471-2105-13-31

DOI

http://dx.doi.org/10.1186/1471-2105-13-31

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1047017534

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/22333067



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Bacteria", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Metagenomics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "RNA, Bacterial", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "RNA, Ribosomal, 16S", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "Department of Biology, University of Waterloo, Waterloo, Ontario, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Masella", 
        "givenName": "Andre P", 
        "id": "sg:person.0770612035.54", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0770612035.54"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "Department of Biology, University of Waterloo, Waterloo, Ontario, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Bartram", 
        "givenName": "Andrea K", 
        "id": "sg:person.0654641046.01", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0654641046.01"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Truszkowski", 
        "givenName": "Jakub M", 
        "id": "sg:person.01320220640.40", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01320220640.40"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brown", 
        "givenName": "Daniel G", 
        "id": "sg:person.0642727740.54", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "Department of Biology, University of Waterloo, Waterloo, Ontario, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Neufeld", 
        "givenName": "Josh D", 
        "id": "sg:person.01030400146.17", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01030400146.17"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/ismej.2011.74", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1012050464", 
          "https://doi.org/10.1038/ismej.2011.74"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btl158", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1014668137"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1128/aem.02772-10", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1015690506"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.1000080107", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1019627885"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/0022-2836(70)90057-4", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1021169618"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/gkl889", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1029901284"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pone.0011840", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1031450281"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/gkp1137", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1037112607"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/gkn879", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1044918953"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pone.0015406", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1045070681"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/ismej.2010.160", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1048299429", 
          "https://doi.org/10.1038/ismej.2010.160"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://app.dimensions.ai/details/publication/pub.1082464155", 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2012-12", 
    "datePublishedReg": "2012-12-01", 
    "description": "BACKGROUND: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information.\nRESULTS: PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods.\nCONCLUSIONS: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over na\u00efve assembly with negligible loss of \"good\" sequence.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1186/1471-2105-13-31", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1023786", 
        "issn": [
          "1471-2105"
        ], 
        "name": "BMC Bioinformatics", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "13"
      }
    ], 
    "name": "PANDAseq: paired-end assembler for illumina sequences", 
    "pagination": "31", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "dffa0a8ebb74cdda5de6a0bf219d4be6004ddf4bfcb2d16f8f56090ec4b9986b"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "22333067"
        ]
      }, 
      {
        "name": "nlm_unique_id", 
        "type": "PropertyValue", 
        "value": [
          "100965194"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/1471-2105-13-31"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1047017534"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/1471-2105-13-31", 
      "https://app.dimensions.ai/details/publication/pub.1047017534"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-10T19:56", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8681_00000507.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1186/1471-2105-13-31"
  }
]
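The record is an ordinary JSON array, so any JSON parser can read it. As a minimal sketch, assuming only the Python standard library and a hand-trimmed excerpt of the fields shown above, the nested `author`, `isPartOf` and `productId` structures can be flattened into a citation summary. The helper name `summarize` is hypothetical and not part of any SciGraph tooling.

```python
import json

# Hand-trimmed excerpt of the JSON-LD record above (only the fields used below).
RECORD_JSON = """
[
  {
    "name": "PANDAseq: paired-end assembler for illumina sequences",
    "pagination": "31",
    "datePublishedReg": "2012-12-01",
    "author": [
      {"givenName": "Andre P", "familyName": "Masella"},
      {"givenName": "Andrea K", "familyName": "Bartram"},
      {"givenName": "Jakub M", "familyName": "Truszkowski"},
      {"givenName": "Daniel G", "familyName": "Brown"},
      {"givenName": "Josh D", "familyName": "Neufeld"}
    ],
    "isPartOf": [
      {"type": "Periodical", "name": "BMC Bioinformatics", "issn": ["1471-2105"]},
      {"type": "PublicationIssue", "issueNumber": "1"},
      {"type": "PublicationVolume", "volumeNumber": "13"}
    ],
    "productId": [
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["22333067"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/1471-2105-13-31"]}
    ]
  }
]
"""


def summarize(record: dict) -> dict:
    """Hypothetical helper: flatten nested JSON-LD fields into a citation summary."""
    # The author array keeps publication order; the comprehension preserves it.
    authors = [f"{a['givenName']} {a['familyName']}" for a in record["author"]]
    # isPartOf mixes journal, issue and volume objects; index them by "type".
    parts = {p["type"]: p for p in record["isPartOf"]}
    # productId is a list of name/value pairs; each value is a one-item list.
    ids = {p["name"]: p["value"][0] for p in record["productId"]}
    return {
        "title": record["name"],
        "authors": authors,
        "journal": parts["Periodical"]["name"],
        "volume": parts["PublicationVolume"]["volumeNumber"],
        "issue": parts["PublicationIssue"]["issueNumber"],
        "page": record["pagination"],
        "year": record["datePublishedReg"][:4],
        "doi": ids["doi"],
        "pmid": ids["pubmed_id"],
    }


# The record is a one-element JSON array, so take its first entry.
summary = summarize(json.loads(RECORD_JSON)[0])
print(f"{summary['journal']} {summary['year']}; "
      f"{summary['volume']}({summary['issue']}):{summary['page']}")
# BMC Bioinformatics 2012; 13(1):31
print("https://doi.org/" + summary["doi"])
```

This reads the compacted keys as plain JSON. Expanding prefixes such as `sg:` against the `@context` document would need a JSON-LD processor, which this sketch does not attempt.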
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-31'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-31'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-31'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-31'
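The four curl commands differ only in the Accept header: they rely on HTTP content negotiation against a single URL. A minimal sketch of the same requests with the standard-library `urllib.request` module follows. The helper names `build_request` and `fetch` and the short format keys are illustrative assumptions, and whether the SciGraph endpoint still answers these requests is not guaranteed.

```python
import urllib.request

# Record URL and Accept values as listed in the curl commands above.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/1471-2105-13-31"
# Hypothetical short keys mapped to the media types used by the curl examples.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Build a GET request whose Accept header selects one RDF serialization."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(MEDIA_TYPES)}")
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt: str, timeout: float = 30.0) -> str:
    """Download the record in the requested format (requires network access)."""
    with urllib.request.urlopen(build_request(fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Inspect the request offline; call fetch("turtle") to actually download it.
    req = build_request("turtle")
    print(req.full_url, req.get_header("Accept"))
```

Keeping request construction separate from `fetch` lets the headers be checked without touching the network.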


 

This table displays all metadata directly associated with this object as RDF triples.

154 triples, 21 predicates, 46 URIs, 26 literals, 14 blank nodes

Subject Predicate Object
1 sg:pub.10.1186/1471-2105-13-31 schema:about N761ab78c19cd46c383d4776a30e303fc
2 Na6269d5b3c5a40538042acb84f1015a9
3 Nbf6bb02e210c49e0b95c472def0cb1f4
4 Nc1f29824fad44aeba1ded145aaaf9536
5 Ncfdd145edfe84df9b8ed01c82c7975c7
6 anzsrc-for:06
7 anzsrc-for:0604
8 schema:author Nc70ca26ca57c408999a164780785cf25
9 schema:citation sg:pub.10.1038/ismej.2010.160
10 sg:pub.10.1038/ismej.2011.74
11 https://app.dimensions.ai/details/publication/pub.1082464155
12 https://doi.org/10.1016/0022-2836(70)90057-4
13 https://doi.org/10.1073/pnas.1000080107
14 https://doi.org/10.1093/bioinformatics/btl158
15 https://doi.org/10.1093/nar/gkl889
16 https://doi.org/10.1093/nar/gkn879
17 https://doi.org/10.1093/nar/gkp1137
18 https://doi.org/10.1128/aem.02772-10
19 https://doi.org/10.1371/journal.pone.0011840
20 https://doi.org/10.1371/journal.pone.0015406
21 schema:datePublished 2012-12
22 schema:datePublishedReg 2012-12-01
23 schema:description BACKGROUND: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information. RESULTS: PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods. CONCLUSIONS: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naïve assembly with negligible loss of "good" sequence.
24 schema:genre research_article
25 schema:inLanguage en
26 schema:isAccessibleForFree true
27 schema:isPartOf N2df7fc9ea1e24d2191120be879495171
28 N7bfacd85dbd24364a356dc0cf931d94b
29 sg:journal.1023786
30 schema:name PANDAseq: paired-end assembler for illumina sequences
31 schema:pagination 31
32 schema:productId N5967e54042024fafab35101a07d82359
33 N7252075053804913ad051e59067d64e6
34 Na273a584abd34c44a2ca5d7eef527919
35 Nc7cd5ec40b094efcb80530a028f2024d
36 Nec1e085751624f088971d0883340750d
37 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047017534
38 https://doi.org/10.1186/1471-2105-13-31
39 schema:sdDatePublished 2019-04-10T19:56
40 schema:sdLicense https://scigraph.springernature.com/explorer/license/
41 schema:sdPublisher N0d986201353e4aeabe4706b97742c36f
42 schema:url http://link.springer.com/10.1186/1471-2105-13-31
43 sgo:license sg:explorer/license/
44 sgo:sdDataset articles
45 rdf:type schema:ScholarlyArticle
46 N0d986201353e4aeabe4706b97742c36f schema:name Springer Nature - SN SciGraph project
47 rdf:type schema:Organization
48 N2df7fc9ea1e24d2191120be879495171 schema:volumeNumber 13
49 rdf:type schema:PublicationVolume
50 N48cb980327614d0687985f9463398da5 rdf:first sg:person.0642727740.54
51 rdf:rest N7a26d223679342cda870f1d36e67a050
52 N5967e54042024fafab35101a07d82359 schema:name doi
53 schema:value 10.1186/1471-2105-13-31
54 rdf:type schema:PropertyValue
55 N5da3f6c4f081436e99493ce1280bcb87 rdf:first sg:person.0654641046.01
56 rdf:rest Nf607421ce9ac421da5c093527606a0c5
57 N7252075053804913ad051e59067d64e6 schema:name pubmed_id
58 schema:value 22333067
59 rdf:type schema:PropertyValue
60 N761ab78c19cd46c383d4776a30e303fc schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
61 schema:name Software
62 rdf:type schema:DefinedTerm
63 N7a26d223679342cda870f1d36e67a050 rdf:first sg:person.01030400146.17
64 rdf:rest rdf:nil
65 N7bfacd85dbd24364a356dc0cf931d94b schema:issueNumber 1
66 rdf:type schema:PublicationIssue
67 Na273a584abd34c44a2ca5d7eef527919 schema:name readcube_id
68 schema:value dffa0a8ebb74cdda5de6a0bf219d4be6004ddf4bfcb2d16f8f56090ec4b9986b
69 rdf:type schema:PropertyValue
70 Na6269d5b3c5a40538042acb84f1015a9 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
71 schema:name RNA, Bacterial
72 rdf:type schema:DefinedTerm
73 Nbf6bb02e210c49e0b95c472def0cb1f4 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
74 schema:name RNA, Ribosomal, 16S
75 rdf:type schema:DefinedTerm
76 Nc1f29824fad44aeba1ded145aaaf9536 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
77 schema:name Bacteria
78 rdf:type schema:DefinedTerm
79 Nc70ca26ca57c408999a164780785cf25 rdf:first sg:person.0770612035.54
80 rdf:rest N5da3f6c4f081436e99493ce1280bcb87
81 Nc7cd5ec40b094efcb80530a028f2024d schema:name dimensions_id
82 schema:value pub.1047017534
83 rdf:type schema:PropertyValue
84 Ncfdd145edfe84df9b8ed01c82c7975c7 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
85 schema:name Metagenomics
86 rdf:type schema:DefinedTerm
87 Nec1e085751624f088971d0883340750d schema:name nlm_unique_id
88 schema:value 100965194
89 rdf:type schema:PropertyValue
90 Nf607421ce9ac421da5c093527606a0c5 rdf:first sg:person.01320220640.40
91 rdf:rest N48cb980327614d0687985f9463398da5
92 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
93 schema:name Biological Sciences
94 rdf:type schema:DefinedTerm
95 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
96 schema:name Genetics
97 rdf:type schema:DefinedTerm
98 sg:journal.1023786 schema:issn 1471-2105
99 schema:name BMC Bioinformatics
100 rdf:type schema:Periodical
101 sg:person.01030400146.17 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
102 schema:familyName Neufeld
103 schema:givenName Josh D
104 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01030400146.17
105 rdf:type schema:Person
106 sg:person.01320220640.40 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
107 schema:familyName Truszkowski
108 schema:givenName Jakub M
109 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01320220640.40
110 rdf:type schema:Person
111 sg:person.0642727740.54 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
112 schema:familyName Brown
113 schema:givenName Daniel G
114 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54
115 rdf:type schema:Person
116 sg:person.0654641046.01 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
117 schema:familyName Bartram
118 schema:givenName Andrea K
119 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0654641046.01
120 rdf:type schema:Person
121 sg:person.0770612035.54 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
122 schema:familyName Masella
123 schema:givenName Andre P
124 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0770612035.54
125 rdf:type schema:Person
126 sg:pub.10.1038/ismej.2010.160 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048299429
127 https://doi.org/10.1038/ismej.2010.160
128 rdf:type schema:CreativeWork
129 sg:pub.10.1038/ismej.2011.74 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012050464
130 https://doi.org/10.1038/ismej.2011.74
131 rdf:type schema:CreativeWork
132 https://app.dimensions.ai/details/publication/pub.1082464155 rdf:type schema:CreativeWork
133 https://doi.org/10.1016/0022-2836(70)90057-4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021169618
134 rdf:type schema:CreativeWork
135 https://doi.org/10.1073/pnas.1000080107 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019627885
136 rdf:type schema:CreativeWork
137 https://doi.org/10.1093/bioinformatics/btl158 schema:sameAs https://app.dimensions.ai/details/publication/pub.1014668137
138 rdf:type schema:CreativeWork
139 https://doi.org/10.1093/nar/gkl889 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029901284
140 rdf:type schema:CreativeWork
141 https://doi.org/10.1093/nar/gkn879 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044918953
142 rdf:type schema:CreativeWork
143 https://doi.org/10.1093/nar/gkp1137 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037112607
144 rdf:type schema:CreativeWork
145 https://doi.org/10.1128/aem.02772-10 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015690506
146 rdf:type schema:CreativeWork
147 https://doi.org/10.1371/journal.pone.0011840 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031450281
148 rdf:type schema:CreativeWork
149 https://doi.org/10.1371/journal.pone.0015406 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045070681
150 rdf:type schema:CreativeWork
151 https://www.grid.ac/institutes/grid.46078.3d schema:alternateName University of Waterloo
152 schema:name David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
153 Department of Biology, University of Waterloo, Waterloo, Ontario, Canada
154 rdf:type schema:Organization
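In the table above, a row that omits the subject reuses the previous row's subject, and a row that omits both subject and predicate reuses both. The author list is stored as an RDF collection: the article's `schema:author` points at a blank node, each blank node holds one author in `rdf:first` and the next node in `rdf:rest`, and the chain ends at `rdf:nil`. The sketch below uses only the Python standard library. Its function names and its prefix-based predicate test are assumptions that happen to hold for this table, not a general RDF parser. It rebuilds full triples from a subset of the rows and walks the chain to recover author order.

```python
# Subset of rows copied from the table above: row number, then 1 to 3 cells.
ROWS = """\
1 sg:pub.10.1186/1471-2105-13-31 schema:about N761ab78c19cd46c383d4776a30e303fc
8 schema:author Nc70ca26ca57c408999a164780785cf25
50 N48cb980327614d0687985f9463398da5 rdf:first sg:person.0642727740.54
51 rdf:rest N7a26d223679342cda870f1d36e67a050
55 N5da3f6c4f081436e99493ce1280bcb87 rdf:first sg:person.0654641046.01
56 rdf:rest Nf607421ce9ac421da5c093527606a0c5
63 N7a26d223679342cda870f1d36e67a050 rdf:first sg:person.01030400146.17
64 rdf:rest rdf:nil
79 Nc70ca26ca57c408999a164780785cf25 rdf:first sg:person.0770612035.54
80 rdf:rest N5da3f6c4f081436e99493ce1280bcb87
90 Nf607421ce9ac421da5c093527606a0c5 rdf:first sg:person.01320220640.40
91 rdf:rest N48cb980327614d0687985f9463398da5
101 sg:person.01030400146.17 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
102 schema:familyName Neufeld
106 sg:person.01320220640.40 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
107 schema:familyName Truszkowski
111 sg:person.0642727740.54 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
112 schema:familyName Brown
116 sg:person.0654641046.01 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
117 schema:familyName Bartram
121 sg:person.0770612035.54 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
122 schema:familyName Masella
"""

# Assumption: in this table every predicate uses one of these prefixes,
# and no object-only row starts with one of them.
PREDICATE_PREFIXES = ("schema:", "rdf:", "sgo:")


def is_predicate(token: str) -> bool:
    return token.startswith(PREDICATE_PREFIXES)


def unflatten(text: str) -> list[tuple[str, str, str]]:
    """Fill in the subject/predicate cells that continuation rows leave out."""
    triples = []
    subject = predicate = None
    for line in text.splitlines():
        if not line.strip():
            continue
        _, rest = line.split(" ", 1)  # drop the row number
        tokens = rest.split(" ")
        if is_predicate(tokens[0]):  # predicate + object: keep the subject
            predicate, obj = tokens[0], " ".join(tokens[1:])
        elif len(tokens) > 1 and is_predicate(tokens[1]):  # full triple
            subject, predicate, obj = tokens[0], tokens[1], " ".join(tokens[2:])
        else:  # object only: keep subject and predicate
            obj = rest
        triples.append((subject, predicate, obj))
    return triples


def rdf_list(triples, head: str) -> list[str]:
    """Follow rdf:first/rdf:rest links from head until rdf:nil."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items, node, seen = [], head, set()
    while node != "rdf:nil":
        if node in seen:
            raise ValueError(f"cycle in RDF list at {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


triples = unflatten(ROWS)
head = next(o for s, p, o in triples
            if s == "sg:pub.10.1186/1471-2105-13-31" and p == "schema:author")
family = {s: o for s, p, o in triples if p == "schema:familyName"}
ordered = [family[person] for person in rdf_list(triples, head)]
print(ordered)  # ['Masella', 'Bartram', 'Truszkowski', 'Brown', 'Neufeld']
```

The recovered order matches the author array in the JSON-LD record, which is what the rdf:first/rdf:rest encoding is there to preserve.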
 



