PANDAseq: paired-end assembler for illumina sequences


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2012-12

AUTHORS

Andre P Masella, Andrea K Bartram, Jakub M Truszkowski, Daniel G Brown, Josh D Neufeld

ABSTRACT

BACKGROUND: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information.

RESULTS: PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods.

CONCLUSIONS: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naïve assembly with negligible loss of "good" sequence.

PAGES

31

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/1471-2105-13-31

DOI

http://dx.doi.org/10.1186/1471-2105-13-31

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1047017534

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/22333067



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Bacteria", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Metagenomics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "RNA, Bacterial", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "RNA, Ribosomal, 16S", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "Department of Biology, University of Waterloo, Waterloo, Ontario, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Masella", 
        "givenName": "Andre P", 
        "id": "sg:person.0770612035.54", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0770612035.54"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "Department of Biology, University of Waterloo, Waterloo, Ontario, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Bartram", 
        "givenName": "Andrea K", 
        "id": "sg:person.0654641046.01", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0654641046.01"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Truszkowski", 
        "givenName": "Jakub M", 
        "id": "sg:person.01320220640.40", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01320220640.40"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brown", 
        "givenName": "Daniel G", 
        "id": "sg:person.0642727740.54", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "Department of Biology, University of Waterloo, Waterloo, Ontario, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Neufeld", 
        "givenName": "Josh D", 
        "id": "sg:person.01030400146.17", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01030400146.17"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/ismej.2011.74", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1012050464", 
          "https://doi.org/10.1038/ismej.2011.74"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btl158", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1014668137"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1128/aem.02772-10", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1015690506"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.1000080107", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1019627885"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/0022-2836(70)90057-4", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1021169618"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/gkl889", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1029901284"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pone.0011840", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1031450281"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/gkp1137", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1037112607"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/gkn879", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1044918953"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pone.0015406", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1045070681"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/ismej.2010.160", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1048299429", 
          "https://doi.org/10.1038/ismej.2010.160"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://app.dimensions.ai/details/publication/pub.1082464155", 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2012-12", 
    "datePublishedReg": "2012-12-01", 
    "description": "BACKGROUND: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information.\nRESULTS: PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods.\nCONCLUSIONS: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over na\u00efve assembly with negligible loss of \"good\" sequence.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1186/1471-2105-13-31", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1023786", 
        "issn": [
          "1471-2105"
        ], 
        "name": "BMC Bioinformatics", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "13"
      }
    ], 
    "name": "PANDAseq: paired-end assembler for illumina sequences", 
    "pagination": "31", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "dffa0a8ebb74cdda5de6a0bf219d4be6004ddf4bfcb2d16f8f56090ec4b9986b"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "22333067"
        ]
      }, 
      {
        "name": "nlm_unique_id", 
        "type": "PropertyValue", 
        "value": [
          "100965194"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/1471-2105-13-31"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1047017534"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/1471-2105-13-31", 
      "https://app.dimensions.ai/details/publication/pub.1047017534"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-10T19:56", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8681_00000507.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1186/1471-2105-13-31"
  }
]
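
As a rough illustration of how the JSON-LD record above can be consumed, the following Python sketch loads an abridged copy of it and pulls out the title, the author names, and the identifiers listed under `productId`. The `summarize` helper and the `RECORD_JSON` constant are hypothetical names introduced here, and the embedded JSON keeps only a few of the record's fields; it is a sketch under those assumptions rather than an official SciGraph client.

```python
import json

# Abridged copy of the record above; only a few fields are kept for brevity.
RECORD_JSON = """
[
  {
    "id": "sg:pub.10.1186/1471-2105-13-31",
    "name": "PANDAseq: paired-end assembler for illumina sequences",
    "datePublished": "2012-12",
    "author": [
      {"familyName": "Masella", "givenName": "Andre P"},
      {"familyName": "Bartram", "givenName": "Andrea K"}
    ],
    "productId": [
      {"name": "pubmed_id", "value": ["22333067"]},
      {"name": "doi", "value": ["10.1186/1471-2105-13-31"]}
    ]
  }
]
"""


def summarize(record):
    """Return the title, author names, and a name -> first value map of product IDs."""
    authors = [
        f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])
    ]
    ids = {
        p["name"]: p["value"][0]
        for p in record.get("productId", [])
        if p.get("value")  # skip entries without any value
    }
    return {"title": record["name"], "authors": authors, "ids": ids}


# The top level of the document is a JSON array holding a single record.
summary = summarize(json.loads(RECORD_JSON)[0])
print(summary["title"])
print(summary["authors"])
print(summary["ids"]["doi"])
```

Because JSON-LD is plain JSON, the standard `json` module is enough for this kind of field extraction; a dedicated RDF library would only be needed to resolve the `@context` and work with full IRIs.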
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-31'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-31'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-31'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-31'
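
The four curl commands above differ only in the `Accept` header, which is ordinary HTTP content negotiation. A minimal Python sketch of the same idea, using only the standard library, might look like the following. The names `ACCEPT_TYPES`, `build_request`, and `fetch` are hypothetical helpers introduced for illustration, and whether the endpoint still responds as described is not verified here.

```python
import urllib.request

# URL taken from the curl examples above.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/1471-2105-13-31"

# Short format names (chosen here for convenience) mapped to the media
# types used in the curl examples.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request that asks the server for the given RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building a request does not touch the network, so it is safe to inspect.
req = build_request("turtle")
print(req.full_url)
print(req.get_header("Accept"))
```

Calling `fetch("json-ld")` would perform the equivalent of the first curl command; it is left uncalled here so the sketch runs offline.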


 

This table displays all metadata directly associated with this object as RDF triples.

154 TRIPLES      21 PREDICATES      46 URIs      26 LITERALS      14 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/1471-2105-13-31 schema:about N32ecf2df60374072b57cab125f2e36d6
2 N3e6863c3181245b5b77e3597cd464b16
3 N4141a0e0cb6a4606a95c62f976cde803
4 N563a1caa8a5d415d8a321fe4a362e5fa
5 Nfb46d15d80a8406aa1769a8face33383
6 anzsrc-for:06
7 anzsrc-for:0604
8 schema:author N9f2378cfacb3434a8ae2a0ada7797ced
9 schema:citation sg:pub.10.1038/ismej.2010.160
10 sg:pub.10.1038/ismej.2011.74
11 https://app.dimensions.ai/details/publication/pub.1082464155
12 https://doi.org/10.1016/0022-2836(70)90057-4
13 https://doi.org/10.1073/pnas.1000080107
14 https://doi.org/10.1093/bioinformatics/btl158
15 https://doi.org/10.1093/nar/gkl889
16 https://doi.org/10.1093/nar/gkn879
17 https://doi.org/10.1093/nar/gkp1137
18 https://doi.org/10.1128/aem.02772-10
19 https://doi.org/10.1371/journal.pone.0011840
20 https://doi.org/10.1371/journal.pone.0015406
21 schema:datePublished 2012-12
22 schema:datePublishedReg 2012-12-01
23 schema:description BACKGROUND: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information. RESULTS: PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods. CONCLUSIONS: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naïve assembly with negligible loss of "good" sequence.
24 schema:genre research_article
25 schema:inLanguage en
26 schema:isAccessibleForFree true
27 schema:isPartOf N31f8f4ba3c2449fd9d843b61c3edd06a
28 N55c1259730ee42fdaee728c06c7c519e
29 sg:journal.1023786
30 schema:name PANDAseq: paired-end assembler for illumina sequences
31 schema:pagination 31
32 schema:productId N256733f09ab54801ab2b543abd077968
33 N5d12b59dbb434d55860b0b6c49136dbb
34 N73a95b032fec4b4ba653af78cf3f408c
35 Na5e85a85c21e46bf89554f2643b680c8
36 Nafb7a367b7794a2fbdb0c14414e69c1f
37 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047017534
38 https://doi.org/10.1186/1471-2105-13-31
39 schema:sdDatePublished 2019-04-10T19:56
40 schema:sdLicense https://scigraph.springernature.com/explorer/license/
41 schema:sdPublisher N8f764fa0fd23446faf87a151cf6015e1
42 schema:url http://link.springer.com/10.1186/1471-2105-13-31
43 sgo:license sg:explorer/license/
44 sgo:sdDataset articles
45 rdf:type schema:ScholarlyArticle
46 N12a9507077e44f2eafc183ec96060238 rdf:first sg:person.0654641046.01
47 rdf:rest Ndef4958efd204edfbcf7ed870a541807
48 N15347bc6fd404a2ca4433a3e32949d3d rdf:first sg:person.0642727740.54
49 rdf:rest N2e7f346b443f4116a268059da0242d67
50 N256733f09ab54801ab2b543abd077968 schema:name doi
51 schema:value 10.1186/1471-2105-13-31
52 rdf:type schema:PropertyValue
53 N2e7f346b443f4116a268059da0242d67 rdf:first sg:person.01030400146.17
54 rdf:rest rdf:nil
55 N31f8f4ba3c2449fd9d843b61c3edd06a schema:volumeNumber 13
56 rdf:type schema:PublicationVolume
57 N32ecf2df60374072b57cab125f2e36d6 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
58 schema:name RNA, Ribosomal, 16S
59 rdf:type schema:DefinedTerm
60 N3e6863c3181245b5b77e3597cd464b16 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
61 schema:name Bacteria
62 rdf:type schema:DefinedTerm
63 N4141a0e0cb6a4606a95c62f976cde803 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
64 schema:name Software
65 rdf:type schema:DefinedTerm
66 N55c1259730ee42fdaee728c06c7c519e schema:issueNumber 1
67 rdf:type schema:PublicationIssue
68 N563a1caa8a5d415d8a321fe4a362e5fa schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
69 schema:name RNA, Bacterial
70 rdf:type schema:DefinedTerm
71 N5d12b59dbb434d55860b0b6c49136dbb schema:name nlm_unique_id
72 schema:value 100965194
73 rdf:type schema:PropertyValue
74 N73a95b032fec4b4ba653af78cf3f408c schema:name dimensions_id
75 schema:value pub.1047017534
76 rdf:type schema:PropertyValue
77 N8f764fa0fd23446faf87a151cf6015e1 schema:name Springer Nature - SN SciGraph project
78 rdf:type schema:Organization
79 N9f2378cfacb3434a8ae2a0ada7797ced rdf:first sg:person.0770612035.54
80 rdf:rest N12a9507077e44f2eafc183ec96060238
81 Na5e85a85c21e46bf89554f2643b680c8 schema:name readcube_id
82 schema:value dffa0a8ebb74cdda5de6a0bf219d4be6004ddf4bfcb2d16f8f56090ec4b9986b
83 rdf:type schema:PropertyValue
84 Nafb7a367b7794a2fbdb0c14414e69c1f schema:name pubmed_id
85 schema:value 22333067
86 rdf:type schema:PropertyValue
87 Ndef4958efd204edfbcf7ed870a541807 rdf:first sg:person.01320220640.40
88 rdf:rest N15347bc6fd404a2ca4433a3e32949d3d
89 Nfb46d15d80a8406aa1769a8face33383 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
90 schema:name Metagenomics
91 rdf:type schema:DefinedTerm
92 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
93 schema:name Biological Sciences
94 rdf:type schema:DefinedTerm
95 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
96 schema:name Genetics
97 rdf:type schema:DefinedTerm
98 sg:journal.1023786 schema:issn 1471-2105
99 schema:name BMC Bioinformatics
100 rdf:type schema:Periodical
101 sg:person.01030400146.17 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
102 schema:familyName Neufeld
103 schema:givenName Josh D
104 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01030400146.17
105 rdf:type schema:Person
106 sg:person.01320220640.40 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
107 schema:familyName Truszkowski
108 schema:givenName Jakub M
109 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01320220640.40
110 rdf:type schema:Person
111 sg:person.0642727740.54 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
112 schema:familyName Brown
113 schema:givenName Daniel G
114 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54
115 rdf:type schema:Person
116 sg:person.0654641046.01 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
117 schema:familyName Bartram
118 schema:givenName Andrea K
119 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0654641046.01
120 rdf:type schema:Person
121 sg:person.0770612035.54 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
122 schema:familyName Masella
123 schema:givenName Andre P
124 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0770612035.54
125 rdf:type schema:Person
126 sg:pub.10.1038/ismej.2010.160 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048299429
127 https://doi.org/10.1038/ismej.2010.160
128 rdf:type schema:CreativeWork
129 sg:pub.10.1038/ismej.2011.74 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012050464
130 https://doi.org/10.1038/ismej.2011.74
131 rdf:type schema:CreativeWork
132 https://app.dimensions.ai/details/publication/pub.1082464155 rdf:type schema:CreativeWork
133 https://doi.org/10.1016/0022-2836(70)90057-4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021169618
134 rdf:type schema:CreativeWork
135 https://doi.org/10.1073/pnas.1000080107 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019627885
136 rdf:type schema:CreativeWork
137 https://doi.org/10.1093/bioinformatics/btl158 schema:sameAs https://app.dimensions.ai/details/publication/pub.1014668137
138 rdf:type schema:CreativeWork
139 https://doi.org/10.1093/nar/gkl889 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029901284
140 rdf:type schema:CreativeWork
141 https://doi.org/10.1093/nar/gkn879 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044918953
142 rdf:type schema:CreativeWork
143 https://doi.org/10.1093/nar/gkp1137 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037112607
144 rdf:type schema:CreativeWork
145 https://doi.org/10.1128/aem.02772-10 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015690506
146 rdf:type schema:CreativeWork
147 https://doi.org/10.1371/journal.pone.0011840 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031450281
148 rdf:type schema:CreativeWork
149 https://doi.org/10.1371/journal.pone.0015406 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045070681
150 rdf:type schema:CreativeWork
151 https://www.grid.ac/institutes/grid.46078.3d schema:alternateName University of Waterloo
152 schema:name David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
153 Department of Biology, University of Waterloo, Waterloo, Ontario, Canada
154 rdf:type schema:Organization
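
In the table above, the author order is not stored on the person nodes themselves. It is encoded as an RDF list: `schema:author` points at a blank node, and each blank node carries an `rdf:first` (the author) and an `rdf:rest` (the next node, or `rdf:nil` at the end). The Python sketch below walks that chain using the blank-node identifiers and family names copied from the table; the dictionaries and the `walk_list` helper are hypothetical constructs for illustration, not part of any SciGraph tooling.

```python
RDF_NIL = "rdf:nil"

# Head of the author list, i.e. the object of the schema:author triple.
AUTHOR_LIST_HEAD = "N9f2378cfacb3434a8ae2a0ada7797ced"

# Blank node -> (rdf:first, rdf:rest), copied from the triples table.
LIST_NODES = {
    "N9f2378cfacb3434a8ae2a0ada7797ced": (
        "sg:person.0770612035.54", "N12a9507077e44f2eafc183ec96060238"),
    "N12a9507077e44f2eafc183ec96060238": (
        "sg:person.0654641046.01", "Ndef4958efd204edfbcf7ed870a541807"),
    "Ndef4958efd204edfbcf7ed870a541807": (
        "sg:person.01320220640.40", "N15347bc6fd404a2ca4433a3e32949d3d"),
    "N15347bc6fd404a2ca4433a3e32949d3d": (
        "sg:person.0642727740.54", "N2e7f346b443f4116a268059da0242d67"),
    "N2e7f346b443f4116a268059da0242d67": (
        "sg:person.01030400146.17", RDF_NIL),
}

# schema:familyName for each person node, also from the table.
FAMILY_NAMES = {
    "sg:person.0770612035.54": "Masella",
    "sg:person.0654641046.01": "Bartram",
    "sg:person.01320220640.40": "Truszkowski",
    "sg:person.0642727740.54": "Brown",
    "sg:person.01030400146.17": "Neufeld",
}


def walk_list(head, nodes):
    """Follow rdf:rest links from head, collecting each rdf:first in order."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


ordered_authors = [FAMILY_NAMES[p] for p in walk_list(AUTHOR_LIST_HEAD, LIST_NODES)]
print(ordered_authors)
```

The recovered order matches the author list given in the article info, which is the reason the list encoding exists: plain RDF triples are unordered, so sequence has to be spelled out explicitly.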
 



