Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2012-09-19

AUTHORS

Mark J Chaisson, Glenn Tesler

ABSTRACT

BACKGROUND: Recent methods have been developed to perform high-throughput sequencing of DNA by Single Molecule Sequencing (SMS). While Next-Generation sequencing methods may produce reads up to several hundred bases long, SMS sequencing produces reads up to tens of kilobases long. Existing alignment methods are either too inefficient for high-throughput datasets, or not sensitive enough to align SMS reads, which have a higher error rate than Next-Generation sequencing.

RESULTS: We describe the method BLASR (Basic Local Alignment with Successive Refinement) for mapping Single Molecule Sequencing (SMS) reads that are thousands of bases long, with divergence between the read and genome dominated by insertion and deletion error. The method is benchmarked using both simulated reads and reads from a bacterial sequencing project. We also present a combinatorial model of sequencing error that motivates why our approach is effective.

CONCLUSIONS: The results indicate that it is possible to map SMS reads with high accuracy and speed. Furthermore, the inferences made on the mapability of SMS reads using our combinatorial model of sequencing error are in agreement with the mapping accuracy demonstrated on simulated reads.

PAGES

238-238

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/1471-2105-13-238

DOI

http://dx.doi.org/10.1186/1471-2105-13-238

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1028668057

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/22988817



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Algorithms", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Base Sequence", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "DNA", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genome", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Alignment", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Analysis, DNA", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Department Secondary Analysis, Pacific Biosciences, 1005 Hamilton Rd, CA, Menlo Park, USA", 
          "id": "http://www.grid.ac/institutes/grid.423340.2", 
          "name": [
            "Department Secondary Analysis, Pacific Biosciences, 1005 Hamilton Rd, CA, Menlo Park, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Chaisson", 
        "givenName": "Mark J", 
        "id": "sg:person.012610254333.24", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012610254333.24"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Mathematics, University of California, San Diego, 9500 Gilman Dr, CA, La Jolla, USA", 
          "id": "http://www.grid.ac/institutes/grid.266100.3", 
          "name": [
            "Department of Mathematics, University of California, San Diego, 9500 Gilman Dr, CA, La Jolla, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Tesler", 
        "givenName": "Glenn", 
        "id": "sg:person.01305222154.18", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01305222154.18"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/nature09379", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1045172372", 
          "https://doi.org/10.1038/nature09379"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmeth.1459", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1032824809", 
          "https://doi.org/10.1038/nmeth.1459"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nbt.2147", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1031949408", 
          "https://doi.org/10.1038/nbt.2147"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nnano.2009.12", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1006588299", 
          "https://doi.org/10.1038/nnano.2009.12"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/gb-2009-10-3-r25", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1049583368", 
          "https://doi.org/10.1186/gb-2009-10-3-r25"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/gb-2004-5-2-r12", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1022585853", 
          "https://doi.org/10.1186/gb-2004-5-2-r12"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/ng.437", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1035989827", 
          "https://doi.org/10.1038/ng.437"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-540-39763-2_1", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1044437393", 
          "https://doi.org/10.1007/978-3-540-39763-2_1"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2012-09-19", 
    "datePublishedReg": "2012-09-19", 
    "description": "BACKGROUND: Recent methods have been developed to perform high-throughput sequencing of DNA by Single Molecule Sequencing (SMS). While Next-Generation sequencing methods may produce reads up to several hundred bases long, SMS sequencing produces reads up to tens of kilobases long. Existing alignment methods are either too inefficient for high-throughput datasets, or not sensitive enough to align SMS reads, which have a higher error rate than Next-Generation sequencing.\nRESULTS: We describe the method BLASR (Basic Local Alignment with Successive Refinement) for mapping Single Molecule Sequencing (SMS) reads that are thousands of bases long, with divergence between the read and genome dominated by insertion and deletion error. The method is benchmarked using both simulated reads and reads from a bacterial sequencing project. We also present a combinatorial model of sequencing error that motivates why our approach is effective.\nCONCLUSIONS: The results indicate that it is possible to map SMS reads with high accuracy and speed. Furthermore, the inferences made on the mapability of SMS reads using our combinatorial model of sequencing error are in agreement with the mapping accuracy demonstrated on simulated reads.", 
    "genre": "article", 
    "id": "sg:pub.10.1186/1471-2105-13-238", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1023786", 
        "issn": [
          "1471-2105"
        ], 
        "name": "BMC Bioinformatics", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "13"
      }
    ], 
    "keywords": [
      "single-molecule sequencing", 
      "molecule sequencing", 
      "single molecule sequencing reads", 
      "high-throughput sequencing", 
      "next-generation sequencing methods", 
      "tens of kilobases", 
      "thousands of bases", 
      "high-throughput datasets", 
      "sequencing projects", 
      "next-generation sequencing", 
      "sequencing reads", 
      "Basic Local Alignment", 
      "sequencing methods", 
      "sequencing", 
      "sequencing errors", 
      "reads", 
      "simulated reads", 
      "genome", 
      "kilobases", 
      "local alignment", 
      "DNA", 
      "divergence", 
      "combinatorial model", 
      "mapability", 
      "high error rates", 
      "basis", 
      "insertion", 
      "alignment method", 
      "thousands", 
      "deletion errors", 
      "alignment", 
      "inference", 
      "recent methods", 
      "BLASR", 
      "mapping accuracy", 
      "dataset", 
      "results", 
      "rate", 
      "model", 
      "tens", 
      "approach", 
      "method", 
      "applications", 
      "refinement", 
      "project", 
      "high accuracy", 
      "error rate", 
      "agreement", 
      "accuracy", 
      "speed", 
      "error", 
      "theory", 
      "successive refinement", 
      "SMS sequencing", 
      "SMS reads", 
      "method BLASR", 
      "bacterial sequencing project", 
      "SMS reads", 
      "molecule sequencing reads"
    ], 
    "name": "Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory", 
    "pagination": "238-238", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1028668057"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/1471-2105-13-238"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "22988817"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/1471-2105-13-238", 
      "https://app.dimensions.ai/details/publication/pub.1028668057"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-01-01T18:28", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220101/entities/gbq_results/article/article_578.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1186/1471-2105-13-238"
  }
]
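
As a rough illustration of how the JSON-LD record above might be consumed, the following Python sketch pulls out the title, author names, and identifiers. It works on a trimmed copy of the record embedded in the script; the helper name `summarize` and the shape of the returned dictionary are assumptions made for this example, not part of SciGraph.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
RECORD_JSON = """
[
  {
    "id": "sg:pub.10.1186/1471-2105-13-238",
    "name": "Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory",
    "datePublished": "2012-09-19",
    "author": [
      {"familyName": "Chaisson", "givenName": "Mark J", "type": "Person"},
      {"familyName": "Tesler", "givenName": "Glenn", "type": "Person"}
    ],
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1028668057"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/1471-2105-13-238"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["22988817"]}
    ]
  }
]
"""


def summarize(record):
    """Collect a few human-readable fields from one record (hypothetical helper)."""
    authors = [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]
    # Each productId entry holds its value in a one-element list.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    return {
        "title": record["name"],
        "date": record["datePublished"],
        "authors": authors,
        "ids": ids,
    }


# The top-level JSON value is a list containing a single record.
summary = summarize(json.loads(RECORD_JSON)[0])
print(summary["authors"])
print(summary["ids"]["doi"])
```

The same function should work on the full record, since it only reads keys that appear there.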

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-238'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-238'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-238'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-238'
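
The four curl commands differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. The URL and media types are taken from the commands above; the names `ACCEPT_TYPES`, `build_request`, and `fetch` are hypothetical, and whether the endpoint still answers as described is an assumption.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/1471-2105-13-238"

# Media types copied from the curl commands above; the short keys are our own labels.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request that asks for the given serialization via the Accept header."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example use (not executed here because it requires network access):
# print(fetch("turtle")[:500])
```

Separating `build_request` from `fetch` keeps the header logic checkable without contacting the server.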


 

This table displays all metadata directly associated with this object as RDF triples.

189 TRIPLES      22 PREDICATES      99 URIs      83 LITERALS      14 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/1471-2105-13-238 schema:about N13697f634990411f8f5b74cedfb378ef
2 N178a128d0f6a47699bc6e9dd6eb8af46
3 N2c8608f6e81f41139a9d80f3ac068e2c
4 N487ccaf3faee481aac36ce7a494692fe
5 Nbc3fcdd5a48f4020989a7986fe5b3ae5
6 Nc22d0ab57b7b481485ed1a443b90986c
7 Nfb56ecabee5043cf8bb8ac446f6e05e7
8 anzsrc-for:06
9 anzsrc-for:0604
10 schema:author N5149f1c5b084406baec61153d3955cf4
11 schema:citation sg:pub.10.1007/978-3-540-39763-2_1
12 sg:pub.10.1038/nature09379
13 sg:pub.10.1038/nbt.2147
14 sg:pub.10.1038/ng.437
15 sg:pub.10.1038/nmeth.1459
16 sg:pub.10.1038/nnano.2009.12
17 sg:pub.10.1186/gb-2004-5-2-r12
18 sg:pub.10.1186/gb-2009-10-3-r25
19 schema:datePublished 2012-09-19
20 schema:datePublishedReg 2012-09-19
21 schema:description BACKGROUND: Recent methods have been developed to perform high-throughput sequencing of DNA by Single Molecule Sequencing (SMS). While Next-Generation sequencing methods may produce reads up to several hundred bases long, SMS sequencing produces reads up to tens of kilobases long. Existing alignment methods are either too inefficient for high-throughput datasets, or not sensitive enough to align SMS reads, which have a higher error rate than Next-Generation sequencing. RESULTS: We describe the method BLASR (Basic Local Alignment with Successive Refinement) for mapping Single Molecule Sequencing (SMS) reads that are thousands of bases long, with divergence between the read and genome dominated by insertion and deletion error. The method is benchmarked using both simulated reads and reads from a bacterial sequencing project. We also present a combinatorial model of sequencing error that motivates why our approach is effective. CONCLUSIONS: The results indicate that it is possible to map SMS reads with high accuracy and speed. Furthermore, the inferences made on the mapability of SMS reads using our combinatorial model of sequencing error are in agreement with the mapping accuracy demonstrated on simulated reads.
22 schema:genre article
23 schema:inLanguage en
24 schema:isAccessibleForFree true
25 schema:isPartOf N25ad79742b9e449997d5efb56fe651a2
26 N324729142eec49199f867396e56b352a
27 sg:journal.1023786
28 schema:keywords BLASR
29 Basic Local Alignment
30 DNA
31 SMS reads
32 SMS sequencing
33 accuracy
34 agreement
35 alignment
36 alignment method
37 applications
38 approach
39 bacterial sequencing project
40 basis
41 combinatorial model
42 dataset
43 deletion errors
44 divergence
45 error
46 error rate
47 genome
48 high accuracy
49 high error rates
50 high-throughput datasets
51 high-throughput sequencing
52 inference
53 insertion
54 kilobases
55 local alignment
56 mapability
57 mapping accuracy
58 method
59 method BLASR
60 model
61 molecule sequencing
62 molecule sequencing reads
63 next-generation sequencing
64 next-generation sequencing methods
65 project
66 rate
67 reads
68 recent methods
69 refinement
70 results
71 sequencing
72 sequencing errors
73 sequencing methods
74 sequencing projects
75 sequencing reads
76 simulated reads
77 single molecule sequencing reads
78 single-molecule sequencing
79 speed
80 successive refinement
81 tens
82 tens of kilobases
83 theory
84 thousands
85 thousands of bases
86 schema:name Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory
87 schema:pagination 238-238
88 schema:productId N375c004f0d1445aeba978796e0a2e7d2
89 N591f57dcaa5148799d0d632e297759a3
90 Na1ffd5c70bef47d9a23fba73d3da099b
91 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028668057
92 https://doi.org/10.1186/1471-2105-13-238
93 schema:sdDatePublished 2022-01-01T18:28
94 schema:sdLicense https://scigraph.springernature.com/explorer/license/
95 schema:sdPublisher Nb9e0dad91db94a709545a87a0dcc9ffb
96 schema:url https://doi.org/10.1186/1471-2105-13-238
97 sgo:license sg:explorer/license/
98 sgo:sdDataset articles
99 rdf:type schema:ScholarlyArticle
100 N049c034ce36f4c2b82be0c94e473b1f9 rdf:first sg:person.01305222154.18
101 rdf:rest rdf:nil
102 N13697f634990411f8f5b74cedfb378ef schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
103 schema:name Base Sequence
104 rdf:type schema:DefinedTerm
105 N178a128d0f6a47699bc6e9dd6eb8af46 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
106 schema:name Sequence Analysis, DNA
107 rdf:type schema:DefinedTerm
108 N25ad79742b9e449997d5efb56fe651a2 schema:volumeNumber 13
109 rdf:type schema:PublicationVolume
110 N2c8608f6e81f41139a9d80f3ac068e2c schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
111 schema:name DNA
112 rdf:type schema:DefinedTerm
113 N324729142eec49199f867396e56b352a schema:issueNumber 1
114 rdf:type schema:PublicationIssue
115 N375c004f0d1445aeba978796e0a2e7d2 schema:name pubmed_id
116 schema:value 22988817
117 rdf:type schema:PropertyValue
118 N487ccaf3faee481aac36ce7a494692fe schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
119 schema:name Sequence Alignment
120 rdf:type schema:DefinedTerm
121 N5149f1c5b084406baec61153d3955cf4 rdf:first sg:person.012610254333.24
122 rdf:rest N049c034ce36f4c2b82be0c94e473b1f9
123 N591f57dcaa5148799d0d632e297759a3 schema:name dimensions_id
124 schema:value pub.1028668057
125 rdf:type schema:PropertyValue
126 Na1ffd5c70bef47d9a23fba73d3da099b schema:name doi
127 schema:value 10.1186/1471-2105-13-238
128 rdf:type schema:PropertyValue
129 Nb9e0dad91db94a709545a87a0dcc9ffb schema:name Springer Nature - SN SciGraph project
130 rdf:type schema:Organization
131 Nbc3fcdd5a48f4020989a7986fe5b3ae5 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
132 schema:name Software
133 rdf:type schema:DefinedTerm
134 Nc22d0ab57b7b481485ed1a443b90986c schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
135 schema:name Genome
136 rdf:type schema:DefinedTerm
137 Nfb56ecabee5043cf8bb8ac446f6e05e7 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
138 schema:name Algorithms
139 rdf:type schema:DefinedTerm
140 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
141 schema:name Biological Sciences
142 rdf:type schema:DefinedTerm
143 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
144 schema:name Genetics
145 rdf:type schema:DefinedTerm
146 sg:journal.1023786 schema:issn 1471-2105
147 schema:name BMC Bioinformatics
148 schema:publisher Springer Nature
149 rdf:type schema:Periodical
150 sg:person.012610254333.24 schema:affiliation grid-institutes:grid.423340.2
151 schema:familyName Chaisson
152 schema:givenName Mark J
153 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012610254333.24
154 rdf:type schema:Person
155 sg:person.01305222154.18 schema:affiliation grid-institutes:grid.266100.3
156 schema:familyName Tesler
157 schema:givenName Glenn
158 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01305222154.18
159 rdf:type schema:Person
160 sg:pub.10.1007/978-3-540-39763-2_1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044437393
161 https://doi.org/10.1007/978-3-540-39763-2_1
162 rdf:type schema:CreativeWork
163 sg:pub.10.1038/nature09379 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045172372
164 https://doi.org/10.1038/nature09379
165 rdf:type schema:CreativeWork
166 sg:pub.10.1038/nbt.2147 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031949408
167 https://doi.org/10.1038/nbt.2147
168 rdf:type schema:CreativeWork
169 sg:pub.10.1038/ng.437 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035989827
170 https://doi.org/10.1038/ng.437
171 rdf:type schema:CreativeWork
172 sg:pub.10.1038/nmeth.1459 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032824809
173 https://doi.org/10.1038/nmeth.1459
174 rdf:type schema:CreativeWork
175 sg:pub.10.1038/nnano.2009.12 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006588299
176 https://doi.org/10.1038/nnano.2009.12
177 rdf:type schema:CreativeWork
178 sg:pub.10.1186/gb-2004-5-2-r12 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022585853
179 https://doi.org/10.1186/gb-2004-5-2-r12
180 rdf:type schema:CreativeWork
181 sg:pub.10.1186/gb-2009-10-3-r25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049583368
182 https://doi.org/10.1186/gb-2009-10-3-r25
183 rdf:type schema:CreativeWork
184 grid-institutes:grid.266100.3 schema:alternateName Department of Mathematics, University of California, San Diego, 9500 Gilman Dr, CA, La Jolla, USA
185 schema:name Department of Mathematics, University of California, San Diego, 9500 Gilman Dr, CA, La Jolla, USA
186 rdf:type schema:Organization
187 grid-institutes:grid.423340.2 schema:alternateName Department Secondary Analysis, Pacific Biosciences, 1005 Hamilton Rd, CA, Menlo Park, USA
188 schema:name Department Secondary Analysis, Pacific Biosciences, 1005 Hamilton Rd, CA, Menlo Park, USA
189 rdf:type schema:Organization
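
In the table above, the author order is encoded as an RDF list: row 10 points schema:author at a blank node, and rows 100-101 and 121-122 chain the authors through rdf:first and rdf:rest until rdf:nil. A minimal Python sketch of walking such a list follows; the dictionary layout and the function name `walk_rdf_list` are assumptions for illustration, with the node identifiers copied from those rows.

```python
RDF_NIL = "rdf:nil"

# Blank nodes from rows 100-101 and 121-122, stored as a plain dictionary.
LIST_NODES = {
    "N5149f1c5b084406baec61153d3955cf4": {
        "rdf:first": "sg:person.012610254333.24",
        "rdf:rest": "N049c034ce36f4c2b82be0c94e473b1f9",
    },
    "N049c034ce36f4c2b82be0c94e473b1f9": {
        "rdf:first": "sg:person.01305222154.18",
        "rdf:rest": RDF_NIL,
    },
}


def walk_rdf_list(head, nodes):
    """Follow rdf:rest links from head, collecting each rdf:first value in order."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            # Guard against a malformed, cyclic list.
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(nodes[node]["rdf:first"])
        node = nodes[node]["rdf:rest"]
    return items


# Row 10 gives the head of the author list.
authors_in_order = walk_rdf_list("N5149f1c5b084406baec61153d3955cf4", LIST_NODES)
print(authors_in_order)
```

Here the walk yields the identifier for Chaisson first and Tesler second, matching the author order in the JSON-LD record.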