Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2012-09-19

AUTHORS

Mark J Chaisson, Glenn Tesler

ABSTRACT

BACKGROUND: Recent methods have been developed to perform high-throughput sequencing of DNA by Single Molecule Sequencing (SMS). While Next-Generation sequencing methods may produce reads up to several hundred bases long, SMS sequencing produces reads up to tens of kilobases long. Existing alignment methods are either too inefficient for high-throughput datasets, or not sensitive enough to align SMS reads, which have a higher error rate than Next-Generation sequencing.
RESULTS: We describe the method BLASR (Basic Local Alignment with Successive Refinement) for mapping Single Molecule Sequencing (SMS) reads that are thousands of bases long, with divergence between the read and genome dominated by insertion and deletion error. The method is benchmarked using both simulated reads and reads from a bacterial sequencing project. We also present a combinatorial model of sequencing error that motivates why our approach is effective.
CONCLUSIONS: The results indicate that it is possible to map SMS reads with high accuracy and speed. Furthermore, the inferences made on the mapability of SMS reads using our combinatorial model of sequencing error are in agreement with the mapping accuracy demonstrated on simulated reads.

PAGES

238-238

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/1471-2105-13-238

DOI

http://dx.doi.org/10.1186/1471-2105-13-238

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1028668057

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/22988817



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Algorithms", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Base Sequence", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "DNA", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genome", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Alignment", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Analysis, DNA", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Department Secondary Analysis, Pacific Biosciences, 1005 Hamilton Rd, CA, Menlo Park, USA", 
          "id": "http://www.grid.ac/institutes/grid.423340.2", 
          "name": [
            "Department Secondary Analysis, Pacific Biosciences, 1005 Hamilton Rd, CA, Menlo Park, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Chaisson", 
        "givenName": "Mark J", 
        "id": "sg:person.012610254333.24", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012610254333.24"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Mathematics, University of California, San Diego, 9500 Gilman Dr, CA, La Jolla, USA", 
          "id": "http://www.grid.ac/institutes/grid.266100.3", 
          "name": [
            "Department of Mathematics, University of California, San Diego, 9500 Gilman Dr, CA, La Jolla, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Tesler", 
        "givenName": "Glenn", 
        "id": "sg:person.01305222154.18", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01305222154.18"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/nature09379", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1045172372", 
          "https://doi.org/10.1038/nature09379"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmeth.1459", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1032824809", 
          "https://doi.org/10.1038/nmeth.1459"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nbt.2147", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1031949408", 
          "https://doi.org/10.1038/nbt.2147"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nnano.2009.12", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1006588299", 
          "https://doi.org/10.1038/nnano.2009.12"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/gb-2009-10-3-r25", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1049583368", 
          "https://doi.org/10.1186/gb-2009-10-3-r25"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/gb-2004-5-2-r12", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1022585853", 
          "https://doi.org/10.1186/gb-2004-5-2-r12"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/ng.437", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1035989827", 
          "https://doi.org/10.1038/ng.437"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-540-39763-2_1", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1044437393", 
          "https://doi.org/10.1007/978-3-540-39763-2_1"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2012-09-19", 
    "datePublishedReg": "2012-09-19", 
    "description": "BACKGROUND: Recent methods have been developed to perform high-throughput sequencing of DNA by Single Molecule Sequencing (SMS). While Next-Generation sequencing methods may produce reads up to several hundred bases long, SMS sequencing produces reads up to tens of kilobases long. Existing alignment methods are either too inefficient for high-throughput datasets, or not sensitive enough to align SMS reads, which have a higher error rate than Next-Generation sequencing.\nRESULTS: We describe the method BLASR (Basic Local Alignment with Successive Refinement) for mapping Single Molecule Sequencing (SMS) reads that are thousands of bases long, with divergence between the read and genome dominated by insertion and deletion error. The method is benchmarked using both simulated reads and reads from a bacterial sequencing project. We also present a combinatorial model of sequencing error that motivates why our approach is effective.\nCONCLUSIONS: The results indicate that it is possible to map SMS reads with high accuracy and speed. Furthermore, the inferences made on the mapability of SMS reads using our combinatorial model of sequencing error are in agreement with the mapping accuracy demonstrated on simulated reads.", 
    "genre": "article", 
    "id": "sg:pub.10.1186/1471-2105-13-238", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1023786", 
        "issn": [
          "1471-2105"
        ], 
        "name": "BMC Bioinformatics", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "13"
      }
    ], 
    "keywords": [
      "single-molecule sequencing", 
      "molecule sequencing", 
      "single molecule sequencing reads", 
      "high-throughput sequencing", 
      "next-generation sequencing methods", 
      "tens of kilobases", 
      "thousands of bases", 
      "high-throughput datasets", 
      "sequencing projects", 
      "next-generation sequencing", 
      "sequencing reads", 
      "Basic Local Alignment", 
      "sequencing methods", 
      "sequencing", 
      "sequencing errors", 
      "reads", 
      "simulated reads", 
      "genome", 
      "kilobases", 
      "local alignment", 
      "DNA", 
      "divergence", 
      "combinatorial model", 
      "mapability", 
      "high error rates", 
      "basis", 
      "insertion", 
      "alignment method", 
      "thousands", 
      "deletion errors", 
      "alignment", 
      "inference", 
      "recent methods", 
      "BLASR", 
      "mapping accuracy", 
      "dataset", 
      "results", 
      "rate", 
      "model", 
      "tens", 
      "approach", 
      "method", 
      "applications", 
      "refinement", 
      "project", 
      "high accuracy", 
      "error rate", 
      "agreement", 
      "accuracy", 
      "speed", 
      "error", 
      "theory", 
      "successive refinement", 
      "SMS sequencing", 
      "SMS reads", 
      "method BLASR", 
      "bacterial sequencing project", 
      "SMS reads", 
      "molecule sequencing reads"
    ], 
    "name": "Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory", 
    "pagination": "238-238", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1028668057"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/1471-2105-13-238"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "22988817"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/1471-2105-13-238", 
      "https://app.dimensions.ai/details/publication/pub.1028668057"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-01-01T18:28", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220101/entities/gbq_results/article/article_578.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1186/1471-2105-13-238"
  }
]
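As a rough illustration of how a record like the one above might be consumed, the following Python sketch parses a trimmed copy of the JSON-LD and pulls out the title, DOI, author names, and cited DOIs. The names `RECORD`, `DOI_PREFIX`, and `summarize` are hypothetical helpers, not part of SciGraph, and the embedded sample keeps only a few fields and two of the eight citations.

```python
import json

# Trimmed copy of the record above: only a few fields and two citations are kept.
RECORD = """
[
  {
    "id": "sg:pub.10.1186/1471-2105-13-238",
    "name": "Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory",
    "author": [
      {"familyName": "Chaisson", "givenName": "Mark J", "type": "Person"},
      {"familyName": "Tesler", "givenName": "Glenn", "type": "Person"}
    ],
    "citation": [
      {"id": "sg:pub.10.1038/nature09379",
       "sameAs": ["https://app.dimensions.ai/details/publication/pub.1045172372",
                  "https://doi.org/10.1038/nature09379"]},
      {"id": "sg:pub.10.1038/nmeth.1459",
       "sameAs": ["https://app.dimensions.ai/details/publication/pub.1032824809",
                  "https://doi.org/10.1038/nmeth.1459"]}
    ],
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1028668057"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/1471-2105-13-238"]}
    ]
  }
]
"""

DOI_PREFIX = "https://doi.org/"  # hypothetical constant used to spot DOI links


def summarize(raw):
    """Return title, DOI, author names, and cited DOIs from a JSON-LD string."""
    article = json.loads(raw)[0]  # the record is a one-element JSON array
    # productId is a list of name/value pairs; index it by name.
    product_ids = {p["name"]: p["value"][0] for p in article.get("productId", [])}
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in article.get("author", [])]
    # Each citation lists several sameAs URLs; keep only the doi.org ones.
    cited = [
        url[len(DOI_PREFIX):]
        for c in article.get("citation", [])
        for url in c.get("sameAs", [])
        if url.startswith(DOI_PREFIX)
    ]
    return {
        "title": article["name"],
        "doi": product_ids.get("doi"),
        "authors": authors,
        "cited_dois": cited,
    }


summary = summarize(RECORD)
print(summary["doi"], summary["authors"], summary["cited_dois"])
```

The sketch treats the record as plain JSON, which is sufficient for simple field lookups; a full JSON-LD processor would only be needed to expand the `@context` into absolute URIs.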
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data that is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-238'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-238'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-238'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-13-238'
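
The four curl commands above differ only in their `Accept` header, so the same content negotiation can be sketched in Python with the standard library. This is a minimal sketch that assumes the endpoint is still served at the URL shown; the names `BASE_URL`, `ACCEPT_HEADERS`, `build_request`, and `fetch` are hypothetical and not part of any SciGraph client.

```python
import urllib.request

# Base URL and media types copied from the curl commands above.
BASE_URL = "https://scigraph.springernature.com/"

ACCEPT_HEADERS = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(record_id, fmt):
    """Build (but do not send) a request for one record in the given format."""
    return urllib.request.Request(
        BASE_URL + record_id,
        headers={"Accept": ACCEPT_HEADERS[fmt]},
    )


def fetch(record_id, fmt, timeout=30):
    """Send the request and return the response body as text.

    Requires network access; assumes the service still answers at BASE_URL.
    """
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request does not touch the network.
request = build_request("pub.10.1186/1471-2105-13-238", "turtle")
print(request.full_url, request.get_header("Accept"))
```

Calling `fetch("pub.10.1186/1471-2105-13-238", "json-ld")` would correspond to the first curl command, provided the service responds.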


 

This table displays all metadata directly associated with this object as RDF triples.

189 TRIPLES      22 PREDICATES      99 URIs      83 LITERALS      14 BLANK NODES
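
Summary counts of this kind could in principle be recomputed from the N-Triples serialization. The Python sketch below is a simplified illustration, not the method used by the page: it counts distinct terms with a regular expression rather than a full N-Triples parser, and the sample triples (expanded `http://schema.org/` URIs and the blank node label `_:b0`) are assumptions made for the example. The names `TRIPLE`, `Stats`, and `triple_stats` are hypothetical.

```python
import re
from collections import namedtuple

# Simplified N-Triples line pattern: subject, predicate, object, final dot.
# It does not handle every escape allowed by the N-Triples grammar.
TRIPLE = re.compile(r'^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$')

Stats = namedtuple("Stats", "triples predicates uris literals blank_nodes")


def triple_stats(ntriples):
    """Count triples and distinct predicates, URIs, literals, and blank nodes."""
    triples = 0
    predicates, uris, literals, blanks = set(), set(), set(), set()
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError("not a triple: " + line)
        triples += 1
        subj, pred, obj = match.groups()
        predicates.add(pred)
        # Classify every term by its leading characters.
        for term in (subj, pred, obj):
            if term.startswith("<"):
                uris.add(term)
            elif term.startswith("_:"):
                blanks.add(term)
            else:
                literals.add(term)
    return Stats(triples, len(predicates), len(uris), len(literals), len(blanks))


# Illustrative sample only; URIs and the blank node label are assumed.
SAMPLE = """
<http://scigraph.springernature.com/pub.10.1186/1471-2105-13-238> <http://schema.org/genre> "article" .
<http://scigraph.springernature.com/pub.10.1186/1471-2105-13-238> <http://schema.org/isPartOf> _:b0 .
_:b0 <http://schema.org/volumeNumber> "13" .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/PublicationVolume> .
"""

stats = triple_stats(SAMPLE)
print(stats)
```

How the page itself counts URIs and literals (for example, whether repeated values are counted once) is not stated, so totals from this sketch may differ from the figures above.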

Subject Predicate Object
1 sg:pub.10.1186/1471-2105-13-238 schema:about N19430fc65cd54af0983d965ff3ecc05a
2 N23d8ced926484760811194a794a7bd70
3 N2f94885021544c06bcf1821bbb0b3508
4 N3686fee335e24387887af6fcff7f6e83
5 N542c032bd80741b4b295bac616659012
6 Na02d13a22c814a14b19375ad396bd843
7 Nef074f34c9934d86bff2cc573f148fd0
8 anzsrc-for:06
9 anzsrc-for:0604
10 schema:author Nd9a2861c20dc48d698bc8a212b12168e
11 schema:citation sg:pub.10.1007/978-3-540-39763-2_1
12 sg:pub.10.1038/nature09379
13 sg:pub.10.1038/nbt.2147
14 sg:pub.10.1038/ng.437
15 sg:pub.10.1038/nmeth.1459
16 sg:pub.10.1038/nnano.2009.12
17 sg:pub.10.1186/gb-2004-5-2-r12
18 sg:pub.10.1186/gb-2009-10-3-r25
19 schema:datePublished 2012-09-19
20 schema:datePublishedReg 2012-09-19
21 schema:description BACKGROUND: Recent methods have been developed to perform high-throughput sequencing of DNA by Single Molecule Sequencing (SMS). While Next-Generation sequencing methods may produce reads up to several hundred bases long, SMS sequencing produces reads up to tens of kilobases long. Existing alignment methods are either too inefficient for high-throughput datasets, or not sensitive enough to align SMS reads, which have a higher error rate than Next-Generation sequencing. RESULTS: We describe the method BLASR (Basic Local Alignment with Successive Refinement) for mapping Single Molecule Sequencing (SMS) reads that are thousands of bases long, with divergence between the read and genome dominated by insertion and deletion error. The method is benchmarked using both simulated reads and reads from a bacterial sequencing project. We also present a combinatorial model of sequencing error that motivates why our approach is effective. CONCLUSIONS: The results indicate that it is possible to map SMS reads with high accuracy and speed. Furthermore, the inferences made on the mapability of SMS reads using our combinatorial model of sequencing error are in agreement with the mapping accuracy demonstrated on simulated reads.
22 schema:genre article
23 schema:inLanguage en
24 schema:isAccessibleForFree true
25 schema:isPartOf N0dcb96e8606d4c3f9f8037f0d9fa6838
26 N7ab9f0a3c8944c17be6baad9612e605c
27 sg:journal.1023786
28 schema:keywords BLASR
29 Basic Local Alignment
30 DNA
31 SMS reads
32 SMS sequencing
33 accuracy
34 agreement
35 alignment
36 alignment method
37 applications
38 approach
39 bacterial sequencing project
40 basis
41 combinatorial model
42 dataset
43 deletion errors
44 divergence
45 error
46 error rate
47 genome
48 high accuracy
49 high error rates
50 high-throughput datasets
51 high-throughput sequencing
52 inference
53 insertion
54 kilobases
55 local alignment
56 mapability
57 mapping accuracy
58 method
59 method BLASR
60 model
61 molecule sequencing
62 molecule sequencing reads
63 next-generation sequencing
64 next-generation sequencing methods
65 project
66 rate
67 reads
68 recent methods
69 refinement
70 results
71 sequencing
72 sequencing errors
73 sequencing methods
74 sequencing projects
75 sequencing reads
76 simulated reads
77 single molecule sequencing reads
78 single-molecule sequencing
79 speed
80 successive refinement
81 tens
82 tens of kilobases
83 theory
84 thousands
85 thousands of bases
86 schema:name Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory
87 schema:pagination 238-238
88 schema:productId N52b33a05576b45149ab382a786db472d
89 Nb141faa745c0423b9cdfa312d9eabd8b
90 Ncbd07cec53444920b5f9cada43549c10
91 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028668057
92 https://doi.org/10.1186/1471-2105-13-238
93 schema:sdDatePublished 2022-01-01T18:28
94 schema:sdLicense https://scigraph.springernature.com/explorer/license/
95 schema:sdPublisher Ndbeb0bb71cf04275b89d360fa6b55715
96 schema:url https://doi.org/10.1186/1471-2105-13-238
97 sgo:license sg:explorer/license/
98 sgo:sdDataset articles
99 rdf:type schema:ScholarlyArticle
100 N0dcb96e8606d4c3f9f8037f0d9fa6838 schema:issueNumber 1
101 rdf:type schema:PublicationIssue
102 N19430fc65cd54af0983d965ff3ecc05a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
103 schema:name Software
104 rdf:type schema:DefinedTerm
105 N23d8ced926484760811194a794a7bd70 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
106 schema:name Genome
107 rdf:type schema:DefinedTerm
108 N2f94885021544c06bcf1821bbb0b3508 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
109 schema:name Sequence Alignment
110 rdf:type schema:DefinedTerm
111 N3686fee335e24387887af6fcff7f6e83 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
112 schema:name Base Sequence
113 rdf:type schema:DefinedTerm
114 N52b33a05576b45149ab382a786db472d schema:name dimensions_id
115 schema:value pub.1028668057
116 rdf:type schema:PropertyValue
117 N542c032bd80741b4b295bac616659012 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
118 schema:name DNA
119 rdf:type schema:DefinedTerm
120 N7ab9f0a3c8944c17be6baad9612e605c schema:volumeNumber 13
121 rdf:type schema:PublicationVolume
122 Na02d13a22c814a14b19375ad396bd843 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
123 schema:name Sequence Analysis, DNA
124 rdf:type schema:DefinedTerm
125 Nb141faa745c0423b9cdfa312d9eabd8b schema:name pubmed_id
126 schema:value 22988817
127 rdf:type schema:PropertyValue
128 Ncbd07cec53444920b5f9cada43549c10 schema:name doi
129 schema:value 10.1186/1471-2105-13-238
130 rdf:type schema:PropertyValue
131 Nd9a2861c20dc48d698bc8a212b12168e rdf:first sg:person.012610254333.24
132 rdf:rest Nfe24684da20f440dac87e9db1512b1e0
133 Ndbeb0bb71cf04275b89d360fa6b55715 schema:name Springer Nature - SN SciGraph project
134 rdf:type schema:Organization
135 Nef074f34c9934d86bff2cc573f148fd0 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
136 schema:name Algorithms
137 rdf:type schema:DefinedTerm
138 Nfe24684da20f440dac87e9db1512b1e0 rdf:first sg:person.01305222154.18
139 rdf:rest rdf:nil
140 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
141 schema:name Biological Sciences
142 rdf:type schema:DefinedTerm
143 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
144 schema:name Genetics
145 rdf:type schema:DefinedTerm
146 sg:journal.1023786 schema:issn 1471-2105
147 schema:name BMC Bioinformatics
148 schema:publisher Springer Nature
149 rdf:type schema:Periodical
150 sg:person.012610254333.24 schema:affiliation grid-institutes:grid.423340.2
151 schema:familyName Chaisson
152 schema:givenName Mark J
153 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012610254333.24
154 rdf:type schema:Person
155 sg:person.01305222154.18 schema:affiliation grid-institutes:grid.266100.3
156 schema:familyName Tesler
157 schema:givenName Glenn
158 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01305222154.18
159 rdf:type schema:Person
160 sg:pub.10.1007/978-3-540-39763-2_1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044437393
161 https://doi.org/10.1007/978-3-540-39763-2_1
162 rdf:type schema:CreativeWork
163 sg:pub.10.1038/nature09379 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045172372
164 https://doi.org/10.1038/nature09379
165 rdf:type schema:CreativeWork
166 sg:pub.10.1038/nbt.2147 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031949408
167 https://doi.org/10.1038/nbt.2147
168 rdf:type schema:CreativeWork
169 sg:pub.10.1038/ng.437 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035989827
170 https://doi.org/10.1038/ng.437
171 rdf:type schema:CreativeWork
172 sg:pub.10.1038/nmeth.1459 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032824809
173 https://doi.org/10.1038/nmeth.1459
174 rdf:type schema:CreativeWork
175 sg:pub.10.1038/nnano.2009.12 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006588299
176 https://doi.org/10.1038/nnano.2009.12
177 rdf:type schema:CreativeWork
178 sg:pub.10.1186/gb-2004-5-2-r12 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022585853
179 https://doi.org/10.1186/gb-2004-5-2-r12
180 rdf:type schema:CreativeWork
181 sg:pub.10.1186/gb-2009-10-3-r25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049583368
182 https://doi.org/10.1186/gb-2009-10-3-r25
183 rdf:type schema:CreativeWork
184 grid-institutes:grid.266100.3 schema:alternateName Department of Mathematics, University of California, San Diego, 9500 Gilman Dr, CA, La Jolla, USA
185 schema:name Department of Mathematics, University of California, San Diego, 9500 Gilman Dr, CA, La Jolla, USA
186 rdf:type schema:Organization
187 grid-institutes:grid.423340.2 schema:alternateName Department Secondary Analysis, Pacific Biosciences, 1005 Hamilton Rd, CA, Menlo Park, USA
188 schema:name Department Secondary Analysis, Pacific Biosciences, 1005 Hamilton Rd, CA, Menlo Park, USA
189 rdf:type schema:Organization
 



