Read Mapping Algorithms for Single Molecule Sequencing Data


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2008-01-01

AUTHORS

Vladimir Yanovsky, Stephen M. Rumble, Michael Brudno

ABSTRACT

Single Molecule Sequencing technologies such as the Heliscope simplify the preparation of DNA for sequencing, while sampling millions of reads in a day. Simultaneously, the technology suffers from a significantly higher error rate, ameliorated by the ability to sample multiple reads from the same location. In this paper we develop novel rapid alignment algorithms for two-pass Single Molecule Sequencing methods. We combine the Weighted Sequence Graph (WSG) representation of all optimal and near optimal alignments between the two reads sampled from a piece of DNA with k-mer filtering methods and spaced seeds to quickly generate candidate locations for the reads on the reference genome. We also propose a fast implementation of the Smith-Waterman algorithm using vectorized instructions that significantly speeds up the matching process. Our method combines these approaches in order to build an algorithm that is both fast and accurate, since it is able to take complete advantage of both of the reads sampled during two pass sequencing.
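
For readers unfamiliar with the Smith-Waterman algorithm named in the abstract, the following is a minimal scalar sketch of local alignment scoring. It is not the vectorized implementation proposed in the chapter; the function name and the scoring parameters (match = 2, mismatch = -1, linear gap = -1) are assumptions chosen only for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score between a and b.

    Plain dynamic programming with a linear gap penalty; the scoring
    values are illustrative assumptions, not taken from the chapter.
    """
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Only two rows of the matrix are kept because the sketch returns the score alone; recovering the alignment itself would need the full matrix or traceback pointers.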

PAGES

38-49

Book

TITLE

Algorithms in Bioinformatics

ISBN

978-3-540-87360-0
978-3-540-87361-7

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-540-87361-7_4

DOI

http://dx.doi.org/10.1007/978-3-540-87361-7_4

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1003092050



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Department of Computer Science", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Department of Computer Science"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Yanovsky", 
        "givenName": "Vladimir", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Computer Science", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Department of Computer Science"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Rumble", 
        "givenName": "Stephen M.", 
        "id": "sg:person.0757123576.01", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0757123576.01"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Donnelly Centre for Cellular and Biomolecular Research, University of Toronto", 
          "id": "http://www.grid.ac/institutes/grid.17063.33", 
          "name": [
            "Department of Computer Science", 
            "Donnelly Centre for Cellular and Biomolecular Research, University of Toronto"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brudno", 
        "givenName": "Michael", 
        "id": "sg:person.01253563237.25", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01253563237.25"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2008-01-01", 
    "datePublishedReg": "2008-01-01", 
    "description": "Single Molecule Sequencing technologies such as the Heliscope simplify the preparation of DNA for sequencing, while sampling millions of reads in a day. Simultaneously, the technology suffers from a significantly higher error rate, ameliorated by the ability to sample multiple reads from the same location. In this paper we develop novel rapid alignment algorithms for two-pass Single Molecule Sequencing methods. We combine the Weighted Sequence Graph (WSG) representation of all optimal and near optimal alignments between the two reads sampled from a piece of DNA with k-mer filtering methods and spaced seeds to quickly generate candidate locations for the reads on the reference genome. We also propose a fast implementation of the Smith-Waterman algorithm using vectorized instructions that significantly speeds up the matching process. Our method combines these approaches in order to build an algorithm that is both fast and accurate, since it is able to take complete advantage of both of the reads sampled during two pass sequencing.", 
    "editor": [
      {
        "familyName": "Crandall", 
        "givenName": "Keith A.", 
        "type": "Person"
      }, 
      {
        "familyName": "Lagergren", 
        "givenName": "Jens", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-540-87361-7_4", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-540-87360-0", 
        "978-3-540-87361-7"
      ], 
      "name": "Algorithms in Bioinformatics", 
      "type": "Book"
    }, 
    "keywords": [
      "single molecule sequencing technology", 
      "piece of DNA", 
      "Smith-Waterman algorithm", 
      "millions of reads", 
      "reference genome", 
      "preparation of DNA", 
      "sequencing technologies", 
      "sequencing data", 
      "vectorized instructions", 
      "sequencing methods", 
      "fast implementation", 
      "matching process", 
      "graph representation", 
      "alignment algorithm", 
      "mapping algorithm", 
      "high error rates", 
      "reads", 
      "algorithm", 
      "complete advantage", 
      "optimal alignment", 
      "filtering method", 
      "error rate", 
      "candidate locations", 
      "sequencing", 
      "multiple reads", 
      "DNA", 
      "spaced seeds", 
      "Heliscope", 
      "genome", 
      "technology", 
      "seeds", 
      "implementation", 
      "same location", 
      "representation", 
      "method", 
      "millions", 
      "advantages", 
      "location", 
      "ability", 
      "alignment", 
      "instruction", 
      "order", 
      "pieces", 
      "data", 
      "process", 
      "rate", 
      "preparation", 
      "days", 
      "approach", 
      "paper", 
      "Molecule Sequencing technologies", 
      "novel rapid alignment algorithms", 
      "rapid alignment algorithms", 
      "two-pass Single Molecule Sequencing methods", 
      "Single Molecule Sequencing methods", 
      "Molecule Sequencing methods", 
      "Weighted Sequence Graph (WSG) representation", 
      "Sequence Graph (WSG) representation", 
      "mer filtering methods", 
      "pass sequencing", 
      "Single Molecule Sequencing Data", 
      "Molecule Sequencing Data"
    ], 
    "name": "Read Mapping Algorithms for Single Molecule Sequencing Data", 
    "pagination": "38-49", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1003092050"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-540-87361-7_4"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-540-87361-7_4", 
      "https://app.dimensions.ai/details/publication/pub.1003092050"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2021-11-01T18:49", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20211101/entities/gbq_results/chapter/chapter_170.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-540-87361-7_4"
  }
]
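
Because the record above is ordinary JSON, a standard JSON parser is enough to read it. The sketch below works on an abbreviated copy of the record (only the fields it uses are reproduced) and pulls out the author names and product identifiers. The helper names `author_names` and `product_ids` are hypothetical.

```python
import json

# Abbreviated copy of the record above; only the fields used here are kept.
RECORD = """
[
  {
    "author": [
      {"familyName": "Yanovsky", "givenName": "Vladimir", "type": "Person"},
      {"familyName": "Rumble", "givenName": "Stephen M.", "type": "Person"},
      {"familyName": "Brudno", "givenName": "Michael", "type": "Person"}
    ],
    "name": "Read Mapping Algorithms for Single Molecule Sequencing Data",
    "pagination": "38-49",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue",
       "value": ["pub.1003092050"]},
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1007/978-3-540-87361-7_4"]}
    ]
  }
]
"""


def author_names(record):
    """Return 'given family' strings in the order the record lists them."""
    return [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]


def product_ids(record):
    """Map each productId name (e.g. 'doi') to its first listed value."""
    return {p["name"]: p["value"][0] for p in record.get("productId", [])}


# The top level is a one-element array, so take the first entry.
chapter = json.loads(RECORD)[0]
print(author_names(chapter))
print(product_ids(chapter)["doi"])
```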
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-87361-7_4'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-87361-7_4'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-87361-7_4'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-87361-7_4'
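
The four curl commands above differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. The names `ACCEPT_TYPES`, `build_request` and `fetch` and the short format labels are hypothetical; the URL and media types are copied from the commands above. Whether the endpoint still responds is an assumption, so `fetch` is defined but not called here.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-540-87361-7_4"

# Short labels (chosen for this sketch) mapped to the media types used above.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request asking for the given RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30.0):
    """Perform the request and return the body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch("turtle")` would correspond to the third curl command; an unknown label raises `KeyError` before any request is made.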


 

This table displays all metadata directly associated with this object as RDF triples.

144 TRIPLES      23 PREDICATES      87 URIs      80 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-540-87361-7_4 schema:about anzsrc-for:06
2 anzsrc-for:0604
3 schema:author N9b0109cd0f8e467ebcd0ed145548ef75
4 schema:datePublished 2008-01-01
5 schema:datePublishedReg 2008-01-01
6 schema:description Single Molecule Sequencing technologies such as the Heliscope simplify the preparation of DNA for sequencing, while sampling millions of reads in a day. Simultaneously, the technology suffers from a significantly higher error rate, ameliorated by the ability to sample multiple reads from the same location. In this paper we develop novel rapid alignment algorithms for two-pass Single Molecule Sequencing methods. We combine the Weighted Sequence Graph (WSG) representation of all optimal and near optimal alignments between the two reads sampled from a piece of DNA with k-mer filtering methods and spaced seeds to quickly generate candidate locations for the reads on the reference genome. We also propose a fast implementation of the Smith-Waterman algorithm using vectorized instructions that significantly speeds up the matching process. Our method combines these approaches in order to build an algorithm that is both fast and accurate, since it is able to take complete advantage of both of the reads sampled during two pass sequencing.
7 schema:editor Naaf6be434cd2452db3416621ebfe0d5d
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree true
11 schema:isPartOf Nb682734d4489408c984a5c0256b11d66
12 schema:keywords DNA
13 Heliscope
14 Molecule Sequencing Data
15 Molecule Sequencing methods
16 Molecule Sequencing technologies
17 Sequence Graph (WSG) representation
18 Single Molecule Sequencing Data
19 Single Molecule Sequencing methods
20 Smith-Waterman algorithm
21 Weighted Sequence Graph (WSG) representation
22 ability
23 advantages
24 algorithm
25 alignment
26 alignment algorithm
27 approach
28 candidate locations
29 complete advantage
30 data
31 days
32 error rate
33 fast implementation
34 filtering method
35 genome
36 graph representation
37 high error rates
38 implementation
39 instruction
40 location
41 mapping algorithm
42 matching process
43 mer filtering methods
44 method
45 millions
46 millions of reads
47 multiple reads
48 novel rapid alignment algorithms
49 optimal alignment
50 order
51 paper
52 pass sequencing
53 piece of DNA
54 pieces
55 preparation
56 preparation of DNA
57 process
58 rapid alignment algorithms
59 rate
60 reads
61 reference genome
62 representation
63 same location
64 seeds
65 sequencing
66 sequencing data
67 sequencing methods
68 sequencing technologies
69 single molecule sequencing technology
70 spaced seeds
71 technology
72 two-pass Single Molecule Sequencing methods
73 vectorized instructions
74 schema:name Read Mapping Algorithms for Single Molecule Sequencing Data
75 schema:pagination 38-49
76 schema:productId N5641585d898b4747af0caad2849a2a46
77 Na95417ad39e84e09aec4ddc9a64ea20f
78 schema:publisher N20cbe22e2da54592ba3edb3a9902bc3a
79 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003092050
80 https://doi.org/10.1007/978-3-540-87361-7_4
81 schema:sdDatePublished 2021-11-01T18:49
82 schema:sdLicense https://scigraph.springernature.com/explorer/license/
83 schema:sdPublisher N8b28c9cdc9de46e2a0dbfdd9353daf0b
84 schema:url https://doi.org/10.1007/978-3-540-87361-7_4
85 sgo:license sg:explorer/license/
86 sgo:sdDataset chapters
87 rdf:type schema:Chapter
88 N20cbe22e2da54592ba3edb3a9902bc3a schema:name Springer Nature
89 rdf:type schema:Organisation
90 N4a940bf72a1f467cb6db6be3a931634c schema:familyName Crandall
91 schema:givenName Keith A.
92 rdf:type schema:Person
93 N4bdb1b0779e84e35b78b7b90c2698611 rdf:first sg:person.0757123576.01
94 rdf:rest Ndd4d9bdc4c7c43f185ff8a4b8d7f2089
95 N5641585d898b4747af0caad2849a2a46 schema:name dimensions_id
96 schema:value pub.1003092050
97 rdf:type schema:PropertyValue
98 N7594c5ae460342e5aa72f5bcb135d978 schema:familyName Lagergren
99 schema:givenName Jens
100 rdf:type schema:Person
101 N8b28c9cdc9de46e2a0dbfdd9353daf0b schema:name Springer Nature - SN SciGraph project
102 rdf:type schema:Organization
103 N9b0109cd0f8e467ebcd0ed145548ef75 rdf:first Nbec08c7229a043938d1698c4a43ba1b0
104 rdf:rest N4bdb1b0779e84e35b78b7b90c2698611
105 Na95417ad39e84e09aec4ddc9a64ea20f schema:name doi
106 schema:value 10.1007/978-3-540-87361-7_4
107 rdf:type schema:PropertyValue
108 Naaf6be434cd2452db3416621ebfe0d5d rdf:first N4a940bf72a1f467cb6db6be3a931634c
109 rdf:rest Nf308427bd4f5479690290f4c453badf5
110 Nb682734d4489408c984a5c0256b11d66 schema:isbn 978-3-540-87360-0
111 978-3-540-87361-7
112 schema:name Algorithms in Bioinformatics
113 rdf:type schema:Book
114 Nbec08c7229a043938d1698c4a43ba1b0 schema:affiliation grid-institutes:None
115 schema:familyName Yanovsky
116 schema:givenName Vladimir
117 rdf:type schema:Person
118 Ndd4d9bdc4c7c43f185ff8a4b8d7f2089 rdf:first sg:person.01253563237.25
119 rdf:rest rdf:nil
120 Nf308427bd4f5479690290f4c453badf5 rdf:first N7594c5ae460342e5aa72f5bcb135d978
121 rdf:rest rdf:nil
122 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
123 schema:name Biological Sciences
124 rdf:type schema:DefinedTerm
125 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
126 schema:name Genetics
127 rdf:type schema:DefinedTerm
128 sg:person.01253563237.25 schema:affiliation grid-institutes:grid.17063.33
129 schema:familyName Brudno
130 schema:givenName Michael
131 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01253563237.25
132 rdf:type schema:Person
133 sg:person.0757123576.01 schema:affiliation grid-institutes:None
134 schema:familyName Rumble
135 schema:givenName Stephen M.
136 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0757123576.01
137 rdf:type schema:Person
138 grid-institutes:None schema:alternateName Department of Computer Science
139 schema:name Department of Computer Science
140 rdf:type schema:Organization
141 grid-institutes:grid.17063.33 schema:alternateName Donnelly Centre for Cellular and Biomolecular Research, University of Toronto
142 schema:name Department of Computer Science
143 Donnelly Centre for Cellular and Biomolecular Research, University of Toronto
144 rdf:type schema:Organization
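
In the table above, the order of authors is not stored on the chapter itself. The schema:author row points to a blank node, and the list is a chain of blank nodes linked by rdf:first (the item) and rdf:rest (the next node), ending in rdf:nil. A minimal sketch of walking that chain, using values copied from rows 3, 93-94, 103-104 and 118-119, might look like this; `walk_rdf_list` is a hypothetical helper name.

```python
# (subject, predicate) -> object, copied from the rdf:first / rdf:rest rows above.
TRIPLES = {
    ("N9b0109cd0f8e467ebcd0ed145548ef75", "rdf:first"): "Nbec08c7229a043938d1698c4a43ba1b0",
    ("N9b0109cd0f8e467ebcd0ed145548ef75", "rdf:rest"): "N4bdb1b0779e84e35b78b7b90c2698611",
    ("N4bdb1b0779e84e35b78b7b90c2698611", "rdf:first"): "sg:person.0757123576.01",
    ("N4bdb1b0779e84e35b78b7b90c2698611", "rdf:rest"): "Ndd4d9bdc4c7c43f185ff8a4b8d7f2089",
    ("Ndd4d9bdc4c7c43f185ff8a4b8d7f2089", "rdf:first"): "sg:person.01253563237.25",
    ("Ndd4d9bdc4c7c43f185ff8a4b8d7f2089", "rdf:rest"): "rdf:nil",
}

# schema:familyName values for the three list items, also from the table.
FAMILY_NAMES = {
    "Nbec08c7229a043938d1698c4a43ba1b0": "Yanovsky",
    "sg:person.0757123576.01": "Rumble",
    "sg:person.01253563237.25": "Brudno",
}


def walk_rdf_list(head, triples):
    """Follow rdf:first / rdf:rest from head until rdf:nil; return the items."""
    items = []
    seen = set()
    node = head
    while node != "rdf:nil":
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(triples[(node, "rdf:first")])
        node = triples[(node, "rdf:rest")]
    return items


# Row 3 gives the head of the author list.
authors = walk_rdf_list("N9b0109cd0f8e467ebcd0ed145548ef75", TRIPLES)
print([FAMILY_NAMES[a] for a in authors])
```

The editor list (rows 7, 108-109 and 120-121) follows the same pattern and could be walked with the same function.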
 



