Ab Initio Whole Genome Shotgun Assembly with Mated Short Reads


Ontology type: schema:Chapter
Open Access: True


Chapter Info

DATE

2008-01-01

AUTHORS

Paul Medvedev, Michael Brudno

ABSTRACT

Next Generation Sequencing (NGS) technologies are capable of reading millions of short DNA sequences both quickly and cheaply. While these technologies are already being used for resequencing individuals once a reference genome exists, it has not been shown if it is possible to use them for ab initio genome assembly. In this paper, we give a novel network flow-based algorithm that, by taking advantage of the high coverage provided by NGS, accurately estimates the copy counts of repeats in a genome. We also give a second algorithm that combines the predicted copy-counts with mate-pair data in order to assemble the reads into contigs. We run our algorithms on simulated read data from E. Coli and predict copy-counts with extremely high accuracy, while assembling long contigs.

PAGES

50-64

Book

TITLE

Research in Computational Molecular Biology

ISBN

978-3-540-78838-6
978-3-540-78839-3

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-540-78839-3_5

DOI

http://dx.doi.org/10.1007/978-3-540-78839-3_5

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1047053373



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Department of Computer Science, University of Toronto, Canada", 
          "id": "http://www.grid.ac/institutes/grid.17063.33", 
          "name": [
            "Department of Computer Science, University of Toronto, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Medvedev", 
        "givenName": "Paul", 
        "id": "sg:person.0722365100.46", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0722365100.46"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Canada", 
          "id": "http://www.grid.ac/institutes/grid.17063.33", 
          "name": [
            "Department of Computer Science, University of Toronto, Canada", 
            "Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brudno", 
        "givenName": "Michael", 
        "id": "sg:person.01253563237.25", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01253563237.25"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2008-01-01", 
    "datePublishedReg": "2008-01-01", 
    "description": "Next Generation Sequencing (NGS) technologies are capable of reading millions of short DNA sequences both quickly and cheaply. While these technologies are already being used for resequencing individuals once a reference genome exists, it has not been shown if it is possible to use them for ab initio genome assembly. In this paper, we give a novel network flow-based algorithm that, by taking advantage of the high coverage provided by NGS, accurately estimates the copy counts of repeats in a genome. We also give a second algorithm that combines the predicted copy-counts with mate-pair data in order to assemble the reads into contigs. We run our algorithms on simulated read data from E. Coli and predict copy-counts with extremely high accuracy, while assembling long contigs.", 
    "editor": [
      {
        "familyName": "Vingron", 
        "givenName": "Martin", 
        "type": "Person"
      }, 
      {
        "familyName": "Wong", 
        "givenName": "Limsoon", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-540-78839-3_5", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-540-78838-6", 
        "978-3-540-78839-3"
      ], 
      "name": "Research in Computational Molecular Biology", 
      "type": "Book"
    }, 
    "keywords": [
      "mate-pair data", 
      "whole genome shotgun assembly", 
      "next generation sequencing technology", 
      "generation sequencing technology", 
      "short DNA sequences", 
      "genome assembly", 
      "reference genome", 
      "shotgun assembly", 
      "DNA sequences", 
      "sequencing technologies", 
      "short reads", 
      "longer contigs", 
      "contigs", 
      "genome", 
      "read data", 
      "reads", 
      "network flow-based algorithm", 
      "assembly", 
      "repeats", 
      "copy counts", 
      "coli", 
      "NGS", 
      "sequence", 
      "high coverage", 
      "flow-based algorithm", 
      "millions", 
      "individuals", 
      "data", 
      "technology", 
      "counts", 
      "order", 
      "advantages", 
      "coverage", 
      "high accuracy", 
      "accuracy", 
      "algorithm", 
      "paper", 
      "second algorithm", 
      "ab initio genome assembly", 
      "initio genome assembly", 
      "novel network flow-based algorithm", 
      "Ab Initio Whole Genome Shotgun Assembly", 
      "Initio Whole Genome Shotgun Assembly", 
      "Genome Shotgun Assembly", 
      "Mated Short Reads"
    ], 
    "name": "Ab Initio Whole Genome Shotgun Assembly with Mated Short Reads", 
    "pagination": "50-64", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1047053373"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-540-78839-3_5"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-540-78839-3_5", 
      "https://app.dimensions.ai/details/publication/pub.1047053373"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2021-11-01T18:55", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20211101/entities/gbq_results/chapter/chapter_310.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-540-78839-3_5"
  }
]
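
As a rough illustration of how the JSON-LD record above can be consumed, the following Python sketch pulls out the title, authors, year, pages, and DOI. The `RECORD_JSON` string is a trimmed copy of the record (only the fields used are kept), and the helper name `summarize_record` is hypothetical, not part of any SciGraph tooling.

```python
import json

# Trimmed copy of the record above; only the fields read below are kept.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Medvedev", "givenName": "Paul", "type": "Person"},
      {"familyName": "Brudno", "givenName": "Michael", "type": "Person"}
    ],
    "datePublished": "2008-01-01",
    "name": "Ab Initio Whole Genome Shotgun Assembly with Mated Short Reads",
    "pagination": "50-64",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1047053373"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-3-540-78839-3_5"]}
    ],
    "type": "Chapter"
  }
]
"""


def summarize_record(text):
    """Return a small dict of citation fields from a SciGraph-style JSON-LD array."""
    record = json.loads(text)[0]  # the page wraps the single record in a list
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]
    # productId is a list of name/value pairs; each value is itself a list.
    product_ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    return {
        "title": record["name"],
        "authors": authors,
        "year": int(record["datePublished"][:4]),
        "pages": record.get("pagination"),
        "doi": product_ids.get("doi"),
    }


if __name__ == "__main__":
    for key, value in summarize_record(RECORD_JSON).items():
        print(f"{key}: {value}")
```

Because JSON-LD is plain JSON, the standard `json` module is enough here; a full RDF library would only be needed to resolve the `@context` into complete predicate URIs.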
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-78839-3_5'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-78839-3_5'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-78839-3_5'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-78839-3_5'
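
The four curl commands above differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The format keys and the helper `build_request` are illustrative names chosen here, and the sketch only constructs the requests; whether the endpoint still answers is not something this page confirms.

```python
from urllib.request import Request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-540-78839-3_5"

# Media types taken from the curl examples above; the dictionary keys are
# labels made up for this sketch.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request asking for one RDF serialization."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"unknown format: {fmt!r}")
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


if __name__ == "__main__":
    for fmt in ACCEPT_TYPES:
        req = build_request(fmt)
        print(fmt, "->", req.get_header("Accept"))
```

To actually fetch a serialization, the request object can be passed to `urllib.request.urlopen`, ideally with a timeout and error handling around it.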


 

This table displays all metadata directly associated with this object as RDF triples.

119 TRIPLES, 23 PREDICATES, 70 URIs, 63 LITERALS, 7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-540-78839-3_5 schema:about anzsrc-for:06
2 anzsrc-for:0604
3 schema:author Nc7f19c7b46d14e06bce91fb01b46e9f6
4 schema:datePublished 2008-01-01
5 schema:datePublishedReg 2008-01-01
6 schema:description Next Generation Sequencing (NGS) technologies are capable of reading millions of short DNA sequences both quickly and cheaply. While these technologies are already being used for resequencing individuals once a reference genome exists, it has not been shown if it is possible to use them for ab initio genome assembly. In this paper, we give a novel network flow-based algorithm that, by taking advantage of the high coverage provided by NGS, accurately estimates the copy counts of repeats in a genome. We also give a second algorithm that combines the predicted copy-counts with mate-pair data in order to assemble the reads into contigs. We run our algorithms on simulated read data from E. Coli and predict copy-counts with extremely high accuracy, while assembling long contigs.
7 schema:editor N7bf63c6508de456a8b211b9e24636f93
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree true
11 schema:isPartOf N23027d5d405b4037873664b31d388026
12 schema:keywords Ab Initio Whole Genome Shotgun Assembly
13 DNA sequences
14 Genome Shotgun Assembly
15 Initio Whole Genome Shotgun Assembly
16 Mated Short Reads
17 NGS
18 ab initio genome assembly
19 accuracy
20 advantages
21 algorithm
22 assembly
23 coli
24 contigs
25 copy counts
26 counts
27 coverage
28 data
29 flow-based algorithm
30 generation sequencing technology
31 genome
32 genome assembly
33 high accuracy
34 high coverage
35 individuals
36 initio genome assembly
37 longer contigs
38 mate-pair data
39 millions
40 network flow-based algorithm
41 next generation sequencing technology
42 novel network flow-based algorithm
43 order
44 paper
45 read data
46 reads
47 reference genome
48 repeats
49 second algorithm
50 sequence
51 sequencing technologies
52 short DNA sequences
53 short reads
54 shotgun assembly
55 technology
56 whole genome shotgun assembly
57 schema:name Ab Initio Whole Genome Shotgun Assembly with Mated Short Reads
58 schema:pagination 50-64
59 schema:productId N04e131632f9d4cb58bd2183e6c59689e
60 Nc4d356b8eb434ed8b130cfb3fd8a657f
61 schema:publisher N4780edca4a0a4c248bb7d3021b2ebc25
62 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047053373
63 https://doi.org/10.1007/978-3-540-78839-3_5
64 schema:sdDatePublished 2021-11-01T18:55
65 schema:sdLicense https://scigraph.springernature.com/explorer/license/
66 schema:sdPublisher N063e8c65f5f24989b09b114bef66f575
67 schema:url https://doi.org/10.1007/978-3-540-78839-3_5
68 sgo:license sg:explorer/license/
69 sgo:sdDataset chapters
70 rdf:type schema:Chapter
71 N04e131632f9d4cb58bd2183e6c59689e schema:name doi
72 schema:value 10.1007/978-3-540-78839-3_5
73 rdf:type schema:PropertyValue
74 N063e8c65f5f24989b09b114bef66f575 schema:name Springer Nature - SN SciGraph project
75 rdf:type schema:Organization
76 N23027d5d405b4037873664b31d388026 schema:isbn 978-3-540-78838-6
77 978-3-540-78839-3
78 schema:name Research in Computational Molecular Biology
79 rdf:type schema:Book
80 N360061b314f74a2b8be220c6fb7592d1 schema:familyName Wong
81 schema:givenName Limsoon
82 rdf:type schema:Person
83 N4780edca4a0a4c248bb7d3021b2ebc25 schema:name Springer Nature
84 rdf:type schema:Organisation
85 N50319a2ade0f4a54b062a89d5e6b5cec rdf:first N360061b314f74a2b8be220c6fb7592d1
86 rdf:rest rdf:nil
87 N7bf63c6508de456a8b211b9e24636f93 rdf:first Nda6b4cd52d1847b584cb031b9f6da5ec
88 rdf:rest N50319a2ade0f4a54b062a89d5e6b5cec
89 Na2afecbc918143d7bd5593830fefe75f rdf:first sg:person.01253563237.25
90 rdf:rest rdf:nil
91 Nc4d356b8eb434ed8b130cfb3fd8a657f schema:name dimensions_id
92 schema:value pub.1047053373
93 rdf:type schema:PropertyValue
94 Nc7f19c7b46d14e06bce91fb01b46e9f6 rdf:first sg:person.0722365100.46
95 rdf:rest Na2afecbc918143d7bd5593830fefe75f
96 Nda6b4cd52d1847b584cb031b9f6da5ec schema:familyName Vingron
97 schema:givenName Martin
98 rdf:type schema:Person
99 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
100 schema:name Biological Sciences
101 rdf:type schema:DefinedTerm
102 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
103 schema:name Genetics
104 rdf:type schema:DefinedTerm
105 sg:person.01253563237.25 schema:affiliation grid-institutes:grid.17063.33
106 schema:familyName Brudno
107 schema:givenName Michael
108 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01253563237.25
109 rdf:type schema:Person
110 sg:person.0722365100.46 schema:affiliation grid-institutes:grid.17063.33
111 schema:familyName Medvedev
112 schema:givenName Paul
113 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0722365100.46
114 rdf:type schema:Person
115 grid-institutes:grid.17063.33 schema:alternateName Department of Computer Science, University of Toronto, Canada
116 Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Canada
117 schema:name Department of Computer Science, University of Toronto, Canada
118 Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Canada
119 rdf:type schema:Organization
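
Summary figures like the triple and predicate counts reported for this table can be approximated from an N-Triples download. The sketch below is a simplified line parser, not a full N-Triples implementation, and it counts distinct terms, which is an assumption; the page does not say how its own totals are derived. The `example.org` subject in `SAMPLE` is a placeholder, and `summarize_ntriples` is a hypothetical helper.

```python
def summarize_ntriples(text):
    """Count triples and distinct predicates, URIs, literals and blank nodes."""
    triples = 0
    predicates, uris, literals, blanks = set(), set(), set(), set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Drop the terminating "." and split into subject, predicate, object.
        if line.endswith("."):
            line = line[:-1].rstrip()
        subject, predicate, obj = line.split(None, 2)
        triples += 1
        predicates.add(predicate)
        for term in (subject, predicate, obj):
            if term.startswith("<"):
                uris.add(term)
            elif term.startswith("_:"):
                blanks.add(term)
            else:
                literals.add(term)  # quoted literal, possibly with datatype or language tag
    return {
        "triples": triples,
        "predicates": len(predicates),
        "uris": len(uris),
        "literals": len(literals),
        "blank_nodes": len(blanks),
    }


# Placeholder data in N-Triples syntax, loosely modeled on rows of the table above.
SAMPLE = """
<https://example.org/pub> <http://schema.org/name> "Ab Initio Whole Genome Shotgun Assembly with Mated Short Reads" .
<https://example.org/pub> <http://schema.org/pagination> "50-64" .
<https://example.org/pub> <http://schema.org/productId> _:b0 .
_:b0 <http://schema.org/name> "doi" .
_:b0 <http://schema.org/value> "10.1007/978-3-540-78839-3_5" .
"""

if __name__ == "__main__":
    print(summarize_ntriples(SAMPLE))
```

For real data, a dedicated RDF parser is the safer choice, since this sketch ignores escapes and other details of the N-Triples grammar.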
 



