Cerulean: A Hybrid Assembly Using High Throughput Short and Long Reads


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2013

AUTHORS

Viraj Deshpande, Eric D. K. Fung, Son Pham, Vineet Bafna

ABSTRACT

Genome assembly from high-throughput short-read data arguably remains an unresolvable task in repetitive genomes: when a repeat is longer than the read length, it becomes difficult to unambiguously connect its flanking regions. The emergence of third-generation sequencing (Pacific Biosciences) with long reads makes it possible to resolve complicated repeats that short-read data cannot. However, these long reads have high error rates, and assembling the genome from them without additional high-quality short reads is an uphill task. Recently, Koren et al. 2012 [1] proposed using high-quality short-read data to correct the long reads, thereby making assembly from long reads possible. However, because both datasets (short and long reads) are large, error correction of the long reads requires excessively high computational resources, even on small bacterial genomes. In this work, instead of error-correcting the long reads, we first assemble the short reads and then map the long reads onto the assembly graph to resolve repeats.
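The strategy in the last sentence can be illustrated with a toy example. This is a hypothetical sketch, not Cerulean's algorithm (the abstract does not describe its details): it assumes each long read has already been mapped onto the short-read assembly graph and reduced to the ordered list of contigs it traverses. A repeat contig `R` is entered from contigs `A` and `B` and exited to `C` and `D`; short reads alone cannot tell which flank pairs with which, but long reads spanning `R` can. The names `spanning_pairs`, `resolve_repeat` and the `min_support` threshold are illustrative assumptions.

```python
from collections import Counter

# Hypothetical toy model: a long read already mapped onto the short-read
# assembly graph is represented as the ordered list of contig IDs it traverses.
def spanning_pairs(mapped_reads, repeat):
    """Count (incoming, outgoing) contig pairs linked by reads that span `repeat`."""
    pairs = Counter()
    for path in mapped_reads:
        # Only interior positions can have a contig on both sides.
        for i in range(1, len(path) - 1):
            if path[i] == repeat:
                pairs[(path[i - 1], path[i + 1])] += 1
    return pairs

def resolve_repeat(mapped_reads, repeat, min_support=2):
    """Keep only flank pairings supported by at least `min_support` spanning reads."""
    pairs = spanning_pairs(mapped_reads, repeat)
    return {pair: n for pair, n in pairs.items() if n >= min_support}

# Contigs A and B both enter repeat R; C and D both leave it.
reads = [
    ["A", "R", "C"],
    ["A", "R", "C"],
    ["B", "R", "D"],
    ["B", "R", "D"],
    ["B", "R", "C"],  # a single, possibly erroneous, mapping
]
print(resolve_repeat(reads, "R"))  # {('A', 'C'): 2, ('B', 'D'): 2}
```

Requiring several supporting reads is one simple way to tolerate the high error rate of long reads: the single misleading `B, R, C` mapping is outvoted, and the repeat is resolved into the paths `A-R-C` and `B-R-D`.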

PAGES

349-363

Book

TITLE

Algorithms in Bioinformatics

ISBN

978-3-642-40452-8
978-3-642-40453-5

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-642-40453-5_27

DOI

http://dx.doi.org/10.1007/978-3-642-40453-5_27

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1003590990



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of California, San Diego", 
          "id": "https://www.grid.ac/institutes/grid.266100.3", 
          "name": [
            "Department of Computer Science & Engineering, University of California, San Diego, CA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Deshpande", 
        "givenName": "Viraj", 
        "id": "sg:person.011341561347.63", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011341561347.63"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of California, San Diego", 
          "id": "https://www.grid.ac/institutes/grid.266100.3", 
          "name": [
            "Bioinformatics Undergraduate Program, Department of Bioengineering, University of California, San Diego, CA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Fung", 
        "givenName": "Eric D. K.", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of California, San Diego", 
          "id": "https://www.grid.ac/institutes/grid.266100.3", 
          "name": [
            "Department of Computer Science & Engineering, University of California, San Diego, CA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Pham", 
        "givenName": "Son", 
        "id": "sg:person.0667766623.43", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0667766623.43"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of California, San Diego", 
          "id": "https://www.grid.ac/institutes/grid.266100.3", 
          "name": [
            "Department of Computer Science & Engineering, University of California, San Diego, CA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Bafna", 
        "givenName": "Vineet", 
        "id": "sg:person.01013646164.40", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01013646164.40"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1073/pnas.171285098", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010138766"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.089532.108", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1011404279"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nbt0413-265", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1014432550", 
          "https://doi.org/10.1038/nbt0413-265"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/bti310", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1015580011"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nature02390", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017627596", 
          "https://doi.org/10.1038/nature02390"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.126953.111", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1022021574"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nbt.2288", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1024779544", 
          "https://doi.org/10.1038/nbt.2288"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.7088808", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1025227180"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-13-238", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1028668057", 
          "https://doi.org/10.1186/1471-2105-13-238"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.141515.112", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1029788836"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1126/science.287.5461.2196", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1030783427"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nbt.2280", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1033360952", 
          "https://doi.org/10.1038/nbt.2280"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/6.7.2601", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1043762986"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.074492.107", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1051720574"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1089/cmb.1995.2.291", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1059245099"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2013", 
    "datePublishedReg": "2013-01-01", 
    "description": "Genome assembly using high throughput data with short reads, arguably, remains an unresolvable task in repetitive genomes, since when the length of a repeat exceeds the read length, it becomes difficult to unambiguously connect the flanking regions. The emergence of third generation sequencing (Pacific Biosciences) with long reads enables the opportunity to resolve complicated repeats that could not be resolved by the short read data. However, these long reads have high error rate and it is an uphill task to assemble the genome without using additional high quality short reads. Recently, Koren et al. 2012 [1] proposed an approach to use high quality short reads data to correct these long reads and, thus, make the assembly from long reads possible. However, due to the large size of both dataset (short and long reads), error-correction of these long reads requires excessively high computational resources, even on small bacterial genomes. In this work, instead of error correction of long reads, we first assemble the short reads and later map these long reads on the assembly graph to resolve repeats.", 
    "editor": [
      {
        "familyName": "Darling", 
        "givenName": "Aaron", 
        "type": "Person"
      }, 
      {
        "familyName": "Stoye", 
        "givenName": "Jens", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-642-40453-5_27", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-642-40452-8", 
        "978-3-642-40453-5"
      ], 
      "name": "Algorithms in Bioinformatics", 
      "type": "Book"
    }, 
    "name": "Cerulean: A Hybrid Assembly Using High Throughput Short and Long Reads", 
    "pagination": "349-363", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-642-40453-5_27"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "111b1e542357d0a5c6691a00590e7ecffe31a8b4de1c8ae4bb25dc25413fa33c"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1003590990"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-642-40453-5_27", 
      "https://app.dimensions.ai/details/publication/pub.1003590990"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-15T18:08", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8681_00000245.jsonl", 
    "type": "Chapter", 
    "url": "http://link.springer.com/10.1007/978-3-642-40453-5_27"
  }
]
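Because JSON-LD is plain JSON, the record can be read with an ordinary JSON parser. The sketch below uses a trimmed copy of the record (only the fields it reads are kept, and one citation ID is deliberately repeated) and pulls out the title, the ordered author names, the citation IDs and the DOI. The names `RECORD` and `summarize` are hypothetical; citation IDs are deduplicated defensively in case an entry appears twice.

```python
import json

# A trimmed copy of the record above; only the fields used below are kept.
RECORD = """
[{"name": "Cerulean: A Hybrid Assembly Using High Throughput Short and Long Reads",
  "author": [{"givenName": "Viraj", "familyName": "Deshpande"},
             {"givenName": "Eric D. K.", "familyName": "Fung"},
             {"givenName": "Son", "familyName": "Pham"},
             {"givenName": "Vineet", "familyName": "Bafna"}],
  "citation": [{"id": "sg:pub.10.1038/nature02390"},
               {"id": "sg:pub.10.1038/nature02390"},
               {"id": "https://doi.org/10.1101/gr.126953.111"}],
  "productId": [{"name": "doi", "value": ["10.1007/978-3-642-40453-5_27"]}]}]
"""

def summarize(record_text):
    chapter = json.loads(record_text)[0]  # the record is a one-element JSON array
    authors = [f"{a['givenName']} {a['familyName']}" for a in chapter["author"]]
    # dict.fromkeys drops repeated citation IDs while preserving their order
    citations = list(dict.fromkeys(c["id"] for c in chapter["citation"]))
    # productId is a list of name/value pairs; each value is a one-element list
    ids = {p["name"]: p["value"][0] for p in chapter["productId"]}
    return {"title": chapter["name"], "authors": authors,
            "citations": citations, "doi": ids.get("doi")}

summary = summarize(RECORD)
print(summary["authors"])
print(len(summary["citations"]), summary["doi"])
```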
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular linked-data format that is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-40453-5_27'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-40453-5_27'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-40453-5_27'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-40453-5_27'
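The same content negotiation can be done from Python's standard library. This is a minimal sketch: the format keys and the function names `build_request` and `fetch` are hypothetical, the `Accept` values are copied from the curl commands above, and whether the SciGraph endpoint still answers is not guaranteed, so only `fetch` touches the network.

```python
from urllib.request import Request, urlopen

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-642-40453-5_27"

# Accept header values taken from the curl commands above.
FORMATS = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}

def build_request(fmt, url=RECORD_URL):
    """Return a GET request asking the server for `fmt` via content negotiation."""
    return Request(url, headers={"Accept": FORMATS[fmt]})

def fetch(fmt, url=RECORD_URL, timeout=30):
    """Download the record; needs network access and a live endpoint."""
    with urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")

req = build_request("turtle")
print(req.get_header("Accept"))  # text/turtle
```

Keeping request construction separate from `fetch` makes the header choice easy to check without a network connection.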


 

This table lists all metadata directly associated with this object as RDF triples.

141 TRIPLES      23 PREDICATES      42 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-642-40453-5_27 schema:about anzsrc-for:06
2 anzsrc-for:0604
3 schema:author Ndac9e249d3fd401abf0e5b2fd25a1ac0
4 schema:citation sg:pub.10.1038/nature02390
5 sg:pub.10.1038/nbt.2280
6 sg:pub.10.1038/nbt.2288
7 sg:pub.10.1038/nbt0413-265
8 sg:pub.10.1186/1471-2105-13-238
9 https://doi.org/10.1073/pnas.171285098
10 https://doi.org/10.1089/cmb.1995.2.291
11 https://doi.org/10.1093/bioinformatics/bti310
12 https://doi.org/10.1093/nar/6.7.2601
13 https://doi.org/10.1101/gr.074492.107
14 https://doi.org/10.1101/gr.089532.108
15 https://doi.org/10.1101/gr.126953.111
16 https://doi.org/10.1101/gr.141515.112
17 https://doi.org/10.1101/gr.7088808
18 https://doi.org/10.1126/science.287.5461.2196
19 schema:datePublished 2013
20 schema:datePublishedReg 2013-01-01
21 schema:description Genome assembly using high throughput data with short reads, arguably, remains an unresolvable task in repetitive genomes, since when the length of a repeat exceeds the read length, it becomes difficult to unambiguously connect the flanking regions. The emergence of third generation sequencing (Pacific Biosciences) with long reads enables the opportunity to resolve complicated repeats that could not be resolved by the short read data. However, these long reads have high error rate and it is an uphill task to assemble the genome without using additional high quality short reads. Recently, Koren et al. 2012 [1] proposed an approach to use high quality short reads data to correct these long reads and, thus, make the assembly from long reads possible. However, due to the large size of both dataset (short and long reads), error-correction of these long reads requires excessively high computational resources, even on small bacterial genomes. In this work, instead of error correction of long reads, we first assemble the short reads and later map these long reads on the assembly graph to resolve repeats.
22 schema:editor N3cb309bdf49f4a15918d5afceac7a764
23 schema:genre chapter
24 schema:inLanguage en
25 schema:isAccessibleForFree true
26 schema:isPartOf Na11436311ebb49acba9434738d4488e1
27 schema:name Cerulean: A Hybrid Assembly Using High Throughput Short and Long Reads
28 schema:pagination 349-363
29 schema:productId N0ca25f866515463cb38b0464522f7e3d
30 N57fd5cfc7c034369b258726163f18d82
31 N5b28fbf356ae445e86892686ad3d59a3
32 schema:publisher N98ce39992a7e4d62818fef8747acfdd1
33 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003590990
34 https://doi.org/10.1007/978-3-642-40453-5_27
35 schema:sdDatePublished 2019-04-15T18:08
36 schema:sdLicense https://scigraph.springernature.com/explorer/license/
37 schema:sdPublisher Nf0cc61759a0f42d582c1268389ff7398
38 schema:url http://link.springer.com/10.1007/978-3-642-40453-5_27
39 sgo:license sg:explorer/license/
40 sgo:sdDataset chapters
41 rdf:type schema:Chapter
42 N0ca25f866515463cb38b0464522f7e3d schema:name readcube_id
43 schema:value 111b1e542357d0a5c6691a00590e7ecffe31a8b4de1c8ae4bb25dc25413fa33c
44 rdf:type schema:PropertyValue
45 N199e34addb4545278035cb3b702d11d7 schema:familyName Darling
46 schema:givenName Aaron
47 rdf:type schema:Person
48 N3cb309bdf49f4a15918d5afceac7a764 rdf:first N199e34addb4545278035cb3b702d11d7
49 rdf:rest Nfde9044f941a40fda8e18fc8be8e89f4
50 N5524d27f1f2142df8a5d41f559efa31d rdf:first Nae46c2ea6d764c7cb420f709b1b0e111
51 rdf:rest N891027d95213452696c3c6f892439a9c
52 N57fd5cfc7c034369b258726163f18d82 schema:name dimensions_id
53 schema:value pub.1003590990
54 rdf:type schema:PropertyValue
55 N5b28fbf356ae445e86892686ad3d59a3 schema:name doi
56 schema:value 10.1007/978-3-642-40453-5_27
57 rdf:type schema:PropertyValue
58 N60bec276276b4ffeb847d208a7fb4eec schema:familyName Stoye
59 schema:givenName Jens
60 rdf:type schema:Person
61 N891027d95213452696c3c6f892439a9c rdf:first sg:person.0667766623.43
62 rdf:rest Ne35ba0e6e7bf4892a665e76f8eb4f277
63 N98ce39992a7e4d62818fef8747acfdd1 schema:location Berlin, Heidelberg
64 schema:name Springer Berlin Heidelberg
65 rdf:type schema:Organisation
66 Na11436311ebb49acba9434738d4488e1 schema:isbn 978-3-642-40452-8
67 978-3-642-40453-5
68 schema:name Algorithms in Bioinformatics
69 rdf:type schema:Book
70 Nae46c2ea6d764c7cb420f709b1b0e111 schema:affiliation https://www.grid.ac/institutes/grid.266100.3
71 schema:familyName Fung
72 schema:givenName Eric D. K.
73 rdf:type schema:Person
74 Ndac9e249d3fd401abf0e5b2fd25a1ac0 rdf:first sg:person.011341561347.63
75 rdf:rest N5524d27f1f2142df8a5d41f559efa31d
76 Ne35ba0e6e7bf4892a665e76f8eb4f277 rdf:first sg:person.01013646164.40
77 rdf:rest rdf:nil
78 Nf0cc61759a0f42d582c1268389ff7398 schema:name Springer Nature - SN SciGraph project
79 rdf:type schema:Organization
80 Nfde9044f941a40fda8e18fc8be8e89f4 rdf:first N60bec276276b4ffeb847d208a7fb4eec
81 rdf:rest rdf:nil
82 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
83 schema:name Biological Sciences
84 rdf:type schema:DefinedTerm
85 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
86 schema:name Genetics
87 rdf:type schema:DefinedTerm
88 sg:person.01013646164.40 schema:affiliation https://www.grid.ac/institutes/grid.266100.3
89 schema:familyName Bafna
90 schema:givenName Vineet
91 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01013646164.40
92 rdf:type schema:Person
93 sg:person.011341561347.63 schema:affiliation https://www.grid.ac/institutes/grid.266100.3
94 schema:familyName Deshpande
95 schema:givenName Viraj
96 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011341561347.63
97 rdf:type schema:Person
98 sg:person.0667766623.43 schema:affiliation https://www.grid.ac/institutes/grid.266100.3
99 schema:familyName Pham
100 schema:givenName Son
101 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0667766623.43
102 rdf:type schema:Person
103 sg:pub.10.1038/nature02390 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017627596
104 https://doi.org/10.1038/nature02390
105 rdf:type schema:CreativeWork
106 sg:pub.10.1038/nbt.2280 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033360952
107 https://doi.org/10.1038/nbt.2280
108 rdf:type schema:CreativeWork
109 sg:pub.10.1038/nbt.2288 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024779544
110 https://doi.org/10.1038/nbt.2288
111 rdf:type schema:CreativeWork
112 sg:pub.10.1038/nbt0413-265 schema:sameAs https://app.dimensions.ai/details/publication/pub.1014432550
113 https://doi.org/10.1038/nbt0413-265
114 rdf:type schema:CreativeWork
115 sg:pub.10.1186/1471-2105-13-238 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028668057
116 https://doi.org/10.1186/1471-2105-13-238
117 rdf:type schema:CreativeWork
118 https://doi.org/10.1073/pnas.171285098 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010138766
119 rdf:type schema:CreativeWork
120 https://doi.org/10.1089/cmb.1995.2.291 schema:sameAs https://app.dimensions.ai/details/publication/pub.1059245099
121 rdf:type schema:CreativeWork
122 https://doi.org/10.1093/bioinformatics/bti310 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015580011
123 rdf:type schema:CreativeWork
124 https://doi.org/10.1093/nar/6.7.2601 schema:sameAs https://app.dimensions.ai/details/publication/pub.1043762986
125 rdf:type schema:CreativeWork
126 https://doi.org/10.1101/gr.074492.107 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051720574
127 rdf:type schema:CreativeWork
128 https://doi.org/10.1101/gr.089532.108 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011404279
129 rdf:type schema:CreativeWork
130 https://doi.org/10.1101/gr.126953.111 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022021574
131 rdf:type schema:CreativeWork
132 https://doi.org/10.1101/gr.141515.112 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029788836
133 rdf:type schema:CreativeWork
134 https://doi.org/10.1101/gr.7088808 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025227180
135 rdf:type schema:CreativeWork
136 https://doi.org/10.1126/science.287.5461.2196 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030783427
137 rdf:type schema:CreativeWork
138 https://www.grid.ac/institutes/grid.266100.3 schema:alternateName University of California, San Diego
139 schema:name Bioinformatics Undergraduate Program, Department of Bioengineering, University of California, San Diego, CA, USA
140 Department of Computer Science & Engineering, University of California, San Diego, CA, USA
141 rdf:type schema:Organization
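In the table, `schema:author` points to the blank node `Ndac9e249d3fd401abf0e5b2fd25a1ac0`, the head of an RDF collection: each cell's `rdf:first` holds one author and `rdf:rest` points to the next cell, ending at `rdf:nil`. This chain is how the author order survives the conversion to unordered triples. The sketch below copies only the list cells from the table into a Python dict, a hypothetical stand-in for a real triple store, and walks the chain back into an ordered list; `read_rdf_list` and `FAMILY_NAME` are illustrative names.

```python
RDF_NIL = "rdf:nil"

# subject -> {predicate: object} for the author-list cells in the table above.
TRIPLES = {
    "Ndac9e249d3fd401abf0e5b2fd25a1ac0": {"rdf:first": "sg:person.011341561347.63",
                                          "rdf:rest": "N5524d27f1f2142df8a5d41f559efa31d"},
    "N5524d27f1f2142df8a5d41f559efa31d": {"rdf:first": "Nae46c2ea6d764c7cb420f709b1b0e111",
                                          "rdf:rest": "N891027d95213452696c3c6f892439a9c"},
    "N891027d95213452696c3c6f892439a9c": {"rdf:first": "sg:person.0667766623.43",
                                          "rdf:rest": "Ne35ba0e6e7bf4892a665e76f8eb4f277"},
    "Ne35ba0e6e7bf4892a665e76f8eb4f277": {"rdf:first": "sg:person.01013646164.40",
                                          "rdf:rest": RDF_NIL},
}
# schema:familyName of each list member, also taken from the table.
FAMILY_NAME = {
    "sg:person.011341561347.63": "Deshpande",
    "Nae46c2ea6d764c7cb420f709b1b0e111": "Fung",
    "sg:person.0667766623.43": "Pham",
    "sg:person.01013646164.40": "Bafna",
}

def read_rdf_list(head, triples):
    """Follow rdf:first/rdf:rest links from `head` until rdf:nil."""
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle at {node}")
        seen.add(node)
        cell = triples[node]
        items.append(cell["rdf:first"])
        node = cell["rdf:rest"]
    return items

authors = read_rdf_list("Ndac9e249d3fd401abf0e5b2fd25a1ac0", TRIPLES)
print([FAMILY_NAME[a] for a in authors])  # ['Deshpande', 'Fung', 'Pham', 'Bafna']
```

The editor list (`schema:editor` to `N3cb309bdf49f4a15918d5afceac7a764`) is encoded the same way and could be read with the same function.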
 






...