Computability of Models for Sequence Assembly View Full Text


Ontology type: schema:Chapter     


Chapter Info

DATE

2007-01-01

AUTHORS

Paul Medvedev , Konstantinos Georgiou , Gene Myers , Michael Brudno

ABSTRACT

Graph-theoretic models have come to the forefront as some of the most powerful and practical methods for sequence assembly. Simultaneously, the computational hardness of the underlying graph algorithms has remained open. Here we present two theoretical results about the complexity of these models for sequence assembly. In the first part, we show sequence assembly to be NP-hard under two different models: string graphs and de Bruijn graphs. Together with an earlier result on the NP-hardness of overlap graphs, this demonstrates that all of the popular graph-theoretic sequence assembly paradigms are NP-hard. In our second result, we give the first, to our knowledge, optimal polynomial time algorithm for genome assembly that explicitly models the double-strandedness of DNA. We solve the Chinese Postman Problem on bidirected graphs using bidirected flow techniques and show to how to use it to find the shortest double-stranded DNA sequence which contains a given set of k-long words. This algorithm has applications to sequencing by hybridization and short read assembly. More... »

PAGES

289-301

Book

TITLE

Algorithms in Bioinformatics

ISBN

978-3-540-74125-1
978-3-540-74126-8

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-540-74126-8_27

DOI

http://dx.doi.org/10.1007/978-3-540-74126-8_27

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1008956353


Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
Incoming Citations Browse incoming citations for this publication using opencitations.net

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0802", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Computation Theory and Mathematics", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of Toronto, Canada", 
          "id": "http://www.grid.ac/institutes/grid.17063.33", 
          "name": [
            "University of Toronto, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Medvedev", 
        "givenName": "Paul", 
        "id": "sg:person.0722365100.46", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0722365100.46"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Toronto, Canada", 
          "id": "http://www.grid.ac/institutes/grid.17063.33", 
          "name": [
            "University of Toronto, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Georgiou", 
        "givenName": "Konstantinos", 
        "id": "sg:person.010624011446.02", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010624011446.02"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Janelia Farms, Howard Hughes Medical Institute, USA", 
          "id": "http://www.grid.ac/institutes/grid.413575.1", 
          "name": [
            "Janelia Farms, Howard Hughes Medical Institute, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Myers", 
        "givenName": "Gene", 
        "id": "sg:person.0761445171.42", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0761445171.42"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Toronto, Canada", 
          "id": "http://www.grid.ac/institutes/grid.17063.33", 
          "name": [
            "University of Toronto, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brudno", 
        "givenName": "Michael", 
        "id": "sg:person.01253563237.25", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01253563237.25"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2007-01-01", 
    "datePublishedReg": "2007-01-01", 
    "description": "Graph-theoretic models have come to the forefront as some of the most powerful and practical methods for sequence assembly. Simultaneously, the computational hardness of the underlying graph algorithms has remained open. Here we present two theoretical results about the complexity of these models for sequence assembly. In the first part, we show sequence assembly to be NP-hard under two different models: string graphs and de Bruijn graphs. Together with an earlier result on the NP-hardness of overlap graphs, this demonstrates that all of the popular graph-theoretic sequence assembly paradigms are NP-hard. In our second result, we give the first, to our knowledge, optimal polynomial time algorithm for genome assembly that explicitly models the double-strandedness of DNA. We solve the Chinese Postman Problem on bidirected graphs using bidirected flow techniques and show to how to use it to find the shortest double-stranded DNA sequence which contains a given set of k-long words. This algorithm has applications to sequencing by hybridization and short read assembly.", 
    "editor": [
      {
        "familyName": "Giancarlo", 
        "givenName": "Raffaele", 
        "type": "Person"
      }, 
      {
        "familyName": "Hannenhalli", 
        "givenName": "Sridhar", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-540-74126-8_27", 
    "inLanguage": "en", 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-540-74125-1", 
        "978-3-540-74126-8"
      ], 
      "name": "Algorithms in Bioinformatics", 
      "type": "Book"
    }, 
    "keywords": [
      "optimal polynomial time algorithm", 
      "NP-hardness", 
      "polynomial time algorithm", 
      "graph-theoretic model", 
      "Chinese postman problem", 
      "graph algorithms", 
      "computational hardness", 
      "de Bruijn graphs", 
      "time algorithm", 
      "postman problem", 
      "assembly paradigm", 
      "overlap graph", 
      "algorithm", 
      "string graph", 
      "Bruijn graph", 
      "graph", 
      "sequence assembly", 
      "theoretical results", 
      "bidirected graphs", 
      "second result", 
      "NP", 
      "flow technique", 
      "different models", 
      "short read assemblies", 
      "read assemblies", 
      "complexity", 
      "practical method", 
      "computability", 
      "earlier results", 
      "model", 
      "first part", 
      "paradigm", 
      "set", 
      "problem", 
      "applications", 
      "technique", 
      "genome assembly", 
      "results", 
      "words", 
      "long words", 
      "knowledge", 
      "method", 
      "sequence", 
      "part", 
      "forefront", 
      "assembly", 
      "DNA sequences", 
      "hardness", 
      "hybridization", 
      "DNA", 
      "popular graph-theoretic sequence assembly paradigms", 
      "graph-theoretic sequence assembly paradigms", 
      "sequence assembly paradigms", 
      "Computability of Models"
    ], 
    "name": "Computability of Models for Sequence Assembly", 
    "pagination": "289-301", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1008956353"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-540-74126-8_27"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-540-74126-8_27", 
      "https://app.dimensions.ai/details/publication/pub.1008956353"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-01-01T19:07", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220101/entities/gbq_results/chapter/chapter_12.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-540-74126-8_27"
  }
]
 

Download the RDF metadata as:  json-ld nt turtle xml License info

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-74126-8_27'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-74126-8_27'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-74126-8_27'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-74126-8_27'


 

This table displays all metadata directly associated to this object as RDF triples.

143 TRIPLES      23 PREDICATES      79 URIs      72 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-540-74126-8_27 schema:about anzsrc-for:08
2 anzsrc-for:0802
3 schema:author N5b3ea4bb80aa4cb49d6464afd904d3ad
4 schema:datePublished 2007-01-01
5 schema:datePublishedReg 2007-01-01
6 schema:description Graph-theoretic models have come to the forefront as some of the most powerful and practical methods for sequence assembly. Simultaneously, the computational hardness of the underlying graph algorithms has remained open. Here we present two theoretical results about the complexity of these models for sequence assembly. In the first part, we show sequence assembly to be NP-hard under two different models: string graphs and de Bruijn graphs. Together with an earlier result on the NP-hardness of overlap graphs, this demonstrates that all of the popular graph-theoretic sequence assembly paradigms are NP-hard. In our second result, we give the first, to our knowledge, optimal polynomial time algorithm for genome assembly that explicitly models the double-strandedness of DNA. We solve the Chinese Postman Problem on bidirected graphs using bidirected flow techniques and show to how to use it to find the shortest double-stranded DNA sequence which contains a given set of k-long words. This algorithm has applications to sequencing by hybridization and short read assembly.
7 schema:editor Nb9184de0f2d9481eafeffd8ad2fd18bf
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree false
11 schema:isPartOf Nb2bed30fd4d548669bdfed7fe8dd4e70
12 schema:keywords Bruijn graph
13 Chinese postman problem
14 Computability of Models
15 DNA
16 DNA sequences
17 NP
18 NP-hardness
19 algorithm
20 applications
21 assembly
22 assembly paradigm
23 bidirected graphs
24 complexity
25 computability
26 computational hardness
27 de Bruijn graphs
28 different models
29 earlier results
30 first part
31 flow technique
32 forefront
33 genome assembly
34 graph
35 graph algorithms
36 graph-theoretic model
37 graph-theoretic sequence assembly paradigms
38 hardness
39 hybridization
40 knowledge
41 long words
42 method
43 model
44 optimal polynomial time algorithm
45 overlap graph
46 paradigm
47 part
48 polynomial time algorithm
49 popular graph-theoretic sequence assembly paradigms
50 postman problem
51 practical method
52 problem
53 read assemblies
54 results
55 second result
56 sequence
57 sequence assembly
58 sequence assembly paradigms
59 set
60 short read assemblies
61 string graph
62 technique
63 theoretical results
64 time algorithm
65 words
66 schema:name Computability of Models for Sequence Assembly
67 schema:pagination 289-301
68 schema:productId N0e38a15a6dbd46a5bf49af6d97752e2d
69 N6f4fb74a054a4e69b6aab7be3a8ff4e9
70 schema:publisher N6e8b57a89d2549349cdd968f74c2300e
71 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008956353
72 https://doi.org/10.1007/978-3-540-74126-8_27
73 schema:sdDatePublished 2022-01-01T19:07
74 schema:sdLicense https://scigraph.springernature.com/explorer/license/
75 schema:sdPublisher N2a710193faa54aee8a53fd51deb7a732
76 schema:url https://doi.org/10.1007/978-3-540-74126-8_27
77 sgo:license sg:explorer/license/
78 sgo:sdDataset chapters
79 rdf:type schema:Chapter
80 N0e38a15a6dbd46a5bf49af6d97752e2d schema:name doi
81 schema:value 10.1007/978-3-540-74126-8_27
82 rdf:type schema:PropertyValue
83 N266e8148b18b446784907d5cce67efaa rdf:first sg:person.010624011446.02
84 rdf:rest Nd37efd43e1a44020a4972ee33e6e8ff9
85 N2a710193faa54aee8a53fd51deb7a732 schema:name Springer Nature - SN SciGraph project
86 rdf:type schema:Organization
87 N4fe28885c9c248f7b797efe9c9dadc51 schema:familyName Giancarlo
88 schema:givenName Raffaele
89 rdf:type schema:Person
90 N5b3ea4bb80aa4cb49d6464afd904d3ad rdf:first sg:person.0722365100.46
91 rdf:rest N266e8148b18b446784907d5cce67efaa
92 N6e8b57a89d2549349cdd968f74c2300e schema:name Springer Nature
93 rdf:type schema:Organisation
94 N6f4fb74a054a4e69b6aab7be3a8ff4e9 schema:name dimensions_id
95 schema:value pub.1008956353
96 rdf:type schema:PropertyValue
97 Nb2bed30fd4d548669bdfed7fe8dd4e70 schema:isbn 978-3-540-74125-1
98 978-3-540-74126-8
99 schema:name Algorithms in Bioinformatics
100 rdf:type schema:Book
101 Nb83107ab7371473ebce653f23d5ba140 schema:familyName Hannenhalli
102 schema:givenName Sridhar
103 rdf:type schema:Person
104 Nb9184de0f2d9481eafeffd8ad2fd18bf rdf:first N4fe28885c9c248f7b797efe9c9dadc51
105 rdf:rest Nc2247ce2ab7c4144bf4402346fc6ec5a
106 Nc2247ce2ab7c4144bf4402346fc6ec5a rdf:first Nb83107ab7371473ebce653f23d5ba140
107 rdf:rest rdf:nil
108 Nd37efd43e1a44020a4972ee33e6e8ff9 rdf:first sg:person.0761445171.42
109 rdf:rest Nf131e0be2bca451794a1596ec0a45c45
110 Nf131e0be2bca451794a1596ec0a45c45 rdf:first sg:person.01253563237.25
111 rdf:rest rdf:nil
112 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
113 schema:name Information and Computing Sciences
114 rdf:type schema:DefinedTerm
115 anzsrc-for:0802 schema:inDefinedTermSet anzsrc-for:
116 schema:name Computation Theory and Mathematics
117 rdf:type schema:DefinedTerm
118 sg:person.010624011446.02 schema:affiliation grid-institutes:grid.17063.33
119 schema:familyName Georgiou
120 schema:givenName Konstantinos
121 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010624011446.02
122 rdf:type schema:Person
123 sg:person.01253563237.25 schema:affiliation grid-institutes:grid.17063.33
124 schema:familyName Brudno
125 schema:givenName Michael
126 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01253563237.25
127 rdf:type schema:Person
128 sg:person.0722365100.46 schema:affiliation grid-institutes:grid.17063.33
129 schema:familyName Medvedev
130 schema:givenName Paul
131 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0722365100.46
132 rdf:type schema:Person
133 sg:person.0761445171.42 schema:affiliation grid-institutes:grid.413575.1
134 schema:familyName Myers
135 schema:givenName Gene
136 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0761445171.42
137 rdf:type schema:Person
138 grid-institutes:grid.17063.33 schema:alternateName University of Toronto, Canada
139 schema:name University of Toronto, Canada
140 rdf:type schema:Organization
141 grid-institutes:grid.413575.1 schema:alternateName Janelia Farms, Howard Hughes Medical Institute, USA
142 schema:name Janelia Farms, Howard Hughes Medical Institute, USA
143 rdf:type schema:Organization
 




Preview window. Press ESC to close (or click here)


...