Optimal Spaced Seeds for Hidden Markov Models, with Application to Homologous Coding Regions


Ontology type: schema:Chapter
Open Access: True


Chapter Info

DATE

2003

AUTHORS

Broňa Brejová, Daniel G. Brown, Tomáš Vinař

ABSTRACT

We study the problem of computing optimal spaced seeds for detecting sequences generated by a Hidden Markov model. Inspired by recent work in DNA sequence alignment, we have developed such a model for representing the conservation between related DNA coding sequences. Our model includes positional dependencies and periodic rates of conservation, as well as regional deviations in overall conservation rate. We show that, for hidden Markov models in general, the probability that a seed is matched in a region can be computed efficiently, and use these methods to compute the optimal seed for our models. Our experiments on real data show that the optimal seeds are substantially more sensitive than the seeds used in the standard alignment program BLAST, and also substantially better than those of PatternHunter or WABA, both of which use spaced seeds. Our results offer the hope of improved gene finding due to fewer missed exons in DNA/DNA comparison, and more effective homology search in general, and may have applications outside of bioinformatics.
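
To make the idea of "the probability that a seed is matched in a region" concrete, here is a minimal sketch under a much simpler model than the one in the chapter: every alignment position matches independently with the same probability p (an i.i.d. Bernoulli model, not a hidden Markov model). The function name, the "1"/"0" seed notation (1 = position must match, 0 = don't care), and the dynamic program over recent alignment bits are illustrative assumptions, not the authors' algorithm.

```python
def seed_hit_probability(seed: str, length: int, p: float) -> float:
    """Probability that `seed` hits at least once in a random alignment.

    Assumption: each of the `length` alignment positions is a match (1)
    with probability p, independently of the others. This is NOT the
    HMM-based computation described in the abstract.
    """
    span = len(seed)
    if length < span:
        return 0.0
    required = [i for i, c in enumerate(seed) if c == "1"]

    # dist maps the most recent (span - 1) alignment bits to the probability
    # of having produced them without any seed hit so far.
    dist = {(): 1.0}
    for _ in range(length):
        new_dist = {}
        for suffix, prob in dist.items():
            for bit, q in ((1, p), (0, 1.0 - p)):
                window = suffix + (bit,)
                if len(window) == span and all(window[i] for i in required):
                    continue  # the seed hits here: leave the no-hit mass
                key = window[-(span - 1):] if span > 1 else ()
                new_dist[key] = new_dist.get(key, 0.0) + prob * q
        dist = new_dist
    return 1.0 - sum(dist.values())


if __name__ == "__main__":
    # Compare a contiguous seed with a spaced seed of the same weight.
    print(seed_hit_probability("111", 20, 0.7))
    print(seed_hit_probability("1101", 20, 0.7))
```

The state space grows as 2^(span - 1), so this sketch is only practical for short seeds; it is meant to show what is being computed, not how to compute it at scale.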

PAGES

42-54

Book

TITLE

Combinatorial Pattern Matching

ISBN

978-3-540-40311-1
978-3-540-44888-4

Author Affiliations

School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/3-540-44888-8_4

DOI

http://dx.doi.org/10.1007/3-540-44888-8_4

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1047397326



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brejov\u00e1", 
        "givenName": "Bro\u0148a", 
        "id": "sg:person.0642141060.90", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642141060.90"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brown", 
        "givenName": "Daniel G.", 
        "id": "sg:person.0642727740.54", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Vina\u0159", 
        "givenName": "Tom\u00e1\u0161", 
        "id": "sg:person.01041305251.67", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01041305251.67"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1093/nar/28.1.45", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1004742321"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/18.3.440", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1006017712"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.10.8.1115", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1052106233"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/5.18626", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061178979"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2003", 
    "datePublishedReg": "2003-01-01", 
    "description": "We study the problem of computing optimal spaced seeds for detecting sequences generated by a Hidden Markov model. Inspired by recent work in DNA sequence alignment, we have developed such a model for representing the conservation between related DNA coding sequences. Our model includes positional dependencies and periodic rates of conservation, as well as regional deviations in overall conservation rate. We show that, for hidden Markov models in general, the probability that a seed is matched in a region can be computed efficiently, and use these methods to compute the optimal seed for our models. Our experiments on real data show that the optimal seeds are substantially more sensitive than the seeds used in the standard alignment program BLAST, and also substantially better than those of PatternHunter or WABA, both of which use spaced seeds. Our results offer the hope of improved gene finding due to fewer missed exons in DNA/DNA comparison, and more effective homology search in general, and may have applications outside of bioinformatics.", 
    "editor": [
      {
        "familyName": "Baeza-Yates", 
        "givenName": "Ricardo", 
        "type": "Person"
      }, 
      {
        "familyName": "Ch\u00e1vez", 
        "givenName": "Edgar", 
        "type": "Person"
      }, 
      {
        "familyName": "Crochemore", 
        "givenName": "Maxime", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/3-540-44888-8_4", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-540-40311-1", 
        "978-3-540-44888-4"
      ], 
      "name": "Combinatorial Pattern Matching", 
      "type": "Book"
    }, 
    "name": "Optimal Spaced Seeds for Hidden Markov Models, with Application to Homologous Coding Regions", 
    "pagination": "42-54", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/3-540-44888-8_4"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "e1c8ffb9dc36713746d07abcbb024515e037809181a64491b1b757b6ba077e2b"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1047397326"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/3-540-44888-8_4", 
      "https://app.dimensions.ai/details/publication/pub.1047397326"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-15T22:01", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8693_00000272.jsonl", 
    "type": "Chapter", 
    "url": "http://link.springer.com/10.1007/3-540-44888-8_4"
  }
]
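
As a rough illustration of how a record like the one above can be consumed, the following sketch parses a trimmed excerpt of it with Python's standard json module and pulls out the author names and the DOI. The helper names author_names and product_id are hypothetical, and the excerpt keeps only a few of the fields shown in the full record.

```python
import json

# Trimmed excerpt of the record above (fields and values copied from it).
RECORD_EXCERPT = r"""
[
  {
    "author": [
      {"familyName": "Brejov\u00e1", "givenName": "Bro\u0148a", "type": "Person"},
      {"familyName": "Brown", "givenName": "Daniel G.", "type": "Person"},
      {"familyName": "Vina\u0159", "givenName": "Tom\u00e1\u0161", "type": "Person"}
    ],
    "pagination": "42-54",
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/3-540-44888-8_4"]},
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1047397326"]}
    ]
  }
]
"""


def author_names(record: dict) -> list:
    """Return "given family" strings in the order the record lists them."""
    return [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]


def product_id(record: dict, name: str):
    """Return the first value of the productId entry called `name`, or None."""
    for entry in record.get("productId", []):
        if entry.get("name") == name:
            return entry["value"][0]
    return None


# The top-level JSON value is a list holding a single record object.
record = json.loads(RECORD_EXCERPT)[0]

if __name__ == "__main__":
    print(author_names(record))
    print(product_id(record, "doi"))
```

This treats the record as plain JSON and ignores the "@context"; a full JSON-LD processor would be needed to expand the keys into complete predicate URIs.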
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/3-540-44888-8_4'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/3-540-44888-8_4'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/3-540-44888-8_4'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/3-540-44888-8_4'
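
The four curl calls above differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. The short format keys ("json-ld", "nt", "turtle", "xml") and the function names are assumptions made for this example, and whether the endpoint still answers is not something this sketch can guarantee.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/pub.10.1007/3-540-44888-8_4"

# Hypothetical short keys mapped to the MIME types used in the curl commands.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request asking for the given RDF format."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT[fmt]})


def fetch(fmt: str, url: str = BASE_URL, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text (needs network)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch("turtle") would correspond to the third curl command; build_request is kept separate so the header logic can be checked without network access.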


 

This table displays all metadata directly associated with this object as RDF triples.

101 TRIPLES, 23 PREDICATES, 31 URIs, 20 LITERALS, 8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/3-540-44888-8_4 schema:about anzsrc-for:06
2 anzsrc-for:0604
3 schema:author N4b4a83ff2a12481090efad4fd56d3a00
4 schema:citation https://doi.org/10.1093/bioinformatics/18.3.440
5 https://doi.org/10.1093/nar/28.1.45
6 https://doi.org/10.1101/gr.10.8.1115
7 https://doi.org/10.1109/5.18626
8 schema:datePublished 2003
9 schema:datePublishedReg 2003-01-01
10 schema:description We study the problem of computing optimal spaced seeds for detecting sequences generated by a Hidden Markov model. Inspired by recent work in DNA sequence alignment, we have developed such a model for representing the conservation between related DNA coding sequences. Our model includes positional dependencies and periodic rates of conservation, as well as regional deviations in overall conservation rate. We show that, for hidden Markov models in general, the probability that a seed is matched in a region can be computed efficiently, and use these methods to compute the optimal seed for our models. Our experiments on real data show that the optimal seeds are substantially more sensitive than the seeds used in the standard alignment program BLAST, and also substantially better than those of PatternHunter or WABA, both of which use spaced seeds. Our results offer the hope of improved gene finding due to fewer missed exons in DNA/DNA comparison, and more effective homology search in general, and may have applications outside of bioinformatics.
11 schema:editor Ne6f192165959413ca1d3f26579ca68d8
12 schema:genre chapter
13 schema:inLanguage en
14 schema:isAccessibleForFree true
15 schema:isPartOf N88b6bb98a2b2499d890662c609d6b946
16 schema:name Optimal Spaced Seeds for Hidden Markov Models, with Application to Homologous Coding Regions
17 schema:pagination 42-54
18 schema:productId N4c486ed4d4af4861aebb82e8b5bc1554
19 N4ee8b48bc8af48e7a8fb9b7ebc2e4638
20 Nc9793fd5f28341e4a001b61ff72d4f68
21 schema:publisher N00c94b8711aa433087893d6e67237445
22 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047397326
23 https://doi.org/10.1007/3-540-44888-8_4
24 schema:sdDatePublished 2019-04-15T22:01
25 schema:sdLicense https://scigraph.springernature.com/explorer/license/
26 schema:sdPublisher Ne24dcae5eb6c42288914ffe2c972cc98
27 schema:url http://link.springer.com/10.1007/3-540-44888-8_4
28 sgo:license sg:explorer/license/
29 sgo:sdDataset chapters
30 rdf:type schema:Chapter
31 N00c94b8711aa433087893d6e67237445 schema:location Berlin, Heidelberg
32 schema:name Springer Berlin Heidelberg
33 rdf:type schema:Organisation
34 N40a15d5152494245ae59474ecdbf32f2 rdf:first N688b099efdf04fbf88bec83d66722fd9
35 rdf:rest N7565e2666a614f91a3a566cb84fb4980
36 N4b4a83ff2a12481090efad4fd56d3a00 rdf:first sg:person.0642141060.90
37 rdf:rest N5cc997e0f8d34f34a5992a82153fbc80
38 N4c486ed4d4af4861aebb82e8b5bc1554 schema:name doi
39 schema:value 10.1007/3-540-44888-8_4
40 rdf:type schema:PropertyValue
41 N4ee8b48bc8af48e7a8fb9b7ebc2e4638 schema:name readcube_id
42 schema:value e1c8ffb9dc36713746d07abcbb024515e037809181a64491b1b757b6ba077e2b
43 rdf:type schema:PropertyValue
44 N5cc997e0f8d34f34a5992a82153fbc80 rdf:first sg:person.0642727740.54
45 rdf:rest Nf76a6bb3181d42c5b014dbb612f4a615
46 N688b099efdf04fbf88bec83d66722fd9 schema:familyName Chávez
47 schema:givenName Edgar
48 rdf:type schema:Person
49 N7565e2666a614f91a3a566cb84fb4980 rdf:first Ne2ce348be8864e9a9eac965fae568981
50 rdf:rest rdf:nil
51 N88b6bb98a2b2499d890662c609d6b946 schema:isbn 978-3-540-40311-1
52 978-3-540-44888-4
53 schema:name Combinatorial Pattern Matching
54 rdf:type schema:Book
55 Nc9793fd5f28341e4a001b61ff72d4f68 schema:name dimensions_id
56 schema:value pub.1047397326
57 rdf:type schema:PropertyValue
58 Ne24dcae5eb6c42288914ffe2c972cc98 schema:name Springer Nature - SN SciGraph project
59 rdf:type schema:Organization
60 Ne2ce348be8864e9a9eac965fae568981 schema:familyName Crochemore
61 schema:givenName Maxime
62 rdf:type schema:Person
63 Ne6f192165959413ca1d3f26579ca68d8 rdf:first Nf22f774fd4714cbab9db437bf968f70b
64 rdf:rest N40a15d5152494245ae59474ecdbf32f2
65 Nf22f774fd4714cbab9db437bf968f70b schema:familyName Baeza-Yates
66 schema:givenName Ricardo
67 rdf:type schema:Person
68 Nf76a6bb3181d42c5b014dbb612f4a615 rdf:first sg:person.01041305251.67
69 rdf:rest rdf:nil
70 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
71 schema:name Biological Sciences
72 rdf:type schema:DefinedTerm
73 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
74 schema:name Genetics
75 rdf:type schema:DefinedTerm
76 sg:person.01041305251.67 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
77 schema:familyName Vinař
78 schema:givenName Tomáš
79 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01041305251.67
80 rdf:type schema:Person
81 sg:person.0642141060.90 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
82 schema:familyName Brejová
83 schema:givenName Broňa
84 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642141060.90
85 rdf:type schema:Person
86 sg:person.0642727740.54 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
87 schema:familyName Brown
88 schema:givenName Daniel G.
89 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54
90 rdf:type schema:Person
91 https://doi.org/10.1093/bioinformatics/18.3.440 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006017712
92 rdf:type schema:CreativeWork
93 https://doi.org/10.1093/nar/28.1.45 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004742321
94 rdf:type schema:CreativeWork
95 https://doi.org/10.1101/gr.10.8.1115 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052106233
96 rdf:type schema:CreativeWork
97 https://doi.org/10.1109/5.18626 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061178979
98 rdf:type schema:CreativeWork
99 https://www.grid.ac/institutes/grid.46078.3d schema:alternateName University of Waterloo
100 schema:name School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada
101 rdf:type schema:Organization
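
The blank nodes linked by rdf:first and rdf:rest in the table encode ordered lists (here, the author list that schema:author points to). A minimal sketch of walking such a list is shown below; the dictionaries copy the three author-list nodes from the table, while the function name and the dictionary representation are assumptions made for illustration.

```python
RDF_NIL = "rdf:nil"

# rdf:first / rdf:rest values of the author list, copied from the table.
FIRST = {
    "N4b4a83ff2a12481090efad4fd56d3a00": "sg:person.0642141060.90",
    "N5cc997e0f8d34f34a5992a82153fbc80": "sg:person.0642727740.54",
    "Nf76a6bb3181d42c5b014dbb612f4a615": "sg:person.01041305251.67",
}
REST = {
    "N4b4a83ff2a12481090efad4fd56d3a00": "N5cc997e0f8d34f34a5992a82153fbc80",
    "N5cc997e0f8d34f34a5992a82153fbc80": "Nf76a6bb3181d42c5b014dbb612f4a615",
    "Nf76a6bb3181d42c5b014dbb612f4a615": RDF_NIL,
}


def walk_rdf_list(head: str, first: dict, rest: dict) -> list:
    """Follow rdf:rest links from `head` until rdf:nil, collecting rdf:first."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


if __name__ == "__main__":
    # Head node is the object of schema:author in the table.
    print(walk_rdf_list("N4b4a83ff2a12481090efad4fd56d3a00", FIRST, REST))
```

The walk returns the person identifiers in author order, which is the information the JSON-LD "author" array carries directly.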
 



