Optimal Spaced Seeds for Hidden Markov Models, with Application to Homologous Coding Regions


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2003

AUTHORS

Broňa Brejová , Daniel G. Brown , Tomáš Vinař

ABSTRACT

We study the problem of computing optimal spaced seeds for detecting sequences generated by a Hidden Markov model. Inspired by recent work in DNA sequence alignment, we have developed such a model for representing the conservation between related DNA coding sequences. Our model includes positional dependencies and periodic rates of conservation, as well as regional deviations in overall conservation rate. We show that, for hidden Markov models in general, the probability that a seed is matched in a region can be computed efficiently, and use these methods to compute the optimal seed for our models. Our experiments on real data show that the optimal seeds are substantially more sensitive than the seeds used in the standard alignment program BLAST, and also substantially better than those of PatternHunter or WABA, both of which use spaced seeds. Our results offer the hope of improved gene finding due to fewer missed exons in DNA/DNA comparison, and more effective homology search in general, and may have applications outside of bioinformatics.
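
To make the abstract's central claim concrete, here is a minimal sketch of one way to compute the probability that a spaced seed is matched somewhere in a region whose match/mismatch columns are emitted by a hidden Markov model. It is a plain dynamic program over (hidden state, recent columns) and is not taken from the chapter; the authors' algorithm may be organized differently and more efficiently. The function name `seed_sensitivity`, the parameter names, and all probabilities in the usage example are assumptions made for illustration. A seed is written as a string in which `1` marks a position that must match and `0` a position that is ignored.

```python
from collections import defaultdict


def seed_sensitivity(seed, length, init, trans, p_match):
    """Probability that `seed` hits at least once in `length` alignment columns.

    init[h]     : probability that the HMM starts in hidden state h
    trans[h][g] : probability of moving from state h to state g
    p_match[h]  : probability that state h emits a match (1) rather than a mismatch (0)
    """
    span = len(seed)
    if length < span:
        return 0.0
    care = int(seed, 2)            # bitmask of positions that must match
    keep = (1 << (span - 1)) - 1   # keeps only the most recent span-1 columns
    n = len(init)
    # miss[(h, w)]: probability that the next column is emitted by state h,
    # the recent columns are w, and no seed hit has occurred so far.
    miss = {(h, 0): init[h] for h in range(n) if init[h] > 0}
    for t in range(1, length + 1):
        nxt = defaultdict(float)
        for (h, w), pr in miss.items():
            for bit, pb in ((1, p_match[h]), (0, 1.0 - p_match[h])):
                if pb == 0:
                    continue
                window = (w << 1) | bit
                if t >= span and window & care == care:
                    continue       # a hit: this probability mass leaves `miss`
                for g in range(n):
                    if trans[h][g] > 0:
                        nxt[(g, window & keep)] += pr * pb * trans[h][g]
        miss = nxt
    return 1.0 - sum(miss.values())


# Hypothetical 3-periodic model: one hidden state per codon position, visited
# cyclically, with the third position assumed to be less conserved.
init = [1 / 3, 1 / 3, 1 / 3]
trans = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
p_match = [0.8, 0.8, 0.5]
for seed in ("111111", "11011011"):
    print(seed, round(seed_sensitivity(seed, 30, init, trans, p_match), 4))
```

The table of partial states grows as 2^(span-1) times the number of hidden states, so this sketch is only practical for short seeds; it is meant to show why the computation is a finite dynamic program rather than an enumeration of all alignments.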

PAGES

42-54

Book

TITLE

Combinatorial Pattern Matching

ISBN

978-3-540-40311-1
978-3-540-44888-4

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/3-540-44888-8_4

DOI

http://dx.doi.org/10.1007/3-540-44888-8_4

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1047397326



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brejov\u00e1", 
        "givenName": "Bro\u0148a", 
        "id": "sg:person.0642141060.90", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642141060.90"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brown", 
        "givenName": "Daniel G.", 
        "id": "sg:person.0642727740.54", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Vina\u0159", 
        "givenName": "Tom\u00e1\u0161", 
        "id": "sg:person.01041305251.67", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01041305251.67"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1093/nar/28.1.45", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1004742321"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/18.3.440", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1006017712"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.10.8.1115", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1052106233"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/5.18626", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061178979"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2003", 
    "datePublishedReg": "2003-01-01", 
    "description": "We study the problem of computing optimal spaced seeds for detecting sequences generated by a Hidden Markov model. Inspired by recent work in DNA sequence alignment, we have developed such a model for representing the conservation between related DNA coding sequences. Our model includes positional dependencies and periodic rates of conservation, as well as regional deviations in overall conservation rate. We show that, for hidden Markov models in general, the probability that a seed is matched in a region can be computed efficiently, and use these methods to compute the optimal seed for our models. Our experiments on real data show that the optimal seeds are substantially more sensitive than the seeds used in the standard alignment program BLAST, and also substantially better than those of PatternHunter or WABA, both of which use spaced seeds. Our results offer the hope of improved gene finding due to fewer missed exons in DNA/DNA comparison, and more effective homology search in general, and may have applications outside of bioinformatics.", 
    "editor": [
      {
        "familyName": "Baeza-Yates", 
        "givenName": "Ricardo", 
        "type": "Person"
      }, 
      {
        "familyName": "Ch\u00e1vez", 
        "givenName": "Edgar", 
        "type": "Person"
      }, 
      {
        "familyName": "Crochemore", 
        "givenName": "Maxime", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/3-540-44888-8_4", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-540-40311-1", 
        "978-3-540-44888-4"
      ], 
      "name": "Combinatorial Pattern Matching", 
      "type": "Book"
    }, 
    "name": "Optimal Spaced Seeds for Hidden Markov Models, with Application to Homologous Coding Regions", 
    "pagination": "42-54", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/3-540-44888-8_4"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "e1c8ffb9dc36713746d07abcbb024515e037809181a64491b1b757b6ba077e2b"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1047397326"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/3-540-44888-8_4", 
      "https://app.dimensions.ai/details/publication/pub.1047397326"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-15T22:01", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8693_00000272.jsonl", 
    "type": "Chapter", 
    "url": "http://link.springer.com/10.1007/3-540-44888-8_4"
  }
]
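
Because JSON-LD is ordinary JSON, the record above can be read with any JSON parser. The sketch below parses a trimmed excerpt of the record (only fields that appear above) and pulls out the title, authors, pages and DOI. The helper name `summarize` and the shape of its return value are illustrative choices, not part of SciGraph.

```python
import json

# Trimmed excerpt of the record shown above; a raw string keeps the JSON
# \u escapes intact so that the JSON parser decodes them.
RECORD_EXCERPT = r'''
[
  {
    "author": [
      {"familyName": "Brejov\u00e1", "givenName": "Bro\u0148a"},
      {"familyName": "Brown", "givenName": "Daniel G."},
      {"familyName": "Vina\u0159", "givenName": "Tom\u00e1\u0161"}
    ],
    "name": "Optimal Spaced Seeds for Hidden Markov Models, with Application to Homologous Coding Regions",
    "pagination": "42-54",
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/3-540-44888-8_4"]}
    ]
  }
]
'''


def summarize(records):
    """Return a small dict describing the first record in a SciGraph-style list."""
    rec = records[0]
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in rec["author"]]
    # productId is a list of name/value pairs; take the first value named "doi".
    doi = next(v for p in rec["productId"] if p["name"] == "doi" for v in p["value"])
    return {"title": rec["name"], "authors": authors,
            "pages": rec["pagination"], "doi": doi}


summary = summarize(json.loads(RECORD_EXCERPT))
print(summary["authors"])
print(summary["doi"])
```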
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/3-540-44888-8_4'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/3-540-44888-8_4'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/3-540-44888-8_4'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/3-540-44888-8_4'
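
The four curl commands differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. The short format keys and the function names `build_request` and `fetch` are hypothetical conveniences; the URL and media types are the ones listed above. Whether the endpoint still responds is not something this page confirms, so the example only builds the request and leaves the network call to the reader.

```python
import urllib.request

BASE = "https://scigraph.springernature.com/pub.10.1007/3-540-44888-8_4"

# Media types taken from the curl commands above; the keys are made-up labels.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt):
    """Build a GET request that asks for the record in the given RDF format."""
    return urllib.request.Request(BASE, headers={"Accept": ACCEPT[fmt]})


def fetch(fmt, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


req = build_request("turtle")
print(req.full_url, req.get_header("Accept"))
```

Calling `fetch("json-ld")` would correspond to the first curl command.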


 

This table displays all metadata directly associated with this object as RDF triples.

101 TRIPLES      23 PREDICATES      31 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/3-540-44888-8_4 schema:about anzsrc-for:06
2 anzsrc-for:0604
3 schema:author N364215f3a164453e9dc6b39017c765a5
4 schema:citation https://doi.org/10.1093/bioinformatics/18.3.440
5 https://doi.org/10.1093/nar/28.1.45
6 https://doi.org/10.1101/gr.10.8.1115
7 https://doi.org/10.1109/5.18626
8 schema:datePublished 2003
9 schema:datePublishedReg 2003-01-01
10 schema:description We study the problem of computing optimal spaced seeds for detecting sequences generated by a Hidden Markov model. Inspired by recent work in DNA sequence alignment, we have developed such a model for representing the conservation between related DNA coding sequences. Our model includes positional dependencies and periodic rates of conservation, as well as regional deviations in overall conservation rate. We show that, for hidden Markov models in general, the probability that a seed is matched in a region can be computed efficiently, and use these methods to compute the optimal seed for our models. Our experiments on real data show that the optimal seeds are substantially more sensitive than the seeds used in the standard alignment program BLAST, and also substantially better than those of PatternHunter or WABA, both of which use spaced seeds. Our results offer the hope of improved gene finding due to fewer missed exons in DNA/DNA comparison, and more effective homology search in general, and may have applications outside of bioinformatics.
11 schema:editor N0b279428a7884a83acbf700eaead71df
12 schema:genre chapter
13 schema:inLanguage en
14 schema:isAccessibleForFree true
15 schema:isPartOf N306d4e40000548b59993c2a7e44640bb
16 schema:name Optimal Spaced Seeds for Hidden Markov Models, with Application to Homologous Coding Regions
17 schema:pagination 42-54
18 schema:productId N06b6a800338443148505cd2719b0bbf8
19 Nbea3455e3f9246a28ff106f09e8959ab
20 Ne10355ffd91a4868aea3583e3a1ccb19
21 schema:publisher N865e6b2c745d4408b002eabb42d2ddb1
22 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047397326
23 https://doi.org/10.1007/3-540-44888-8_4
24 schema:sdDatePublished 2019-04-15T22:01
25 schema:sdLicense https://scigraph.springernature.com/explorer/license/
26 schema:sdPublisher N654d1548bd1d4ae1986d5f07aa97e3e1
27 schema:url http://link.springer.com/10.1007/3-540-44888-8_4
28 sgo:license sg:explorer/license/
29 sgo:sdDataset chapters
30 rdf:type schema:Chapter
31 N06b6a800338443148505cd2719b0bbf8 schema:name readcube_id
32 schema:value e1c8ffb9dc36713746d07abcbb024515e037809181a64491b1b757b6ba077e2b
33 rdf:type schema:PropertyValue
34 N0b279428a7884a83acbf700eaead71df rdf:first N0d39fde0eb834d51bb0fd92bd389c128
35 rdf:rest N9c83cbcee0d14a5ebe9cb4c01cad5556
36 N0d39fde0eb834d51bb0fd92bd389c128 schema:familyName Baeza-Yates
37 schema:givenName Ricardo
38 rdf:type schema:Person
39 N2d9838209de247d6b672561c816cac8f rdf:first sg:person.0642727740.54
40 rdf:rest N5f5ffaba45584f89aaa1f398fbe1d8c8
41 N306d4e40000548b59993c2a7e44640bb schema:isbn 978-3-540-40311-1
42 978-3-540-44888-4
43 schema:name Combinatorial Pattern Matching
44 rdf:type schema:Book
45 N364215f3a164453e9dc6b39017c765a5 rdf:first sg:person.0642141060.90
46 rdf:rest N2d9838209de247d6b672561c816cac8f
47 N5f5ffaba45584f89aaa1f398fbe1d8c8 rdf:first sg:person.01041305251.67
48 rdf:rest rdf:nil
49 N654d1548bd1d4ae1986d5f07aa97e3e1 schema:name Springer Nature - SN SciGraph project
50 rdf:type schema:Organization
51 N865e6b2c745d4408b002eabb42d2ddb1 schema:location Berlin, Heidelberg
52 schema:name Springer Berlin Heidelberg
53 rdf:type schema:Organisation
54 N93669c2f366348d480f88cc64caab49b schema:familyName Crochemore
55 schema:givenName Maxime
56 rdf:type schema:Person
57 N9c83cbcee0d14a5ebe9cb4c01cad5556 rdf:first Nda2f6efc797a4496950ac0953f31f0e7
58 rdf:rest Na64267cdcc2a4a4398f36832fca094bb
59 Na64267cdcc2a4a4398f36832fca094bb rdf:first N93669c2f366348d480f88cc64caab49b
60 rdf:rest rdf:nil
61 Nbea3455e3f9246a28ff106f09e8959ab schema:name doi
62 schema:value 10.1007/3-540-44888-8_4
63 rdf:type schema:PropertyValue
64 Nda2f6efc797a4496950ac0953f31f0e7 schema:familyName Chávez
65 schema:givenName Edgar
66 rdf:type schema:Person
67 Ne10355ffd91a4868aea3583e3a1ccb19 schema:name dimensions_id
68 schema:value pub.1047397326
69 rdf:type schema:PropertyValue
70 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
71 schema:name Biological Sciences
72 rdf:type schema:DefinedTerm
73 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
74 schema:name Genetics
75 rdf:type schema:DefinedTerm
76 sg:person.01041305251.67 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
77 schema:familyName Vinař
78 schema:givenName Tomáš
79 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01041305251.67
80 rdf:type schema:Person
81 sg:person.0642141060.90 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
82 schema:familyName Brejová
83 schema:givenName Broňa
84 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642141060.90
85 rdf:type schema:Person
86 sg:person.0642727740.54 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
87 schema:familyName Brown
88 schema:givenName Daniel G.
89 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54
90 rdf:type schema:Person
91 https://doi.org/10.1093/bioinformatics/18.3.440 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006017712
92 rdf:type schema:CreativeWork
93 https://doi.org/10.1093/nar/28.1.45 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004742321
94 rdf:type schema:CreativeWork
95 https://doi.org/10.1101/gr.10.8.1115 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052106233
96 rdf:type schema:CreativeWork
97 https://doi.org/10.1109/5.18626 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061178979
98 rdf:type schema:CreativeWork
99 https://www.grid.ac/institutes/grid.46078.3d schema:alternateName University of Waterloo
100 schema:name School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada
101 rdf:type schema:Organization
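
In the table above, a row that shows no subject shares the subject of the nearest row above it that does, and ordered values such as the author list are encoded as chains of blank nodes linked by rdf:first and rdf:rest, ending in rdf:nil. The sketch below copies the six author-list triples from the table and walks the chain to recover the author order. The function name `rdf_list` and the tuple layout are illustrative assumptions.

```python
RDF_NIL = "rdf:nil"

# (subject, predicate, object) triples copied from the table above; the head
# node is the object of schema:author on the chapter.
TRIPLES = [
    ("N364215f3a164453e9dc6b39017c765a5", "rdf:first", "sg:person.0642141060.90"),
    ("N364215f3a164453e9dc6b39017c765a5", "rdf:rest", "N2d9838209de247d6b672561c816cac8f"),
    ("N2d9838209de247d6b672561c816cac8f", "rdf:first", "sg:person.0642727740.54"),
    ("N2d9838209de247d6b672561c816cac8f", "rdf:rest", "N5f5ffaba45584f89aaa1f398fbe1d8c8"),
    ("N5f5ffaba45584f89aaa1f398fbe1d8c8", "rdf:first", "sg:person.01041305251.67"),
    ("N5f5ffaba45584f89aaa1f398fbe1d8c8", "rdf:rest", "rdf:nil"),
]


def rdf_list(head, triples):
    """Follow rdf:first / rdf:rest links from `head` and return the items in order."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items, node = [], head
    while node != RDF_NIL:
        items.append(first[node])  # the list element stored at this node
        node = rest[node]          # move on to the remainder of the list
    return items


authors = rdf_list("N364215f3a164453e9dc6b39017c765a5", TRIPLES)
print(authors)
```

The resulting order (Brejová, Brown, Vinař) agrees with the author array in the JSON-LD record.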
 



