Vector Seeds: An Extension to Spaced Seeds Allows Substantial Improvements in Sensitivity and Specificity


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2003

AUTHORS

Broňa Brejová , Daniel G. Brown , Tomáš Vinař

ABSTRACT

We present improved techniques for finding homologous regions in DNA and protein sequences. Our approach focuses on the core region of a local pairwise alignment; we suggest new ways to characterize these regions that allow marked improvements in both specificity and sensitivity over existing techniques for sequence alignment. For any such characterization, which we call a vector seed, we give an efficient algorithm that estimates the specificity and sensitivity of that seed under reasonable probabilistic models of sequence. We also characterize the probability of a match when an alignment is required to have multiple hits before it is detected. Our extensions fit well with existing approaches to sequence alignment, while still offering substantial improvement in runtime and sensitivity, particularly for the important problem of identifying matches between homologous coding DNA sequences.

PAGES

39-54

Book

TITLE

Algorithms in Bioinformatics

ISBN

978-3-540-20076-5
978-3-540-39763-2

Author Affiliations

School of Computer Science, University of Waterloo, N2L 3G1, Waterloo, ON, Canada

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-540-39763-2_4

DOI

http://dx.doi.org/10.1007/978-3-540-39763-2_4

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1048701396



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "School of Computer Science, University of Waterloo, N2L 3G1, Waterloo, ON, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brejov\u00e1", 
        "givenName": "Bro\u0148a", 
        "id": "sg:person.0642141060.90", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642141060.90"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "School of Computer Science, University of Waterloo, N2L 3G1, Waterloo, ON, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brown", 
        "givenName": "Daniel G.", 
        "id": "sg:person.0642727740.54", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "School of Computer Science, University of Waterloo, N2L 3G1, Waterloo, ON, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Vina\u0159", 
        "givenName": "Tom\u00e1\u0161", 
        "id": "sg:person.01041305251.67", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01041305251.67"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1093/nar/28.1.45", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1004742321"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/18.3.440", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1006017712"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.229202", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1006260064"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.89.22.10915", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010234644"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0022-2836(05)80360-2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013618994"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/640075.640083", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1018184175"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/25.17.3389", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1047265454"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/3-540-44888-8_4", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1047397326", 
          "https://doi.org/10.1007/3-540-44888-8_4"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/17.suppl_1.s140", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1051444796"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2003", 
    "datePublishedReg": "2003-01-01", 
    "description": "We present improved techniques for finding homologous regions in DNA and protein sequences. Our approach focuses on the core region of a local pairwise alignment; we suggest new ways to characterize these regions that allow marked improvements in both specificity and sensitivity over existing techniques for sequence alignment. For any such characterization, which we call a vector seed, we give an efficient algorithm that estimates the specificity and sensitivity of that seed under reasonable probabilistic models of sequence. We also characterize the probability of a match when an alignment is required to have multiple hits before it is detected. Our extensions fit well with existing approaches to sequence alignment, while still offering substantial improvement in runtime and sensitivity, particularly for the important problem of identifying matches between homologous coding DNA sequences.", 
    "editor": [
      {
        "familyName": "Benson", 
        "givenName": "Gary", 
        "type": "Person"
      }, 
      {
        "familyName": "Page", 
        "givenName": "Roderic D. M.", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-540-39763-2_4", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-540-20076-5", 
        "978-3-540-39763-2"
      ], 
      "name": "Algorithms in Bioinformatics", 
      "type": "Book"
    }, 
    "name": "Vector Seeds: An Extension to Spaced Seeds Allows Substantial Improvements in Sensitivity and Specificity", 
    "pagination": "39-54", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1048701396"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-540-39763-2_4"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "e800ec4e414f26b62c8895ba4e882ffcc7c2427ff651e52368efc98cce9bf0e6"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-540-39763-2_4", 
      "https://app.dimensions.ai/details/publication/pub.1048701396"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-16T08:05", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000359_0000000359/records_29219_00000002.jsonl", 
    "type": "Chapter", 
    "url": "https://link.springer.com/10.1007%2F978-3-540-39763-2_4"
  }
]
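
As a rough illustration of how the record above might be consumed, the following Python sketch loads a trimmed copy of it with the standard `json` module and pulls out the title, the author names, and the DOI. The helper name `summarize` and the trimmed `RECORD_JSON` string are illustrative assumptions, not part of SciGraph; only the field names (`author`, `name`, `productId`) are taken from the record itself.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
# A raw string keeps the \uXXXX escapes intact so that json.loads decodes them.
RECORD_JSON = r"""
[
  {
    "author": [
      {"familyName": "Brejov\u00e1", "givenName": "Bro\u0148a"},
      {"familyName": "Brown", "givenName": "Daniel G."},
      {"familyName": "Vina\u0159", "givenName": "Tom\u00e1\u0161"}
    ],
    "name": "Vector Seeds: An Extension to Spaced Seeds Allows Substantial Improvements in Sensitivity and Specificity",
    "pagination": "39-54",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1048701396"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-3-540-39763-2_4"]}
    ]
  }
]
"""


def summarize(record):
    """Return (title, authors, doi) for one record dict (hypothetical helper)."""
    authors = [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]
    # productId is a list of name/value pairs; pick the entry named "doi".
    doi = next(
        (p["value"][0] for p in record.get("productId", []) if p.get("name") == "doi"),
        None,
    )
    return record.get("name"), authors, doi


records = json.loads(RECORD_JSON)  # the top level is a JSON array of records
title, authors, doi = summarize(records[0])
print(title)
print(", ".join(authors))
print(doi)
```

Because JSON-LD is plain JSON, no RDF library is needed for simple lookups like these; a JSON-LD processor would only be required to expand the `@context` into full IRIs.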
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-39763-2_4'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-39763-2_4'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-39763-2_4'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-39763-2_4'
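
The same content negotiation can be sketched in Python with the standard library. This is a minimal sketch, assuming the endpoint still answers as the curl commands describe. The URL and media types are copied from those commands. The short format labels, `build_request`, and `fetch` are hypothetical names introduced here for illustration.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-540-39763-2_4"

# Media types taken from the curl commands above; the dictionary keys are
# just convenient labels chosen for this sketch.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request whose Accept header asks for the given RDF format."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unknown format: {fmt!r}")
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the body as text (requires network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Inspect the request without touching the network.
req = build_request("turtle")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))

# To actually download the record (network required):
#     print(fetch("turtle")[:500])
```

Separating request construction from the network call keeps the header logic easy to check offline, and only the `Accept` value changes between the four formats.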


 

This table displays all metadata directly associated with this object as RDF triples.

112 TRIPLES      23 PREDICATES      36 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-540-39763-2_4 schema:about anzsrc-for:06
2 anzsrc-for:0604
3 schema:author Nb2ed40662dc9458eb93e96d085ff0776
4 schema:citation sg:pub.10.1007/3-540-44888-8_4
5 https://doi.org/10.1016/s0022-2836(05)80360-2
6 https://doi.org/10.1073/pnas.89.22.10915
7 https://doi.org/10.1093/bioinformatics/17.suppl_1.s140
8 https://doi.org/10.1093/bioinformatics/18.3.440
9 https://doi.org/10.1093/nar/25.17.3389
10 https://doi.org/10.1093/nar/28.1.45
11 https://doi.org/10.1101/gr.229202
12 https://doi.org/10.1145/640075.640083
13 schema:datePublished 2003
14 schema:datePublishedReg 2003-01-01
15 schema:description We present improved techniques for finding homologous regions in DNA and protein sequences. Our approach focuses on the core region of a local pairwise alignment; we suggest new ways to characterize these regions that allow marked improvements in both specificity and sensitivity over existing techniques for sequence alignment. For any such characterization, which we call a vector seed, we give an efficient algorithm that estimates the specificity and sensitivity of that seed under reasonable probabilistic models of sequence. We also characterize the probability of a match when an alignment is required to have multiple hits before it is detected. Our extensions fit well with existing approaches to sequence alignment, while still offering substantial improvement in runtime and sensitivity, particularly for the important problem of identifying matches between homologous coding DNA sequences.
16 schema:editor Nebacd680e4664abda91c8cb49d46dd74
17 schema:genre chapter
18 schema:inLanguage en
19 schema:isAccessibleForFree true
20 schema:isPartOf N4883467e1ffc4a92927fdf4d212f06b5
21 schema:name Vector Seeds: An Extension to Spaced Seeds Allows Substantial Improvements in Sensitivity and Specificity
22 schema:pagination 39-54
23 schema:productId N1a2990d1301340088b377a6f0284621d
24 N8c2cb5aa34074b4dac305e8a3b546e40
25 Ndde59a3b4326438aad00131256cbac56
26 schema:publisher N17a79ae28b3f4f74ab32bcf7966967f3
27 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048701396
28 https://doi.org/10.1007/978-3-540-39763-2_4
29 schema:sdDatePublished 2019-04-16T08:05
30 schema:sdLicense https://scigraph.springernature.com/explorer/license/
31 schema:sdPublisher N106386272efa4cd190f11791bce8c7db
32 schema:url https://link.springer.com/10.1007%2F978-3-540-39763-2_4
33 sgo:license sg:explorer/license/
34 sgo:sdDataset chapters
35 rdf:type schema:Chapter
36 N106386272efa4cd190f11791bce8c7db schema:name Springer Nature - SN SciGraph project
37 rdf:type schema:Organization
38 N17a79ae28b3f4f74ab32bcf7966967f3 schema:location Berlin, Heidelberg
39 schema:name Springer Berlin Heidelberg
40 rdf:type schema:Organisation
41 N1a2990d1301340088b377a6f0284621d schema:name doi
42 schema:value 10.1007/978-3-540-39763-2_4
43 rdf:type schema:PropertyValue
44 N4883467e1ffc4a92927fdf4d212f06b5 schema:isbn 978-3-540-20076-5
45 978-3-540-39763-2
46 schema:name Algorithms in Bioinformatics
47 rdf:type schema:Book
48 N50337713c50340ffb16a89ca45e07fcb rdf:first sg:person.01041305251.67
49 rdf:rest rdf:nil
50 N540a02d093814bbd9c4188a7a401ee4d rdf:first Na23ccb34ad404b4ca1f096e20e21cf76
51 rdf:rest rdf:nil
52 N8c2cb5aa34074b4dac305e8a3b546e40 schema:name dimensions_id
53 schema:value pub.1048701396
54 rdf:type schema:PropertyValue
55 N9c1307b4313e47cd893cf43c0ae39a0c rdf:first sg:person.0642727740.54
56 rdf:rest N50337713c50340ffb16a89ca45e07fcb
57 Na23ccb34ad404b4ca1f096e20e21cf76 schema:familyName Page
58 schema:givenName Roderic D. M.
59 rdf:type schema:Person
60 Nb2ed40662dc9458eb93e96d085ff0776 rdf:first sg:person.0642141060.90
61 rdf:rest N9c1307b4313e47cd893cf43c0ae39a0c
62 Nb85dc94d385b4f34b393fbc69688e5a5 schema:familyName Benson
63 schema:givenName Gary
64 rdf:type schema:Person
65 Ndde59a3b4326438aad00131256cbac56 schema:name readcube_id
66 schema:value e800ec4e414f26b62c8895ba4e882ffcc7c2427ff651e52368efc98cce9bf0e6
67 rdf:type schema:PropertyValue
68 Nebacd680e4664abda91c8cb49d46dd74 rdf:first Nb85dc94d385b4f34b393fbc69688e5a5
69 rdf:rest N540a02d093814bbd9c4188a7a401ee4d
70 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
71 schema:name Biological Sciences
72 rdf:type schema:DefinedTerm
73 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
74 schema:name Genetics
75 rdf:type schema:DefinedTerm
76 sg:person.01041305251.67 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
77 schema:familyName Vinař
78 schema:givenName Tomáš
79 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01041305251.67
80 rdf:type schema:Person
81 sg:person.0642141060.90 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
82 schema:familyName Brejová
83 schema:givenName Broňa
84 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642141060.90
85 rdf:type schema:Person
86 sg:person.0642727740.54 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
87 schema:familyName Brown
88 schema:givenName Daniel G.
89 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0642727740.54
90 rdf:type schema:Person
91 sg:pub.10.1007/3-540-44888-8_4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047397326
92 https://doi.org/10.1007/3-540-44888-8_4
93 rdf:type schema:CreativeWork
94 https://doi.org/10.1016/s0022-2836(05)80360-2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013618994
95 rdf:type schema:CreativeWork
96 https://doi.org/10.1073/pnas.89.22.10915 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010234644
97 rdf:type schema:CreativeWork
98 https://doi.org/10.1093/bioinformatics/17.suppl_1.s140 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051444796
99 rdf:type schema:CreativeWork
100 https://doi.org/10.1093/bioinformatics/18.3.440 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006017712
101 rdf:type schema:CreativeWork
102 https://doi.org/10.1093/nar/25.17.3389 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047265454
103 rdf:type schema:CreativeWork
104 https://doi.org/10.1093/nar/28.1.45 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004742321
105 rdf:type schema:CreativeWork
106 https://doi.org/10.1101/gr.229202 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006260064
107 rdf:type schema:CreativeWork
108 https://doi.org/10.1145/640075.640083 schema:sameAs https://app.dimensions.ai/details/publication/pub.1018184175
109 rdf:type schema:CreativeWork
110 https://www.grid.ac/institutes/grid.46078.3d schema:alternateName University of Waterloo
111 schema:name School of Computer Science, University of Waterloo, N2L 3G1, Waterloo, ON, Canada
112 rdf:type schema:Organization
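
Summary counts like the ones shown above the table (triples, predicates, URIs, literals, blank nodes) could be recomputed from N-Triples output along the following lines. This is a simplified sketch rather than a full N-Triples parser. The page does not state how it counts URIs, so counting only distinct subject and object IRIs is an assumption. The sample data uses made-up `example.org` identifiers.

```python
def summarize_ntriples(text):
    """Count triples and distinct predicates, URIs, literals and blank nodes.

    Simplified: assumes one well-formed triple per line, as N-Triples requires.
    """
    triples = 0
    predicates, uris, literals, blanks = set(), set(), set(), set()

    def classify(term):
        if term.startswith("<"):
            uris.add(term)          # IRI, e.g. <http://...>
        elif term.startswith("_:"):
            blanks.add(term)        # blank node label
        else:
            literals.add(term)      # quoted literal, possibly with datatype or language tag

    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                # skip blank lines and comments
        # Subject and predicate never contain spaces; the object (a literal) may.
        subject, predicate, rest = line.split(None, 2)
        obj = rest.rstrip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()  # drop the terminating " ."
        triples += 1
        predicates.add(predicate)
        classify(subject)
        classify(obj)

    return {
        "triples": triples,
        "predicates": len(predicates),
        "uris": len(uris),
        "literals": len(literals),
        "blank_nodes": len(blanks),
    }


# Made-up sample in N-Triples syntax (not the actual record).
SAMPLE = """\
<http://example.org/pub> <http://schema.org/name> "Vector Seeds" .
<http://example.org/pub> <http://schema.org/author> _:b0 .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> <http://example.org/person1> .
<http://example.org/pub> <http://schema.org/pagination> "39-54" .
"""

counts = summarize_ntriples(SAMPLE)
print(counts)
```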
 



