Efficient similarity search in sequence databases View Full Text


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

1993

AUTHORS

Rakesh Agrawal , Christos Faloutsos , Arun Swami

ABSTRACT

We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parseval's theorem, which specifies that the Fourier transform preserves the Euclidean distance in the time or frequency domain. Having thus mapped sequences to a lower-dimensionality space by using only the first few Fourier coefficients, we use R * -trees to index the sequences and efficiently answer similarity queries. We provide experimental results which show that our method is superior to search based on sequential scanning. Our experiments show that a few coefficients (1–3) are adequate to provide good performance. The performance gain of our method increases with the number and length of sequences. More... »

PAGES

69-84

Book

TITLE

Foundations of Data Organization and Algorithms

ISBN

978-3-540-57301-2
978-3-540-48047-1

Author Affiliations

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/3-540-57301-1_5

DOI

http://dx.doi.org/10.1007/3-540-57301-1_5

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1051043966


Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
Incoming Citations Browse incoming citations for this publication using opencitations.net

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "IBM Research - Almaden", 
          "id": "https://www.grid.ac/institutes/grid.481551.c", 
          "name": [
            "IBM Almaden Research Center, 650 Harry Road, 95120-6099\u00a0San Jose, CA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Agrawal", 
        "givenName": "Rakesh", 
        "id": "sg:person.015021155467.25", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015021155467.25"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "IBM Research - Almaden", 
          "id": "https://www.grid.ac/institutes/grid.481551.c", 
          "name": [
            "IBM Almaden Research Center, 650 Harry Road, 95120-6099\u00a0San Jose, CA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Faloutsos", 
        "givenName": "Christos", 
        "id": "sg:person.010155142175.75", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010155142175.75"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "IBM Research - Almaden", 
          "id": "https://www.grid.ac/institutes/grid.481551.c", 
          "name": [
            "IBM Almaden Research Center, 650 Harry Road, 95120-6099\u00a0San Jose, CA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Swami", 
        "givenName": "Arun", 
        "id": "sg:person.012065731731.64", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012065731731.64"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "1993", 
    "datePublishedReg": "1993-01-01", 
    "description": "We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parseval's theorem, which specifies that the Fourier transform preserves the Euclidean distance in the time or frequency domain. Having thus mapped sequences to a lower-dimensionality space by using only the first few Fourier coefficients, we use R * -trees to index the sequences and efficiently answer similarity queries. We provide experimental results which show that our method is superior to search based on sequential scanning. Our experiments show that a few coefficients (1\u20133) are adequate to provide good performance. The performance gain of our method increases with the number and length of sequences.", 
    "editor": [
      {
        "familyName": "Lomet", 
        "givenName": "David B.", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/3-540-57301-1_5", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-540-57301-2", 
        "978-3-540-48047-1"
      ], 
      "name": "Foundations of Data Organization and Algorithms", 
      "type": "Book"
    }, 
    "name": "Efficient similarity search in sequence databases", 
    "pagination": "69-84", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/3-540-57301-1_5"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "f327b9cd98bd30dd9227458415c6ba0ab556f151c97db5f8b4bd5866deecf531"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1051043966"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/3-540-57301-1_5", 
      "https://app.dimensions.ai/details/publication/pub.1051043966"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-15T21:48", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8693_00000087.jsonl", 
    "type": "Chapter", 
    "url": "http://link.springer.com/10.1007/3-540-57301-1_5"
  }
]
 

Download the RDF metadata as:  json-ld nt turtle xml License info

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/3-540-57301-1_5'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/3-540-57301-1_5'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/3-540-57301-1_5'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/3-540-57301-1_5'


 

This table displays all metadata directly associated to this object as RDF triples.

79 TRIPLES      22 PREDICATES      27 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/3-540-57301-1_5 schema:about anzsrc-for:08
2 anzsrc-for:0806
3 schema:author N0862aed0aead466f896178b1cf7f0d89
4 schema:datePublished 1993
5 schema:datePublishedReg 1993-01-01
6 schema:description We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parseval's theorem, which specifies that the Fourier transform preserves the Euclidean distance in the time or frequency domain. Having thus mapped sequences to a lower-dimensionality space by using only the first few Fourier coefficients, we use R * -trees to index the sequences and efficiently answer similarity queries. We provide experimental results which show that our method is superior to search based on sequential scanning. Our experiments show that a few coefficients (1–3) are adequate to provide good performance. The performance gain of our method increases with the number and length of sequences.
7 schema:editor Nf2b8fc799ec34cb482b535f76f163e03
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree true
11 schema:isPartOf N1d19faee3b6840f093a540ab7674e8ad
12 schema:name Efficient similarity search in sequence databases
13 schema:pagination 69-84
14 schema:productId Naa5af2f7da13490f86da418fb9b60815
15 Nb22d81b372964c99a6674daa0f41ff41
16 Nd6003a6e1973424ebe109832667285f2
17 schema:publisher N243199aee97b4d20bae409f771545740
18 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051043966
19 https://doi.org/10.1007/3-540-57301-1_5
20 schema:sdDatePublished 2019-04-15T21:48
21 schema:sdLicense https://scigraph.springernature.com/explorer/license/
22 schema:sdPublisher N9731cf88ed294aa788412aa1a600ae56
23 schema:url http://link.springer.com/10.1007/3-540-57301-1_5
24 sgo:license sg:explorer/license/
25 sgo:sdDataset chapters
26 rdf:type schema:Chapter
27 N0862aed0aead466f896178b1cf7f0d89 rdf:first sg:person.015021155467.25
28 rdf:rest N48f50868201346d3a5dcb42e8dc01051
29 N1d19faee3b6840f093a540ab7674e8ad schema:isbn 978-3-540-48047-1
30 978-3-540-57301-2
31 schema:name Foundations of Data Organization and Algorithms
32 rdf:type schema:Book
33 N243199aee97b4d20bae409f771545740 schema:location Berlin, Heidelberg
34 schema:name Springer Berlin Heidelberg
35 rdf:type schema:Organisation
36 N48f50868201346d3a5dcb42e8dc01051 rdf:first sg:person.010155142175.75
37 rdf:rest Nc2dc6e1118a84b96a400683f02c03711
38 N949fb1ac4af942f9bb0a8c8885f648e7 schema:familyName Lomet
39 schema:givenName David B.
40 rdf:type schema:Person
41 N9731cf88ed294aa788412aa1a600ae56 schema:name Springer Nature - SN SciGraph project
42 rdf:type schema:Organization
43 Naa5af2f7da13490f86da418fb9b60815 schema:name doi
44 schema:value 10.1007/3-540-57301-1_5
45 rdf:type schema:PropertyValue
46 Nb22d81b372964c99a6674daa0f41ff41 schema:name dimensions_id
47 schema:value pub.1051043966
48 rdf:type schema:PropertyValue
49 Nc2dc6e1118a84b96a400683f02c03711 rdf:first sg:person.012065731731.64
50 rdf:rest rdf:nil
51 Nd6003a6e1973424ebe109832667285f2 schema:name readcube_id
52 schema:value f327b9cd98bd30dd9227458415c6ba0ab556f151c97db5f8b4bd5866deecf531
53 rdf:type schema:PropertyValue
54 Nf2b8fc799ec34cb482b535f76f163e03 rdf:first N949fb1ac4af942f9bb0a8c8885f648e7
55 rdf:rest rdf:nil
56 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
57 schema:name Information and Computing Sciences
58 rdf:type schema:DefinedTerm
59 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
60 schema:name Information Systems
61 rdf:type schema:DefinedTerm
62 sg:person.010155142175.75 schema:affiliation https://www.grid.ac/institutes/grid.481551.c
63 schema:familyName Faloutsos
64 schema:givenName Christos
65 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010155142175.75
66 rdf:type schema:Person
67 sg:person.012065731731.64 schema:affiliation https://www.grid.ac/institutes/grid.481551.c
68 schema:familyName Swami
69 schema:givenName Arun
70 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012065731731.64
71 rdf:type schema:Person
72 sg:person.015021155467.25 schema:affiliation https://www.grid.ac/institutes/grid.481551.c
73 schema:familyName Agrawal
74 schema:givenName Rakesh
75 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015021155467.25
76 rdf:type schema:Person
77 https://www.grid.ac/institutes/grid.481551.c schema:alternateName IBM Research - Almaden
78 schema:name IBM Almaden Research Center, 650 Harry Road, 95120-6099 San Jose, CA
79 rdf:type schema:Organization
 




Preview window. Press ESC to close (or click here)


...