Efficient similarity search in sequence databases View Full Text


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

1993

AUTHORS

Rakesh Agrawal , Christos Faloutsos , Arun Swami

ABSTRACT

We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parseval's theorem, which specifies that the Fourier transform preserves the Euclidean distance in the time or frequency domain. Having thus mapped sequences to a lower-dimensionality space by using only the first few Fourier coefficients, we use R * -trees to index the sequences and efficiently answer similarity queries. We provide experimental results which show that our method is superior to search based on sequential scanning. Our experiments show that a few coefficients (1–3) are adequate to provide good performance. The performance gain of our method increases with the number and length of sequences. More... »

PAGES

69-84

Book

TITLE

Foundations of Data Organization and Algorithms

ISBN

978-3-540-57301-2
978-3-540-48047-1

Author Affiliations

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/3-540-57301-1_5

DOI

http://dx.doi.org/10.1007/3-540-57301-1_5

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1051043966


Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
Incoming Citations Browse incoming citations for this publication using opencitations.net

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "IBM Research - Almaden", 
          "id": "https://www.grid.ac/institutes/grid.481551.c", 
          "name": [
            "IBM Almaden Research Center, 650 Harry Road, 95120-6099\u00a0San Jose, CA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Agrawal", 
        "givenName": "Rakesh", 
        "id": "sg:person.015021155467.25", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015021155467.25"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "IBM Research - Almaden", 
          "id": "https://www.grid.ac/institutes/grid.481551.c", 
          "name": [
            "IBM Almaden Research Center, 650 Harry Road, 95120-6099\u00a0San Jose, CA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Faloutsos", 
        "givenName": "Christos", 
        "id": "sg:person.010155142175.75", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010155142175.75"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "IBM Research - Almaden", 
          "id": "https://www.grid.ac/institutes/grid.481551.c", 
          "name": [
            "IBM Almaden Research Center, 650 Harry Road, 95120-6099\u00a0San Jose, CA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Swami", 
        "givenName": "Arun", 
        "id": "sg:person.012065731731.64", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012065731731.64"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "1993", 
    "datePublishedReg": "1993-01-01", 
    "description": "We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parseval's theorem, which specifies that the Fourier transform preserves the Euclidean distance in the time or frequency domain. Having thus mapped sequences to a lower-dimensionality space by using only the first few Fourier coefficients, we use R * -trees to index the sequences and efficiently answer similarity queries. We provide experimental results which show that our method is superior to search based on sequential scanning. Our experiments show that a few coefficients (1\u20133) are adequate to provide good performance. The performance gain of our method increases with the number and length of sequences.", 
    "editor": [
      {
        "familyName": "Lomet", 
        "givenName": "David B.", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/3-540-57301-1_5", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-540-57301-2", 
        "978-3-540-48047-1"
      ], 
      "name": "Foundations of Data Organization and Algorithms", 
      "type": "Book"
    }, 
    "name": "Efficient similarity search in sequence databases", 
    "pagination": "69-84", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/3-540-57301-1_5"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "f327b9cd98bd30dd9227458415c6ba0ab556f151c97db5f8b4bd5866deecf531"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1051043966"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/3-540-57301-1_5", 
      "https://app.dimensions.ai/details/publication/pub.1051043966"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-15T21:48", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8693_00000087.jsonl", 
    "type": "Chapter", 
    "url": "http://link.springer.com/10.1007/3-540-57301-1_5"
  }
]
 

Download the RDF metadata as:  json-ld nt turtle xml License info

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/3-540-57301-1_5'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/3-540-57301-1_5'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/3-540-57301-1_5'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/3-540-57301-1_5'


 

This table displays all metadata directly associated to this object as RDF triples.

79 TRIPLES      22 PREDICATES      27 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/3-540-57301-1_5 schema:about anzsrc-for:08
2 anzsrc-for:0806
3 schema:author N43d84b23a77048638afaac648d924096
4 schema:datePublished 1993
5 schema:datePublishedReg 1993-01-01
6 schema:description We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parseval's theorem, which specifies that the Fourier transform preserves the Euclidean distance in the time or frequency domain. Having thus mapped sequences to a lower-dimensionality space by using only the first few Fourier coefficients, we use R * -trees to index the sequences and efficiently answer similarity queries. We provide experimental results which show that our method is superior to search based on sequential scanning. Our experiments show that a few coefficients (1–3) are adequate to provide good performance. The performance gain of our method increases with the number and length of sequences.
7 schema:editor N48c8a80c10a845549d5458988b23de5c
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree true
11 schema:isPartOf N9a318b09108a4ec5819b816d0814afd6
12 schema:name Efficient similarity search in sequence databases
13 schema:pagination 69-84
14 schema:productId N42272b6d49084b41b6f3fbd2fceed034
15 N577ff50a089a496aac0c348880eb57f1
16 N9e97c310d4984d3f9e82e2e3ed1d005d
17 schema:publisher N3b6774206c494145bdd94a2b6d2a5d14
18 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051043966
19 https://doi.org/10.1007/3-540-57301-1_5
20 schema:sdDatePublished 2019-04-15T21:48
21 schema:sdLicense https://scigraph.springernature.com/explorer/license/
22 schema:sdPublisher Nde65cdcc50f844f487bf0d54611e25cc
23 schema:url http://link.springer.com/10.1007/3-540-57301-1_5
24 sgo:license sg:explorer/license/
25 sgo:sdDataset chapters
26 rdf:type schema:Chapter
27 N3b6774206c494145bdd94a2b6d2a5d14 schema:location Berlin, Heidelberg
28 schema:name Springer Berlin Heidelberg
29 rdf:type schema:Organisation
30 N42272b6d49084b41b6f3fbd2fceed034 schema:name doi
31 schema:value 10.1007/3-540-57301-1_5
32 rdf:type schema:PropertyValue
33 N43d84b23a77048638afaac648d924096 rdf:first sg:person.015021155467.25
34 rdf:rest Ne164039d05d141809fab81e670f0b6ca
35 N48c8a80c10a845549d5458988b23de5c rdf:first Ncbd44aed67aa4078b5e71b492f83f6da
36 rdf:rest rdf:nil
37 N577ff50a089a496aac0c348880eb57f1 schema:name dimensions_id
38 schema:value pub.1051043966
39 rdf:type schema:PropertyValue
40 N9a318b09108a4ec5819b816d0814afd6 schema:isbn 978-3-540-48047-1
41 978-3-540-57301-2
42 schema:name Foundations of Data Organization and Algorithms
43 rdf:type schema:Book
44 N9e97c310d4984d3f9e82e2e3ed1d005d schema:name readcube_id
45 schema:value f327b9cd98bd30dd9227458415c6ba0ab556f151c97db5f8b4bd5866deecf531
46 rdf:type schema:PropertyValue
47 Ncbd44aed67aa4078b5e71b492f83f6da schema:familyName Lomet
48 schema:givenName David B.
49 rdf:type schema:Person
50 Nde65cdcc50f844f487bf0d54611e25cc schema:name Springer Nature - SN SciGraph project
51 rdf:type schema:Organization
52 Ne164039d05d141809fab81e670f0b6ca rdf:first sg:person.010155142175.75
53 rdf:rest Ned19752670aa4796a5c7820e015831c0
54 Ned19752670aa4796a5c7820e015831c0 rdf:first sg:person.012065731731.64
55 rdf:rest rdf:nil
56 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
57 schema:name Information and Computing Sciences
58 rdf:type schema:DefinedTerm
59 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
60 schema:name Information Systems
61 rdf:type schema:DefinedTerm
62 sg:person.010155142175.75 schema:affiliation https://www.grid.ac/institutes/grid.481551.c
63 schema:familyName Faloutsos
64 schema:givenName Christos
65 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010155142175.75
66 rdf:type schema:Person
67 sg:person.012065731731.64 schema:affiliation https://www.grid.ac/institutes/grid.481551.c
68 schema:familyName Swami
69 schema:givenName Arun
70 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012065731731.64
71 rdf:type schema:Person
72 sg:person.015021155467.25 schema:affiliation https://www.grid.ac/institutes/grid.481551.c
73 schema:familyName Agrawal
74 schema:givenName Rakesh
75 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015021155467.25
76 rdf:type schema:Person
77 https://www.grid.ac/institutes/grid.481551.c schema:alternateName IBM Research - Almaden
78 schema:name IBM Almaden Research Center, 650 Harry Road, 95120-6099 San Jose, CA
79 rdf:type schema:Organization
 




Preview window. Press ESC to close (or click here)


...