Dynamic vp-tree indexing for n-nearest neighbor search given pair-wise distances View Full Text


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2000-07

AUTHORS

Ada Wai-chee Fu, Polly Mei-shuen Chan, Yin-Ling Cheung, Yiu Sang Moon

ABSTRACT

Abstract. For some multimedia applications, it has been found that domain objects cannot be represented as feature vectors in a multidimensional space. Instead, pair-wise distances between data objects are the only input. To support content-based retrieval, one approach maps each object to a k-dimensional (k-d) point and tries to preserve the distances among the points. Then, existing spatial access index methods such as the R-trees and KD-trees can support fast searching on the resulting k-d points. However, information loss is inevitable with such an approach since the distances between data objects can only be preserved to a certain extent. Here we investigate the use of a distance-based indexing method. In particular, we apply the vantage point tree (vp-tree) method. There are two important problems for the vp-tree method that warrant further investigation, the n-nearest neighbors search and the updating mechanisms. We study an n-nearest neighbors search algorithm for the vp-tree, which is shown by experiments to scale up well with the size of the dataset and the desired number of nearest neighbors, n. Experiments also show that the searching in the vp-tree is more efficient than that for the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $R^*$\end{document}-tree and the M-tree. Next, we propose solutions for the update problem for the vp-tree, and show by experiments that the algorithms are efficient and effective. Finally, we investigate the problem of selecting vantage-point, propose a few alternative methods, and study their impact on the number of distance computation. More... »

PAGES

154-173

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/pl00010672

DOI

http://dx.doi.org/10.1007/pl00010672

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1025617435


Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
Incoming Citations Browse incoming citations for this publication using opencitations.net

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Chinese University of Hong Kong", 
          "id": "https://www.grid.ac/institutes/grid.10784.3a", 
          "name": [
            "Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong; e-mail: adafu@cse.cuhk.edu.hk, HK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Fu", 
        "givenName": "Ada Wai-chee", 
        "id": "sg:person.010215700627.40", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010215700627.40"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Chinese University of Hong Kong", 
          "id": "https://www.grid.ac/institutes/grid.10784.3a", 
          "name": [
            "Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong; e-mail: adafu@cse.cuhk.edu.hk, HK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Chan", 
        "givenName": "Polly Mei-shuen", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Chinese University of Hong Kong", 
          "id": "https://www.grid.ac/institutes/grid.10784.3a", 
          "name": [
            "Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong; e-mail: adafu@cse.cuhk.edu.hk, HK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Cheung", 
        "givenName": "Yin-Ling", 
        "id": "sg:person.014052614745.20", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014052614745.20"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Chinese University of Hong Kong", 
          "id": "https://www.grid.ac/institutes/grid.10784.3a", 
          "name": [
            "Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong; e-mail: adafu@cse.cuhk.edu.hk, HK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Moon", 
        "givenName": "Yiu Sang", 
        "id": "sg:person.015651165147.54", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015651165147.54"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2000-07", 
    "datePublishedReg": "2000-07-01", 
    "description": " Abstract. For some multimedia applications, it has been found that domain objects cannot be represented as feature vectors in a multidimensional space. Instead, pair-wise distances between data objects are the only input. To support content-based retrieval, one approach maps each object to a k-dimensional (k-d) point and tries to preserve the distances among the points. Then, existing spatial access index methods such as the R-trees and KD-trees can support fast searching on the resulting k-d points. However, information loss is inevitable with such an approach since the distances between data objects can only be preserved to a certain extent. Here we investigate the use of a distance-based indexing method. In particular, we apply the vantage point tree (vp-tree) method. There are two important problems for the vp-tree method that warrant further investigation, the n-nearest neighbors search and the updating mechanisms. We study an n-nearest neighbors search algorithm for the vp-tree, which is shown by experiments to scale up well with the size of the dataset and the desired number of nearest neighbors, n. Experiments also show that the searching in the vp-tree is more efficient than that for the \\documentclass[12pt]{minimal} \\usepackage{amsmath} \\usepackage{wasysym} \\usepackage{amsfonts} \\usepackage{amssymb} \\usepackage{amsbsy} \\usepackage{mathrsfs} \\usepackage{upgreek} \\setlength{\\oddsidemargin}{-69pt} \\begin{document} $R^*$\\end{document}-tree and the M-tree. Next, we propose solutions for the update problem for the vp-tree, and show by experiments that the algorithms are efficient and effective. Finally, we investigate the problem of selecting vantage-point, propose a few alternative methods, and study their impact on the number of distance computation.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1007/pl00010672", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1044889", 
        "issn": [
          "1066-8888", 
          "0949-877X"
        ], 
        "name": "The VLDB Journal", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "2", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "9"
      }
    ], 
    "name": "Dynamic vp-tree indexing for n-nearest neighbor search given pair-wise distances", 
    "pagination": "154-173", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "7a0abcde180a56aa24ed852d891d4d2d6461c537e5b5a0496922558851c45cf0"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/pl00010672"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1025617435"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1007/pl00010672", 
      "https://app.dimensions.ai/details/publication/pub.1025617435"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-11T01:54", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8700_00000488.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1007/PL00010672"
  }
]
 

Download the RDF metadata as:  json-ld nt turtle xml License info

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/pl00010672'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/pl00010672'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/pl00010672'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/pl00010672'


 

This table displays all metadata directly associated to this object as RDF triples.

81 TRIPLES      20 PREDICATES      27 URIs      19 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/pl00010672 schema:about anzsrc-for:08
2 anzsrc-for:0806
3 schema:author N6316c874c7894924b51227466ac356ac
4 schema:datePublished 2000-07
5 schema:datePublishedReg 2000-07-01
6 schema:description Abstract. For some multimedia applications, it has been found that domain objects cannot be represented as feature vectors in a multidimensional space. Instead, pair-wise distances between data objects are the only input. To support content-based retrieval, one approach maps each object to a k-dimensional (k-d) point and tries to preserve the distances among the points. Then, existing spatial access index methods such as the R-trees and KD-trees can support fast searching on the resulting k-d points. However, information loss is inevitable with such an approach since the distances between data objects can only be preserved to a certain extent. Here we investigate the use of a distance-based indexing method. In particular, we apply the vantage point tree (vp-tree) method. There are two important problems for the vp-tree method that warrant further investigation, the n-nearest neighbors search and the updating mechanisms. We study an n-nearest neighbors search algorithm for the vp-tree, which is shown by experiments to scale up well with the size of the dataset and the desired number of nearest neighbors, n. Experiments also show that the searching in the vp-tree is more efficient than that for the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $R^*$\end{document}-tree and the M-tree. Next, we propose solutions for the update problem for the vp-tree, and show by experiments that the algorithms are efficient and effective. Finally, we investigate the problem of selecting vantage-point, propose a few alternative methods, and study their impact on the number of distance computation.
7 schema:genre research_article
8 schema:inLanguage en
9 schema:isAccessibleForFree true
10 schema:isPartOf N3c97cf03652b4e70adedc8b57f7568b3
11 Nc8f34c8f2c6b424c8d74b88bc6540b86
12 sg:journal.1044889
13 schema:name Dynamic vp-tree indexing for n-nearest neighbor search given pair-wise distances
14 schema:pagination 154-173
15 schema:productId N5cd6ddd6c49343cfbb464e6a747c0bde
16 N6d120623f2ff4d23ac7d6ee6f3c9ec50
17 Na84bdd9b8a6340ea8729df65f91bdf8d
18 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025617435
19 https://doi.org/10.1007/pl00010672
20 schema:sdDatePublished 2019-04-11T01:54
21 schema:sdLicense https://scigraph.springernature.com/explorer/license/
22 schema:sdPublisher N0198a0da309047df8670adb2946cb7eb
23 schema:url http://link.springer.com/10.1007/PL00010672
24 sgo:license sg:explorer/license/
25 sgo:sdDataset articles
26 rdf:type schema:ScholarlyArticle
27 N0198a0da309047df8670adb2946cb7eb schema:name Springer Nature - SN SciGraph project
28 rdf:type schema:Organization
29 N338bf282a8e64242bb851712da9630be rdf:first Ne8ac49f939254a26bbfe9c0a86d3b6ae
30 rdf:rest Nde0bf3aa0ba74da0951222ae5ed76032
31 N3c97cf03652b4e70adedc8b57f7568b3 schema:volumeNumber 9
32 rdf:type schema:PublicationVolume
33 N5cd6ddd6c49343cfbb464e6a747c0bde schema:name readcube_id
34 schema:value 7a0abcde180a56aa24ed852d891d4d2d6461c537e5b5a0496922558851c45cf0
35 rdf:type schema:PropertyValue
36 N6316c874c7894924b51227466ac356ac rdf:first sg:person.010215700627.40
37 rdf:rest N338bf282a8e64242bb851712da9630be
38 N6d120623f2ff4d23ac7d6ee6f3c9ec50 schema:name dimensions_id
39 schema:value pub.1025617435
40 rdf:type schema:PropertyValue
41 Na84bdd9b8a6340ea8729df65f91bdf8d schema:name doi
42 schema:value 10.1007/pl00010672
43 rdf:type schema:PropertyValue
44 Nc0896892728649d7bbdd5c0f60b51815 rdf:first sg:person.015651165147.54
45 rdf:rest rdf:nil
46 Nc8f34c8f2c6b424c8d74b88bc6540b86 schema:issueNumber 2
47 rdf:type schema:PublicationIssue
48 Nde0bf3aa0ba74da0951222ae5ed76032 rdf:first sg:person.014052614745.20
49 rdf:rest Nc0896892728649d7bbdd5c0f60b51815
50 Ne8ac49f939254a26bbfe9c0a86d3b6ae schema:affiliation https://www.grid.ac/institutes/grid.10784.3a
51 schema:familyName Chan
52 schema:givenName Polly Mei-shuen
53 rdf:type schema:Person
54 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
55 schema:name Information and Computing Sciences
56 rdf:type schema:DefinedTerm
57 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
58 schema:name Information Systems
59 rdf:type schema:DefinedTerm
60 sg:journal.1044889 schema:issn 0949-877X
61 1066-8888
62 schema:name The VLDB Journal
63 rdf:type schema:Periodical
64 sg:person.010215700627.40 schema:affiliation https://www.grid.ac/institutes/grid.10784.3a
65 schema:familyName Fu
66 schema:givenName Ada Wai-chee
67 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010215700627.40
68 rdf:type schema:Person
69 sg:person.014052614745.20 schema:affiliation https://www.grid.ac/institutes/grid.10784.3a
70 schema:familyName Cheung
71 schema:givenName Yin-Ling
72 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014052614745.20
73 rdf:type schema:Person
74 sg:person.015651165147.54 schema:affiliation https://www.grid.ac/institutes/grid.10784.3a
75 schema:familyName Moon
76 schema:givenName Yiu Sang
77 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015651165147.54
78 rdf:type schema:Person
79 https://www.grid.ac/institutes/grid.10784.3a schema:alternateName Chinese University of Hong Kong
80 schema:name Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong; e-mail: adafu@cse.cuhk.edu.hk, HK
81 rdf:type schema:Organization
 




Preview window. Press ESC to close (or click here)


...