# Dynamic vp-tree indexing for n-nearest neighbor search given pair-wise distances

Ontology type: schema:ScholarlyArticle      Open Access: True

### Article Info

DATE

2000-07

AUTHORS

Ada Wai-chee Fu, Polly Mei-shuen Chan, Yin-Ling Cheung, Yiu Sang Moon

ABSTRACT

Abstract. For some multimedia applications, it has been found that domain objects cannot be represented as feature vectors in a multidimensional space. Instead, pair-wise distances between data objects are the only input. To support content-based retrieval, one approach maps each object to a k-dimensional (k-d) point and tries to preserve the distances among the points. Then, existing spatial access index methods such as the R-trees and KD-trees can support fast searching on the resulting k-d points. However, information loss is inevitable with such an approach since the distances between data objects can only be preserved to a certain extent. Here we investigate the use of a distance-based indexing method. In particular, we apply the vantage point tree (vp-tree) method. There are two important problems for the vp-tree method that warrant further investigation, the n-nearest neighbors search and the updating mechanisms. We study an n-nearest neighbors search algorithm for the vp-tree, which is shown by experiments to scale up well with the size of the dataset and the desired number of nearest neighbors, n. Experiments also show that the searching in the vp-tree is more efficient than that for the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $R^*$\end{document}-tree and the M-tree. Next, we propose solutions for the update problem for the vp-tree, and show by experiments that the algorithms are efficient and effective. Finally, we investigate the problem of selecting vantage-point, propose a few alternative methods, and study their impact on the number of distance computation. More... »

PAGES

154-173

TITLE

The VLDB Journal

ISSUE

2

VOLUME

9

### Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/pl00010672

DOI

http://dx.doi.org/10.1007/pl00010672

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1025617435

Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
Incoming Citations Browse incoming citations for this publication using opencitations.net

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service:

[
{
"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Information Systems",
"type": "DefinedTerm"
},
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Information and Computing Sciences",
"type": "DefinedTerm"
}
],
"author": [
{
"affiliation": {
"alternateName": "Chinese University of Hong Kong",
"id": "https://www.grid.ac/institutes/grid.10784.3a",
"name": [
"Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong; e-mail: adafu@cse.cuhk.edu.hk, HK"
],
"type": "Organization"
},
"familyName": "Fu",
"id": "sg:person.010215700627.40",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010215700627.40"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Chinese University of Hong Kong",
"id": "https://www.grid.ac/institutes/grid.10784.3a",
"name": [
"Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong; e-mail: adafu@cse.cuhk.edu.hk, HK"
],
"type": "Organization"
},
"familyName": "Chan",
"givenName": "Polly Mei-shuen",
"type": "Person"
},
{
"affiliation": {
"alternateName": "Chinese University of Hong Kong",
"id": "https://www.grid.ac/institutes/grid.10784.3a",
"name": [
"Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong; e-mail: adafu@cse.cuhk.edu.hk, HK"
],
"type": "Organization"
},
"familyName": "Cheung",
"givenName": "Yin-Ling",
"id": "sg:person.014052614745.20",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014052614745.20"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Chinese University of Hong Kong",
"id": "https://www.grid.ac/institutes/grid.10784.3a",
"name": [
"Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong; e-mail: adafu@cse.cuhk.edu.hk, HK"
],
"type": "Organization"
},
"familyName": "Moon",
"givenName": "Yiu Sang",
"id": "sg:person.015651165147.54",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015651165147.54"
],
"type": "Person"
}
],
"datePublished": "2000-07",
"datePublishedReg": "2000-07-01",
"description": " Abstract. For some multimedia applications, it has been found that domain objects cannot be represented as feature vectors in a multidimensional space. Instead, pair-wise distances between data objects are the only input. To support content-based retrieval, one approach maps each object to a k-dimensional (k-d) point and tries to preserve the distances among the points. Then, existing spatial access index methods such as the R-trees and KD-trees can support fast searching on the resulting k-d points. However, information loss is inevitable with such an approach since the distances between data objects can only be preserved to a certain extent. Here we investigate the use of a distance-based indexing method. In particular, we apply the vantage point tree (vp-tree) method. There are two important problems for the vp-tree method that warrant further investigation, the n-nearest neighbors search and the updating mechanisms. We study an n-nearest neighbors search algorithm for the vp-tree, which is shown by experiments to scale up well with the size of the dataset and the desired number of nearest neighbors, n. Experiments also show that the searching in the vp-tree is more efficient than that for the \\documentclass[12pt]{minimal} \\usepackage{amsmath} \\usepackage{wasysym} \\usepackage{amsfonts} \\usepackage{amssymb} \\usepackage{amsbsy} \\usepackage{mathrsfs} \\usepackage{upgreek} \\setlength{\\oddsidemargin}{-69pt} \\begin{document} $R^*$\\end{document}-tree and the M-tree. Next, we propose solutions for the update problem for the vp-tree, and show by experiments that the algorithms are efficient and effective. Finally, we investigate the problem of selecting vantage-point, propose a few alternative methods, and study their impact on the number of distance computation.",
"genre": "research_article",
"id": "sg:pub.10.1007/pl00010672",
"inLanguage": [
"en"
],
"isAccessibleForFree": true,
"isPartOf": [
{
"id": "sg:journal.1044889",
"issn": [
"1066-8888",
"0949-877X"
],
"name": "The VLDB Journal",
"type": "Periodical"
},
{
"issueNumber": "2",
"type": "PublicationIssue"
},
{
"type": "PublicationVolume",
}
],
"name": "Dynamic vp-tree indexing for n-nearest neighbor search given pair-wise distances",
"pagination": "154-173",
"productId": [
{
"type": "PropertyValue",
"value": [
"7a0abcde180a56aa24ed852d891d4d2d6461c537e5b5a0496922558851c45cf0"
]
},
{
"name": "doi",
"type": "PropertyValue",
"value": [
"10.1007/pl00010672"
]
},
{
"name": "dimensions_id",
"type": "PropertyValue",
"value": [
"pub.1025617435"
]
}
],
"sameAs": [
"https://doi.org/10.1007/pl00010672",
"https://app.dimensions.ai/details/publication/pub.1025617435"
],
"sdDataset": "articles",
"sdDatePublished": "2019-04-11T01:54",
"sdPublisher": {
"name": "Springer Nature - SN SciGraph project",
"type": "Organization"
},
"sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8700_00000488.jsonl",
"type": "ScholarlyArticle",
}
]

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/pl00010672'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/pl00010672'

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/pl00010672'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/pl00010672'

This table displays all metadata directly associated to this object as RDF triples.

81 TRIPLES      20 PREDICATES      27 URIs      19 LITERALS      7 BLANK NODES

Subject Predicate Object
2 anzsrc-for:0806
3 schema:author N907e47f4b73445059aaaae5b2c1bc480
4 schema:datePublished 2000-07
5 schema:datePublishedReg 2000-07-01
6 schema:description Abstract. For some multimedia applications, it has been found that domain objects cannot be represented as feature vectors in a multidimensional space. Instead, pair-wise distances between data objects are the only input. To support content-based retrieval, one approach maps each object to a k-dimensional (k-d) point and tries to preserve the distances among the points. Then, existing spatial access index methods such as the R-trees and KD-trees can support fast searching on the resulting k-d points. However, information loss is inevitable with such an approach since the distances between data objects can only be preserved to a certain extent. Here we investigate the use of a distance-based indexing method. In particular, we apply the vantage point tree (vp-tree) method. There are two important problems for the vp-tree method that warrant further investigation, the n-nearest neighbors search and the updating mechanisms. We study an n-nearest neighbors search algorithm for the vp-tree, which is shown by experiments to scale up well with the size of the dataset and the desired number of nearest neighbors, n. Experiments also show that the searching in the vp-tree is more efficient than that for the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $R^*$\end{document}-tree and the M-tree. Next, we propose solutions for the update problem for the vp-tree, and show by experiments that the algorithms are efficient and effective. Finally, we investigate the problem of selecting vantage-point, propose a few alternative methods, and study their impact on the number of distance computation.
7 schema:genre research_article
8 schema:inLanguage en
9 schema:isAccessibleForFree true
10 schema:isPartOf N26467d71a2ab4484826374a6a5b2493f
11 N5248aebba7234992876415ed6524f97d
12 sg:journal.1044889
13 schema:name Dynamic vp-tree indexing for n-nearest neighbor search given pair-wise distances
14 schema:pagination 154-173
15 schema:productId N38654e41d9194de3aa40f741c4a54e87
16 N4c2fee44fe044d8ab595577a5541533c
17 N6a9e716ce0734e18a6f1a4a2214800f7
18 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025617435
19 https://doi.org/10.1007/pl00010672
20 schema:sdDatePublished 2019-04-11T01:54
22 schema:sdPublisher N1af7f650a6144c4999b89da546fdb9b4
25 sgo:sdDataset articles
26 rdf:type schema:ScholarlyArticle
27 N1af7f650a6144c4999b89da546fdb9b4 schema:name Springer Nature - SN SciGraph project
28 rdf:type schema:Organization
30 rdf:type schema:PublicationVolume
31 N38654e41d9194de3aa40f741c4a54e87 schema:name dimensions_id
32 schema:value pub.1025617435
33 rdf:type schema:PropertyValue
34 N4c2fee44fe044d8ab595577a5541533c schema:name doi
35 schema:value 10.1007/pl00010672
36 rdf:type schema:PropertyValue
37 N4cbe71eded4b4d54b912f74b8d1b1d43 schema:affiliation https://www.grid.ac/institutes/grid.10784.3a
38 schema:familyName Chan
39 schema:givenName Polly Mei-shuen
40 rdf:type schema:Person
41 N5080f41767fd4d839cff5dd0a235b311 rdf:first sg:person.015651165147.54
42 rdf:rest rdf:nil
43 N5248aebba7234992876415ed6524f97d schema:issueNumber 2
44 rdf:type schema:PublicationIssue
46 schema:value 7a0abcde180a56aa24ed852d891d4d2d6461c537e5b5a0496922558851c45cf0
47 rdf:type schema:PropertyValue
48 N7854044cd5474e5585f02d4bf3b60b7b rdf:first sg:person.014052614745.20
49 rdf:rest N5080f41767fd4d839cff5dd0a235b311
50 N907e47f4b73445059aaaae5b2c1bc480 rdf:first sg:person.010215700627.40
51 rdf:rest Nc0047072dbb644d3ace1f0b69e55cd1d
52 Nc0047072dbb644d3ace1f0b69e55cd1d rdf:first N4cbe71eded4b4d54b912f74b8d1b1d43
53 rdf:rest N7854044cd5474e5585f02d4bf3b60b7b
54 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
55 schema:name Information and Computing Sciences
56 rdf:type schema:DefinedTerm
57 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
58 schema:name Information Systems
59 rdf:type schema:DefinedTerm
60 sg:journal.1044889 schema:issn 0949-877X
61 1066-8888
62 schema:name The VLDB Journal
63 rdf:type schema:Periodical
64 sg:person.010215700627.40 schema:affiliation https://www.grid.ac/institutes/grid.10784.3a
65 schema:familyName Fu
67 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010215700627.40
68 rdf:type schema:Person
69 sg:person.014052614745.20 schema:affiliation https://www.grid.ac/institutes/grid.10784.3a
70 schema:familyName Cheung
71 schema:givenName Yin-Ling
72 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014052614745.20
73 rdf:type schema:Person
74 sg:person.015651165147.54 schema:affiliation https://www.grid.ac/institutes/grid.10784.3a
75 schema:familyName Moon
76 schema:givenName Yiu Sang
77 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015651165147.54
78 rdf:type schema:Person
79 https://www.grid.ac/institutes/grid.10784.3a schema:alternateName Chinese University of Hong Kong
80 schema:name Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong; e-mail: adafu@cse.cuhk.edu.hk, HK
81 rdf:type schema:Organization