Ontology type: schema:Chapter
Open Access: True
2007
AUTHORS: Donald Metzler, Susan Dumais, Christopher Meek
ABSTRACT: Measuring the similarity between documents and queries has been extensively studied in information retrieval. However, there are a growing number of tasks that require computing the similarity between two very short segments of text. These tasks include query reformulation, sponsored search, and image retrieval. Standard text similarity measures perform poorly on such tasks because of data sparseness and the lack of context. In this work, we study this problem from an information retrieval perspective, focusing on text representations and similarity measures. We examine a range of similarity measures, including purely lexical measures, stemming, and language modeling-based measures. We formally evaluate and analyze the methods on a query-query similarity task using 363,822 queries from a web search log. Our analysis provides insights into the strengths and weaknesses of each method, including important tradeoffs between effectiveness and efficiency.
PAGES: 16-27
Advances in Information Retrieval
ISBN
978-3-540-71494-1
978-3-540-71496-5
http://scigraph.springernature.com/pub.10.1007/978-3-540-71496-5_5
DOI: http://dx.doi.org/10.1007/978-3-540-71496-5_5
DIMENSIONS: https://app.dimensions.ai/details/publication/pub.1030086294
JSON-LD is the canonical representation for SciGraph data.
TIP: You can open this SciGraph record using an external JSON-LD service such as the JSON-LD Playground or the Google Structured Data Testing Tool (SDTT).
[
{
"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
"about": [
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Artificial Intelligence and Image Processing",
"type": "DefinedTerm"
},
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Information and Computing Sciences",
"type": "DefinedTerm"
}
],
"author": [
{
"affiliation": {
"name": [
"University of Massachusetts, Amherst, MA,"
],
"type": "Organization"
},
"familyName": "Metzler",
"givenName": "Donald",
"type": "Person"
},
{
"affiliation": {
"alternateName": "Microsoft (United States)",
"id": "https://www.grid.ac/institutes/grid.419815.0",
"name": [
"Microsoft Research, Redmond, WA,"
],
"type": "Organization"
},
"familyName": "Dumais",
"givenName": "Susan",
"id": "sg:person.014627200551.35",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014627200551.35"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Microsoft (United States)",
"id": "https://www.grid.ac/institutes/grid.419815.0",
"name": [
"Microsoft Research, Redmond, WA,"
],
"type": "Organization"
},
"familyName": "Meek",
"givenName": "Christopher",
"id": "sg:person.01352023432.48",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01352023432.48"
],
"type": "Person"
}
],
"citation": [
{
"id": "https://doi.org/10.1145/1099554.1099695",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1002114215"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.3115/1220575.1220661",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1007502057"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1145/312624.312681",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1012032169"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1002/(sici)1097-4571(199009)41:6<391::aid-asi1>3.0.co;2-9",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1012153938"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1145/1135777.1135834",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1014998074"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1145/160688.160718",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1017494025"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1145/383952.384019",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1019714596"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1145/383952.383972",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1019923797"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1145/502585.502654",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1045668804"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1145/1135777.1135835",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1046305355"
],
"type": "CreativeWork"
}
],
"datePublished": "2007",
"datePublishedReg": "2007-01-01",
"description": "Measuring the similarity between documents and queries has been extensively studied in information retrieval. However, there are a growing number of tasks that require computing the similarity between two very short segments of text. These tasks include query reformulation, sponsored search, and image retrieval. Standard text similarity measures perform poorly on such tasks because of data sparseness and the lack of context. In this work, we study this problem from an information retrieval perspective, focusing on text representations and similarity measures. We examine a range of similarity measures, including purely lexical measures, stemming, and language modeling-based measures. We formally evaluate and analyze the methods on a query-query similarity task using 363,822 queries from a web search log. Our analysis provides insights into the strengths and weaknesses of each method, including important tradeoffs between effectiveness and efficiency.",
"editor": [
{
"familyName": "Amati",
"givenName": "Giambattista",
"type": "Person"
},
{
"familyName": "Carpineto",
"givenName": "Claudio",
"type": "Person"
},
{
"familyName": "Romano",
"givenName": "Giovanni",
"type": "Person"
}
],
"genre": "chapter",
"id": "sg:pub.10.1007/978-3-540-71496-5_5",
"inLanguage": [
"en"
],
"isAccessibleForFree": true,
"isPartOf": {
"isbn": [
"978-3-540-71494-1",
"978-3-540-71496-5"
],
"name": "Advances in Information Retrieval",
"type": "Book"
},
"name": "Similarity Measures for Short Segments of Text",
"pagination": "16-27",
"productId": [
{
"name": "doi",
"type": "PropertyValue",
"value": [
"10.1007/978-3-540-71496-5_5"
]
},
{
"name": "readcube_id",
"type": "PropertyValue",
"value": [
"8f5404261a02e4cbbeedb044420f92181f8d100d32dc3015335c37f4f267b435"
]
},
{
"name": "dimensions_id",
"type": "PropertyValue",
"value": [
"pub.1030086294"
]
}
],
"publisher": {
"location": "Berlin, Heidelberg",
"name": "Springer Berlin Heidelberg",
"type": "Organisation"
},
"sameAs": [
"https://doi.org/10.1007/978-3-540-71496-5_5",
"https://app.dimensions.ai/details/publication/pub.1030086294"
],
"sdDataset": "chapters",
"sdDatePublished": "2019-04-15T17:14",
"sdLicense": "https://scigraph.springernature.com/explorer/license/",
"sdPublisher": {
"name": "Springer Nature - SN SciGraph project",
"type": "Organization"
},
"sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8678_00000261.jsonl",
"type": "Chapter",
"url": "http://link.springer.com/10.1007/978-3-540-71496-5_5"
}
]
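As a minimal sketch of how a record like this might be consumed, the Python snippet below reads an abridged copy of the JSON-LD above and pulls out the title, author names, and DOI. The `RECORD` excerpt keeps only the fields it uses, and the `summarize` helper is a hypothetical name introduced here for illustration, not part of SciGraph.

```python
import json

# Abridged excerpt of the record above; only the fields used here are kept.
RECORD = """
[
  {
    "author": [
      {"familyName": "Metzler", "givenName": "Donald", "type": "Person"},
      {"familyName": "Dumais", "givenName": "Susan", "type": "Person"},
      {"familyName": "Meek", "givenName": "Christopher", "type": "Person"}
    ],
    "name": "Similarity Measures for Short Segments of Text",
    "pagination": "16-27",
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-3-540-71496-5_5"]},
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1030086294"]}
    ]
  }
]
"""


def summarize(record_text):
    """Return title, author names, and DOI from a SciGraph-style JSON-LD array.

    Hypothetical helper: assumes the top level is a one-element array, as in
    the record shown above.
    """
    node = json.loads(record_text)[0]
    authors = [f"{a['givenName']} {a['familyName']}" for a in node.get("author", [])]
    # productId is a list of name/value pairs; index it by name for lookup.
    ids = {p["name"]: p["value"][0] for p in node.get("productId", [])}
    return {"title": node.get("name"), "authors": authors, "doi": ids.get("doi")}


summary = summarize(RECORD)
print(summary["title"])
print(", ".join(summary["authors"]))
print(summary["doi"])
```

Because JSON-LD is plain JSON, the standard `json` module is enough here; a full JSON-LD processor would only be needed to expand terms against the `@context`.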
Download the RDF metadata as JSON-LD, N-Triples (nt), Turtle, or RDF/XML:
JSON-LD is a popular format for linked data which is fully compatible with JSON.
curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-71496-5_5'
N-Triples is a line-based linked data format ideal for batch operations.
curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-71496-5_5'
Turtle is a human-readable linked data format.
curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-71496-5_5'
RDF/XML is a standard XML format for linked data.
curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-71496-5_5'
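The four curl commands differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The media types and URL below are copied from the commands above; the `build_request` and `fetch` helpers and the short format keys are assumptions made for illustration, and whether the endpoint still responds is not something this sketch can guarantee.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-540-71496-5_5"

# Media types taken from the curl commands above; the short keys are illustrative.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request asking for one RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=10):
    """Send the request and return the body as text; requires network access."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Inspect the request without touching the network.
req = build_request("turtle")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Calling `fetch("nt")` would then correspond to the N-Triples curl command, assuming the server honours the header.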
The metadata directly associated with this object, expressed as RDF triples, breaks down as follows:
120 TRIPLES
23 PREDICATES
37 URIs
20 LITERALS
8 BLANK NODES
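Counts of this kind can be approximated from an N-Triples download, since each line holds one triple. The sketch below is an assumption-laden illustration: the sample triples use invented `example.org` predicates and a made-up blank-node label, the regular expression covers only the common N-Triples term forms, and the way this page actually derives its own figures (for example, whether predicates count as URIs) is not documented here.

```python
import re

# Hypothetical N-Triples sample: the subject is this record's URL, but the
# example.org predicates and the blank-node label are invented for illustration.
SAMPLE = """\
<http://scigraph.springernature.com/pub.10.1007/978-3-540-71496-5_5> <http://example.org/name> "Similarity Measures for Short Segments of Text" .
<http://scigraph.springernature.com/pub.10.1007/978-3-540-71496-5_5> <http://example.org/pagination> "16-27" .
<http://scigraph.springernature.com/pub.10.1007/978-3-540-71496-5_5> <http://example.org/author> _:b0 .
_:b0 <http://example.org/familyName> "Metzler" .
_:b0 <http://example.org/givenName> "Donald" .
"""

# One N-Triples term: an IRI, a blank node, or a literal with an optional
# datatype or language tag.
TERM = re.compile(
    r'<[^>]*>|_:[A-Za-z0-9_]+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?'
)


def count_terms(ntriples):
    """Count triples plus distinct predicates, URIs, literals, and blank nodes.

    Assumption: "URIs" here means distinct IRIs in subject or object position.
    """
    stats = {"triples": 0, "predicates": set(), "uris": set(),
             "literals": set(), "blank_nodes": set()}
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        terms = TERM.findall(line)
        if len(terms) != 3:
            continue  # skip anything that is not a simple triple
        subject, predicate, obj = terms
        stats["triples"] += 1
        stats["predicates"].add(predicate)
        for term in (subject, obj):
            if term.startswith("<"):
                stats["uris"].add(term)
            elif term.startswith("_:"):
                stats["blank_nodes"].add(term)
            else:
                stats["literals"].add(term)
    return {key: (value if isinstance(value, int) else len(value))
            for key, value in stats.items()}


counts = count_terms(SAMPLE)
for label, value in counts.items():
    print(value, label.upper().replace("_", " "))
```

For production use, a dedicated RDF parser would be a safer choice than a regular expression, which does not validate the input.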