Retrieving Information from a Distributed Heterogeneous Document Collection


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2000-10

AUTHORS

Christoph Baumgarten

ABSTRACT

This paper describes a probabilistic model for optimum information retrieval in a distributed heterogeneous environment. The model assumes the collection of documents offered by the environment to be partitioned into subcollections. Documents as well as subcollections have to be indexed, where indexing methods using different indexing vocabularies can be employed. A query provided by a user is answered in terms of a ranked list of documents. The model determines a procedure for ranking the documents that stems from the Probability Ranking Principle: For each subcollection, the subcollection's documents are ranked; the resulting ranked lists are combined into a final ranked list of documents, where the ordering is determined by the documents' probabilities of being relevant with respect to the user's query. Various probabilistic ranking methods may be involved in the distributed ranking process. A criterion for effectively limiting the ranking process to a subset of subcollections extends the model. The property that different ranking methods and indexing vocabularies can be used is important when the subcollections are heterogeneous with respect to their content. The model's applicability is experimentally confirmed. When exploiting the degrees of freedom provided by the model, experiments showed evidence that the model even outperforms comparable models for the non-distributed case with respect to retrieval effectiveness.
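The merging step described in the abstract can be illustrated with a toy sketch. It assumes each subcollection has already produced a list of (document, probability) pairs sorted by descending probability, and that those probabilities are comparable across subcollections, which is what the paper's model is meant to provide. The function name, document identifiers, and scores are made up for illustration and are not taken from the paper.

```python
import heapq
import itertools
from typing import Dict, List, Tuple

# One ranked list per subcollection: (doc_id, estimated probability of relevance),
# sorted by descending probability. All values below are hypothetical.
Ranked = List[Tuple[str, float]]


def merge_ranked_lists(per_subcollection: Dict[str, Ranked], top_k: int) -> Ranked:
    """Combine per-subcollection rankings into one list ordered by probability.

    Assumes the probabilities are comparable across subcollections; this
    sketch does not estimate or calibrate them.
    """
    # heapq.merge with reverse=True expects every input sorted in descending order.
    merged = heapq.merge(
        *per_subcollection.values(),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return list(itertools.islice(merged, top_k))


rankings = {
    "news": [("n1", 0.9), ("n2", 0.4)],
    "patents": [("p1", 0.7), ("p2", 0.1)],
}
print(merge_ranked_lists(rankings, top_k=3))
# [('n1', 0.9), ('p1', 0.7), ('n2', 0.4)]
```

Because the merge is lazy, only as many entries as requested are pulled from each subcollection's list, which fits the abstract's idea of limiting the work done per subcollection.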

PAGES

253-271

Identifiers

URI

http://scigraph.springernature.com/pub.10.1023/a:1026572910743

DOI

http://dx.doi.org/10.1023/a:1026572910743

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1012647516



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Eurospider Information Technology (Switzerland)", 
          "id": "https://www.grid.ac/institutes/grid.433769.c", 
          "name": [
            "Eurospider Information Technology AG, Zurich, Switzerland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Baumgarten", 
        "givenName": "Christoph", 
        "id": "sg:person.010275346552.41", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010275346552.41"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1108/eb026647", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1000000284"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/312624.312685", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1001915040"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1002/asi.4630270302", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1002146988"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/290941.290976", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1015564001"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/133160.133202", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1026033666"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/314516.314517", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1033293817"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/215206.215328", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1033956857"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1108/eb046814", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1037275209"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/160688.160692", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1042921664"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/258525.258585", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1098972612"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2000-10", 
    "datePublishedReg": "2000-10-01", 
    "description": "This paper describes a probabilistic model for optimum information retrieval in a distributed heterogeneous environment. The model assumes the collection of documents offered by the environment to be partitioned into subcollections. Documents as well as subcollections have to be indexed, where indexing methods using different indexing vocabularies can be employed. A query provided by a user is answered in terms of a ranked list of documents. The model determines a procedure for ranking the documents that stems from the Probability Ranking Principle: For each subcollection, the subcollection's documents are ranked; the resulting ranked lists are combined into a final ranked list of documents, where the ordering is determined by the documents' probabilities of being relevant with respect to the user's query. Various probabilistic ranking methods may be involved in the distributed ranking process. A criterion for effectively limiting the ranking process to a subset of subcollections extends the model. The property that different ranking methods and indexing vocabularies can be used is important when the subcollections are heterogeneous with respect to their content. The model's applicability is experimentally confirmed. When exploiting the degrees of freedom provided by the model, experiments showed evidence that the model even outperforms comparable models for the non-distributed case with respect to retrieval effectiveness.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1023/a:1026572910743", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": [
      {
        "id": "sg:journal.1023664", 
        "issn": [
          "1386-4564", 
          "1573-7659"
        ], 
        "name": "Information Retrieval Journal", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "3", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "3"
      }
    ], 
    "name": "Retrieving Information from a Distributed Heterogeneous Document Collection", 
    "pagination": "253-271", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "3c4157a9b59f6ccfe77f6fbbde0373620f86b3f75693cb16ef7d8dc081de657d"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1023/a:1026572910743"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1012647516"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1023/a:1026572910743", 
      "https://app.dimensions.ai/details/publication/pub.1012647516"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-10T21:41", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8687_00000536.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1023%2FA%3A1026572910743"
  }
]
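As a rough sketch of how a record of this shape might be read, the following uses only Python's standard json module. The embedded sample is a trimmed copy of the record above (two of the ten citations, a subset of the fields), and the summarize helper and its output keys are names invented for this illustration, not part of SciGraph.

```python
import json

# Trimmed copy of the record shown above; only a subset of fields is kept.
SAMPLE = """
[
  {
    "author": [
      {"familyName": "Baumgarten", "givenName": "Christoph", "type": "Person"}
    ],
    "citation": [
      {"id": "https://doi.org/10.1108/eb026647", "type": "CreativeWork"},
      {"id": "https://doi.org/10.1145/312624.312685", "type": "CreativeWork"}
    ],
    "datePublished": "2000-10",
    "isPartOf": [
      {"id": "sg:journal.1023664", "name": "Information Retrieval Journal", "type": "Periodical"},
      {"issueNumber": "3", "type": "PublicationIssue"},
      {"type": "PublicationVolume", "volumeNumber": "3"}
    ],
    "name": "Retrieving Information from a Distributed Heterogeneous Document Collection",
    "pagination": "253-271"
  }
]
"""


def summarize(record: dict) -> dict:
    """Pull a few bibliographic fields out of one record (hypothetical helper)."""
    parts = record.get("isPartOf", [])

    def part_field(type_name: str, field: str):
        # isPartOf mixes journal, issue, and volume entries; select by "type".
        for part in parts:
            if part.get("type") == type_name:
                return part.get(field)
        return None

    return {
        "title": record.get("name"),
        "authors": [
            f'{a.get("givenName", "")} {a.get("familyName", "")}'.strip()
            for a in record.get("author", [])
        ],
        "journal": part_field("Periodical", "name"),
        "volume": part_field("PublicationVolume", "volumeNumber"),
        "issue": part_field("PublicationIssue", "issueNumber"),
        "pages": record.get("pagination"),
        "cited_dois": [
            c["id"].replace("https://doi.org/", "", 1)
            for c in record.get("citation", [])
        ],
    }


# The document is a JSON array holding a single record.
summary = summarize(json.loads(SAMPLE)[0])
print(summary["title"], summary["volume"], summary["issue"], summary["pages"])
```

This treats the record as plain JSON and ignores the "@context"; resolving prefixes such as sg: into full URIs would need a JSON-LD processor.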
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular linked-data format that is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1023/a:1026572910743'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1023/a:1026572910743'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1023/a:1026572910743'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1023/a:1026572910743'
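The four curl calls differ only in the Accept header, so the same content negotiation could be sketched in Python with the standard library. The media types are the ones listed above; the short format labels, the MEDIA_TYPES table, and the build_request and fetch helpers are assumptions made for this sketch, and whether the endpoint still answers these requests has not been checked here.

```python
import urllib.request

# Media types taken from the curl examples above; the short labels are
# arbitrary names chosen for this sketch.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_url: str, fmt: str) -> urllib.request.Request:
    """Prepare a GET request asking for one RDF serialization (no network I/O)."""
    try:
        accept = MEDIA_TYPES[fmt]
    except KeyError:
        raise ValueError(f"unknown format label: {fmt!r}") from None
    return urllib.request.Request(record_url, headers={"Accept": accept})


def fetch(record_url: str, fmt: str, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(build_request(record_url, fmt), timeout=timeout) as response:
        return response.read()


url = "https://scigraph.springernature.com/pub.10.1023/a:1026572910743"
request = build_request(url, "turtle")
print(request.get_header("Accept"))  # text/turtle
```

Keeping request construction separate from the network call makes the header logic easy to check without contacting the server; calling fetch(url, "turtle") would perform the actual download.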


 

This table displays all metadata directly associated with this object as RDF triples.

91 TRIPLES      21 PREDICATES      37 URIs      19 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1023/a:1026572910743 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author N5ad0ee7481c74319a568fb0358d06307
4 schema:citation https://doi.org/10.1002/asi.4630270302
5 https://doi.org/10.1108/eb026647
6 https://doi.org/10.1108/eb046814
7 https://doi.org/10.1145/133160.133202
8 https://doi.org/10.1145/160688.160692
9 https://doi.org/10.1145/215206.215328
10 https://doi.org/10.1145/258525.258585
11 https://doi.org/10.1145/290941.290976
12 https://doi.org/10.1145/312624.312685
13 https://doi.org/10.1145/314516.314517
14 schema:datePublished 2000-10
15 schema:datePublishedReg 2000-10-01
16 schema:description This paper describes a probabilistic model for optimum information retrieval in a distributed heterogeneous environment. The model assumes the collection of documents offered by the environment to be partitioned into subcollections. Documents as well as subcollections have to be indexed, where indexing methods using different indexing vocabularies can be employed. A query provided by a user is answered in terms of a ranked list of documents. The model determines a procedure for ranking the documents that stems from the Probability Ranking Principle: For each subcollection, the subcollection's documents are ranked; the resulting ranked lists are combined into a final ranked list of documents, where the ordering is determined by the documents' probabilities of being relevant with respect to the user's query. Various probabilistic ranking methods may be involved in the distributed ranking process. A criterion for effectively limiting the ranking process to a subset of subcollections extends the model. The property that different ranking methods and indexing vocabularies can be used is important when the subcollections are heterogeneous with respect to their content. The model's applicability is experimentally confirmed. When exploiting the degrees of freedom provided by the model, experiments showed evidence that the model even outperforms comparable models for the non-distributed case with respect to retrieval effectiveness.
17 schema:genre research_article
18 schema:inLanguage en
19 schema:isAccessibleForFree false
20 schema:isPartOf N5ad4a0d840d94bfaabd2729754c5c6c4
21 N71406950dc8f441fafa22505bda21f3d
22 sg:journal.1023664
23 schema:name Retrieving Information from a Distributed Heterogeneous Document Collection
24 schema:pagination 253-271
25 schema:productId N6e3292c13d8a4be0b55404c1917f4d82
26 Nc68ef91281254a6e8f203f15b20e36b2
27 Neac9ed70627e4f318adc8a0b4c21475e
28 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012647516
29 https://doi.org/10.1023/a:1026572910743
30 schema:sdDatePublished 2019-04-10T21:41
31 schema:sdLicense https://scigraph.springernature.com/explorer/license/
32 schema:sdPublisher N1348801b3c5d4a1c8a5c0ccf9df3fa47
33 schema:url http://link.springer.com/10.1023%2FA%3A1026572910743
34 sgo:license sg:explorer/license/
35 sgo:sdDataset articles
36 rdf:type schema:ScholarlyArticle
37 N1348801b3c5d4a1c8a5c0ccf9df3fa47 schema:name Springer Nature - SN SciGraph project
38 rdf:type schema:Organization
39 N5ad0ee7481c74319a568fb0358d06307 rdf:first sg:person.010275346552.41
40 rdf:rest rdf:nil
41 N5ad4a0d840d94bfaabd2729754c5c6c4 schema:issueNumber 3
42 rdf:type schema:PublicationIssue
43 N6e3292c13d8a4be0b55404c1917f4d82 schema:name dimensions_id
44 schema:value pub.1012647516
45 rdf:type schema:PropertyValue
46 N71406950dc8f441fafa22505bda21f3d schema:volumeNumber 3
47 rdf:type schema:PublicationVolume
48 Nc68ef91281254a6e8f203f15b20e36b2 schema:name doi
49 schema:value 10.1023/a:1026572910743
50 rdf:type schema:PropertyValue
51 Neac9ed70627e4f318adc8a0b4c21475e schema:name readcube_id
52 schema:value 3c4157a9b59f6ccfe77f6fbbde0373620f86b3f75693cb16ef7d8dc081de657d
53 rdf:type schema:PropertyValue
54 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
55 schema:name Information and Computing Sciences
56 rdf:type schema:DefinedTerm
57 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
58 schema:name Artificial Intelligence and Image Processing
59 rdf:type schema:DefinedTerm
60 sg:journal.1023664 schema:issn 1386-4564
61 1573-7659
62 schema:name Information Retrieval Journal
63 rdf:type schema:Periodical
64 sg:person.010275346552.41 schema:affiliation https://www.grid.ac/institutes/grid.433769.c
65 schema:familyName Baumgarten
66 schema:givenName Christoph
67 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010275346552.41
68 rdf:type schema:Person
69 https://doi.org/10.1002/asi.4630270302 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002146988
70 rdf:type schema:CreativeWork
71 https://doi.org/10.1108/eb026647 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000000284
72 rdf:type schema:CreativeWork
73 https://doi.org/10.1108/eb046814 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037275209
74 rdf:type schema:CreativeWork
75 https://doi.org/10.1145/133160.133202 schema:sameAs https://app.dimensions.ai/details/publication/pub.1026033666
76 rdf:type schema:CreativeWork
77 https://doi.org/10.1145/160688.160692 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042921664
78 rdf:type schema:CreativeWork
79 https://doi.org/10.1145/215206.215328 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033956857
80 rdf:type schema:CreativeWork
81 https://doi.org/10.1145/258525.258585 schema:sameAs https://app.dimensions.ai/details/publication/pub.1098972612
82 rdf:type schema:CreativeWork
83 https://doi.org/10.1145/290941.290976 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015564001
84 rdf:type schema:CreativeWork
85 https://doi.org/10.1145/312624.312685 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001915040
86 rdf:type schema:CreativeWork
87 https://doi.org/10.1145/314516.314517 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033293817
88 rdf:type schema:CreativeWork
89 https://www.grid.ac/institutes/grid.433769.c schema:alternateName Eurospider Information Technology (Switzerland)
90 schema:name Eurospider Information Technology AG, Zurich, Switzerland
91 rdf:type schema:Organization
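
In the table above, a row that omits the subject or the predicate inherits it from the nearest earlier row that states one. A minimal sketch of that fill-down reading follows, using rows 60 to 63 as sample input. The row layout (None for an omitted cell) and the fill_down name are assumptions of this illustration, not a SciGraph format.

```python
from typing import Iterable, Iterator, Optional, Tuple

# (subject or None, predicate or None, object); None marks a cell that the
# table leaves blank because it repeats the value from an earlier row.
Row = Tuple[Optional[str], Optional[str], str]
Triple = Tuple[str, str, str]

# Rows 60-63 of the table, transcribed by hand into the assumed layout.
ROWS = [
    ("sg:journal.1023664", "schema:issn", "1386-4564"),
    (None, None, "1573-7659"),
    (None, "schema:name", "Information Retrieval Journal"),
    (None, "rdf:type", "schema:Periodical"),
]


def fill_down(rows: Iterable[Row]) -> Iterator[Triple]:
    """Expand abbreviated rows into full (subject, predicate, object) triples."""
    subject: Optional[str] = None
    predicate: Optional[str] = None
    for s, p, o in rows:
        if s is not None:
            subject = s
        if p is not None:
            predicate = p
        if subject is None or predicate is None:
            raise ValueError("first row must state both subject and predicate")
        yield (subject, predicate, o)


for triple in fill_down(ROWS):
    print(triple)
```

Read this way, row 61 becomes the second schema:issn value of sg:journal.1023664, which matches the two ISSNs listed in the JSON-LD record.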
 



