Multilingual Information Retrieval Based on Document Alignment Techniques


Ontology type: schema:Chapter     


Chapter Info

DATE

2002-03-15

AUTHORS

Martin Braschler, Peter Schäuble

ABSTRACT

A multilingual information retrieval method is presented where the user formulates the query in his/her preferred language to retrieve relevant information from a multilingual document collection. This multilingual retrieval method involves mono- and cross-language searches as well as merging their results. We adopt a corpus based approach where documents of different languages are associated if they cover a similar story. The resulting comparable corpus enables two novel techniques we have developed. First, it enables Cross-Language Information Retrieval (CLIR) which does not lack vocabulary coverage as we observed in the case of approaches that are based on automatic Machine Translation (MT). Second, aligned documents of this corpus facilitate to merge the results of mono- and cross-language searches. Using the TREC CLIR data, excellent results are obtained. In addition, our evaluation of the document alignments gives us new insights about the usefulness of comparable corpora.

PAGES

183-197

Book

TITLE

Research and Advanced Technology for Digital Libraries

ISBN

978-3-540-65101-7
978-3-540-49653-3

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/3-540-49653-x_12

DOI

http://dx.doi.org/10.1007/3-540-49653-x_12

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1035398397



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/2004", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Linguistics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/20", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Language, Communication and Culture", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Eurospider Information Technology (Switzerland)", 
          "id": "https://www.grid.ac/institutes/grid.433769.c", 
          "name": [
            "Eurospider Information Technology AG, Schaffhauserstr. 18, CH-8006, Z\u00fcrich, Switzerland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Braschler", 
        "givenName": "Martin", 
        "id": "sg:person.015363630667.99", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015363630667.99"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Swiss Federal Institute of Technology in Zurich", 
          "id": "https://www.grid.ac/institutes/grid.5801.c", 
          "name": [
            "Swiss Federal Institute of Technology (ETH), CH-8092, Z\u00fcrich, Switzerland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Sch\u00e4uble", 
        "givenName": "Peter", 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1145/278459.258540", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1016936818"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-1-4615-6163-7", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1028655623", 
          "https://doi.org/10.1007/978-1-4615-6163-7"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/243199.243213", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1047493755"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/243199.243202", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1048194338"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1108/eb026939", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1048761924"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/258525.258540", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1098972593"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2002-03-15", 
    "datePublishedReg": "2002-03-15", 
    "description": "A multilingual information retrieval method is presented where the user formulates the query in his/her preferred language to retrieve relevant information from a multilingual document collection. This multilingual retrieval method involves mono- and cross-language searches as well as merging their results. We adopt a corpus based approach where documents of different languages are associated if they cover a similar story. The resulting comparable corpus enables two novel techniques we have developed. First, it enables Cross-Language Information Retrieval (CLIR) which does not lack vocabulary coverage as we observed in the case of approaches that are based on automatic Machine Translation (MT). Second, aligned documents of this corpus facilitate to merge the results of mono- and cross-language searches. Using the TREC CLIR data, excellent results are obtained. In addition, our evaluation of the document alignments gives us new insights about the usefulness of comparable corpora.", 
    "editor": [
      {
        "familyName": "Nikolaou", 
        "givenName": "Christos", 
        "type": "Person"
      }, 
      {
        "familyName": "Stephanidis", 
        "givenName": "Constantine", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/3-540-49653-x_12", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-540-65101-7", 
        "978-3-540-49653-3"
      ], 
      "name": "Research and Advanced Technology for Digital Libraries", 
      "type": "Book"
    }, 
    "name": "Multilingual Information Retrieval Based on Document Alignment Techniques", 
    "pagination": "183-197", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/3-540-49653-x_12"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "a89f4106f36faa9e7111cbe8b6aec1c2e7e4c9f60638b9c6b2d78c74ccc54efc"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1035398397"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/3-540-49653-x_12", 
      "https://app.dimensions.ai/details/publication/pub.1035398397"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-16T05:42", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000347_0000000347/records_89789_00000001.jsonl", 
    "type": "Chapter", 
    "url": "https://link.springer.com/10.1007%2F3-540-49653-X_12"
  }
]
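
As a minimal sketch of how a record like the one above might be consumed, the following Python snippet parses a trimmed copy of the JSON-LD and pulls out the title, the page range, and the identifiers. The helper name summarize_record and the trimmed RECORD_JSON string are illustrative assumptions, not part of SciGraph; only field names that appear in the record are used.

```python
import json

# Trimmed copy of the record above; only the fields read below are kept.
RECORD_JSON = """
[
  {
    "name": "Multilingual Information Retrieval Based on Document Alignment Techniques",
    "pagination": "183-197",
    "productId": [
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1007/3-540-49653-x_12"]},
      {"name": "dimensions_id", "type": "PropertyValue",
       "value": ["pub.1035398397"]}
    ],
    "type": "Chapter"
  }
]
"""


def summarize_record(raw: str) -> dict:
    """Return title, page range, and identifiers of the first record."""
    record = json.loads(raw)[0]  # the payload is a one-element list
    # productId is a list of {name, value: [...]} entries; flatten it
    # into a plain name -> first value mapping.
    identifiers = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    first_page, last_page = record["pagination"].split("-")
    return {
        "title": record["name"],
        "pages": (int(first_page), int(last_page)),
        "doi": identifiers.get("doi"),
        "dimensions_id": identifiers.get("dimensions_id"),
    }


summary = summarize_record(RECORD_JSON)
print(summary["title"])
print(summary["doi"], summary["pages"])
```

Flattening productId into a dictionary keeps the lookup independent of the order in which the identifiers are listed.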
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/3-540-49653-x_12'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/3-540-49653-x_12'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/3-540-49653-x_12'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/3-540-49653-x_12'
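
The curl commands above differ only in the Accept header, which is ordinary HTTP content negotiation. A rough Python equivalent, using only the standard library, could look like the sketch below. The short format labels and the build_request helper are illustrative assumptions; the URL and media types are copied from the curl commands. The request is only constructed here, not sent.

```python
from urllib.request import Request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/3-540-49653-x_12"

# Media types taken from the curl commands above; the keys are
# hypothetical short labels chosen for this sketch.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> Request:
    """Build (but do not send) a GET request asking for one RDF format."""
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


req = build_request("turtle")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the actual fetch; whether the endpoint still answers is not something this sketch checks.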


 

This table displays all metadata directly associated with this object as RDF triples. Rows that omit the subject or predicate repeat the value from the preceding row.

98 TRIPLES      23 PREDICATES      32 URIs      19 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/3-540-49653-x_12 schema:about anzsrc-for:20
2 anzsrc-for:2004
3 schema:author N36ca7b1d62434ef79d07569ce9950ea5
4 schema:citation sg:pub.10.1007/978-1-4615-6163-7
5 https://doi.org/10.1108/eb026939
6 https://doi.org/10.1145/243199.243202
7 https://doi.org/10.1145/243199.243213
8 https://doi.org/10.1145/258525.258540
9 https://doi.org/10.1145/278459.258540
10 schema:datePublished 2002-03-15
11 schema:datePublishedReg 2002-03-15
12 schema:description A multilingual information retrieval method is presented where the user formulates the query in his/her preferred language to retrieve relevant information from a multilingual document collection. This multilingual retrieval method involves mono- and cross-language searches as well as merging their results. We adopt a corpus based approach where documents of different languages are associated if they cover a similar story. The resulting comparable corpus enables two novel techniques we have developed. First, it enables Cross-Language Information Retrieval (CLIR) which does not lack vocabulary coverage as we observed in the case of approaches that are based on automatic Machine Translation (MT). Second, aligned documents of this corpus facilitate to merge the results of mono- and cross-language searches. Using the TREC CLIR data, excellent results are obtained. In addition, our evaluation of the document alignments gives us new insights about the usefulness of comparable corpora.
13 schema:editor N4d83a9c3a61c4d608e202c04ecfc48a4
14 schema:genre chapter
15 schema:inLanguage en
16 schema:isAccessibleForFree false
17 schema:isPartOf N4fe6fe6755c7470b87e328ea11e902d5
18 schema:name Multilingual Information Retrieval Based on Document Alignment Techniques
19 schema:pagination 183-197
20 schema:productId N2285e7ccf7154eb6aa7d517ea1554263
21 Nd0368a79281945a5946bb1aa3dec9225
22 Ne00b465f75a749e482c0427666301d3d
23 schema:publisher Naea88eb557b44d4da0f857e454fed6c8
24 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035398397
25 https://doi.org/10.1007/3-540-49653-x_12
26 schema:sdDatePublished 2019-04-16T05:42
27 schema:sdLicense https://scigraph.springernature.com/explorer/license/
28 schema:sdPublisher N15e1bcbfecea424ab36267dfae22c582
29 schema:url https://link.springer.com/10.1007%2F3-540-49653-X_12
30 sgo:license sg:explorer/license/
31 sgo:sdDataset chapters
32 rdf:type schema:Chapter
33 N15e1bcbfecea424ab36267dfae22c582 schema:name Springer Nature - SN SciGraph project
34 rdf:type schema:Organization
35 N2285e7ccf7154eb6aa7d517ea1554263 schema:name dimensions_id
36 schema:value pub.1035398397
37 rdf:type schema:PropertyValue
38 N36ca7b1d62434ef79d07569ce9950ea5 rdf:first sg:person.015363630667.99
39 rdf:rest N3c12302459ea4c409c59c94104f38ca3
40 N3c12302459ea4c409c59c94104f38ca3 rdf:first N64a3f5044f65475fa9be393860e40ea5
41 rdf:rest rdf:nil
42 N4d83a9c3a61c4d608e202c04ecfc48a4 rdf:first Nc49df41338f548bca948c012c316c6ce
43 rdf:rest Ncc271cd7790846869cd52a06af96e8cb
44 N4fe6fe6755c7470b87e328ea11e902d5 schema:isbn 978-3-540-49653-3
45 978-3-540-65101-7
46 schema:name Research and Advanced Technology for Digital Libraries
47 rdf:type schema:Book
48 N64a3f5044f65475fa9be393860e40ea5 schema:affiliation https://www.grid.ac/institutes/grid.5801.c
49 schema:familyName Schäuble
50 schema:givenName Peter
51 rdf:type schema:Person
52 N975c40f6e99446768804e5b61bce9f16 schema:familyName Stephanidis
53 schema:givenName Constantine
54 rdf:type schema:Person
55 Naea88eb557b44d4da0f857e454fed6c8 schema:location Berlin, Heidelberg
56 schema:name Springer Berlin Heidelberg
57 rdf:type schema:Organisation
58 Nc49df41338f548bca948c012c316c6ce schema:familyName Nikolaou
59 schema:givenName Christos
60 rdf:type schema:Person
61 Ncc271cd7790846869cd52a06af96e8cb rdf:first N975c40f6e99446768804e5b61bce9f16
62 rdf:rest rdf:nil
63 Nd0368a79281945a5946bb1aa3dec9225 schema:name doi
64 schema:value 10.1007/3-540-49653-x_12
65 rdf:type schema:PropertyValue
66 Ne00b465f75a749e482c0427666301d3d schema:name readcube_id
67 schema:value a89f4106f36faa9e7111cbe8b6aec1c2e7e4c9f60638b9c6b2d78c74ccc54efc
68 rdf:type schema:PropertyValue
69 anzsrc-for:20 schema:inDefinedTermSet anzsrc-for:
70 schema:name Language, Communication and Culture
71 rdf:type schema:DefinedTerm
72 anzsrc-for:2004 schema:inDefinedTermSet anzsrc-for:
73 schema:name Linguistics
74 rdf:type schema:DefinedTerm
75 sg:person.015363630667.99 schema:affiliation https://www.grid.ac/institutes/grid.433769.c
76 schema:familyName Braschler
77 schema:givenName Martin
78 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015363630667.99
79 rdf:type schema:Person
80 sg:pub.10.1007/978-1-4615-6163-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028655623
81 https://doi.org/10.1007/978-1-4615-6163-7
82 rdf:type schema:CreativeWork
83 https://doi.org/10.1108/eb026939 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048761924
84 rdf:type schema:CreativeWork
85 https://doi.org/10.1145/243199.243202 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048194338
86 rdf:type schema:CreativeWork
87 https://doi.org/10.1145/243199.243213 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047493755
88 rdf:type schema:CreativeWork
89 https://doi.org/10.1145/258525.258540 schema:sameAs https://app.dimensions.ai/details/publication/pub.1098972593
90 rdf:type schema:CreativeWork
91 https://doi.org/10.1145/278459.258540 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016936818
92 rdf:type schema:CreativeWork
93 https://www.grid.ac/institutes/grid.433769.c schema:alternateName Eurospider Information Technology (Switzerland)
94 schema:name Eurospider Information Technology AG, Schaffhauserstr. 18, CH-8006, Zürich, Switzerland
95 rdf:type schema:Organization
96 https://www.grid.ac/institutes/grid.5801.c schema:alternateName Swiss Federal Institute of Technology in Zurich
97 schema:name Swiss Federal Institute of Technology (ETH), CH-8092, Zürich, Switzerland
98 rdf:type schema:Organization
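
The author and editor lists in the table are encoded as RDF collections: each blank node carries an rdf:first item and an rdf:rest pointer to the next node, and the chain ends in rdf:nil. The Python sketch below shows one way such a chain could be unrolled into an ordered list. The TRIPLES list copies rows 3 and 38 to 41 of the table; the function name unroll_list is an illustrative assumption.

```python
# Rows 3 and 38-41 of the table, written out as full triples.
TRIPLES = [
    ("sg:pub.10.1007/3-540-49653-x_12", "schema:author",
     "N36ca7b1d62434ef79d07569ce9950ea5"),
    ("N36ca7b1d62434ef79d07569ce9950ea5", "rdf:first",
     "sg:person.015363630667.99"),
    ("N36ca7b1d62434ef79d07569ce9950ea5", "rdf:rest",
     "N3c12302459ea4c409c59c94104f38ca3"),
    ("N3c12302459ea4c409c59c94104f38ca3", "rdf:first",
     "N64a3f5044f65475fa9be393860e40ea5"),
    ("N3c12302459ea4c409c59c94104f38ca3", "rdf:rest", "rdf:nil"),
]


def unroll_list(head, triples):
    """Follow rdf:first / rdf:rest from head and return the items in order."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items, node, seen = [], head, set()
    while node != "rdf:nil":
        # Guard against cycles and nodes missing either pointer.
        if node in seen or node not in first or node not in rest:
            raise ValueError(f"malformed RDF list at {node!r}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


# The object of schema:author is the head node of the author list.
head = next(o for s, p, o in TRIPLES if p == "schema:author")
authors = unroll_list(head, TRIPLES)
print(authors)
```

The same function applies to the schema:editor chain if its rows are added to the triple list.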
 



