Multilingual Information Retrieval Based on Document Alignment Techniques


Ontology type: schema:Chapter     


Chapter Info

DATE

2002-03-15

AUTHORS

Martin Braschler, Peter Schäuble

ABSTRACT

A multilingual information retrieval method is presented in which the user formulates the query in his/her preferred language to retrieve relevant information from a multilingual document collection. This multilingual retrieval method involves mono- and cross-language searches as well as merging their results. We adopt a corpus-based approach in which documents of different languages are associated if they cover a similar story. The resulting comparable corpus enables two novel techniques we have developed. First, it enables Cross-Language Information Retrieval (CLIR) that does not suffer from the lack of vocabulary coverage we observed in approaches based on automatic Machine Translation (MT). Second, the aligned documents of this corpus facilitate merging the results of mono- and cross-language searches. Using the TREC CLIR data, excellent results are obtained. In addition, our evaluation of the document alignments gives us new insights into the usefulness of comparable corpora.

PAGES

183-197

Book

TITLE

Research and Advanced Technology for Digital Libraries

ISBN

978-3-540-65101-7
978-3-540-49653-3

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/3-540-49653-x_12

DOI

http://dx.doi.org/10.1007/3-540-49653-x_12

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1035398397



JSON-LD is the canonical representation for SciGraph data.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/2004", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Linguistics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/20", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Language, Communication and Culture", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Eurospider Information Technology (Switzerland)", 
          "id": "https://www.grid.ac/institutes/grid.433769.c", 
          "name": [
            "Eurospider Information Technology AG, Schaffhauserstr. 18, CH-8006, Z\u00fcrich, Switzerland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Braschler", 
        "givenName": "Martin", 
        "id": "sg:person.015363630667.99", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015363630667.99"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Swiss Federal Institute of Technology in Zurich", 
          "id": "https://www.grid.ac/institutes/grid.5801.c", 
          "name": [
            "Swiss Federal Institute of Technology (ETH), CH-8092, Z\u00fcrich, Switzerland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Sc\u00e4uble", 
        "givenName": "Peter", 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1145/278459.258540", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1016936818"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-1-4615-6163-7", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1028655623", 
          "https://doi.org/10.1007/978-1-4615-6163-7"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/243199.243213", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1047493755"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/243199.243202", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1048194338"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1108/eb026939", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1048761924"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/258525.258540", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1098972593"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2002-03-15", 
    "datePublishedReg": "2002-03-15", 
    "description": "A multilingual information retrieval method is presented where the user formulates the query in his/her preferred language to retrieve relevant information from a multilingual document collection. This multilingual retrieval method involves mono- and cross-language searches as well as merging their results. We adopt a corpus based approach where documents of different languages are associated if they cover a similar story. The resulting comparable corpus enables two novel techniques we have developed. First, it enables Cross-Language Information Retrieval (CLIR) which does not lack vocabulary coverage as we observed in the case of approaches that are based on automatic Machine Translation (MT). Second, aligned documents of this corpus facilitate to merge the results of mono- and cross-language searches. Using the TREC CLIR data, excellent results are obtained. In addition, our evaluation of the document alignments gives us new insights about the usefulness of comparable corpora.", 
    "editor": [
      {
        "familyName": "Nikolaou", 
        "givenName": "Christos", 
        "type": "Person"
      }, 
      {
        "familyName": "Stephanidis", 
        "givenName": "Constantine", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/3-540-49653-x_12", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-540-65101-7", 
        "978-3-540-49653-3"
      ], 
      "name": "Research and Advanced Technology for Digital Libraries", 
      "type": "Book"
    }, 
    "name": "Multilingual Information Retrieval Based on Document Alignment Techniques", 
    "pagination": "183-197", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/3-540-49653-x_12"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "a89f4106f36faa9e7111cbe8b6aec1c2e7e4c9f60638b9c6b2d78c74ccc54efc"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1035398397"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/3-540-49653-x_12", 
      "https://app.dimensions.ai/details/publication/pub.1035398397"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-16T05:42", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000347_0000000347/records_89789_00000001.jsonl", 
    "type": "Chapter", 
    "url": "https://link.springer.com/10.1007%2F3-540-49653-X_12"
  }
]
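As a minimal sketch of how a record like the one above might be read, the following Python snippet pulls the title, authors, DOI and page range out of the JSON-LD. The sample string is an abbreviated copy of the record (only the fields used are kept), and the function name `summarize` is a hypothetical helper, not part of any SciGraph tooling.

```python
import json

# Abbreviated copy of the record above; only the fields used below are kept.
RECORD = """
[
  {
    "author": [
      {"familyName": "Braschler", "givenName": "Martin", "type": "Person"},
      {"givenName": "Peter", "type": "Person"}
    ],
    "name": "Multilingual Information Retrieval Based on Document Alignment Techniques",
    "pagination": "183-197",
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/3-540-49653-x_12"]},
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1035398397"]}
    ]
  }
]
"""


def summarize(record_text):
    """Return title, author names, DOI and pages from a JSON-LD string (hypothetical helper)."""
    record = json.loads(record_text)[0]  # the document is a one-element list
    # productId is a list of name/value pairs; each value is itself a list.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    # Join whichever name parts are present, given name first.
    authors = [
        " ".join(part for part in (a.get("givenName"), a.get("familyName")) if part)
        for a in record.get("author", [])
    ]
    return {
        "title": record["name"],
        "authors": authors,
        "doi": ids.get("doi"),
        "pages": record.get("pagination"),
    }


summary = summarize(RECORD)
print(summary["title"])
print(summary["authors"], summary["doi"], summary["pages"])
```

Because the sketch treats the record as plain JSON, it ignores the `@context`; a full JSON-LD processor would be needed to expand the keys into IRIs.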

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/3-540-49653-x_12'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/3-540-49653-x_12'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/3-540-49653-x_12'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/3-540-49653-x_12'
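The four curl commands differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The format keys (`json-ld`, `n-triples`, `turtle`, `rdf-xml`) and the helper names `build_request` and `fetch` are assumptions made for this illustration; the media types and URL are taken from the commands above. Whether the endpoint still answers is not something this page confirms, so the sketch only builds the request and leaves sending it to the caller.

```python
import urllib.request

BASE = "https://scigraph.springernature.com/"

# Format name -> media type, taken from the curl commands above.
# The dictionary keys are illustrative labels, not an official vocabulary.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(record_id, fmt="json-ld"):
    """Build (but do not send) a GET request asking for one serialization."""
    return urllib.request.Request(
        BASE + record_id, headers={"Accept": MEDIA_TYPES[fmt]}
    )


def fetch(record_id, fmt="json-ld", timeout=10):
    """Send the request and return the body as text (requires network access)."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


req = build_request("pub.10.1007/3-540-49653-x_12", "turtle")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Calling `fetch("pub.10.1007/3-540-49653-x_12", "n-triples")` would correspond to the second curl command.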


 

This table displays all metadata directly associated with this object as RDF triples.

98 TRIPLES      23 PREDICATES      32 URIs      19 LITERALS      8 BLANK NODES

Row Subject Predicate Object (a blank subject or predicate repeats the value from the row above)
1 sg:pub.10.1007/3-540-49653-x_12 schema:about anzsrc-for:20
2 anzsrc-for:2004
3 schema:author Neab69ead472e41fe9df648d726789874
4 schema:citation sg:pub.10.1007/978-1-4615-6163-7
5 https://doi.org/10.1108/eb026939
6 https://doi.org/10.1145/243199.243202
7 https://doi.org/10.1145/243199.243213
8 https://doi.org/10.1145/258525.258540
9 https://doi.org/10.1145/278459.258540
10 schema:datePublished 2002-03-15
11 schema:datePublishedReg 2002-03-15
12 schema:description A multilingual information retrieval method is presented where the user formulates the query in his/her preferred language to retrieve relevant information from a multilingual document collection. This multilingual retrieval method involves mono- and cross-language searches as well as merging their results. We adopt a corpus based approach where documents of different languages are associated if they cover a similar story. The resulting comparable corpus enables two novel techniques we have developed. First, it enables Cross-Language Information Retrieval (CLIR) which does not lack vocabulary coverage as we observed in the case of approaches that are based on automatic Machine Translation (MT). Second, aligned documents of this corpus facilitate to merge the results of mono- and cross-language searches. Using the TREC CLIR data, excellent results are obtained. In addition, our evaluation of the document alignments gives us new insights about the usefulness of comparable corpora.
13 schema:editor N593af93f3f6642889c91658c53c48529
14 schema:genre chapter
15 schema:inLanguage en
16 schema:isAccessibleForFree false
17 schema:isPartOf N12cdd9afcb244f4998e7af1013ee5393
18 schema:name Multilingual Information Retrieval Based on Document Alignment Techniques
19 schema:pagination 183-197
20 schema:productId N2ba66c843fb141998248c5160b6201d8
21 N6418d5fe242849149bd4c62b80a967d3
22 N8f44cf70abf2475985a128ed1bd45b67
23 schema:publisher N90428ddbedf64931bf92563fde74f047
24 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035398397
25 https://doi.org/10.1007/3-540-49653-x_12
26 schema:sdDatePublished 2019-04-16T05:42
27 schema:sdLicense https://scigraph.springernature.com/explorer/license/
28 schema:sdPublisher N0e6dd388845a43faac71274b0308d9ca
29 schema:url https://link.springer.com/10.1007%2F3-540-49653-X_12
30 sgo:license sg:explorer/license/
31 sgo:sdDataset chapters
32 rdf:type schema:Chapter
33 N08635fd037a54ec2900fda7ed9999a87 schema:affiliation https://www.grid.ac/institutes/grid.5801.c
34 schema:familyName Scäuble
35 schema:givenName Peter
36 rdf:type schema:Person
37 N0e6dd388845a43faac71274b0308d9ca schema:name Springer Nature - SN SciGraph project
38 rdf:type schema:Organization
39 N12cdd9afcb244f4998e7af1013ee5393 schema:isbn 978-3-540-49653-3
40 978-3-540-65101-7
41 schema:name Research and Advanced Technology for Digital Libraries
42 rdf:type schema:Book
43 N204683a67b634db0a2f5581df8a629de schema:familyName Stephanidis
44 schema:givenName Constantine
45 rdf:type schema:Person
46 N2ba66c843fb141998248c5160b6201d8 schema:name dimensions_id
47 schema:value pub.1035398397
48 rdf:type schema:PropertyValue
49 N4b7075cc9f7143d2bd8e5bdf70574870 rdf:first N08635fd037a54ec2900fda7ed9999a87
50 rdf:rest rdf:nil
51 N593af93f3f6642889c91658c53c48529 rdf:first N6a565bda63504d198f61dc3770471bb5
52 rdf:rest Nc7d8525e69a6415f905f4b05e77ac0c7
53 N6418d5fe242849149bd4c62b80a967d3 schema:name doi
54 schema:value 10.1007/3-540-49653-x_12
55 rdf:type schema:PropertyValue
56 N6a565bda63504d198f61dc3770471bb5 schema:familyName Nikolaou
57 schema:givenName Christos
58 rdf:type schema:Person
59 N8f44cf70abf2475985a128ed1bd45b67 schema:name readcube_id
60 schema:value a89f4106f36faa9e7111cbe8b6aec1c2e7e4c9f60638b9c6b2d78c74ccc54efc
61 rdf:type schema:PropertyValue
62 N90428ddbedf64931bf92563fde74f047 schema:location Berlin, Heidelberg
63 schema:name Springer Berlin Heidelberg
64 rdf:type schema:Organisation
65 Nc7d8525e69a6415f905f4b05e77ac0c7 rdf:first N204683a67b634db0a2f5581df8a629de
66 rdf:rest rdf:nil
67 Neab69ead472e41fe9df648d726789874 rdf:first sg:person.015363630667.99
68 rdf:rest N4b7075cc9f7143d2bd8e5bdf70574870
69 anzsrc-for:20 schema:inDefinedTermSet anzsrc-for:
70 schema:name Language, Communication and Culture
71 rdf:type schema:DefinedTerm
72 anzsrc-for:2004 schema:inDefinedTermSet anzsrc-for:
73 schema:name Linguistics
74 rdf:type schema:DefinedTerm
75 sg:person.015363630667.99 schema:affiliation https://www.grid.ac/institutes/grid.433769.c
76 schema:familyName Braschler
77 schema:givenName Martin
78 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015363630667.99
79 rdf:type schema:Person
80 sg:pub.10.1007/978-1-4615-6163-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028655623
81 https://doi.org/10.1007/978-1-4615-6163-7
82 rdf:type schema:CreativeWork
83 https://doi.org/10.1108/eb026939 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048761924
84 rdf:type schema:CreativeWork
85 https://doi.org/10.1145/243199.243202 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048194338
86 rdf:type schema:CreativeWork
87 https://doi.org/10.1145/243199.243213 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047493755
88 rdf:type schema:CreativeWork
89 https://doi.org/10.1145/258525.258540 schema:sameAs https://app.dimensions.ai/details/publication/pub.1098972593
90 rdf:type schema:CreativeWork
91 https://doi.org/10.1145/278459.258540 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016936818
92 rdf:type schema:CreativeWork
93 https://www.grid.ac/institutes/grid.433769.c schema:alternateName Eurospider Information Technology (Switzerland)
94 schema:name Eurospider Information Technology AG, Schaffhauserstr. 18, CH-8006, Zürich, Switzerland
95 rdf:type schema:Organization
96 https://www.grid.ac/institutes/grid.5801.c schema:alternateName Swiss Federal Institute of Technology in Zurich
97 schema:name Swiss Federal Institute of Technology (ETH), CH-8092, Zürich, Switzerland
98 rdf:type schema:Organization
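In the table, ordered values such as the author list are stored as RDF collections: each blank node carries an `rdf:first` item and an `rdf:rest` pointer to the next node, ending at `rdf:nil`. The Python sketch below walks that chain for `schema:author`, using a handful of triples copied from the table (rows 3, 35, 49-50, 67-68 and 77). The helper names `index` and `walk_list` are hypothetical, and the dictionary index assumes each (subject, predicate) pair occurs once, which holds for this subset but not for the whole table (for example, `schema:productId` has three objects).

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

# A subset of the triples from the table above, with subjects written out in full.
TRIPLES = [
    ("sg:pub.10.1007/3-540-49653-x_12", "schema:author", "Neab69ead472e41fe9df648d726789874"),
    ("Neab69ead472e41fe9df648d726789874", RDF_FIRST, "sg:person.015363630667.99"),
    ("Neab69ead472e41fe9df648d726789874", RDF_REST, "N4b7075cc9f7143d2bd8e5bdf70574870"),
    ("N4b7075cc9f7143d2bd8e5bdf70574870", RDF_FIRST, "N08635fd037a54ec2900fda7ed9999a87"),
    ("N4b7075cc9f7143d2bd8e5bdf70574870", RDF_REST, RDF_NIL),
    ("sg:person.015363630667.99", "schema:givenName", "Martin"),
    ("N08635fd037a54ec2900fda7ed9999a87", "schema:givenName", "Peter"),
]


def index(triples):
    """Map (subject, predicate) -> object; valid only while each pair is unique."""
    return {(s, p): o for s, p, o in triples}


def walk_list(head, lookup):
    """Follow rdf:first / rdf:rest links from head until rdf:nil, keeping order."""
    items = []
    node = head
    while node != RDF_NIL:
        items.append(lookup[(node, RDF_FIRST)])
        node = lookup[(node, RDF_REST)]
    return items


lookup = index(TRIPLES)
head = lookup[("sg:pub.10.1007/3-540-49653-x_12", "schema:author")]
authors = walk_list(head, lookup)
names = [lookup[(a, "schema:givenName")] for a in authors]
print(names)
```

The same walk applies to the `schema:editor` list, whose head node appears in row 13.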