Multilingual Information Retrieval Based on Document Alignment Techniques


Ontology type: schema:Chapter     


Chapter Info

DATE

2002-03-15

AUTHORS

Martin Braschler, Peter Schäuble

ABSTRACT

A multilingual information retrieval method is presented in which the user formulates the query in his/her preferred language to retrieve relevant information from a multilingual document collection. This multilingual retrieval method involves mono- and cross-language searches as well as merging their results. We adopt a corpus-based approach where documents of different languages are associated if they cover a similar story. The resulting comparable corpus enables two novel techniques we have developed. First, it enables Cross-Language Information Retrieval (CLIR) that does not suffer from the lack of vocabulary coverage we observed in approaches based on automatic Machine Translation (MT). Second, the aligned documents of this corpus facilitate merging the results of mono- and cross-language searches. Using the TREC CLIR data, excellent results are obtained. In addition, our evaluation of the document alignments gives us new insights about the usefulness of comparable corpora.

PAGES

183-197

Book

TITLE

Research and Advanced Technology for Digital Libraries

ISBN

978-3-540-65101-7
978-3-540-49653-3

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/3-540-49653-x_12

DOI

http://dx.doi.org/10.1007/3-540-49653-x_12

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1035398397



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/2004", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Linguistics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/20", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Language, Communication and Culture", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Eurospider Information Technology (Switzerland)", 
          "id": "https://www.grid.ac/institutes/grid.433769.c", 
          "name": [
            "Eurospider Information Technology AG, Schaffhauserstr. 18, CH-8006, Z\u00fcrich, Switzerland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Braschler", 
        "givenName": "Martin", 
        "id": "sg:person.015363630667.99", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015363630667.99"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Swiss Federal Institute of Technology in Zurich", 
          "id": "https://www.grid.ac/institutes/grid.5801.c", 
          "name": [
            "Swiss Federal Institute of Technology (ETH), CH-8092, Z\u00fcrich, Switzerland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Sc\u00e4uble", 
        "givenName": "Peter", 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1145/278459.258540", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1016936818"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-1-4615-6163-7", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1028655623", 
          "https://doi.org/10.1007/978-1-4615-6163-7"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/243199.243213", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1047493755"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/243199.243202", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1048194338"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1108/eb026939", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1048761924"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/258525.258540", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1098972593"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2002-03-15", 
    "datePublishedReg": "2002-03-15", 
    "description": "A multilingual information retrieval method is presented where the user formulates the query in his/her preferred language to retrieve relevant information from a multilingual document collection. This multilingual retrieval method involves mono- and cross-language searches as well as merging their results. We adopt a corpus based approach where documents of different languages are associated if they cover a similar story. The resulting comparable corpus enables two novel techniques we have developed. First, it enables Cross-Language Information Retrieval (CLIR) which does not lack vocabulary coverage as we observed in the case of approaches that are based on automatic Machine Translation (MT). Second, aligned documents of this corpus facilitate to merge the results of mono- and cross-language searches. Using the TREC CLIR data, excellent results are obtained. In addition, our evaluation of the document alignments gives us new insights about the usefulness of comparable corpora.", 
    "editor": [
      {
        "familyName": "Nikolaou", 
        "givenName": "Christos", 
        "type": "Person"
      }, 
      {
        "familyName": "Stephanidis", 
        "givenName": "Constantine", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/3-540-49653-x_12", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-540-65101-7", 
        "978-3-540-49653-3"
      ], 
      "name": "Research and Advanced Technology for Digital Libraries", 
      "type": "Book"
    }, 
    "name": "Multilingual Information Retrieval Based on Document Alignment Techniques", 
    "pagination": "183-197", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/3-540-49653-x_12"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "a89f4106f36faa9e7111cbe8b6aec1c2e7e4c9f60638b9c6b2d78c74ccc54efc"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1035398397"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/3-540-49653-x_12", 
      "https://app.dimensions.ai/details/publication/pub.1035398397"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-16T05:42", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000347_0000000347/records_89789_00000001.jsonl", 
    "type": "Chapter", 
    "url": "https://link.springer.com/10.1007%2F3-540-49653-X_12"
  }
]
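
As a rough illustration of how a record like the one above can be consumed, the following Python sketch pulls the title, authors, page range and DOI out of the JSON-LD. The `RECORD` string is a trimmed copy of the record (only the fields used here, with values as recorded), and the helper name `summarize` is hypothetical, not part of any SciGraph tooling.

```python
import json

# Trimmed copy of the JSON-LD record above; values are kept as recorded.
# A raw string is used so that the \u00e4 escape is decoded by the JSON parser.
RECORD = r"""
[
  {
    "author": [
      {"familyName": "Braschler", "givenName": "Martin", "type": "Person"},
      {"familyName": "Sc\u00e4uble", "givenName": "Peter", "type": "Person"}
    ],
    "name": "Multilingual Information Retrieval Based on Document Alignment Techniques",
    "pagination": "183-197",
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/3-540-49653-x_12"]},
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1035398397"]}
    ]
  }
]
"""


def summarize(record_json: str) -> dict:
    """Return a small citation-style summary of the first record in the array."""
    record = json.loads(record_json)[0]
    # productId is a list of name/value pairs; index it by name for easy lookup.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]
    return {
        "title": record["name"],
        "authors": authors,
        "pages": record.get("pagination"),
        "doi": ids.get("doi"),
    }


if __name__ == "__main__":
    print(summarize(RECORD))
```

The sketch treats the record as plain JSON, which works because JSON-LD is valid JSON; resolving the `@context` would require a dedicated JSON-LD processor.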
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/3-540-49653-x_12'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/3-540-49653-x_12'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/3-540-49653-x_12'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/3-540-49653-x_12'
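
The same content negotiation can be sketched in Python with the standard library alone. The media types and the base URL are taken from the curl commands above; the names `MEDIA_TYPES`, `build_request` and `fetch` are hypothetical, and the sketch assumes the endpoint is still reachable and returns UTF-8 text.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/"

# Accept header values, copied from the curl commands above.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(record_id: str, fmt: str) -> urllib.request.Request:
    """Build a GET request that asks for the record in the given RDF format."""
    return urllib.request.Request(
        BASE_URL + record_id,
        headers={"Accept": MEDIA_TYPES[fmt]},
    )


def fetch(record_id: str, fmt: str = "json-ld", timeout: float = 10.0) -> str:
    """Send the request and return the body as text (assumes a UTF-8 response)."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Requires network access; may fail if the service is unavailable.
    print(fetch("pub.10.1007/3-540-49653-x_12", "turtle")[:500])
```

Keeping request construction separate from the network call makes it possible to check the headers without contacting the server.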


 

This table displays all metadata directly associated with this object as RDF triples.

98 TRIPLES      23 PREDICATES      32 URIs      19 LITERALS      8 BLANK NODES

| # | Subject | Predicate | Object |
|---|---------|-----------|--------|
| 1 | sg:pub.10.1007/3-540-49653-x_12 | schema:about | anzsrc-for:20 |
| 2 | | | anzsrc-for:2004 |
| 3 | | schema:author | N37c7512346fe448a924d23b2dcdb8694 |
| 4 | | schema:citation | sg:pub.10.1007/978-1-4615-6163-7 |
| 5 | | | https://doi.org/10.1108/eb026939 |
| 6 | | | https://doi.org/10.1145/243199.243202 |
| 7 | | | https://doi.org/10.1145/243199.243213 |
| 8 | | | https://doi.org/10.1145/258525.258540 |
| 9 | | | https://doi.org/10.1145/278459.258540 |
| 10 | | schema:datePublished | 2002-03-15 |
| 11 | | schema:datePublishedReg | 2002-03-15 |
| 12 | | schema:description | A multilingual information retrieval method is presented where the user formulates the query in his/her preferred language to retrieve relevant information from a multilingual document collection. This multilingual retrieval method involves mono- and cross-language searches as well as merging their results. We adopt a corpus based approach where documents of different languages are associated if they cover a similar story. The resulting comparable corpus enables two novel techniques we have developed. First, it enables Cross-Language Information Retrieval (CLIR) which does not lack vocabulary coverage as we observed in the case of approaches that are based on automatic Machine Translation (MT). Second, aligned documents of this corpus facilitate to merge the results of mono- and cross-language searches. Using the TREC CLIR data, excellent results are obtained. In addition, our evaluation of the document alignments gives us new insights about the usefulness of comparable corpora. |
| 13 | | schema:editor | N9a8019baa16f45fcbce824435b068418 |
| 14 | | schema:genre | chapter |
| 15 | | schema:inLanguage | en |
| 16 | | schema:isAccessibleForFree | false |
| 17 | | schema:isPartOf | N663469154e6e4d2db9fbd2407212774c |
| 18 | | schema:name | Multilingual Information Retrieval Based on Document Alignment Techniques |
| 19 | | schema:pagination | 183-197 |
| 20 | | schema:productId | N3bdf8a871e984dac99958bb18c0dd4f9 |
| 21 | | | N5b201db2fd154eb3855b7a92cf756efb |
| 22 | | | N801f6bbcb0f84603a87614423079dbd2 |
| 23 | | schema:publisher | Nfa967ce2eb97484bbea99b415b7c0531 |
| 24 | | schema:sameAs | https://app.dimensions.ai/details/publication/pub.1035398397 |
| 25 | | | https://doi.org/10.1007/3-540-49653-x_12 |
| 26 | | schema:sdDatePublished | 2019-04-16T05:42 |
| 27 | | schema:sdLicense | https://scigraph.springernature.com/explorer/license/ |
| 28 | | schema:sdPublisher | N0561b26dd98f4fafaede6f967f9f9434 |
| 29 | | schema:url | https://link.springer.com/10.1007%2F3-540-49653-X_12 |
| 30 | | sgo:license | sg:explorer/license/ |
| 31 | | sgo:sdDataset | chapters |
| 32 | | rdf:type | schema:Chapter |
| 33 | N0561b26dd98f4fafaede6f967f9f9434 | schema:name | Springer Nature - SN SciGraph project |
| 34 | | rdf:type | schema:Organization |
| 35 | N233b85748826479ab06e95ce5b16f639 | schema:familyName | Stephanidis |
| 36 | | schema:givenName | Constantine |
| 37 | | rdf:type | schema:Person |
| 38 | N37c7512346fe448a924d23b2dcdb8694 | rdf:first | sg:person.015363630667.99 |
| 39 | | rdf:rest | Nd6251d151e994bedba8d38c4a102a3ed |
| 40 | N3bdf8a871e984dac99958bb18c0dd4f9 | schema:name | doi |
| 41 | | schema:value | 10.1007/3-540-49653-x_12 |
| 42 | | rdf:type | schema:PropertyValue |
| 43 | N5b201db2fd154eb3855b7a92cf756efb | schema:name | readcube_id |
| 44 | | schema:value | a89f4106f36faa9e7111cbe8b6aec1c2e7e4c9f60638b9c6b2d78c74ccc54efc |
| 45 | | rdf:type | schema:PropertyValue |
| 46 | N663469154e6e4d2db9fbd2407212774c | schema:isbn | 978-3-540-49653-3 |
| 47 | | | 978-3-540-65101-7 |
| 48 | | schema:name | Research and Advanced Technology for Digital Libraries |
| 49 | | rdf:type | schema:Book |
| 50 | N801f6bbcb0f84603a87614423079dbd2 | schema:name | dimensions_id |
| 51 | | schema:value | pub.1035398397 |
| 52 | | rdf:type | schema:PropertyValue |
| 53 | N8910658bfc12453a80636fea3f8e9ac0 | schema:familyName | Nikolaou |
| 54 | | schema:givenName | Christos |
| 55 | | rdf:type | schema:Person |
| 56 | N9a8019baa16f45fcbce824435b068418 | rdf:first | N8910658bfc12453a80636fea3f8e9ac0 |
| 57 | | rdf:rest | Ne96e46003f1f401badab009ec1e476d2 |
| 58 | Nd6251d151e994bedba8d38c4a102a3ed | rdf:first | Nff2437df3cba4a87a252d14a1fa5f6fa |
| 59 | | rdf:rest | rdf:nil |
| 60 | Ne96e46003f1f401badab009ec1e476d2 | rdf:first | N233b85748826479ab06e95ce5b16f639 |
| 61 | | rdf:rest | rdf:nil |
| 62 | Nfa967ce2eb97484bbea99b415b7c0531 | schema:location | Berlin, Heidelberg |
| 63 | | schema:name | Springer Berlin Heidelberg |
| 64 | | rdf:type | schema:Organisation |
| 65 | Nff2437df3cba4a87a252d14a1fa5f6fa | schema:affiliation | https://www.grid.ac/institutes/grid.5801.c |
| 66 | | schema:familyName | Scäuble |
| 67 | | schema:givenName | Peter |
| 68 | | rdf:type | schema:Person |
| 69 | anzsrc-for:20 | schema:inDefinedTermSet | anzsrc-for: |
| 70 | | schema:name | Language, Communication and Culture |
| 71 | | rdf:type | schema:DefinedTerm |
| 72 | anzsrc-for:2004 | schema:inDefinedTermSet | anzsrc-for: |
| 73 | | schema:name | Linguistics |
| 74 | | rdf:type | schema:DefinedTerm |
| 75 | sg:person.015363630667.99 | schema:affiliation | https://www.grid.ac/institutes/grid.433769.c |
| 76 | | schema:familyName | Braschler |
| 77 | | schema:givenName | Martin |
| 78 | | schema:sameAs | https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015363630667.99 |
| 79 | | rdf:type | schema:Person |
| 80 | sg:pub.10.1007/978-1-4615-6163-7 | schema:sameAs | https://app.dimensions.ai/details/publication/pub.1028655623 |
| 81 | | | https://doi.org/10.1007/978-1-4615-6163-7 |
| 82 | | rdf:type | schema:CreativeWork |
| 83 | https://doi.org/10.1108/eb026939 | schema:sameAs | https://app.dimensions.ai/details/publication/pub.1048761924 |
| 84 | | rdf:type | schema:CreativeWork |
| 85 | https://doi.org/10.1145/243199.243202 | schema:sameAs | https://app.dimensions.ai/details/publication/pub.1048194338 |
| 86 | | rdf:type | schema:CreativeWork |
| 87 | https://doi.org/10.1145/243199.243213 | schema:sameAs | https://app.dimensions.ai/details/publication/pub.1047493755 |
| 88 | | rdf:type | schema:CreativeWork |
| 89 | https://doi.org/10.1145/258525.258540 | schema:sameAs | https://app.dimensions.ai/details/publication/pub.1098972593 |
| 90 | | rdf:type | schema:CreativeWork |
| 91 | https://doi.org/10.1145/278459.258540 | schema:sameAs | https://app.dimensions.ai/details/publication/pub.1016936818 |
| 92 | | rdf:type | schema:CreativeWork |
| 93 | https://www.grid.ac/institutes/grid.433769.c | schema:alternateName | Eurospider Information Technology (Switzerland) |
| 94 | | schema:name | Eurospider Information Technology AG, Schaffhauserstr. 18, CH-8006, Zürich, Switzerland |
| 95 | | rdf:type | schema:Organization |
| 96 | https://www.grid.ac/institutes/grid.5801.c | schema:alternateName | Swiss Federal Institute of Technology in Zurich |
| 97 | | schema:name | Swiss Federal Institute of Technology (ETH), CH-8092, Zürich, Switzerland |
| 98 | | rdf:type | schema:Organization |
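
In the triples above, the ordered author and editor lists are encoded as RDF collections: each blank node carries an `rdf:first` (the item) and an `rdf:rest` (the next node, or `rdf:nil` at the end). The following minimal sketch walks such a chain using plain tuples. The author-list triples are copied from the table; the function name `collection_items` is hypothetical, and a real application would more likely use an RDF library.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

# Author-list triples copied from the table above (subject, predicate, object).
TRIPLES = [
    ("sg:pub.10.1007/3-540-49653-x_12", "schema:author", "N37c7512346fe448a924d23b2dcdb8694"),
    ("N37c7512346fe448a924d23b2dcdb8694", RDF_FIRST, "sg:person.015363630667.99"),
    ("N37c7512346fe448a924d23b2dcdb8694", RDF_REST, "Nd6251d151e994bedba8d38c4a102a3ed"),
    ("Nd6251d151e994bedba8d38c4a102a3ed", RDF_FIRST, "Nff2437df3cba4a87a252d14a1fa5f6fa"),
    ("Nd6251d151e994bedba8d38c4a102a3ed", RDF_REST, RDF_NIL),
]


def collection_items(triples, head):
    """Follow rdf:first / rdf:rest from `head` and return the items in order."""
    # Each list node has exactly one rdf:first and one rdf:rest, so a dict suffices.
    index = {(s, p): o for s, p, o in triples}
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle detected at list node {node}")
        seen.add(node)
        items.append(index[(node, RDF_FIRST)])
        node = index[(node, RDF_REST)]
    return items


if __name__ == "__main__":
    # The head of the author list is the object of the schema:author triple.
    head = next(o for s, p, o in TRIPLES if p == "schema:author")
    print(collection_items(TRIPLES, head))
```

The linked-list encoding is what preserves author order; a bare set of `schema:author` triples would carry no ordering.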
 



