Using Corpus-Based Approaches in a System for Multilingual Information Retrieval


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2000-10

AUTHORS

Martin Braschler, Peter Schäuble

ABSTRACT

We present a system for multilingual information retrieval that allows users to formulate queries in their preferred language and retrieve relevant information from a collection containing documents in multiple languages. The system is based on a process of document-level alignment, in which documents in different languages are paired according to their similarity. The resulting mapping allows us to produce a multilingual comparable corpus. Such a corpus has several useful applications. First, it allows us to build a data structure for query translation in cross-language information retrieval (CLIR). Second, we perform pseudo-relevance feedback on the alignments to improve our retrieval results. Finally, multiple retrieval runs can be merged into one unified result list. The resulting system is inexpensive, adaptable to domain-specific collections and new languages, and performed very well in the CLIR system comparison at the TREC-7 conference.
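The abstract names the main steps without detailing them. The Python sketch below illustrates two of them under simplifying assumptions of our own: documents are greedily paired one-to-one by cosine similarity over features that both languages share (how the paper actually measures cross-language similarity is not stated here), and the aligned pairs are then mined for a term-association table, scored with the Dice coefficient, that maps query terms into the other language. The function names, the `min_sim` threshold and the Dice weighting are illustrative choices, not the authors' method; pseudo-relevance feedback and run merging are not shown.

```python
import math
from collections import Counter, defaultdict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align(src_feats, tgt_feats, min_sim=0.1):
    """Greedy one-to-one document alignment (illustrative, not the paper's method).

    src_feats / tgt_feats are lists of Counters over a feature space shared by
    both languages (an assumption: e.g. names, numbers, dictionary-mapped terms).
    Returns (src_index, tgt_index, similarity) tuples, best matches first.
    """
    candidates = sorted(
        ((cosine(s, t), i, j)
         for i, s in enumerate(src_feats)
         for j, t in enumerate(tgt_feats)),
        reverse=True,
    )
    used_src, used_tgt, pairs = set(), set(), []
    for sim, i, j in candidates:
        if sim < min_sim:
            break  # candidates are sorted, so all remaining ones are weaker
        if i in used_src or j in used_tgt:
            continue  # each document may take part in at most one pair
        used_src.add(i)
        used_tgt.add(j)
        pairs.append((i, j, sim))
    return pairs


def build_translation_table(pairs, src_terms, tgt_terms):
    """Score source/target term associations with Dice over aligned pairs."""
    co = defaultdict(Counter)
    df_src, df_tgt = Counter(), Counter()
    for i, j, _ in pairs:
        s_set, t_set = set(src_terms[i]), set(tgt_terms[j])
        df_src.update(s_set)
        df_tgt.update(t_set)
        for s in s_set:
            for t in t_set:
                co[s][t] += 1
    return {
        s: {t: 2 * c / (df_src[s] + df_tgt[t]) for t, c in row.items()}
        for s, row in co.items()
    }


def translate_query(query, table, k=1):
    """Replace each query term by its k highest-scoring target terms."""
    translated = []
    for term in query:
        scores = table.get(term, {})
        # ties are broken alphabetically so the output is deterministic
        translated.extend(sorted(scores, key=lambda t: (-scores[t], t))[:k])
    return translated
```

For example, given an English and a German article on the same event that share proper names, `align` pairs them, and `translate_query(["president"], table, k=3)` returns the German terms that co-occur with "president" across aligned pairs. On a realistic corpus the Dice scores separate genuine translations from incidental co-occurrences; on a toy corpus of a few pairs, many candidates tie.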

PAGES

273-284

References to SciGraph publications

  • 2002-03-15. Multilingual Information Retrieval Based on Document Alignment Techniques. In: Research and Advanced Technology for Digital Libraries
  • 1993. The Various Roles of Information Structures. In: Information and Classification

Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1023/a:1026525127581

    DOI

    http://dx.doi.org/10.1023/a:1026525127581

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1033079871



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/2004", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Linguistics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/20", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Language, Communication and Culture", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Eurospider Information Technology (Switzerland)", 
              "id": "https://www.grid.ac/institutes/grid.433769.c", 
              "name": [
                "Eurospider Information Technology AG, Schaffhauserstrasse 18, CH-8006, Z\u00fcrich, Switzerland"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Braschler", 
            "givenName": "Martin", 
            "id": "sg:person.015363630667.99", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015363630667.99"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Eurospider Information Technology (Switzerland)", 
              "id": "https://www.grid.ac/institutes/grid.433769.c", 
              "name": [
                "Eurospider Information Technology AG, Schaffhauserstrasse 18, CH-8006, Z\u00fcrich, Switzerland"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Sch\u00e4uble", 
            "givenName": "Peter", 
            "id": "sg:person.0670254567.14", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0670254567.14"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.1145/278459.258540", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1016936818"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/290941.291017", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1022973258"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/243199.243206", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1032416718"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/3-540-49653-x_12", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1035398397", 
              "https://doi.org/10.1007/3-540-49653-x_12"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/243199.243213", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1047493755"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-642-50974-2_28", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1047839351", 
              "https://doi.org/10.1007/978-3-642-50974-2_28"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/243199.243202", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1048194338"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2000-10", 
        "datePublishedReg": "2000-10-01", 
        "description": "We present a system for multilingual information retrieval that allows users to formulate queries in their preferred language and retrieve relevant information from a collection containing documents in multiple languages. The system is based on a process of document level alignments, where documents of different languages are paired according to their similarity. The resulting mapping allows us to produce a multilingual comparable corpus. Such a corpus has multiple interesting applications. It allows us to build a data structure for query translation in cross-language information retrieval (CLIR). Moreover, we also perform pseudo relevance feedback on the alignments to improve our retrieval results. And finally, multiple retrieval runs can be merged into one unified result list. The resulting system is inexpensive, adaptable to domain-specific collections and new languages and has performed very well at the TREC-7 conference CLIR system comparison.", 
        "genre": "research_article", 
        "id": "sg:pub.10.1023/a:1026525127581", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": false, 
        "isPartOf": [
          {
            "id": "sg:journal.1023664", 
            "issn": [
              "1386-4564", 
              "1573-7659"
            ], 
            "name": "Information Retrieval Journal", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "3", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "3"
          }
        ], 
        "name": "Using Corpus-Based Approaches in a System for Multilingual Information Retrieval", 
        "pagination": "273-284", 
        "productId": [
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "9d965e2d25dcccd86bb95210754c012c1a1706c452100100e13bf98846692cd9"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1023/a:1026525127581"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1033079871"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1023/a:1026525127581", 
          "https://app.dimensions.ai/details/publication/pub.1033079871"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2019-04-10T16:47", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8669_00000537.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "http://link.springer.com/10.1023%2FA%3A1026525127581"
      }
    ]
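Because the record is ordinary JSON, the commonly needed fields can be read with a standard JSON parser. The Python sketch below works on a trimmed excerpt of the record above (only a few fields are kept) and treats it as plain JSON, so it does not expand the `@context`; a JSON-LD processor would be needed for that. It assembles a short venue string and de-duplicates the `citation` entries by `id`, keeping the first occurrence. The helper name `summarize` is hypothetical.

```python
import json

# Trimmed excerpt of the SciGraph record above; most fields are omitted.
RECORD_JSON = r"""
[{"author": [{"givenName": "Martin", "familyName": "Braschler"},
             {"givenName": "Peter", "familyName": "Sch\u00e4uble"}],
  "citation": [{"id": "https://doi.org/10.1145/278459.258540"},
               {"id": "sg:pub.10.1007/3-540-49653-x_12"},
               {"id": "sg:pub.10.1007/3-540-49653-x_12"}],
  "datePublished": "2000-10",
  "isPartOf": [{"type": "Periodical", "name": "Information Retrieval Journal",
                "issn": ["1386-4564", "1573-7659"]},
               {"type": "PublicationIssue", "issueNumber": "3"},
               {"type": "PublicationVolume", "volumeNumber": "3"}],
  "pagination": "273-284"}]
"""


def summarize(record_text: str) -> dict:
    """Pull authors, venue, date and unique citation ids out of the record."""
    record = json.loads(record_text)[0]  # the record is a one-element array
    authors = [f"{a['givenName']} {a['familyName']}" for a in record["author"]]
    # isPartOf mixes journal, issue and volume objects; index them by type
    parts = {p["type"]: p for p in record["isPartOf"]}
    venue = "{} {}({}), pp. {}".format(
        parts["Periodical"]["name"],
        parts["PublicationVolume"]["volumeNumber"],
        parts["PublicationIssue"]["issueNumber"],
        record["pagination"],
    )
    # dict.fromkeys drops repeated ids while keeping first-seen order
    citations = list(dict.fromkeys(c["id"] for c in record["citation"]))
    return {"authors": authors, "venue": venue,
            "date": record["datePublished"], "citations": citations}


if __name__ == "__main__":
    info = summarize(RECORD_JSON)
    print(", ".join(info["authors"]), "|", info["venue"], "|", info["date"])
```

Indexing `isPartOf` by `type` avoids relying on the order in which the journal, issue and volume objects appear.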
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data that is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1023/a:1026525127581'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1023/a:1026525127581'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1023/a:1026525127581'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1023/a:1026525127581'
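The same content negotiation can be done with Python's standard library. In the sketch below, the format keys (`jsonld`, `ntriples`, `turtle`, `rdfxml`) and the helpers `build_request` and `fetch` are our own names; only the URL and the media types come from the curl commands above. It assumes the endpoint still answers these requests, and `fetch` needs network access.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1023/a:1026525127581"

# Media types from the curl examples; the short keys are our own labels.
MEDIA_TYPES = {
    "jsonld": "application/ld+json",
    "ntriples": "application/n-triples",
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Create a GET request whose Accept header selects the serialization."""
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt: str, url: str = RECORD_URL, timeout: float = 30.0) -> str:
    """Send the request and decode the body (requires network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

For instance, `fetch("turtle")` corresponds to the third curl command. Building the request separately from sending it keeps the header logic testable without a network connection.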


     

    This table displays all metadata directly associated with this object as RDF triples.

    91 TRIPLES      21 PREDICATES      34 URIs      19 LITERALS      7 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1023/a:1026525127581 schema:about anzsrc-for:20
    2 anzsrc-for:2004
    3 schema:author N8d2b78e146b2462bb2e0b62eb4ddfdd7
    4 schema:citation sg:pub.10.1007/3-540-49653-x_12
    5 sg:pub.10.1007/978-3-642-50974-2_28
    6 https://doi.org/10.1145/243199.243202
    7 https://doi.org/10.1145/243199.243206
    8 https://doi.org/10.1145/243199.243213
    9 https://doi.org/10.1145/278459.258540
    10 https://doi.org/10.1145/290941.291017
    11 schema:datePublished 2000-10
    12 schema:datePublishedReg 2000-10-01
    13 schema:description We present a system for multilingual information retrieval that allows users to formulate queries in their preferred language and retrieve relevant information from a collection containing documents in multiple languages. The system is based on a process of document level alignments, where documents of different languages are paired according to their similarity. The resulting mapping allows us to produce a multilingual comparable corpus. Such a corpus has multiple interesting applications. It allows us to build a data structure for query translation in cross-language information retrieval (CLIR). Moreover, we also perform pseudo relevance feedback on the alignments to improve our retrieval results. And finally, multiple retrieval runs can be merged into one unified result list. The resulting system is inexpensive, adaptable to domain-specific collections and new languages and has performed very well at the TREC-7 conference CLIR system comparison.
    14 schema:genre research_article
    15 schema:inLanguage en
    16 schema:isAccessibleForFree false
    17 schema:isPartOf N0a97e10d34b346549ff010f3231e43e6
    18 Na5329c92b11046e185043d50d1a2252b
    19 sg:journal.1023664
    20 schema:name Using Corpus-Based Approaches in a System for Multilingual Information Retrieval
    21 schema:pagination 273-284
    22 schema:productId N43944b6b465349a591a6cf51f0e07015
    23 N4debd6b2bcd449c8a7bbc68772f0e55b
    24 Nbc42c6de10cc48ea978c6545338eaf23
    25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033079871
    26 https://doi.org/10.1023/a:1026525127581
    27 schema:sdDatePublished 2019-04-10T16:47
    28 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    29 schema:sdPublisher N2d3cdf3467534391b052887abaef18c6
    30 schema:url http://link.springer.com/10.1023%2FA%3A1026525127581
    31 sgo:license sg:explorer/license/
    32 sgo:sdDataset articles
    33 rdf:type schema:ScholarlyArticle
    34 N0a97e10d34b346549ff010f3231e43e6 schema:issueNumber 3
    35 rdf:type schema:PublicationIssue
    36 N2d3cdf3467534391b052887abaef18c6 schema:name Springer Nature - SN SciGraph project
    37 rdf:type schema:Organization
    38 N43944b6b465349a591a6cf51f0e07015 schema:name readcube_id
    39 schema:value 9d965e2d25dcccd86bb95210754c012c1a1706c452100100e13bf98846692cd9
    40 rdf:type schema:PropertyValue
    41 N4debd6b2bcd449c8a7bbc68772f0e55b schema:name doi
    42 schema:value 10.1023/a:1026525127581
    43 rdf:type schema:PropertyValue
    44 N8d2b78e146b2462bb2e0b62eb4ddfdd7 rdf:first sg:person.015363630667.99
    45 rdf:rest Nc6d37b276ade43389abab1479fddd30a
    46 Na5329c92b11046e185043d50d1a2252b schema:volumeNumber 3
    47 rdf:type schema:PublicationVolume
    48 Nbc42c6de10cc48ea978c6545338eaf23 schema:name dimensions_id
    49 schema:value pub.1033079871
    50 rdf:type schema:PropertyValue
    51 Nc6d37b276ade43389abab1479fddd30a rdf:first sg:person.0670254567.14
    52 rdf:rest rdf:nil
    53 anzsrc-for:20 schema:inDefinedTermSet anzsrc-for:
    54 schema:name Language, Communication and Culture
    55 rdf:type schema:DefinedTerm
    56 anzsrc-for:2004 schema:inDefinedTermSet anzsrc-for:
    57 schema:name Linguistics
    58 rdf:type schema:DefinedTerm
    59 sg:journal.1023664 schema:issn 1386-4564
    60 1573-7659
    61 schema:name Information Retrieval Journal
    62 rdf:type schema:Periodical
    63 sg:person.015363630667.99 schema:affiliation https://www.grid.ac/institutes/grid.433769.c
    64 schema:familyName Braschler
    65 schema:givenName Martin
    66 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015363630667.99
    67 rdf:type schema:Person
    68 sg:person.0670254567.14 schema:affiliation https://www.grid.ac/institutes/grid.433769.c
    69 schema:familyName Schäuble
    70 schema:givenName Peter
    71 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0670254567.14
    72 rdf:type schema:Person
    73 sg:pub.10.1007/3-540-49653-x_12 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035398397
    74 https://doi.org/10.1007/3-540-49653-x_12
    75 rdf:type schema:CreativeWork
    76 sg:pub.10.1007/978-3-642-50974-2_28 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047839351
    77 https://doi.org/10.1007/978-3-642-50974-2_28
    78 rdf:type schema:CreativeWork
    79 https://doi.org/10.1145/243199.243202 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048194338
    80 rdf:type schema:CreativeWork
    81 https://doi.org/10.1145/243199.243206 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032416718
    82 rdf:type schema:CreativeWork
    83 https://doi.org/10.1145/243199.243213 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047493755
    84 rdf:type schema:CreativeWork
    85 https://doi.org/10.1145/278459.258540 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016936818
    86 rdf:type schema:CreativeWork
    87 https://doi.org/10.1145/290941.291017 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022973258
    88 rdf:type schema:CreativeWork
    89 https://www.grid.ac/institutes/grid.433769.c schema:alternateName Eurospider Information Technology (Switzerland)
    90 schema:name Eurospider Information Technology AG, Schaffhauserstrasse 18, CH-8006, Zürich, Switzerland
    91 rdf:type schema:Organization
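The summary line above the table reports counts of triples, predicates, URIs, literals and blank nodes, but the page does not say exactly how SciGraph derives them. The Python sketch below shows one plausible way to compute such figures from the N-Triples serialization (the `application/n-triples` request listed earlier): it counts distinct statements, distinct predicates, and distinct subject/object terms by kind. These counting rules are our assumption and have not been checked against the figures shown, and the regular expression handles only simple one-statement-per-line input; it is not a full N-Triples parser.

```python
import re

# One N-Triples term: an IRI, a blank node label, or a (possibly tagged/typed) literal.
TERM = r'(<[^>]*>|_:[^\s]+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
STATEMENT = re.compile(rf"^{TERM}\s+{TERM}\s+{TERM}\s*\.$")


def kind(term: str) -> str:
    """Classify a term as 'uri', 'blank' or 'literal' by its leading characters."""
    if term.startswith("<"):
        return "uri"
    if term.startswith("_:"):
        return "blank"
    return "literal"


def summarize_ntriples(text: str) -> dict:
    """Count distinct triples, predicates and subject/object terms by kind."""
    triples = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = STATEMENT.match(line)
        if match is None:
            raise ValueError(f"not a simple N-Triples statement: {line!r}")
        triples.add(match.groups())  # the set drops repeated statements
    nodes = {t for s, _, o in triples for t in (s, o)}
    return {
        "triples": len(triples),
        "predicates": len({p for _, p, _ in triples}),
        "uris": sum(kind(t) == "uri" for t in nodes),
        "literals": sum(kind(t) == "literal" for t in nodes),
        "blank_nodes": sum(kind(t) == "blank" for t in nodes),
    }
```

Whether IRIs used only as predicates should also count toward the URI total is a design choice; this sketch keeps predicates in their own count.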
     



