Using Corpus-Based Approaches in a System for Multilingual Information Retrieval


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2000-10

AUTHORS

Martin Braschler, Peter Schäuble

ABSTRACT

We present a system for multilingual information retrieval that allows users to formulate queries in their preferred language and retrieve relevant information from a collection containing documents in multiple languages. The system is based on a process of document level alignments, where documents of different languages are paired according to their similarity. The resulting mapping allows us to produce a multilingual comparable corpus. Such a corpus has multiple interesting applications. It allows us to build a data structure for query translation in cross-language information retrieval (CLIR). Moreover, we also perform pseudo relevance feedback on the alignments to improve our retrieval results. And finally, multiple retrieval runs can be merged into one unified result list. The resulting system is inexpensive, adaptable to domain-specific collections and new languages and has performed very well at the TREC-7 conference CLIR system comparison.

PAGES

273-284

References to SciGraph publications

  • 2002-03-15. Multilingual Information Retrieval Based on Document Alignment Techniques in RESEARCH AND ADVANCED TECHNOLOGY FOR DIGITAL LIBRARIES
  • 1993. The Various Roles of Information Structures in INFORMATION AND CLASSIFICATION

Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1023/a:1026525127581

    DOI

    http://dx.doi.org/10.1023/a:1026525127581

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1033079871



    JSON-LD is the canonical representation for SciGraph data.

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/2004", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Linguistics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/20", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Language, Communication and Culture", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Eurospider Information Technology (Switzerland)", 
              "id": "https://www.grid.ac/institutes/grid.433769.c", 
              "name": [
                "Eurospider Information Technology AG, Schaffhauserstrasse 18, CH-8006, Z\u00fcrich, Switzerland"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Braschler", 
            "givenName": "Martin", 
            "id": "sg:person.015363630667.99", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015363630667.99"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Eurospider Information Technology (Switzerland)", 
              "id": "https://www.grid.ac/institutes/grid.433769.c", 
              "name": [
                "Eurospider Information Technology AG, Schaffhauserstrasse 18, CH-8006, Z\u00fcrich, Switzerland"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Sch\u00e4uble", 
            "givenName": "Peter", 
            "id": "sg:person.0670254567.14", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0670254567.14"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.1145/278459.258540", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1016936818"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/290941.291017", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1022973258"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/243199.243206", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1032416718"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/3-540-49653-x_12", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1035398397", 
              "https://doi.org/10.1007/3-540-49653-x_12"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/243199.243213", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1047493755"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-642-50974-2_28", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1047839351", 
              "https://doi.org/10.1007/978-3-642-50974-2_28"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/243199.243202", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1048194338"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2000-10", 
        "datePublishedReg": "2000-10-01", 
        "description": "We present a system for multilingual information retrieval that allows users to formulate queries in their preferred language and retrieve relevant information from a collection containing documents in multiple languages. The system is based on a process of document level alignments, where documents of different languages are paired according to their similarity. The resulting mapping allows us to produce a multilingual comparable corpus. Such a corpus has multiple interesting applications. It allows us to build a data structure for query translation in cross-language information retrieval (CLIR). Moreover, we also perform pseudo relevance feedback on the alignments to improve our retrieval results. And finally, multiple retrieval runs can be merged into one unified result list. The resulting system is inexpensive, adaptable to domain-specific collections and new languages and has performed very well at the TREC-7 conference CLIR system comparison.", 
        "genre": "research_article", 
        "id": "sg:pub.10.1023/a:1026525127581", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": false, 
        "isPartOf": [
          {
            "id": "sg:journal.1023664", 
            "issn": [
              "1386-4564", 
              "1573-7659"
            ], 
            "name": "Information Retrieval Journal", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "3", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "3"
          }
        ], 
        "name": "Using Corpus-Based Approaches in a System for Multilingual Information Retrieval", 
        "pagination": "273-284", 
        "productId": [
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "9d965e2d25dcccd86bb95210754c012c1a1706c452100100e13bf98846692cd9"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1023/a:1026525127581"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1033079871"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1023/a:1026525127581", 
          "https://app.dimensions.ai/details/publication/pub.1033079871"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2019-04-10T16:47", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8669_00000537.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "http://link.springer.com/10.1023%2FA%3A1026525127581"
      }
    ]
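
The record above is plain JSON, so it can be read with any JSON parser. The following is a minimal sketch, assuming a trimmed copy of the record that keeps only the fields it uses; the helper name `summarize` is hypothetical and not part of SciGraph. It collects the title, the author names, and the citation identifiers, dropping any identifier that appears more than once.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
RECORD_JSON = """
[
  {
    "id": "sg:pub.10.1023/a:1026525127581",
    "name": "Using Corpus-Based Approaches in a System for Multilingual Information Retrieval",
    "author": [
      {"familyName": "Braschler", "givenName": "Martin"},
      {"familyName": "Sch\\u00e4uble", "givenName": "Peter"}
    ],
    "citation": [
      {"id": "https://doi.org/10.1145/278459.258540"},
      {"id": "sg:pub.10.1007/3-540-49653-x_12"},
      {"id": "sg:pub.10.1007/3-540-49653-x_12"}
    ]
  }
]
"""


def summarize(record):
    """Return title, author names, and de-duplicated citation ids (hypothetical helper)."""
    authors = [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]
    # dict.fromkeys keeps the first occurrence of each id and preserves order.
    citations = list(dict.fromkeys(c["id"] for c in record.get("citation", [])))
    return {"title": record["name"], "authors": authors, "citations": citations}


# The top-level JSON value is a list holding a single record.
record = json.loads(RECORD_JSON)[0]
summary = summarize(record)
print(summary["authors"])
print(len(summary["citations"]), "distinct citations")
```

The same function should work on the full record, since it only reads the `name`, `author`, and `citation` keys.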
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1023/a:1026525127581'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1023/a:1026525127581'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1023/a:1026525127581'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1023/a:1026525127581'
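
The four curl commands above differ only in the `Accept` header, so the same requests can be scripted. The following is a minimal sketch using Python's standard `urllib.request`; the short format names (`json-ld`, `nt`, `turtle`, `xml`) and the functions `build_request` and `fetch` are hypothetical, and the sketch assumes the endpoint still answers as described here.

```python
import urllib.request

# URL taken from the curl commands above.
BASE_URL = "https://scigraph.springernature.com/pub.10.1023/a:1026525127581"

# Short names (an assumption of this sketch) mapped to the media types
# used in the curl commands.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=BASE_URL):
    """Build (but do not send) a request asking for the given RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt, url=BASE_URL, timeout=30):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building a request does not touch the network; fetch("turtle") would.
req = build_request("turtle")
print(req.full_url, req.get_header("Accept"))
```

Keeping request construction separate from sending makes it possible to inspect the headers without a network connection.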


     

    This table displays all metadata directly associated with this object as RDF triples.

    91 TRIPLES      21 PREDICATES      34 URIs      19 LITERALS      7 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1023/a:1026525127581 schema:about anzsrc-for:20
    2 anzsrc-for:2004
    3 schema:author N0a4a5074049b4b1bae1ecbaf2ee31cdb
    4 schema:citation sg:pub.10.1007/3-540-49653-x_12
    5 sg:pub.10.1007/978-3-642-50974-2_28
    6 https://doi.org/10.1145/243199.243202
    7 https://doi.org/10.1145/243199.243206
    8 https://doi.org/10.1145/243199.243213
    9 https://doi.org/10.1145/278459.258540
    10 https://doi.org/10.1145/290941.291017
    11 schema:datePublished 2000-10
    12 schema:datePublishedReg 2000-10-01
    13 schema:description We present a system for multilingual information retrieval that allows users to formulate queries in their preferred language and retrieve relevant information from a collection containing documents in multiple languages. The system is based on a process of document level alignments, where documents of different languages are paired according to their similarity. The resulting mapping allows us to produce a multilingual comparable corpus. Such a corpus has multiple interesting applications. It allows us to build a data structure for query translation in cross-language information retrieval (CLIR). Moreover, we also perform pseudo relevance feedback on the alignments to improve our retrieval results. And finally, multiple retrieval runs can be merged into one unified result list. The resulting system is inexpensive, adaptable to domain-specific collections and new languages and has performed very well at the TREC-7 conference CLIR system comparison.
    14 schema:genre research_article
    15 schema:inLanguage en
    16 schema:isAccessibleForFree false
    17 schema:isPartOf N20f19b9b0b23401daa21e07b2689fba4
    18 Nf1b57975320d4b7182abc7457d79557d
    19 sg:journal.1023664
    20 schema:name Using Corpus-Based Approaches in a System for Multilingual Information Retrieval
    21 schema:pagination 273-284
    22 schema:productId N20378075937e4cbaaf5b278dce7ef92c
    23 Nbbe3abdb764a4c0faac9fa355fc66294
    24 Nfbfa034b0f23457e994d99a692962821
    25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033079871
    26 https://doi.org/10.1023/a:1026525127581
    27 schema:sdDatePublished 2019-04-10T16:47
    28 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    29 schema:sdPublisher Nbf73d472ee7642599fb8d9852862f49d
    30 schema:url http://link.springer.com/10.1023%2FA%3A1026525127581
    31 sgo:license sg:explorer/license/
    32 sgo:sdDataset articles
    33 rdf:type schema:ScholarlyArticle
    34 N0a4a5074049b4b1bae1ecbaf2ee31cdb rdf:first sg:person.015363630667.99
    35 rdf:rest Ne80d1e4c5f28490db859f61aada04d85
    36 N20378075937e4cbaaf5b278dce7ef92c schema:name doi
    37 schema:value 10.1023/a:1026525127581
    38 rdf:type schema:PropertyValue
    39 N20f19b9b0b23401daa21e07b2689fba4 schema:issueNumber 3
    40 rdf:type schema:PublicationIssue
    41 Nbbe3abdb764a4c0faac9fa355fc66294 schema:name dimensions_id
    42 schema:value pub.1033079871
    43 rdf:type schema:PropertyValue
    44 Nbf73d472ee7642599fb8d9852862f49d schema:name Springer Nature - SN SciGraph project
    45 rdf:type schema:Organization
    46 Ne80d1e4c5f28490db859f61aada04d85 rdf:first sg:person.0670254567.14
    47 rdf:rest rdf:nil
    48 Nf1b57975320d4b7182abc7457d79557d schema:volumeNumber 3
    49 rdf:type schema:PublicationVolume
    50 Nfbfa034b0f23457e994d99a692962821 schema:name readcube_id
    51 schema:value 9d965e2d25dcccd86bb95210754c012c1a1706c452100100e13bf98846692cd9
    52 rdf:type schema:PropertyValue
    53 anzsrc-for:20 schema:inDefinedTermSet anzsrc-for:
    54 schema:name Language, Communication and Culture
    55 rdf:type schema:DefinedTerm
    56 anzsrc-for:2004 schema:inDefinedTermSet anzsrc-for:
    57 schema:name Linguistics
    58 rdf:type schema:DefinedTerm
    59 sg:journal.1023664 schema:issn 1386-4564
    60 1573-7659
    61 schema:name Information Retrieval Journal
    62 rdf:type schema:Periodical
    63 sg:person.015363630667.99 schema:affiliation https://www.grid.ac/institutes/grid.433769.c
    64 schema:familyName Braschler
    65 schema:givenName Martin
    66 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015363630667.99
    67 rdf:type schema:Person
    68 sg:person.0670254567.14 schema:affiliation https://www.grid.ac/institutes/grid.433769.c
    69 schema:familyName Schäuble
    70 schema:givenName Peter
    71 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0670254567.14
    72 rdf:type schema:Person
    73 sg:pub.10.1007/3-540-49653-x_12 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035398397
    74 https://doi.org/10.1007/3-540-49653-x_12
    75 rdf:type schema:CreativeWork
    76 sg:pub.10.1007/978-3-642-50974-2_28 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047839351
    77 https://doi.org/10.1007/978-3-642-50974-2_28
    78 rdf:type schema:CreativeWork
    79 https://doi.org/10.1145/243199.243202 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048194338
    80 rdf:type schema:CreativeWork
    81 https://doi.org/10.1145/243199.243206 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032416718
    82 rdf:type schema:CreativeWork
    83 https://doi.org/10.1145/243199.243213 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047493755
    84 rdf:type schema:CreativeWork
    85 https://doi.org/10.1145/278459.258540 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016936818
    86 rdf:type schema:CreativeWork
    87 https://doi.org/10.1145/290941.291017 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022973258
    88 rdf:type schema:CreativeWork
    89 https://www.grid.ac/institutes/grid.433769.c schema:alternateName Eurospider Information Technology (Switzerland)
    90 schema:name Eurospider Information Technology AG, Schaffhauserstrasse 18, CH-8006, Zürich, Switzerland
    91 rdf:type schema:Organization
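
In the table above, `schema:author` points to a blank node rather than directly to the authors: the author order is stored as an RDF collection, a linked list whose nodes carry `rdf:first` (the item) and `rdf:rest` (the next node, or `rdf:nil` at the end). The following is a minimal sketch of walking that list, assuming the five relevant triples have been copied into plain tuples; the function `walk_collection` is hypothetical.

```python
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"

# The author-list triples from the table above, as (subject, predicate, object).
TRIPLES = [
    ("sg:pub.10.1023/a:1026525127581", "schema:author", "N0a4a5074049b4b1bae1ecbaf2ee31cdb"),
    ("N0a4a5074049b4b1bae1ecbaf2ee31cdb", RDF_FIRST, "sg:person.015363630667.99"),
    ("N0a4a5074049b4b1bae1ecbaf2ee31cdb", RDF_REST, "Ne80d1e4c5f28490db859f61aada04d85"),
    ("Ne80d1e4c5f28490db859f61aada04d85", RDF_FIRST, "sg:person.0670254567.14"),
    ("Ne80d1e4c5f28490db859f61aada04d85", RDF_REST, RDF_NIL),
]


def walk_collection(head, triples):
    """Follow rdf:first / rdf:rest links from head and return the items in order."""
    index = {(s, p): o for s, p, o in triples}
    items = []
    visited = set()
    node = head
    while node != RDF_NIL:
        if node in visited:
            # Guard against a malformed, cyclic list.
            raise ValueError(f"cycle detected at {node}")
        visited.add(node)
        items.append(index[(node, RDF_FIRST)])
        node = index[(node, RDF_REST)]
    return items


# Look up the list head from the schema:author triple, then walk it.
head = next(o for s, p, o in TRIPLES if p == "schema:author")
authors = walk_collection(head, TRIPLES)
print(authors)
```

The result preserves the author order of the article, which a plain set of `schema:author` triples would not guarantee.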
     



