Semantic textual similarity between sentences using bilingual word semantics


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2019-03-09

AUTHORS

Md. Shajalal, Masaki Aono

ABSTRACT

Semantic textual similarity between sentences is indispensable for many information retrieval tasks. Traditional lexical similarity measures cannot compute similarity beyond a trivial level. Moreover, they capture only textual similarity, not semantic similarity. In this paper, we propose a method for semantic textual similarity that leverages bilingual word-level semantics to compute the semantic similarity between sentences. To capture word-level semantics, we employ distributed representations of words in two different languages. A similarity function based on the concept-to-concept relationships corresponding to the words is also utilized for the same purpose. Multiple new semantic similarity measures are introduced based on word-embedding models trained on two different corpora in two different languages. Apart from these, another new semantic similarity measure is introduced using word sense comparison. The similarity score between the sentences is then computed by applying a linear ranking approach to all proposed measures, with their importance scores estimated by a supervised feature selection technique. We conducted experiments on the SemEval Semantic Textual Similarity (STS-2017) test collections. The experimental results demonstrated that our method is effective for measuring semantic textual similarity and outperforms some known related methods.

PAGES

1-10

References to SciGraph publications

  • 2007. Similarity Measures for Short Segments of Text in ADVANCES IN INFORMATION RETRIEVAL
Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/s13748-019-00180-4

    DOI

    http://dx.doi.org/10.1007/s13748-019-00180-4

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1112672612



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/1702", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Cognitive Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/17", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Psychology and Cognitive Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Bangladesh Agricultural University", 
              "id": "https://www.grid.ac/institutes/grid.411511.1", 
              "name": [
                "Department of Computer Science and Mathematics, Bangladesh Agricultural University, 2202, Mymensingh, Bangladesh"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Shajalal", 
            "givenName": "Md.", 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Toyohashi University of Technology", 
              "id": "https://www.grid.ac/institutes/grid.412804.b", 
              "name": [
                "Department of Computer Science and Engineering, Toyohashi University of Technology, Toyohashi, Aichi, Japan"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Aono", 
            "givenName": "Masaki", 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.1145/2806416.2806475", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1011865814"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/2388676.2388784", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1022038979"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-540-71496-5_5", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1030086294", 
              "https://doi.org/10.1007/978-3-540-71496-5_5"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/j.eswa.2008.11.022", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1033320061"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1111/j.1467-9868.2011.00771.x", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1035785610"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1111/j.1467-9868.2005.00503.x", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1043971564"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/tkde.2006.130", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1061661517"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1561/1500000035", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1068001295"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/iccv.2015.474", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1094714066"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.18653/v1/s15-2022", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1099096237"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.18653/v1/s15-2045", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1099096261"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.18653/v1/s16-1081", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1099151417"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.18653/v1/s17-2001", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1100731697"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.18653/v1/s17-2019", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1100731715"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.18653/v1/s17-2021", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1100731717"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.18653/v1/s17-2026", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1100731722"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.18653/v1/s17-2030", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1100731726"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2019-03-09", 
        "datePublishedReg": "2019-03-09", 
        "description": "Semantic textual similarity between sentences is indispensable for many information retrieval tasks. Traditional lexical similarity measures cannot compute the similarity beyond a trivial level. Moreover, they only can capture the textual similarity, but not semantic. In this paper, we propose a method for semantic textual similarity that leverages bilingual word-level semantics to compute the semantic similarity between sentences. To capture word-level semantics, we employ distribute representation of words in two different languages. The similarity function based on the concept-to-concept relationship corresponding to the words is also utilized for the same purpose. Multiple new semantic similarity measures are introduced based on word-embedding models trained on two different corpora in two different languages. Apart from these, another new semantic similarity measure is also introduced using the word sense comparison. The similarity score between the sentences is then computed by applying a linear ranking approach to all proposed measures with their importance score estimated employing a supervised feature selection technique. We conducted experiments on the SemEval Semantic Textual Similarity (STS-2017) test collections. The experimental results demonstrated that our method is effective for measuring semantic textual similarity and outperforms some known related methods.", 
        "genre": "research_article", 
        "id": "sg:pub.10.1007/s13748-019-00180-4", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": false, 
        "isPartOf": [
          {
            "id": "sg:journal.1136525", 
            "issn": [
              "2192-6352", 
              "2192-6360"
            ], 
            "name": "Progress in Artificial Intelligence", 
            "type": "Periodical"
          }
        ], 
        "name": "Semantic textual similarity between sentences using bilingual word semantics", 
        "pagination": "1-10", 
        "productId": [
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "48e656ba99922bca08ae20730377794efd0fbfbae1f63785f35297bc06fe510f"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/s13748-019-00180-4"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1112672612"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1007/s13748-019-00180-4", 
          "https://app.dimensions.ai/details/publication/pub.1112672612"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2019-04-11T11:18", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000354_0000000354/records_11701_00000002.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://link.springer.com/10.1007%2Fs13748-019-00180-4"
      }
    ]
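
The record above is ordinary JSON, so its fields can be read with a standard JSON parser. The following is a minimal sketch, not an official SciGraph client: `RECORD_JSON` is a trimmed copy of the record (only a few fields and two of the citations are kept), and `summarize` is a hypothetical helper name introduced here for illustration.

```python
import json

# Trimmed copy of the JSON-LD record shown above (assumption: only the
# fields used by this sketch are kept; the full record has more).
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Shajalal", "givenName": "Md.", "type": "Person"},
      {"familyName": "Aono", "givenName": "Masaki", "type": "Person"}
    ],
    "citation": [
      {"id": "https://doi.org/10.1145/2806416.2806475", "type": "CreativeWork"},
      {"id": "sg:pub.10.1007/978-3-540-71496-5_5",
       "sameAs": [
         "https://app.dimensions.ai/details/publication/pub.1030086294",
         "https://doi.org/10.1007/978-3-540-71496-5_5"
       ],
       "type": "CreativeWork"}
    ],
    "name": "Semantic textual similarity between sentences using bilingual word semantics",
    "productId": [
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1007/s13748-019-00180-4"]}
    ]
  }
]
"""

DOI_PREFIX = "https://doi.org/"


def summarize(record_json):
    """Return title, author names, DOI, and cited DOIs for one record."""
    record = json.loads(record_json)[0]  # the payload is a one-element list

    authors = [f'{a["givenName"]} {a["familyName"]}'
               for a in record.get("author", [])]

    # The article's own DOI is stored as a productId entry named "doi".
    dois = [value
            for product in record.get("productId", [])
            if product.get("name") == "doi"
            for value in product.get("value", [])]

    # A citation carries its DOI either in "id" or in one of its "sameAs" URLs.
    cited = []
    for citation in record.get("citation", []):
        candidates = [citation["id"], *citation.get("sameAs", [])]
        doi_url = next((u for u in candidates if u.startswith(DOI_PREFIX)), None)
        if doi_url is not None:
            cited.append(doi_url[len(DOI_PREFIX):])

    return {
        "title": record.get("name"),
        "authors": authors,
        "doi": dois[0] if dois else None,
        "cited_dois": cited,
    }


if __name__ == "__main__":
    print(summarize(RECORD_JSON))
```

Checking both `id` and `sameAs` matters because citations that are themselves SciGraph publications use an `sg:` identifier and list the DOI only under `sameAs`.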
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s13748-019-00180-4'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s13748-019-00180-4'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s13748-019-00180-4'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s13748-019-00180-4'
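
The four curl commands differ only in the `Accept` header, so the same content negotiation can be sketched with Python's standard library. This is an illustration under assumptions: `ACCEPT_TYPES`, `build_request`, and `fetch` are names introduced here, and whether the endpoint still answers these requests has not been verified.

```python
from urllib.request import Request, urlopen

# URL taken from the curl commands above.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/s13748-019-00180-4"

# Short format names (hypothetical keys) mapped to the media types
# used in the curl commands above.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request asking for the record in the given format.

    Raises KeyError if fmt is not one of the keys of ACCEPT_TYPES.
    """
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text.

    Performs a network call; it may fail if the service is unavailable.
    """
    with urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds the request; call fetch("turtle") to actually download.
    request = build_request("turtle")
    print(request.full_url, request.get_header("Accept"))
```

Keeping request construction separate from the network call makes the header logic easy to check without contacting the server.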


     

    This table displays all metadata directly associated to this object as RDF triples.

    115 TRIPLES      21 PREDICATES      41 URIs      16 LITERALS      5 BLANK NODES

    Row Subject Predicate Object (a row that omits the subject or predicate repeats the one from the row above)
    1 sg:pub.10.1007/s13748-019-00180-4 schema:about anzsrc-for:17
    2 anzsrc-for:1702
    3 schema:author N377116cc3ba84ddc90ff08e15e9292e4
    4 schema:citation sg:pub.10.1007/978-3-540-71496-5_5
    5 https://doi.org/10.1016/j.eswa.2008.11.022
    6 https://doi.org/10.1109/iccv.2015.474
    7 https://doi.org/10.1109/tkde.2006.130
    8 https://doi.org/10.1111/j.1467-9868.2005.00503.x
    9 https://doi.org/10.1111/j.1467-9868.2011.00771.x
    10 https://doi.org/10.1145/2388676.2388784
    11 https://doi.org/10.1145/2806416.2806475
    12 https://doi.org/10.1561/1500000035
    13 https://doi.org/10.18653/v1/s15-2022
    14 https://doi.org/10.18653/v1/s15-2045
    15 https://doi.org/10.18653/v1/s16-1081
    16 https://doi.org/10.18653/v1/s17-2001
    17 https://doi.org/10.18653/v1/s17-2019
    18 https://doi.org/10.18653/v1/s17-2021
    19 https://doi.org/10.18653/v1/s17-2026
    20 https://doi.org/10.18653/v1/s17-2030
    21 schema:datePublished 2019-03-09
    22 schema:datePublishedReg 2019-03-09
    23 schema:description Semantic textual similarity between sentences is indispensable for many information retrieval tasks. Traditional lexical similarity measures cannot compute the similarity beyond a trivial level. Moreover, they only can capture the textual similarity, but not semantic. In this paper, we propose a method for semantic textual similarity that leverages bilingual word-level semantics to compute the semantic similarity between sentences. To capture word-level semantics, we employ distribute representation of words in two different languages. The similarity function based on the concept-to-concept relationship corresponding to the words is also utilized for the same purpose. Multiple new semantic similarity measures are introduced based on word-embedding models trained on two different corpora in two different languages. Apart from these, another new semantic similarity measure is also introduced using the word sense comparison. The similarity score between the sentences is then computed by applying a linear ranking approach to all proposed measures with their importance score estimated employing a supervised feature selection technique. We conducted experiments on the SemEval Semantic Textual Similarity (STS-2017) test collections. The experimental results demonstrated that our method is effective for measuring semantic textual similarity and outperforms some known related methods.
    24 schema:genre research_article
    25 schema:inLanguage en
    26 schema:isAccessibleForFree false
    27 schema:isPartOf sg:journal.1136525
    28 schema:name Semantic textual similarity between sentences using bilingual word semantics
    29 schema:pagination 1-10
    30 schema:productId N2205522699464984b35a21489777c2a5
    31 N53af9ed1551a4c9e90c28b3891568bb5
    32 N95b3276c5881415480415539105e477b
    33 schema:sameAs https://app.dimensions.ai/details/publication/pub.1112672612
    34 https://doi.org/10.1007/s13748-019-00180-4
    35 schema:sdDatePublished 2019-04-11T11:18
    36 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    37 schema:sdPublisher N7d108525f1424edf981d86ccaf2f0bc2
    38 schema:url https://link.springer.com/10.1007%2Fs13748-019-00180-4
    39 sgo:license sg:explorer/license/
    40 sgo:sdDataset articles
    41 rdf:type schema:ScholarlyArticle
    42 N146b3cb98c714d66984a49c80f4ebcd0 schema:affiliation https://www.grid.ac/institutes/grid.412804.b
    43 schema:familyName Aono
    44 schema:givenName Masaki
    45 rdf:type schema:Person
    46 N2205522699464984b35a21489777c2a5 schema:name readcube_id
    47 schema:value 48e656ba99922bca08ae20730377794efd0fbfbae1f63785f35297bc06fe510f
    48 rdf:type schema:PropertyValue
    49 N2e33e426eadc424aa6f7de6cdf7e6c4b schema:affiliation https://www.grid.ac/institutes/grid.411511.1
    50 schema:familyName Shajalal
    51 schema:givenName Md.
    52 rdf:type schema:Person
    53 N377116cc3ba84ddc90ff08e15e9292e4 rdf:first N2e33e426eadc424aa6f7de6cdf7e6c4b
    54 rdf:rest N54b5e26de15446debca7b314f2dafca7
    55 N53af9ed1551a4c9e90c28b3891568bb5 schema:name dimensions_id
    56 schema:value pub.1112672612
    57 rdf:type schema:PropertyValue
    58 N54b5e26de15446debca7b314f2dafca7 rdf:first N146b3cb98c714d66984a49c80f4ebcd0
    59 rdf:rest rdf:nil
    60 N7d108525f1424edf981d86ccaf2f0bc2 schema:name Springer Nature - SN SciGraph project
    61 rdf:type schema:Organization
    62 N95b3276c5881415480415539105e477b schema:name doi
    63 schema:value 10.1007/s13748-019-00180-4
    64 rdf:type schema:PropertyValue
    65 anzsrc-for:17 schema:inDefinedTermSet anzsrc-for:
    66 schema:name Psychology and Cognitive Sciences
    67 rdf:type schema:DefinedTerm
    68 anzsrc-for:1702 schema:inDefinedTermSet anzsrc-for:
    69 schema:name Cognitive Sciences
    70 rdf:type schema:DefinedTerm
    71 sg:journal.1136525 schema:issn 2192-6352
    72 2192-6360
    73 schema:name Progress in Artificial Intelligence
    74 rdf:type schema:Periodical
    75 sg:pub.10.1007/978-3-540-71496-5_5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030086294
    76 https://doi.org/10.1007/978-3-540-71496-5_5
    77 rdf:type schema:CreativeWork
    78 https://doi.org/10.1016/j.eswa.2008.11.022 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033320061
    79 rdf:type schema:CreativeWork
    80 https://doi.org/10.1109/iccv.2015.474 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094714066
    81 rdf:type schema:CreativeWork
    82 https://doi.org/10.1109/tkde.2006.130 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061661517
    83 rdf:type schema:CreativeWork
    84 https://doi.org/10.1111/j.1467-9868.2005.00503.x schema:sameAs https://app.dimensions.ai/details/publication/pub.1043971564
    85 rdf:type schema:CreativeWork
    86 https://doi.org/10.1111/j.1467-9868.2011.00771.x schema:sameAs https://app.dimensions.ai/details/publication/pub.1035785610
    87 rdf:type schema:CreativeWork
    88 https://doi.org/10.1145/2388676.2388784 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022038979
    89 rdf:type schema:CreativeWork
    90 https://doi.org/10.1145/2806416.2806475 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011865814
    91 rdf:type schema:CreativeWork
    92 https://doi.org/10.1561/1500000035 schema:sameAs https://app.dimensions.ai/details/publication/pub.1068001295
    93 rdf:type schema:CreativeWork
    94 https://doi.org/10.18653/v1/s15-2022 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099096237
    95 rdf:type schema:CreativeWork
    96 https://doi.org/10.18653/v1/s15-2045 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099096261
    97 rdf:type schema:CreativeWork
    98 https://doi.org/10.18653/v1/s16-1081 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099151417
    99 rdf:type schema:CreativeWork
    100 https://doi.org/10.18653/v1/s17-2001 schema:sameAs https://app.dimensions.ai/details/publication/pub.1100731697
    101 rdf:type schema:CreativeWork
    102 https://doi.org/10.18653/v1/s17-2019 schema:sameAs https://app.dimensions.ai/details/publication/pub.1100731715
    103 rdf:type schema:CreativeWork
    104 https://doi.org/10.18653/v1/s17-2021 schema:sameAs https://app.dimensions.ai/details/publication/pub.1100731717
    105 rdf:type schema:CreativeWork
    106 https://doi.org/10.18653/v1/s17-2026 schema:sameAs https://app.dimensions.ai/details/publication/pub.1100731722
    107 rdf:type schema:CreativeWork
    108 https://doi.org/10.18653/v1/s17-2030 schema:sameAs https://app.dimensions.ai/details/publication/pub.1100731726
    109 rdf:type schema:CreativeWork
    110 https://www.grid.ac/institutes/grid.411511.1 schema:alternateName Bangladesh Agricultural University
    111 schema:name Department of Computer Science and Mathematics, Bangladesh Agricultural University, 2202, Mymensingh, Bangladesh
    112 rdf:type schema:Organization
    113 https://www.grid.ac/institutes/grid.412804.b schema:alternateName Toyohashi University of Technology
    114 schema:name Department of Computer Science and Engineering, Toyohashi University of Technology, Toyohashi, Aichi, Japan
    115 rdf:type schema:Organization
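
In the table above, the author order is encoded as an RDF list: `schema:author` points to a blank node, and each list node has an `rdf:first` (the item) and an `rdf:rest` (the next node, ending at `rdf:nil`). The sketch below walks that chain over a hand-copied subset of the triples; `TRIPLES`, `objects`, and `walk_list` are illustrative names, and a real application would more likely load the data with an RDF library.

```python
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"

# Subset of the triples listed above, copied by hand as
# (subject, predicate, object) tuples.
TRIPLES = [
    ("sg:pub.10.1007/s13748-019-00180-4", "schema:author",
     "N377116cc3ba84ddc90ff08e15e9292e4"),
    ("N377116cc3ba84ddc90ff08e15e9292e4", RDF_FIRST,
     "N2e33e426eadc424aa6f7de6cdf7e6c4b"),
    ("N377116cc3ba84ddc90ff08e15e9292e4", RDF_REST,
     "N54b5e26de15446debca7b314f2dafca7"),
    ("N54b5e26de15446debca7b314f2dafca7", RDF_FIRST,
     "N146b3cb98c714d66984a49c80f4ebcd0"),
    ("N54b5e26de15446debca7b314f2dafca7", RDF_REST, RDF_NIL),
    ("N2e33e426eadc424aa6f7de6cdf7e6c4b", "schema:familyName", "Shajalal"),
    ("N146b3cb98c714d66984a49c80f4ebcd0", "schema:familyName", "Aono"),
]


def objects(triples, subject, predicate):
    """Return every object of the triples matching subject and predicate."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def walk_list(triples, head):
    """Return the items of the RDF list that starts at node `head`, in order."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle in RDF list at {node}")
        seen.add(node)
        firsts = objects(triples, node, RDF_FIRST)
        rests = objects(triples, node, RDF_REST)
        if len(firsts) != 1 or len(rests) != 1:
            raise ValueError(f"malformed RDF list node {node}")
        items.append(firsts[0])
        node = rests[0]
    return items


if __name__ == "__main__":
    head = objects(TRIPLES, "sg:pub.10.1007/s13748-019-00180-4",
                   "schema:author")[0]
    for person in walk_list(TRIPLES, head):
        print(objects(TRIPLES, person, "schema:familyName")[0])
```

The list encoding is what preserves author order, since a plain set of `schema:author` triples would be unordered.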
     



