Semantic textual similarity between sentences using bilingual word semantics


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2019-03-09

AUTHORS

Md. Shajalal, Masaki Aono

ABSTRACT

Semantic textual similarity between sentences is indispensable for many information retrieval tasks. Traditional lexical similarity measures cannot compute similarity beyond a trivial level; they capture only textual similarity, not semantic similarity. In this paper, we propose a method for semantic textual similarity that leverages bilingual word-level semantics to compute the semantic similarity between sentences. To capture word-level semantics, we employ distributed representations of words in two different languages. A similarity function based on the concept-to-concept relationships corresponding to the words is also used for the same purpose. Multiple new semantic similarity measures are introduced based on word-embedding models trained on two different corpora in two different languages. In addition, another new semantic similarity measure based on word sense comparison is introduced. The similarity score between the sentences is then computed by applying a linear ranking approach to all proposed measures, with their importance scores estimated by a supervised feature selection technique. We conducted experiments on the SemEval Semantic Textual Similarity (STS-2017) test collections. The experimental results demonstrate that our method is effective for measuring semantic textual similarity and outperforms some known related methods.
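
The abstract describes two ingredients that can be illustrated in a few lines: similarity between embedding-based sentence representations, and a linear combination of several similarity measures weighted by importance. The following is a minimal sketch under stated assumptions, not the authors' implementation. Averaging word vectors into a sentence vector, the function names, and the normalisation of the weights are all choices made here for illustration; the record does not say how the paper builds its measures or estimates its weights.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def sentence_vector(tokens, embeddings):
    """Average the vectors of the tokens found in `embeddings` (an assumption,
    not necessarily how the paper represents sentences)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]

def combine(scores, weights):
    """Weighted linear combination of named similarity scores.

    `weights` plays the role of the importance scores; dividing by their sum
    is an illustrative choice that keeps the result on the scale of the inputs.
    """
    total = sum(weights.values())
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(weights[name] * scores[name] for name in weights) / total
```

In use, one score per measure (for example one per language-specific embedding model) would be collected in `scores` and passed to `combine` together with the learned weights.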

PAGES

1-10

References to SciGraph publications

  • 2007. Similarity Measures for Short Segments of Text in ADVANCES IN INFORMATION RETRIEVAL

Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/s13748-019-00180-4

    DOI

    http://dx.doi.org/10.1007/s13748-019-00180-4

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1112672612



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/1702", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Cognitive Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/17", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Psychology and Cognitive Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Bangladesh Agricultural University", 
              "id": "https://www.grid.ac/institutes/grid.411511.1", 
              "name": [
                "Department of Computer Science and Mathematics, Bangladesh Agricultural University, 2202, Mymensingh, Bangladesh"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Shajalal", 
            "givenName": "Md.", 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Toyohashi University of Technology", 
              "id": "https://www.grid.ac/institutes/grid.412804.b", 
              "name": [
                "Department of Computer Science and Engineering, Toyohashi University of Technology, Toyohashi, Aichi, Japan"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Aono", 
            "givenName": "Masaki", 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.1145/2806416.2806475", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1011865814"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/2388676.2388784", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1022038979"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-540-71496-5_5", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1030086294", 
              "https://doi.org/10.1007/978-3-540-71496-5_5"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/j.eswa.2008.11.022", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1033320061"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1111/j.1467-9868.2011.00771.x", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1035785610"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1111/j.1467-9868.2005.00503.x", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1043971564"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/tkde.2006.130", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1061661517"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1561/1500000035", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1068001295"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/iccv.2015.474", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1094714066"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.18653/v1/s15-2022", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1099096237"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.18653/v1/s15-2045", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1099096261"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.18653/v1/s16-1081", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1099151417"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.18653/v1/s17-2001", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1100731697"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.18653/v1/s17-2019", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1100731715"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.18653/v1/s17-2021", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1100731717"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.18653/v1/s17-2026", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1100731722"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.18653/v1/s17-2030", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1100731726"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2019-03-09", 
        "datePublishedReg": "2019-03-09", 
        "description": "Semantic textual similarity between sentences is indispensable for many information retrieval tasks. Traditional lexical similarity measures cannot compute the similarity beyond a trivial level. Moreover, they only can capture the textual similarity, but not semantic. In this paper, we propose a method for semantic textual similarity that leverages bilingual word-level semantics to compute the semantic similarity between sentences. To capture word-level semantics, we employ distribute representation of words in two different languages. The similarity function based on the concept-to-concept relationship corresponding to the words is also utilized for the same purpose. Multiple new semantic similarity measures are introduced based on word-embedding models trained on two different corpora in two different languages. Apart from these, another new semantic similarity measure is also introduced using the word sense comparison. The similarity score between the sentences is then computed by applying a linear ranking approach to all proposed measures with their importance score estimated employing a supervised feature selection technique. We conducted experiments on the SemEval Semantic Textual Similarity (STS-2017) test collections. The experimental results demonstrated that our method is effective for measuring semantic textual similarity and outperforms some known related methods.", 
        "genre": "research_article", 
        "id": "sg:pub.10.1007/s13748-019-00180-4", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": false, 
        "isPartOf": [
          {
            "id": "sg:journal.1136525", 
            "issn": [
              "2192-6352", 
              "2192-6360"
            ], 
            "name": "Progress in Artificial Intelligence", 
            "type": "Periodical"
          }
        ], 
        "name": "Semantic textual similarity between sentences using bilingual word semantics", 
        "pagination": "1-10", 
        "productId": [
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "48e656ba99922bca08ae20730377794efd0fbfbae1f63785f35297bc06fe510f"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/s13748-019-00180-4"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1112672612"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1007/s13748-019-00180-4", 
          "https://app.dimensions.ai/details/publication/pub.1112672612"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2019-04-11T11:18", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000354_0000000354/records_11701_00000002.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://link.springer.com/10.1007%2Fs13748-019-00180-4"
      }
    ]
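
Because JSON-LD is plain JSON, the record above can be read with a standard JSON parser. The sketch below is only an illustration: it embeds a trimmed copy of the record (two citations, with most fields dropped), and the helper names `author_names` and `cited_dois` are introduced here, not part of SciGraph. It assumes, as in the record above, that a citation's DOI appears either in `id` or in `sameAs`.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Shajalal", "givenName": "Md.", "type": "Person"},
      {"familyName": "Aono", "givenName": "Masaki", "type": "Person"}
    ],
    "citation": [
      {"id": "https://doi.org/10.1145/2806416.2806475", "type": "CreativeWork"},
      {"id": "sg:pub.10.1007/978-3-540-71496-5_5",
       "sameAs": ["https://app.dimensions.ai/details/publication/pub.1030086294",
                  "https://doi.org/10.1007/978-3-540-71496-5_5"],
       "type": "CreativeWork"}
    ],
    "name": "Semantic textual similarity between sentences using bilingual word semantics"
  }
]
"""

DOI_PREFIX = "https://doi.org/"

def author_names(record):
    """Return authors as 'given family' strings, in record order."""
    return [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]

def cited_dois(record):
    """Return one DOI per citation, looking in `id` first and then in `sameAs`."""
    dois = []
    for citation in record.get("citation", []):
        candidates = [citation.get("id", "")] + citation.get("sameAs", [])
        for uri in candidates:
            if uri.startswith(DOI_PREFIX):
                dois.append(uri[len(DOI_PREFIX):])
                break
    return dois

# The top level of the document is a one-element array.
record = json.loads(RECORD_JSON)[0]
print(author_names(record))
print(cited_dois(record))
```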
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s13748-019-00180-4'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s13748-019-00180-4'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s13748-019-00180-4'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s13748-019-00180-4'
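
The four curl commands differ only in the `Accept` header, so the same content negotiation can be sketched with Python's standard library. This is a hedged illustration: the URL and media types are copied from the commands above, while the short format labels and the function names `build_request` and `fetch` are assumptions made here. Whether the endpoint still responds is outside the scope of the sketch.

```python
from urllib.request import Request, urlopen

# Record URL taken from the curl commands above.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/s13748-019-00180-4"

# Media types from the curl commands, keyed by labels chosen for this sketch.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}

def build_request(fmt, url=RECORD_URL):
    """Return a GET request whose Accept header selects the given format."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unknown format: {fmt!r}")
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]})

def fetch(fmt, url=RECORD_URL, timeout=10.0):
    """Send the request and return the body as text (requires network access)."""
    with urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch("turtle")` would be the counterpart of the third curl command; `build_request` is kept separate so the header logic can be checked without touching the network.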


     

    This table displays all metadata directly associated to this object as RDF triples.

    115 TRIPLES      21 PREDICATES      41 URIs      16 LITERALS      5 BLANK NODES

    Row Subject Predicate Object (a row that omits the subject or predicate repeats the one from the row above)
    1 sg:pub.10.1007/s13748-019-00180-4 schema:about anzsrc-for:17
    2 anzsrc-for:1702
    3 schema:author N4ff361548f2741c7a455b23fc3a075ee
    4 schema:citation sg:pub.10.1007/978-3-540-71496-5_5
    5 https://doi.org/10.1016/j.eswa.2008.11.022
    6 https://doi.org/10.1109/iccv.2015.474
    7 https://doi.org/10.1109/tkde.2006.130
    8 https://doi.org/10.1111/j.1467-9868.2005.00503.x
    9 https://doi.org/10.1111/j.1467-9868.2011.00771.x
    10 https://doi.org/10.1145/2388676.2388784
    11 https://doi.org/10.1145/2806416.2806475
    12 https://doi.org/10.1561/1500000035
    13 https://doi.org/10.18653/v1/s15-2022
    14 https://doi.org/10.18653/v1/s15-2045
    15 https://doi.org/10.18653/v1/s16-1081
    16 https://doi.org/10.18653/v1/s17-2001
    17 https://doi.org/10.18653/v1/s17-2019
    18 https://doi.org/10.18653/v1/s17-2021
    19 https://doi.org/10.18653/v1/s17-2026
    20 https://doi.org/10.18653/v1/s17-2030
    21 schema:datePublished 2019-03-09
    22 schema:datePublishedReg 2019-03-09
    23 schema:description Semantic textual similarity between sentences is indispensable for many information retrieval tasks. Traditional lexical similarity measures cannot compute the similarity beyond a trivial level. Moreover, they only can capture the textual similarity, but not semantic. In this paper, we propose a method for semantic textual similarity that leverages bilingual word-level semantics to compute the semantic similarity between sentences. To capture word-level semantics, we employ distribute representation of words in two different languages. The similarity function based on the concept-to-concept relationship corresponding to the words is also utilized for the same purpose. Multiple new semantic similarity measures are introduced based on word-embedding models trained on two different corpora in two different languages. Apart from these, another new semantic similarity measure is also introduced using the word sense comparison. The similarity score between the sentences is then computed by applying a linear ranking approach to all proposed measures with their importance score estimated employing a supervised feature selection technique. We conducted experiments on the SemEval Semantic Textual Similarity (STS-2017) test collections. The experimental results demonstrated that our method is effective for measuring semantic textual similarity and outperforms some known related methods.
    24 schema:genre research_article
    25 schema:inLanguage en
    26 schema:isAccessibleForFree false
    27 schema:isPartOf sg:journal.1136525
    28 schema:name Semantic textual similarity between sentences using bilingual word semantics
    29 schema:pagination 1-10
    30 schema:productId N5742f0c14e164efd95045c874a6443b8
    31 N873ee04718944443bdd5a52d4af0fa95
    32 Na0ef9e1183d44b80ab118de11fd8abe2
    33 schema:sameAs https://app.dimensions.ai/details/publication/pub.1112672612
    34 https://doi.org/10.1007/s13748-019-00180-4
    35 schema:sdDatePublished 2019-04-11T11:18
    36 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    37 schema:sdPublisher N9bbea8e60bd84669a59910b8276893d5
    38 schema:url https://link.springer.com/10.1007%2Fs13748-019-00180-4
    39 sgo:license sg:explorer/license/
    40 sgo:sdDataset articles
    41 rdf:type schema:ScholarlyArticle
    42 N35e4b9031ad74cc09e7e016c060dd236 schema:affiliation https://www.grid.ac/institutes/grid.412804.b
    43 schema:familyName Aono
    44 schema:givenName Masaki
    45 rdf:type schema:Person
    46 N4ff361548f2741c7a455b23fc3a075ee rdf:first N84b5ba40420d4f158421b02ba77074ef
    47 rdf:rest N68970c85d2fc428099edf3674fa3bbbe
    48 N5742f0c14e164efd95045c874a6443b8 schema:name dimensions_id
    49 schema:value pub.1112672612
    50 rdf:type schema:PropertyValue
    51 N68970c85d2fc428099edf3674fa3bbbe rdf:first N35e4b9031ad74cc09e7e016c060dd236
    52 rdf:rest rdf:nil
    53 N84b5ba40420d4f158421b02ba77074ef schema:affiliation https://www.grid.ac/institutes/grid.411511.1
    54 schema:familyName Shajalal
    55 schema:givenName Md.
    56 rdf:type schema:Person
    57 N873ee04718944443bdd5a52d4af0fa95 schema:name doi
    58 schema:value 10.1007/s13748-019-00180-4
    59 rdf:type schema:PropertyValue
    60 N9bbea8e60bd84669a59910b8276893d5 schema:name Springer Nature - SN SciGraph project
    61 rdf:type schema:Organization
    62 Na0ef9e1183d44b80ab118de11fd8abe2 schema:name readcube_id
    63 schema:value 48e656ba99922bca08ae20730377794efd0fbfbae1f63785f35297bc06fe510f
    64 rdf:type schema:PropertyValue
    65 anzsrc-for:17 schema:inDefinedTermSet anzsrc-for:
    66 schema:name Psychology and Cognitive Sciences
    67 rdf:type schema:DefinedTerm
    68 anzsrc-for:1702 schema:inDefinedTermSet anzsrc-for:
    69 schema:name Cognitive Sciences
    70 rdf:type schema:DefinedTerm
    71 sg:journal.1136525 schema:issn 2192-6352
    72 2192-6360
    73 schema:name Progress in Artificial Intelligence
    74 rdf:type schema:Periodical
    75 sg:pub.10.1007/978-3-540-71496-5_5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030086294
    76 https://doi.org/10.1007/978-3-540-71496-5_5
    77 rdf:type schema:CreativeWork
    78 https://doi.org/10.1016/j.eswa.2008.11.022 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033320061
    79 rdf:type schema:CreativeWork
    80 https://doi.org/10.1109/iccv.2015.474 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094714066
    81 rdf:type schema:CreativeWork
    82 https://doi.org/10.1109/tkde.2006.130 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061661517
    83 rdf:type schema:CreativeWork
    84 https://doi.org/10.1111/j.1467-9868.2005.00503.x schema:sameAs https://app.dimensions.ai/details/publication/pub.1043971564
    85 rdf:type schema:CreativeWork
    86 https://doi.org/10.1111/j.1467-9868.2011.00771.x schema:sameAs https://app.dimensions.ai/details/publication/pub.1035785610
    87 rdf:type schema:CreativeWork
    88 https://doi.org/10.1145/2388676.2388784 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022038979
    89 rdf:type schema:CreativeWork
    90 https://doi.org/10.1145/2806416.2806475 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011865814
    91 rdf:type schema:CreativeWork
    92 https://doi.org/10.1561/1500000035 schema:sameAs https://app.dimensions.ai/details/publication/pub.1068001295
    93 rdf:type schema:CreativeWork
    94 https://doi.org/10.18653/v1/s15-2022 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099096237
    95 rdf:type schema:CreativeWork
    96 https://doi.org/10.18653/v1/s15-2045 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099096261
    97 rdf:type schema:CreativeWork
    98 https://doi.org/10.18653/v1/s16-1081 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099151417
    99 rdf:type schema:CreativeWork
    100 https://doi.org/10.18653/v1/s17-2001 schema:sameAs https://app.dimensions.ai/details/publication/pub.1100731697
    101 rdf:type schema:CreativeWork
    102 https://doi.org/10.18653/v1/s17-2019 schema:sameAs https://app.dimensions.ai/details/publication/pub.1100731715
    103 rdf:type schema:CreativeWork
    104 https://doi.org/10.18653/v1/s17-2021 schema:sameAs https://app.dimensions.ai/details/publication/pub.1100731717
    105 rdf:type schema:CreativeWork
    106 https://doi.org/10.18653/v1/s17-2026 schema:sameAs https://app.dimensions.ai/details/publication/pub.1100731722
    107 rdf:type schema:CreativeWork
    108 https://doi.org/10.18653/v1/s17-2030 schema:sameAs https://app.dimensions.ai/details/publication/pub.1100731726
    109 rdf:type schema:CreativeWork
    110 https://www.grid.ac/institutes/grid.411511.1 schema:alternateName Bangladesh Agricultural University
    111 schema:name Department of Computer Science and Mathematics, Bangladesh Agricultural University, 2202, Mymensingh, Bangladesh
    112 rdf:type schema:Organization
    113 https://www.grid.ac/institutes/grid.412804.b schema:alternateName Toyohashi University of Technology
    114 schema:name Department of Computer Science and Engineering, Toyohashi University of Technology, Toyohashi, Aichi, Japan
    115 rdf:type schema:Organization
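
In the table above, the author order is stored as an RDF list: the `schema:author` object is a blank node whose `rdf:first` points at the first author and whose `rdf:rest` points at the next list node, until `rdf:nil` ends the chain. The sketch below walks that chain using plain tuples copied from a handful of the rows. It is an illustration only; the helper names `index` and `walk_list` are introduced here, and a real application would more likely use an RDF library.

```python
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"

ARTICLE = "sg:pub.10.1007/s13748-019-00180-4"

# (subject, predicate, object) triples copied from selected rows of the table.
TRIPLES = [
    (ARTICLE, "schema:author", "N4ff361548f2741c7a455b23fc3a075ee"),
    ("N4ff361548f2741c7a455b23fc3a075ee", RDF_FIRST, "N84b5ba40420d4f158421b02ba77074ef"),
    ("N4ff361548f2741c7a455b23fc3a075ee", RDF_REST, "N68970c85d2fc428099edf3674fa3bbbe"),
    ("N68970c85d2fc428099edf3674fa3bbbe", RDF_FIRST, "N35e4b9031ad74cc09e7e016c060dd236"),
    ("N68970c85d2fc428099edf3674fa3bbbe", RDF_REST, RDF_NIL),
    ("N84b5ba40420d4f158421b02ba77074ef", "schema:familyName", "Shajalal"),
    ("N35e4b9031ad74cc09e7e016c060dd236", "schema:familyName", "Aono"),
]

def index(triples):
    """Map (subject, predicate) to object; enough here because each pair is unique."""
    return {(s, p): o for s, p, o in triples}

def walk_list(head, lookup):
    """Follow rdf:first / rdf:rest from `head` until rdf:nil, returning the items in order."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError("cycle in RDF list")
        seen.add(node)
        items.append(lookup[(node, RDF_FIRST)])
        node = lookup[(node, RDF_REST)]
    return items

lookup = index(TRIPLES)
head = lookup[(ARTICLE, "schema:author")]
authors = [lookup[(person, "schema:familyName")] for person in walk_list(head, lookup)]
print(authors)
```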
     



