Understanding Synonymous Referring Expressions via Contrastive Features


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2022-08-09

AUTHORS

Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

ABSTRACT

Referring expression comprehension aims to localize objects identified by natural language descriptions. This is a challenging task as it requires understanding of both visual and language domains. One nature is that each object can be described by synonymous sentences with paraphrases, and such varieties in languages have critical impact on learning a comprehension model. While prior work usually treats each sentence and attends it to an object separately, we focus on learning a referring expression comprehension model that considers the property in synonymous sentences. To this end, we develop an end-to-end trainable framework to learn contrastive features on the image and object instance levels, where features extracted from synonymous sentences to describe the same object should be closer to each other after mapping to the visual domain. We conduct extensive experiments to evaluate the proposed algorithm on several benchmark datasets, and demonstrate that our method performs favorably against the state-of-the-art approaches. Furthermore, since the varieties in expressions become larger across datasets when they describe objects in different ways, we present the cross-dataset and transfer learning settings to validate the ability of our learned transferable features.
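The contrastive objective the abstract describes — embeddings of synonymous sentences for the same object pulled together, others pushed apart — can be illustrated with a simple InfoNCE-style loss. This is a generic sketch for intuition only; the function name, temperature value, and formulation are assumptions, not the authors' implementation.

```python
import numpy as np

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """Illustrative InfoNCE-style loss (NOT the paper's exact objective).

    anchor, positive: (d,) unit-norm feature vectors that should match
    (e.g. two synonymous sentences mapped to the same object).
    negatives: (n, d) unit-norm features that should not match the anchor.
    """
    pos = np.dot(anchor, positive) / temperature      # similarity to the positive
    negs = negatives @ anchor / temperature           # similarities to negatives
    logits = np.concatenate([[pos], negs])
    # Cross-entropy with the positive as the target class (log-sum-exp form).
    return -(pos - np.log(np.sum(np.exp(logits))))
```

When the anchor and positive are close and the negatives are far, the loss is near zero; swapping the roles of positive and negative makes it large, which is the pull-together/push-apart behavior described above.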

PAGES

2501-2516

References to SciGraph publications

  • 2020-09-24. UNITER: UNiversal Image-TExt Representation Learning in COMPUTER VISION – ECCV 2020
  • 2016-09-17. Grounding of Textual Phrases in Images by Reconstruction in COMPUTER VISION – ECCV 2016
  • 2020-11-13. Propagating Over Phrase Relations for One-Stage Visual Grounding in COMPUTER VISION – ECCV 2020
  • 2020-09-24. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks in COMPUTER VISION – ECCV 2020
  • 2020-11-13. Improving One-Stage Visual Grounding by Recursive Sub-query Construction in COMPUTER VISION – ECCV 2020
  • 2016-09-17. Modeling Context Between Objects for Referring Expression Understanding in COMPUTER VISION – ECCV 2016
  • 2016-09-17. Modeling Context in Referring Expressions in COMPUTER VISION – ECCV 2016
  • 2014. Microsoft COCO: Common Objects in Context in COMPUTER VISION – ECCV 2014
Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/s11263-022-01647-z

    DOI

    http://dx.doi.org/10.1007/s11263-022-01647-z

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1150098893



    JSON-LD is the canonical representation for SciGraph data.

    TIP: You can open this SciGraph record using an external JSON-LD service such as the JSON-LD Playground or Google SDTT.

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Artificial Intelligence and Image Processing", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "University of California, Merced, CA, USA", 
              "id": "http://www.grid.ac/institutes/grid.266096.d", 
              "name": [
                "University of California, Merced, CA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Chen", 
            "givenName": "Yi-Wen", 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Phiar, Redwood City, CA, USA", 
              "id": "http://www.grid.ac/institutes/grid.505069.f", 
              "name": [
                "Phiar, Redwood City, CA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Tsai", 
            "givenName": "Yi-Hsuan", 
            "id": "sg:person.013455650265.29", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013455650265.29"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of California, Merced, CA, USA", 
              "id": "http://www.grid.ac/institutes/grid.266096.d", 
              "name": [
                "University of California, Merced, CA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Yang", 
            "givenName": "Ming-Hsuan", 
            "id": "sg:person.014034722075.82", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014034722075.82"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/978-3-319-46475-6_5", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1042471266", 
              "https://doi.org/10.1007/978-3-319-46475-6_5"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-030-58529-7_35", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1132590143", 
              "https://doi.org/10.1007/978-3-030-58529-7_35"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-030-58568-6_23", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1132591258", 
              "https://doi.org/10.1007/978-3-030-58568-6_23"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-030-58577-8_7", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1131136428", 
              "https://doi.org/10.1007/978-3-030-58577-8_7"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-319-46448-0_49", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1039098005", 
              "https://doi.org/10.1007/978-3-319-46448-0_49"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-319-46493-0_48", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1029888027", 
              "https://doi.org/10.1007/978-3-319-46493-0_48"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-319-10602-1_48", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1045321436", 
              "https://doi.org/10.1007/978-3-319-10602-1_48"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-030-58577-8_8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1131132046", 
              "https://doi.org/10.1007/978-3-030-58577-8_8"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2022-08-09", 
        "datePublishedReg": "2022-08-09", 
        "description": "Referring expression comprehension aims to localize objects identified by natural language descriptions. This is a challenging task as it requires understanding of both visual and language domains. One nature is that each object can be described by synonymous sentences with paraphrases, and such varieties in languages have critical impact on learning a comprehension model. While prior work usually treats each sentence and attends it to an object separately, we focus on learning a referring expression comprehension model that considers the property in synonymous sentences. To this end, we develop an end-to-end trainable framework to learn contrastive features on the image and object instance levels, where features extracted from synonymous sentences to describe the same object should be closer to each other after mapping to the visual domain. We conduct extensive experiments to evaluate the proposed algorithm on several benchmark datasets, and demonstrate that our method performs favorably against the state-of-the-art approaches. Furthermore, since the varieties in expressions become larger across datasets when they describe objects in different ways, we present the cross-dataset and transfer learning settings to validate the ability of our learned transferable features.", 
        "genre": "article", 
        "id": "sg:pub.10.1007/s11263-022-01647-z", 
        "isAccessibleForFree": true, 
        "isFundedItemOf": [
          {
            "id": "sg:grant.3132922", 
            "type": "MonetaryGrant"
          }
        ], 
        "isPartOf": [
          {
            "id": "sg:journal.1032807", 
            "issn": [
              "0920-5691", 
              "1573-1405"
            ], 
            "name": "International Journal of Computer Vision", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "10", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "130"
          }
        ], 
        "keywords": [
          "comprehension model", 
          "contrastive features", 
          "visual domain", 
          "expression comprehension", 
          "language domains", 
          "referring expressions", 
          "natural language descriptions", 
          "same object", 
          "end trainable framework", 
          "sentences", 
          "language descriptions", 
          "trainable framework", 
          "prior work", 
          "objects", 
          "comprehension", 
          "benchmark datasets", 
          "transferable features", 
          "task", 
          "different ways", 
          "art approaches", 
          "language", 
          "Extensive experiments", 
          "instance level", 
          "challenging task", 
          "domain", 
          "paraphrases", 
          "ability", 
          "critical impact", 
          "features", 
          "understanding", 
          "dataset", 
          "setting", 
          "model", 
          "variety", 
          "way", 
          "framework", 
          "nature", 
          "impact", 
          "images", 
          "experiments", 
          "work", 
          "end", 
          "levels", 
          "approach", 
          "description", 
          "such varieties", 
          "state", 
          "expression", 
          "method", 
          "transfer", 
          "algorithm", 
          "synonymous sentences", 
          "properties"
        ], 
        "name": "Understanding Synonymous Referring Expressions via Contrastive Features", 
        "pagination": "2501-2516", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1150098893"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/s11263-022-01647-z"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1007/s11263-022-01647-z", 
          "https://app.dimensions.ai/details/publication/pub.1150098893"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-11-24T21:09", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20221124/entities/gbq_results/article/article_939.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1007/s11263-022-01647-z"
      }
    ]
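Since the record above is plain JSON-LD, its fields can be read with any JSON parser. A minimal sketch, using a trimmed copy of the record that keeps only the keys it actually touches:

```python
import json

# Trimmed subset of the SciGraph JSON-LD record shown above.
record_jsonld = """
[
  {
    "name": "Understanding Synonymous Referring Expressions via Contrastive Features",
    "datePublished": "2022-08-09",
    "pagination": "2501-2516",
    "author": [
      {"familyName": "Chen", "givenName": "Yi-Wen", "type": "Person"},
      {"familyName": "Tsai", "givenName": "Yi-Hsuan", "type": "Person"},
      {"familyName": "Yang", "givenName": "Ming-Hsuan", "type": "Person"}
    ],
    "sameAs": ["https://doi.org/10.1007/s11263-022-01647-z"]
  }
]
"""

record = json.loads(record_jsonld)[0]  # the top level is a one-element list
title = record["name"]
authors = [f'{a["givenName"]} {a["familyName"]}' for a in record["author"]]
doi = record["sameAs"][0].removeprefix("https://doi.org/")
```

The same keys (`name`, `author`, `sameAs`, `pagination`) appear in the full record, so this works unchanged on a freshly fetched copy.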
     

    The RDF metadata can be downloaded as JSON-LD, N-Triples, Turtle, or RDF/XML.

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s11263-022-01647-z'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s11263-022-01647-z'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s11263-022-01647-z'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s11263-022-01647-z'
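The same content negotiation can be scripted. A minimal Python sketch, equivalent to the curl commands above; it only constructs the requests (pass one to `urllib.request.urlopen` to actually fetch):

```python
import urllib.request

URL = "https://scigraph.springernature.com/pub.10.1007/s11263-022-01647-z"

# Media types matching the four serializations listed above.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}

def build_request(fmt: str) -> urllib.request.Request:
    """Return a Request whose Accept header selects the given RDF serialization."""
    return urllib.request.Request(URL, headers={"Accept": MEDIA_TYPES[fmt]})
```

For example, `urllib.request.urlopen(build_request("turtle"))` retrieves the Turtle serialization, exactly as the third curl command does.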


     

    This table displays all metadata directly associated to this object as RDF triples.

    160 TRIPLES      21 PREDICATES      85 URIs      69 LITERALS      6 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/s11263-022-01647-z schema:about anzsrc-for:08
    2 anzsrc-for:0801
    3 schema:author N6dff4c4872bb4cf0b2c704ab7f3e00f0
    4 schema:citation sg:pub.10.1007/978-3-030-58529-7_35
    5 sg:pub.10.1007/978-3-030-58568-6_23
    6 sg:pub.10.1007/978-3-030-58577-8_7
    7 sg:pub.10.1007/978-3-030-58577-8_8
    8 sg:pub.10.1007/978-3-319-10602-1_48
    9 sg:pub.10.1007/978-3-319-46448-0_49
    10 sg:pub.10.1007/978-3-319-46475-6_5
    11 sg:pub.10.1007/978-3-319-46493-0_48
    12 schema:datePublished 2022-08-09
    13 schema:datePublishedReg 2022-08-09
    14 schema:description Referring expression comprehension aims to localize objects identified by natural language descriptions. This is a challenging task as it requires understanding of both visual and language domains. One nature is that each object can be described by synonymous sentences with paraphrases, and such varieties in languages have critical impact on learning a comprehension model. While prior work usually treats each sentence and attends it to an object separately, we focus on learning a referring expression comprehension model that considers the property in synonymous sentences. To this end, we develop an end-to-end trainable framework to learn contrastive features on the image and object instance levels, where features extracted from synonymous sentences to describe the same object should be closer to each other after mapping to the visual domain. We conduct extensive experiments to evaluate the proposed algorithm on several benchmark datasets, and demonstrate that our method performs favorably against the state-of-the-art approaches. Furthermore, since the varieties in expressions become larger across datasets when they describe objects in different ways, we present the cross-dataset and transfer learning settings to validate the ability of our learned transferable features.
    15 schema:genre article
    16 schema:isAccessibleForFree true
    17 schema:isPartOf N0fe329faf23f478f9f809bb37e819944
    18 Nc24dce7dbf0441d3886fc9937f7b20f3
    19 sg:journal.1032807
    20 schema:keywords Extensive experiments
    21 ability
    22 algorithm
    23 approach
    24 art approaches
    25 benchmark datasets
    26 challenging task
    27 comprehension
    28 comprehension model
    29 contrastive features
    30 critical impact
    31 dataset
    32 description
    33 different ways
    34 domain
    35 end
    36 end trainable framework
    37 experiments
    38 expression
    39 expression comprehension
    40 features
    41 framework
    42 images
    43 impact
    44 instance level
    45 language
    46 language descriptions
    47 language domains
    48 levels
    49 method
    50 model
    51 natural language descriptions
    52 nature
    53 objects
    54 paraphrases
    55 prior work
    56 properties
    57 referring expressions
    58 same object
    59 sentences
    60 setting
    61 state
    62 such varieties
    63 synonymous sentences
    64 task
    65 trainable framework
    66 transfer
    67 transferable features
    68 understanding
    69 variety
    70 visual domain
    71 way
    72 work
    73 schema:name Understanding Synonymous Referring Expressions via Contrastive Features
    74 schema:pagination 2501-2516
    75 schema:productId N10f00a8eb31542feae86340c9bb8828c
    76 Ndababb8278f54ea8a29e6cdd1bdd8905
    77 schema:sameAs https://app.dimensions.ai/details/publication/pub.1150098893
    78 https://doi.org/10.1007/s11263-022-01647-z
    79 schema:sdDatePublished 2022-11-24T21:09
    80 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    81 schema:sdPublisher Nfc68f1019ac94dccb6b8bd638baa6036
    82 schema:url https://doi.org/10.1007/s11263-022-01647-z
    83 sgo:license sg:explorer/license/
    84 sgo:sdDataset articles
    85 rdf:type schema:ScholarlyArticle
    86 N0fe329faf23f478f9f809bb37e819944 schema:volumeNumber 130
    87 rdf:type schema:PublicationVolume
    88 N10f00a8eb31542feae86340c9bb8828c schema:name dimensions_id
    89 schema:value pub.1150098893
    90 rdf:type schema:PropertyValue
    91 N18c3276baefa4b61b3a85d834d1b6f03 rdf:first sg:person.013455650265.29
    92 rdf:rest N8e39e355b347403eb0fe63cf6df0625b
    93 N4335cc3b853145bdbc0b8eb4ed470881 schema:affiliation grid-institutes:grid.266096.d
    94 schema:familyName Chen
    95 schema:givenName Yi-Wen
    96 rdf:type schema:Person
    97 N6dff4c4872bb4cf0b2c704ab7f3e00f0 rdf:first N4335cc3b853145bdbc0b8eb4ed470881
    98 rdf:rest N18c3276baefa4b61b3a85d834d1b6f03
    99 N8e39e355b347403eb0fe63cf6df0625b rdf:first sg:person.014034722075.82
    100 rdf:rest rdf:nil
    101 Nc24dce7dbf0441d3886fc9937f7b20f3 schema:issueNumber 10
    102 rdf:type schema:PublicationIssue
    103 Ndababb8278f54ea8a29e6cdd1bdd8905 schema:name doi
    104 schema:value 10.1007/s11263-022-01647-z
    105 rdf:type schema:PropertyValue
    106 Nfc68f1019ac94dccb6b8bd638baa6036 schema:name Springer Nature - SN SciGraph project
    107 rdf:type schema:Organization
    108 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    109 schema:name Information and Computing Sciences
    110 rdf:type schema:DefinedTerm
    111 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
    112 schema:name Artificial Intelligence and Image Processing
    113 rdf:type schema:DefinedTerm
    114 sg:grant.3132922 http://pending.schema.org/fundedItem sg:pub.10.1007/s11263-022-01647-z
    115 rdf:type schema:MonetaryGrant
    116 sg:journal.1032807 schema:issn 0920-5691
    117 1573-1405
    118 schema:name International Journal of Computer Vision
    119 schema:publisher Springer Nature
    120 rdf:type schema:Periodical
    121 sg:person.013455650265.29 schema:affiliation grid-institutes:grid.505069.f
    122 schema:familyName Tsai
    123 schema:givenName Yi-Hsuan
    124 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013455650265.29
    125 rdf:type schema:Person
    126 sg:person.014034722075.82 schema:affiliation grid-institutes:grid.266096.d
    127 schema:familyName Yang
    128 schema:givenName Ming-Hsuan
    129 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014034722075.82
    130 rdf:type schema:Person
    131 sg:pub.10.1007/978-3-030-58529-7_35 schema:sameAs https://app.dimensions.ai/details/publication/pub.1132590143
    132 https://doi.org/10.1007/978-3-030-58529-7_35
    133 rdf:type schema:CreativeWork
    134 sg:pub.10.1007/978-3-030-58568-6_23 schema:sameAs https://app.dimensions.ai/details/publication/pub.1132591258
    135 https://doi.org/10.1007/978-3-030-58568-6_23
    136 rdf:type schema:CreativeWork
    137 sg:pub.10.1007/978-3-030-58577-8_7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1131136428
    138 https://doi.org/10.1007/978-3-030-58577-8_7
    139 rdf:type schema:CreativeWork
    140 sg:pub.10.1007/978-3-030-58577-8_8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1131132046
    141 https://doi.org/10.1007/978-3-030-58577-8_8
    142 rdf:type schema:CreativeWork
    143 sg:pub.10.1007/978-3-319-10602-1_48 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045321436
    144 https://doi.org/10.1007/978-3-319-10602-1_48
    145 rdf:type schema:CreativeWork
    146 sg:pub.10.1007/978-3-319-46448-0_49 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039098005
    147 https://doi.org/10.1007/978-3-319-46448-0_49
    148 rdf:type schema:CreativeWork
    149 sg:pub.10.1007/978-3-319-46475-6_5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042471266
    150 https://doi.org/10.1007/978-3-319-46475-6_5
    151 rdf:type schema:CreativeWork
    152 sg:pub.10.1007/978-3-319-46493-0_48 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029888027
    153 https://doi.org/10.1007/978-3-319-46493-0_48
    154 rdf:type schema:CreativeWork
    155 grid-institutes:grid.266096.d schema:alternateName University of California, Merced, CA, USA
    156 schema:name University of California, Merced, CA, USA
    157 rdf:type schema:Organization
    158 grid-institutes:grid.505069.f schema:alternateName Phiar, Redwood City, CA, USA
    159 schema:name Phiar, Redwood City, CA, USA
    160 rdf:type schema:Organization
     



