Deep learning meets ontologies: experiments to anchor the cardiovascular disease ontology in the biomedical literature


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2018-04-12

AUTHORS

Mercedes Arguello Casteleiro, George Demetriou, Warren Read, Maria Jesus Fernandez Prieto, Nava Maroto, Diego Maseda Fernandez, Goran Nenadic, Julie Klein, John Keane, Robert Stevens

ABSTRACT

BACKGROUND: Automatic identification of term variants or acceptable alternative free-text terms for gene and protein names from the millions of biomedical publications is a challenging task. Ontologies, such as the Cardiovascular Disease Ontology (CVDO), capture domain knowledge in a computational form and can provide context for gene/protein names as written in the literature. This study investigates: 1) whether word embeddings from Deep Learning algorithms can provide a list of term variants for a given gene/protein of interest; and 2) whether biological knowledge from the CVDO can improve such a list without modifying the word embeddings created.
METHODS: We have manually annotated 105 gene/protein names from 25 PubMed titles/abstracts and mapped them to 79 unique UniProtKB entries corresponding to gene and protein classes from the CVDO. Using more than 14 M PubMed articles (titles and available abstracts), word embeddings were generated with CBOW and Skip-gram. We set up two experiments for a synonym detection task, each with four raters, and 3672 pairs of terms (target term and candidate term) from the word embeddings created. For Experiment I, the target terms for 64 UniProtKB entries were those that appear in the titles/abstracts; Experiment II involves 63 UniProtKB entries and the target terms are a combination of terms from PubMed titles/abstracts with terms (i.e. increased context) from the CVDO protein class expressions and labels.
RESULTS: In Experiment I, Skip-gram finds term variants (full and/or partial) for 89% of the 64 UniProtKB entries, while CBOW finds term variants for 67%. In Experiment II (with the aid of the CVDO), Skip-gram finds term variants for 95% of the 63 UniProtKB entries, while CBOW finds term variants for 78%. Combining the results of both experiments, Skip-gram finds term variants for 97% of the 79 UniProtKB entries, while CBOW finds term variants for 81%.
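The RESULTS report only percentages. Assuming each one was rounded to the nearest whole percent (an assumption, not stated in the abstract), the entry counts behind them can be recovered by testing every possible count. `implied_counts` below is a hypothetical helper written for this check, not part of the study.

```python
def implied_counts(percent: int, total: int) -> list[int]:
    """Return every count k in 0..total whose share k/total rounds to `percent`.

    Assumes the reported percentage was rounded to the nearest whole percent.
    """
    return [k for k in range(total + 1) if abs(100 * k / total - percent) <= 0.5]


# (experiment, model, reported percent, number of UniProtKB entries), from the abstract.
REPORTED = [
    ("Experiment I", "Skip-gram", 89, 64),
    ("Experiment I", "CBOW", 67, 64),
    ("Experiment II", "Skip-gram", 95, 63),
    ("Experiment II", "CBOW", 78, 63),
    ("Combined", "Skip-gram", 97, 79),
    ("Combined", "CBOW", 81, 79),
]

for experiment, model, percent, total in REPORTED:
    counts = implied_counts(percent, total)
    print(f"{experiment:13} {model:9} {percent}% of {total} -> {counts} entries")
```

Under that rounding assumption each percentage maps to exactly one count: in Experiment I, 57 (Skip-gram) and 43 (CBOW) of 64 entries; in Experiment II, 60 and 49 of 63; combined, 77 and 64 of 79. If every one of the 79 entries appears in at least one experiment, then 64 + 63 − 79 = 48 entries are shared by both experiments.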
CONCLUSIONS: This study shows performance improvements for both CBOW and Skip-gram on a gene/protein synonym detection task by adding knowledge formalised in the CVDO and without modifying the word embeddings created. Hence, the CVDO supplies context that is effective in inducing term variability for both CBOW and Skip-gram while reducing ambiguity. Skip-gram outperforms CBOW and finds more pertinent term variants for gene/protein names annotated from the scientific literature.
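The abstract does not spell out how candidate terms are ranked from the CBOW and Skip-gram embeddings, or how CVDO terms are combined with a target term. A common approach with word2vec-style embeddings, assumed here and not confirmed as the authors' method, is to rank the vocabulary by cosine similarity to the mean of the unit-length vectors of the query terms. The vocabulary and vectors below are invented toy data, and `candidate_terms` is a hypothetical helper.

```python
import math

Vector = list[float]

# Invented toy vectors: NOT the study's embeddings. "ace" is an ambiguous target
# term; "peptidase" plays the role of an ontology-derived context term.
TOY_EMBEDDINGS: dict[str, Vector] = {
    "ace": [1.0, 1.0, 0.0],
    "ace1": [0.9, 0.2, 0.1],
    "angiotensin_converting_enzyme": [0.8, 0.1, 0.3],
    "tennis": [0.1, 1.0, 0.0],
    "serve": [0.0, 0.9, 0.1],
    "peptidase": [0.7, 0.0, 0.4],
}


def normalise(v: Vector) -> Vector:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    return sum(x * y for x, y in zip(normalise(a), normalise(b)))


def candidate_terms(embeddings: dict[str, Vector], targets: list[str], top_n: int = 2):
    """Rank vocabulary terms by cosine similarity to the mean of the unit-length
    vectors of `targets`; the target terms themselves are excluded."""
    unit_vectors = [normalise(embeddings[t]) for t in targets]
    query = [sum(column) / len(unit_vectors) for column in zip(*unit_vectors)]
    scored = [(term, cosine(query, vec)) for term, vec in embeddings.items() if term not in targets]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


for targets in (["ace"], ["ace", "peptidase"]):
    print(targets, candidate_terms(TOY_EMBEDDINGS, targets))
```

With this toy data, querying `["ace"]` alone ranks the off-topic `tennis` second, while adding the context term `peptidase` replaces it with `angiotensin_converting_enzyme`. This mirrors, in miniature, the abstract's claim that added context reduces ambiguity without changing the embeddings themselves.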

PAGES

13

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/s13326-018-0181-1

DOI

http://dx.doi.org/10.1186/s13326-018-0181-1

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1103241971

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/29650041



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Biological Ontologies", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Cardiovascular Diseases", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Deep Learning", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Humans", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Molecular Sequence Annotation", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "PubMed", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "ROC Curve", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "School of Computer Science, University of Manchester, Manchester, UK", 
          "id": "http://www.grid.ac/institutes/grid.5379.8", 
          "name": [
            "School of Computer Science, University of Manchester, Manchester, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Casteleiro", 
        "givenName": "Mercedes Arguello", 
        "id": "sg:person.01347347662.23", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01347347662.23"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "School of Computer Science, University of Manchester, Manchester, UK", 
          "id": "http://www.grid.ac/institutes/grid.5379.8", 
          "name": [
            "School of Computer Science, University of Manchester, Manchester, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Demetriou", 
        "givenName": "George", 
        "id": "sg:person.01252445567.41", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01252445567.41"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "School of Computer Science, University of Manchester, Manchester, UK", 
          "id": "http://www.grid.ac/institutes/grid.5379.8", 
          "name": [
            "School of Computer Science, University of Manchester, Manchester, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Read", 
        "givenName": "Warren", 
        "id": "sg:person.01204332367.18", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01204332367.18"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Salford Languages, University of Salford, Salford, UK", 
          "id": "http://www.grid.ac/institutes/grid.8752.8", 
          "name": [
            "Salford Languages, University of Salford, Salford, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Prieto", 
        "givenName": "Maria Jesus Fernandez", 
        "id": "sg:person.07740356561.03", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07740356561.03"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Departamento de Ling\u00fc\u00edstica Aplicada a la Ciencia y a la Tecnolog\u00eda, Universidad Polit\u00e9cnica de Madrid, Madrid, Spain", 
          "id": "http://www.grid.ac/institutes/grid.5690.a", 
          "name": [
            "Departamento de Ling\u00fc\u00edstica Aplicada a la Ciencia y a la Tecnolog\u00eda, Universidad Polit\u00e9cnica de Madrid, Madrid, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Maroto", 
        "givenName": "Nava", 
        "id": "sg:person.015076224433.44", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015076224433.44"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Midcheshire Hospital Foundation Trust NHS, Crewe, England UK", 
          "id": "http://www.grid.ac/institutes/None", 
          "name": [
            "Midcheshire Hospital Foundation Trust NHS, Crewe, England UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Fernandez", 
        "givenName": "Diego Maseda", 
        "id": "sg:person.011635167403.00", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011635167403.00"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Manchester Institute of Biotechnology, University of Manchester, Manchester, UK", 
          "id": "http://www.grid.ac/institutes/grid.5379.8", 
          "name": [
            "School of Computer Science, University of Manchester, Manchester, UK", 
            "Manchester Institute of Biotechnology, University of Manchester, Manchester, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Nenadic", 
        "givenName": "Goran", 
        "id": "sg:person.01070526367.22", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01070526367.22"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Universite Toulouse III Paul Sabatier, route de Narbonne, Toulouse, France", 
          "id": "http://www.grid.ac/institutes/grid.15781.3a", 
          "name": [
            "Institut National de la Sant\u00e9 et de la Recherche Medicale (INSERM) U1048, Toulouse, France", 
            "Universite Toulouse III Paul Sabatier, route de Narbonne, Toulouse, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Klein", 
        "givenName": "Julie", 
        "id": "sg:person.012753044447.84", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012753044447.84"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Manchester Institute of Biotechnology, University of Manchester, Manchester, UK", 
          "id": "http://www.grid.ac/institutes/grid.5379.8", 
          "name": [
            "School of Computer Science, University of Manchester, Manchester, UK", 
            "Manchester Institute of Biotechnology, University of Manchester, Manchester, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Keane", 
        "givenName": "John", 
        "id": "sg:person.0741055360.36", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0741055360.36"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "School of Computer Science, University of Manchester, Manchester, UK", 
          "id": "http://www.grid.ac/institutes/grid.5379.8", 
          "name": [
            "School of Computer Science, University of Manchester, Manchester, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Stevens", 
        "givenName": "Robert", 
        "id": "sg:person.0653547307.62", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0653547307.62"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1186/1471-2105-7-372", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010861916", 
          "https://doi.org/10.1186/1471-2105-7-372"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-9-s3-s6", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1018816330", 
          "https://doi.org/10.1186/1471-2105-9-s3-s6"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-10-349", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1002595718", 
          "https://doi.org/10.1186/1471-2105-10-349"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nrg1768", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1011616497", 
          "https://doi.org/10.1038/nrg1768"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-6-s1-s17", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1051824257", 
          "https://doi.org/10.1186/1471-2105-6-s1-s17"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/gb-2008-9-s2-s2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1041411233", 
          "https://doi.org/10.1186/gb-2008-9-s2-s2"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1023/a:1010920819831", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1003442924", 
          "https://doi.org/10.1023/a:1010920819831"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nrg3337", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1026913895", 
          "https://doi.org/10.1038/nrg3337"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.3758/bf03204766", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1035784262", 
          "https://doi.org/10.3758/bf03204766"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/s13326-016-0078-9", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1038195706", 
          "https://doi.org/10.1186/s13326-016-0078-9"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nature14539", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010020120", 
          "https://doi.org/10.1038/nature14539"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/2041-1480-4-28", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1012441090", 
          "https://doi.org/10.1186/2041-1480-4-28"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-6-s1-s3", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1005869659", 
          "https://doi.org/10.1186/1471-2105-6-s1-s3"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2018-04-12", 
    "datePublishedReg": "2018-04-12", 
    "description": "BACKGROUND: Automatic identification of term variants or acceptable alternative free-text terms for gene and protein names from the millions of biomedical publications is a challenging task. Ontologies, such as the Cardiovascular Disease Ontology (CVDO), capture domain knowledge in a computational form and can provide context for gene/protein names as written in the literature. This study investigates: 1) if word embeddings from Deep Learning algorithms can provide a list of term variants for a given gene/protein of interest; and 2) if biological knowledge from the CVDO can improve such a list without modifying the word embeddings created.\nMETHODS: We have manually annotated 105 gene/protein names from 25 PubMed titles/abstracts and mapped them to 79 unique UniProtKB entries corresponding to gene and protein classes from the CVDO. Using more than 14\u00a0M PubMed articles (titles and available abstracts), word embeddings were generated with CBOW and Skip-gram. We setup two experiments for a synonym detection task, each with four raters, and 3672 pairs of terms (target term and candidate term) from the word embeddings created. For Experiment I, the target terms for 64 UniProtKB entries were those that appear in the titles/abstracts; Experiment II involves 63 UniProtKB entries and the target terms are a combination of terms from PubMed titles/abstracts with terms (i.e. increased context) from the CVDO protein class expressions and labels.\nRESULTS: In Experiment I, Skip-gram finds term variants (full and/or partial) for 89% of the 64 UniProtKB entries, while CBOW finds term variants for 67%. In Experiment II (with the aid of the CVDO), Skip-gram finds term variants for 95% of the 63 UniProtKB entries, while CBOW finds term variants for 78%. Combining the results of both experiments, Skip-gram finds term variants for 97% of the 79 UniProtKB entries, while CBOW finds term variants for 81%.\nCONCLUSIONS: This study shows performance improvements for both CBOW and Skip-gram on a gene/protein synonym detection task by adding knowledge formalised in the CVDO and without modifying the word embeddings created. Hence, the CVDO supplies context that is effective in inducing term variability for both CBOW and Skip-gram while reducing ambiguity. Skip-gram outperforms CBOW and finds more pertinent term variants for gene/protein names annotated from the scientific literature.", 
    "genre": "article", 
    "id": "sg:pub.10.1186/s13326-018-0181-1", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isFundedItemOf": [
      {
        "id": "sg:grant.2763606", 
        "type": "MonetaryGrant"
      }
    ], 
    "isPartOf": [
      {
        "id": "sg:journal.1043573", 
        "issn": [
          "2041-1480"
        ], 
        "name": "Journal of Biomedical Semantics", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "9"
      }
    ], 
    "keywords": [
      "gene/protein names", 
      "word embeddings", 
      "term variants", 
      "UniProtKB entries", 
      "protein names", 
      "deep learning algorithms", 
      "Disease Ontology", 
      "detection task", 
      "deep learning", 
      "domain knowledge", 
      "learning algorithm", 
      "automatic identification", 
      "challenging task", 
      "ontology", 
      "biomedical literature", 
      "computational form", 
      "class expressions", 
      "biomedical publications", 
      "CBOW", 
      "performance improvement", 
      "embedding", 
      "pairs of terms", 
      "biological knowledge", 
      "task", 
      "PubMed articles", 
      "skip", 
      "target terms", 
      "supply context", 
      "combination of terms", 
      "algorithm", 
      "learning", 
      "knowledge", 
      "labels", 
      "experiments", 
      "context", 
      "list", 
      "millions", 
      "name", 
      "terms", 
      "scientific literature", 
      "ambiguity", 
      "genes/proteins", 
      "variants", 
      "entry", 
      "class", 
      "interest", 
      "literature", 
      "improvement", 
      "free-text terms", 
      "identification", 
      "results", 
      "article", 
      "pairs", 
      "publications", 
      "combination", 
      "Abstract", 
      "form", 
      "titles/abstracts", 
      "raters", 
      "protein classes", 
      "study", 
      "variability", 
      "expression", 
      "Experiment I", 
      "Experiment II", 
      "genes", 
      "protein", 
      "term variability", 
      "acceptable alternative free-text terms", 
      "alternative free-text terms", 
      "CVDO", 
      "PubMed titles/abstracts", 
      "unique UniProtKB entries", 
      "synonym detection task", 
      "CVDO protein class expressions", 
      "protein class expressions", 
      "gene/protein synonym detection task", 
      "protein synonym detection task", 
      "CVDO supplies context", 
      "outperforms CBOW", 
      "pertinent term variants"
    ], 
    "name": "Deep learning meets ontologies: experiments to anchor the cardiovascular disease ontology in the biomedical literature", 
    "pagination": "13", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1103241971"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/s13326-018-0181-1"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "29650041"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/s13326-018-0181-1", 
      "https://app.dimensions.ai/details/publication/pub.1103241971"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2021-11-01T18:33", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20211101/entities/gbq_results/article/article_785.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1186/s13326-018-0181-1"
  }
]
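Because JSON-LD is valid JSON, a few fields can be pulled from the record above with an ordinary JSON parser. The sketch below assumes the field layout shown in the record; `summarise_record` is a hypothetical helper, and the sample is a trimmed copy of the record (two of the ten authors, only the fields the helper reads). It ignores `@context`, so it performs no JSON-LD expansion; resolving compact identifiers such as `sg:pub...` to full IRIs would need a JSON-LD processor.

```python
import json

# Trimmed copy of the record above: only the fields read below, and two of the ten authors.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Casteleiro", "givenName": "Mercedes Arguello", "type": "Person"},
      {"familyName": "Demetriou", "givenName": "George", "type": "Person"}
    ],
    "isPartOf": [
      {"name": "Journal of Biomedical Semantics", "issn": ["2041-1480"], "type": "Periodical"},
      {"issueNumber": "1", "type": "PublicationIssue"},
      {"type": "PublicationVolume", "volumeNumber": "9"}
    ],
    "name": "Deep learning meets ontologies: experiments to anchor the cardiovascular disease ontology in the biomedical literature",
    "pagination": "13",
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/s13326-018-0181-1"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["29650041"]}
    ]
  }
]
"""


def summarise_record(text: str) -> dict:
    """Return a citation-style summary of a SciGraph record, read as plain JSON."""
    item = json.loads(text)[0]  # the record is a one-element JSON array
    parts = {part["type"]: part for part in item["isPartOf"]}  # journal, issue, volume
    ids = {pid["name"]: pid["value"][0] for pid in item["productId"]}
    return {
        "title": item["name"],
        "authors": [f"{a['givenName']} {a['familyName']}" for a in item["author"]],
        "journal": parts["Periodical"]["name"],
        "volume": parts["PublicationVolume"]["volumeNumber"],
        "issue": parts["PublicationIssue"]["issueNumber"],
        "pagination": item["pagination"],
        "doi": ids["doi"],
        "pubmed_id": ids.get("pubmed_id"),
    }


print(summarise_record(RECORD_JSON))
```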
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular linked data format built on JSON, so any JSON parser can read it.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s13326-018-0181-1'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s13326-018-0181-1'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s13326-018-0181-1'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s13326-018-0181-1'
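The curl commands above rely on HTTP content negotiation: the same URL returns a different serialisation depending on the `Accept` header. A minimal standard-library sketch follows; the helper names `build_request` and `fetch` are hypothetical, the media types are copied from the curl commands, and whether the SciGraph endpoint still answers these requests is not verified here.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/s13326-018-0181-1"

# Media types taken from the curl commands above.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Build a GET request that asks for the serialisation `fmt` via the Accept header."""
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt: str, url: str = RECORD_URL, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

For example, `fetch("turtle")` corresponds to the third curl command. Building the request separately from sending it keeps the header logic testable without a network connection.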


 

This table displays all metadata directly associated with this object as RDF triples.

302 TRIPLES      22 PREDICATES      126 URIs      105 LITERALS      14 BLANK NODES
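Summary counts like these can be approximated from an N-Triples download, since each statement sits on its own line. The sketch below is a simplified, hypothetical parser (`summarise_ntriples` and the `example.org` sample are illustrative only): it assumes one statement per line, does not unescape literals, and counts distinct terms. Whether the SciGraph page counts the same way, for instance whether predicate URIs are included in its URI total, is an assumption.

```python
import re

# Simplified N-Triples statement: subject, predicate, object, then a final " ."
TRIPLE = re.compile(r"^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$")


def kind(term: str) -> str:
    """Classify an N-Triples term as a URI, a blank node, or a literal."""
    if term.startswith("<"):
        return "uri"
    if term.startswith("_:"):
        return "blank"
    return "literal"


def summarise_ntriples(text: str) -> dict[str, int]:
    """Count triples, distinct predicates, and distinct URIs, literals and blank nodes."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"not a simple N-Triples statement: {line!r}")
        triples.append(match.groups())
    terms = {term for triple in triples for term in triple}
    return {
        "triples": len(triples),
        "predicates": len({p for _, p, _ in triples}),
        "uris": sum(1 for t in terms if kind(t) == "uri"),
        "literals": sum(1 for t in terms if kind(t) == "literal"),
        "blank_nodes": sum(1 for t in terms if kind(t) == "blank"),
    }


# Hypothetical sample in the shape of the triples below (example.org IRIs are invented).
SAMPLE = """
<http://example.org/a> <http://schema.org/name> "Deep learning meets ontologies" .
<http://example.org/a> <http://schema.org/about> _:b0 .
_:b0 <http://schema.org/name> "PubMed" .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/DefinedTerm> .
"""

print(summarise_ntriples(SAMPLE))
```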

Subject Predicate Object
1 sg:pub.10.1186/s13326-018-0181-1 schema:about N17f94e911aa74f8d91caf9f9d5dc731e
2 N46c79629903c477383948b7c0d729aea
3 N4cdf35512548480791f62af081cb6ff1
4 N89529d8d8f784287940d19df83e7521e
5 N910cd540d01f4375807175fdd2d3eb7c
6 Na208911de5f546d8b0f7a2ab2e06b525
7 Nd7cb4098932849c99c3bfa85ca5f3529
8 anzsrc-for:08
9 anzsrc-for:0801
10 schema:author Na3d07dc30f33453ab0df4ebb61b2e40a
11 schema:citation sg:pub.10.1023/a:1010920819831
12 sg:pub.10.1038/nature14539
13 sg:pub.10.1038/nrg1768
14 sg:pub.10.1038/nrg3337
15 sg:pub.10.1186/1471-2105-10-349
16 sg:pub.10.1186/1471-2105-6-s1-s17
17 sg:pub.10.1186/1471-2105-6-s1-s3
18 sg:pub.10.1186/1471-2105-7-372
19 sg:pub.10.1186/1471-2105-9-s3-s6
20 sg:pub.10.1186/2041-1480-4-28
21 sg:pub.10.1186/gb-2008-9-s2-s2
22 sg:pub.10.1186/s13326-016-0078-9
23 sg:pub.10.3758/bf03204766
24 schema:datePublished 2018-04-12
25 schema:datePublishedReg 2018-04-12
26 schema:description BACKGROUND: Automatic identification of term variants or acceptable alternative free-text terms for gene and protein names from the millions of biomedical publications is a challenging task. Ontologies, such as the Cardiovascular Disease Ontology (CVDO), capture domain knowledge in a computational form and can provide context for gene/protein names as written in the literature. This study investigates: 1) if word embeddings from Deep Learning algorithms can provide a list of term variants for a given gene/protein of interest; and 2) if biological knowledge from the CVDO can improve such a list without modifying the word embeddings created. METHODS: We have manually annotated 105 gene/protein names from 25 PubMed titles/abstracts and mapped them to 79 unique UniProtKB entries corresponding to gene and protein classes from the CVDO. Using more than 14 M PubMed articles (titles and available abstracts), word embeddings were generated with CBOW and Skip-gram. We setup two experiments for a synonym detection task, each with four raters, and 3672 pairs of terms (target term and candidate term) from the word embeddings created. For Experiment I, the target terms for 64 UniProtKB entries were those that appear in the titles/abstracts; Experiment II involves 63 UniProtKB entries and the target terms are a combination of terms from PubMed titles/abstracts with terms (i.e. increased context) from the CVDO protein class expressions and labels. RESULTS: In Experiment I, Skip-gram finds term variants (full and/or partial) for 89% of the 64 UniProtKB entries, while CBOW finds term variants for 67%. In Experiment II (with the aid of the CVDO), Skip-gram finds term variants for 95% of the 63 UniProtKB entries, while CBOW finds term variants for 78%. Combining the results of both experiments, Skip-gram finds term variants for 97% of the 79 UniProtKB entries, while CBOW finds term variants for 81%. 
CONCLUSIONS: This study shows performance improvements for both CBOW and Skip-gram on a gene/protein synonym detection task by adding knowledge formalised in the CVDO and without modifying the word embeddings created. Hence, the CVDO supplies context that is effective in inducing term variability for both CBOW and Skip-gram while reducing ambiguity. Skip-gram outperforms CBOW and finds more pertinent term variants for gene/protein names annotated from the scientific literature.
27 schema:genre article
28 schema:inLanguage en
29 schema:isAccessibleForFree true
30 schema:isPartOf Nceebf014c7194c3d953b9c5f2b763295
31 Nd419dff9aaa9472788adedfadc60f758
32 sg:journal.1043573
33 schema:keywords Abstract
34 CBOW
35 CVDO
36 CVDO protein class expressions
37 CVDO supplies context
38 Disease Ontology
39 Experiment I
40 Experiment II
41 PubMed articles
42 PubMed titles/abstracts
43 UniProtKB entries
44 acceptable alternative free-text terms
45 algorithm
46 alternative free-text terms
47 ambiguity
48 article
49 automatic identification
50 biological knowledge
51 biomedical literature
52 biomedical publications
53 challenging task
54 class
55 class expressions
56 combination
57 combination of terms
58 computational form
59 context
60 deep learning
61 deep learning algorithms
62 detection task
63 domain knowledge
64 embedding
65 entry
66 experiments
67 expression
68 form
69 free-text terms
70 gene/protein names
71 gene/protein synonym detection task
72 genes
73 genes/proteins
74 identification
75 improvement
76 interest
77 knowledge
78 labels
79 learning
80 learning algorithm
81 list
82 literature
83 millions
84 name
85 ontology
86 outperforms CBOW
87 pairs
88 pairs of terms
89 performance improvement
90 pertinent term variants
91 protein
92 protein class expressions
93 protein classes
94 protein names
95 protein synonym detection task
96 publications
97 raters
98 results
99 scientific literature
100 skip
101 study
102 supply context
103 synonym detection task
104 target terms
105 task
106 term variability
107 term variants
108 terms
109 titles/abstracts
110 unique UniProtKB entries
111 variability
112 variants
113 word embeddings
114 schema:name Deep learning meets ontologies: experiments to anchor the cardiovascular disease ontology in the biomedical literature
115 schema:pagination 13
116 schema:productId N3efc78df91e84c84a8eec159e124ef7d
117 Nd2785e8c1bc64c7196f2cf3c6f37fb8a
118 Ne94f3e822af34e758c907d1bda318129
119 schema:sameAs https://app.dimensions.ai/details/publication/pub.1103241971
120 https://doi.org/10.1186/s13326-018-0181-1
121 schema:sdDatePublished 2021-11-01T18:33
122 schema:sdLicense https://scigraph.springernature.com/explorer/license/
123 schema:sdPublisher Nac45ab8859544d0d83263b2aa69d7e3e
124 schema:url https://doi.org/10.1186/s13326-018-0181-1
125 sgo:license sg:explorer/license/
126 sgo:sdDataset articles
127 rdf:type schema:ScholarlyArticle
128 N17f94e911aa74f8d91caf9f9d5dc731e schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
129 schema:name Molecular Sequence Annotation
130 rdf:type schema:DefinedTerm
131 N3efc78df91e84c84a8eec159e124ef7d schema:name pubmed_id
132 schema:value 29650041
133 rdf:type schema:PropertyValue
134 N46c79629903c477383948b7c0d729aea schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
135 schema:name PubMed
136 rdf:type schema:DefinedTerm
137 N4cdf35512548480791f62af081cb6ff1 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
138 schema:name ROC Curve
139 rdf:type schema:DefinedTerm
140 N529b2eb338cf4df2b1996a77285b36ff rdf:first sg:person.015076224433.44
141 rdf:rest N8a170898f1eb48ff9ac46923986c0393
142 N816a866a051141e29094c4d1b97f9696 rdf:first sg:person.0741055360.36
143 rdf:rest Nb36f7326ea3f430e8ee4d0217de9ebcd
144 N89529d8d8f784287940d19df83e7521e schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
145 schema:name Humans
146 rdf:type schema:DefinedTerm
147 N8a170898f1eb48ff9ac46923986c0393 rdf:first sg:person.011635167403.00
148 rdf:rest Nf3ae556d8033485bb081e94ec56d7132
149 N910cd540d01f4375807175fdd2d3eb7c schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
150 schema:name Deep Learning
151 rdf:type schema:DefinedTerm
152 Na208911de5f546d8b0f7a2ab2e06b525 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
153 schema:name Cardiovascular Diseases
154 rdf:type schema:DefinedTerm
155 Na3d07dc30f33453ab0df4ebb61b2e40a rdf:first sg:person.01347347662.23
156 rdf:rest Ne37cd0f5ee174a08aaffe1ccfd3b8345
157 Nac45ab8859544d0d83263b2aa69d7e3e schema:name Springer Nature - SN SciGraph project
158 rdf:type schema:Organization
159 Nb36f7326ea3f430e8ee4d0217de9ebcd rdf:first sg:person.0653547307.62
160 rdf:rest rdf:nil
161 Nc1108c3696a74edf8f6bd0e72e2b565b rdf:first sg:person.01204332367.18
162 rdf:rest Nea489dec3d6e47d08608867aa9a512e9
163 Nc5cba16f752b42b9b338b595fe09016e rdf:first sg:person.012753044447.84
164 rdf:rest N816a866a051141e29094c4d1b97f9696
165 Nceebf014c7194c3d953b9c5f2b763295 schema:volumeNumber 9
166 rdf:type schema:PublicationVolume
167 Nd2785e8c1bc64c7196f2cf3c6f37fb8a schema:name dimensions_id
168 schema:value pub.1103241971
169 rdf:type schema:PropertyValue
170 Nd419dff9aaa9472788adedfadc60f758 schema:issueNumber 1
171 rdf:type schema:PublicationIssue
172 Nd7cb4098932849c99c3bfa85ca5f3529 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
173 schema:name Biological Ontologies
174 rdf:type schema:DefinedTerm
175 Ne37cd0f5ee174a08aaffe1ccfd3b8345 rdf:first sg:person.01252445567.41
176 rdf:rest Nc1108c3696a74edf8f6bd0e72e2b565b
177 Ne94f3e822af34e758c907d1bda318129 schema:name doi
178 schema:value 10.1186/s13326-018-0181-1
179 rdf:type schema:PropertyValue
180 Nea489dec3d6e47d08608867aa9a512e9 rdf:first sg:person.07740356561.03
181 rdf:rest N529b2eb338cf4df2b1996a77285b36ff
182 Nf3ae556d8033485bb081e94ec56d7132 rdf:first sg:person.01070526367.22
183 rdf:rest Nc5cba16f752b42b9b338b595fe09016e
184 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
185 schema:name Information and Computing Sciences
186 rdf:type schema:DefinedTerm
187 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
188 schema:name Artificial Intelligence and Image Processing
189 rdf:type schema:DefinedTerm
190 sg:grant.2763606 http://pending.schema.org/fundedItem sg:pub.10.1186/s13326-018-0181-1
191 rdf:type schema:MonetaryGrant
192 sg:journal.1043573 schema:issn 2041-1480
193 schema:name Journal of Biomedical Semantics
194 schema:publisher Springer Nature
195 rdf:type schema:Periodical
196 sg:person.01070526367.22 schema:affiliation grid-institutes:grid.5379.8
197 schema:familyName Nenadic
198 schema:givenName Goran
199 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01070526367.22
200 rdf:type schema:Person
201 sg:person.011635167403.00 schema:affiliation grid-institutes:None
202 schema:familyName Fernandez
203 schema:givenName Diego Maseda
204 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011635167403.00
205 rdf:type schema:Person
206 sg:person.01204332367.18 schema:affiliation grid-institutes:grid.5379.8
207 schema:familyName Read
208 schema:givenName Warren
209 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01204332367.18
210 rdf:type schema:Person
211 sg:person.01252445567.41 schema:affiliation grid-institutes:grid.5379.8
212 schema:familyName Demetriou
213 schema:givenName George
214 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01252445567.41
215 rdf:type schema:Person
216 sg:person.012753044447.84 schema:affiliation grid-institutes:grid.15781.3a
217 schema:familyName Klein
218 schema:givenName Julie
219 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012753044447.84
220 rdf:type schema:Person
221 sg:person.01347347662.23 schema:affiliation grid-institutes:grid.5379.8
222 schema:familyName Casteleiro
223 schema:givenName Mercedes Arguello
224 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01347347662.23
225 rdf:type schema:Person
226 sg:person.015076224433.44 schema:affiliation grid-institutes:grid.5690.a
227 schema:familyName Maroto
228 schema:givenName Nava
229 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015076224433.44
230 rdf:type schema:Person
231 sg:person.0653547307.62 schema:affiliation grid-institutes:grid.5379.8
232 schema:familyName Stevens
233 schema:givenName Robert
234 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0653547307.62
235 rdf:type schema:Person
236 sg:person.0741055360.36 schema:affiliation grid-institutes:grid.5379.8
237 schema:familyName Keane
238 schema:givenName John
239 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0741055360.36
240 rdf:type schema:Person
241 sg:person.07740356561.03 schema:affiliation grid-institutes:grid.8752.8
242 schema:familyName Prieto
243 schema:givenName Maria Jesus Fernandez
244 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07740356561.03
245 rdf:type schema:Person
246 sg:pub.10.1023/a:1010920819831 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003442924
247 https://doi.org/10.1023/a:1010920819831
248 rdf:type schema:CreativeWork
249 sg:pub.10.1038/nature14539 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010020120
250 https://doi.org/10.1038/nature14539
251 rdf:type schema:CreativeWork
252 sg:pub.10.1038/nrg1768 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011616497
253 https://doi.org/10.1038/nrg1768
254 rdf:type schema:CreativeWork
255 sg:pub.10.1038/nrg3337 schema:sameAs https://app.dimensions.ai/details/publication/pub.1026913895
256 https://doi.org/10.1038/nrg3337
257 rdf:type schema:CreativeWork
258 sg:pub.10.1186/1471-2105-10-349 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002595718
259 https://doi.org/10.1186/1471-2105-10-349
260 rdf:type schema:CreativeWork
261 sg:pub.10.1186/1471-2105-6-s1-s17 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051824257
262 https://doi.org/10.1186/1471-2105-6-s1-s17
263 rdf:type schema:CreativeWork
264 sg:pub.10.1186/1471-2105-6-s1-s3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005869659
265 https://doi.org/10.1186/1471-2105-6-s1-s3
266 rdf:type schema:CreativeWork
267 sg:pub.10.1186/1471-2105-7-372 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010861916
268 https://doi.org/10.1186/1471-2105-7-372
269 rdf:type schema:CreativeWork
270 sg:pub.10.1186/1471-2105-9-s3-s6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1018816330
271 https://doi.org/10.1186/1471-2105-9-s3-s6
272 rdf:type schema:CreativeWork
273 sg:pub.10.1186/2041-1480-4-28 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012441090
274 https://doi.org/10.1186/2041-1480-4-28
275 rdf:type schema:CreativeWork
276 sg:pub.10.1186/gb-2008-9-s2-s2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1041411233
277 https://doi.org/10.1186/gb-2008-9-s2-s2
278 rdf:type schema:CreativeWork
279 sg:pub.10.1186/s13326-016-0078-9 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038195706
280 https://doi.org/10.1186/s13326-016-0078-9
281 rdf:type schema:CreativeWork
282 sg:pub.10.3758/bf03204766 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035784262
283 https://doi.org/10.3758/bf03204766
284 rdf:type schema:CreativeWork
285 grid-institutes:None schema:alternateName Midcheshire Hospital Foundation Trust NHS, Crewe, England UK
286 schema:name Midcheshire Hospital Foundation Trust NHS, Crewe, England UK
287 rdf:type schema:Organization
288 grid-institutes:grid.15781.3a schema:alternateName Universite Toulouse III Paul Sabatier, route de Narbonne, Toulouse, France
289 schema:name Institut National de la Santé et de la Recherche Medicale (INSERM) U1048, Toulouse, France
290 Universite Toulouse III Paul Sabatier, route de Narbonne, Toulouse, France
291 rdf:type schema:Organization
292 grid-institutes:grid.5379.8 schema:alternateName Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
293 School of Computer Science, University of Manchester, Manchester, UK
294 schema:name Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
295 School of Computer Science, University of Manchester, Manchester, UK
296 rdf:type schema:Organization
297 grid-institutes:grid.5690.a schema:alternateName Departamento de Lingüística Aplicada a la Ciencia y a la Tecnología, Universidad Politécnica de Madrid, Madrid, Spain
298 schema:name Departamento de Lingüística Aplicada a la Ciencia y a la Tecnología, Universidad Politécnica de Madrid, Madrid, Spain
299 rdf:type schema:Organization
300 grid-institutes:grid.8752.8 schema:alternateName Salford Languages, University of Salford, Salford, UK
301 schema:name Salford Languages, University of Salford, Salford, UK
302 rdf:type schema:Organization
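The `rdf:first`/`rdf:rest` triples above form RDF collections: linked lists of blank nodes that end in `rdf:nil`. Their member order matches the author order in the article header, so they appear to encode the ordered author list. The sketch below follows such a chain. It assumes the triples have already been loaded into plain Python dicts, and the names `LIST_NODES`, `FAMILY_NAMES` and `walk_list` are hypothetical, not part of SciGraph. Nodes whose triples fall outside this excerpt, such as `N529b2eb338cf4df2b1996a77285b36ff`, are reported as unresolved rather than guessed.

```python
# Hypothetical sketch: walk an RDF collection (rdf:first / rdf:rest chain)
# using triples copied from the listing above. Nodes not present in this
# excerpt are returned as "unresolved" instead of being filled in.

RDF_NIL = "rdf:nil"

# blank node -> (rdf:first value, rdf:rest value), copied from the listing
LIST_NODES = {
    "Ne37cd0f5ee174a08aaffe1ccfd3b8345": ("sg:person.01252445567.41", "Nc1108c3696a74edf8f6bd0e72e2b565b"),
    "Nc1108c3696a74edf8f6bd0e72e2b565b": ("sg:person.01204332367.18", "Nea489dec3d6e47d08608867aa9a512e9"),
    "Nea489dec3d6e47d08608867aa9a512e9": ("sg:person.07740356561.03", "N529b2eb338cf4df2b1996a77285b36ff"),
    "Nf3ae556d8033485bb081e94ec56d7132": ("sg:person.01070526367.22", "Nc5cba16f752b42b9b338b595fe09016e"),
    "Nc5cba16f752b42b9b338b595fe09016e": ("sg:person.012753044447.84", "N816a866a051141e29094c4d1b97f9696"),
    "Nb36f7326ea3f430e8ee4d0217de9ebcd": ("sg:person.0653547307.62", RDF_NIL),
}

# person node -> schema:familyName, copied from the listing
FAMILY_NAMES = {
    "sg:person.01252445567.41": "Demetriou",
    "sg:person.01204332367.18": "Read",
    "sg:person.07740356561.03": "Prieto",
    "sg:person.01070526367.22": "Nenadic",
    "sg:person.012753044447.84": "Klein",
    "sg:person.0653547307.62": "Stevens",
}


def walk_list(start, nodes):
    """Follow rdf:rest links from `start`.

    Returns (members, unresolved): the rdf:first values collected in order,
    and the first node whose triples are missing (None if rdf:nil was reached).
    """
    members = []
    seen = set()
    node = start
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        if node not in nodes:
            return members, node  # chain continues outside this excerpt
        first, rest = nodes[node]
        members.append(first)
        node = rest
    return members, None


if __name__ == "__main__":
    for head in ("Ne37cd0f5ee174a08aaffe1ccfd3b8345",
                 "Nf3ae556d8033485bb081e94ec56d7132",
                 "Nb36f7326ea3f430e8ee4d0217de9ebcd"):
        members, unresolved = walk_list(head, LIST_NODES)
        names = [FAMILY_NAMES[m] for m in members]
        print(head, names, "continues at " + unresolved if unresolved else "ends at rdf:nil")
```

Using plain dicts keeps the example dependency-free. A full pipeline would more likely parse the record with an RDF library and read the collection from the parsed graph.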
...