Chemlistem: chemical named entity recognition using recurrent neural networks


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2018-12-06

AUTHORS

Peter Corbett, John Boyle

ABSTRACT

Chemical named entity recognition (NER) has traditionally been dominated by conditional random fields (CRF)-based approaches, but given the success of the artificial neural network techniques known as “deep learning”, we decided to examine them as an alternative to CRFs. We present here several chemical named entity recognition systems. The first system translates the traditional CRF-based idioms into a deep learning framework, using rich per-token features and neural word embeddings, and producing a sequence of tags using bidirectional long short-term memory (LSTM) networks, a type of recurrent neural net. The second system eschews the rich feature set, and even tokenisation, in favour of character labelling using neural character embeddings and multiple LSTM layers. The third system is an ensemble that combines the results of the first two systems. Our original BioCreative V.5 competition entry was placed in the top group with the highest F scores, and subsequent work using transfer learning has achieved a final F score of 90.33% on the test data (precision 91.47%, recall 89.21%).
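
The reported figures can be cross-checked with a short calculation. The sketch below assumes the "F score" in the abstract is the balanced F1 measure (the harmonic mean of precision and recall); the abstract does not state this explicitly, and the function name f_score is illustrative only.

```python
def f_score(precision, recall):
    """Balanced F score (F1): the harmonic mean of precision and recall.

    Assumption: the abstract's "F score" is F1; the record does not say so.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
reported_precision = 91.47
reported_recall = 89.21

print(f"F = {f_score(reported_precision, reported_recall):.2f}%")  # F = 90.33%
```

Under that assumption, the computed value agrees with the 90.33% stated in the abstract.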

PAGES

59

References to SciGraph publications

  • 2016-05-28. A survey of transfer learning in JOURNAL OF BIG DATA
  • 2008-11-19. Cascaded classifiers for confidence-based chemical named entity recognition in BMC BIOINFORMATICS
  • 2015-05-27. Deep learning in NATURE
  • 2015-01-19. tmChem: a high performance approach for chemical named entity recognition and normalization in JOURNAL OF CHEMINFORMATICS
  • 2011-10-14. OSCAR4: a flexible architecture for chemical text-mining in JOURNAL OF CHEMINFORMATICS

Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1186/s13321-018-0313-8

    DOI

    http://dx.doi.org/10.1186/s13321-018-0313-8

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1110434409

    PUBMED

    https://www.ncbi.nlm.nih.gov/pubmed/30523437



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/03", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Chemical Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0303", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Macromolecular and Materials Chemistry", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Data Science Group, Technology Department, The Royal Society of Chemistry, Cambridge, UK", 
              "id": "http://www.grid.ac/institutes/grid.431456.1", 
              "name": [
                "Data Science Group, Technology Department, The Royal Society of Chemistry, Cambridge, UK"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Corbett", 
            "givenName": "Peter", 
            "id": "sg:person.010641656713.56", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010641656713.56"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Data Science Group, Technology Department, The Royal Society of Chemistry, Cambridge, UK", 
              "id": "http://www.grid.ac/institutes/grid.431456.1", 
              "name": [
                "Data Science Group, Technology Department, The Royal Society of Chemistry, Cambridge, UK"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Boyle", 
            "givenName": "John", 
            "id": "sg:person.01110033460.10", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01110033460.10"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1038/nature14539", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1010020120", 
              "https://doi.org/10.1038/nature14539"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1758-2946-7-s1-s3", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1043621969", 
              "https://doi.org/10.1186/1758-2946-7-s1-s3"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s40537-016-0043-6", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1046078126", 
              "https://doi.org/10.1186/s40537-016-0043-6"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-9-s11-s4", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1009242068", 
              "https://doi.org/10.1186/1471-2105-9-s11-s4"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1758-2946-3-41", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1012459844", 
              "https://doi.org/10.1186/1758-2946-3-41"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2018-12-06", 
        "datePublishedReg": "2018-12-06", 
        "description": "Chemical named entity recognition (NER) has traditionally been dominated by conditional random fields (CRF)-based approaches but given the success of the artificial neural network techniques known as \u201cdeep learning\u201d we decided to examine them as an alternative to CRFs. We present here several chemical named entity recognition systems. The first system translates the traditional CRF-based idioms into a deep learning framework, using rich per-token features and neural word embeddings, and producing a sequence of tags using bidirectional long short term memory (LSTM) networks\u2014a type of recurrent neural net. The second system eschews the rich feature set\u2014and even tokenisation\u2014in favour of character labelling using neural character embeddings and multiple LSTM layers. The third system is an ensemble that combines the results of the first two systems. Our original BioCreative V.5 competition entry was placed in the top group with the highest F scores, and subsequent using transfer learning have achieved a final F score of 90.33% on the test data (precision 91.47%, recall 89.21%).", 
        "genre": "article", 
        "id": "sg:pub.10.1186/s13321-018-0313-8", 
        "inLanguage": "en", 
        "isAccessibleForFree": true, 
        "isPartOf": [
          {
            "id": "sg:journal.1042252", 
            "issn": [
              "1758-2946"
            ], 
            "name": "Journal of Cheminformatics", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "10"
          }
        ], 
        "keywords": [
          "entity recognition", 
          "F-score", 
          "bidirectional long short-term memory network", 
          "long short-term memory network", 
          "multiple LSTM layers", 
          "deep learning framework", 
          "short-term memory network", 
          "artificial neural network technique", 
          "entity recognition system", 
          "recurrent neural network", 
          "Conditional Random Fields", 
          "highest F-score", 
          "neural network technique", 
          "term memory network", 
          "recurrent neural nets", 
          "neural word embeddings", 
          "character labeling", 
          "deep learning", 
          "transfer learning", 
          "sequence of tags", 
          "learning framework", 
          "recognition system", 
          "LSTM layers", 
          "neural network", 
          "network technique", 
          "character embeddings", 
          "word embeddings", 
          "rich features", 
          "neural nets", 
          "token features", 
          "memory network", 
          "traditional CRF", 
          "random fields", 
          "first system", 
          "embedding", 
          "network", 
          "learning", 
          "test data", 
          "recognition", 
          "tokenisation", 
          "system", 
          "second system", 
          "competition entry", 
          "features", 
          "framework", 
          "third system", 
          "nets", 
          "scores", 
          "tags", 
          "CRF", 
          "ensemble", 
          "technique", 
          "idioms", 
          "data", 
          "success", 
          "top group", 
          "field", 
          "sequence", 
          "favor", 
          "results", 
          "group", 
          "alternative", 
          "labeling", 
          "entry", 
          "layer", 
          "types", 
          "approach", 
          "chemicals"
        ], 
        "name": "Chemlistem: chemical named entity recognition using recurrent neural networks", 
        "pagination": "59", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1110434409"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1186/s13321-018-0313-8"
            ]
          }, 
          {
            "name": "pubmed_id", 
            "type": "PropertyValue", 
            "value": [
              "30523437"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1186/s13321-018-0313-8", 
          "https://app.dimensions.ai/details/publication/pub.1110434409"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-06-01T22:19", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20220601/entities/gbq_results/article/article_758.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1186/s13321-018-0313-8"
      }
    ]
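
Because the record is plain JSON, it can be read with any JSON parser. The following is a minimal sketch, not an official SciGraph client: it embeds a trimmed copy of the record above (only the fields it uses) and pulls out the title, authors, and identifiers. The names RECORD_JSON and summarise are hypothetical.

```python
import json

# Trimmed copy of the record shown above; only the fields used here are kept.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Corbett", "givenName": "Peter", "type": "Person"},
      {"familyName": "Boyle", "givenName": "John", "type": "Person"}
    ],
    "datePublished": "2018-12-06",
    "name": "Chemlistem: chemical named entity recognition using recurrent neural networks",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1110434409"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/s13321-018-0313-8"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["30523437"]}
    ],
    "type": "ScholarlyArticle"
  }
]
"""


def summarise(record_text):
    """Return title, date, author names, and identifiers from a record."""
    article = json.loads(record_text)[0]  # the record is a one-element list
    authors = [
        f"{person['givenName']} {person['familyName']}"
        for person in article.get("author", [])
    ]
    # Each productId entry carries a name and a list holding a single value.
    identifiers = {
        entry["name"]: entry["value"][0]
        for entry in article.get("productId", [])
    }
    return {
        "title": article["name"],
        "date": article["datePublished"],
        "authors": authors,
        "identifiers": identifiers,
    }


summary = summarise(RECORD_JSON)
print(summary["authors"])
print(summary["identifiers"]["doi"])
```

This treats the document as ordinary JSON and ignores the "@context"; a full JSON-LD processor would be needed to expand the keys into RDF terms.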
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s13321-018-0313-8'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s13321-018-0313-8'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s13321-018-0313-8'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s13321-018-0313-8'
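
The four curl commands above differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. This is an illustrative equivalent rather than an official client: the helper names build_request and fetch and the short format keys are assumptions, and the endpoint is taken from the commands above and may not still be served.

```python
import urllib.request

# URL copied from the curl commands above.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/s13321-018-0313-8"

# Media types copied from the four Accept headers above; the short keys
# on the left are made up for this sketch.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request asking for the given format."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unknown format: {fmt!r}")
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text.

    Needs network access; raises urllib.error.URLError on failure.
    """
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example (performs a network call, so it is left commented out):
# print(fetch("turtle"))
```

Keeping request construction separate from sending makes it possible to inspect the headers without touching the network.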


     

    This table displays all metadata directly associated with this object as RDF triples.

    156 TRIPLES      22 PREDICATES      99 URIs      86 LITERALS      7 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1186/s13321-018-0313-8 schema:about anzsrc-for:03
    2 anzsrc-for:0303
    3 schema:author Nfd6668d4675a408a80fe97b71bd61b7e
    4 schema:citation sg:pub.10.1038/nature14539
    5 sg:pub.10.1186/1471-2105-9-s11-s4
    6 sg:pub.10.1186/1758-2946-3-41
    7 sg:pub.10.1186/1758-2946-7-s1-s3
    8 sg:pub.10.1186/s40537-016-0043-6
    9 schema:datePublished 2018-12-06
    10 schema:datePublishedReg 2018-12-06
    11 schema:description Chemical named entity recognition (NER) has traditionally been dominated by conditional random fields (CRF)-based approaches but given the success of the artificial neural network techniques known as “deep learning” we decided to examine them as an alternative to CRFs. We present here several chemical named entity recognition systems. The first system translates the traditional CRF-based idioms into a deep learning framework, using rich per-token features and neural word embeddings, and producing a sequence of tags using bidirectional long short term memory (LSTM) networks—a type of recurrent neural net. The second system eschews the rich feature set—and even tokenisation—in favour of character labelling using neural character embeddings and multiple LSTM layers. The third system is an ensemble that combines the results of the first two systems. Our original BioCreative V.5 competition entry was placed in the top group with the highest F scores, and subsequent using transfer learning have achieved a final F score of 90.33% on the test data (precision 91.47%, recall 89.21%).
    12 schema:genre article
    13 schema:inLanguage en
    14 schema:isAccessibleForFree true
    15 schema:isPartOf N8cfcda5b988d4f5d9d8150c412722aef
    16 Nbcff4a17c5a943439fee24a79553281f
    17 sg:journal.1042252
    18 schema:keywords CRF
    19 Conditional Random Fields
    20 F-score
    21 LSTM layers
    22 alternative
    23 approach
    24 artificial neural network technique
    25 bidirectional long short-term memory network
    26 character embeddings
    27 character labeling
    28 chemicals
    29 competition entry
    30 data
    31 deep learning
    32 deep learning framework
    33 embedding
    34 ensemble
    35 entity recognition
    36 entity recognition system
    37 entry
    38 favor
    39 features
    40 field
    41 first system
    42 framework
    43 group
    44 highest F-score
    45 idioms
    46 labeling
    47 layer
    48 learning
    49 learning framework
    50 long short-term memory network
    51 memory network
    52 multiple LSTM layers
    53 nets
    54 network
    55 network technique
    56 neural nets
    57 neural network
    58 neural network technique
    59 neural word embeddings
    60 random fields
    61 recognition
    62 recognition system
    63 recurrent neural nets
    64 recurrent neural network
    65 results
    66 rich features
    67 scores
    68 second system
    69 sequence
    70 sequence of tags
    71 short-term memory network
    72 success
    73 system
    74 tags
    75 technique
    76 term memory network
    77 test data
    78 third system
    79 token features
    80 tokenisation
    81 top group
    82 traditional CRF
    83 transfer learning
    84 types
    85 word embeddings
    86 schema:name Chemlistem: chemical named entity recognition using recurrent neural networks
    87 schema:pagination 59
    88 schema:productId N5f74f6cce80a4a6bb3b9f8d41aa7ab49
    89 Nb24896cc884e480b86f90ee1fca72902
    90 Nc91d9a26baca4891aebb1f28f82a9f8b
    91 schema:sameAs https://app.dimensions.ai/details/publication/pub.1110434409
    92 https://doi.org/10.1186/s13321-018-0313-8
    93 schema:sdDatePublished 2022-06-01T22:19
    94 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    95 schema:sdPublisher N2f637145e4724ed083a289747ebea73d
    96 schema:url https://doi.org/10.1186/s13321-018-0313-8
    97 sgo:license sg:explorer/license/
    98 sgo:sdDataset articles
    99 rdf:type schema:ScholarlyArticle
    100 N2f637145e4724ed083a289747ebea73d schema:name Springer Nature - SN SciGraph project
    101 rdf:type schema:Organization
    102 N5f74f6cce80a4a6bb3b9f8d41aa7ab49 schema:name pubmed_id
    103 schema:value 30523437
    104 rdf:type schema:PropertyValue
    105 N72233708e7054159bfd77869251b0e53 rdf:first sg:person.01110033460.10
    106 rdf:rest rdf:nil
    107 N8cfcda5b988d4f5d9d8150c412722aef schema:volumeNumber 10
    108 rdf:type schema:PublicationVolume
    109 Nb24896cc884e480b86f90ee1fca72902 schema:name doi
    110 schema:value 10.1186/s13321-018-0313-8
    111 rdf:type schema:PropertyValue
    112 Nbcff4a17c5a943439fee24a79553281f schema:issueNumber 1
    113 rdf:type schema:PublicationIssue
    114 Nc91d9a26baca4891aebb1f28f82a9f8b schema:name dimensions_id
    115 schema:value pub.1110434409
    116 rdf:type schema:PropertyValue
    117 Nfd6668d4675a408a80fe97b71bd61b7e rdf:first sg:person.010641656713.56
    118 rdf:rest N72233708e7054159bfd77869251b0e53
    119 anzsrc-for:03 schema:inDefinedTermSet anzsrc-for:
    120 schema:name Chemical Sciences
    121 rdf:type schema:DefinedTerm
    122 anzsrc-for:0303 schema:inDefinedTermSet anzsrc-for:
    123 schema:name Macromolecular and Materials Chemistry
    124 rdf:type schema:DefinedTerm
    125 sg:journal.1042252 schema:issn 1758-2946
    126 schema:name Journal of Cheminformatics
    127 schema:publisher Springer Nature
    128 rdf:type schema:Periodical
    129 sg:person.010641656713.56 schema:affiliation grid-institutes:grid.431456.1
    130 schema:familyName Corbett
    131 schema:givenName Peter
    132 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010641656713.56
    133 rdf:type schema:Person
    134 sg:person.01110033460.10 schema:affiliation grid-institutes:grid.431456.1
    135 schema:familyName Boyle
    136 schema:givenName John
    137 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01110033460.10
    138 rdf:type schema:Person
    139 sg:pub.10.1038/nature14539 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010020120
    140 https://doi.org/10.1038/nature14539
    141 rdf:type schema:CreativeWork
    142 sg:pub.10.1186/1471-2105-9-s11-s4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009242068
    143 https://doi.org/10.1186/1471-2105-9-s11-s4
    144 rdf:type schema:CreativeWork
    145 sg:pub.10.1186/1758-2946-3-41 schema:sameAs https://app.dimensions.ai/details/publication/pub.1012459844
    146 https://doi.org/10.1186/1758-2946-3-41
    147 rdf:type schema:CreativeWork
    148 sg:pub.10.1186/1758-2946-7-s1-s3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1043621969
    149 https://doi.org/10.1186/1758-2946-7-s1-s3
    150 rdf:type schema:CreativeWork
    151 sg:pub.10.1186/s40537-016-0043-6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1046078126
    152 https://doi.org/10.1186/s40537-016-0043-6
    153 rdf:type schema:CreativeWork
    154 grid-institutes:grid.431456.1 schema:alternateName Data Science Group, Technology Department, The Royal Society of Chemistry, Cambridge, UK
    155 schema:name Data Science Group, Technology Department, The Royal Society of Chemistry, Cambridge, UK
    156 rdf:type schema:Organization
     



