Identifying Clinical Terms in Free-Text Notes Using Ontology-Guided Machine Learning


Ontology type: schema:Chapter     


Chapter Info

DATE

2019-04-02

AUTHORS

Aryan Arbabi, David R. Adams, Sanja Fidler, Michael Brudno

ABSTRACT

Objective: Automatic recognition of medical concepts in unstructured text is an important component of many clinical and research applications, and its accuracy has a large impact on electronic health record analysis. The mining of such terms is complicated by the broad use of synonyms and non-standard terms in medical documents. Here we present a machine learning model for concept recognition in large unstructured text which optimizes the use of ontological structures and can identify previously unobserved synonyms for concepts in the ontology.
Materials and Methods: We present a neural dictionary model which can be used to predict whether a phrase is synonymous with a concept in a reference ontology. Our model, called Neural Concept Recognizer (NCR), uses a convolutional neural network and the taxonomy structure to encode input phrases, then ranks medical concepts based on their similarity in that space. It also utilizes the biomedical ontology structure to optimize the embedding of various terms and has fewer training constraints than previous methods. We train our model on two biomedical ontologies, the Human Phenotype Ontology (HPO) and SNOMED-CT.
Results: We tested our model trained on HPO on two different data sets: 288 annotated PubMed abstracts and 39 clinical reports. We also tested our model trained on SNOMED-CT on 2000 MIMIC-III ICU discharge summaries. The results of our experiments show the high accuracy of our model, as well as the value of utilizing the taxonomy structure of the ontology in concept recognition.
Conclusion: Most popular medical concept recognizers rely on rule-based models, which cannot generalize well to unseen synonyms. Also, most machine learning methods typically require large corpora of annotated text that cover all classes of concepts, which can be extremely difficult to get for biomedical ontologies. Without relying on large-scale labeled training data or requiring any custom training, our model can efficiently generalize to new synonyms and performs as well as or better than state-of-the-art methods custom-built for specific ontologies.
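The ranking step described in the abstract can be pictured with a toy example. The sketch below is not the authors' NCR implementation: the vectors are made-up two-dimensional values rather than the output of a convolutional network, the three concepts and their parent links are a hand-written miniature taxonomy, and summing a concept's vector with those of its ancestors is only one assumed way of letting the taxonomy shape the embedding.

```python
import math

# Hypothetical miniature taxonomy: child -> parent (None marks the root).
PARENT = {
    "abnormality of the heart": None,
    "arrhythmia": "abnormality of the heart",
    "tachycardia": "arrhythmia",
}

# Made-up raw vectors; a trained model would learn these values.
RAW = {
    "abnormality of the heart": (1.0, 0.0),
    "arrhythmia": (0.0, 1.0),
    "tachycardia": (0.0, 2.0),
}


def concept_embedding(concept):
    """Sum the raw vectors of a concept and all its ancestors (assumed scheme)."""
    total = [0.0, 0.0]
    node = concept
    while node is not None:
        total = [t + r for t, r in zip(total, RAW[node])]
        node = PARENT[node]
    return total


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def rank_concepts(phrase_vector):
    """Return concept names ordered from most to least similar to the phrase."""
    scored = [(cosine(phrase_vector, concept_embedding(c)), c) for c in RAW]
    return [name for _, name in sorted(scored, reverse=True)]


# Stand-in for the encoding of an input phrase such as "fast heart rate".
ranking = rank_concepts((0.2, 1.0))
print(ranking)
```

Because each concept vector includes its ancestors, a phrase that lands near a specific concept also scores reasonably against that concept's parents, which is the intuition behind using the taxonomy at all.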

PAGES

19-34

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-030-17083-7_2

DOI

http://dx.doi.org/10.1007/978-3-030-17083-7_2

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1113486636



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Vector Institute, Toronto, ON, Canada", 
          "id": "http://www.grid.ac/institutes/grid.494618.6", 
          "name": [
            "Department of Computer Science, University of Toronto, Toronto, ON, Canada", 
            "Center for Computational Medicine, Hospital for Sick Children, Toronto, ON, Canada", 
            "Vector Institute, Toronto, ON, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Arbabi", 
        "givenName": "Aryan", 
        "id": "sg:person.0617604506.60", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0617604506.60"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Section on Human Biochemical Genetics, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA", 
          "id": "http://www.grid.ac/institutes/grid.280128.1", 
          "name": [
            "Section on Human Biochemical Genetics, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Adams", 
        "givenName": "David R.", 
        "id": "sg:person.013251634647.97", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013251634647.97"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Vector Institute, Toronto, ON, Canada", 
          "id": "http://www.grid.ac/institutes/grid.494618.6", 
          "name": [
            "Department of Computer Science, University of Toronto, Toronto, ON, Canada", 
            "Vector Institute, Toronto, ON, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Fidler", 
        "givenName": "Sanja", 
        "id": "sg:person.07721552373.76", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07721552373.76"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Vector Institute, Toronto, ON, Canada", 
          "id": "http://www.grid.ac/institutes/grid.494618.6", 
          "name": [
            "Department of Computer Science, University of Toronto, Toronto, ON, Canada", 
            "Center for Computational Medicine, Hospital for Sick Children, Toronto, ON, Canada", 
            "Vector Institute, Toronto, ON, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Brudno", 
        "givenName": "Michael", 
        "id": "sg:person.01253563237.25", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01253563237.25"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2019-04-02", 
    "datePublishedReg": "2019-04-02", 
    "description": "Objective: Automatic recognition of medical concepts in unstructured text is an important component of many clinical and research applications and its accuracy has a large impact on electronic health record analysis. The mining of such terms is complicated by the broad use of synonyms and non-standard terms in medical documents. Here we presented a machine learning model for concept recognition in large unstructured text which optimizes the use of ontological structures and can identify previously unobserved synonyms for concepts in the ontology.Materials and Methods: We present a neural dictionary model which can be used to predict if a phrase is synonymous to a concept in a reference ontology. Our model, called Neural Concept Recognizer (NCR), uses a convolutional neural network and utilizes the taxonomy structure to encode input phrases, then rank medical concepts based on the similarity in that space. It also utilizes the biomedical ontology structure to optimize the embedding of various terms and has fewer training constraints than previous methods. We train our model on two biomedical ontologies, the Human Phenotype Ontology (HPO) and SNOMED-CT.Results: We tested our model trained on HPO on two different data sets: 288 annotated PubMed abstracts and 39 clinical reports. We also tested our model trained on the SNOMED-CT on 2000 MIMIC-III ICU discharge summaries. The results of our experiments show the high accuracy of our model, as well as the value of utilizing the taxonomy structure of the ontology in concept recognition.Conclusion: Most popular medical concept recognizers rely on rule-based models, which cannot generalize well to unseen synonyms. Also, most machine learning methods typically require large corpora of annotated text that cover all classes of concepts, which can be extremely difficult to get for biomedical ontologies. \nWithout relying on a large-scale labeled training data or requiring any custom training, our model can efficiently generalize to new synonyms and performs as well or better than state-of-the-art methods custom built for specific ontologies.", 
    "editor": [
      {
        "familyName": "Cowen", 
        "givenName": "Lenore J.", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-030-17083-7_2", 
    "inLanguage": "en", 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-030-17082-0", 
        "978-3-030-17083-7"
      ], 
      "name": "Research in Computational Molecular Biology", 
      "type": "Book"
    }, 
    "keywords": [
      "Human Phenotype Ontology", 
      "unstructured text", 
      "concept recognition", 
      "biomedical ontologies", 
      "taxonomy structure", 
      "medical concepts", 
      "SNOMED CT", 
      "convolutional neural network", 
      "electronic health record analysis", 
      "free-text notes", 
      "rule-based model", 
      "health record analysis", 
      "specific ontology", 
      "reference ontology", 
      "ontology structure", 
      "different data sets", 
      "class of concepts", 
      "automatic recognition", 
      "most machine", 
      "training data", 
      "neural network", 
      "custom training", 
      "input phrases", 
      "PubMed abstracts", 
      "non-standard terms", 
      "medical documents", 
      "ontology", 
      "ontological structure", 
      "dictionary model", 
      "training constraints", 
      "large corpus", 
      "Phenotype Ontology", 
      "previous methods", 
      "machine", 
      "data sets", 
      "recognizer", 
      "high accuracy", 
      "recognition", 
      "discharge summaries", 
      "text", 
      "accuracy", 
      "research applications", 
      "mining", 
      "concept", 
      "network", 
      "embedding", 
      "clinical terms", 
      "documents", 
      "corpus", 
      "record analysis", 
      "model", 
      "phrases", 
      "constraints", 
      "set", 
      "broad use", 
      "method", 
      "applications", 
      "terms", 
      "important component", 
      "space", 
      "training", 
      "use", 
      "synonym", 
      "similarity", 
      "data", 
      "experiments", 
      "class", 
      "custom", 
      "large impact", 
      "structure", 
      "components", 
      "results", 
      "state", 
      "such terms", 
      "analysis", 
      "summary", 
      "impact", 
      "clinical reports", 
      "values", 
      "note", 
      "Abstract", 
      "report", 
      "materials", 
      "new synonym", 
      "large unstructured text", 
      "unobserved synonyms", 
      "neural dictionary model", 
      "Neural Concept Recognizer", 
      "Concept Recognizer", 
      "biomedical ontology structure", 
      "MIMIC-III ICU discharge summaries", 
      "ICU discharge summaries", 
      "popular medical concept recognizers", 
      "medical concept recognizers", 
      "unseen synonyms", 
      "art methods custom", 
      "methods custom"
    ], 
    "name": "Identifying Clinical Terms in Free-Text Notes Using Ontology-Guided Machine Learning", 
    "pagination": "19-34", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1113486636"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-030-17083-7_2"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-030-17083-7_2", 
      "https://app.dimensions.ai/details/publication/pub.1113486636"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-01-01T19:19", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220101/entities/gbq_results/chapter/chapter_335.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-030-17083-7_2"
  }
]
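A record shaped like the one above can be read with any JSON parser. The following is a minimal sketch, assuming the record is a one-element array and using a trimmed copy that keeps only a few fields; the function name `summarize` and the layout of its result are illustrative choices, not part of SciGraph.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
RECORD = '''
[
  {
    "name": "Identifying Clinical Terms in Free-Text Notes Using Ontology-Guided Machine Learning",
    "datePublished": "2019-04-02",
    "pagination": "19-34",
    "author": [
      {"familyName": "Arbabi", "givenName": "Aryan"},
      {"familyName": "Adams", "givenName": "David R."},
      {"familyName": "Fidler", "givenName": "Sanja"},
      {"familyName": "Brudno", "givenName": "Michael"}
    ],
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1113486636"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-3-030-17083-7_2"]}
    ]
  }
]
'''


def summarize(record_text):
    """Return title, authors, date, pages and DOI from a SciGraph-style record."""
    entry = json.loads(record_text)[0]  # the record is a one-element array
    authors = [f"{a['givenName']} {a['familyName']}" for a in entry.get("author", [])]
    # productId is a list of name/value pairs; index it by name.
    ids = {p["name"]: p["value"][0] for p in entry.get("productId", [])}
    return {
        "title": entry["name"],
        "authors": authors,
        "date": entry.get("datePublished"),
        "pages": entry.get("pagination"),
        "doi": ids.get("doi"),
    }


summary = summarize(RECORD)
print(summary["authors"], summary["doi"])
```

Indexing `productId` by its `name` field avoids relying on the order of the identifier entries, which the record does not guarantee.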
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-17083-7_2'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-17083-7_2'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-17083-7_2'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-17083-7_2'
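The four curl commands differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. This is an illustration under assumptions: the short format keys and the helper names `build_request` and `fetch` are invented for the sketch, and whether the endpoint still answers is not something this page confirms.

```python
import urllib.request

BASE = "https://scigraph.springernature.com/"

# Accept headers taken from the curl commands above; the keys are
# labels chosen for this sketch.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id, fmt="json-ld"):
    """Build (but do not send) a GET request for one record in one RDF format."""
    if fmt not in ACCEPT:
        raise ValueError(f"unknown format: {fmt!r}")
    return urllib.request.Request(BASE + record_id, headers={"Accept": ACCEPT[fmt]})


def fetch(record_id, fmt="json-ld", timeout=30):
    """Send the request and return the body as text (needs network access)."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request does not touch the network.
req = build_request("pub.10.1007/978-3-030-17083-7_2", "turtle")
print(req.full_url, req.get_header("Accept"))
```

Keeping request construction separate from sending makes the header logic easy to check offline; `fetch` is only needed when the service is reachable.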


 

This table displays all metadata directly associated with this object as RDF triples.

183 TRIPLES      23 PREDICATES      122 URIs      115 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-030-17083-7_2 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author N16053878ed0247ed84f27ea0eac0e47e
4 schema:datePublished 2019-04-02
5 schema:datePublishedReg 2019-04-02
6 schema:description Objective: Automatic recognition of medical concepts in unstructured text is an important component of many clinical and research applications and its accuracy has a large impact on electronic health record analysis. The mining of such terms is complicated by the broad use of synonyms and non-standard terms in medical documents. Here we presented a machine learning model for concept recognition in large unstructured text which optimizes the use of ontological structures and can identify previously unobserved synonyms for concepts in the ontology.Materials and Methods: We present a neural dictionary model which can be used to predict if a phrase is synonymous to a concept in a reference ontology. Our model, called Neural Concept Recognizer (NCR), uses a convolutional neural network and utilizes the taxonomy structure to encode input phrases, then rank medical concepts based on the similarity in that space. It also utilizes the biomedical ontology structure to optimize the embedding of various terms and has fewer training constraints than previous methods. We train our model on two biomedical ontologies, the Human Phenotype Ontology (HPO) and SNOMED-CT.Results: We tested our model trained on HPO on two different data sets: 288 annotated PubMed abstracts and 39 clinical reports. We also tested our model trained on the SNOMED-CT on 2000 MIMIC-III ICU discharge summaries. The results of our experiments show the high accuracy of our model, as well as the value of utilizing the taxonomy structure of the ontology in concept recognition.Conclusion: Most popular medical concept recognizers rely on rule-based models, which cannot generalize well to unseen synonyms. Also, most machine learning methods typically require large corpora of annotated text that cover all classes of concepts, which can be extremely difficult to get for biomedical ontologies. Without relying on a large-scale labeled training data or requiring any custom training, our model can efficiently generalize to new synonyms and performs as well or better than state-of-the-art methods custom built for specific ontologies.
7 schema:editor N110586b87516400ea2ce2232174dbc87
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree false
11 schema:isPartOf N431dec0bde95473291b8d698d2f00556
12 schema:keywords Abstract
13 Concept Recognizer
14 Human Phenotype Ontology
15 ICU discharge summaries
16 MIMIC-III ICU discharge summaries
17 Neural Concept Recognizer
18 Phenotype Ontology
19 PubMed abstracts
20 SNOMED CT
21 accuracy
22 analysis
23 applications
24 art methods custom
25 automatic recognition
26 biomedical ontologies
27 biomedical ontology structure
28 broad use
29 class
30 class of concepts
31 clinical reports
32 clinical terms
33 components
34 concept
35 concept recognition
36 constraints
37 convolutional neural network
38 corpus
39 custom
40 custom training
41 data
42 data sets
43 dictionary model
44 different data sets
45 discharge summaries
46 documents
47 electronic health record analysis
48 embedding
49 experiments
50 free-text notes
51 health record analysis
52 high accuracy
53 impact
54 important component
55 input phrases
56 large corpus
57 large impact
58 large unstructured text
59 machine
60 materials
61 medical concept recognizers
62 medical concepts
63 medical documents
64 method
65 methods custom
66 mining
67 model
68 most machine
69 network
70 neural dictionary model
71 neural network
72 new synonym
73 non-standard terms
74 note
75 ontological structure
76 ontology
77 ontology structure
78 phrases
79 popular medical concept recognizers
80 previous methods
81 recognition
82 recognizer
83 record analysis
84 reference ontology
85 report
86 research applications
87 results
88 rule-based model
89 set
90 similarity
91 space
92 specific ontology
93 state
94 structure
95 such terms
96 summary
97 synonym
98 taxonomy structure
99 terms
100 text
101 training
102 training constraints
103 training data
104 unobserved synonyms
105 unseen synonyms
106 unstructured text
107 use
108 values
109 schema:name Identifying Clinical Terms in Free-Text Notes Using Ontology-Guided Machine Learning
110 schema:pagination 19-34
111 schema:productId N5757f775ac024ae29289271a6ffa5581
112 Nf0bbe2405be6459f8e86bbf1104d5128
113 schema:publisher Nadf291d4c6f048cbb29da715452e51c2
114 schema:sameAs https://app.dimensions.ai/details/publication/pub.1113486636
115 https://doi.org/10.1007/978-3-030-17083-7_2
116 schema:sdDatePublished 2022-01-01T19:19
117 schema:sdLicense https://scigraph.springernature.com/explorer/license/
118 schema:sdPublisher N5f6d57a96b224a9eb5b50eb9bf02408b
119 schema:url https://doi.org/10.1007/978-3-030-17083-7_2
120 sgo:license sg:explorer/license/
121 sgo:sdDataset chapters
122 rdf:type schema:Chapter
123 N110586b87516400ea2ce2232174dbc87 rdf:first N302816aa176644f884dd02dbdb17f8d5
124 rdf:rest rdf:nil
125 N16053878ed0247ed84f27ea0eac0e47e rdf:first sg:person.0617604506.60
126 rdf:rest Nb25501844bf04ea9ac022399ba95c877
127 N302816aa176644f884dd02dbdb17f8d5 schema:familyName Cowen
128 schema:givenName Lenore J.
129 rdf:type schema:Person
130 N431dec0bde95473291b8d698d2f00556 schema:isbn 978-3-030-17082-0
131 978-3-030-17083-7
132 schema:name Research in Computational Molecular Biology
133 rdf:type schema:Book
134 N4b471e8d4aca4788a6af6f597707026d rdf:first sg:person.07721552373.76
135 rdf:rest N8cca183f5ab44e038974ddab2b097533
136 N5757f775ac024ae29289271a6ffa5581 schema:name doi
137 schema:value 10.1007/978-3-030-17083-7_2
138 rdf:type schema:PropertyValue
139 N5f6d57a96b224a9eb5b50eb9bf02408b schema:name Springer Nature - SN SciGraph project
140 rdf:type schema:Organization
141 N8cca183f5ab44e038974ddab2b097533 rdf:first sg:person.01253563237.25
142 rdf:rest rdf:nil
143 Nadf291d4c6f048cbb29da715452e51c2 schema:name Springer Nature
144 rdf:type schema:Organisation
145 Nb25501844bf04ea9ac022399ba95c877 rdf:first sg:person.013251634647.97
146 rdf:rest N4b471e8d4aca4788a6af6f597707026d
147 Nf0bbe2405be6459f8e86bbf1104d5128 schema:name dimensions_id
148 schema:value pub.1113486636
149 rdf:type schema:PropertyValue
150 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
151 schema:name Information and Computing Sciences
152 rdf:type schema:DefinedTerm
153 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
154 schema:name Artificial Intelligence and Image Processing
155 rdf:type schema:DefinedTerm
156 sg:person.01253563237.25 schema:affiliation grid-institutes:grid.494618.6
157 schema:familyName Brudno
158 schema:givenName Michael
159 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01253563237.25
160 rdf:type schema:Person
161 sg:person.013251634647.97 schema:affiliation grid-institutes:grid.280128.1
162 schema:familyName Adams
163 schema:givenName David R.
164 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013251634647.97
165 rdf:type schema:Person
166 sg:person.0617604506.60 schema:affiliation grid-institutes:grid.494618.6
167 schema:familyName Arbabi
168 schema:givenName Aryan
169 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0617604506.60
170 rdf:type schema:Person
171 sg:person.07721552373.76 schema:affiliation grid-institutes:grid.494618.6
172 schema:familyName Fidler
173 schema:givenName Sanja
174 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07721552373.76
175 rdf:type schema:Person
176 grid-institutes:grid.280128.1 schema:alternateName Section on Human Biochemical Genetics, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
177 schema:name Section on Human Biochemical Genetics, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
178 rdf:type schema:Organization
179 grid-institutes:grid.494618.6 schema:alternateName Vector Institute, Toronto, ON, Canada
180 schema:name Center for Computational Medicine, Hospital for Sick Children, Toronto, ON, Canada
181 Department of Computer Science, University of Toronto, Toronto, ON, Canada
182 Vector Institute, Toronto, ON, Canada
183 rdf:type schema:Organization
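In the table above the author order is not stored as a plain array: `schema:author` points to a blank node, and each blank node carries an `rdf:first` item and an `rdf:rest` link to the next node until `rdf:nil`. The sketch below walks that chain using triples copied from the table; the helper names `objects` and `walk_list` are hypothetical, and a real application would more likely use an RDF library than hand-written tuples.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"
PUB = "sg:pub.10.1007/978-3-030-17083-7_2"

# (subject, predicate, object) triples copied from the table above.
TRIPLES = [
    (PUB, "schema:author", "N16053878ed0247ed84f27ea0eac0e47e"),
    ("N16053878ed0247ed84f27ea0eac0e47e", RDF_FIRST, "sg:person.0617604506.60"),
    ("N16053878ed0247ed84f27ea0eac0e47e", RDF_REST, "Nb25501844bf04ea9ac022399ba95c877"),
    ("Nb25501844bf04ea9ac022399ba95c877", RDF_FIRST, "sg:person.013251634647.97"),
    ("Nb25501844bf04ea9ac022399ba95c877", RDF_REST, "N4b471e8d4aca4788a6af6f597707026d"),
    ("N4b471e8d4aca4788a6af6f597707026d", RDF_FIRST, "sg:person.07721552373.76"),
    ("N4b471e8d4aca4788a6af6f597707026d", RDF_REST, "N8cca183f5ab44e038974ddab2b097533"),
    ("N8cca183f5ab44e038974ddab2b097533", RDF_FIRST, "sg:person.01253563237.25"),
    ("N8cca183f5ab44e038974ddab2b097533", RDF_REST, RDF_NIL),
    ("sg:person.0617604506.60", "schema:familyName", "Arbabi"),
    ("sg:person.013251634647.97", "schema:familyName", "Adams"),
    ("sg:person.07721552373.76", "schema:familyName", "Fidler"),
    ("sg:person.01253563237.25", "schema:familyName", "Brudno"),
]


def objects(triples, subject, predicate):
    """Return every object of the triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def walk_list(triples, head):
    """Follow rdf:first / rdf:rest links from head and return the items in order."""
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError("cycle in RDF list")
        seen.add(node)
        items.append(objects(triples, node, RDF_FIRST)[0])
        rest = objects(triples, node, RDF_REST)
        node = rest[0] if rest else RDF_NIL  # a missing rdf:rest ends the walk
    return items


head = objects(TRIPLES, PUB, "schema:author")[0]
authors = [objects(TRIPLES, person, "schema:familyName")[0]
           for person in walk_list(TRIPLES, head)]
print(authors)  # ['Arbabi', 'Adams', 'Fidler', 'Brudno']
```

The recovered order matches the author line at the top of the record, which is the reason the data uses a linked list rather than an unordered set of `schema:author` values.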
 



