YEARS

2010-2018

AUTHORS

Hua Xu

TITLE

Interactive machine learning methods for clinical natural language processing

ABSTRACT

DESCRIPTION (provided by applicant): Growing deployments of electronic health record (EHR) systems have made massive amounts of clinical data available electronically. However, much of the detailed clinical information about patients is embedded in narrative text and is not directly accessible to computerized clinical applications. Therefore, natural language processing (NLP) technologies, which can unlock information in narrative documents, have received great attention in the medical domain. Current state-of-the-art NLP approaches often involve building probabilistic models. However, the wide adoption of statistical methods in clinical NLP faces two grand challenges: 1) the lack of large annotated clinical corpora; and 2) the lack of methodologies that can efficiently integrate linguistic and domain knowledge with statistical learning. High-performance statistical NLP methods rely on large-scale, high-quality annotations of clinical text, but creating large annotated clinical corpora is time-consuming and costly because it often requires manual review by physicians. Moreover, the medical domain is knowledge intensive. To achieve optimal performance, probabilistic models need to leverage medical domain knowledge. Therefore, methods that can efficiently integrate domain and expert knowledge with machine learning processes to quickly build high-quality probabilistic models with minimum annotation cost would be highly desirable for clinical text processing. In this study, we propose to investigate interactive machine learning (IML) methods to address the above challenges in clinical NLP. An IML system builds a classification model in an iterative process: it actively selects informative samples for annotation based on models built on previously annotated samples, thus reducing the annotation cost of model development.
More importantly, an IML system also incorporates human input into the learning process (e.g., an expert can specify important features for a classification task based on domain knowledge). Thus, IML is an ideal framework for efficiently integrating rule-based (via domain experts specifying features) and statistics-based (via different learning algorithms) approaches to clinical NLP. To achieve our goal, we propose three specific aims. In Aim 1, we plan to investigate different aspects of IML for word sense disambiguation, including developing new active learning algorithms and conducting cognitive usability analysis for efficient feature annotation by users. In Aim 2, to demonstrate the broad uses of IML, we will extend IML approaches to two other important clinical NLP classification tasks: named entity recognition and clinical phenotyping. Finally, in Aim 3, we propose to disseminate the IML methods and tools to the biomedical research community.
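
The sample-selection step described in the abstract can be illustrated with a small sketch. This is not the grant's algorithm: it assumes entropy-based uncertainty sampling, one common active learning strategy, and the function names and the toy probabilities below are hypothetical. Given the current model's predicted class distribution for each unlabeled sample, it picks the samples the model is least certain about and hands those to the annotator first.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(pool_probs: Sequence[Sequence[float]],
                          batch_size: int) -> List[int]:
    """Return indices of the unlabeled samples with the highest entropy."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:batch_size]


# Hypothetical model outputs: predicted sense distributions for four
# unlabeled mentions of an ambiguous abbreviation (illustrative values only).
pool_probs = [
    [0.98, 0.02],  # model is confident
    [0.55, 0.45],  # model is unsure
    [0.80, 0.20],
    [0.50, 0.50],  # model is maximally unsure
]

chosen = select_for_annotation(pool_probs, batch_size=2)
print(chosen)
```

In an iterative loop, the chosen samples would be labeled by an expert, added to the training set, and the model retrained before the next selection round.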

FUNDED PUBLICATIONS

  • Parsing clinical text: how good are the state-of-the-art parsers?
  • Applying active learning to assertion classification of concepts in clinical text.
  • A Preliminary Study of Clinical Abbreviation Disambiguation in Real Time.
  • A new clustering method for detecting rare senses of abbreviations in clinical notes.
  • A study of active learning methods for named entity recognition in clinical text.
  • Identifying the status of genetic lesions in cancer clinical trial documents using machine learning.
  • Analyzing differences between Chinese and English clinical text: a cross-institution comparison of discharge summaries in two languages.
  • Applying active learning to supervised word sense disambiguation in MEDLINE.



    27 TRIPLES      17 PREDICATES      28 URIs      9 LITERALS

    Subject Predicate Object
    1 grants:6df28516e5bd570e16660476139761eb sg:abstract (same text as the ABSTRACT section above)
    2 sg:endYear 2018
    3 sg:fundingAmount 2598537.0
    4 sg:fundingCurrency USD
    5 sg:hasContribution contributions:66c8851bbede56b8f88d2229fd7a7a59
    6 sg:hasFieldOfResearchCode anzsrc-for:01
    7 anzsrc-for:0104
    8 anzsrc-for:08
    9 anzsrc-for:0801
    10 sg:hasFundedPublication articles:046c94d527abd62df0dcc4d23b218654
    11 articles:0cf355928b6e188650e56d7674cd75b3
    12 articles:37905255b19fa1eb2d392c2b3cb5b6d5
    13 articles:8a64941b7d0ac490f2499cd3ef955104
    14 articles:8bae5f500b1d193843dd15a073b14b3d
    15 articles:97ed29153ea170d84146a5ce1e205842
    16 articles:df7c8c30d967187d9aef93a7a37cd1d3
    17 articles:f046a255c816aceda340db00f8751206
    18 sg:hasFundingOrganization grid-institutes:grid.280285.5
    19 sg:hasRecipientOrganization grid-institutes:grid.267308.8
    20 sg:language English
    21 sg:license http://scigraph.springernature.com/explorer/license/
    22 sg:scigraphId 6df28516e5bd570e16660476139761eb
    23 sg:startYear 2010
    24 sg:title Interactive machine learning methods for clinical natural language processing
    25 sg:webpage http://projectreporter.nih.gov/project_info_description.cfm?aid=9132834
    26 rdf:type sg:Grant
    27 rdfs:label Grant: Interactive machine learning methods for clinical natural language processing
    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular JSON format for linked data.

    curl -H 'Accept: application/ld+json' 'http://scigraph.springernature.com/things/grants/6df28516e5bd570e16660476139761eb'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'http://scigraph.springernature.com/things/grants/6df28516e5bd570e16660476139761eb'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'http://scigraph.springernature.com/things/grants/6df28516e5bd570e16660476139761eb'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'http://scigraph.springernature.com/things/grants/6df28516e5bd570e16660476139761eb'
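
The four curl commands above differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. This is a minimal sketch, not an official client: the helper names `build_request` and `fetch` and the short format keys are assumptions made for illustration, and it assumes the endpoint shown in the curl commands is still reachable.

```python
import urllib.request

BASE_URL = "http://scigraph.springernature.com/things/grants/"
GRANT_ID = "6df28516e5bd570e16660476139761eb"

# Media types taken from the curl examples; the short keys are our own labels.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(grant_id: str, fmt: str) -> urllib.request.Request:
    """Build a GET request asking for the grant record in the given RDF format."""
    return urllib.request.Request(
        BASE_URL + grant_id,
        headers={"Accept": ACCEPT_TYPES[fmt]},
    )


def fetch(grant_id: str, fmt: str, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(grant_id, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network; call fetch() to download.
request = build_request(GRANT_ID, "turtle")
print(request.full_url, request.get_header("Accept"))
```

Calling `fetch(GRANT_ID, "nt")` would correspond to the N-Triples curl command, provided the service responds.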





