YEARS

2011-2015

AUTHORS

Vanathi Gopalakrishnan

TITLE

Bayesian Rule Learning Methods for Disease Prediction and Biomarker Discovery

ABSTRACT

DESCRIPTION (provided by applicant):

The problem: High-throughput biomedical data from biomarker profiling studies aimed at early detection of diseases like lung cancer are accumulating rapidly. Although many popular machine learning methods have been used to analyze such high-dimensional datasets, no single method has consistently outperformed the others. Moreover, scientists need to address two related tasks simultaneously, disease prediction and biomarker discovery, using the same sets of data and tools. One way to address this need, as undertaken in this project, is to find the most accurate classifier for the disease from a given set of profiles and present the discriminative markers used in that model to the scientist for further verification. The large space of possible models, coupled with the small sample size of the data, makes it hard to accurately estimate predictive accuracy.

The solution: This project will develop, evaluate and refine novel Bayesian Rule Learning (BRL) methods that are algorithmically efficient, result in parsimonious models and accurately estimate predictive uncertainty from sparse biomedical datasets. BRL methods utilize a Bayesian score to evaluate rule models, thereby quantifying the uncertainty in the validity of the rule itself. This novel technique, which combines the mathematical rigor of Bayesian network learning with rule-based modeling, opens up a hitherto underexplored area of fundamental research in informatics involving such hybrid methodologies. Rules enable modular representation of knowledge and collaboration with scientists, as it is easier to present the model and extract markers both visually and computationally. Rule-based inference is also simpler and more tractable. The Bayesian approach enables prior knowledge to be incorporated and evaluated in a continual fashion with a human in the loop. The latter is very important for refinement of both tools and models.

The specific aims: This project will test the hypothesis that the BRL methods developed and extended herein produce more accurate and parsimonious models for disease state prediction than other state-of-the-art machine learning methods. This project evaluates BRL methods and models using existing proteomic datasets for three diverse diseases: the rare, neurodegenerative Amyotrophic Lateral Sclerosis (ALS), and the two most common cancers in the world, lung and breast cancer. Experimental verification will be performed using a new set of retrospectively collected breast cancer sera samples to evaluate model generalizability.

The significance: This project will produce: (1) a novel biomedical data mining tool for analyzing data from biomarker profiling studies of any disease, (2) methodological insights into the applicability of this tool and current machine learning methods for such tasks, and (3) new data for research on the early detection of breast cancer. It has the potential to help develop new diagnostic tests for early detection of ALS, lung and breast cancers, and it lays a firm foundation for building modeling frameworks that can incorporate both prior knowledge and data to provide the technological capability for combining evidence from multiple, heterogeneous sources.

FUNDED PUBLICATIONS

  • Creating a pipeline of talent for informatics: STEM initiative for high school students in computer science, biology, and biomedical informatics.
  • Novel MRI-derived quantitative biomarker for cardiac function applied to classifying ischemic cardiomyopathy within a Bayesian rule learning framework.
  • cMRI-BED: A novel informatics framework for cardiac MRI biomarker extraction and discovery applied to pediatric cardiomyopathy classification.
  • Semi-automated literature mining to identify putative biomarkers of disease from multiple biofluids.
  • Semi-automated literature mining to identify putative biomarkers of disease from multiple biofluids.
  • Context-sensitive Markov models for peptide scoring and identification from tandem mass spectrometry.
  • Evaluation of a 4-protein serum biomarker panel - biglycan, annexin-A6, myeloperoxidase, and protein S100-A9 (B-AMP) - for the detection of esophageal adenocarcinoma.


    26 TRIPLES      17 PREDICATES      27 URIs      9 LITERALS

    Subject (all rows): grants:0d15483433f78b9f4dd32356fd96341a

    #   Predicate                    Object
    1   sg:abstract                  (the abstract text, as given under ABSTRACT above)
    2   sg:endYear                   2015
    3   sg:fundingAmount             1084444.0
    4   sg:fundingCurrency           USD
    5   sg:hasContribution           contributions:74f2a81007b4481ca2bd5992d663dcf4
    6   sg:hasFieldOfResearchCode    anzsrc-for:01
    7   sg:hasFieldOfResearchCode    anzsrc-for:0104
    8   sg:hasFieldOfResearchCode    anzsrc-for:08
    9   sg:hasFieldOfResearchCode    anzsrc-for:0801
    10  sg:hasFundedPublication      articles:4e1416cf2006f5be50dee3f8d393a52a
    11  sg:hasFundedPublication      articles:5559362c150dfdb6c4b4ca9311f1b47c
    12  sg:hasFundedPublication      articles:6648da651b3231896eed9eace5763621
    13  sg:hasFundedPublication      articles:db51f1b2b951cb76a5d5cc7c34943794
    14  sg:hasFundedPublication      articles:e45427ce5eace199583b993f765ba487
    15  sg:hasFundedPublication      articles:f25a4f305255078421a370a6438ba3f0
    16  sg:hasFundedPublication      articles:f6e41537809bef7091dab2e64e36b9cc
    17  sg:hasFundingOrganization    grid-institutes:grid.280285.5
    18  sg:hasRecipientOrganization  grid-institutes:grid.21925.3d
    19  sg:language                  English
    20  sg:license                   http://scigraph.springernature.com/explorer/license/
    21  sg:scigraphId                0d15483433f78b9f4dd32356fd96341a
    22  sg:startYear                 2011
    23  sg:title                     Bayesian Rule Learning Methods for Disease Prediction and Biomarker Discovery
    24  sg:webpage                   http://projectreporter.nih.gov/project_info_description.cfm?aid=8497719
    25  rdf:type                     sg:Grant
    26  rdfs:label                   Grant: Bayesian Rule Learning Methods for Disease Prediction and Biomarker Discovery

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular JSON format for linked data.

    curl -H 'Accept: application/ld+json' 'http://scigraph.springernature.com/things/grants/0d15483433f78b9f4dd32356fd96341a'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'http://scigraph.springernature.com/things/grants/0d15483433f78b9f4dd32356fd96341a'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'http://scigraph.springernature.com/things/grants/0d15483433f78b9f4dd32356fd96341a'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'http://scigraph.springernature.com/things/grants/0d15483433f78b9f4dd32356fd96341a'
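
    The four curl commands above differ only in their Accept header, so the same request can be scripted. The following is a minimal Python sketch of that content negotiation, using only the standard library. The URL and the media types are copied from the commands above; the function names (build_request, fetch) and the short format labels ("json-ld", "n-triples", "turtle", "rdf-xml") are illustrative choices made here, not part of any SciGraph API. Whether the endpoint still answers is not something this record confirms.

```python
import urllib.request

# Base URL and grant identifier, copied from the curl commands above.
BASE_URL = "http://scigraph.springernature.com/things/grants/"
GRANT_ID = "0d15483433f78b9f4dd32356fd96341a"

# Hypothetical short labels mapped to the Accept values used by the
# four curl examples. The labels themselves are an assumption.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(grant_id, fmt):
    """Return an unsent GET request asking for the grant record in `fmt`."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"unknown format: {fmt!r}")
    return urllib.request.Request(
        BASE_URL + grant_id,
        headers={"Accept": ACCEPT_TYPES[fmt]},
    )


def fetch(grant_id, fmt, timeout=30):
    """Send the request and return the response body as text.

    Needs network access and assumes the endpoint is still online.
    """
    request = build_request(grant_id, fmt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Equivalent to the Turtle curl command above.
    print(fetch(GRANT_ID, "turtle"))
```

    Building the request separately from sending it keeps the header logic easy to inspect without touching the network; fetch(GRANT_ID, "n-triples") would be the starting point for line-by-line batch processing.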





