YEARS

-

AUTHORS

James D Malley

TITLE

Statistical Learning for Biomedical Data

ABSTRACT

This project studies statistical learning machines as applied to biomedical and clinical prediction, probability assignment, regression, and ranking problems. The algorithms involved include Random Forests, support vector machines, neural networks, and variations of the boosting algorithm. These are all recently developed techniques originally constructed by the machine learning community, and they are only now starting to see applications in biomedical problems. These methods were not designed through familiar parametric statistical reasoning, but the more advanced methods of nonparametric density estimation show them to be provably Bayes risk consistent. Hence, for example, as the data set grows the methods classify cases and subjects optimally. As routinely applied to data collected by clinicians or biomedical researchers, these new techniques require modifications and enhancements appropriate to data collected from these alternate sources. In particular, we address (1) the problem of greatly unbalanced data sets, where the researcher typically has only a handful of positive cases and a great many negative cases, (2) the issue of accurate estimates of prediction error rates, where the researcher typically has a relatively small data set upon which to do both model fitting and testing, and (3) the interpretation of the means by which the prediction engine operates and the development of practical prognostic factors. These three problems are essential questions facing the use of modern prediction engines, but have been only lightly studied by the machine learning community. On the other hand, the rigorous methods of the mathematical statistics community have demonstrated the unusual versatility and flexibility of these methods. We have applied these statistical learning machine schemes to a wide variety of biological datasets, such as a 1,000K SNP data set on childhood-onset schizophrenia.
At the invitation of Cambridge University Press we are writing a textbook on "Statistical Learning for Biological Data"; completion of the text and publication are anticipated in 2009.

FUNDED PUBLICATIONS

  • Practical experiences on the necessity of external validation.
  • Multiple neural network classification scheme for detection of colonic polyps in CT colonography data sets.
  • Support vector machines committee classification method for computer-aided polyp detection in CT colonography.
  • Immunogenetic risk and protective factors for the idiopathic inflammatory myopathies: distinct HLA-A, -B, -Cw, -DRB1 and -DQA1 allelic profiles and motifs define clinicopathologic groups in caucasians.
  • Immunogenetic differences between Caucasian women with and those without silicone implants in whom myositis develops.
  • Predictor correlation impacts machine learning algorithms: implications for genomic studies.
  • Computer-assisted detection of colonic polyps with CT colonography using neural networks and binary classification trees.
  • Short-term prediction of mortality in patients with systemic lupus erythematosus: classification of outcomes using random forests.
  • HLA polymorphisms in African Americans with idiopathic inflammatory myopathy: allelic profiles distinguish patients with different clinical phenotypes and myositis autoantibodies.
  • Immunogenetic risk and protective factors for juvenile dermatomyositis in Caucasians.
  • Immunogenetic risk and protective factors for the idiopathic inflammatory myopathies: distinct HLA-A, -B, -Cw, -DRB1, and -DQA1 allelic profiles distinguish European American patients with different myositis autoantibodies.
  • Evaluating interventions to improve gait in cerebral palsy: a meta-analysis of spatiotemporal measures.


    29 TRIPLES      15 PREDICATES      29 URIs      7 LITERALS

    Subject Predicate Object
    1 grants:4f86b8a3987221f5d8c609e1d95e87a6 sg:abstract (text identical to the ABSTRACT section above)
    2 sg:fundingAmount 260182.0
    3 sg:fundingCurrency USD
    4 sg:hasContribution contributions:2cfbc6b8b46fac186b3964d6130ca85b
    5 sg:hasFieldOfResearchCode anzsrc-for:01
    6 anzsrc-for:0104
    7 anzsrc-for:08
    8 anzsrc-for:0801
    9 sg:hasFundedPublication articles:2767dbf951eb954fb35ddce1062550d6
    10 articles:361214d7c113b942cb401bab17581009
    11 articles:4a75d50e20377b34a29c8b0b13062a1d
    12 articles:65f9797ba2ef0e0c7f3887fbb52d9d63
    13 articles:737ba490ae0f7652b09298a146b5afb4
    14 articles:74172120a44cc603e723f061055ee02b
    15 articles:a2f15fa07a15b59c13321c295c11d824
    16 articles:a321109a45ff0d9ea53e2efdcf619786
    17 articles:b0f5474f0e061e40d53083646e033ecc
    18 articles:bd13d515ab586bda07a64ba2653db47b
    19 articles:c51acf44fde6c65e51286b31e0dea319
    20 articles:ee65b59710fc31383917fa4fe9c04e9d
    21 sg:hasFundingOrganization grid-institutes:grid.410422.1
    22 sg:hasRecipientOrganization grid-institutes:grid.410422.1
    23 sg:language English
    24 sg:license http://scigraph.springernature.com/explorer/license/
    25 sg:scigraphId 4f86b8a3987221f5d8c609e1d95e87a6
    26 sg:title Statistical Learning for Biomedical Data
    27 sg:webpage http://projectreporter.nih.gov/project_info_description.cfm?aid=7733765
    28 rdf:type sg:Grant
    29 rdfs:label Grant: Statistical Learning for Biomedical Data
    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular JSON format for linked data.

    curl -H 'Accept: application/ld+json' 'http://scigraph.springernature.com/things/grants/4f86b8a3987221f5d8c609e1d95e87a6'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'http://scigraph.springernature.com/things/grants/4f86b8a3987221f5d8c609e1d95e87a6'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'http://scigraph.springernature.com/things/grants/4f86b8a3987221f5d8c609e1d95e87a6'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'http://scigraph.springernature.com/things/grants/4f86b8a3987221f5d8c609e1d95e87a6'
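
The four curl commands above differ only in their Accept header, so the same content negotiation can be sketched in Python using the standard library alone. The URL, grant ID, and media types below are copied from those commands. The names `ACCEPT_TYPES`, `build_request`, and `fetch`, and the short format keys, are illustrative choices and not part of any SciGraph client. Whether the endpoint still answers is not verified here.

```python
import urllib.request

# Base URL and grant ID are taken from the curl commands above.
BASE_URL = "http://scigraph.springernature.com/things/grants/"
GRANT_ID = "4f86b8a3987221f5d8c609e1d95e87a6"

# Media types for the four serializations listed above.
# The dictionary keys are hypothetical short names chosen for this sketch.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, grant_id=GRANT_ID):
    """Return a Request asking for the grant record in the given format."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"unknown format: {fmt!r}")
    return urllib.request.Request(
        BASE_URL + grant_id,
        headers={"Accept": ACCEPT_TYPES[fmt]},
    )


def fetch(fmt, grant_id=GRANT_ID, timeout=30):
    """Download the record as text. Needs network access to the endpoint."""
    request = build_request(fmt, grant_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch("turtle")` would then correspond to the Turtle curl command, assuming the service is reachable. `build_request` is kept separate so the header selection can be checked without touching the network.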





