
AUTHORS
TITLE
Statistical Learning for Biomedical Data
ABSTRACT
This project studies statistical learning machines as applied to biomedical and clinical prediction, probability assignment, regression, and ranking problems. The algorithms involved include Random Forests, support vector machines, neural networks, and variations of the boosting algorithm. These are all recently developed techniques originally constructed by the machine learning community, and they are only now starting to see applications in biomedical problems. These methods were not designed through familiar parametric statistical reasoning but, using the more advanced methods of nonparametric density estimation, they are known to be provably Bayes risk consistent. Hence, as the data set grows, the methods classify cases and subjects optimally, for example. As routinely applied to data collected by clinicians or biomedical researchers, these new techniques require modifications and enhancements appropriate to data collected from these alternate sources. In particular, we address (1) the problem of greatly unbalanced data sets, where the researcher typically has only a handful of positive cases and a great many negative cases, (2) the issue of accurate estimates of prediction error rates, where the researcher typically has a relatively small data set upon which to do both model fitting and testing, and (3) the interpretation of the means by which the prediction engine operates and the development of practical prognostic factors. These three problems are essential questions facing the use of modern prediction engines, but have been only lightly studied by the machine learning community. On the other hand, the rigorous methods of the mathematical statistics community have demonstrated the unusual versatility and flexibility of these methods. We have applied these statistical learning machine schemes to a wide variety of biological datasets, such as a 1,000K SNP data set on childhood-onset schizophrenia.
At the invitation of Cambridge University Press we are writing a textbook on "Statistical Learning for Biological Data"; completion of the text and publication are anticipated in 2009.
FUNDED PUBLICATIONS
Download the RDF metadata as: jsonld, nt, turtle, xml
29 TRIPLES 15 PREDICATES 29 URIs 7 LITERALS
Subject  Predicate  Object  

1  grants:4f86b8a3987221f5d8c609e1d95e87a6  sg:abstract  (abstract text, identical to the ABSTRACT section above) 
2  ″  sg:fundingAmount  260182.0 
3  ″  sg:fundingCurrency  USD 
4  ″  sg:hasContribution  contributions:2cfbc6b8b46fac186b3964d6130ca85b 
5  ″  sg:hasFieldOfResearchCode  anzsrcfor:01 
6  ″  ″  anzsrcfor:0104 
7  ″  ″  anzsrcfor:08 
8  ″  ″  anzsrcfor:0801 
9  ″  sg:hasFundedPublication  articles:2767dbf951eb954fb35ddce1062550d6 
10  ″  ″  articles:361214d7c113b942cb401bab17581009 
11  ″  ″  articles:4a75d50e20377b34a29c8b0b13062a1d 
12  ″  ″  articles:65f9797ba2ef0e0c7f3887fbb52d9d63 
13  ″  ″  articles:737ba490ae0f7652b09298a146b5afb4 
14  ″  ″  articles:74172120a44cc603e723f061055ee02b 
15  ″  ″  articles:a2f15fa07a15b59c13321c295c11d824 
16  ″  ″  articles:a321109a45ff0d9ea53e2efdcf619786 
17  ″  ″  articles:b0f5474f0e061e40d53083646e033ecc 
18  ″  ″  articles:bd13d515ab586bda07a64ba2653db47b 
19  ″  ″  articles:c51acf44fde6c65e51286b31e0dea319 
20  ″  ″  articles:ee65b59710fc31383917fa4fe9c04e9d 
21  ″  sg:hasFundingOrganization  gridinstitutes:grid.410422.1 
22  ″  sg:hasRecipientOrganization  gridinstitutes:grid.410422.1 
23  ″  sg:language  English 
24  ″  sg:license  http://scigraph.springernature.com/explorer/license/ 
25  ″  sg:scigraphId  4f86b8a3987221f5d8c609e1d95e87a6 
26  ″  sg:title  Statistical Learning for Biomedical Data 
27  ″  sg:webpage  http://projectreporter.nih.gov/project_info_description.cfm?aid=7733765 
28  ″  rdf:type  sg:Grant 
29  ″  rdfs:label  Grant: Statistical Learning for Biomedical Data 
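The table above uses the ditto mark (″) to mean "same subject or predicate as the row above". As a minimal sketch of how such a listing can be expanded back into full subject/predicate/object triples, the Python below carries the last explicit value forward. The function name `expand_rows` and the sample `rows` list are illustrative assumptions, not part of SciGraph; the sample values are copied from a few rows of the table, with the subject written out once.

```python
DITTO = "″"  # mark used in the table for "same as the row above"

def expand_rows(rows):
    """Replace ditto marks with the last explicit subject / predicate."""
    subject = predicate = None
    triples = []
    for s, p, o in rows:
        # Carry the previous value forward whenever a cell holds a ditto mark.
        subject = subject if s == DITTO else s
        predicate = predicate if p == DITTO else p
        triples.append((subject, predicate, o))
    return triples

# Hypothetical input: a handful of rows transcribed from the table.
rows = [
    ("grants:4f86b8a3987221f5d8c609e1d95e87a6", "sg:fundingAmount", "260182.0"),
    (DITTO, "sg:fundingCurrency", "USD"),
    (DITTO, "sg:hasFieldOfResearchCode", "anzsrcfor:01"),
    (DITTO, DITTO, "anzsrcfor:0104"),
]

triples = expand_rows(rows)
for triple in triples:
    print(" ".join(triple))
```

Each printed line then states its subject and predicate explicitly, which is closer to how the same data appears in a line-based serialization.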
JSON-LD is a popular JSON format for linked data.
curl -H 'Accept: application/ld+json' 'http://scigraph.springernature.com/things/grants/4f86b8a3987221f5d8c609e1d95e87a6'
N-Triples is a line-based linked data format ideal for batch operations.
curl -H 'Accept: application/n-triples' 'http://scigraph.springernature.com/things/grants/4f86b8a3987221f5d8c609e1d95e87a6'
Turtle is a human-readable linked data format.
curl -H 'Accept: text/turtle' 'http://scigraph.springernature.com/things/grants/4f86b8a3987221f5d8c609e1d95e87a6'
RDF/XML is a standard XML format for linked data.
curl -H 'Accept: application/rdf+xml' 'http://scigraph.springernature.com/things/grants/4f86b8a3987221f5d8c609e1d95e87a6'
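The four curl commands differ only in the Accept header, which is ordinary HTTP content negotiation. As a rough sketch, the same requests could be issued from Python's standard library as shown below. The short format keys, the `ACCEPT` mapping, and the helper names `build_request` and `fetch` are assumptions made for illustration; `application/n-triples` is the registered media type for N-Triples. Whether the endpoint still answers at this URL is not verified here.

```python
import urllib.request

GRANT_URL = (
    "http://scigraph.springernature.com/things/grants/"
    "4f86b8a3987221f5d8c609e1d95e87a6"
)

# Illustrative short names mapped to the media types used in the curl commands.
ACCEPT = {
    "jsonld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}

def build_request(fmt, url=GRANT_URL):
    """Build (but do not send) a GET request asking for the given RDF format."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT[fmt]})

def fetch(fmt, url=GRANT_URL, timeout=30):
    """Send the request and return the response body as text (needs network)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    # Only inspect the request here; call fetch("turtle") to actually download.
    req = build_request("turtle")
    print(req.full_url, req.get_header("Accept"))
```

Keeping request construction separate from the network call makes it possible to check the headers without contacting the server.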