YEARS

2008-2010

AUTHORS

Kellie J Archer

TITLE

Recursive partitioning and ensemble methods for classifying an ordinal response

ABSTRACT

DESCRIPTION (provided by applicant): Classification methods applied to microarray data have largely been those developed by the machine learning community, because the large p (number of covariates) problem is inherent in high-throughput genomic experiments. The random forest (RF) methodology has been shown to be competitive with other machine learning approaches (e.g., neural networks and support vector machines). Apart from improved accuracy, a clear advantage of RF over most machine learning approaches is that the algorithm provides variable importance measures, so one can assess the relative importance of each gene in the predictive model. In many applications, the class to be predicted may be inherently ordinal. Examples of ordinal responses include TNM stage (I, II, III, IV); drug toxicity (none, mild, moderate, severe); and response to treatment classified as complete response, partial response, stable disease, or progressive disease. Although there is an inherent ordering among these responses, there is no known underlying numerical relationship between them. Standard nominal response methods can be applied to ordinal response data, but doing so discards the ordering information inherent in the data. Because ordinal classification methods have been largely neglected in the machine learning literature, the specific aims of this proposal are to (1) extend the recursive partitioning and RF methodologies to predict an ordinal response by developing computational tools for the R programming environment; (2) evaluate the proposed ordinal classification methods against alternative methods using simulated, benchmark, and gene expression datasets; and (3) develop and evaluate methods for assessing variable importance when the goal is to predict an ordinal response.
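The following small Python sketch makes concrete what is lost when an ordinal response is treated as nominal. It is an illustration and not code from the proposal. It compares the ordinary Gini impurity with a cost-weighted ("generalized") Gini impurity. The absolute-difference cost C(k, l) = |k - l| is an assumption chosen for illustration. With unit costs (C = 1 for every k ≠ l), the weighted form reduces to the ordinary Gini impurity. The function names are hypothetical.

```python
from typing import Sequence


def gini(p: Sequence[float]) -> float:
    """Nominal Gini impurity 1 - sum_k p_k^2; the class order is ignored."""
    return 1.0 - sum(pk * pk for pk in p)


def generalized_gini(p: Sequence[float]) -> float:
    """Cost-weighted Gini impurity sum_{k != l} C(k, l) * p_k * p_l.

    Assumption (illustrative): C(k, l) = |k - l|, so mixing distant
    ordinal categories in one node is penalized more than mixing
    adjacent categories.
    """
    n = len(p)
    return sum(abs(k - l) * p[k] * p[l]
               for k in range(n) for l in range(n) if k != l)


# Two hypothetical nodes over four ordered categories (e.g. stage I..IV).
adjacent = [0.5, 0.5, 0.0, 0.0]  # half stage I, half stage II
distant = [0.5, 0.0, 0.0, 0.5]   # half stage I, half stage IV

print(gini(adjacent), gini(distant))                          # 0.5 0.5
print(generalized_gini(adjacent), generalized_gini(distant))  # 0.5 1.5
```

The nominal Gini impurity cannot distinguish the two nodes. The cost-weighted version rates the node that mixes stage I with stage IV as three times as impure.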
Novel splitting criteria for growing classification trees, and methods for estimating variable importance, are proposed that appropriately account for the ordinal nature of the response. In addition, the Generalized Gini index and ordered twoing methods will be studied within the ensemble learning framework, which has not previously been done. This project is significant to the scientific community because the ordinal classification methods it will make available will be broadly applicable to a variety of health, social, and behavioral research fields, which commonly collect responses on an ordinal scale.
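Below is a minimal Python sketch of an ordered twoing split score. It assumes the usual formulation rather than the project's R implementation. For a candidate split, the J ordered classes are grouped into a lower super-class {1, ..., j} and an upper super-class {j+1, ..., J}. The two-class Gini decrease is computed for each cut point j, and the largest decrease becomes the split's score. Restricting super-classes to contiguous blocks is what preserves the ordering. All names here are hypothetical.

```python
from typing import Sequence, Tuple


def binary_gini(n_low: int, n_high: int) -> float:
    """Two-class Gini impurity 2p(1 - p) for a node with the given counts."""
    n = n_low + n_high
    if n == 0:
        return 0.0
    p = n_low / n
    return 2.0 * p * (1.0 - p)


def ordered_twoing(left: Sequence[int], right: Sequence[int]) -> Tuple[float, int]:
    """Score a candidate split from per-class counts in its two child nodes.

    left[k] and right[k] count cases of ordinal class k + 1 (classes in order).
    Returns (best Gini decrease, best cut j): the lower super-class is
    classes 1..j and the upper super-class is classes j+1..J.
    """
    J = len(left)
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    best_decrease, best_j = float("-inf"), 0
    for j in range(1, J):  # only contiguous cut points, so order is respected
        low_l, low_r = sum(left[:j]), sum(right[:j])
        parent = binary_gini(low_l + low_r, n - low_l - low_r)
        children = ((n_left / n) * binary_gini(low_l, n_left - low_l)
                    + (n_right / n) * binary_gini(low_r, n_right - low_r))
        decrease = parent - children
        if decrease > best_decrease:
            best_decrease, best_j = decrease, j
    return best_decrease, best_j


# Hypothetical split: low stages go mostly left, high stages mostly right.
score, cut = ordered_twoing([10, 8, 2, 0], [0, 2, 8, 10])
print(cut, round(score, 2))  # 2 0.32
```

In a tree-growing loop, this score would be computed for every candidate split of every covariate, and the split with the largest score would be kept.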

FUNDED PUBLICATIONS

  • Ordinal response prediction using bootstrap aggregation, with application to a high-throughput methylation data set.
  • Detection call algorithms for high-throughput gene expression microarray data.
  • High-throughput DNA methylation datasets for evaluating false discovery rate methodologies.
  • articles:c93e63445670f5cc5875a5c762f51776
  • L1 penalized continuation ratio models for ordinal response prediction using high-dimensional datasets.



    24 TRIPLES      17 PREDICATES      25 URIs      9 LITERALS

    Subject Predicate Object
    1 grants:8ddff6548e5cc69372dd3489e106d673 sg:abstract (same text as the ABSTRACT section above)
    2 sg:endYear 2010
    3 sg:fundingAmount 230013.0
    4 sg:fundingCurrency USD
    5 sg:hasContribution contributions:963e57baf92b1cd511a08e05b185c800
    6 sg:hasFieldOfResearchCode anzsrc-for:01
    7 anzsrc-for:0104
    8 anzsrc-for:08
    9 anzsrc-for:0801
    10 sg:hasFundedPublication articles:539e1cac68d0e362a086f51ee53f666a
    11 articles:6616eb9851117a972a96cf4b61bc9c12
    12 articles:a528a92604da5b8279ffd086c3bae3fc
    13 articles:c93e63445670f5cc5875a5c762f51776
    14 articles:d8d508a9569416f7fecb36a1e91b9cb9
    15 sg:hasFundingOrganization grid-institutes:grid.280285.5
    16 sg:hasRecipientOrganization grid-institutes:grid.224260.0
    17 sg:language English
    18 sg:license http://scigraph.springernature.com/explorer/license/
    19 sg:scigraphId 8ddff6548e5cc69372dd3489e106d673
    20 sg:startYear 2008
    21 sg:title Recursive partitioning and ensemble methods for classifying an ordinal response
    22 sg:webpage http://projectreporter.nih.gov/project_info_description.cfm?aid=8049892
    23 rdf:type sg:Grant
    24 rdfs:label Grant: Recursive partitioning and ensemble methods for classifying an ordinal response
    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular JSON format for linked data.

    curl -H 'Accept: application/ld+json' 'http://scigraph.springernature.com/things/grants/8ddff6548e5cc69372dd3489e106d673'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'http://scigraph.springernature.com/things/grants/8ddff6548e5cc69372dd3489e106d673'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'http://scigraph.springernature.com/things/grants/8ddff6548e5cc69372dd3489e106d673'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'http://scigraph.springernature.com/things/grants/8ddff6548e5cc69372dd3489e106d673'
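The same content negotiation can be done from Python's standard library. This is a sketch that assumes the endpoint still honors the Accept headers shown in the curl examples; its availability is not guaranteed. The format keys and the helper names `build_request` and `fetch_json_ld` are hypothetical. Only the MIME types and the URL come from the commands above.

```python
import json
import urllib.request

# MIME types copied from the curl examples; the dictionary keys are hypothetical labels.
FORMATS = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}

GRANT_URL = ("http://scigraph.springernature.com/things/grants/"
             "8ddff6548e5cc69372dd3489e106d673")


def build_request(fmt: str, url: str = GRANT_URL) -> urllib.request.Request:
    """Build a GET request whose Accept header selects the RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": FORMATS[fmt]})


def fetch_json_ld(url: str = GRANT_URL, timeout: float = 30.0):
    """Fetch and parse the JSON-LD record (needs network access)."""
    with urllib.request.urlopen(build_request("json-ld", url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds the request; no network call is made here.
    print(build_request("turtle").get_header("Accept"))  # text/turtle
```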





