YEARS

2010-2016

AUTHORS

Yufeng Liu, Yichao Wu

TITLE

Flexible statistical machine learning techniques for cancer-related data

ABSTRACT

DESCRIPTION (provided by applicant): Gene expression provides a snapshot of the cellular changes that promote tumor malignancy. Quantitative gene expression analysis, especially as implemented by DNA microarrays, has identified many important new cancer-related genes and led to the development of new genomic-based clinical tests. On the quantitative side of gene expression analysis, many statistical methods have been used to study human tumors and to classify them into groups that can be used to predict clinical behavior. Despite this progress, rapid advances in technology are generating massive and complex data in cancer research, and analyzing such data is becoming increasingly challenging. These challenges call for novel statistical learning methods, especially for high-dimensional and noisy data. The goal of this project is to develop a host of new statistical learning techniques for solving complicated learning problems. In particular, this project develops (1) novel techniques to assess the statistical significance of clustering for high-dimensional data; (2) several novel predictive models, including classification and regression models, which are expected to yield highly competitive accuracy and interpretability; (3) new methods for high-dimensional biomarker/variable selection; (4) new approaches to estimating high-dimensional covariance/precision matrices for biological network construction. These new developments are expected to allow scientists to analyze complex cancer genomic data with high prediction accuracy and increased interpretability. The research team will apply the proposed techniques to cancer research data analysis. The success of this project will be important in bridging statistical machine learning and cancer research.

FUNDED PUBLICATIONS

  • FDM: a graph-based statistical method to detect differential transcription using RNA-seq data.
  • Simultaneous multiple non-crossing quantile regression estimation using kernel constraints.
  • Variable Selection for Sparse High-Dimensional Nonlinear Regression Models by Combining Nonnegative Garrote and Sure Independence Screening.
  • Parametrically Guided Generalized Additive Models with Application to Mergers and Acquisitions Data.
  • An ordinary differential equation based solution path algorithm.
  • A robust method for transcript quantification with RNA-seq data.
  • Functional robust support vector machines for sparse and irregular longitudinal data.
  • Robust penalized logistic regression with truncated loss functions.
  • Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory.
  • Coordinate great circle descent algorithm with application to single-index models.
  • Utility-based Weighted Multicategory Robust Support Vector Machines.
  • Building prognostic models for breast cancer patients using clinical variables and hundreds of gene expression signatures.
  • A Generic Path Algorithm for Regularized Statistical Estimation.
  • Partial correlation matrix estimation using ridge penalty followed by thresholding and re-estimation.
  • Nonlinear Vertex Discriminant Analysis with Reproducing Kernels.
  • Probability estimation with machine learning methods for dichotomous and multicategory outcome: applications.
  • Two-Dimensional Solution Surface for Weighted Support Vector Machines.
  • High XRCC1 protein expression is associated with poorer survival in patients with head and neck squamous cell carcinoma.
  • SPReM: Sparse Projection Regression Model For High-dimensional Linear Regression.
  • Simultaneous Multiple Response Regression and Inverse Covariance Matrix Estimation via Penalized Gaussian Maximum Likelihood.
  • Adaptively Weighted Large Margin Classifiers.
  • Mutations in isocitrate dehydrogenase 1 and 2 occur frequently in intrahepatic cholangiocarcinomas and share hypermethylation targets with glioblastomas.
  • Variable Selection in Nonparametric Classification via Measurement Error Model Selection Likelihoods.
  • Variable selection in large margin classifier-based probability estimation with high-dimensional predictors.
  • Marginal Empirical Likelihood and Sure Independence Feature Screening.
  • Automatic structure recovery for additive models.
  • DiffSplice: the genome-wide detection of differential splicing events with RNA-seq.
  • Multicategory Composite Least Squares Classifiers.
  • Effective dimension reduction for sparse functional data.
  • Elastic Net for Cox's Proportional Hazards Model with a Solution Path Algorithm.
  • R/DWD: distance-weighted discrimination for classification, visualization and batch adjustment.
  • Linear or Nonlinear? Automatic Structure Discovery for Partially Linear Models.
  • Non-crossing large-margin probability estimation and its application to robust SVM via preconditioning.
  • Comprehensive genomic characterization of squamous cell lung cancers.
  • Parametrically guided estimation in nonparametric varying coefficient models with quasi-likelihood.
  • Multiple Response Regression for Gaussian Mixture Models with Known Labels.
  • Lung squamous cell carcinoma mRNA expression subtypes are reproducible, clinically important, and correspond to normal cell types.
  • Homogeneity Pursuit.
  • Weighted Distance Weighted Discrimination and Its Asymptotic Properties.
  • Local Kernel Canonical Correlation Analysis with Application to Virtual Drug Screening.
  • SigFuge: single gene clustering of RNA-seq reveals differential isoform usage among cancer samples.
  • Bidirectional discrimination with application to data visualization.
  • Quantile Regression for Analyzing Heterogeneity in Ultra-high Dimension.
  • Kernel Continuum Regression.
  • Robust Model-Free Multiclass Probability Estimation.
  • Probability-enhanced sufficient dimension reduction for binary classification.
  • Hard or Soft Classification? Large-margin Unified Machines.


    72 TRIPLES      17 PREDICATES      73 URIs      9 LITERALS

    Subject Predicate Object
    1 grants:02e0cff2eac53e0482455064602f94c2 sg:abstract (text as given under ABSTRACT above)
    2 sg:endYear 2016
    3 sg:fundingAmount 1457160.0
    4 sg:fundingCurrency USD
    5 sg:hasContribution contributions:905699f7741d6fcd970bbd4ebf3f14af
    6 contributions:cb28fc2632ac8a2ea6bdb038d84fcf62
    7 sg:hasFieldOfResearchCode anzsrc-for:01
    8 anzsrc-for:0104
    9 anzsrc-for:08
    10 anzsrc-for:0801
    11 anzsrc-for:11
    12 anzsrc-for:1112
    13 sg:hasFundedPublication articles:01efd2d38d69cf99b372e1f9f095e20f
    14 articles:03cd071facf0262d4194670e90a11bb9
    15 articles:089b23a2aa5eea2d3569552b48341736
    16 articles:0c3dc9f089da4f9c266a40ce01eb7d03
    17 articles:11e1d6f49eabcb6295a41eb943912186
    18 articles:1243526c46f8b3ffc2de112fde77c180
    19 articles:15cd58cc9e17e9579278440a0018ed0a
    20 articles:17452e64b47d9ea579146fc1df48ef5f
    21 articles:1cd67d3fbd4f87c93ba5ce333684c70c
    22 articles:214498b0ddb4aa5f92be5eacc6299598
    23 articles:2c76ba2aa169495bc4684f57122e4e04
    24 articles:3031df862ece8df2ce5828f292301ad8
    25 articles:3426933a830172905212a8c80726b2ba
    26 articles:34b8b3c57931d451f1c6bf666548cbaa
    27 articles:381ea6ccbcce5d9e5af063d694ec629a
    28 articles:4275121218e61b7978f76e9975e49811
    29 articles:4b0c6f9f0444a0c966524d65f28cbe0d
    30 articles:52d44e474167a5e2e76f262204dce427
    31 articles:53465eea22a6577ddc7e87874aff2e6c
    32 articles:61fdefd95e1fe5fe79261e4e7cc8cd29
    33 articles:6d187dbabbebf29d2abf9fc761a20648
    34 articles:703da049e39ac79f8e7fc7ad943098af
    35 articles:743c1765ee1d4b2aed739b10ab718223
    36 articles:7ab0e508ec19fb021b8f3a1a149646aa
    37 articles:8361b37540176f5217da4c1249010eeb
    38 articles:8993f98422d3721682c4bd3a4f0f074e
    39 articles:8f5886f0a6c3a6ffcd98570ea921fac3
    40 articles:971661ee564cd9f7001a701b32220bf4
    41 articles:9740db925c57d151b67e01c068ac7439
    42 articles:978f2a732939188be79508b8a36574ea
    43 articles:991f2863db0b829b3fb79ac3c1ce58b2
    44 articles:aa33ff9574931ae0a0821822101b2390
    45 articles:b91b9df46ee82871f7586f82f880eb3c
    46 articles:b9a42b59fb33cfae56d944991be94e03
    47 articles:c79943c3c17595753e908db9945d65c6
    48 articles:c7b37bb0bc565e225de7fd4065ac17ed
    49 articles:c7e3aa4b64f87cb70ac02ab95a0e5d51
    50 articles:c8156ffc59fd7926611c0799732cd0ba
    51 articles:ca267901c1ca25495a9fe9342c33f9e0
    52 articles:cf7a9aee299eaa7e19328b41121cf690
    53 articles:d4cf4a9ec8f82fab2e778e195175e1af
    54 articles:d74420482f384ad6a4037cbba07d7127
    55 articles:e0a2503d8b66f24fc533edde3f9ec04e
    56 articles:eabda8ab1256f7a124302aa318e76f35
    57 articles:ee2046a9fbda835d841001787c5b5738
    58 articles:ee75e0750b4597267b01041d667f5025
    59 articles:f1a6fa115cacba8b2725d9d3cbfae9c3
    60 articles:f1f99dd08531e15fbdab9e54b1e48dbc
    61 articles:f2833b2ee639cc8445b7e258919e642d
    62 articles:f673614b40b995928403ff009d62093c
    63 sg:hasFundingOrganization grid-institutes:grid.48336.3a
    64 sg:hasRecipientOrganization grid-institutes:grid.10698.36
    65 sg:language English
    66 sg:license http://scigraph.springernature.com/explorer/license/
    67 sg:scigraphId 02e0cff2eac53e0482455064602f94c2
    68 sg:startYear 2010
    69 sg:title Flexible statistical machine learning techniques for cancer-related data
    70 sg:webpage http://projectreporter.nih.gov/project_info_description.cfm?aid=8603850
    71 rdf:type sg:Grant
    72 rdfs:label Grant: Flexible statistical machine learning techniques for cancer-related data
    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular JSON format for linked data.

    curl -H 'Accept: application/ld+json' 'http://scigraph.springernature.com/things/grants/02e0cff2eac53e0482455064602f94c2'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'http://scigraph.springernature.com/things/grants/02e0cff2eac53e0482455064602f94c2'
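Because N-Triples puts one statement on each line, a downloaded file can be processed with very little machinery. The following is a minimal Python sketch, not a complete N-Triples parser: it assumes IRI subjects and predicates, ignores blank nodes and escape handling, and uses a hypothetical `SAMPLE` with `example.org` identifiers rather than real SciGraph output. The names `parse_ntriples` and `predicate_counts` are illustrative only.

```python
import re
from collections import Counter

# Simplified pattern (an assumption, not the full N-Triples grammar):
# <subject IRI> <predicate IRI> object-term .
TRIPLE_RE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')

# Hypothetical sample data; these are NOT real SciGraph identifiers.
SAMPLE = """
<http://example.org/grant/1> <http://example.org/title> "Example grant" .
<http://example.org/grant/1> <http://example.org/hasFundedPublication> <http://example.org/article/a> .
<http://example.org/grant/1> <http://example.org/hasFundedPublication> <http://example.org/article/b> .
"""


def parse_ntriples(text):
    """Return a list of (subject, predicate, object) tuples, one per matching line."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        match = TRIPLE_RE.match(line)
        if match:
            triples.append((match.group(1), match.group(2), match.group(3)))
    return triples


def predicate_counts(triples):
    """Count how many triples use each predicate."""
    return Counter(predicate for _, predicate, _ in triples)
```

Applied to the output of the N-Triples request, `len(parse_ntriples(text))` and `len(predicate_counts(...))` could be compared against the triple and predicate totals shown in the summary line above the table, provided the simplified pattern covers every line the service returns.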

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'http://scigraph.springernature.com/things/grants/02e0cff2eac53e0482455064602f94c2'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'http://scigraph.springernature.com/things/grants/02e0cff2eac53e0482455064602f94c2'
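The four commands differ only in the Accept header, so the same content negotiation can be scripted. Below is a minimal Python sketch using only the standard library. The helper names `build_request` and `fetch` and the short format keys are assumptions made for illustration; the URL and media types are the ones from the curl commands. Whether the endpoint still responds is not something this sketch can guarantee.

```python
from urllib.request import Request, urlopen

# URL taken from the curl commands in this section.
GRANT_URL = (
    "http://scigraph.springernature.com/things/grants/"
    "02e0cff2eac53e0482455064602f94c2"
)

# Short format keys (hypothetical names) mapped to the media types listed above.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=GRANT_URL):
    """Build a GET request whose Accept header asks for the given RDF format."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(ACCEPT_TYPES)}")
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=GRANT_URL, timeout=30.0):
    """Send the request and return the response body as text (needs network access)."""
    with urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With this in place, `fetch("turtle")` would correspond to the Turtle curl command and `fetch("nt")` to the N-Triples one. Keeping request construction separate from the network call makes the header logic easy to check offline.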





