YEARS

2008-2009

AUTHORS

Ata Kaban

TITLE

Generative-discriminative hybrids for disease prediction and cell communication modelling

ABSTRACT

We aim to investigate new advances and expertise in machine learning to improve the reliability of disease prediction from genomic and proteomic data, and to enable answering novel biological questions regarding the cell-cell communication mechanisms that underlie the development of disease. Statistical machine learning methods have already been shown to hold a lot of promise towards these goals in principle. However, high-throughput technologies produce increasingly high-dimensional data, while the number of samples remains limited. The implications of these extreme conditions are largely overlooked by the existing state of the art. In addition, new biological questions are being asked that existing techniques are unable to tackle. Recent results in machine learning make it possible to address these issues. In particular, hybridising generative and discriminative models may blend the benefits of both and may reduce the required sample size. An informed choice of distance functions and data models may mitigate the curse of dimensionality. Adapting certain techniques previously developed for social network inference may provide the required modelling power for inferring cell-cell communication mechanisms. By exploring the potential of these techniques, we hope to pave the way towards novel and improved computational methods for life scientists.

Technical Summary

This research aims to apply new advances and expertise in specific areas of machine learning to improve disease prediction and to enable answering biological questions that are beyond the reach of the existing state of the art. A first objective is to improve disease prediction algorithms by taking into account the extremely high-dimensional, low-sample-size nature of biological data sets. Hybridising generative and discriminative models is a technique, previously exploited in other application domains, for controlling and reducing the sample size required for training. At present, such methods are not available to life scientists. There are also recent theoretical results on how to control and mitigate the high-dimensionality problem that I would like to apply. A second objective is to devise, in collaboration with expert biologists, a new time series model that is appropriate for inferring cell-cell communication networks. This would provide a new way of studying the development of disease. Whilst machine learning methods have previously been successful at inferring gene communication networks within a single cell, similar tools for inferring cell-cell networks from data are not currently available to biologists. Such tools should be able to account for varying time delays. I have previously developed an algorithm (for social network inference) that has this functionality, and I would like to adapt this method for modelling cell-cell communication.

FUNDED PUBLICATIONS

  • Classification of mislabelled microarrays using robust sparse logistic regression.


    21 TRIPLES      17 PREDICATES      22 URIs      10 LITERALS

    Subject Predicate Object
    1 grants:7642ea2a627f19167ca612198ada9093 sg:abstract (full abstract text, identical to the ABSTRACT section above)
    2 sg:endYear 2009
    3 sg:fundingAmount 99349.0
    4 sg:fundingCurrency GBP
    5 sg:hasContribution contributions:24cd64c2762974b3f16d19236feb0df1
    6 sg:hasFieldOfResearchCode anzsrc-for:01
    7 anzsrc-for:0104
    8 anzsrc-for:08
    9 anzsrc-for:0801
    10 sg:hasFundedPublication articles:627c2e3b00755a643571ee6f4983aacd
    11 sg:hasFundingOrganization grid-institutes:grid.14105.31
    12 sg:hasRecipientOrganization grid-institutes:grid.6572.6
    13 sg:language English
    14 sg:license http://scigraph.springernature.com/explorer/license/
    15 Contains UK public sector information licensed under the Open Government Licence v2.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/).
    16 sg:scigraphId 7642ea2a627f19167ca612198ada9093
    17 sg:startYear 2008
    18 sg:title Generative-discriminative hybrids for disease prediction and cell communication modelling
    19 sg:webpage http://gtr.rcuk.ac.uk/project/3F246564-5CC9-476C-BA72-39FFC7BAE987
    20 rdf:type sg:Grant
    21 rdfs:label Grant: Generative-discriminative hybrids for disease prediction and cell communication modelling
    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular JSON format for linked data.

    curl -H 'Accept: application/ld+json' 'http://scigraph.springernature.com/things/grants/7642ea2a627f19167ca612198ada9093'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'http://scigraph.springernature.com/things/grants/7642ea2a627f19167ca612198ada9093'
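Because N-Triples places exactly one statement per line, ordinary line tools can process it directly. The following is a minimal sketch, assuming a POSIX shell with awk, sort and uniq, and assuming the response above has been saved to a local file (for example by appending "> grant.nt" to the curl command). The function name count_predicates is hypothetical. It tallies how often each predicate occurs, skipping blank and comment lines.

```shell
# Hypothetical helper: count how often each predicate occurs in an N-Triples file.
# Each statement is one line of the form "<subject> <predicate> <object> .";
# subject and predicate never contain spaces, so awk's field 2 is the predicate
# even when the object is a literal containing spaces.
# Blank lines and '#' comment lines are skipped.
count_predicates() {
  awk 'NF && $1 !~ /^#/ { print $2 }' "$1" | sort | uniq -c | sort -rn
}

# Usage (assumes the N-Triples response was saved as grant.nt):
# count_predicates grant.nt
```

The output lists one line per distinct predicate, most frequent first, so multi-valued predicates such as the field-of-research codes stand out immediately.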

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'http://scigraph.springernature.com/things/grants/7642ea2a627f19167ca612198ada9093'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'http://scigraph.springernature.com/things/grants/7642ea2a627f19167ca612198ada9093'
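The four requests above differ only in their Accept header, so they can be folded into one loop. This is a minimal sketch, assuming a POSIX shell with curl and assuming the endpoint still answers content-negotiated requests. The helper names accept_header and fetch_all, and the grant.<format> output filenames, are hypothetical choices for illustration.

```shell
# Hypothetical helper: map a short format name to the Accept header
# used in the curl commands above; unknown names return non-zero.
accept_header() {
  case "$1" in
    jsonld) echo 'application/ld+json' ;;
    nt)     echo 'application/n-triples' ;;
    ttl)    echo 'text/turtle' ;;
    rdf)    echo 'application/rdf+xml' ;;
    *)      return 1 ;;
  esac
}

GRANT_URL='http://scigraph.springernature.com/things/grants/7642ea2a627f19167ca612198ada9093'

# Hypothetical helper: save every serialization as grant.<format>.
# -f makes curl fail on HTTP errors instead of saving an error page;
# -sS hides the progress bar but still reports errors.
fetch_all() {
  for fmt in jsonld nt ttl rdf; do
    curl -fsS -H "Accept: $(accept_header "$fmt")" -o "grant.$fmt" "$GRANT_URL" || return 1
  done
}

# Usage: fetch_all
```

Keeping the format-to-header mapping in one function means adding another serialization only requires one extra case branch.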





