YEARS

-

AUTHORS

Willy John Wilbur

TITLE

General and Semi-supervised Machine Learning Applied to Bioinformatics

ABSTRACT

1) Many methods have been investigated for clustering sets of documents in the hope of improving retrieval. Unfortunately, these have generally failed to improve retrieval capability. Part of the problem is that a given document often involves more than one subject, so the documents cannot be cleanly assigned to mutually exclusive categories. To overcome this problem, we have developed methods designed to identify a theme among a set of documents. A theme need not encompass the whole of any document; it only needs to exist in some subset of the documents to be identifiable, and the same documents may participate in the definition of several themes. One method of finding themes is based on the EM algorithm and uses an iterative procedure that converges to themes. This method has been implemented, tested, and found to be successful.

2) A second approach can be based on the singular value decomposition and is essentially a vector approach.

3) We are also investigating other methods for extracting higher-level features. One method of interest is sparse coding, which is the basis of self-taught learning.

FUNDED PUBLICATIONS

  • Multi-dimensional classification of biomedical text: toward automated, practical provision of high-utility text to diverse users.



    16 TRIPLES      15 PREDICATES      16 URIs      7 LITERALS

    Subject: grants:d2ffe667294803e6d2a5d488fd661e6d

    #   Predicate                     Object
    1   sg:abstract                   (full text as given under ABSTRACT)
    2   sg:fundingAmount              139177.0
    3   sg:fundingCurrency            USD
    4   sg:hasContribution            contributions:44ab53ef3da99b2ed5ecec0891bf00e0
    5   sg:hasFieldOfResearchCode     anzsrc-for:08
    6   sg:hasFieldOfResearchCode     anzsrc-for:0801
    7   sg:hasFundedPublication       articles:f11168d6ea77b12ddb935f0c73aecf2e
    8   sg:hasFundingOrganization     grid-institutes:grid.280285.5
    9   sg:hasRecipientOrganization   grid-institutes:grid.280285.5
    10  sg:language                   English
    11  sg:license                    http://scigraph.springernature.com/explorer/license/
    12  sg:scigraphId                 d2ffe667294803e6d2a5d488fd661e6d
    13  sg:title                      General and Semi-supervised Machine Learning Applied to Bioinformatics
    14  sg:webpage                    http://projectreporter.nih.gov/project_info_description.cfm?aid=7735076
    15  rdf:type                      sg:Grant
    16  rdfs:label                    Grant: General and Semi-supervised Machine Learning Applied to Bioinformatics

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular JSON format for linked data.

    curl -H 'Accept: application/ld+json' 'http://scigraph.springernature.com/things/grants/d2ffe667294803e6d2a5d488fd661e6d'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'http://scigraph.springernature.com/things/grants/d2ffe667294803e6d2a5d488fd661e6d'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'http://scigraph.springernature.com/things/grants/d2ffe667294803e6d2a5d488fd661e6d'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'http://scigraph.springernature.com/things/grants/d2ffe667294803e6d2a5d488fd661e6d'
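
The four requests above differ only in the Accept header, so they can be wrapped in a small script. The following is a minimal sketch, not part of the original page: the helper names `accept_for` and `fetch_grant`, the variable `GRANT_URL`, and the output file names `grant.<format>` are all hypothetical, and the sketch assumes the endpoint still answers content-negotiated requests as described here.

```shell
#!/bin/sh
# Hypothetical helper: map a short format name to the Accept header value
# used in the curl commands above. The short names are an assumption.
accept_for() {
    case "$1" in
        json-ld) echo 'application/ld+json' ;;
        nt)      echo 'application/n-triples' ;;
        turtle)  echo 'text/turtle' ;;
        xml)     echo 'application/rdf+xml' ;;
        *)       echo "unknown format: $1" >&2; return 1 ;;
    esac
}

# URL of this grant record, copied from the commands above.
GRANT_URL='http://scigraph.springernature.com/things/grants/d2ffe667294803e6d2a5d488fd661e6d'

# Hypothetical helper: fetch the record in one format into grant.<format>.
# -f makes curl exit non-zero on HTTP errors; -sS hides the progress bar
# but still prints error messages.
fetch_grant() {
    accept=$(accept_for "$1") || return 1
    curl -fsS -H "Accept: $accept" -o "grant.$1" "$GRANT_URL"
}
```

With these definitions loaded, all four serializations could be fetched with `for fmt in json-ld nt turtle xml; do fetch_grant "$fmt"; done`. An unrecognized format name makes `fetch_grant` return a non-zero status without contacting the server.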





