PUBLICATION DATE

2012

TITLE

A cross-validation scheme for machine learning algorithms in shotgun proteomics.

ISSUE

N/A

VOLUME

13 Suppl 16

ISSN (print)

N/A

ISSN (electronic)

N/A

ABSTRACT

Peptides are routinely identified from mass spectrometry-based proteomics experiments by matching observed spectra to peptides derived from protein databases. The error rates of these identifications can be estimated by target-decoy analysis, which involves matching spectra to shuffled or reversed peptides. Besides estimating error rates, decoy searches can be used by semi-supervised machine learning algorithms to increase the number of confidently identified peptides. As for all machine learning algorithms, however, the results must be validated to avoid issues such as overfitting or biased learning, which would produce unreliable peptide identifications. Here, we discuss how the target-decoy method is employed in machine learning for shotgun proteomics, focusing on how the results can be validated by cross-validation, a frequently used validation scheme in machine learning. We also use simulated data to demonstrate the proposed cross-validation scheme's ability to detect overfitting.

Related objects

JOURNAL BRAND

N/A (note: articles not published by Springer Nature have limited metadata)


FROM GRANT

  • Machine Learning Analysis Of Tandem Mass Spectra
  • Comprehensive Biology: Exploiting The Yeast Genome



    13 TRIPLES      12 PREDICATES      14 URIs      8 LITERALS

    Subject Predicate Object
    1 articles:932d0ac151535edff880bd853f37133a sg:abstract (abstract text, identical to the ABSTRACT section above)
    2 sg:doi 10.1186/1471-2105-13-s16-s3
    3 sg:doiLink http://dx.doi.org/10.1186/1471-2105-13-s16-s3
    4 sg:isFundedPublicationOf grants:f93911b838d67d22c243e3df7adfd1c3
    5 grants:fe3b0ccd61ae3267d15535b7ef8c5b58
    6 sg:language English
    7 sg:license http://scigraph.springernature.com/explorer/license/
    8 sg:publicationYear 2012
    9 sg:scigraphId 932d0ac151535edff880bd853f37133a
    10 sg:title A cross-validation scheme for machine learning algorithms in shotgun proteomics.
    11 sg:volume 13 Suppl 16
    12 rdf:type sg:Article
    13 rdfs:label Article: A cross-validation scheme for machine learning algorithms in shotgun proteomics.
    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular JSON format for linked data.

    curl -H 'Accept: application/ld+json' 'http://scigraph.springernature.com/things/articles/932d0ac151535edff880bd853f37133a'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'http://scigraph.springernature.com/things/articles/932d0ac151535edff880bd853f37133a'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'http://scigraph.springernature.com/things/articles/932d0ac151535edff880bd853f37133a'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'http://scigraph.springernature.com/things/articles/932d0ac151535edff880bd853f37133a'
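
The four curl commands above differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. This is a minimal sketch, not an official client: the names `ACCEPT_HEADERS`, `ARTICLE_URL`, `build_request` and `fetch` are hypothetical, the format keys are arbitrary labels chosen here, and it assumes the endpoint still responds as the curl commands describe.

```python
from urllib.request import Request, urlopen

# Accept header values copied from the curl commands above.
# The dictionary keys are labels chosen for this sketch only.
ACCEPT_HEADERS = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}

# Article URL copied from the curl commands above.
ARTICLE_URL = (
    "http://scigraph.springernature.com/things/articles/"
    "932d0ac151535edff880bd853f37133a"
)


def build_request(fmt: str, url: str = ARTICLE_URL) -> Request:
    """Return an unsent GET request asking for one RDF serialization."""
    if fmt not in ACCEPT_HEADERS:
        raise ValueError(f"unknown format: {fmt!r}")
    return Request(url, headers={"Accept": ACCEPT_HEADERS[fmt]})


def fetch(fmt: str, url: str = ARTICLE_URL, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text.

    Assumes the server is reachable and returns UTF-8 text.
    """
    with urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch("turtle")` would issue the same request as the Turtle curl command. Keeping `build_request` separate from `fetch` lets the headers be inspected without touching the network.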





