Controlling false discoveries in high-dimensional situations: boosting with stability selection


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2015-12

AUTHORS

Benjamin Hofner, Luigi Boccuto, Markus Göker

ABSTRACT

BACKGROUND: Modern biotechnologies often result in high-dimensional data sets with many more variables than observations (n≪p). These data sets pose new challenges to statistical analysis: variable selection becomes one of the most important tasks in this setting. Similar challenges arise in modern data sets from observational studies, e.g., in ecology, where flexible, non-linear models are fitted to high-dimensional data. We assess the recently proposed flexible framework for variable selection called stability selection. By the use of resampling procedures, stability selection adds finite-sample error control to high-dimensional variable selection procedures such as the Lasso or boosting. We consider the combination of boosting and stability selection and present results from a detailed simulation study that provide insights into the usefulness of this combination. The interpretation of the error bounds used is elaborated, and insights for practical data analysis are given.
RESULTS: Stability selection with boosting was able to detect influential predictors in high-dimensional settings while controlling the given error bound in various simulation scenarios. The dependence on various parameters, such as the sample size, the number of truly influential variables, or tuning parameters of the algorithm, was investigated. The results were applied to investigate phenotype measurements in patients with autism spectrum disorders using a log-linear interaction model which was fitted by boosting. Stability selection identified five differentially expressed amino acid pathways.
CONCLUSION: Stability selection is implemented in the freely available R package stabs (http://CRAN.R-project.org/package=stabs). It proved to work well in high-dimensional settings with more predictors than observations for both linear and additive models. The original version of stability selection, which controls the per-family error rate, is quite conservative, though this is much less the case for its improvement, complementary pairs stability selection. Nevertheless, care should be taken to appropriately specify the error bound.
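The resampling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the stabs package API and it does not use boosting: a simple marginal-correlation screen (`top_q`, a hypothetical helper) stands in for the base selection procedure, and `simulate` generates toy data purely for illustration. `pfer_bound` evaluates the per-family error rate bound q^2 / ((2 * cutoff - 1) * p) from the original stability selection paper by Meinshausen and Bühlmann, which holds only under the assumptions stated there.

```python
import math
import random


def pfer_bound(q, p, cutoff):
    """Upper bound on the expected number of false selections (PFER)
    for q selected variables per subsample, p candidates, and a
    selection-frequency cutoff in (0.5, 1]."""
    if not 0.5 < cutoff <= 1.0:
        raise ValueError("cutoff must lie in (0.5, 1]")
    return q ** 2 / ((2 * cutoff - 1) * p)


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def top_q(X, y, q):
    """Stand-in base selector (an assumption, not boosting): indices of
    the q columns with the largest absolute correlation with y."""
    p = len(X[0])
    scores = [abs_corr([row[j] for row in X], y) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:q]


def stability_selection(X, y, q, cutoff, B=100, seed=0):
    """Run the base selector on B random subsamples of size n // 2 and
    keep variables whose selection frequency reaches the cutoff."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(B):
        idx = rng.sample(range(n), n // 2)
        X_sub = [X[i] for i in idx]
        y_sub = [y[i] for i in idx]
        for j in top_q(X_sub, y_sub, q):
            counts[j] += 1
    freq = [c / B for c in counts]
    selected = [j for j, f in enumerate(freq) if f >= cutoff]
    return selected, freq


def simulate(n=100, p=20, seed=1):
    """Toy data (illustrative only): y depends on columns 0 and 1."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [row[0] + row[1] + 0.1 * rng.gauss(0, 1) for row in X]
    return X, y
```

Calling `stability_selection(*simulate(), q=4, cutoff=0.8)` returns the stably selected column indices together with all selection frequencies. For these toy settings, `pfer_bound(4, 20, 0.8)` gives 16 / 12, roughly 1.33 expected false selections, which shows how the bound loosens when q is large relative to p.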

PAGES

144

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/s12859-015-0575-3

DOI

http://dx.doi.org/10.1186/s12859-015-0575-3

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1028801735

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/25943565



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0104", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Statistics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Mathematical Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Algorithms", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Biomarkers", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Case-Control Studies", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Child Development Disorders, Pervasive", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Computer Simulation", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "False Positive Reactions", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Gene Expression Profiling", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Humans", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Models, Statistical", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Nonlinear Dynamics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Phenotype", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of Erlangen-Nuremberg", 
          "id": "https://www.grid.ac/institutes/grid.5330.5", 
          "name": [
            "Department of Medical Informatics, Biometry and Epidemiology, Friedrich-Alexander-University Erlangen-Nuremberg, Waldstra\u00dfe 6, 91054, Erlangen, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Hofner", 
        "givenName": "Benjamin", 
        "id": "sg:person.01044432226.13", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01044432226.13"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Greenwood Genetic Center", 
          "id": "https://www.grid.ac/institutes/grid.418307.9", 
          "name": [
            "Greenwood Genetic Center, 113 Gregor Mendel Circle, 29646, Greenwood, SC, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Boccuto", 
        "givenName": "Luigi", 
        "id": "sg:person.0771354551.55", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0771354551.55"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Leibniz Institute DSMZ \u2013 German Collection of Microorganisms and Cell Cultures", 
          "id": "https://www.grid.ac/institutes/grid.420081.f", 
          "name": [
            "Leibniz Institute DSMZ \u2013 German Collection of Microorganisms and Cell Cultures, Inhoffenstra\u00dfe 7b, 38124, Braunschweig, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "G\u00f6ker", 
        "givenName": "Markus", 
        "id": "sg:person.0646712066.27", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0646712066.27"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1371/journal.pone.0034846", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1000073536"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1111/j.1467-9868.2010.00740.x", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1000696823"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1890/10-2276.1", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1000901718"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.091062498", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1001631710"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s00180-012-0382-5", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1001684018", 
          "https://doi.org/10.1007/s00180-012-0382-5"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.jclinepi.2007.11.014", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1004167802"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.csda.2013.02.022", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1009360444"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1111/j.1467-9868.2011.01034.x", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1012681255"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.jclinepi.2004.04.003", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013946743"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-12-366", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1016369639", 
          "https://doi.org/10.1186/1471-2105-12-366"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1515/1544-6115.1792", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017320119"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1111/j.1541-0420.2008.01112.x", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1018081846"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-8-25", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1019863657", 
          "https://doi.org/10.1186/1471-2105-8-25"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.186501", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1020261175"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1214/aos/1016218223", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1020629296"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1890/10-0602.1", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1020740452"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.csda.2010.11.015", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1021359513"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1752-0509-6-145", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1022589506", 
          "https://doi.org/10.1186/1752-0509-6-145"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nmeth.2016", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023029250", 
          "https://doi.org/10.1038/nmeth.2016"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1023/a:1010933404324", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1024739340", 
          "https://doi.org/10.1023/a:1010933404324"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nrg2484", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1030687647", 
          "https://doi.org/10.1038/nrg2484"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.cppeds.2012.08.001", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1035837013"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/2040-2392-4-16", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1038289950", 
          "https://doi.org/10.1186/2040-2392-4-16"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btt291", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1038435662"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1214/009053604000000067", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1038945634"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pone.0084483", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1039586310"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-9-136", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1041231178", 
          "https://doi.org/10.1186/1471-2105-9-136"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1111/j.1467-9868.2005.00503.x", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1043971564"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s11222-014-9520-y", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1045070158", 
          "https://doi.org/10.1007/s11222-014-9520-y"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-15-236", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1045101318", 
          "https://doi.org/10.1186/1471-2105-15-236"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btq600", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1045324570"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1146/annurev-statistics-022513-115545", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1045721889"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.csda.2008.09.009", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1047435658"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1214/07-sts242", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1049744920"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nbt.1658", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1052467241", 
          "https://doi.org/10.1038/nbt.1658"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1214/ss/1177013604", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1052575245"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1021/ac031386+", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1054994858"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1198/016214503000125", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1064198102"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1198/jasa.2011.ap09272", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1064200650"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1198/jcgs.2011.09220", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1064201123"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1214/aoms/1177703732", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1064400228"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3414/me11-02-0030", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1071312140"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2015-12", 
    "datePublishedReg": "2015-12-01", 
    "description": "BACKGROUND: Modern biotechnologies often result in high-dimensional data sets with many more variables than observations (n\u226ap). These data sets pose new challenges to statistical analysis: Variable selection becomes one of the most important tasks in this setting. Similar challenges arise if in modern data sets from observational studies, e.g., in ecology, where flexible, non-linear models are fitted to high-dimensional data. We assess the recently proposed flexible framework for variable selection called stability selection. By the use of resampling procedures, stability selection adds a finite sample error control to high-dimensional variable selection procedures such as Lasso or boosting. We consider the combination of boosting and stability selection and present results from a detailed simulation study that provide insights into the usefulness of this combination. The interpretation of the used error bounds is elaborated and insights for practical data analysis are given.\nRESULTS: Stability selection with boosting was able to detect influential predictors in high-dimensional settings while controlling the given error bound in various simulation scenarios. The dependence on various parameters such as the sample size, the number of truly influential variables or tuning parameters of the algorithm was investigated. The results were applied to investigate phenotype measurements in patients with autism spectrum disorders using a log-linear interaction model which was fitted by boosting. Stability selection identified five differentially expressed amino acid pathways.\nCONCLUSION: Stability selection is implemented in the freely available R package stabs (http://CRAN.R-project.org/package=stabs). It proved to work well in high-dimensional settings with more predictors than observations for both, linear and additive models. The original version of stability selection, which controls the per-family error rate, is quite conservative, though, this is much less the case for its improvement, complementary pairs stability selection. Nevertheless, care should be taken to appropriately specify the error bound.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1186/s12859-015-0575-3", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1023786", 
        "issn": [
          "1471-2105"
        ], 
        "name": "BMC Bioinformatics", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "16"
      }
    ], 
    "name": "Controlling false discoveries in high-dimensional situations: boosting with stability selection", 
    "pagination": "144", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "89346bffd945cf24dd79e48ff00a366dbf7b7b9a84aeada3165b80c020ea1dac"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "25943565"
        ]
      }, 
      {
        "name": "nlm_unique_id", 
        "type": "PropertyValue", 
        "value": [
          "100965194"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/s12859-015-0575-3"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1028801735"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/s12859-015-0575-3", 
      "https://app.dimensions.ai/details/publication/pub.1028801735"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-11T09:58", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000347_0000000347/records_89812_00000001.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1186%2Fs12859-015-0575-3"
  }
]
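As a rough illustration of how a record like the one above might be consumed, the following Python sketch parses a trimmed excerpt and pulls out the title, the author names, and selected identifiers. `RECORD_EXCERPT`, `product_id`, and `author_names` are hypothetical names introduced here; the excerpt keeps only a few fields from the record, with values copied from it.

```python
import json

# Trimmed excerpt of the JSON-LD record; all other fields are omitted.
RECORD_EXCERPT = r"""
[
  {
    "id": "sg:pub.10.1186/s12859-015-0575-3",
    "name": "Controlling false discoveries in high-dimensional situations: boosting with stability selection",
    "author": [
      {"familyName": "Hofner", "givenName": "Benjamin", "type": "Person"},
      {"familyName": "Boccuto", "givenName": "Luigi", "type": "Person"},
      {"familyName": "G\u00f6ker", "givenName": "Markus", "type": "Person"}
    ],
    "productId": [
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["25943565"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/s12859-015-0575-3"]}
    ]
  }
]
"""


def product_id(record, name):
    """Return the first value of the productId entry called `name`,
    or None if the record has no such entry."""
    for item in record.get("productId", []):
        if item.get("name") == name:
            return item["value"][0]
    return None


def author_names(record):
    """Return 'given family' strings in the order the record lists them."""
    return [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]


# The top-level JSON value is a list holding a single record object.
record = json.loads(RECORD_EXCERPT)[0]
print(record["name"])
print(author_names(record))
print(product_id(record, "doi"))
```

Because the top-level value is a list, the record itself is the first element. Identifiers are stored as `PropertyValue` entries whose `value` is again a list, which is why the helper indexes into it.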
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s12859-015-0575-3'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s12859-015-0575-3'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s12859-015-0575-3'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s12859-015-0575-3'
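The four curl calls above differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. This is a minimal sketch that assumes the endpoint is still reachable and behaves as described; the short format keys in `MEDIA_TYPES` and the helpers `build_request` and `fetch` are hypothetical names chosen for illustration.

```python
from urllib.request import Request, urlopen

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/s12859-015-0575-3"

# Media types taken from the curl examples; the short keys are illustrative.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request whose Accept header asks for the given format.
    No network traffic happens here."""
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text.
    Assumes the service is available and answers in UTF-8."""
    with urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch("turtle")` would correspond to the Turtle curl command. Keeping `build_request` separate from `fetch` makes it possible to inspect the headers without contacting the server.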


 

This table displays all metadata directly associated to this object as RDF triples.

270 TRIPLES      21 PREDICATES      82 URIs      32 LITERALS      20 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/s12859-015-0575-3 schema:about N18ba12d320274bc797ed8676f562ff12
2 N1ef727b8275d4dd987e70ac43afbc513
3 N2509d05a43684bea80a51c93f2721c21
4 N3cc6dce9f18740de8ab93b217f5e97bd
5 N59dc1fff230f4800a37dd2768e465a40
6 N6058a4fd283c4949a1eb3db02a9e6a18
7 N739183833577418f8eec2cb59e1d0df4
8 N800346b300da4f31858a50c60d4fb2ca
9 N94bc993244b7463890d8121eec3ebb80
10 Ndf84d60b233c4221a32d61922f3264a8
11 Neeb2f2fc0dd64e80a28f907e4e266ecd
12 anzsrc-for:01
13 anzsrc-for:0104
14 schema:author N3b235d54be564d83a1c42fc3690b5b18
15 schema:citation sg:pub.10.1007/s00180-012-0382-5
16 sg:pub.10.1007/s11222-014-9520-y
17 sg:pub.10.1023/a:1010933404324
18 sg:pub.10.1038/nbt.1658
19 sg:pub.10.1038/nmeth.2016
20 sg:pub.10.1038/nrg2484
21 sg:pub.10.1186/1471-2105-12-366
22 sg:pub.10.1186/1471-2105-15-236
23 sg:pub.10.1186/1471-2105-8-25
24 sg:pub.10.1186/1471-2105-9-136
25 sg:pub.10.1186/1752-0509-6-145
26 sg:pub.10.1186/2040-2392-4-16
27 https://doi.org/10.1016/j.cppeds.2012.08.001
28 https://doi.org/10.1016/j.csda.2008.09.009
29 https://doi.org/10.1016/j.csda.2010.11.015
30 https://doi.org/10.1016/j.csda.2013.02.022
31 https://doi.org/10.1016/j.jclinepi.2004.04.003
32 https://doi.org/10.1016/j.jclinepi.2007.11.014
33 https://doi.org/10.1021/ac031386+
34 https://doi.org/10.1073/pnas.091062498
35 https://doi.org/10.1093/bioinformatics/btq600
36 https://doi.org/10.1093/bioinformatics/btt291
37 https://doi.org/10.1101/gr.186501
38 https://doi.org/10.1111/j.1467-9868.2005.00503.x
39 https://doi.org/10.1111/j.1467-9868.2010.00740.x
40 https://doi.org/10.1111/j.1467-9868.2011.01034.x
41 https://doi.org/10.1111/j.1541-0420.2008.01112.x
42 https://doi.org/10.1146/annurev-statistics-022513-115545
43 https://doi.org/10.1198/016214503000125
44 https://doi.org/10.1198/jasa.2011.ap09272
45 https://doi.org/10.1198/jcgs.2011.09220
46 https://doi.org/10.1214/009053604000000067
47 https://doi.org/10.1214/07-sts242
48 https://doi.org/10.1214/aoms/1177703732
49 https://doi.org/10.1214/aos/1016218223
50 https://doi.org/10.1214/ss/1177013604
51 https://doi.org/10.1371/journal.pone.0034846
52 https://doi.org/10.1371/journal.pone.0084483
53 https://doi.org/10.1515/1544-6115.1792
54 https://doi.org/10.1890/10-0602.1
55 https://doi.org/10.1890/10-2276.1
56 https://doi.org/10.3414/me11-02-0030
57 schema:datePublished 2015-12
58 schema:datePublishedReg 2015-12-01
59 schema:description BACKGROUND: Modern biotechnologies often result in high-dimensional data sets with many more variables than observations (n≪p). These data sets pose new challenges to statistical analysis: Variable selection becomes one of the most important tasks in this setting. Similar challenges arise if in modern data sets from observational studies, e.g., in ecology, where flexible, non-linear models are fitted to high-dimensional data. We assess the recently proposed flexible framework for variable selection called stability selection. By the use of resampling procedures, stability selection adds a finite sample error control to high-dimensional variable selection procedures such as Lasso or boosting. We consider the combination of boosting and stability selection and present results from a detailed simulation study that provide insights into the usefulness of this combination. The interpretation of the used error bounds is elaborated and insights for practical data analysis are given. RESULTS: Stability selection with boosting was able to detect influential predictors in high-dimensional settings while controlling the given error bound in various simulation scenarios. The dependence on various parameters such as the sample size, the number of truly influential variables or tuning parameters of the algorithm was investigated. The results were applied to investigate phenotype measurements in patients with autism spectrum disorders using a log-linear interaction model which was fitted by boosting. Stability selection identified five differentially expressed amino acid pathways. CONCLUSION: Stability selection is implemented in the freely available R package stabs (http://CRAN.R-project.org/package=stabs). It proved to work well in high-dimensional settings with more predictors than observations for both, linear and additive models. The original version of stability selection, which controls the per-family error rate, is quite conservative, though, this is much less the case for its improvement, complementary pairs stability selection. Nevertheless, care should be taken to appropriately specify the error bound.
60 schema:genre research_article
61 schema:inLanguage en
62 schema:isAccessibleForFree true
63 schema:isPartOf N3fb9383ce4af4797817211ec8e6db3fe
64 N8cf1bc3df2d84f9aa5cd6915f56bf78c
65 sg:journal.1023786
66 schema:name Controlling false discoveries in high-dimensional situations: boosting with stability selection
67 schema:pagination 144
68 schema:productId N1d3c1354e46048d0845d9b1761b00aff
69 N5e135bfa1ac54ec48ba8250789449ac9
70 N8be6a87ddf0f4172af87a404e77690db
71 Nc432c0552e3b41aa8bdf8a2b30f60755
72 Nf01f533a3c2a4b90ac8a27bed2445816
73 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028801735
74 https://doi.org/10.1186/s12859-015-0575-3
75 schema:sdDatePublished 2019-04-11T09:58
76 schema:sdLicense https://scigraph.springernature.com/explorer/license/
77 schema:sdPublisher N82c04a642df9441d8081fa317ed6e1af
78 schema:url http://link.springer.com/10.1186%2Fs12859-015-0575-3
79 sgo:license sg:explorer/license/
80 sgo:sdDataset articles
81 rdf:type schema:ScholarlyArticle
82 N0c719f736e80466abb7e4011661924c0 rdf:first sg:person.0646712066.27
83 rdf:rest rdf:nil
84 N18ba12d320274bc797ed8676f562ff12 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
85 schema:name Humans
86 rdf:type schema:DefinedTerm
87 N1d3c1354e46048d0845d9b1761b00aff schema:name dimensions_id
88 schema:value pub.1028801735
89 rdf:type schema:PropertyValue
90 N1ef727b8275d4dd987e70ac43afbc513 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
91 schema:name Computer Simulation
92 rdf:type schema:DefinedTerm
93 N2509d05a43684bea80a51c93f2721c21 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
94 schema:name Gene Expression Profiling
95 rdf:type schema:DefinedTerm
96 N3b235d54be564d83a1c42fc3690b5b18 rdf:first sg:person.01044432226.13
97 rdf:rest N79280d0b3dc3406da3ba981614dd367f
98 N3cc6dce9f18740de8ab93b217f5e97bd schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
99 schema:name Algorithms
100 rdf:type schema:DefinedTerm
101 N3fb9383ce4af4797817211ec8e6db3fe schema:volumeNumber 16
102 rdf:type schema:PublicationVolume
103 N59dc1fff230f4800a37dd2768e465a40 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
104 schema:name Nonlinear Dynamics
105 rdf:type schema:DefinedTerm
106 N5e135bfa1ac54ec48ba8250789449ac9 schema:name doi
107 schema:value 10.1186/s12859-015-0575-3
108 rdf:type schema:PropertyValue
109 N6058a4fd283c4949a1eb3db02a9e6a18 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
110 schema:name Phenotype
111 rdf:type schema:DefinedTerm
112 N739183833577418f8eec2cb59e1d0df4 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
113 schema:name Models, Statistical
114 rdf:type schema:DefinedTerm
115 N79280d0b3dc3406da3ba981614dd367f rdf:first sg:person.0771354551.55
116 rdf:rest N0c719f736e80466abb7e4011661924c0
117 N800346b300da4f31858a50c60d4fb2ca schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
118 schema:name Child Development Disorders, Pervasive
119 rdf:type schema:DefinedTerm
120 N82c04a642df9441d8081fa317ed6e1af schema:name Springer Nature - SN SciGraph project
121 rdf:type schema:Organization
122 N8be6a87ddf0f4172af87a404e77690db schema:name nlm_unique_id
123 schema:value 100965194
124 rdf:type schema:PropertyValue
125 N8cf1bc3df2d84f9aa5cd6915f56bf78c schema:issueNumber 1
126 rdf:type schema:PublicationIssue
127 N94bc993244b7463890d8121eec3ebb80 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
128 schema:name False Positive Reactions
129 rdf:type schema:DefinedTerm
130 Nc432c0552e3b41aa8bdf8a2b30f60755 schema:name pubmed_id
131 schema:value 25943565
132 rdf:type schema:PropertyValue
133 Ndf84d60b233c4221a32d61922f3264a8 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
134 schema:name Biomarkers
135 rdf:type schema:DefinedTerm
136 Neeb2f2fc0dd64e80a28f907e4e266ecd schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
137 schema:name Case-Control Studies
138 rdf:type schema:DefinedTerm
139 Nf01f533a3c2a4b90ac8a27bed2445816 schema:name readcube_id
140 schema:value 89346bffd945cf24dd79e48ff00a366dbf7b7b9a84aeada3165b80c020ea1dac
141 rdf:type schema:PropertyValue
142 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
143 schema:name Mathematical Sciences
144 rdf:type schema:DefinedTerm
145 anzsrc-for:0104 schema:inDefinedTermSet anzsrc-for:
146 schema:name Statistics
147 rdf:type schema:DefinedTerm
148 sg:journal.1023786 schema:issn 1471-2105
149 schema:name BMC Bioinformatics
150 rdf:type schema:Periodical
151 sg:person.01044432226.13 schema:affiliation https://www.grid.ac/institutes/grid.5330.5
152 schema:familyName Hofner
153 schema:givenName Benjamin
154 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01044432226.13
155 rdf:type schema:Person
156 sg:person.0646712066.27 schema:affiliation https://www.grid.ac/institutes/grid.420081.f
157 schema:familyName Göker
158 schema:givenName Markus
159 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0646712066.27
160 rdf:type schema:Person
161 sg:person.0771354551.55 schema:affiliation https://www.grid.ac/institutes/grid.418307.9
162 schema:familyName Boccuto
163 schema:givenName Luigi
164 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0771354551.55
165 rdf:type schema:Person
166 sg:pub.10.1007/s00180-012-0382-5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001684018
167 https://doi.org/10.1007/s00180-012-0382-5
168 rdf:type schema:CreativeWork
169 sg:pub.10.1007/s11222-014-9520-y schema:sameAs https://app.dimensions.ai/details/publication/pub.1045070158
170 https://doi.org/10.1007/s11222-014-9520-y
171 rdf:type schema:CreativeWork
172 sg:pub.10.1023/a:1010933404324 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024739340
173 https://doi.org/10.1023/a:1010933404324
174 rdf:type schema:CreativeWork
175 sg:pub.10.1038/nbt.1658 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052467241
176 https://doi.org/10.1038/nbt.1658
177 rdf:type schema:CreativeWork
178 sg:pub.10.1038/nmeth.2016 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023029250
179 https://doi.org/10.1038/nmeth.2016
180 rdf:type schema:CreativeWork
181 sg:pub.10.1038/nrg2484 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030687647
182 https://doi.org/10.1038/nrg2484
183 rdf:type schema:CreativeWork
184 sg:pub.10.1186/1471-2105-12-366 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016369639
185 https://doi.org/10.1186/1471-2105-12-366
186 rdf:type schema:CreativeWork
187 sg:pub.10.1186/1471-2105-15-236 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045101318
188 https://doi.org/10.1186/1471-2105-15-236
189 rdf:type schema:CreativeWork
190 sg:pub.10.1186/1471-2105-8-25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019863657
191 https://doi.org/10.1186/1471-2105-8-25
192 rdf:type schema:CreativeWork
193 sg:pub.10.1186/1471-2105-9-136 schema:sameAs https://app.dimensions.ai/details/publication/pub.1041231178
194 https://doi.org/10.1186/1471-2105-9-136
195 rdf:type schema:CreativeWork
196 sg:pub.10.1186/1752-0509-6-145 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022589506
197 https://doi.org/10.1186/1752-0509-6-145
198 rdf:type schema:CreativeWork
199 sg:pub.10.1186/2040-2392-4-16 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038289950
200 https://doi.org/10.1186/2040-2392-4-16
201 rdf:type schema:CreativeWork
202 https://doi.org/10.1016/j.cppeds.2012.08.001 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035837013
203 rdf:type schema:CreativeWork
204 https://doi.org/10.1016/j.csda.2008.09.009 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047435658
205 rdf:type schema:CreativeWork
206 https://doi.org/10.1016/j.csda.2010.11.015 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021359513
207 rdf:type schema:CreativeWork
208 https://doi.org/10.1016/j.csda.2013.02.022 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009360444
209 rdf:type schema:CreativeWork
210 https://doi.org/10.1016/j.jclinepi.2004.04.003 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013946743
211 rdf:type schema:CreativeWork
212 https://doi.org/10.1016/j.jclinepi.2007.11.014 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004167802
213 rdf:type schema:CreativeWork
214 https://doi.org/10.1021/ac031386+ schema:sameAs https://app.dimensions.ai/details/publication/pub.1054994858
215 rdf:type schema:CreativeWork
216 https://doi.org/10.1073/pnas.091062498 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001631710
217 rdf:type schema:CreativeWork
218 https://doi.org/10.1093/bioinformatics/btq600 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045324570
219 rdf:type schema:CreativeWork
220 https://doi.org/10.1093/bioinformatics/btt291 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038435662
221 rdf:type schema:CreativeWork
222 https://doi.org/10.1101/gr.186501 schema:sameAs https://app.dimensions.ai/details/publication/pub.1020261175
223 rdf:type schema:CreativeWork
224 https://doi.org/10.1111/j.1467-9868.2005.00503.x schema:sameAs https://app.dimensions.ai/details/publication/pub.1043971564
225 rdf:type schema:CreativeWork
226 https://doi.org/10.1111/j.1467-9868.2010.00740.x schema:sameAs https://app.dimensions.ai/details/publication/pub.1000696823
227 rdf:type schema:CreativeWork
228 https://doi.org/10.1111/j.1467-9868.2011.01034.x schema:sameAs https://app.dimensions.ai/details/publication/pub.1012681255
229 rdf:type schema:CreativeWork
230 https://doi.org/10.1111/j.1541-0420.2008.01112.x schema:sameAs https://app.dimensions.ai/details/publication/pub.1018081846
231 rdf:type schema:CreativeWork
232 https://doi.org/10.1146/annurev-statistics-022513-115545 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045721889
233 rdf:type schema:CreativeWork
234 https://doi.org/10.1198/016214503000125 schema:sameAs https://app.dimensions.ai/details/publication/pub.1064198102
235 rdf:type schema:CreativeWork
236 https://doi.org/10.1198/jasa.2011.ap09272 schema:sameAs https://app.dimensions.ai/details/publication/pub.1064200650
237 rdf:type schema:CreativeWork
238 https://doi.org/10.1198/jcgs.2011.09220 schema:sameAs https://app.dimensions.ai/details/publication/pub.1064201123
239 rdf:type schema:CreativeWork
240 https://doi.org/10.1214/009053604000000067 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038945634
241 rdf:type schema:CreativeWork
242 https://doi.org/10.1214/07-sts242 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049744920
243 rdf:type schema:CreativeWork
244 https://doi.org/10.1214/aoms/1177703732 schema:sameAs https://app.dimensions.ai/details/publication/pub.1064400228
245 rdf:type schema:CreativeWork
246 https://doi.org/10.1214/aos/1016218223 schema:sameAs https://app.dimensions.ai/details/publication/pub.1020629296
247 rdf:type schema:CreativeWork
248 https://doi.org/10.1214/ss/1177013604 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052575245
249 rdf:type schema:CreativeWork
250 https://doi.org/10.1371/journal.pone.0034846 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000073536
251 rdf:type schema:CreativeWork
252 https://doi.org/10.1371/journal.pone.0084483 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039586310
253 rdf:type schema:CreativeWork
254 https://doi.org/10.1515/1544-6115.1792 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017320119
255 rdf:type schema:CreativeWork
256 https://doi.org/10.1890/10-0602.1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1020740452
257 rdf:type schema:CreativeWork
258 https://doi.org/10.1890/10-2276.1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000901718
259 rdf:type schema:CreativeWork
260 https://doi.org/10.3414/me11-02-0030 schema:sameAs https://app.dimensions.ai/details/publication/pub.1071312140
261 rdf:type schema:CreativeWork
262 https://www.grid.ac/institutes/grid.418307.9 schema:alternateName Greenwood Genetic Center
263 schema:name Greenwood Genetic Center, 113 Gregor Mendel Circle, 29646, Greenwood, SC, USA
264 rdf:type schema:Organization
265 https://www.grid.ac/institutes/grid.420081.f schema:alternateName Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures
266 schema:name Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7b, 38124, Braunschweig, Germany
267 rdf:type schema:Organization
268 https://www.grid.ac/institutes/grid.5330.5 schema:alternateName University of Erlangen-Nuremberg
269 schema:name Department of Medical Informatics, Biometry and Epidemiology, Friedrich-Alexander-University Erlangen-Nuremberg, Waldstraße 6, 91054, Erlangen, Germany
270 rdf:type schema:Organization
 






...