Ontology type: schema:ScholarlyArticle
Open Access: True
2017-05
AUTHORS: Baptiste Gregorutti, Bertrand Michel, Philippe Saint-Pierre
ABSTRACT: This paper is about variable selection with the random forests algorithm in presence of correlated predictors. In high-dimensional regression or classification frameworks, variable selection is a difficult task, that becomes even more challenging in the presence of highly correlated predictors. Firstly we provide a theoretical study of the permutation importance measure for an additive regression model. This allows us to describe how the correlation between predictors impacts the permutation importance. Our results motivate the use of the recursive feature elimination (RFE) algorithm for variable selection in this context. This algorithm recursively eliminates the variables using permutation importance measure as a ranking criterion. Next various simulation experiments illustrate the efficiency of the RFE algorithm for selecting a small number of variables together with a good prediction error. Finally, this selection algorithm is tested on the Landsat Satellite data from the UCI Machine Learning Repository.
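The elimination loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: to stay dependency-free, a k-nearest-neighbour regressor stands in for the random forest, and the helper names (`knn_predict`, `permutation_importance`, `rfe`, `make_data`) and the synthetic data are assumptions made for illustration. The point shown is only that permutation importance is recomputed after each elimination and the lowest-ranked variable is dropped.

```python
import random

def knn_predict(train_X, train_y, x, features, k=5):
    """Mean response of the k training points nearest to x on the given features."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((train_X[i][j] - x[j]) ** 2 for j in features),
    )[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def mse(train_X, train_y, test_X, test_y, features):
    """Held-out mean squared error of the stand-in model."""
    errors = [(knn_predict(train_X, train_y, x, features) - y) ** 2
              for x, y in zip(test_X, test_y)]
    return sum(errors) / len(errors)

def permutation_importance(train_X, train_y, test_X, test_y, features, rng):
    """Increase in held-out MSE when a single feature column is shuffled."""
    base = mse(train_X, train_y, test_X, test_y, features)
    scores = {}
    for j in features:
        column = [x[j] for x in test_X]
        rng.shuffle(column)
        permuted = [x[:j] + [v] + x[j + 1:] for x, v in zip(test_X, column)]
        scores[j] = mse(train_X, train_y, permuted, test_y, features) - base
    return scores

def rfe(train_X, train_y, test_X, test_y, rng):
    """Return feature indices in elimination order, least important first."""
    features = list(range(len(train_X[0])))
    eliminated = []
    while len(features) > 1:
        # Importance is recomputed on the surviving features at every step.
        scores = permutation_importance(train_X, train_y, test_X, test_y, features, rng)
        worst = min(features, key=lambda j: scores[j])
        features.remove(worst)
        eliminated.append(worst)
    return eliminated + features

def make_data(n, rng):
    """Synthetic data (an assumption): y depends on feature 0 only."""
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    y = [10.0 * x[0] + rng.gauss(0, 0.1) for x in X]
    return X, y

rng = random.Random(0)
train_X, train_y = make_data(150, rng)
test_X, test_y = make_data(60, rng)
order = rfe(train_X, train_y, test_X, test_y, rng)
print("elimination order (last survives):", order)
```

Swapping the stand-in model for a random forest would leave the loop in `rfe` unchanged; only `knn_predict` would be replaced by the forest's prediction.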
PAGES: 659-678
http://scigraph.springernature.com/pub.10.1007/s11222-016-9646-1
DOI: http://dx.doi.org/10.1007/s11222-016-9646-1
DIMENSIONS: https://app.dimensions.ai/details/publication/pub.1024423228
JSON-LD is the canonical representation for SciGraph data.
[
{
"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
"about": [
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Artificial Intelligence and Image Processing",
"type": "DefinedTerm"
},
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Information and Computing Sciences",
"type": "DefinedTerm"
}
],
"author": [
{
"affiliation": {
"alternateName": "Laboratoire de Statistique Th\u00e9orique et Appliqu\u00e9e",
"id": "https://www.grid.ac/institutes/grid.463964.a",
"name": [
"Safety Line, 15 Rue Jean-Baptiste Berlier, 75013, Paris, France",
"Laboratoire de Statistique Th\u00e9orique et Appliqu\u00e9e, Universit\u00e9 Pierre et Marie Curie, 4 Place Jussieu, 75252, Paris Cedex 05, France"
],
"type": "Organization"
},
"familyName": "Gregorutti",
"givenName": "Baptiste",
"id": "sg:person.010337330732.96",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010337330732.96"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Laboratoire de Statistique Th\u00e9orique et Appliqu\u00e9e",
"id": "https://www.grid.ac/institutes/grid.463964.a",
"name": [
"Laboratoire de Statistique Th\u00e9orique et Appliqu\u00e9e, Universit\u00e9 Pierre et Marie Curie, 4 Place Jussieu, 75252, Paris Cedex 05, France"
],
"type": "Organization"
},
"familyName": "Michel",
"givenName": "Bertrand",
"id": "sg:person.016617520265.86",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016617520265.86"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Laboratoire de Statistique Th\u00e9orique et Appliqu\u00e9e",
"id": "https://www.grid.ac/institutes/grid.463964.a",
"name": [
"Laboratoire de Statistique Th\u00e9orique et Appliqu\u00e9e, Universit\u00e9 Pierre et Marie Curie, 4 Place Jussieu, 75252, Paris Cedex 05, France"
],
"type": "Organization"
},
"familyName": "Saint-Pierre",
"givenName": "Philippe",
"id": "sg:person.014720173732.65",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014720173732.65"
],
"type": "Person"
}
],
"citation": [
{
"id": "https://doi.org/10.1111/j.1467-9868.2010.00740.x",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1000696823"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/bf00058655",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1002929950",
"https://doi.org/10.1007/bf00058655"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1093/bioinformatics/btr300",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1004108043"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/1471-2105-7-3",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1004156594",
"https://doi.org/10.1186/1471-2105-7-3"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/j.jspi.2013.05.019",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1005599215"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-540-74272-2_115",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1009024625",
"https://doi.org/10.1007/978-3-540-74272-2_115"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/j.chemolab.2010.12.004",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1015985534"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/s10115-006-0040-8",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1018091708",
"https://doi.org/10.1007/s10115-006-0040-8"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/1471-2105-5-81",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1019670626",
"https://doi.org/10.1186/1471-2105-5-81"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/s0004-3702(97)00063-5",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1020136638"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1371/journal.pone.0028210",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1020997827"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/j.patrec.2010.03.014",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1021405554"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1023/a:1010933404324",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1024739340",
"https://doi.org/10.1023/a:1010933404324"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/s0004-3702(97)00043-x",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1031014012"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.2202/1557-4679.1008",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1032915999"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-540-25966-4_33",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1033920644",
"https://doi.org/10.1007/978-3-540-25966-4_33"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1073/pnas.102102699",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1034359388"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/j.csda.2012.09.020",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1040900298"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/j.jmva.2011.05.004",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1041570818"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1093/bib/bbr016",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1042608935"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/1471-2105-9-307",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1042870683",
"https://doi.org/10.1186/1471-2105-9-307"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/j.csda.2005.12.018",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1044061459"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/1471-2105-11-110",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1047298303",
"https://doi.org/10.1186/1471-2105-11-110"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1023/a:1012487302797",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1048573168",
"https://doi.org/10.1023/a:1012487302797"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/j.csda.2007.08.015",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1049823578"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1093/bioinformatics/btp331",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1050285412"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1080/01621459.2015.1036994",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1058306386"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1109/tcbb.2012.33",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1061541044"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1198/tast.2009.08199",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1064201606"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1214/07-ejs039",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1064389851"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1214/15-aos1321",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1064395231"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1002/9780470316436",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1109489408"
],
"type": "CreativeWork"
},
{
"id": "https://app.dimensions.ai/details/publication/pub.1109489408",
"type": "CreativeWork"
}
],
"datePublished": "2017-05",
"datePublishedReg": "2017-05-01",
"description": "This paper is about variable selection with the random forests algorithm in presence of correlated predictors. In high-dimensional regression or classification frameworks, variable selection is a difficult task, that becomes even more challenging in the presence of highly correlated predictors. Firstly we provide a theoretical study of the permutation importance measure for an additive regression model. This allows us to describe how the correlation between predictors impacts the permutation importance. Our results motivate the use of the recursive feature elimination (RFE) algorithm for variable selection in this context. This algorithm recursively eliminates the variables using permutation importance measure as a ranking criterion. Next various simulation experiments illustrate the efficiency of the RFE algorithm for selecting a small number of variables together with a good prediction error. Finally, this selection algorithm is tested on the Landsat Satellite data from the UCI Machine Learning Repository.",
"genre": "research_article",
"id": "sg:pub.10.1007/s11222-016-9646-1",
"inLanguage": [
"en"
],
"isAccessibleForFree": true,
"isPartOf": [
{
"id": "sg:journal.1327447",
"issn": [
"0960-3174",
"1573-1375"
],
"name": "Statistics and Computing",
"type": "Periodical"
},
{
"issueNumber": "3",
"type": "PublicationIssue"
},
{
"type": "PublicationVolume",
"volumeNumber": "27"
}
],
"name": "Correlation and variable importance in random forests",
"pagination": "659-678",
"productId": [
{
"name": "readcube_id",
"type": "PropertyValue",
"value": [
"745ab80f41148386f3b4e29013b41d60bf365a5d5f845c75a47ac803cabe62a2"
]
},
{
"name": "doi",
"type": "PropertyValue",
"value": [
"10.1007/s11222-016-9646-1"
]
},
{
"name": "dimensions_id",
"type": "PropertyValue",
"value": [
"pub.1024423228"
]
}
],
"sameAs": [
"https://doi.org/10.1007/s11222-016-9646-1",
"https://app.dimensions.ai/details/publication/pub.1024423228"
],
"sdDataset": "articles",
"sdDatePublished": "2019-04-10T20:08",
"sdLicense": "https://scigraph.springernature.com/explorer/license/",
"sdPublisher": {
"name": "Springer Nature - SN SciGraph project",
"type": "Organization"
},
"sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8681_00000586.jsonl",
"type": "ScholarlyArticle",
"url": "http://link.springer.com/10.1007%2Fs11222-016-9646-1"
}
]
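Because the record is plain JSON, it can be read with any JSON parser. The sketch below is one hedged way to do that: `SAMPLE` is a trimmed copy of a few fields from the record, and `summarize` is a hypothetical helper name, not part of any SciGraph tooling. Citation ids are de-duplicated in case an id is listed more than once.

```python
import json

# Trimmed excerpt of the record; only the fields used below are kept.
SAMPLE = """
[
  {
    "name": "Correlation and variable importance in random forests",
    "author": [
      {"familyName": "Gregorutti", "givenName": "Baptiste"},
      {"familyName": "Michel", "givenName": "Bertrand"},
      {"familyName": "Saint-Pierre", "givenName": "Philippe"}
    ],
    "citation": [
      {"id": "https://doi.org/10.1111/j.1467-9868.2010.00740.x"},
      {"id": "https://doi.org/10.1111/j.1467-9868.2010.00740.x"},
      {"id": "sg:pub.10.1007/bf00058655"}
    ],
    "productId": [
      {"name": "doi", "value": ["10.1007/s11222-016-9646-1"]}
    ]
  }
]
"""

def summarize(record):
    """Pull title, authors, DOI and unique citation ids out of one record."""
    authors = [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]
    doi = next(
        (p["value"][0] for p in record.get("productId", []) if p.get("name") == "doi"),
        None,
    )
    # dict.fromkeys keeps first-seen order while dropping repeated ids.
    citations = list(dict.fromkeys(c["id"] for c in record.get("citation", [])))
    return {"title": record.get("name"), "authors": authors, "doi": doi, "citations": citations}

summary = summarize(json.loads(SAMPLE)[0])
print(summary["title"], summary["doi"], summary["authors"], len(summary["citations"]))
```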
JSON-LD is a popular format for linked data which is fully compatible with JSON.
curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s11222-016-9646-1'
N-Triples is a line-based linked data format ideal for batch operations.
curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s11222-016-9646-1'
Turtle is a human-readable linked data format.
curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s11222-016-9646-1'
RDF/XML is a standard XML format for linked data.
curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s11222-016-9646-1'
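The four curl commands differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. The format labels used as dictionary keys and the function names `build_request` and `fetch` are assumptions chosen for illustration, and whether the endpoint still responds is not something this record states.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/s11222-016-9646-1"

# Media types taken from the curl commands; the short labels are illustrative.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}

def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a request asking for the given serialization."""
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})

def fetch(fmt, url=RECORD_URL, timeout=10):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")

# Inspect the request without touching the network.
req = build_request("turtle")
print(req.full_url, req.get_header("Accept"))
```

Calling `fetch("json-ld")` would perform the actual download; it is left uncalled here so the sketch runs offline.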
This table displays all metadata directly associated with this object as RDF triples.
184 TRIPLES
21 PREDICATES
60 URIs
19 LITERALS
7 BLANK NODES