An efficient random forests algorithm for high dimensional data classification


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2018-12

AUTHORS

Qiang Wang, Thanh-Tung Nguyen, Joshua Z. Huang, Thuy Thi Nguyen

ABSTRACT

In this paper, we propose a new random forest (RF) algorithm to deal with high dimensional data for classification using subspace feature sampling method and feature value searching. The new subspace sampling method maintains the diversity and randomness of the forest and enables one to generate trees with a lower prediction error. A greedy technique is used to handle cardinal categorical features for efficient node splitting when building decision trees in the forest. This allows trees to handle very high cardinality meanwhile reducing computational time in building the RF model. Extensive experiments on high dimensional real data sets including standard machine learning data sets and image data sets have been conducted. The results demonstrated that the proposed approach for learning RFs significantly reduced prediction errors and outperformed most existing RFs when dealing with high-dimensional data.

PAGES

953-972

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/s11634-018-0318-1

DOI

http://dx.doi.org/10.1007/s11634-018-0318-1

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1101634311



JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service such as the JSON-LD Playground or the Google Structured Data Testing Tool (SDTT).

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Shenzhen University", 
          "id": "https://www.grid.ac/institutes/grid.263488.3", 
          "name": [
            "College of Computer Science and Software Engineering, Shenzhen University, 518060, Shenzhen, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Wang", 
        "givenName": "Qiang", 
        "id": "sg:person.07606053562.86", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07606053562.86"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Unit\u00e9 Mixte Internationnale de Mod\u00e9lisation Math\u00e9matique et Informatiques des Syst\u00e8mes Compl\u00e8xes", 
          "id": "https://www.grid.ac/institutes/grid.464114.2", 
          "name": [
            "Faculty of Computer Science and Engineering, Thuyloi University, 175 Tay Son, Dong Da, Hanoi, Vietnam", 
            "Sorbonne Universit\u00e9, IRD, JEAI WARM, Unit\u00e9 de Mod\u00e9lisation Math\u00e9matiques et Informatique des Syst\u00e8mes Complexes, UMMISCO, 93143, Bondy, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Nguyen", 
        "givenName": "Thanh-Tung", 
        "id": "sg:person.0707501506.00", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0707501506.00"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Shenzhen University", 
          "id": "https://www.grid.ac/institutes/grid.263488.3", 
          "name": [
            "College of Computer Science and Software Engineering, Shenzhen University, 518060, Shenzhen, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Huang", 
        "givenName": "Joshua Z.", 
        "id": "sg:person.016311124762.12", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016311124762.12"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Vietnam National University of Agriculture", 
          "id": "https://www.grid.ac/institutes/grid.444964.f", 
          "name": [
            "Faculty of Information Technology, Vietnam National University of Agriculture, Hanoi, Vietnam"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Nguyen", 
        "givenName": "Thuy Thi", 
        "id": "sg:person.012606033735.21", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012606033735.21"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1007/s11263-006-9794-4", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1008205152", 
          "https://doi.org/10.1007/s11263-006-9794-4"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btn356", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1009514376"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.patrec.2010.03.014", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1021405554"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s10994-014-5452-1", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023553119", 
          "https://doi.org/10.1007/s10994-014-5452-1"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1023/a:1010933404324", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1024739340", 
          "https://doi.org/10.1023/a:1010933404324"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.patcog.2013.05.018", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1030335031"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.patcog.2012.09.005", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1030635013"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1023/a:1007607513941", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1041829946", 
          "https://doi.org/10.1023/a:1007607513941"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.4018/jdwm.2012040103", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1042627227"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1162/jocn.1991.3.1.71", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1043225769"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/34.709601", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061156844"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/34.927464", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061157278"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tpami.2006.188", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061743024"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tpami.2007.250609", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061743325"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/acv.1994.341300", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094190322"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2018-12", 
    "datePublishedReg": "2018-12-01", 
    "description": "In this paper, we propose a new random forest (RF) algorithm to deal with high dimensional data for classification using subspace feature sampling method and feature value searching. The new subspace sampling method maintains the diversity and randomness of the forest and enables one to generate trees with a lower prediction error. A greedy technique is used to handle cardinal categorical features for efficient node splitting when building decision trees in the forest. This allows trees to handle very high cardinality meanwhile reducing computational time in building the RF model. Extensive experiments on high dimensional real data sets including standard machine learning data sets and image data sets have been conducted. The results demonstrated that the proposed approach for learning RFs significantly reduced prediction errors and outperformed most existing RFs when dealing with high-dimensional data.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1007/s11634-018-0318-1", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": [
      {
        "id": "sg:journal.1045303", 
        "issn": [
          "1862-5347", 
          "1862-5355"
        ], 
        "name": "Advances in Data Analysis and Classification", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "4", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "12"
      }
    ], 
    "name": "An efficient random forests algorithm for high dimensional data classification", 
    "pagination": "953-972", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "adb3f8d8a3b748de73c105adaea74a084ab0f97dd55085e71f18c652f0872868"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/s11634-018-0318-1"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1101634311"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1007/s11634-018-0318-1", 
      "https://app.dimensions.ai/details/publication/pub.1101634311"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-11T12:40", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000363_0000000363/records_70049_00000002.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://link.springer.com/10.1007%2Fs11634-018-0318-1"
  }
]
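
As a minimal sketch of how a record shaped like the one above might be consumed, the Python snippet below pulls out the title, author names, DOI, and cited ids. The function name `summarize_record` and the trimmed `sample` dictionary are illustrative assumptions; only the field names (`name`, `author`, `productId`, `citation`) come from the record itself.

```python
def summarize_record(record: dict) -> dict:
    """Pull a few commonly needed fields out of one SciGraph-style record."""
    # Author names come from the givenName / familyName pair on each Person.
    authors = [
        f"{person.get('givenName', '')} {person.get('familyName', '')}".strip()
        for person in record.get("author", [])
    ]

    # The DOI sits in the productId entry whose name is "doi".
    doi = None
    for product in record.get("productId", []):
        if product.get("name") == "doi" and product.get("value"):
            doi = product["value"][0]

    # Keep each cited id once, preserving first-seen order.
    cited = list(
        dict.fromkeys(c["id"] for c in record.get("citation", []) if "id" in c)
    )

    return {
        "title": record.get("name"),
        "authors": authors,
        "doi": doi,
        "cited": cited,
    }


# Hypothetical, trimmed sample that mirrors the field names of the record above.
sample = {
    "name": "An efficient random forests algorithm for high dimensional data classification",
    "author": [
        {"familyName": "Wang", "givenName": "Qiang", "type": "Person"},
        {"familyName": "Nguyen", "givenName": "Thanh-Tung", "type": "Person"},
    ],
    "productId": [
        {"name": "doi", "type": "PropertyValue", "value": ["10.1007/s11634-018-0318-1"]},
    ],
    "citation": [
        {"id": "sg:pub.10.1023/a:1010933404324", "type": "CreativeWork"},
        {"id": "https://doi.org/10.1109/34.709601", "type": "CreativeWork"},
        {"id": "sg:pub.10.1023/a:1010933404324", "type": "CreativeWork"},
    ],
}

summary = summarize_record(sample)
print(summary["authors"], summary["doi"], len(summary["cited"]))
```

Because the full payload is a JSON array holding a single object, a caller would typically pass the first element of `json.loads(text)` to the function.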
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s11634-018-0318-1'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s11634-018-0318-1'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s11634-018-0318-1'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s11634-018-0318-1'
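
The four curl commands above differ only in their Accept header, so the same content negotiation could be sketched in Python with the standard library. The names `ACCEPT_TYPES`, `build_request`, and `fetch` are illustrative assumptions, as are the short format keys; the URL and media types are taken from the commands above. Whether the endpoint still responds is not assumed here, so the example only builds the request and leaves the network call to the reader.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/"

# Media types copied from the curl examples; the dictionary keys are our own labels.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(record_id: str, fmt: str) -> urllib.request.Request:
    """Build a GET request that asks for one serialization of a record."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(ACCEPT_TYPES)}")
    return urllib.request.Request(
        BASE_URL + record_id,
        headers={"Accept": ACCEPT_TYPES[fmt]},
    )


def fetch(record_id: str, fmt: str, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request is purely local; call fetch(...) to actually download.
request = build_request("pub.10.1007/s11634-018-0318-1", "turtle")
print(request.full_url, request.get_header("Accept"))
```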


 

This table displays all metadata directly associated with this object as RDF triples.

138 TRIPLES      21 PREDICATES      42 URIs      19 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/s11634-018-0318-1 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author Ne960899510fe4d98a93b50f69e1fc084
4 schema:citation sg:pub.10.1007/s10994-014-5452-1
5 sg:pub.10.1007/s11263-006-9794-4
6 sg:pub.10.1023/a:1007607513941
7 sg:pub.10.1023/a:1010933404324
8 https://doi.org/10.1016/j.patcog.2012.09.005
9 https://doi.org/10.1016/j.patcog.2013.05.018
10 https://doi.org/10.1016/j.patrec.2010.03.014
11 https://doi.org/10.1093/bioinformatics/btn356
12 https://doi.org/10.1109/34.709601
13 https://doi.org/10.1109/34.927464
14 https://doi.org/10.1109/acv.1994.341300
15 https://doi.org/10.1109/tpami.2006.188
16 https://doi.org/10.1109/tpami.2007.250609
17 https://doi.org/10.1162/jocn.1991.3.1.71
18 https://doi.org/10.4018/jdwm.2012040103
19 schema:datePublished 2018-12
20 schema:datePublishedReg 2018-12-01
21 schema:description In this paper, we propose a new random forest (RF) algorithm to deal with high dimensional data for classification using subspace feature sampling method and feature value searching. The new subspace sampling method maintains the diversity and randomness of the forest and enables one to generate trees with a lower prediction error. A greedy technique is used to handle cardinal categorical features for efficient node splitting when building decision trees in the forest. This allows trees to handle very high cardinality meanwhile reducing computational time in building the RF model. Extensive experiments on high dimensional real data sets including standard machine learning data sets and image data sets have been conducted. The results demonstrated that the proposed approach for learning RFs significantly reduced prediction errors and outperformed most existing RFs when dealing with high-dimensional data.
22 schema:genre research_article
23 schema:inLanguage en
24 schema:isAccessibleForFree false
25 schema:isPartOf N045ad0ca8030462b93af6072d15f6b63
26 N48f385d7f385463b89db3dfb55193222
27 sg:journal.1045303
28 schema:name An efficient random forests algorithm for high dimensional data classification
29 schema:pagination 953-972
30 schema:productId N2112e0397e5a477b89474b8e50849ebc
31 N4335e138abf84b0a9bf151dc31daf2f0
32 N484b9a4996d94a3e96347f02b81cb140
33 schema:sameAs https://app.dimensions.ai/details/publication/pub.1101634311
34 https://doi.org/10.1007/s11634-018-0318-1
35 schema:sdDatePublished 2019-04-11T12:40
36 schema:sdLicense https://scigraph.springernature.com/explorer/license/
37 schema:sdPublisher N7f4021ee1a6747fd82cd1dcd0dae17d4
38 schema:url https://link.springer.com/10.1007%2Fs11634-018-0318-1
39 sgo:license sg:explorer/license/
40 sgo:sdDataset articles
41 rdf:type schema:ScholarlyArticle
42 N045ad0ca8030462b93af6072d15f6b63 schema:issueNumber 4
43 rdf:type schema:PublicationIssue
44 N2112e0397e5a477b89474b8e50849ebc schema:name doi
45 schema:value 10.1007/s11634-018-0318-1
46 rdf:type schema:PropertyValue
47 N4335e138abf84b0a9bf151dc31daf2f0 schema:name readcube_id
48 schema:value adb3f8d8a3b748de73c105adaea74a084ab0f97dd55085e71f18c652f0872868
49 rdf:type schema:PropertyValue
50 N484b9a4996d94a3e96347f02b81cb140 schema:name dimensions_id
51 schema:value pub.1101634311
52 rdf:type schema:PropertyValue
53 N48f385d7f385463b89db3dfb55193222 schema:volumeNumber 12
54 rdf:type schema:PublicationVolume
55 N5faac210600943b98ae184047acc5408 rdf:first sg:person.012606033735.21
56 rdf:rest rdf:nil
57 N7f4021ee1a6747fd82cd1dcd0dae17d4 schema:name Springer Nature - SN SciGraph project
58 rdf:type schema:Organization
59 Nb17b92e378db4e0ebe916ac517bbec31 rdf:first sg:person.0707501506.00
60 rdf:rest Nfc91006e20534f42a5cef02b8a17106e
61 Ne960899510fe4d98a93b50f69e1fc084 rdf:first sg:person.07606053562.86
62 rdf:rest Nb17b92e378db4e0ebe916ac517bbec31
63 Nfc91006e20534f42a5cef02b8a17106e rdf:first sg:person.016311124762.12
64 rdf:rest N5faac210600943b98ae184047acc5408
65 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
66 schema:name Information and Computing Sciences
67 rdf:type schema:DefinedTerm
68 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
69 schema:name Artificial Intelligence and Image Processing
70 rdf:type schema:DefinedTerm
71 sg:journal.1045303 schema:issn 1862-5347
72 1862-5355
73 schema:name Advances in Data Analysis and Classification
74 rdf:type schema:Periodical
75 sg:person.012606033735.21 schema:affiliation https://www.grid.ac/institutes/grid.444964.f
76 schema:familyName Nguyen
77 schema:givenName Thuy Thi
78 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012606033735.21
79 rdf:type schema:Person
80 sg:person.016311124762.12 schema:affiliation https://www.grid.ac/institutes/grid.263488.3
81 schema:familyName Huang
82 schema:givenName Joshua Z.
83 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016311124762.12
84 rdf:type schema:Person
85 sg:person.0707501506.00 schema:affiliation https://www.grid.ac/institutes/grid.464114.2
86 schema:familyName Nguyen
87 schema:givenName Thanh-Tung
88 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0707501506.00
89 rdf:type schema:Person
90 sg:person.07606053562.86 schema:affiliation https://www.grid.ac/institutes/grid.263488.3
91 schema:familyName Wang
92 schema:givenName Qiang
93 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07606053562.86
94 rdf:type schema:Person
95 sg:pub.10.1007/s10994-014-5452-1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023553119
96 https://doi.org/10.1007/s10994-014-5452-1
97 rdf:type schema:CreativeWork
98 sg:pub.10.1007/s11263-006-9794-4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008205152
99 https://doi.org/10.1007/s11263-006-9794-4
100 rdf:type schema:CreativeWork
101 sg:pub.10.1023/a:1007607513941 schema:sameAs https://app.dimensions.ai/details/publication/pub.1041829946
102 https://doi.org/10.1023/a:1007607513941
103 rdf:type schema:CreativeWork
104 sg:pub.10.1023/a:1010933404324 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024739340
105 https://doi.org/10.1023/a:1010933404324
106 rdf:type schema:CreativeWork
107 https://doi.org/10.1016/j.patcog.2012.09.005 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030635013
108 rdf:type schema:CreativeWork
109 https://doi.org/10.1016/j.patcog.2013.05.018 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030335031
110 rdf:type schema:CreativeWork
111 https://doi.org/10.1016/j.patrec.2010.03.014 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021405554
112 rdf:type schema:CreativeWork
113 https://doi.org/10.1093/bioinformatics/btn356 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009514376
114 rdf:type schema:CreativeWork
115 https://doi.org/10.1109/34.709601 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061156844
116 rdf:type schema:CreativeWork
117 https://doi.org/10.1109/34.927464 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061157278
118 rdf:type schema:CreativeWork
119 https://doi.org/10.1109/acv.1994.341300 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094190322
120 rdf:type schema:CreativeWork
121 https://doi.org/10.1109/tpami.2006.188 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061743024
122 rdf:type schema:CreativeWork
123 https://doi.org/10.1109/tpami.2007.250609 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061743325
124 rdf:type schema:CreativeWork
125 https://doi.org/10.1162/jocn.1991.3.1.71 schema:sameAs https://app.dimensions.ai/details/publication/pub.1043225769
126 rdf:type schema:CreativeWork
127 https://doi.org/10.4018/jdwm.2012040103 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042627227
128 rdf:type schema:CreativeWork
129 https://www.grid.ac/institutes/grid.263488.3 schema:alternateName Shenzhen University
130 schema:name College of Computer Science and Software Engineering, Shenzhen University, 518060, Shenzhen, China
131 rdf:type schema:Organization
132 https://www.grid.ac/institutes/grid.444964.f schema:alternateName Vietnam National University of Agriculture
133 schema:name Faculty of Information Technology, Vietnam National University of Agriculture, Hanoi, Vietnam
134 rdf:type schema:Organization
135 https://www.grid.ac/institutes/grid.464114.2 schema:alternateName Unité Mixte Internationnale de Modélisation Mathématique et Informatiques des Systèmes Complèxes
136 schema:name Faculty of Computer Science and Engineering, Thuyloi University, 175 Tay Son, Dong Da, Hanoi, Vietnam
137 Sorbonne Université, IRD, JEAI WARM, Unité de Modélisation Mathématiques et Informatique des Systèmes Complexes, UMMISCO, 93143, Bondy, France
138 rdf:type schema:Organization
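
In the table above, the author order is encoded as an RDF list: schema:author points at a blank node, and each blank node carries an rdf:first value (a person) and an rdf:rest pointer to the next node, ending at rdf:nil. The sketch below walks that chain to recover the order. The node ids and person ids are copied from the table; the dictionary layout and the function name `walk_rdf_list` are assumptions made for illustration.

```python
RDF_NIL = "rdf:nil"

# Each blank node maps to (rdf:first, rdf:rest), as listed in the triple table.
AUTHOR_LIST = {
    "Ne960899510fe4d98a93b50f69e1fc084": (
        "sg:person.07606053562.86", "Nb17b92e378db4e0ebe916ac517bbec31"),
    "Nb17b92e378db4e0ebe916ac517bbec31": (
        "sg:person.0707501506.00", "Nfc91006e20534f42a5cef02b8a17106e"),
    "Nfc91006e20534f42a5cef02b8a17106e": (
        "sg:person.016311124762.12", "N5faac210600943b98ae184047acc5408"),
    "N5faac210600943b98ae184047acc5408": (
        "sg:person.012606033735.21", RDF_NIL),
}


def walk_rdf_list(head: str, cells: dict) -> list:
    """Follow rdf:rest pointers from head until rdf:nil, collecting rdf:first values."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = cells[node]
        items.append(first)
        node = rest
    return items


# The head node is the object of the schema:author triple.
ordered_authors = walk_rdf_list("Ne960899510fe4d98a93b50f69e1fc084", AUTHOR_LIST)
print(ordered_authors)
```

The resulting order (Wang, Thanh-Tung Nguyen, Huang, Thuy Thi Nguyen) matches the AUTHORS line at the top of the record.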
 



