Efficient greedy feature selection for unsupervised learning


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2013-05

AUTHORS

Ahmed K. Farahat, Ali Ghodsi, Mohamed S. Kamel

ABSTRACT

Reducing the dimensionality of the data has been a challenging task in data mining and machine learning applications. In these applications, the existence of irrelevant and redundant features negatively affects the efficiency and effectiveness of different learning algorithms. Feature selection is one of the dimension reduction techniques, which has been used to allow a better understanding of data and improve the performance of other learning tasks. Although the selection of relevant features has been extensively studied in supervised learning, feature selection in the absence of class labels is still a challenging task. This paper proposes a novel method for unsupervised feature selection, which efficiently selects features in a greedy manner. The paper first defines an effective criterion for unsupervised feature selection that measures the reconstruction error of the data matrix based on the selected subset of features. The paper then presents a novel algorithm for greedily minimizing the reconstruction error based on the features selected so far. The greedy algorithm is based on an efficient recursive formula for calculating the reconstruction error. Experiments on real data sets demonstrate the effectiveness of the proposed algorithm in comparison with the state-of-the-art methods for unsupervised feature selection.
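
The greedy idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm or its recursive formula, only a naive pure-Python version written under stated assumptions: the data is a list of rows, the reconstruction error is the squared Frobenius norm of the data minus its projection onto the span of the selected feature columns, and each step picks the feature whose residual column removes the most remaining error. The names greedy_select, dot and tol are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def greedy_select(rows, k, tol=1e-12):
    """Greedily pick up to k feature (column) indices.

    Returns (selected_indices, remaining_reconstruction_error).
    Assumption: error = squared Frobenius norm of the residual left after
    projecting every column onto the span of the selected columns.
    """
    n_features = len(rows[0])
    # residual[j] is column j minus the part the selected features explain.
    residual = [[row[j] for row in rows] for j in range(n_features)]
    selected = []
    for _ in range(k):
        best, best_gain = None, 0.0
        for j in range(n_features):
            norm2 = dot(residual[j], residual[j])
            if norm2 <= tol:
                continue  # already selected, or fully explained so far
            # Error removed by adding feature j: ||E^T e_j||^2 / ||e_j||^2.
            gain = sum(dot(residual[j], col) ** 2 for col in residual) / norm2
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:
            break  # nothing left to explain
        selected.append(best)
        pivot = residual[best][:]
        pivot_norm2 = dot(pivot, pivot)
        # Deflate: remove the component along the chosen residual column.
        residual = [
            [x - dot(pivot, col) / pivot_norm2 * p for x, p in zip(col, pivot)]
            for col in residual
        ]
    error = sum(dot(col, col) for col in residual)
    return selected, error


# Toy data: feature 1 is exactly twice feature 0, feature 2 is independent.
toy = [
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
picked, err = greedy_select(toy, 2)
print(picked, err)
```

On the toy data the redundant feature is skipped in favour of the independent one. The sketch recomputes every inner product at each step, which is exactly the cost an efficient recursive formula would be expected to avoid.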

PAGES

285-310

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/s10115-012-0538-1

DOI

http://dx.doi.org/10.1007/s10115-012-0538-1

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1043085326



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "Department of Electrical and Computer Engineering, University of Waterloo, N2L 3G1, Waterloo, ON, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Farahat", 
        "givenName": "Ahmed K.", 
        "id": "sg:person.013262542731.05", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013262542731.05"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "Department of Statistics and Actuarial Science, University of Waterloo, N2L 3G1, Waterloo, ON, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Ghodsi", 
        "givenName": "Ali", 
        "id": "sg:person.07545373531.09", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07545373531.09"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Waterloo", 
          "id": "https://www.grid.ac/institutes/grid.46078.3d", 
          "name": [
            "Department of Electrical and Computer Engineering, University of Waterloo, N2L 3G1, Waterloo, ON, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Kamel", 
        "givenName": "Mohamed S.", 
        "id": "sg:person.01133760566.26", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01133760566.26"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1007/s10115-011-0381-9", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1002719279", 
          "https://doi.org/10.1007/s10115-011-0381-9"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/1401890.1401903", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1004414534"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/1835804.1835848", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010986574"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s10115-012-0487-8", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1016145627", 
          "https://doi.org/10.1007/s10115-012-0487-8"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1126/science.1136800", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017347292"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-540-35488-8", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1021763913", 
          "https://doi.org/10.1007/978-3-540-35488-8"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/1291233.1291297", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1049356289"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1023/a:1007612920971", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1049726547", 
          "https://doi.org/10.1023/a:1007612920971"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/1273496.1273641", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1051811766"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/34.291440", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061155985"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/34.990133", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061157378"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1198/106186006x113430", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1064199521"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1137/1.9781611972801.54", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1088797253"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icdm.2011.22", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094186276"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/acv.1994.341300", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094190322"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.3115/116580.116725", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1099274417"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2013-05", 
    "datePublishedReg": "2013-05-01", 
    "description": "Reducing the dimensionality of the data has been a challenging task in data mining and machine learning applications. In these applications, the existence of irrelevant and redundant features negatively affects the efficiency and effectiveness of different learning algorithms. Feature selection is one of the dimension reduction techniques, which has been used to allow a better understanding of data and improve the performance of other learning tasks. Although the selection of relevant features has been extensively studied in supervised learning, feature selection in the absence of class labels is still a challenging task. This paper proposes a novel method for unsupervised feature selection, which efficiently selects features in a greedy manner. The paper first defines an effective criterion for unsupervised feature selection that measures the reconstruction error of the data matrix based on the selected subset of features. The paper then presents a novel algorithm for greedily minimizing the reconstruction error based on the features selected so far. The greedy algorithm is based on an efficient recursive formula for calculating the reconstruction error. Experiments on real data sets demonstrate the effectiveness of the proposed algorithm in comparison with the state-of-the-art methods for unsupervised feature selection.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1007/s10115-012-0538-1", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": [
      {
        "id": "sg:journal.1041769", 
        "issn": [
          "0219-1377", 
          "0219-3116"
        ], 
        "name": "Knowledge and Information Systems", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "2", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "35"
      }
    ], 
    "name": "Efficient greedy feature selection for unsupervised learning", 
    "pagination": "285-310", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "cd3b18f6ea7a046ec6e0c90e7e6617fe568ff12592bfc45ca2891e6f68ad35d3"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/s10115-012-0538-1"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1043085326"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1007/s10115-012-0538-1", 
      "https://app.dimensions.ai/details/publication/pub.1043085326"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-10T16:43", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8669_00000515.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1007%2Fs10115-012-0538-1"
  }
]
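
As an illustration of working with a record like the one above, the following sketch loads a trimmed excerpt with Python's standard json module and pulls out a few bibliographic fields. The excerpt keeps only some keys from the record, and the helper name summarize and the shape of its return value are assumptions made for this example, not part of SciGraph.

```python
import json

# Trimmed excerpt of the record; only the keys used below are kept.
RECORD = """
[
  {
    "id": "sg:pub.10.1007/s10115-012-0538-1",
    "name": "Efficient greedy feature selection for unsupervised learning",
    "pagination": "285-310",
    "author": [
      {"familyName": "Farahat", "givenName": "Ahmed K."},
      {"familyName": "Ghodsi", "givenName": "Ali"},
      {"familyName": "Kamel", "givenName": "Mohamed S."}
    ],
    "isPartOf": [
      {"name": "Knowledge and Information Systems", "type": "Periodical"},
      {"issueNumber": "2", "type": "PublicationIssue"},
      {"type": "PublicationVolume", "volumeNumber": "35"}
    ],
    "citation": [
      {"id": "sg:pub.10.1007/s10115-011-0381-9"},
      {"id": "https://doi.org/10.1145/1401890.1401903"}
    ]
  }
]
"""


def summarize(record):
    """Collect a few fields from one record dict (hypothetical helper)."""
    # Index the isPartOf entries by their type so each can be looked up.
    parts = {part["type"]: part for part in record.get("isPartOf", [])}
    return {
        "title": record["name"],
        "authors": [
            f'{a["givenName"]} {a["familyName"]}' for a in record["author"]
        ],
        "journal": parts.get("Periodical", {}).get("name"),
        "volume": parts.get("PublicationVolume", {}).get("volumeNumber"),
        "issue": parts.get("PublicationIssue", {}).get("issueNumber"),
        "pages": record.get("pagination"),
        # dict.fromkeys drops repeated ids while keeping first-seen order.
        "citations": list(
            dict.fromkeys(c["id"] for c in record.get("citation", []))
        ),
    }


summary = summarize(json.loads(RECORD)[0])
print(summary["authors"], summary["journal"], summary["volume"])
```

Because JSON-LD is plain JSON, no RDF library is needed for this kind of field lookup. A JSON-LD processor would only be required to expand the "@context" into full IRIs.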
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s10115-012-0538-1'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s10115-012-0538-1'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s10115-012-0538-1'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s10115-012-0538-1'
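
The same content negotiation can be sketched in Python with the standard urllib.request module. The short format labels, the helper names build_request and fetch, and the timeout are assumptions for this example. Whether the endpoint still answers is not checked here: the code only builds the request unless fetch is called explicitly.

```python
import urllib.request

BASE = "https://scigraph.springernature.com/pub.10.1007/s10115-012-0538-1"

# Accept header values taken from the curl commands; the short keys are
# labels invented for this sketch.
MIME_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=BASE):
    """Return a Request asking for the record in the given RDF format."""
    return urllib.request.Request(url, headers={"Accept": MIME_TYPES[fmt]})


def fetch(fmt, url=BASE, timeout=30):
    """Send the request and return the response body as text (needs network)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


req = build_request("turtle")
print(req.full_url, req.get_header("Accept"))
```

Keeping request construction separate from the network call makes the header logic easy to check offline.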


 

This table displays all metadata directly associated with this object as RDF triples.

128 TRIPLES      21 PREDICATES      43 URIs      19 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/s10115-012-0538-1 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author N4a5e3aec8a8f451aa80add0c19bce5b4
4 schema:citation sg:pub.10.1007/978-3-540-35488-8
5 sg:pub.10.1007/s10115-011-0381-9
6 sg:pub.10.1007/s10115-012-0487-8
7 sg:pub.10.1023/a:1007612920971
8 https://doi.org/10.1109/34.291440
9 https://doi.org/10.1109/34.990133
10 https://doi.org/10.1109/acv.1994.341300
11 https://doi.org/10.1109/icdm.2011.22
12 https://doi.org/10.1126/science.1136800
13 https://doi.org/10.1137/1.9781611972801.54
14 https://doi.org/10.1145/1273496.1273641
15 https://doi.org/10.1145/1291233.1291297
16 https://doi.org/10.1145/1401890.1401903
17 https://doi.org/10.1145/1835804.1835848
18 https://doi.org/10.1198/106186006x113430
19 https://doi.org/10.3115/116580.116725
20 schema:datePublished 2013-05
21 schema:datePublishedReg 2013-05-01
22 schema:description Reducing the dimensionality of the data has been a challenging task in data mining and machine learning applications. In these applications, the existence of irrelevant and redundant features negatively affects the efficiency and effectiveness of different learning algorithms. Feature selection is one of the dimension reduction techniques, which has been used to allow a better understanding of data and improve the performance of other learning tasks. Although the selection of relevant features has been extensively studied in supervised learning, feature selection in the absence of class labels is still a challenging task. This paper proposes a novel method for unsupervised feature selection, which efficiently selects features in a greedy manner. The paper first defines an effective criterion for unsupervised feature selection that measures the reconstruction error of the data matrix based on the selected subset of features. The paper then presents a novel algorithm for greedily minimizing the reconstruction error based on the features selected so far. The greedy algorithm is based on an efficient recursive formula for calculating the reconstruction error. Experiments on real data sets demonstrate the effectiveness of the proposed algorithm in comparison with the state-of-the-art methods for unsupervised feature selection.
23 schema:genre research_article
24 schema:inLanguage en
25 schema:isAccessibleForFree false
26 schema:isPartOf N465a2fae902443f3a7f2d80f9a905d93
27 N4e3cd3840eba4100ba68d6fca155d958
28 sg:journal.1041769
29 schema:name Efficient greedy feature selection for unsupervised learning
30 schema:pagination 285-310
31 schema:productId N102a50b6dea2485eba5c8b9c2c1a98c5
32 N8757f7dd31c2438c9cea5cba098fb336
33 Nfba7a792bc1c4649be88411a30180be5
34 schema:sameAs https://app.dimensions.ai/details/publication/pub.1043085326
35 https://doi.org/10.1007/s10115-012-0538-1
36 schema:sdDatePublished 2019-04-10T16:43
37 schema:sdLicense https://scigraph.springernature.com/explorer/license/
38 schema:sdPublisher N7a78ab4c7e924da5989c7e0afbb090bc
39 schema:url http://link.springer.com/10.1007%2Fs10115-012-0538-1
40 sgo:license sg:explorer/license/
41 sgo:sdDataset articles
42 rdf:type schema:ScholarlyArticle
43 N102a50b6dea2485eba5c8b9c2c1a98c5 schema:name doi
44 schema:value 10.1007/s10115-012-0538-1
45 rdf:type schema:PropertyValue
46 N14bce4e7c90f4b49b1409e457ab010fe rdf:first sg:person.01133760566.26
47 rdf:rest rdf:nil
48 N465a2fae902443f3a7f2d80f9a905d93 schema:volumeNumber 35
49 rdf:type schema:PublicationVolume
50 N4a5e3aec8a8f451aa80add0c19bce5b4 rdf:first sg:person.013262542731.05
51 rdf:rest Ncced080e35324678a437e2539e95ba47
52 N4e3cd3840eba4100ba68d6fca155d958 schema:issueNumber 2
53 rdf:type schema:PublicationIssue
54 N7a78ab4c7e924da5989c7e0afbb090bc schema:name Springer Nature - SN SciGraph project
55 rdf:type schema:Organization
56 N8757f7dd31c2438c9cea5cba098fb336 schema:name dimensions_id
57 schema:value pub.1043085326
58 rdf:type schema:PropertyValue
59 Ncced080e35324678a437e2539e95ba47 rdf:first sg:person.07545373531.09
60 rdf:rest N14bce4e7c90f4b49b1409e457ab010fe
61 Nfba7a792bc1c4649be88411a30180be5 schema:name readcube_id
62 schema:value cd3b18f6ea7a046ec6e0c90e7e6617fe568ff12592bfc45ca2891e6f68ad35d3
63 rdf:type schema:PropertyValue
64 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
65 schema:name Information and Computing Sciences
66 rdf:type schema:DefinedTerm
67 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
68 schema:name Artificial Intelligence and Image Processing
69 rdf:type schema:DefinedTerm
70 sg:journal.1041769 schema:issn 0219-1377
71 0219-3116
72 schema:name Knowledge and Information Systems
73 rdf:type schema:Periodical
74 sg:person.01133760566.26 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
75 schema:familyName Kamel
76 schema:givenName Mohamed S.
77 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01133760566.26
78 rdf:type schema:Person
79 sg:person.013262542731.05 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
80 schema:familyName Farahat
81 schema:givenName Ahmed K.
82 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013262542731.05
83 rdf:type schema:Person
84 sg:person.07545373531.09 schema:affiliation https://www.grid.ac/institutes/grid.46078.3d
85 schema:familyName Ghodsi
86 schema:givenName Ali
87 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07545373531.09
88 rdf:type schema:Person
89 sg:pub.10.1007/978-3-540-35488-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021763913
90 https://doi.org/10.1007/978-3-540-35488-8
91 rdf:type schema:CreativeWork
92 sg:pub.10.1007/s10115-011-0381-9 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002719279
93 https://doi.org/10.1007/s10115-011-0381-9
94 rdf:type schema:CreativeWork
95 sg:pub.10.1007/s10115-012-0487-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016145627
96 https://doi.org/10.1007/s10115-012-0487-8
97 rdf:type schema:CreativeWork
98 sg:pub.10.1023/a:1007612920971 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049726547
99 https://doi.org/10.1023/a:1007612920971
100 rdf:type schema:CreativeWork
101 https://doi.org/10.1109/34.291440 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061155985
102 rdf:type schema:CreativeWork
103 https://doi.org/10.1109/34.990133 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061157378
104 rdf:type schema:CreativeWork
105 https://doi.org/10.1109/acv.1994.341300 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094190322
106 rdf:type schema:CreativeWork
107 https://doi.org/10.1109/icdm.2011.22 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094186276
108 rdf:type schema:CreativeWork
109 https://doi.org/10.1126/science.1136800 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017347292
110 rdf:type schema:CreativeWork
111 https://doi.org/10.1137/1.9781611972801.54 schema:sameAs https://app.dimensions.ai/details/publication/pub.1088797253
112 rdf:type schema:CreativeWork
113 https://doi.org/10.1145/1273496.1273641 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051811766
114 rdf:type schema:CreativeWork
115 https://doi.org/10.1145/1291233.1291297 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049356289
116 rdf:type schema:CreativeWork
117 https://doi.org/10.1145/1401890.1401903 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004414534
118 rdf:type schema:CreativeWork
119 https://doi.org/10.1145/1835804.1835848 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010986574
120 rdf:type schema:CreativeWork
121 https://doi.org/10.1198/106186006x113430 schema:sameAs https://app.dimensions.ai/details/publication/pub.1064199521
122 rdf:type schema:CreativeWork
123 https://doi.org/10.3115/116580.116725 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099274417
124 rdf:type schema:CreativeWork
125 https://www.grid.ac/institutes/grid.46078.3d schema:alternateName University of Waterloo
126 schema:name Department of Electrical and Computer Engineering, University of Waterloo, N2L 3G1, Waterloo, ON, Canada
127 Department of Statistics and Actuarial Science, University of Waterloo, N2L 3G1, Waterloo, ON, Canada
128 rdf:type schema:Organization
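
The author order in the triples above is encoded as an RDF list: schema:author points at a blank node, and each blank node carries an rdf:first (the item) and an rdf:rest (the next node, or rdf:nil at the end). A minimal sketch of walking that chain follows. The dictionary layout and the function name walk_rdf_list are assumptions for this example, while the node identifiers and person ids are copied from the table.

```python
RDF_NIL = "rdf:nil"

# blank node -> (rdf:first, rdf:rest), copied from the triples table.
NODES = {
    "N4a5e3aec8a8f451aa80add0c19bce5b4": (
        "sg:person.013262542731.05",
        "Ncced080e35324678a437e2539e95ba47",
    ),
    "Ncced080e35324678a437e2539e95ba47": (
        "sg:person.07545373531.09",
        "N14bce4e7c90f4b49b1409e457ab010fe",
    ),
    "N14bce4e7c90f4b49b1409e457ab010fe": (
        "sg:person.01133760566.26",
        RDF_NIL,
    ),
}

FAMILY_NAMES = {
    "sg:person.013262542731.05": "Farahat",
    "sg:person.07545373531.09": "Ghodsi",
    "sg:person.01133760566.26": "Kamel",
}


def walk_rdf_list(head, nodes):
    """Follow rdf:rest links from head and return the rdf:first items in order."""
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


# schema:author of the article points at the head blank node.
authors = walk_rdf_list("N4a5e3aec8a8f451aa80add0c19bce5b4", NODES)
print([FAMILY_NAMES[a] for a in authors])
```

The walk recovers the author order from the article header even though the blank nodes appear in a different order in the table, which is sorted by identifier.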
 



