Robust PCA for High-dimensional Data


Ontology type: schema:Chapter     


Chapter Info

DATE

2003

AUTHORS

M. Hubert, P. J. Rousseeuw, S. Verboven

ABSTRACT

Principal component analysis (PCA) is a well-known technique for dimension reduction. Classical PCA is based on the empirical mean and covariance matrix of the data, and hence is strongly affected by outlying observations. Therefore, there is a huge need for robust PCA. When the original number of variables is small enough, and in particular smaller than the number of observations, it is known that one can apply a robust estimator of multivariate location and scatter and compute the eigenvectors of the scatter matrix. The other situation, where there are many variables (often even more variables than observations), has received less attention in the robustness literature. We will compare two robust methods for this situation. The first one is based on projection pursuit (Li and Chen, 1985; Rousseeuw and Croux, 1993; Croux and Ruiz-Gazen, 1996, 2000; Hubert et al., 2002). The second method is a new proposal, which combines the notion of outlyingness (Stahel, 1981; Donoho, 1982) with the FAST-MCD algorithm (Rousseeuw and Van Driessen, 1999). The performance and the robustness of these two methods are compared through a simulation study. We also illustrate the new method on a chemometrical data set.
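The notion of outlyingness mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's algorithm: it covers only the outlyingness step (no FAST-MCD), it draws random directions, and it uses the median and MAD as univariate location and scale. Those choices, the function name `outlyingness`, and the toy data are all assumptions made for illustration.

```python
import math
import random
import statistics


def outlyingness(data, n_directions=200, seed=0):
    """Approximate Stahel-Donoho outlyingness for each row of `data`.

    For every sampled direction the data are projected onto it, and each
    point is scored by |projection - median| / MAD. A point's outlyingness
    is its largest score over all sampled directions.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    scores = [0.0] * len(data)
    for _ in range(n_directions):
        # Random unit direction (an assumption of this sketch).
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        # Project every observation onto the direction.
        proj = [sum(x_j * v_j for x_j, v_j in zip(x, v)) for x in data]
        # Robust univariate location and scale: median and MAD.
        med = statistics.median(proj)
        mad = statistics.median(abs(p - med) for p in proj)
        if mad == 0.0:
            continue  # degenerate direction, skip it
        for i, p in enumerate(proj):
            scores[i] = max(scores[i], abs(p - med) / mad)
    return scores


# Toy data: 30 clean five-dimensional points plus one gross outlier (index 30).
data_rng = random.Random(1)
clean = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(30)]
sample = clean + [[50.0] * 5]
scores = outlyingness(sample)
most_outlying = scores.index(max(scores))
```

Because the score depends only on one-dimensional projections, the computation does not require inverting a covariance matrix, which is why this kind of measure remains usable when there are more variables than observations.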

PAGES

169-179

References to SciGraph publications

  • 2001. A Robust Approach to Common Principal Components in STATISTICS IN GENETICS AND IN THE ENVIRONMENTAL SCIENCES
  • 1996. A Fast Algorithm for Robust Principal Components Based on Projection Pursuit in COMPSTAT
  • Book

    TITLE

    Developments in Robust Statistics

    ISBN

    978-3-642-63241-9
    978-3-642-57338-5

    Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/978-3-642-57338-5_14

    DOI

    http://dx.doi.org/10.1007/978-3-642-57338-5_14

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1009923196



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0104", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Statistics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Mathematical Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "KU Leuven", 
              "id": "https://www.grid.ac/institutes/grid.5596.f", 
              "name": [
                "Department of Mathematics, Catholic University of Leuven, Belgium"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Hubert", 
            "givenName": "M.", 
            "id": "sg:person.014406052301.59", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014406052301.59"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Antwerp", 
              "id": "https://www.grid.ac/institutes/grid.5284.b", 
              "name": [
                "Department of Mathematics and Computer Science, University of Antwerp (UIA), Belgium"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Rousseeuw", 
            "givenName": "P. J.", 
            "id": "sg:person.0775337371.63", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0775337371.63"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Antwerp", 
              "id": "https://www.grid.ac/institutes/grid.5284.b", 
              "name": [
                "Department of Mathematics and Computer Science, University of Antwerp (UIA), Belgium"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Verboven", 
            "givenName": "S.", 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.1016/s0169-7439(01)00188-5", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1013367213"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-642-46992-3_22", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1014224380", 
              "https://doi.org/10.1007/978-3-642-46992-3_22"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1002/jsfa.2740350116", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1016786059"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-0348-8326-9_9", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1038838787", 
              "https://doi.org/10.1007/978-3-0348-8326-9_9"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1080/00401706.1999.10485591", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1058287697"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1080/00401706.1999.10485670", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1058287776"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1080/01621459.1984.10477105", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1058302950"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1080/01621459.1985.10478181", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1058303158"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1080/01621459.1990.10474920", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1058303860"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1080/01621459.1993.10476408", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1058304492"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/biomet/87.3.603", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1059421030"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1214/aos/1176350505", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1064409186"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://app.dimensions.ai/details/publication/pub.1107763504", 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1002/0471725382", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1107763504"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2003", 
        "datePublishedReg": "2003-01-01", 
        "description": "Principal component analysis (PCA) is a well-known technique for dimension reduction. Classical PCA is based on the empirical mean and covariance matrix of the data, and hence is strongly affected by outlying observations. Therefore, there is a huge need for robust PCA. When the original number of variables is small enough, and in particular smaller than the number of observations, it is known that one can apply a robust estimator of multivariate location and scatter and compute the eigenvectors of the scatter matrix. The other situation, where there are many variables (often even more variables than observations), has received less attention in the robustness literature. We will compare two robust methods for this situation. The first one is based on projection pursuit (Li and Chen, 1985; Rousseeuw and Croux, 1993; Croux and Ruiz-Gazen, 1996, 2000; Hubert et al., 2002). The second method is a new proposal, which combines the notion of outlyingness (Stahel, 1981; Donoho, 1982) with the FAST-MCD algorithm (Rousseeuw and Van Driessen, 1999). The performance and the robustness of these two methods are compared through a simulation study. We also illustrate the new method on a chemometrical data set.", 
        "editor": [
          {
            "familyName": "Dutter", 
            "givenName": "Rudolf", 
            "type": "Person"
          }, 
          {
            "familyName": "Filzmoser", 
            "givenName": "Peter", 
            "type": "Person"
          }, 
          {
            "familyName": "Gather", 
            "givenName": "Ursula", 
            "type": "Person"
          }, 
          {
            "familyName": "Rousseeuw", 
            "givenName": "Peter J.", 
            "type": "Person"
          }
        ], 
        "genre": "chapter", 
        "id": "sg:pub.10.1007/978-3-642-57338-5_14", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": false, 
        "isPartOf": {
          "isbn": [
            "978-3-642-63241-9", 
            "978-3-642-57338-5"
          ], 
          "name": "Developments in Robust Statistics", 
          "type": "Book"
        }, 
        "name": "Robust PCA for High-dimensional Data", 
        "pagination": "169-179", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1009923196"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/978-3-642-57338-5_14"
            ]
          }, 
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "2f46143fe10a03ed7f5d9f2d105d67c14adf321fc858f227059586656e4459a7"
            ]
          }
        ], 
        "publisher": {
          "location": "Heidelberg", 
          "name": "Physica-Verlag HD", 
          "type": "Organisation"
        }, 
        "sameAs": [
          "https://doi.org/10.1007/978-3-642-57338-5_14", 
          "https://app.dimensions.ai/details/publication/pub.1009923196"
        ], 
        "sdDataset": "chapters", 
        "sdDatePublished": "2019-04-16T09:16", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000371_0000000371/records_130814_00000000.jsonl", 
        "type": "Chapter", 
        "url": "https://link.springer.com/10.1007%2F978-3-642-57338-5_14"
      }
    ]
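A record of this shape can be read with any JSON parser. The Python sketch below is an illustration only: it embeds a trimmed copy of the record (a few fields and three of the citation entries), and the helper name `summarize` is hypothetical.

```python
import json

# Trimmed copy of the record above; only a few fields and entries are kept.
RECORD_TEXT = """
[
  {
    "name": "Robust PCA for High-dimensional Data",
    "pagination": "169-179",
    "author": [
      {"familyName": "Hubert", "givenName": "M."},
      {"familyName": "Rousseeuw", "givenName": "P. J."},
      {"familyName": "Verboven", "givenName": "S."}
    ],
    "citation": [
      {"id": "https://doi.org/10.1093/biomet/87.3.603"},
      {"id": "sg:pub.10.1007/978-3-642-46992-3_22"},
      {"id": "https://doi.org/10.1214/aos/1176350505"}
    ]
  }
]
"""

DOI_PREFIX = "https://doi.org/"


def summarize(record_text):
    """Pull the title, pages, author names, and cited DOIs out of a record."""
    record = json.loads(record_text)[0]  # the top level is a one-element list
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record["author"]]
    # Keep only citations identified by a DOI URL; "sg:" identifiers are skipped.
    cited_dois = [
        c["id"][len(DOI_PREFIX):]
        for c in record.get("citation", [])
        if c["id"].startswith(DOI_PREFIX)
    ]
    return {
        "title": record["name"],
        "pages": record["pagination"],
        "authors": authors,
        "cited_dois": cited_dois,
    }


summary = summarize(RECORD_TEXT)
```

Plain `json` parsing treats the record as ordinary JSON and ignores the `@context`; a JSON-LD processor would be needed to expand the keys to full IRIs.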
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-57338-5_14'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-57338-5_14'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-57338-5_14'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-57338-5_14'
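The four curl commands differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The names `build_request`, `fetch`, and `MEDIA_TYPES` are hypothetical, and whether the endpoint still answers is not something this sketch assumes, so it only builds the request and leaves the network call to the caller.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/"

# Media types taken from the curl commands above.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(record_id, fmt="json-ld"):
    """Build a GET request that asks for `record_id` in the given RDF format."""
    return urllib.request.Request(
        BASE_URL + record_id,
        headers={"Accept": MEDIA_TYPES[fmt]},
    )


def fetch(record_id, fmt="json-ld", timeout=30):
    """Send the request and return the response body as text (needs network)."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request does not touch the network.
request = build_request("pub.10.1007/978-3-642-57338-5_14", "turtle")
```

Calling `fetch("pub.10.1007/978-3-642-57338-5_14", "json-ld")` would correspond to the first curl command.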


     

    This table displays all metadata directly associated with this object as RDF triples.

    139 TRIPLES      23 PREDICATES      41 URIs      20 LITERALS      8 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/978-3-642-57338-5_14 schema:about anzsrc-for:01
    2 anzsrc-for:0104
    3 schema:author Nf17b9df414d0461792879003d0f114c4
    4 schema:citation sg:pub.10.1007/978-3-0348-8326-9_9
    5 sg:pub.10.1007/978-3-642-46992-3_22
    6 https://app.dimensions.ai/details/publication/pub.1107763504
    7 https://doi.org/10.1002/0471725382
    8 https://doi.org/10.1002/jsfa.2740350116
    9 https://doi.org/10.1016/s0169-7439(01)00188-5
    10 https://doi.org/10.1080/00401706.1999.10485591
    11 https://doi.org/10.1080/00401706.1999.10485670
    12 https://doi.org/10.1080/01621459.1984.10477105
    13 https://doi.org/10.1080/01621459.1985.10478181
    14 https://doi.org/10.1080/01621459.1990.10474920
    15 https://doi.org/10.1080/01621459.1993.10476408
    16 https://doi.org/10.1093/biomet/87.3.603
    17 https://doi.org/10.1214/aos/1176350505
    18 schema:datePublished 2003
    19 schema:datePublishedReg 2003-01-01
    20 schema:description Principal component analysis (PCA) is a well-known technique for dimension reduction. Classical PCA is based on the empirical mean and covariance matrix of the data, and hence is strongly affected by outlying observations. Therefore, there is a huge need for robust PCA. When the original number of variables is small enough, and in particular smaller than the number of observations, it is known that one can apply a robust estimator of multivariate location and scatter and compute the eigenvectors of the scatter matrix. The other situation, where there are many variables (often even more variables than observations), has received less attention in the robustness literature. We will compare two robust methods for this situation. The first one is based on projection pursuit (Li and Chen, 1985; Rousseeuw and Croux, 1993; Croux and Ruiz-Gazen, 1996, 2000; Hubert et al., 2002). The second method is a new proposal, which combines the notion of outlyingness (Stahel, 1981; Donoho, 1982) with the FAST-MCD algorithm (Rousseeuw and Van Driessen, 1999). The performance and the robustness of these two methods are compared through a simulation study. We also illustrate the new method on a chemometrical data set.
    21 schema:editor N72de01dfde284bdb880c5c96bc72ae3b
    22 schema:genre chapter
    23 schema:inLanguage en
    24 schema:isAccessibleForFree false
    25 schema:isPartOf Nd6635109cb4b4b7c92e0c17861cd02f9
    26 schema:name Robust PCA for High-dimensional Data
    27 schema:pagination 169-179
    28 schema:productId N0cf7d1c4505149a8995dd7aba5f0a4ad
    29 N3b740538ab3544a3a913f205934c7f9a
    30 N977be5be757a4a8fa14568a861f3ba82
    31 schema:publisher Nc76f1431fc1542568d6744fb7e1364fa
    32 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009923196
    33 https://doi.org/10.1007/978-3-642-57338-5_14
    34 schema:sdDatePublished 2019-04-16T09:16
    35 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    36 schema:sdPublisher N9405278360744b11a5113f9e01e7be86
    37 schema:url https://link.springer.com/10.1007%2F978-3-642-57338-5_14
    38 sgo:license sg:explorer/license/
    39 sgo:sdDataset chapters
    40 rdf:type schema:Chapter
    41 N005fd5a9d9614242b0fecd5e437c4be9 rdf:first N45f4b2bc71ca44128074275dd3399729
    42 rdf:rest Ne9b06506db614473afd7ae852c27a7b5
    43 N0cf7d1c4505149a8995dd7aba5f0a4ad schema:name readcube_id
    44 schema:value 2f46143fe10a03ed7f5d9f2d105d67c14adf321fc858f227059586656e4459a7
    45 rdf:type schema:PropertyValue
    46 N3aadddd98db54ab2aa8daf6615d1168a schema:familyName Gather
    47 schema:givenName Ursula
    48 rdf:type schema:Person
    49 N3b740538ab3544a3a913f205934c7f9a schema:name dimensions_id
    50 schema:value pub.1009923196
    51 rdf:type schema:PropertyValue
    52 N3d7554814757483a9b88ecdcef581487 schema:affiliation https://www.grid.ac/institutes/grid.5284.b
    53 schema:familyName Verboven
    54 schema:givenName S.
    55 rdf:type schema:Person
    56 N45f4b2bc71ca44128074275dd3399729 schema:familyName Filzmoser
    57 schema:givenName Peter
    58 rdf:type schema:Person
    59 N72de01dfde284bdb880c5c96bc72ae3b rdf:first Nc55141e26f954467b1c3d3eeed2c606e
    60 rdf:rest N005fd5a9d9614242b0fecd5e437c4be9
    61 N82df9892542145d7a87e92dfeed1b405 schema:familyName Rousseeuw
    62 schema:givenName Peter J.
    63 rdf:type schema:Person
    64 N9405278360744b11a5113f9e01e7be86 schema:name Springer Nature - SN SciGraph project
    65 rdf:type schema:Organization
    66 N977be5be757a4a8fa14568a861f3ba82 schema:name doi
    67 schema:value 10.1007/978-3-642-57338-5_14
    68 rdf:type schema:PropertyValue
    69 Nc55141e26f954467b1c3d3eeed2c606e schema:familyName Dutter
    70 schema:givenName Rudolf
    71 rdf:type schema:Person
    72 Nc726dcc0f288449f98bbb516aca0f683 rdf:first N82df9892542145d7a87e92dfeed1b405
    73 rdf:rest rdf:nil
    74 Nc76f1431fc1542568d6744fb7e1364fa schema:location Heidelberg
    75 schema:name Physica-Verlag HD
    76 rdf:type schema:Organisation
    77 Nd3c5bc41440d45a79b93320fa6a5a75d rdf:first N3d7554814757483a9b88ecdcef581487
    78 rdf:rest rdf:nil
    79 Nd4a37578eb1d44ceac1f63b4105f9a9c rdf:first sg:person.0775337371.63
    80 rdf:rest Nd3c5bc41440d45a79b93320fa6a5a75d
    81 Nd6635109cb4b4b7c92e0c17861cd02f9 schema:isbn 978-3-642-57338-5
    82 978-3-642-63241-9
    83 schema:name Developments in Robust Statistics
    84 rdf:type schema:Book
    85 Ne9b06506db614473afd7ae852c27a7b5 rdf:first N3aadddd98db54ab2aa8daf6615d1168a
    86 rdf:rest Nc726dcc0f288449f98bbb516aca0f683
    87 Nf17b9df414d0461792879003d0f114c4 rdf:first sg:person.014406052301.59
    88 rdf:rest Nd4a37578eb1d44ceac1f63b4105f9a9c
    89 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
    90 schema:name Mathematical Sciences
    91 rdf:type schema:DefinedTerm
    92 anzsrc-for:0104 schema:inDefinedTermSet anzsrc-for:
    93 schema:name Statistics
    94 rdf:type schema:DefinedTerm
    95 sg:person.014406052301.59 schema:affiliation https://www.grid.ac/institutes/grid.5596.f
    96 schema:familyName Hubert
    97 schema:givenName M.
    98 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014406052301.59
    99 rdf:type schema:Person
    100 sg:person.0775337371.63 schema:affiliation https://www.grid.ac/institutes/grid.5284.b
    101 schema:familyName Rousseeuw
    102 schema:givenName P. J.
    103 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0775337371.63
    104 rdf:type schema:Person
    105 sg:pub.10.1007/978-3-0348-8326-9_9 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038838787
    106 https://doi.org/10.1007/978-3-0348-8326-9_9
    107 rdf:type schema:CreativeWork
    108 sg:pub.10.1007/978-3-642-46992-3_22 schema:sameAs https://app.dimensions.ai/details/publication/pub.1014224380
    109 https://doi.org/10.1007/978-3-642-46992-3_22
    110 rdf:type schema:CreativeWork
    111 https://app.dimensions.ai/details/publication/pub.1107763504 rdf:type schema:CreativeWork
    112 https://doi.org/10.1002/0471725382 schema:sameAs https://app.dimensions.ai/details/publication/pub.1107763504
    113 rdf:type schema:CreativeWork
    114 https://doi.org/10.1002/jsfa.2740350116 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016786059
    115 rdf:type schema:CreativeWork
    116 https://doi.org/10.1016/s0169-7439(01)00188-5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013367213
    117 rdf:type schema:CreativeWork
    118 https://doi.org/10.1080/00401706.1999.10485591 schema:sameAs https://app.dimensions.ai/details/publication/pub.1058287697
    119 rdf:type schema:CreativeWork
    120 https://doi.org/10.1080/00401706.1999.10485670 schema:sameAs https://app.dimensions.ai/details/publication/pub.1058287776
    121 rdf:type schema:CreativeWork
    122 https://doi.org/10.1080/01621459.1984.10477105 schema:sameAs https://app.dimensions.ai/details/publication/pub.1058302950
    123 rdf:type schema:CreativeWork
    124 https://doi.org/10.1080/01621459.1985.10478181 schema:sameAs https://app.dimensions.ai/details/publication/pub.1058303158
    125 rdf:type schema:CreativeWork
    126 https://doi.org/10.1080/01621459.1990.10474920 schema:sameAs https://app.dimensions.ai/details/publication/pub.1058303860
    127 rdf:type schema:CreativeWork
    128 https://doi.org/10.1080/01621459.1993.10476408 schema:sameAs https://app.dimensions.ai/details/publication/pub.1058304492
    129 rdf:type schema:CreativeWork
    130 https://doi.org/10.1093/biomet/87.3.603 schema:sameAs https://app.dimensions.ai/details/publication/pub.1059421030
    131 rdf:type schema:CreativeWork
    132 https://doi.org/10.1214/aos/1176350505 schema:sameAs https://app.dimensions.ai/details/publication/pub.1064409186
    133 rdf:type schema:CreativeWork
    134 https://www.grid.ac/institutes/grid.5284.b schema:alternateName University of Antwerp
    135 schema:name Department of Mathematics and Computer Science, University of Antwerp (UIA), Belgium
    136 rdf:type schema:Organization
    137 https://www.grid.ac/institutes/grid.5596.f schema:alternateName KU Leuven
    138 schema:name Department of Mathematics, Catholic University of Leuven, Belgium
    139 rdf:type schema:Organization
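In the table, the author order is stored as an RDF list: `schema:author` points at a blank node, and each list node carries an `rdf:first` item and an `rdf:rest` pointer until `rdf:nil`. A minimal Python sketch of walking that chain follows. The node identifiers and family names are copied from the table rows; the dictionary layout and the function name `walk_rdf_list` are assumptions made for illustration.

```python
RDF_NIL = "rdf:nil"

# node -> (rdf:first, rdf:rest), copied from the author list rows of the table.
LIST_NODES = {
    "Nf17b9df414d0461792879003d0f114c4": (
        "sg:person.014406052301.59",
        "Nd4a37578eb1d44ceac1f63b4105f9a9c",
    ),
    "Nd4a37578eb1d44ceac1f63b4105f9a9c": (
        "sg:person.0775337371.63",
        "Nd3c5bc41440d45a79b93320fa6a5a75d",
    ),
    "Nd3c5bc41440d45a79b93320fa6a5a75d": (
        "N3d7554814757483a9b88ecdcef581487",
        RDF_NIL,
    ),
}

# schema:familyName values for the three author nodes, also from the table.
FAMILY_NAMES = {
    "sg:person.014406052301.59": "Hubert",
    "sg:person.0775337371.63": "Rousseeuw",
    "N3d7554814757483a9b88ecdcef581487": "Verboven",
}


def walk_rdf_list(head, nodes):
    """Follow rdf:rest pointers from `head` and collect the rdf:first items."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle in RDF list at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


# The head node is the object of schema:author in the table.
author_ids = walk_rdf_list("Nf17b9df414d0461792879003d0f114c4", LIST_NODES)
author_order = [FAMILY_NAMES[a] for a in author_ids]
```

The editor list hanging off `schema:editor` is encoded the same way and could be walked with the same function.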
     



