Robust Grouped Variable Selection Using Distributionally Robust Optimization


Ontology type: schema:ScholarlyArticle
Open Access: True


Article Info

DATE

2022-06-28

AUTHORS

Ruidi Chen, Ioannis Ch. Paschalidis

ABSTRACT

We propose a distributionally robust optimization formulation with a Wasserstein-based uncertainty set for selecting grouped variables under perturbations on the data for both linear regression and classification problems. The resulting model offers robustness explanations for grouped least absolute shrinkage and selection operator algorithms and highlights the connection between robustness and regularization. We prove probabilistic bounds on the out-of-sample loss and the estimation bias, and establish the grouping effect of our estimator, showing that coefficients in the same group converge to the same value as the sample correlation between covariates approaches 1. Based on this result, we propose to use the spectral clustering algorithm with the Gaussian similarity function to perform grouping on the predictors, which makes our approach applicable without knowing the grouping structure a priori. We compare our approach to an array of alternatives and provide extensive numerical results on both synthetic data and a real large dataset of surgery-related medical records, showing that our formulation produces an interpretable and parsimonious model that encourages sparsity at a group level and is able to achieve better prediction and estimation performance in the presence of outliers.
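The grouping step mentioned in the abstract starts from pairwise similarities between predictors. The sketch below computes the standard Gaussian similarity, exp(-||u - v||^2 / (2 sigma^2)), between predictor columns. The data, the bandwidth `sigma`, and the function names are illustrative assumptions and are not taken from the article; this record also does not say how the article scales the predictors. The spectral clustering step itself (eigen-decomposition of a graph Laplacian followed by k-means) is omitted.

```python
import math


def gaussian_similarity(u, v, sigma=1.0):
    """Gaussian (RBF) similarity exp(-||u - v||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def similarity_matrix(columns, sigma=1.0):
    """Pairwise similarities between predictor columns (one list per predictor)."""
    return [[gaussian_similarity(ci, cj, sigma) for cj in columns] for ci in columns]


# Hypothetical data: three predictors observed on four samples.
# The first two are nearly identical; the third is unrelated.
predictors = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.0],
    [4.0, -1.0, 0.5, 2.0],
]
S = similarity_matrix(predictors, sigma=1.0)
```

In this toy example the first two predictors receive a similarity close to 1 and would be natural candidates for the same group, while the third is far from both.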

PAGES

1042-1071

References to SciGraph publications

  • 2014-09-30. Data-driven estimation in equilibrium using inverse optimization in MATHEMATICAL PROGRAMMING
  • 2011-11-10. Distributionally robust joint chance constraints with second-order moment information in MATHEMATICAL PROGRAMMING
  • 2007-08-22. A tutorial on spectral clustering in STATISTICS AND COMPUTING
  • 2001-09-13. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results in COMPUTATIONAL LEARNING THEORY
Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/s10957-022-02065-4

    DOI

    http://dx.doi.org/10.1007/s10957-022-02065-4

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1149027051



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Mathematical Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/09", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Engineering", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0102", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Applied Mathematics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0103", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Numerical and Computational Mathematics", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0906", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Electrical and Electronic Engineering", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Boston University, Boston, MA, USA", 
              "id": "http://www.grid.ac/institutes/grid.189504.1", 
              "name": [
                "Boston University, Boston, MA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Chen", 
            "givenName": "Ruidi", 
            "id": "sg:person.016152115754.49", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016152115754.49"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Boston University, Boston, MA, USA", 
              "id": "http://www.grid.ac/institutes/grid.189504.1", 
              "name": [
                "Boston University, Boston, MA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Paschalidis", 
            "givenName": "Ioannis Ch.", 
            "id": "sg:person.01357575514.92", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01357575514.92"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/3-540-44581-1_15", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1023106154", 
              "https://doi.org/10.1007/3-540-44581-1_15"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s11222-007-9033-z", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1008067612", 
              "https://doi.org/10.1007/s11222-007-9033-z"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10107-014-0819-4", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1025153037", 
              "https://doi.org/10.1007/s10107-014-0819-4"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10107-011-0494-7", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1019393106", 
              "https://doi.org/10.1007/s10107-011-0494-7"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2022-06-28", 
        "datePublishedReg": "2022-06-28", 
        "description": "We propose a distributionally robust optimization formulation with a Wasserstein-based uncertainty set for selecting grouped variables under perturbations on the data for both linear regression and classification problems. The resulting model offers robustness explanations for grouped least absolute shrinkage and selection operator algorithms and highlights the connection between robustness and regularization. We prove probabilistic bounds on the out-of-sample loss and the estimation bias, and establish the grouping effect of our estimator, showing that coefficients in the same group converge to the same value as the sample correlation between covariates approaches 1. Based on this result, we propose to use the spectral clustering algorithm with the Gaussian similarity function to perform grouping on the predictors, which makes our approach applicable without knowing the grouping structure a priori. We compare our approach to an array of alternatives and provide extensive numerical results on both synthetic data and a real large dataset of surgery-related medical records, showing that our formulation produces an interpretable and parsimonious model that encourages sparsity at a group level and is able to achieve better prediction and estimation performance in the presence of outliers.", 
        "genre": "article", 
        "id": "sg:pub.10.1007/s10957-022-02065-4", 
        "isAccessibleForFree": true, 
        "isFundedItemOf": [
          {
            "id": "sg:grant.8632080", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.8566477", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.6933063", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.7568977", 
            "type": "MonetaryGrant"
          }
        ], 
        "isPartOf": [
          {
            "id": "sg:journal.1044187", 
            "issn": [
              "0022-3239", 
              "1573-2878"
            ], 
            "name": "Journal of Optimization Theory and Applications", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "3", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "194"
          }
        ], 
        "keywords": [
          "robust optimization formulation", 
          "grouped variable selection", 
          "presence of outliers", 
          "robust optimization", 
          "probabilistic bounds", 
          "Extensive numerical results", 
          "optimization formulation", 
          "grouped variables", 
          "variable selection", 
          "Gaussian similarity function", 
          "estimation performance", 
          "spectral clustering algorithm", 
          "numerical results", 
          "estimation bias", 
          "sample correlation", 
          "classification problem", 
          "clustering algorithm", 
          "synthetic data", 
          "similarity function", 
          "operator algorithm", 
          "large datasets", 
          "parsimonious model", 
          "algorithm", 
          "least absolute shrinkage", 
          "better prediction", 
          "formulation", 
          "Wasserstein", 
          "bounds", 
          "same value", 
          "estimator", 
          "regularization", 
          "absolute shrinkage", 
          "sparsity", 
          "linear regression", 
          "optimization", 
          "perturbations", 
          "model", 
          "outliers", 
          "robustness", 
          "uncertainty", 
          "dataset", 
          "problem", 
          "approach", 
          "selection operator (LASSO) algorithm", 
          "coefficient", 
          "prediction", 
          "variables", 
          "data", 
          "results", 
          "function", 
          "performance", 
          "connection", 
          "array", 
          "structure", 
          "covariates", 
          "same group", 
          "selection", 
          "grouping", 
          "regression", 
          "values", 
          "bias", 
          "medical records", 
          "group level", 
          "records", 
          "alternative", 
          "array of alternatives", 
          "correlation", 
          "explanation", 
          "effect", 
          "presence", 
          "shrinkage", 
          "loss", 
          "levels", 
          "group", 
          "sample loss", 
          "predictors"
        ], 
        "name": "Robust Grouped Variable Selection Using Distributionally Robust Optimization", 
        "pagination": "1042-1071", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1149027051"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/s10957-022-02065-4"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1007/s10957-022-02065-4", 
          "https://app.dimensions.ai/details/publication/pub.1149027051"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-11-24T21:09", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20221124/entities/gbq_results/article/article_925.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1007/s10957-022-02065-4"
      }
    ]
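Because the record is plain JSON, it can be read with any JSON parser. The sketch below pulls a few citation-style fields out of an abbreviated copy of the record. The `summarize` helper and the selection of fields are illustrative assumptions, not part of SciGraph.

```python
import json

# Abbreviated copy of the record above; only a few fields are kept.
record_text = """
[
  {
    "id": "sg:pub.10.1007/s10957-022-02065-4",
    "name": "Robust Grouped Variable Selection Using Distributionally Robust Optimization",
    "datePublished": "2022-06-28",
    "pagination": "1042-1071",
    "author": [
      {"familyName": "Chen", "givenName": "Ruidi", "type": "Person"},
      {"familyName": "Paschalidis", "givenName": "Ioannis Ch.", "type": "Person"}
    ],
    "isPartOf": [
      {"name": "Journal of Optimization Theory and Applications", "type": "Periodical"},
      {"issueNumber": "3", "type": "PublicationIssue"},
      {"type": "PublicationVolume", "volumeNumber": "194"}
    ]
  }
]
"""


def summarize(record):
    """Collect citation-style fields from one SciGraph article record."""
    # isPartOf mixes journal, issue, and volume entries; index them by type.
    parts = {p["type"]: p for p in record.get("isPartOf", [])}
    return {
        "title": record["name"],
        "authors": [f'{a["givenName"]} {a["familyName"]}' for a in record["author"]],
        "journal": parts.get("Periodical", {}).get("name"),
        "volume": parts.get("PublicationVolume", {}).get("volumeNumber"),
        "issue": parts.get("PublicationIssue", {}).get("issueNumber"),
        "pages": record["pagination"],
        "date": record["datePublished"],
    }


summary = summarize(json.loads(record_text)[0])
```

The top-level value is a list, so the article record is its first element.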
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s10957-022-02065-4'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s10957-022-02065-4'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s10957-022-02065-4'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s10957-022-02065-4'
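The four curl calls differ only in the Accept header, so the same content negotiation can be sketched with Python's standard library. The format labels in `ACCEPT_TYPES` and the helper names are assumptions made for illustration. `fetch` needs network access, and this page does not guarantee that the service is still reachable.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/s10957-022-02065-4"

# Media types taken from the curl examples above; the keys are arbitrary labels.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Return a GET request asking for `fmt` via the Accept header."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Download the record as text; requires network access."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request does not touch the network.
req = build_request("turtle")
```

Calling `fetch("json-ld")` would be the counterpart of the first curl command.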


     

    This table displays all metadata directly associated with this object as RDF triples.

    176 TRIPLES      21 PREDICATES      107 URIs      92 LITERALS      6 BLANK NODES

    Row Subject Predicate Object (a blank Subject or Predicate cell repeats the value from the row above)
    1 sg:pub.10.1007/s10957-022-02065-4 schema:about anzsrc-for:01
    2 anzsrc-for:0102
    3 anzsrc-for:0103
    4 anzsrc-for:09
    5 anzsrc-for:0906
    6 schema:author Nc1d44218213e4ace8a1c653ec00dc8a2
    7 schema:citation sg:pub.10.1007/3-540-44581-1_15
    8 sg:pub.10.1007/s10107-011-0494-7
    9 sg:pub.10.1007/s10107-014-0819-4
    10 sg:pub.10.1007/s11222-007-9033-z
    11 schema:datePublished 2022-06-28
    12 schema:datePublishedReg 2022-06-28
    13 schema:description We propose a distributionally robust optimization formulation with a Wasserstein-based uncertainty set for selecting grouped variables under perturbations on the data for both linear regression and classification problems. The resulting model offers robustness explanations for grouped least absolute shrinkage and selection operator algorithms and highlights the connection between robustness and regularization. We prove probabilistic bounds on the out-of-sample loss and the estimation bias, and establish the grouping effect of our estimator, showing that coefficients in the same group converge to the same value as the sample correlation between covariates approaches 1. Based on this result, we propose to use the spectral clustering algorithm with the Gaussian similarity function to perform grouping on the predictors, which makes our approach applicable without knowing the grouping structure a priori. We compare our approach to an array of alternatives and provide extensive numerical results on both synthetic data and a real large dataset of surgery-related medical records, showing that our formulation produces an interpretable and parsimonious model that encourages sparsity at a group level and is able to achieve better prediction and estimation performance in the presence of outliers.
    14 schema:genre article
    15 schema:isAccessibleForFree true
    16 schema:isPartOf N520ec1dd4c7b4120a860c4ae5cfcea8b
    17 N81d083c0a396450f80b0b5de2a22adb6
    18 sg:journal.1044187
    19 schema:keywords Extensive numerical results
    20 Gaussian similarity function
    21 Wasserstein
    22 absolute shrinkage
    23 algorithm
    24 alternative
    25 approach
    26 array
    27 array of alternatives
    28 better prediction
    29 bias
    30 bounds
    31 classification problem
    32 clustering algorithm
    33 coefficient
    34 connection
    35 correlation
    36 covariates
    37 data
    38 dataset
    39 effect
    40 estimation bias
    41 estimation performance
    42 estimator
    43 explanation
    44 formulation
    45 function
    46 group
    47 group level
    48 grouped variable selection
    49 grouped variables
    50 grouping
    51 large datasets
    52 least absolute shrinkage
    53 levels
    54 linear regression
    55 loss
    56 medical records
    57 model
    58 numerical results
    59 operator algorithm
    60 optimization
    61 optimization formulation
    62 outliers
    63 parsimonious model
    64 performance
    65 perturbations
    66 prediction
    67 predictors
    68 presence
    69 presence of outliers
    70 probabilistic bounds
    71 problem
    72 records
    73 regression
    74 regularization
    75 results
    76 robust optimization
    77 robust optimization formulation
    78 robustness
    79 same group
    80 same value
    81 sample correlation
    82 sample loss
    83 selection
    84 selection operator (LASSO) algorithm
    85 shrinkage
    86 similarity function
    87 sparsity
    88 spectral clustering algorithm
    89 structure
    90 synthetic data
    91 uncertainty
    92 values
    93 variable selection
    94 variables
    95 schema:name Robust Grouped Variable Selection Using Distributionally Robust Optimization
    96 schema:pagination 1042-1071
    97 schema:productId N6184a8d2d2f94f98a08d07d1f4fb9591
    98 Nef81f7b3179441b28f2c4504fc1525b0
    99 schema:sameAs https://app.dimensions.ai/details/publication/pub.1149027051
    100 https://doi.org/10.1007/s10957-022-02065-4
    101 schema:sdDatePublished 2022-11-24T21:09
    102 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    103 schema:sdPublisher N953cd35430364166934ce05144b3594c
    104 schema:url https://doi.org/10.1007/s10957-022-02065-4
    105 sgo:license sg:explorer/license/
    106 sgo:sdDataset articles
    107 rdf:type schema:ScholarlyArticle
    108 N520ec1dd4c7b4120a860c4ae5cfcea8b schema:volumeNumber 194
    109 rdf:type schema:PublicationVolume
    110 N6184a8d2d2f94f98a08d07d1f4fb9591 schema:name doi
    111 schema:value 10.1007/s10957-022-02065-4
    112 rdf:type schema:PropertyValue
    113 N81d083c0a396450f80b0b5de2a22adb6 schema:issueNumber 3
    114 rdf:type schema:PublicationIssue
    115 N8ebbcb20dda84e75af4b27169d3baff3 rdf:first sg:person.01357575514.92
    116 rdf:rest rdf:nil
    117 N953cd35430364166934ce05144b3594c schema:name Springer Nature - SN SciGraph project
    118 rdf:type schema:Organization
    119 Nc1d44218213e4ace8a1c653ec00dc8a2 rdf:first sg:person.016152115754.49
    120 rdf:rest N8ebbcb20dda84e75af4b27169d3baff3
    121 Nef81f7b3179441b28f2c4504fc1525b0 schema:name dimensions_id
    122 schema:value pub.1149027051
    123 rdf:type schema:PropertyValue
    124 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
    125 schema:name Mathematical Sciences
    126 rdf:type schema:DefinedTerm
    127 anzsrc-for:0102 schema:inDefinedTermSet anzsrc-for:
    128 schema:name Applied Mathematics
    129 rdf:type schema:DefinedTerm
    130 anzsrc-for:0103 schema:inDefinedTermSet anzsrc-for:
    131 schema:name Numerical and Computational Mathematics
    132 rdf:type schema:DefinedTerm
    133 anzsrc-for:09 schema:inDefinedTermSet anzsrc-for:
    134 schema:name Engineering
    135 rdf:type schema:DefinedTerm
    136 anzsrc-for:0906 schema:inDefinedTermSet anzsrc-for:
    137 schema:name Electrical and Electronic Engineering
    138 rdf:type schema:DefinedTerm
    139 sg:grant.6933063 http://pending.schema.org/fundedItem sg:pub.10.1007/s10957-022-02065-4
    140 rdf:type schema:MonetaryGrant
    141 sg:grant.7568977 http://pending.schema.org/fundedItem sg:pub.10.1007/s10957-022-02065-4
    142 rdf:type schema:MonetaryGrant
    143 sg:grant.8566477 http://pending.schema.org/fundedItem sg:pub.10.1007/s10957-022-02065-4
    144 rdf:type schema:MonetaryGrant
    145 sg:grant.8632080 http://pending.schema.org/fundedItem sg:pub.10.1007/s10957-022-02065-4
    146 rdf:type schema:MonetaryGrant
    147 sg:journal.1044187 schema:issn 0022-3239
    148 1573-2878
    149 schema:name Journal of Optimization Theory and Applications
    150 schema:publisher Springer Nature
    151 rdf:type schema:Periodical
    152 sg:person.01357575514.92 schema:affiliation grid-institutes:grid.189504.1
    153 schema:familyName Paschalidis
    154 schema:givenName Ioannis Ch.
    155 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01357575514.92
    156 rdf:type schema:Person
    157 sg:person.016152115754.49 schema:affiliation grid-institutes:grid.189504.1
    158 schema:familyName Chen
    159 schema:givenName Ruidi
    160 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016152115754.49
    161 rdf:type schema:Person
    162 sg:pub.10.1007/3-540-44581-1_15 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023106154
    163 https://doi.org/10.1007/3-540-44581-1_15
    164 rdf:type schema:CreativeWork
    165 sg:pub.10.1007/s10107-011-0494-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019393106
    166 https://doi.org/10.1007/s10107-011-0494-7
    167 rdf:type schema:CreativeWork
    168 sg:pub.10.1007/s10107-014-0819-4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025153037
    169 https://doi.org/10.1007/s10107-014-0819-4
    170 rdf:type schema:CreativeWork
    171 sg:pub.10.1007/s11222-007-9033-z schema:sameAs https://app.dimensions.ai/details/publication/pub.1008067612
    172 https://doi.org/10.1007/s11222-007-9033-z
    173 rdf:type schema:CreativeWork
    174 grid-institutes:grid.189504.1 schema:alternateName Boston University, Boston, MA, USA
    175 schema:name Boston University, Boston, MA, USA
    176 rdf:type schema:Organization
     



