A high-dimensional M-estimator framework for bi-level variable selection


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2021-09-09

AUTHORS

Bin Luo, Xiaoli Gao

ABSTRACT

In high-dimensional data analysis, bi-level sparsity is often assumed when covariates function group-wisely and sparsity can appear either at the group level or within certain groups. In such cases, an ideal model should be able to encourage the bi-level variable selection consistently. Bi-level variable selection has become even more challenging when data have heavy-tailed distribution or outliers exist in random errors and covariates. In this paper, we study a framework of high-dimensional M-estimation for bi-level variable selection. This framework encourages bi-level sparsity through a computationally efficient two-stage procedure. In theory, we provide sufficient conditions under which our two-stage penalized M-estimator possesses simultaneous local estimation consistency and the bi-level variable selection consistency if certain non-convex penalty functions are used at the group level. Both our simulation studies and real data analysis demonstrate satisfactory finite sample performance of the proposed estimators under different irregular settings.

PAGES

559-579

References to SciGraph publications

  • 1996-12. Oncogenic regulation and function of keratins 8 and 18. Cancer and Metastasis Reviews.
  • 2012-12-21. Gradient methods for minimizing composite functions. Mathematical Programming.

Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/s10463-021-00809-z

    DOI

    http://dx.doi.org/10.1007/s10463-021-00809-z

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1141002982


    JSON-LD is the canonical representation for SciGraph data.

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Mathematical Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0104", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Statistics", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Department of Biostatistics and Bioinformatics, Duke University, 2424 Erwin Road, 27705, Durham, NC, United States", 
              "id": "http://www.grid.ac/institutes/grid.26009.3d", 
              "name": [
                "Department of Biostatistics and Bioinformatics, Duke University, 2424 Erwin Road, 27705, Durham, NC, United States"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Luo", 
            "givenName": "Bin", 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Department of Mathematics and Statistics, The University of North Carolina at Greensboro, 116 Petty Building, 27402, Greensboro, NC, United States", 
              "id": "http://www.grid.ac/institutes/grid.266860.c", 
              "name": [
                "Department of Mathematics and Statistics, The University of North Carolina at Greensboro, 116 Petty Building, 27402, Greensboro, NC, United States"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Gao", 
            "givenName": "Xiaoli", 
            "id": "sg:person.011504317322.11", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011504317322.11"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/bf00054012", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1004733785", 
              "https://doi.org/10.1007/bf00054012"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10107-012-0629-5", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1000563802", 
              "https://doi.org/10.1007/s10107-012-0629-5"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2021-09-09", 
        "datePublishedReg": "2021-09-09", 
        "description": "In high-dimensional data analysis, bi-level sparsity is often assumed when covariates function group-wisely and sparsity can appear either at the group level or within certain groups. In such cases, an ideal model should be able to encourage the bi-level variable selection consistently. Bi-level variable selection has become even more challenging when data have heavy-tailed distribution or outliers exist in random errors and covariates. In this paper, we study a framework of high-dimensional M-estimation for bi-level variable selection. This framework encourages bi-level sparsity through a computationally efficient two-stage procedure. In theory, we provide sufficient conditions under which our two-stage penalized M-estimator possesses simultaneous local estimation consistency and the bi-level variable selection consistency if certain non-convex penalty functions are used at the group level. Both our simulation studies and real data analysis demonstrate satisfactory finite sample performance of the proposed estimators under different irregular settings.", 
        "genre": "article", 
        "id": "sg:pub.10.1007/s10463-021-00809-z", 
        "inLanguage": "en", 
        "isAccessibleForFree": true, 
        "isPartOf": [
          {
            "id": "sg:journal.1041657", 
            "issn": [
              "0020-3157", 
              "1572-9052"
            ], 
            "name": "Annals of the Institute of Statistical Mathematics", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "3", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "74"
          }
        ], 
        "keywords": [
          "bi-level variable selection", 
          "variable selection", 
          "high-dimensional data analysis", 
          "satisfactory finite sample performance", 
          "non-convex penalty function", 
          "finite sample performance", 
          "variable selection consistency", 
          "real data analysis", 
          "selection consistency", 
          "M-estimation", 
          "M-estimators", 
          "estimation consistency", 
          "sufficient conditions", 
          "sample performance", 
          "penalty function", 
          "tailed distribution", 
          "random errors", 
          "simulation study", 
          "sparsity", 
          "irregular settings", 
          "data analysis", 
          "two-stage procedure", 
          "estimator", 
          "two-stage", 
          "framework", 
          "theory", 
          "outliers", 
          "error", 
          "covariates", 
          "model", 
          "distribution", 
          "selection", 
          "such cases", 
          "consistency", 
          "function", 
          "ideal model", 
          "analysis", 
          "performance", 
          "cases", 
          "certain groups", 
          "conditions", 
          "procedure", 
          "data", 
          "group level", 
          "setting", 
          "levels", 
          "study", 
          "group", 
          "paper"
        ], 
        "name": "A high-dimensional M-estimator framework for bi-level variable selection", 
        "pagination": "559-579", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1141002982"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/s10463-021-00809-z"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1007/s10463-021-00809-z", 
          "https://app.dimensions.ai/details/publication/pub.1141002982"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-05-20T07:38", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20220519/entities/gbq_results/article/article_880.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1007/s10463-021-00809-z"
      }
    ]
     

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular linked data format that is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s10463-021-00809-z'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s10463-021-00809-z'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s10463-021-00809-z'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s10463-021-00809-z'
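The four curl commands above differ only in the Accept header sent to the same record URL (HTTP content negotiation). As a small sketch, the shell functions below map a short format name to its Accept value and print the matching curl command instead of running it. This assumes the SciGraph endpoint still honours these four MIME types, which this record cannot guarantee. The format keywords (jsonld, nt, ttl, rdf), the helper names, and the record.* output file names are illustrative choices and are not part of SciGraph.

```shell
#!/bin/sh
# Sketch only: assumes SciGraph serves these MIME types for a record URL,
# as in the curl examples above. Helper and file names are hypothetical.

SCIGRAPH_URL='https://scigraph.springernature.com/pub.10.1007/s10463-021-00809-z'

# accept_for FORMAT: print the MIME type to send in the Accept header,
# or report an error on stderr and return 1 for an unknown format.
accept_for() {
  case "$1" in
    jsonld) echo 'application/ld+json' ;;
    nt)     echo 'application/n-triples' ;;
    ttl)    echo 'text/turtle' ;;
    rdf)    echo 'application/rdf+xml' ;;
    *)      echo "unknown format: $1" >&2; return 1 ;;
  esac
}

# curl_cmd FORMAT: print (do not run) a curl command that saves the record
# in that serialization to record.FORMAT in the current directory.
curl_cmd() {
  mime=$(accept_for "$1") || return 1
  printf "curl -sS -H 'Accept: %s' -o record.%s '%s'\n" "$mime" "$1" "$SCIGRAPH_URL"
}
```

For example, `for f in jsonld nt ttl rdf; do curl_cmd "$f"; done` lists the four download commands, and appending `| sh` would run them. Printing the commands first keeps the sketch free of network side effects until you choose to execute them.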


     

    This table displays all metadata directly associated with this object as RDF triples.

    124 TRIPLES      22 PREDICATES      76 URIs      66 LITERALS      6 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/s10463-021-00809-z schema:about anzsrc-for:01
    2 anzsrc-for:0104
    3 schema:author N8afa6c90bd40432dba4641ac17e95fd9
    4 schema:citation sg:pub.10.1007/bf00054012
    5 sg:pub.10.1007/s10107-012-0629-5
    6 schema:datePublished 2021-09-09
    7 schema:datePublishedReg 2021-09-09
    8 schema:description In high-dimensional data analysis, bi-level sparsity is often assumed when covariates function group-wisely and sparsity can appear either at the group level or within certain groups. In such cases, an ideal model should be able to encourage the bi-level variable selection consistently. Bi-level variable selection has become even more challenging when data have heavy-tailed distribution or outliers exist in random errors and covariates. In this paper, we study a framework of high-dimensional M-estimation for bi-level variable selection. This framework encourages bi-level sparsity through a computationally efficient two-stage procedure. In theory, we provide sufficient conditions under which our two-stage penalized M-estimator possesses simultaneous local estimation consistency and the bi-level variable selection consistency if certain non-convex penalty functions are used at the group level. Both our simulation studies and real data analysis demonstrate satisfactory finite sample performance of the proposed estimators under different irregular settings.
    9 schema:genre article
    10 schema:inLanguage en
    11 schema:isAccessibleForFree true
    12 schema:isPartOf N4412efe9183b463e8340b8faa12d53a8
    13 N8fae0360107944948e54537360f808dc
    14 sg:journal.1041657
    15 schema:keywords M-estimation
    16 M-estimators
    17 analysis
    18 bi-level variable selection
    19 cases
    20 certain groups
    21 conditions
    22 consistency
    23 covariates
    24 data
    25 data analysis
    26 distribution
    27 error
    28 estimation consistency
    29 estimator
    30 finite sample performance
    31 framework
    32 function
    33 group
    34 group level
    35 high-dimensional data analysis
    36 ideal model
    37 irregular settings
    38 levels
    39 model
    40 non-convex penalty function
    41 outliers
    42 paper
    43 penalty function
    44 performance
    45 procedure
    46 random errors
    47 real data analysis
    48 sample performance
    49 satisfactory finite sample performance
    50 selection
    51 selection consistency
    52 setting
    53 simulation study
    54 sparsity
    55 study
    56 such cases
    57 sufficient conditions
    58 tailed distribution
    59 theory
    60 two-stage
    61 two-stage procedure
    62 variable selection
    63 variable selection consistency
    64 schema:name A high-dimensional M-estimator framework for bi-level variable selection
    65 schema:pagination 559-579
    66 schema:productId N073fe952f58e44c89e74f3a019cc8985
    67 Nfafbbc573b3c457ba2b8b293e6bf8881
    68 schema:sameAs https://app.dimensions.ai/details/publication/pub.1141002982
    69 https://doi.org/10.1007/s10463-021-00809-z
    70 schema:sdDatePublished 2022-05-20T07:38
    71 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    72 schema:sdPublisher N35fd480f795c4350836ab6d98545e488
    73 schema:url https://doi.org/10.1007/s10463-021-00809-z
    74 sgo:license sg:explorer/license/
    75 sgo:sdDataset articles
    76 rdf:type schema:ScholarlyArticle
    77 N073fe952f58e44c89e74f3a019cc8985 schema:name doi
    78 schema:value 10.1007/s10463-021-00809-z
    79 rdf:type schema:PropertyValue
    80 N1f6d7b4f753348948c45e94037c3d52f rdf:first sg:person.011504317322.11
    81 rdf:rest rdf:nil
    82 N35fd480f795c4350836ab6d98545e488 schema:name Springer Nature - SN SciGraph project
    83 rdf:type schema:Organization
    84 N4412efe9183b463e8340b8faa12d53a8 schema:issueNumber 3
    85 rdf:type schema:PublicationIssue
    86 N8afa6c90bd40432dba4641ac17e95fd9 rdf:first Ne634bebe54fa463c8151161115ee8c09
    87 rdf:rest N1f6d7b4f753348948c45e94037c3d52f
    88 N8fae0360107944948e54537360f808dc schema:volumeNumber 74
    89 rdf:type schema:PublicationVolume
    90 Ne634bebe54fa463c8151161115ee8c09 schema:affiliation grid-institutes:grid.26009.3d
    91 schema:familyName Luo
    92 schema:givenName Bin
    93 rdf:type schema:Person
    94 Nfafbbc573b3c457ba2b8b293e6bf8881 schema:name dimensions_id
    95 schema:value pub.1141002982
    96 rdf:type schema:PropertyValue
    97 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
    98 schema:name Mathematical Sciences
    99 rdf:type schema:DefinedTerm
    100 anzsrc-for:0104 schema:inDefinedTermSet anzsrc-for:
    101 schema:name Statistics
    102 rdf:type schema:DefinedTerm
    103 sg:journal.1041657 schema:issn 0020-3157
    104 1572-9052
    105 schema:name Annals of the Institute of Statistical Mathematics
    106 schema:publisher Springer Nature
    107 rdf:type schema:Periodical
    108 sg:person.011504317322.11 schema:affiliation grid-institutes:grid.266860.c
    109 schema:familyName Gao
    110 schema:givenName Xiaoli
    111 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011504317322.11
    112 rdf:type schema:Person
    113 sg:pub.10.1007/bf00054012 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004733785
    114 https://doi.org/10.1007/bf00054012
    115 rdf:type schema:CreativeWork
    116 sg:pub.10.1007/s10107-012-0629-5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000563802
    117 https://doi.org/10.1007/s10107-012-0629-5
    118 rdf:type schema:CreativeWork
    119 grid-institutes:grid.26009.3d schema:alternateName Department of Biostatistics and Bioinformatics, Duke University, 2424 Erwin Road, 27705, Durham, NC, United States
    120 schema:name Department of Biostatistics and Bioinformatics, Duke University, 2424 Erwin Road, 27705, Durham, NC, United States
    121 rdf:type schema:Organization
    122 grid-institutes:grid.266860.c schema:alternateName Department of Mathematics and Statistics, The University of North Carolina at Greensboro, 116 Petty Building, 27402, Greensboro, NC, United States
    123 schema:name Department of Mathematics and Statistics, The University of North Carolina at Greensboro, 116 Petty Building, 27402, Greensboro, NC, United States
    124 rdf:type schema:Organization
     



