Classification and regression using augmented trees


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2018-08-01

AUTHORS

Rajiv Sambasivan, Sourish Das

ABSTRACT

In this work, we present an algorithm for regression and classification tasks on big datasets using augmented tree models. Partitioning a big dataset using a tree model permits us to apply a divide and conquer strategy to classification and regression tasks. Experiments conducted as part of this study illustrate that such an approach has an important benefit. Methods associated with good accuracies on learning tasks on big datasets such as ensemble tree methods or neural networks produce models that are not interpretable. The models produced by the proposed algorithm are interpretable while being as accurate as ensemble methods such as random forests or gradient boosted trees. Model interpretation can be performed at coarse and fine granularity. This permits us to extract insights that characterize the entire dataset or a particular subset of the data. Models that are accurate and interpretable are highly desirable in many application settings. The partitions created by the algorithm also permit a divide and conquer approach to model analysis. Analysis of performance by partition helped identify problems such as possible data errors and model overfitting.

PAGES

259-276

References to SciGraph publications

  • 2011-10-05. On Dynamic Generalized Linear Models with Applications in METHODOLOGY AND COMPUTING IN APPLIED PROBABILITY
  • 2000-01. Hierarchical priors for Bayesian CART shrinkage in STATISTICS AND COMPUTING
  • 2002-07. Bayesian Treed Models in MACHINE LEARNING
  • 2001-10. Random Forests in MACHINE LEARNING
  • 2004. Introduction to Statistical Learning Theory in ADVANCED LECTURES ON MACHINE LEARNING
  • 2006-03-02. Extremely randomized trees in MACHINE LEARNING
Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/s41060-018-0146-6

    DOI

    http://dx.doi.org/10.1007/s41060-018-0146-6

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1105949270



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Artificial Intelligence and Image Processing", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Department of Computer Science, Chennai Mathematical Institute, 603103, Kelambakkam, India", 
              "id": "http://www.grid.ac/institutes/grid.444722.3", 
              "name": [
                "Department of Computer Science, Chennai Mathematical Institute, 603103, Kelambakkam, India"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Sambasivan", 
            "givenName": "Rajiv", 
            "id": "sg:person.07464724630.94", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07464724630.94"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Department of Mathematics, Chennai Mathematical Institute, 603103, Kelambakkam, India", 
              "id": "http://www.grid.ac/institutes/grid.444722.3", 
              "name": [
                "Department of Mathematics, Chennai Mathematical Institute, 603103, Kelambakkam, India"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Das", 
            "givenName": "Sourish", 
            "id": "sg:person.01300636265.37", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01300636265.37"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/978-3-540-28650-9_8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1026544816", 
              "https://doi.org/10.1007/978-3-540-28650-9_8"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1023/a:1010933404324", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1024739340", 
              "https://doi.org/10.1023/a:1010933404324"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s11009-011-9255-6", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1021186073", 
              "https://doi.org/10.1007/s11009-011-9255-6"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1023/a:1008980332240", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1022251304", 
              "https://doi.org/10.1023/a:1008980332240"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10994-006-6226-1", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1007730804", 
              "https://doi.org/10.1007/s10994-006-6226-1"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1023/a:1013916107446", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1022913587", 
              "https://doi.org/10.1023/a:1013916107446"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2018-08-01", 
        "datePublishedReg": "2018-08-01", 
        "description": "In this work, we present an algorithm for regression and classification tasks on big datasets using augmented tree models. Partitioning a big dataset using a tree model permits us to apply a divide and conquer strategy to classification and regression tasks. Experiments conducted as part of this study illustrate that such an approach has an important benefit. Methods associated with good accuracies on learning tasks on big datasets such as ensemble tree methods or neural networks produce models that are not interpretable. The models produced by the proposed algorithm are interpretable while being as accurate as ensemble methods such as random forests or gradient boosted trees. Model interpretation can be performed at coarse and fine granularity. This permits us to extract insights that characterize the entire dataset or a particular subset of the data. Models that are accurate and interpretable are highly desirable in many application settings. The partitions created by the algorithm also permit a divide and conquer approach to model analysis. Analysis of performance by partition helped identify problems such as possible data errors and model overfitting.", 
        "genre": "article", 
        "id": "sg:pub.10.1007/s41060-018-0146-6", 
        "inLanguage": "en", 
        "isAccessibleForFree": false, 
        "isPartOf": [
          {
            "id": "sg:journal.1156617", 
            "issn": [
              "2364-415X", 
              "2364-4168"
            ], 
            "name": "International Journal of Data Science and Analytics", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "4", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "7"
          }
        ], 
        "keywords": [
          "big datasets", 
          "possible data errors", 
          "tree model", 
          "regression tasks", 
          "neural network", 
          "classification task", 
          "fine granularity", 
          "application settings", 
          "ensemble method", 
          "random forest", 
          "model overfitting", 
          "entire dataset", 
          "analysis of performance", 
          "algorithm", 
          "datasets", 
          "data errors", 
          "tree method", 
          "task", 
          "good accuracy", 
          "model interpretation", 
          "partition", 
          "classification", 
          "overfitting", 
          "granularity", 
          "important benefits", 
          "network", 
          "divide", 
          "trees", 
          "model", 
          "method", 
          "accuracy", 
          "performance", 
          "error", 
          "particular subset", 
          "work", 
          "experiments", 
          "subset", 
          "data", 
          "benefits", 
          "strategies", 
          "model analysis", 
          "analysis", 
          "regression", 
          "part", 
          "setting", 
          "interpretation", 
          "insights", 
          "forest", 
          "gradient", 
          "study", 
          "approach", 
          "problem", 
          "ensemble tree methods"
        ], 
        "name": "Classification and regression using augmented trees", 
        "pagination": "259-276", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1105949270"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/s41060-018-0146-6"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1007/s41060-018-0146-6", 
          "https://app.dimensions.ai/details/publication/pub.1105949270"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2021-11-01T18:31", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20211101/entities/gbq_results/article/article_758.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1007/s41060-018-0146-6"
      }
    ]
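As a rough illustration of how the JSON-LD record above can be consumed, the following Python sketch pulls out the title, authors, identifiers and cited DOIs. It works on a trimmed copy of the record (only the fields it reads are kept), and the helper name `summarize` is hypothetical, not part of any SciGraph tooling.

```python
import json

# Trimmed copy of the record shown above; only the fields read below are kept.
RECORD_TEXT = """
[
  {
    "author": [
      {"familyName": "Sambasivan", "givenName": "Rajiv"},
      {"familyName": "Das", "givenName": "Sourish"}
    ],
    "citation": [
      {"id": "sg:pub.10.1023/a:1010933404324",
       "sameAs": ["https://app.dimensions.ai/details/publication/pub.1024739340",
                  "https://doi.org/10.1023/a:1010933404324"]},
      {"id": "sg:pub.10.1007/s10994-006-6226-1",
       "sameAs": ["https://app.dimensions.ai/details/publication/pub.1007730804",
                  "https://doi.org/10.1007/s10994-006-6226-1"]}
    ],
    "name": "Classification and regression using augmented trees",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1105949270"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/s41060-018-0146-6"]}
    ]
  }
]
"""

DOI_PREFIX = "https://doi.org/"


def summarize(record_text):
    """Hypothetical helper: extract a few fields from a SciGraph JSON-LD record."""
    article = json.loads(record_text)[0]  # the record is a one-element list
    authors = [
        "{} {}".format(a["givenName"], a["familyName"])
        for a in article.get("author", [])
    ]
    # Each productId entry carries a name ("doi", "dimensions_id") and a value list.
    identifiers = {p["name"]: p["value"][0] for p in article.get("productId", [])}
    # Cited works list several sameAs links; keep only the DOI ones.
    cited_dois = [
        link[len(DOI_PREFIX):]
        for work in article.get("citation", [])
        for link in work.get("sameAs", [])
        if link.startswith(DOI_PREFIX)
    ]
    return {
        "title": article["name"],
        "authors": authors,
        "identifiers": identifiers,
        "cited_dois": cited_dois,
    }


summary = summarize(RECORD_TEXT)
print(summary["title"])
print(", ".join(summary["authors"]))
print(summary["identifiers"]["doi"])
```

The same function should work on the full record, since it only uses keys that appear in it (`name`, `author`, `productId`, `citation`, `sameAs`).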
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s41060-018-0146-6'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s41060-018-0146-6'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s41060-018-0146-6'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s41060-018-0146-6'
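The four curl commands above differ only in the `Accept` header. A minimal Python sketch of the same content negotiation, using only the standard library, might look like this. The function names and the short format keys are hypothetical, and `fetch` assumes the endpoint still responds as described on this page.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/"

# Media types taken from the curl commands above; the short keys are illustrative.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id, fmt="json-ld"):
    """Build a GET request asking for the record in the given RDF serialization."""
    return urllib.request.Request(
        BASE_URL + record_id,
        headers={"Accept": ACCEPT_TYPES[fmt]},
    )


def fetch(record_id, fmt="json-ld", timeout=30):
    """Download the record as text. Requires network access (assumption: endpoint is live)."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request does not touch the network.
request = build_request("pub.10.1007/s41060-018-0146-6", "turtle")
print(request.full_url)
print(request.get_header("Accept"))
```

Calling `fetch("pub.10.1007/s41060-018-0146-6", "nt")` would then correspond to the N-Triples curl command.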


     

    This table displays all metadata directly associated to this object as RDF triples.

    144 TRIPLES      22 PREDICATES      84 URIs      70 LITERALS      6 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/s41060-018-0146-6 schema:about anzsrc-for:08
    2 anzsrc-for:0801
    3 schema:author N1576b6e311414322b75bc6602cca1e48
    4 schema:citation sg:pub.10.1007/978-3-540-28650-9_8
    5 sg:pub.10.1007/s10994-006-6226-1
    6 sg:pub.10.1007/s11009-011-9255-6
    7 sg:pub.10.1023/a:1008980332240
    8 sg:pub.10.1023/a:1010933404324
    9 sg:pub.10.1023/a:1013916107446
    10 schema:datePublished 2018-08-01
    11 schema:datePublishedReg 2018-08-01
    12 schema:description In this work, we present an algorithm for regression and classification tasks on big datasets using augmented tree models. Partitioning a big dataset using a tree model permits us to apply a divide and conquer strategy to classification and regression tasks. Experiments conducted as part of this study illustrate that such an approach has an important benefit. Methods associated with good accuracies on learning tasks on big datasets such as ensemble tree methods or neural networks produce models that are not interpretable. The models produced by the proposed algorithm are interpretable while being as accurate as ensemble methods such as random forests or gradient boosted trees. Model interpretation can be performed at coarse and fine granularity. This permits us to extract insights that characterize the entire dataset or a particular subset of the data. Models that are accurate and interpretable are highly desirable in many application settings. The partitions created by the algorithm also permit a divide and conquer approach to model analysis. Analysis of performance by partition helped identify problems such as possible data errors and model overfitting.
    13 schema:genre article
    14 schema:inLanguage en
    15 schema:isAccessibleForFree false
    16 schema:isPartOf N2b51d124f6144f5c835403b1bb91296d
    17 N782fbc768ea749da9c255b0515438690
    18 sg:journal.1156617
    19 schema:keywords accuracy
    20 algorithm
    21 analysis
    22 analysis of performance
    23 application settings
    24 approach
    25 benefits
    26 big datasets
    27 classification
    28 classification task
    29 data
    30 data errors
    31 datasets
    32 divide
    33 ensemble method
    34 ensemble tree methods
    35 entire dataset
    36 error
    37 experiments
    38 fine granularity
    39 forest
    40 good accuracy
    41 gradient
    42 granularity
    43 important benefits
    44 insights
    45 interpretation
    46 method
    47 model
    48 model analysis
    49 model interpretation
    50 model overfitting
    51 network
    52 neural network
    53 overfitting
    54 part
    55 particular subset
    56 partition
    57 performance
    58 possible data errors
    59 problem
    60 random forest
    61 regression
    62 regression tasks
    63 setting
    64 strategies
    65 study
    66 subset
    67 task
    68 tree method
    69 tree model
    70 trees
    71 work
    72 schema:name Classification and regression using augmented trees
    73 schema:pagination 259-276
    74 schema:productId N804bd128b6c947bba63614a874f9cb44
    75 N9dc9a495162444619b6793baeb55794a
    76 schema:sameAs https://app.dimensions.ai/details/publication/pub.1105949270
    77 https://doi.org/10.1007/s41060-018-0146-6
    78 schema:sdDatePublished 2021-11-01T18:31
    79 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    80 schema:sdPublisher N836266e36c5d43e2b5f76d35fb191644
    81 schema:url https://doi.org/10.1007/s41060-018-0146-6
    82 sgo:license sg:explorer/license/
    83 sgo:sdDataset articles
    84 rdf:type schema:ScholarlyArticle
    85 N1576b6e311414322b75bc6602cca1e48 rdf:first sg:person.07464724630.94
    86 rdf:rest N1c236b9306c1421b94ec4fe9b456d270
    87 N1c236b9306c1421b94ec4fe9b456d270 rdf:first sg:person.01300636265.37
    88 rdf:rest rdf:nil
    89 N2b51d124f6144f5c835403b1bb91296d schema:volumeNumber 7
    90 rdf:type schema:PublicationVolume
    91 N782fbc768ea749da9c255b0515438690 schema:issueNumber 4
    92 rdf:type schema:PublicationIssue
    93 N804bd128b6c947bba63614a874f9cb44 schema:name doi
    94 schema:value 10.1007/s41060-018-0146-6
    95 rdf:type schema:PropertyValue
    96 N836266e36c5d43e2b5f76d35fb191644 schema:name Springer Nature - SN SciGraph project
    97 rdf:type schema:Organization
    98 N9dc9a495162444619b6793baeb55794a schema:name dimensions_id
    99 schema:value pub.1105949270
    100 rdf:type schema:PropertyValue
    101 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    102 schema:name Information and Computing Sciences
    103 rdf:type schema:DefinedTerm
    104 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
    105 schema:name Artificial Intelligence and Image Processing
    106 rdf:type schema:DefinedTerm
    107 sg:journal.1156617 schema:issn 2364-415X
    108 2364-4168
    109 schema:name International Journal of Data Science and Analytics
    110 schema:publisher Springer Nature
    111 rdf:type schema:Periodical
    112 sg:person.01300636265.37 schema:affiliation grid-institutes:grid.444722.3
    113 schema:familyName Das
    114 schema:givenName Sourish
    115 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01300636265.37
    116 rdf:type schema:Person
    117 sg:person.07464724630.94 schema:affiliation grid-institutes:grid.444722.3
    118 schema:familyName Sambasivan
    119 schema:givenName Rajiv
    120 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07464724630.94
    121 rdf:type schema:Person
    122 sg:pub.10.1007/978-3-540-28650-9_8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1026544816
    123 https://doi.org/10.1007/978-3-540-28650-9_8
    124 rdf:type schema:CreativeWork
    125 sg:pub.10.1007/s10994-006-6226-1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007730804
    126 https://doi.org/10.1007/s10994-006-6226-1
    127 rdf:type schema:CreativeWork
    128 sg:pub.10.1007/s11009-011-9255-6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021186073
    129 https://doi.org/10.1007/s11009-011-9255-6
    130 rdf:type schema:CreativeWork
    131 sg:pub.10.1023/a:1008980332240 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022251304
    132 https://doi.org/10.1023/a:1008980332240
    133 rdf:type schema:CreativeWork
    134 sg:pub.10.1023/a:1010933404324 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024739340
    135 https://doi.org/10.1023/a:1010933404324
    136 rdf:type schema:CreativeWork
    137 sg:pub.10.1023/a:1013916107446 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022913587
    138 https://doi.org/10.1023/a:1013916107446
    139 rdf:type schema:CreativeWork
    140 grid-institutes:grid.444722.3 schema:alternateName Department of Computer Science, Chennai Mathematical Institute, 603103, Kelambakkam, India
    141 Department of Mathematics, Chennai Mathematical Institute, 603103, Kelambakkam, India
    142 schema:name Department of Computer Science, Chennai Mathematical Institute, 603103, Kelambakkam, India
    143 Department of Mathematics, Chennai Mathematical Institute, 603103, Kelambakkam, India
    144 rdf:type schema:Organization
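The triple and predicate counts in the summary line above the table could be recomputed from the N-Triples serialization, where each line holds one `subject predicate object .` statement. The sketch below does this for a small hand-written sample. The sample lines are illustrative, and the expansion of `schema:` to `http://schema.org/` is an assumption, not taken from this page.

```python
# Illustrative N-Triples sample modelled on rows of the table above.
# Assumption: the schema: prefix expands to http://schema.org/.
SAMPLE_NT = """\
<http://scigraph.springernature.com/pub.10.1007/s41060-018-0146-6> <http://schema.org/genre> "article" .
<http://scigraph.springernature.com/pub.10.1007/s41060-018-0146-6> <http://schema.org/inLanguage> "en" .
<http://scigraph.springernature.com/pub.10.1007/s41060-018-0146-6> <http://schema.org/keywords> "algorithm" .
<http://scigraph.springernature.com/pub.10.1007/s41060-018-0146-6> <http://schema.org/keywords> "tree model" .
_:b0 <http://schema.org/volumeNumber> "7" .
"""


def count_triples(nt_text):
    """Return (number of triples, number of distinct predicates) for N-Triples text."""
    triples = 0
    predicates = set()
    for line in nt_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subject and predicate never contain whitespace, so two splits suffice;
        # the remainder is the object (which may contain spaces) plus the final dot.
        _subject, predicate, _rest = line.split(None, 2)
        triples += 1
        predicates.add(predicate)
    return triples, len(predicates)


print(count_triples(SAMPLE_NT))
```

For anything beyond counting, a proper RDF parser would be a safer choice than splitting lines by hand.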
     



