Scaling Up Inductive Learning with Massive Parallelism


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

1996-04

AUTHORS

Foster John Provost, John M. Aronis

ABSTRACT

Machine learning programs need to scale up to very large data sets for several reasons, including increasing accuracy and discovering infrequent special cases. Current inductive learners perform well with hundreds or thousands of training examples, but in some cases, up to a million or more examples may be necessary to learn important special cases with confidence. These tasks are infeasible for current learning programs running on sequential machines. We discuss the need for very large data sets and prior efforts to scale up machine learning methods. This discussion motivates a strategy that exploits the inherent parallelism present in many learning algorithms. We describe a parallel implementation of one inductive learning program on the CM-2 Connection Machine, show that it scales up to millions of examples, and show that it uncovers special-case rules that sequential learning programs, running on smaller datasets, would miss. The parallel version of the learning program is preferable to the sequential version for example sets larger than about 10K examples. When learning from a public-health database consisting of 3.5 million examples, the parallel rule-learning system uncovered a surprising relationship that has led to considerable follow-up research.

PAGES

33-46

References to SciGraph publications

  • 1995-07. Inductive policy: The pragmatics of bias selection in MACHINE LEARNING
  • 1993-04. Very Simple Classification Rules Perform Well on Most Commonly Used Datasets in MACHINE LEARNING
  • 1989-11. Incremental Induction of Decision Trees in MACHINE LEARNING
  • 1987-12. Parallel depth first search. Part II. Analysis in INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING

Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1023/a:1018086232231

    DOI

    http://dx.doi.org/10.1023/a:1018086232231

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1021829887



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Artificial Intelligence and Image Processing", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "name": [
                "NYNEX Science and Technology, 400 Westchester Avenue, 10604, White Plains, NY"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Provost", 
            "givenName": "Foster John", 
            "id": "sg:person.07501646413.35", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07501646413.35"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Pittsburgh", 
              "id": "https://www.grid.ac/institutes/grid.21925.3d", 
              "name": [
                "Intelligent Systems Laboratory, University of Pittsburgh, 15260, Pittsburgh, PA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Aronis", 
            "givenName": "John M.", 
            "id": "sg:person.01264231511.58", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01264231511.58"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1023/a:1022631118932", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1006996698", 
              "https://doi.org/10.1023/a:1022631118932"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/1045343.1045371", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1016072729"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1023/a:1022699900025", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1021865543", 
              "https://doi.org/10.1023/a:1022699900025"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/0004-3702(93)90003-t", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1035829989"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/bf01389001", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1041813370", 
              "https://doi.org/10.1007/bf01389001"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/0004-3702(93)90002-s", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1043048074"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/b978-1-55860-307-3.50017-4", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1046143006"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/bf00993474", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1051938845", 
              "https://doi.org/10.1007/bf00993474"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/b978-1-55860-247-2.50012-7", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1052483883"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/mc.1987.1663360", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1061386271"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1142/s0218213093000102", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1062965121"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://app.dimensions.ai/details/publication/pub.1082424119", 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/tai.1990.130305", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1086281611"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "1996-04", 
        "datePublishedReg": "1996-04-01", 
        "description": "Machine learning programs need to scale up to very large data sets for several reasons, including increasing accuracy and discovering infrequent special cases. Current inductive learners perform well with hundreds or thousands of training examples, but in some cases, up to a million or more examples may be necessary to learn important special cases with confidence. These tasks are infeasible for current learning programs running on sequential machines. We discuss the need for very large data sets and prior efforts to scale up machine learning methods. This discussion motivates a strategy that exploits the inherent parallelism present in many learning algorithms. We describe a parallel implementation of one inductive learning program on the CM-2 Connection Machine, show that it scales up to millions of examples, and show that it uncovers special-case rules that sequential learning programs, running on smaller datasets, would miss. The parallel version of the learning program is preferable to the sequential version for example sets larger than about 10K examples. When learning from a public-health database consisting of 3.5 million examples, the parallel rule-learning system uncovered a surprising relationship that has led to considerable follow-up research.", 
        "genre": "research_article", 
        "id": "sg:pub.10.1023/a:1018086232231", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isPartOf": [
          {
            "id": "sg:journal.1125588", 
            "issn": [
              "0885-6125", 
              "1573-0565"
            ], 
            "name": "Machine Learning", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "23"
          }
        ], 
        "name": "Scaling Up Inductive Learning with Massive Parallelism", 
        "pagination": "33-46", 
        "productId": [
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "680171cea85d0e63df40409e47767d3feeda4658b0b25abb38e99d0a40d80b1c"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1023/a:1018086232231"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1021829887"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1023/a:1018086232231", 
          "https://app.dimensions.ai/details/publication/pub.1021829887"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2019-04-10T13:14", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8659_00000505.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "http://link.springer.com/10.1023/A:1018086232231"
      }
    ]
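
A record of this shape can be read with any JSON parser. The sketch below is illustrative only: `SAMPLE` is a hand-trimmed subset of the fields shown above, and `summarize` is a hypothetical helper name, not part of any SciGraph tooling. It pulls out the title and author names and collapses repeated citation ids, on the assumption that a citation list may list the same id more than once.

```python
import json

# Hand-trimmed subset of the record above (assumption: only these fields are needed).
SAMPLE = """
[
  {
    "name": "Scaling Up Inductive Learning with Massive Parallelism",
    "datePublished": "1996-04",
    "pagination": "33-46",
    "author": [
      {"familyName": "Provost", "givenName": "Foster John", "type": "Person"},
      {"familyName": "Aronis", "givenName": "John M.", "type": "Person"}
    ],
    "citation": [
      {"id": "https://doi.org/10.1016/0004-3702(93)90003-t", "type": "CreativeWork"},
      {"id": "https://doi.org/10.1016/0004-3702(93)90003-t", "type": "CreativeWork"},
      {"id": "sg:pub.10.1007/bf01389001", "type": "CreativeWork"}
    ]
  }
]
"""


def summarize(record_text):
    """Return title, author names, and unique citation ids from a record.

    Hypothetical helper: assumes the top-level value is a list whose first
    element describes the article, as in the record shown above.
    """
    article = json.loads(record_text)[0]
    authors = [
        f"{a['givenName']} {a['familyName']}" for a in article.get("author", [])
    ]
    # Keep the first occurrence of each citation id, preserving order.
    citations = []
    for entry in article.get("citation", []):
        if entry["id"] not in citations:
            citations.append(entry["id"])
    return {
        "title": article["name"],
        "authors": authors,
        "citations": citations,
    }


summary = summarize(SAMPLE)
print(summary["title"])
print(", ".join(summary["authors"]))
print(len(summary["citations"]), "unique citations in the sample")
```

Deduplicating on `id` keeps the sketch independent of whether a given record repeats entries.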
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1023/a:1018086232231'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1023/a:1018086232231'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1023/a:1018086232231'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1023/a:1018086232231'
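
The four curl commands above differ only in the `Accept` header, so the same content negotiation can be sketched with Python's standard library. The names `build_request`, `fetch`, and the `ACCEPT` table are illustrative assumptions; the URL and media types are copied from the commands above. Whether the endpoint still responds is not something this record confirms, so `fetch` is defined but not called here.

```python
import urllib.request

BASE = "https://scigraph.springernature.com/"

# Media types taken from the curl examples above; the short keys are
# labels invented for this sketch.
ACCEPT = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(pub_id, fmt="json-ld"):
    """Build a GET request for a record in the requested serialization."""
    return urllib.request.Request(BASE + pub_id, headers={"Accept": ACCEPT[fmt]})


def fetch(pub_id, fmt="json-ld", timeout=30):
    """Download the record as text (requires network access)."""
    with urllib.request.urlopen(build_request(pub_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


req = build_request("pub.10.1023/a:1018086232231", "turtle")
print(req.full_url)
print(req.get_header("Accept"))
```

Separating request construction from the network call makes the header logic easy to check without contacting the server.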


     

    This table displays all metadata directly associated to this object as RDF triples.

    112 TRIPLES      21 PREDICATES      40 URIs      19 LITERALS      7 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1023/a:1018086232231 schema:about anzsrc-for:08
    2 anzsrc-for:0801
    3 schema:author Na1227eec7ae74b378c5a0ecf8247e38a
    4 schema:citation sg:pub.10.1007/bf00993474
    5 sg:pub.10.1007/bf01389001
    6 sg:pub.10.1023/a:1022631118932
    7 sg:pub.10.1023/a:1022699900025
    8 https://app.dimensions.ai/details/publication/pub.1082424119
    9 https://doi.org/10.1016/0004-3702(93)90002-s
    10 https://doi.org/10.1016/0004-3702(93)90003-t
    11 https://doi.org/10.1016/b978-1-55860-247-2.50012-7
    12 https://doi.org/10.1016/b978-1-55860-307-3.50017-4
    13 https://doi.org/10.1109/mc.1987.1663360
    14 https://doi.org/10.1109/tai.1990.130305
    15 https://doi.org/10.1142/s0218213093000102
    16 https://doi.org/10.1145/1045343.1045371
    17 schema:datePublished 1996-04
    18 schema:datePublishedReg 1996-04-01
    19 schema:description Machine learning programs need to scale up to very large data sets for several reasons, including increasing accuracy and discovering infrequent special cases. Current inductive learners perform well with hundreds or thousands of training examples, but in some cases, up to a million or more examples may be necessary to learn important special cases with confidence. These tasks are infeasible for current learning programs running on sequential machines. We discuss the need for very large data sets and prior efforts to scale up machine learning methods. This discussion motivates a strategy that exploits the inherent parallelism present in many learning algorithms. We describe a parallel implementation of one inductive learning program on the CM-2 Connection Machine, show that it scales up to millions of examples, and show that it uncovers special-case rules that sequential learning programs, running on smaller datasets, would miss. The parallel version of the learning program is preferable to the sequential version for example sets larger than about 10K examples. When learning from a public-health database consisting of 3.5 million examples, the parallel rule-learning system uncovered a surprising relationship that has led to considerable follow-up research.
    20 schema:genre research_article
    21 schema:inLanguage en
    22 schema:isAccessibleForFree true
    23 schema:isPartOf N197e34f4b8d242979359be42fd6c48e7
    24 Ncc0ca23f0686461cac0adeec0bb7ead6
    25 sg:journal.1125588
    26 schema:name Scaling Up Inductive Learning with Massive Parallelism
    27 schema:pagination 33-46
    28 schema:productId N9a6e75ebef764b83a4d3e2f72b41019c
    29 Na5c7e3d6960b41a983f90036445cb01a
    30 Ncb75c75103f7400c88889e008aac1260
    31 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021829887
    32 https://doi.org/10.1023/a:1018086232231
    33 schema:sdDatePublished 2019-04-10T13:14
    34 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    35 schema:sdPublisher Nb4e2b1abaa2e4702a00b1d7ca2173fb6
    36 schema:url http://link.springer.com/10.1023/A:1018086232231
    37 sgo:license sg:explorer/license/
    38 sgo:sdDataset articles
    39 rdf:type schema:ScholarlyArticle
    40 N197e34f4b8d242979359be42fd6c48e7 schema:issueNumber 1
    41 rdf:type schema:PublicationIssue
    42 N3627831ea37e4a50b60297cb9ba93e82 schema:name NYNEX Science and Technology, 400 Westchester Avenue, 10604, White Plains, NY
    43 rdf:type schema:Organization
    44 N9a6e75ebef764b83a4d3e2f72b41019c schema:name readcube_id
    45 schema:value 680171cea85d0e63df40409e47767d3feeda4658b0b25abb38e99d0a40d80b1c
    46 rdf:type schema:PropertyValue
    47 Na1227eec7ae74b378c5a0ecf8247e38a rdf:first sg:person.07501646413.35
    48 rdf:rest Nc69c5b487b644a52aefc20f1f8d36118
    49 Na5c7e3d6960b41a983f90036445cb01a schema:name dimensions_id
    50 schema:value pub.1021829887
    51 rdf:type schema:PropertyValue
    52 Nb4e2b1abaa2e4702a00b1d7ca2173fb6 schema:name Springer Nature - SN SciGraph project
    53 rdf:type schema:Organization
    54 Nc69c5b487b644a52aefc20f1f8d36118 rdf:first sg:person.01264231511.58
    55 rdf:rest rdf:nil
    56 Ncb75c75103f7400c88889e008aac1260 schema:name doi
    57 schema:value 10.1023/a:1018086232231
    58 rdf:type schema:PropertyValue
    59 Ncc0ca23f0686461cac0adeec0bb7ead6 schema:volumeNumber 23
    60 rdf:type schema:PublicationVolume
    61 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    62 schema:name Information and Computing Sciences
    63 rdf:type schema:DefinedTerm
    64 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
    65 schema:name Artificial Intelligence and Image Processing
    66 rdf:type schema:DefinedTerm
    67 sg:journal.1125588 schema:issn 0885-6125
    68 1573-0565
    69 schema:name Machine Learning
    70 rdf:type schema:Periodical
    71 sg:person.01264231511.58 schema:affiliation https://www.grid.ac/institutes/grid.21925.3d
    72 schema:familyName Aronis
    73 schema:givenName John M.
    74 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01264231511.58
    75 rdf:type schema:Person
    76 sg:person.07501646413.35 schema:affiliation N3627831ea37e4a50b60297cb9ba93e82
    77 schema:familyName Provost
    78 schema:givenName Foster John
    79 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07501646413.35
    80 rdf:type schema:Person
    81 sg:pub.10.1007/bf00993474 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051938845
    82 https://doi.org/10.1007/bf00993474
    83 rdf:type schema:CreativeWork
    84 sg:pub.10.1007/bf01389001 schema:sameAs https://app.dimensions.ai/details/publication/pub.1041813370
    85 https://doi.org/10.1007/bf01389001
    86 rdf:type schema:CreativeWork
    87 sg:pub.10.1023/a:1022631118932 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006996698
    88 https://doi.org/10.1023/a:1022631118932
    89 rdf:type schema:CreativeWork
    90 sg:pub.10.1023/a:1022699900025 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021865543
    91 https://doi.org/10.1023/a:1022699900025
    92 rdf:type schema:CreativeWork
    93 https://app.dimensions.ai/details/publication/pub.1082424119 rdf:type schema:CreativeWork
    94 https://doi.org/10.1016/0004-3702(93)90002-s schema:sameAs https://app.dimensions.ai/details/publication/pub.1043048074
    95 rdf:type schema:CreativeWork
    96 https://doi.org/10.1016/0004-3702(93)90003-t schema:sameAs https://app.dimensions.ai/details/publication/pub.1035829989
    97 rdf:type schema:CreativeWork
    98 https://doi.org/10.1016/b978-1-55860-247-2.50012-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052483883
    99 rdf:type schema:CreativeWork
    100 https://doi.org/10.1016/b978-1-55860-307-3.50017-4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1046143006
    101 rdf:type schema:CreativeWork
    102 https://doi.org/10.1109/mc.1987.1663360 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061386271
    103 rdf:type schema:CreativeWork
    104 https://doi.org/10.1109/tai.1990.130305 schema:sameAs https://app.dimensions.ai/details/publication/pub.1086281611
    105 rdf:type schema:CreativeWork
    106 https://doi.org/10.1142/s0218213093000102 schema:sameAs https://app.dimensions.ai/details/publication/pub.1062965121
    107 rdf:type schema:CreativeWork
    108 https://doi.org/10.1145/1045343.1045371 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016072729
    109 rdf:type schema:CreativeWork
    110 https://www.grid.ac/institutes/grid.21925.3d schema:alternateName University of Pittsburgh
    111 schema:name Intelligent Systems Laboratory, University of Pittsburgh, 15260, Pittsburgh, PA
    112 rdf:type schema:Organization
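
Summary counts like those in the table header (triples, predicates, URIs, literals, blank nodes) can be approximated from the N-Triples serialization. The sketch below is a simplified tally, not a full N-Triples parser, and how the page itself arrives at its counts is not stated. It assumes "URIs" means distinct IRIs in subject or object position; `tally` and the three-line `SAMPLE` are hypothetical.

```python
import re

# Simplified term patterns: IRI, blank node, or quoted literal with an
# optional datatype or language tag.
SUBJECT = r'(<[^>]*>|_:\S+)'
PREDICATE = r'(<[^>]*>)'
OBJECT = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)'
LINE = re.compile(
    r'^\s*' + SUBJECT + r'\s+' + PREDICATE + r'\s+' + OBJECT + r'\s*\.\s*$'
)


def tally(ntriples_text):
    """Count triples and distinct predicates, IRIs, literals, and blank nodes."""
    triples = 0
    predicates, uris, literals, blanks = set(), set(), set(), set()
    for line in ntriples_text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # skip blank lines, comments, and anything unparsed
        triples += 1
        subj, pred, obj = match.groups()
        predicates.add(pred)
        for term in (subj, obj):
            if term.startswith("<"):
                uris.add(term)
            elif term.startswith("_:"):
                blanks.add(term)
            else:
                literals.add(term)
    return {
        "triples": triples,
        "predicates": len(predicates),
        "uris": len(uris),
        "literals": len(literals),
        "blank_nodes": len(blanks),
    }


# Invented example data, not taken from the record above.
SAMPLE = """
<http://example.org/a> <http://schema.org/name> "Scaling Up" .
<http://example.org/a> <http://schema.org/author> _:b0 .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .
"""

counts = tally(SAMPLE)
print(counts)
```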
     



