Computing LTS Regression for Large Data Sets


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2006-01

AUTHORS

PETER J. ROUSSEEUW, KATRIEN VAN DRIESSEN

ABSTRACT

Data mining aims to extract previously unknown patterns or substructures from large databases. In statistics, this is what methods of robust estimation and outlier detection were constructed for, see e.g. Rousseeuw and Leroy (1987). Here we will focus on least trimmed squares (LTS) regression, which is based on the subset of h cases (out of n) whose least squares fit possesses the smallest sum of squared residuals. The coverage h may be set between n/2 and n. The computation time of existing LTS algorithms grows too much with the size of the data set, precluding their use for data mining. In this paper we develop a new algorithm called FAST-LTS. The basic ideas are an inequality involving order statistics and sums of squared residuals, and techniques which we call ‘selective iteration’ and ‘nested extensions’. We also use an intercept adjustment technique to improve the precision. For small data sets FAST-LTS typically finds the exact LTS, whereas for larger data sets it gives more accurate results than existing algorithms for LTS and is faster by orders of magnitude. This allows us to apply FAST-LTS to large databases.
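To make the LTS criterion in the abstract concrete, here is a minimal pure-Python sketch for a single predictor. It is not the FAST-LTS algorithm: `exact_lts` simply enumerates every subset of h cases, which is only feasible for tiny n. The function `concentration_step` is one assumed reading of the "inequality involving order statistics and sums of squared residuals" (refit on the h cases with the smallest squared residuals); the abstract does not spell out the steps, so treat that as an assumption. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def ols_line(points):
    """Ordinary least squares fit y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0  # guard against identical x values
    return mean_y - slope * mean_x, slope


def trimmed_ss(points, fit, h):
    """LTS objective: sum of the h smallest squared residuals of `fit`."""
    a, b = fit
    squared = sorted((y - a - b * x) ** 2 for x, y in points)
    return sum(squared[:h])


def exact_lts(points, h):
    """Brute force over all h-subsets; only feasible for very small n."""
    best = min(combinations(points, h),
               key=lambda subset: trimmed_ss(subset, ols_line(subset), h))
    return ols_line(best)


def concentration_step(points, fit, h):
    """Assumed refinement step: refit on the h cases the current fit explains best."""
    a, b = fit
    closest = sorted(points, key=lambda p: (p[1] - a - b * p[0]) ** 2)[:h]
    return ols_line(closest)


# Made-up data: six points on y = 1 + 2x plus two gross outliers.
data = [(x, 1 + 2 * x) for x in range(6)] + [(6, 40), (7, -30)]
h = 6  # coverage, chosen between n/2 and n as in the abstract

lts_fit = exact_lts(data, h)                   # fit of the best h-subset
start = ols_line(data)                         # ordinary fit on all cases
refined = concentration_step(data, start, h)   # one refinement of that fit
```

On this toy data the exhaustive search recovers intercept 1 and slope 2, because the six clean points form the only subset with zero residual sum of squares. A single refinement step never increases the trimmed sum of squares, which is why iterating it from many starting subsets is a plausible building block for a fast approximate search.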

PAGES

29-45

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/s10618-005-0024-4

DOI

http://dx.doi.org/10.1007/s10618-005-0024-4

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1014859262



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of Antwerp", 
          "id": "https://www.grid.ac/institutes/grid.5284.b", 
          "name": [
            "Department of Mathematics and Computer Science, Universiteit Antwerpen, Middelheimlaan 1, B-2020, Antwerpen, Belgium"
          ], 
          "type": "Organization"
        }, 
        "familyName": "ROUSSEEUW", 
        "givenName": "PETER J.", 
        "id": "sg:person.0775337371.63", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0775337371.63"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Antwerp", 
          "id": "https://www.grid.ac/institutes/grid.5284.b", 
          "name": [
            "Faculty of Applied Economics, Universiteit Antwerpen, Prinsstraat 13, B-2000, Antwerpen, Belgium"
          ], 
          "type": "Organization"
        }, 
        "familyName": "VAN DRIESSEN", 
        "givenName": "KATRIEN", 
        "id": "sg:person.016127315362.74", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016127315362.74"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1023/a:1009783824328", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1000292060", 
          "https://doi.org/10.1023/a:1009783824328"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/0167-9473(92)00070-8", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1003170913"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/0166-218x(86)90009-0", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1019981143"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1023/a:1009769707641", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1027035492", 
          "https://doi.org/10.1023/a:1009769707641"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf00127126", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1027830068", 
          "https://doi.org/10.1007/bf00127126"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/b978-0-444-87877-9.50039-x", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1043141706"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0167-9473(98)00082-6", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1049486548"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1080/00401706.1997.10485436", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1058287505"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1080/00401706.1999.10485670", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1058287776"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1080/01621459.1984.10477105", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1058302950"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1080/01621459.1990.10474920", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1058303860"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1080/01621459.1992.10475224", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1058304253"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1080/01621459.1993.10476352", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1058304436"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1080/01621459.1994.10476821", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1058304685"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/59.496203", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061193891"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/59.76693", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061194675"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1137/0914076", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1062857597"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1214/aos/1176350366", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1064409134"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1214/lnms/1215454133", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1086780837"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2006-01", 
    "datePublishedReg": "2006-01-01", 
    "description": "Data mining aims to extract previously unknown patterns or substructures from large databases. In statistics, this is what methods of robust estimation and outlier detection were constructed for, see e.g. Rousseeuw and Leroy (1987). Here we will focus on least trimmed squares (LTS) regression, which is based on the subset of h cases (out of n) whose least squares fit possesses the smallest sum of squared residuals. The coverage h may be set between n/2 and n. The computation time of existing LTS algorithms grows too much with the size of the data set, precluding their use for data mining. In this paper we develop a new algorithm called FAST-LTS. The basic ideas are an inequality involving order statistics and sums of squared residuals, and techniques which we call \u2018selective iteration\u2019 and \u2018nested extensions\u2019. We also use an intercept adjustment technique to improve the precision. For small data sets FAST-LTS typically finds the exact LTS, whereas for larger data sets it gives more accurate results than existing algorithms for LTS and is faster by orders of magnitude. This allows us to apply FAST-LTS to large databases.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1007/s10618-005-0024-4", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": [
      {
        "id": "sg:journal.1041853", 
        "issn": [
          "1384-5810", 
          "1573-756X"
        ], 
        "name": "Data Mining and Knowledge Discovery", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "12"
      }
    ], 
    "name": "Computing LTS Regression for Large Data Sets", 
    "pagination": "29-45", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "46cb73b5059cdbd2c1f346742407baf8aac6d2e50d3f402b4d48e810a0ffefd3"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/s10618-005-0024-4"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1014859262"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1007/s10618-005-0024-4", 
      "https://app.dimensions.ai/details/publication/pub.1014859262"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-10T14:14", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8660_00000531.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1007%2Fs10618-005-0024-4"
  }
]
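As an illustration of how a record shaped like the one above might be consumed, the following sketch parses a shortened excerpt with Python's standard `json` module and pulls out the title, authors, DOI and page range. The excerpt keeps only a few keys from the record; the `summarize` helper and its output layout are hypothetical, not part of SciGraph.

```python
import json

# Shortened excerpt of the record above; only the keys used below are kept.
RECORD_EXCERPT = """
[
  {
    "author": [
      {"familyName": "ROUSSEEUW", "givenName": "PETER J.", "type": "Person"},
      {"familyName": "VAN DRIESSEN", "givenName": "KATRIEN", "type": "Person"}
    ],
    "name": "Computing LTS Regression for Large Data Sets",
    "pagination": "29-45",
    "productId": [
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1007/s10618-005-0024-4"]},
      {"name": "dimensions_id", "type": "PropertyValue",
       "value": ["pub.1014859262"]}
    ]
  }
]
"""


def summarize(record_json):
    """Return title, authors, DOI and pages from a one-element JSON-LD array."""
    record = json.loads(record_json)[0]
    authors = [f'{a["givenName"]} {a["familyName"]}'
               for a in record.get("author", [])]
    # productId is a list of name/value pairs; each value is itself a list.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    return {
        "title": record["name"],
        "authors": authors,
        "doi": ids.get("doi"),
        "pages": record.get("pagination"),
    }


summary = summarize(RECORD_EXCERPT)
```

Plain `json` parsing is enough here because the sketch reads fixed keys; resolving the `@context` would need a dedicated JSON-LD processor.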
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s10618-005-0024-4'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s10618-005-0024-4'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s10618-005-0024-4'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s10618-005-0024-4'
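The four curl commands differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. The block below only builds the request object and does not send it; the helper name `build_request` and the short format keys are hypothetical, while the URL and MIME types are taken from the commands above. Whether the endpoint still responds is not something this sketch checks.

```python
from urllib.request import Request

BASE_URL = "https://scigraph.springernature.com/"

# MIME types copied from the curl examples; the dictionary keys are made up.
MIME_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(record_id, fmt):
    """Build (but do not send) a GET request asking for one RDF serialization."""
    return Request(BASE_URL + record_id, headers={"Accept": MIME_TYPES[fmt]})


req = build_request("pub.10.1007/s10618-005-0024-4", "turtle")
```

Passing `req` to `urllib.request.urlopen` would perform the actual download, equivalent to the Turtle curl command.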


 

This table displays all metadata directly associated with this object as RDF triples.

129 TRIPLES      21 PREDICATES      46 URIs      19 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/s10618-005-0024-4 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author Nddc78b4e37004cefab52e9b085f435d0
4 schema:citation sg:pub.10.1007/bf00127126
5 sg:pub.10.1023/a:1009769707641
6 sg:pub.10.1023/a:1009783824328
7 https://doi.org/10.1016/0166-218x(86)90009-0
8 https://doi.org/10.1016/0167-9473(92)00070-8
9 https://doi.org/10.1016/b978-0-444-87877-9.50039-x
10 https://doi.org/10.1016/s0167-9473(98)00082-6
11 https://doi.org/10.1080/00401706.1997.10485436
12 https://doi.org/10.1080/00401706.1999.10485670
13 https://doi.org/10.1080/01621459.1984.10477105
14 https://doi.org/10.1080/01621459.1990.10474920
15 https://doi.org/10.1080/01621459.1992.10475224
16 https://doi.org/10.1080/01621459.1993.10476352
17 https://doi.org/10.1080/01621459.1994.10476821
18 https://doi.org/10.1109/59.496203
19 https://doi.org/10.1109/59.76693
20 https://doi.org/10.1137/0914076
21 https://doi.org/10.1214/aos/1176350366
22 https://doi.org/10.1214/lnms/1215454133
23 schema:datePublished 2006-01
24 schema:datePublishedReg 2006-01-01
25 schema:description Data mining aims to extract previously unknown patterns or substructures from large databases. In statistics, this is what methods of robust estimation and outlier detection were constructed for, see e.g. Rousseeuw and Leroy (1987). Here we will focus on least trimmed squares (LTS) regression, which is based on the subset of h cases (out of n) whose least squares fit possesses the smallest sum of squared residuals. The coverage h may be set between n/2 and n. The computation time of existing LTS algorithms grows too much with the size of the data set, precluding their use for data mining. In this paper we develop a new algorithm called FAST-LTS. The basic ideas are an inequality involving order statistics and sums of squared residuals, and techniques which we call ‘selective iteration’ and ‘nested extensions’. We also use an intercept adjustment technique to improve the precision. For small data sets FAST-LTS typically finds the exact LTS, whereas for larger data sets it gives more accurate results than existing algorithms for LTS and is faster by orders of magnitude. This allows us to apply FAST-LTS to large databases.
26 schema:genre research_article
27 schema:inLanguage en
28 schema:isAccessibleForFree false
29 schema:isPartOf N7b5c9b7e0c7740fdb19d5b0cff0f6e62
30 N9ce13d7b616a4162b75846bd38890929
31 sg:journal.1041853
32 schema:name Computing LTS Regression for Large Data Sets
33 schema:pagination 29-45
34 schema:productId N37b0ebc3b725474ba9ac31d270b66fe8
35 N4522020e06ea4f1281974b1c6fe705d9
36 Ndc047f7f899c43c3baa44f6dda4a5f57
37 schema:sameAs https://app.dimensions.ai/details/publication/pub.1014859262
38 https://doi.org/10.1007/s10618-005-0024-4
39 schema:sdDatePublished 2019-04-10T14:14
40 schema:sdLicense https://scigraph.springernature.com/explorer/license/
41 schema:sdPublisher N3e01127ace084f958a8890168a295cf5
42 schema:url http://link.springer.com/10.1007%2Fs10618-005-0024-4
43 sgo:license sg:explorer/license/
44 sgo:sdDataset articles
45 rdf:type schema:ScholarlyArticle
46 N37b0ebc3b725474ba9ac31d270b66fe8 schema:name doi
47 schema:value 10.1007/s10618-005-0024-4
48 rdf:type schema:PropertyValue
49 N3e01127ace084f958a8890168a295cf5 schema:name Springer Nature - SN SciGraph project
50 rdf:type schema:Organization
51 N4522020e06ea4f1281974b1c6fe705d9 schema:name dimensions_id
52 schema:value pub.1014859262
53 rdf:type schema:PropertyValue
54 N7b5c9b7e0c7740fdb19d5b0cff0f6e62 schema:volumeNumber 12
55 rdf:type schema:PublicationVolume
56 N9ce13d7b616a4162b75846bd38890929 schema:issueNumber 1
57 rdf:type schema:PublicationIssue
58 Nb307c534682a4230a476db12249f1a81 rdf:first sg:person.016127315362.74
59 rdf:rest rdf:nil
60 Ndc047f7f899c43c3baa44f6dda4a5f57 schema:name readcube_id
61 schema:value 46cb73b5059cdbd2c1f346742407baf8aac6d2e50d3f402b4d48e810a0ffefd3
62 rdf:type schema:PropertyValue
63 Nddc78b4e37004cefab52e9b085f435d0 rdf:first sg:person.0775337371.63
64 rdf:rest Nb307c534682a4230a476db12249f1a81
65 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
66 schema:name Information and Computing Sciences
67 rdf:type schema:DefinedTerm
68 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
69 schema:name Artificial Intelligence and Image Processing
70 rdf:type schema:DefinedTerm
71 sg:journal.1041853 schema:issn 1384-5810
72 1573-756X
73 schema:name Data Mining and Knowledge Discovery
74 rdf:type schema:Periodical
75 sg:person.016127315362.74 schema:affiliation https://www.grid.ac/institutes/grid.5284.b
76 schema:familyName VAN DRIESSEN
77 schema:givenName KATRIEN
78 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016127315362.74
79 rdf:type schema:Person
80 sg:person.0775337371.63 schema:affiliation https://www.grid.ac/institutes/grid.5284.b
81 schema:familyName ROUSSEEUW
82 schema:givenName PETER J.
83 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0775337371.63
84 rdf:type schema:Person
85 sg:pub.10.1007/bf00127126 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027830068
86 https://doi.org/10.1007/bf00127126
87 rdf:type schema:CreativeWork
88 sg:pub.10.1023/a:1009769707641 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027035492
89 https://doi.org/10.1023/a:1009769707641
90 rdf:type schema:CreativeWork
91 sg:pub.10.1023/a:1009783824328 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000292060
92 https://doi.org/10.1023/a:1009783824328
93 rdf:type schema:CreativeWork
94 https://doi.org/10.1016/0166-218x(86)90009-0 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019981143
95 rdf:type schema:CreativeWork
96 https://doi.org/10.1016/0167-9473(92)00070-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003170913
97 rdf:type schema:CreativeWork
98 https://doi.org/10.1016/b978-0-444-87877-9.50039-x schema:sameAs https://app.dimensions.ai/details/publication/pub.1043141706
99 rdf:type schema:CreativeWork
100 https://doi.org/10.1016/s0167-9473(98)00082-6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049486548
101 rdf:type schema:CreativeWork
102 https://doi.org/10.1080/00401706.1997.10485436 schema:sameAs https://app.dimensions.ai/details/publication/pub.1058287505
103 rdf:type schema:CreativeWork
104 https://doi.org/10.1080/00401706.1999.10485670 schema:sameAs https://app.dimensions.ai/details/publication/pub.1058287776
105 rdf:type schema:CreativeWork
106 https://doi.org/10.1080/01621459.1984.10477105 schema:sameAs https://app.dimensions.ai/details/publication/pub.1058302950
107 rdf:type schema:CreativeWork
108 https://doi.org/10.1080/01621459.1990.10474920 schema:sameAs https://app.dimensions.ai/details/publication/pub.1058303860
109 rdf:type schema:CreativeWork
110 https://doi.org/10.1080/01621459.1992.10475224 schema:sameAs https://app.dimensions.ai/details/publication/pub.1058304253
111 rdf:type schema:CreativeWork
112 https://doi.org/10.1080/01621459.1993.10476352 schema:sameAs https://app.dimensions.ai/details/publication/pub.1058304436
113 rdf:type schema:CreativeWork
114 https://doi.org/10.1080/01621459.1994.10476821 schema:sameAs https://app.dimensions.ai/details/publication/pub.1058304685
115 rdf:type schema:CreativeWork
116 https://doi.org/10.1109/59.496203 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061193891
117 rdf:type schema:CreativeWork
118 https://doi.org/10.1109/59.76693 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061194675
119 rdf:type schema:CreativeWork
120 https://doi.org/10.1137/0914076 schema:sameAs https://app.dimensions.ai/details/publication/pub.1062857597
121 rdf:type schema:CreativeWork
122 https://doi.org/10.1214/aos/1176350366 schema:sameAs https://app.dimensions.ai/details/publication/pub.1064409134
123 rdf:type schema:CreativeWork
124 https://doi.org/10.1214/lnms/1215454133 schema:sameAs https://app.dimensions.ai/details/publication/pub.1086780837
125 rdf:type schema:CreativeWork
126 https://www.grid.ac/institutes/grid.5284.b schema:alternateName University of Antwerp
127 schema:name Department of Mathematics and Computer Science, Universiteit Antwerpen, Middelheimlaan 1, B-2020, Antwerpen, Belgium
128 Faculty of Applied Economics, Universiteit Antwerpen, Prinsstraat 13, B-2000, Antwerpen, Belgium
129 rdf:type schema:Organization
 



