Discretization: An Enabling Technique


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2002-10

AUTHORS

Huan Liu, Farhad Hussain, Chew Lim Tan, Manoranjan Dash

ABSTRACT

Discrete values have important roles in data mining and knowledge discovery. They are about intervals of numbers which are more concise to represent and specify, easier to use and comprehend as they are closer to a knowledge-level representation than continuous values. Many studies show induction tasks can benefit from discretization: rules with discrete values are normally shorter and more understandable and discretization can lead to improved predictive accuracy. Furthermore, many induction algorithms found in the literature require discrete features. All these prompt researchers and practitioners to discretize continuous features before or during a machine learning or data mining task. There are numerous discretization methods available in the literature. It is time for us to examine these seemingly different methods for discretization and find out how different they really are, what are the key components of a discretization process, how we can improve the current level of research for new development as well as the use of existing methods. This paper aims at a systematic study of discretization methods with their history of development, effect on classification, and trade-off between speed and accuracy. Contributions of this paper are an abstract description summarizing existing discretization methods, a hierarchical framework to categorize the existing methods and pave the way for further development, concise discussions of representative discretization methods, extensive experiments and their analysis, and some guidelines as to how to choose a discretization method under various circumstances. We also identify some issues yet to solve and future research for discretization.

PAGES

393-423

Identifiers

URI

http://scigraph.springernature.com/pub.10.1023/a:1016304305535

DOI

http://dx.doi.org/10.1023/a:1016304305535

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1051417652



JSON-LD is the canonical representation for SciGraph data.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "National University of Singapore", 
          "id": "https://www.grid.ac/institutes/grid.4280.e", 
          "name": [
            "School of Computing, National University of Singapore, Singapore"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Liu", 
        "givenName": "Huan", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "National University of Singapore", 
          "id": "https://www.grid.ac/institutes/grid.4280.e", 
          "name": [
            "School of Computing, National University of Singapore, Singapore"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Hussain", 
        "givenName": "Farhad", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "National University of Singapore", 
          "id": "https://www.grid.ac/institutes/grid.4280.e", 
          "name": [
            "School of Computing, National University of Singapore, Singapore"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Tan", 
        "givenName": "Chew Lim", 
        "id": "sg:person.01120373366.14", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01120373366.14"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "National University of Singapore", 
          "id": "https://www.grid.ac/institutes/grid.4280.e", 
          "name": [
            "School of Computing, National University of Singapore, Singapore"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Dash", 
        "givenName": "Manoranjan", 
        "id": "sg:person.012062261630.38", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012062261630.38"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1007/bfb0095274", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1002863066", 
          "https://doi.org/10.1007/bfb0095274"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1023/a:1022631118932", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1006996698", 
          "https://doi.org/10.1023/a:1022631118932"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/180139.181016", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1007088478"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf00994007", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1018294482", 
          "https://doi.org/10.1007/bf00994007"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf00116251", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1019422208", 
          "https://doi.org/10.1007/bf00116251"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bfb0017012", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1020632806", 
          "https://doi.org/10.1007/bfb0017012"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1023/a:1022699900025", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1021865543", 
          "https://doi.org/10.1023/a:1022699900025"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/b978-1-55860-377-6.50032-3", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1027065619"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/b978-1-55860-335-6.50039-8", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1027532884"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1080/09528139008953718", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1042251591"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/b978-1-55860-377-6.50063-3", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1047602398"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/b978-1-55860-377-6.50038-4", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1051141133"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/3-540-59286-5_81", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1052494318", 
          "https://doi.org/10.1007/3-540-59286-5_81"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/b978-1-55860-335-6.50023-4", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1052966378"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1080/01621459.1983.10477973", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1058302834"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/34.88569", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061157172"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/69.617056", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061213619"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.2307/1403680", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1069473952"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tai.1995.479783", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094009676"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1613/jair.279", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1105538422"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2002-10", 
    "datePublishedReg": "2002-10-01", 
    "description": "Discrete values have important roles in data mining and knowledge discovery. They are about intervals of numbers which are more concise to represent and specify, easier to use and comprehend as they are closer to a knowledge-level representation than continuous values. Many studies show induction tasks can benefit from discretization: rules with discrete values are normally shorter and more understandable and discretization can lead to improved predictive accuracy. Furthermore, many induction algorithms found in the literature require discrete features. All these prompt researchers and practitioners to discretize continuous features before or during a machine learning or data mining task. There are numerous discretization methods available in the literature. It is time for us to examine these seemingly different methods for discretization and find out how different they really are, what are the key components of a discretization process, how we can improve the current level of research for new development as well as the use of existing methods. This paper aims at a systematic study of discretization methods with their history of development, effect on classification, and trade-off between speed and accuracy. Contributions of this paper are an abstract description summarizing existing discretization methods, a hierarchical framework to categorize the existing methods and pave the way for further development, concise discussions of representative discretization methods, extensive experiments and their analysis, and some guidelines as to how to choose a discretization method under various circumstances. We also identify some issues yet to solve and future research for discretization.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1023/a:1016304305535", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": [
      {
        "id": "sg:journal.1041853", 
        "issn": [
          "1384-5810", 
          "1573-756X"
        ], 
        "name": "Data Mining and Knowledge Discovery", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "4", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "6"
      }
    ], 
    "name": "Discretization: An Enabling Technique", 
    "pagination": "393-423", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "f4eeb1a1b7a7e4aeddf7c74e1ad6b22ad43103b23e547477508b48349e506bc1"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1023/a:1016304305535"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1051417652"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1023/a:1016304305535", 
      "https://app.dimensions.ai/details/publication/pub.1051417652"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-10T20:02", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8681_00000537.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1023%2FA%3A1016304305535"
  }
]
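
As a minimal sketch of how a consumer might read this record, the Python below parses an abridged copy of the JSON-LD above and pulls out the title, the author names, and a DOI link for each citation. The function name `summarize` and the abridged excerpt are illustrative assumptions, not part of SciGraph; only field names that appear in the record are used.

```python
import json

# Abridged excerpt of the record above; only the fields used here are kept.
RECORD_JSON = """
[
  {
    "id": "sg:pub.10.1023/a:1016304305535",
    "name": "Discretization: An Enabling Technique",
    "datePublished": "2002-10",
    "pagination": "393-423",
    "author": [
      {"familyName": "Liu", "givenName": "Huan", "type": "Person"},
      {"familyName": "Hussain", "givenName": "Farhad", "type": "Person"},
      {"familyName": "Tan", "givenName": "Chew Lim", "type": "Person"},
      {"familyName": "Dash", "givenName": "Manoranjan", "type": "Person"}
    ],
    "citation": [
      {
        "id": "sg:pub.10.1007/bfb0095274",
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1002863066",
          "https://doi.org/10.1007/bfb0095274"
        ],
        "type": "CreativeWork"
      },
      {
        "id": "https://doi.org/10.1145/180139.181016",
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1007088478"
        ],
        "type": "CreativeWork"
      }
    ]
  }
]
"""

DOI_PREFIX = "https://doi.org/"


def summarize(record):
    """Return title, author names, and one DOI link per citation (hypothetical helper)."""
    authors = [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]
    cited_dois = []
    for citation in record.get("citation", []):
        # The DOI link may be the citation's own id or one of its sameAs links.
        candidates = [citation["id"], *citation.get("sameAs", [])]
        for link in candidates:
            if link.startswith(DOI_PREFIX):
                cited_dois.append(link)
                break
    return {"title": record["name"], "authors": authors, "cited_dois": cited_dois}


# The top-level JSON value is a list holding a single record object.
records = json.loads(RECORD_JSON)
summary = summarize(records[0])
print(summary["title"])
print(", ".join(summary["authors"]))
```

Citations in the record carry their DOI in one of two places, which is why the sketch checks both `id` and `sameAs`.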
 

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1023/a:1016304305535'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1023/a:1016304305535'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1023/a:1016304305535'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1023/a:1016304305535'
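
The four curl commands differ only in their `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The names `ACCEPT_BY_FORMAT` and `build_request` are hypothetical, chosen for illustration; the URL and media types are taken from the commands above. The sketch only builds the request objects and does not contact the server, so it makes no claim about whether the endpoint still responds.

```python
from urllib.request import Request

# URL and media types copied from the curl commands above.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1023/a:1016304305535"

ACCEPT_BY_FORMAT = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request asking for one RDF serialization."""
    if fmt not in ACCEPT_BY_FORMAT:
        raise ValueError(f"unknown format: {fmt!r}")
    return Request(url, headers={"Accept": ACCEPT_BY_FORMAT[fmt]})


turtle_request = build_request("turtle")
print(turtle_request.full_url)
print(turtle_request.get_header("Accept"))
```

To actually fetch the data, a request built this way could be passed to `urllib.request.urlopen`, with error handling added for network failures.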


 

This table displays all metadata directly associated with this object as RDF triples.

147 TRIPLES      21 PREDICATES      47 URIs      19 LITERALS      7 BLANK NODES

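# Subject Predicate Object (a row that omits the subject, or both the subject and the predicate, continues the row above it)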
1 sg:pub.10.1023/a:1016304305535 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author N6737ad5efaf143ceae2b846dcc67b7f7
4 schema:citation sg:pub.10.1007/3-540-59286-5_81
5 sg:pub.10.1007/bf00116251
6 sg:pub.10.1007/bf00994007
7 sg:pub.10.1007/bfb0017012
8 sg:pub.10.1007/bfb0095274
9 sg:pub.10.1023/a:1022631118932
10 sg:pub.10.1023/a:1022699900025
11 https://doi.org/10.1016/b978-1-55860-335-6.50023-4
12 https://doi.org/10.1016/b978-1-55860-335-6.50039-8
13 https://doi.org/10.1016/b978-1-55860-377-6.50032-3
14 https://doi.org/10.1016/b978-1-55860-377-6.50038-4
15 https://doi.org/10.1016/b978-1-55860-377-6.50063-3
16 https://doi.org/10.1080/01621459.1983.10477973
17 https://doi.org/10.1080/09528139008953718
18 https://doi.org/10.1109/34.88569
19 https://doi.org/10.1109/69.617056
20 https://doi.org/10.1109/tai.1995.479783
21 https://doi.org/10.1145/180139.181016
22 https://doi.org/10.1613/jair.279
23 https://doi.org/10.2307/1403680
24 schema:datePublished 2002-10
25 schema:datePublishedReg 2002-10-01
26 schema:description Discrete values have important roles in data mining and knowledge discovery. They are about intervals of numbers which are more concise to represent and specify, easier to use and comprehend as they are closer to a knowledge-level representation than continuous values. Many studies show induction tasks can benefit from discretization: rules with discrete values are normally shorter and more understandable and discretization can lead to improved predictive accuracy. Furthermore, many induction algorithms found in the literature require discrete features. All these prompt researchers and practitioners to discretize continuous features before or during a machine learning or data mining task. There are numerous discretization methods available in the literature. It is time for us to examine these seemingly different methods for discretization and find out how different they really are, what are the key components of a discretization process, how we can improve the current level of research for new development as well as the use of existing methods. This paper aims at a systematic study of discretization methods with their history of development, effect on classification, and trade-off between speed and accuracy. Contributions of this paper are an abstract description summarizing existing discretization methods, a hierarchical framework to categorize the existing methods and pave the way for further development, concise discussions of representative discretization methods, extensive experiments and their analysis, and some guidelines as to how to choose a discretization method under various circumstances. We also identify some issues yet to solve and future research for discretization.
27 schema:genre research_article
28 schema:inLanguage en
29 schema:isAccessibleForFree false
30 schema:isPartOf N320eabaa87b64960b02f18918542a7ef
31 N3af6a3d894d5492783e89d8d02eb7874
32 sg:journal.1041853
33 schema:name Discretization: An Enabling Technique
34 schema:pagination 393-423
35 schema:productId N2e0413764b54482f8e9edc6af2f8232b
36 Na43f4975c12f4764ac8a3a7c997181d4
37 Nd4c52fc6ba244bac9f3c017c1999ac5f
38 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051417652
39 https://doi.org/10.1023/a:1016304305535
40 schema:sdDatePublished 2019-04-10T20:02
41 schema:sdLicense https://scigraph.springernature.com/explorer/license/
42 schema:sdPublisher N65367849c2bf4e64b211324c0e9351b3
43 schema:url http://link.springer.com/10.1023%2FA%3A1016304305535
44 sgo:license sg:explorer/license/
45 sgo:sdDataset articles
46 rdf:type schema:ScholarlyArticle
47 N2a031ba01bdb4e94ac5acbc79bee20b5 rdf:first sg:person.012062261630.38
48 rdf:rest rdf:nil
49 N2e0413764b54482f8e9edc6af2f8232b schema:name doi
50 schema:value 10.1023/a:1016304305535
51 rdf:type schema:PropertyValue
52 N320eabaa87b64960b02f18918542a7ef schema:issueNumber 4
53 rdf:type schema:PublicationIssue
54 N39519bc9eeb04b6f8a89f112fb3eccb5 rdf:first sg:person.01120373366.14
55 rdf:rest N2a031ba01bdb4e94ac5acbc79bee20b5
56 N3af6a3d894d5492783e89d8d02eb7874 schema:volumeNumber 6
57 rdf:type schema:PublicationVolume
58 N3d637b3dea9a471fa80b4067e2ef0ab6 rdf:first N442410d9ef124c1ab46bb1a4aba67918
59 rdf:rest N39519bc9eeb04b6f8a89f112fb3eccb5
60 N442410d9ef124c1ab46bb1a4aba67918 schema:affiliation https://www.grid.ac/institutes/grid.4280.e
61 schema:familyName Hussain
62 schema:givenName Farhad
63 rdf:type schema:Person
64 N5aea68ed8a6e47288e9b22eda9483e67 schema:affiliation https://www.grid.ac/institutes/grid.4280.e
65 schema:familyName Liu
66 schema:givenName Huan
67 rdf:type schema:Person
68 N65367849c2bf4e64b211324c0e9351b3 schema:name Springer Nature - SN SciGraph project
69 rdf:type schema:Organization
70 N6737ad5efaf143ceae2b846dcc67b7f7 rdf:first N5aea68ed8a6e47288e9b22eda9483e67
71 rdf:rest N3d637b3dea9a471fa80b4067e2ef0ab6
72 Na43f4975c12f4764ac8a3a7c997181d4 schema:name dimensions_id
73 schema:value pub.1051417652
74 rdf:type schema:PropertyValue
75 Nd4c52fc6ba244bac9f3c017c1999ac5f schema:name readcube_id
76 schema:value f4eeb1a1b7a7e4aeddf7c74e1ad6b22ad43103b23e547477508b48349e506bc1
77 rdf:type schema:PropertyValue
78 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
79 schema:name Information and Computing Sciences
80 rdf:type schema:DefinedTerm
81 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
82 schema:name Artificial Intelligence and Image Processing
83 rdf:type schema:DefinedTerm
84 sg:journal.1041853 schema:issn 1384-5810
85 1573-756X
86 schema:name Data Mining and Knowledge Discovery
87 rdf:type schema:Periodical
88 sg:person.01120373366.14 schema:affiliation https://www.grid.ac/institutes/grid.4280.e
89 schema:familyName Tan
90 schema:givenName Chew Lim
91 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01120373366.14
92 rdf:type schema:Person
93 sg:person.012062261630.38 schema:affiliation https://www.grid.ac/institutes/grid.4280.e
94 schema:familyName Dash
95 schema:givenName Manoranjan
96 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012062261630.38
97 rdf:type schema:Person
98 sg:pub.10.1007/3-540-59286-5_81 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052494318
99 https://doi.org/10.1007/3-540-59286-5_81
100 rdf:type schema:CreativeWork
101 sg:pub.10.1007/bf00116251 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019422208
102 https://doi.org/10.1007/bf00116251
103 rdf:type schema:CreativeWork
104 sg:pub.10.1007/bf00994007 schema:sameAs https://app.dimensions.ai/details/publication/pub.1018294482
105 https://doi.org/10.1007/bf00994007
106 rdf:type schema:CreativeWork
107 sg:pub.10.1007/bfb0017012 schema:sameAs https://app.dimensions.ai/details/publication/pub.1020632806
108 https://doi.org/10.1007/bfb0017012
109 rdf:type schema:CreativeWork
110 sg:pub.10.1007/bfb0095274 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002863066
111 https://doi.org/10.1007/bfb0095274
112 rdf:type schema:CreativeWork
113 sg:pub.10.1023/a:1022631118932 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006996698
114 https://doi.org/10.1023/a:1022631118932
115 rdf:type schema:CreativeWork
116 sg:pub.10.1023/a:1022699900025 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021865543
117 https://doi.org/10.1023/a:1022699900025
118 rdf:type schema:CreativeWork
119 https://doi.org/10.1016/b978-1-55860-335-6.50023-4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052966378
120 rdf:type schema:CreativeWork
121 https://doi.org/10.1016/b978-1-55860-335-6.50039-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027532884
122 rdf:type schema:CreativeWork
123 https://doi.org/10.1016/b978-1-55860-377-6.50032-3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027065619
124 rdf:type schema:CreativeWork
125 https://doi.org/10.1016/b978-1-55860-377-6.50038-4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051141133
126 rdf:type schema:CreativeWork
127 https://doi.org/10.1016/b978-1-55860-377-6.50063-3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047602398
128 rdf:type schema:CreativeWork
129 https://doi.org/10.1080/01621459.1983.10477973 schema:sameAs https://app.dimensions.ai/details/publication/pub.1058302834
130 rdf:type schema:CreativeWork
131 https://doi.org/10.1080/09528139008953718 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042251591
132 rdf:type schema:CreativeWork
133 https://doi.org/10.1109/34.88569 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061157172
134 rdf:type schema:CreativeWork
135 https://doi.org/10.1109/69.617056 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061213619
136 rdf:type schema:CreativeWork
137 https://doi.org/10.1109/tai.1995.479783 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094009676
138 rdf:type schema:CreativeWork
139 https://doi.org/10.1145/180139.181016 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007088478
140 rdf:type schema:CreativeWork
141 https://doi.org/10.1613/jair.279 schema:sameAs https://app.dimensions.ai/details/publication/pub.1105538422
142 rdf:type schema:CreativeWork
143 https://doi.org/10.2307/1403680 schema:sameAs https://app.dimensions.ai/details/publication/pub.1069473952
144 rdf:type schema:CreativeWork
145 https://www.grid.ac/institutes/grid.4280.e schema:alternateName National University of Singapore
146 schema:name School of Computing, National University of Singapore, Singapore
147 rdf:type schema:Organization
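
In the table, the author order is not stored as an array. The `schema:author` object is a blank node that heads an RDF collection, and each node points to one author through `rdf:first` and to the next node through `rdf:rest`, ending at `rdf:nil`. As a hedged illustration, the Python below copies the relevant rows from the table as plain tuples and walks that chain. The helper `walk_list` and the tuple representation are assumptions made for this sketch; a real application would more likely use an RDF library.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

# (subject, predicate, object) rows copied from the table above.
TRIPLES = [
    ("sg:pub.10.1023/a:1016304305535", "schema:author", "N6737ad5efaf143ceae2b846dcc67b7f7"),
    ("N6737ad5efaf143ceae2b846dcc67b7f7", RDF_FIRST, "N5aea68ed8a6e47288e9b22eda9483e67"),
    ("N6737ad5efaf143ceae2b846dcc67b7f7", RDF_REST, "N3d637b3dea9a471fa80b4067e2ef0ab6"),
    ("N3d637b3dea9a471fa80b4067e2ef0ab6", RDF_FIRST, "N442410d9ef124c1ab46bb1a4aba67918"),
    ("N3d637b3dea9a471fa80b4067e2ef0ab6", RDF_REST, "N39519bc9eeb04b6f8a89f112fb3eccb5"),
    ("N39519bc9eeb04b6f8a89f112fb3eccb5", RDF_FIRST, "sg:person.01120373366.14"),
    ("N39519bc9eeb04b6f8a89f112fb3eccb5", RDF_REST, "N2a031ba01bdb4e94ac5acbc79bee20b5"),
    ("N2a031ba01bdb4e94ac5acbc79bee20b5", RDF_FIRST, "sg:person.012062261630.38"),
    ("N2a031ba01bdb4e94ac5acbc79bee20b5", RDF_REST, RDF_NIL),
    ("N5aea68ed8a6e47288e9b22eda9483e67", "schema:familyName", "Liu"),
    ("N442410d9ef124c1ab46bb1a4aba67918", "schema:familyName", "Hussain"),
    ("sg:person.01120373366.14", "schema:familyName", "Tan"),
    ("sg:person.012062261630.38", "schema:familyName", "Dash"),
]


def walk_list(triples, head):
    """Follow rdf:first / rdf:rest links from `head` and return the items in order."""
    lookup = {(s, p): o for s, p, o in triples}
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:
            raise ValueError("cycle in RDF list")
        seen.add(node)
        items.append(lookup[(node, RDF_FIRST)])
        node = lookup[(node, RDF_REST)]
    return items


lookup = {(s, p): o for s, p, o in TRIPLES}
head = lookup[("sg:pub.10.1023/a:1016304305535", "schema:author")]
author_nodes = walk_list(TRIPLES, head)
family_names = [lookup[(node, "schema:familyName")] for node in author_nodes]
print(family_names)
```

The resulting order matches the author list given in the JSON-LD record.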
 



