2002-10
AUTHORS: Huan Liu, Farhad Hussain, Chew Lim Tan, Manoranjan Dash
ABSTRACT: Discrete values have important roles in data mining and knowledge discovery. They are about intervals of numbers which are more concise to represent and specify, easier to use and comprehend as they are closer to a knowledge-level representation than continuous values. Many studies show induction tasks can benefit from discretization: rules with discrete values are normally shorter and more understandable and discretization can lead to improved predictive accuracy. Furthermore, many induction algorithms found in the literature require discrete features. All these prompt researchers and practitioners to discretize continuous features before or during a machine learning or data mining task. There are numerous discretization methods available in the literature. It is time for us to examine these seemingly different methods for discretization and find out how different they really are, what are the key components of a discretization process, how we can improve the current level of research for new development as well as the use of existing methods. This paper aims at a systematic study of discretization methods with their history of development, effect on classification, and trade-off between speed and accuracy. Contributions of this paper are an abstract description summarizing existing discretization methods, a hierarchical framework to categorize the existing methods and pave the way for further development, concise discussions of representative discretization methods, extensive experiments and their analysis, and some guidelines as to how to choose a discretization method under various circumstances. We also identify some issues yet to solve and future research for discretization.
PAGES: 393-423
http://scigraph.springernature.com/pub.10.1023/a:1016304305535
DOI: http://dx.doi.org/10.1023/a:1016304305535
DIMENSIONS: https://app.dimensions.ai/details/publication/pub.1051417652
JSON-LD is the canonical representation for SciGraph data.
[
{
"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
"about": [
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Artificial Intelligence and Image Processing",
"type": "DefinedTerm"
},
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Information and Computing Sciences",
"type": "DefinedTerm"
}
],
"author": [
{
"affiliation": {
"alternateName": "National University of Singapore",
"id": "https://www.grid.ac/institutes/grid.4280.e",
"name": [
"School of Computing, National University of Singapore, Singapore"
],
"type": "Organization"
},
"familyName": "Liu",
"givenName": "Huan",
"type": "Person"
},
{
"affiliation": {
"alternateName": "National University of Singapore",
"id": "https://www.grid.ac/institutes/grid.4280.e",
"name": [
"School of Computing, National University of Singapore, Singapore"
],
"type": "Organization"
},
"familyName": "Hussain",
"givenName": "Farhad",
"type": "Person"
},
{
"affiliation": {
"alternateName": "National University of Singapore",
"id": "https://www.grid.ac/institutes/grid.4280.e",
"name": [
"School of Computing, National University of Singapore, Singapore"
],
"type": "Organization"
},
"familyName": "Tan",
"givenName": "Chew Lim",
"id": "sg:person.01120373366.14",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01120373366.14"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "National University of Singapore",
"id": "https://www.grid.ac/institutes/grid.4280.e",
"name": [
"School of Computing, National University of Singapore, Singapore"
],
"type": "Organization"
},
"familyName": "Dash",
"givenName": "Manoranjan",
"id": "sg:person.012062261630.38",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012062261630.38"
],
"type": "Person"
}
],
"citation": [
{
"id": "sg:pub.10.1007/bfb0095274",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1002863066",
"https://doi.org/10.1007/bfb0095274"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1023/a:1022631118932",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1006996698",
"https://doi.org/10.1023/a:1022631118932"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1145/180139.181016",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1007088478"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/bf00994007",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1018294482",
"https://doi.org/10.1007/bf00994007"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/bf00116251",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1019422208",
"https://doi.org/10.1007/bf00116251"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/bfb0017012",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1020632806",
"https://doi.org/10.1007/bfb0017012"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1023/a:1022699900025",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1021865543",
"https://doi.org/10.1023/a:1022699900025"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/b978-1-55860-377-6.50032-3",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1027065619"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/b978-1-55860-335-6.50039-8",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1027532884"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1080/09528139008953718",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1042251591"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/b978-1-55860-377-6.50063-3",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1047602398"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/b978-1-55860-377-6.50038-4",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1051141133"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/3-540-59286-5_81",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1052494318",
"https://doi.org/10.1007/3-540-59286-5_81"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1016/b978-1-55860-335-6.50023-4",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1052966378"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1080/01621459.1983.10477973",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1058302834"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1109/34.88569",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1061157172"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1109/69.617056",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1061213619"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.2307/1403680",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1069473952"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1109/tai.1995.479783",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1094009676"
],
"type": "CreativeWork"
},
{
"id": "https://doi.org/10.1613/jair.279",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1105538422"
],
"type": "CreativeWork"
}
],
"datePublished": "2002-10",
"datePublishedReg": "2002-10-01",
"description": "Discrete values have important roles in data mining and knowledge discovery. They are about intervals of numbers which are more concise to represent and specify, easier to use and comprehend as they are closer to a knowledge-level representation than continuous values. Many studies show induction tasks can benefit from discretization: rules with discrete values are normally shorter and more understandable and discretization can lead to improved predictive accuracy. Furthermore, many induction algorithms found in the literature require discrete features. All these prompt researchers and practitioners to discretize continuous features before or during a machine learning or data mining task. There are numerous discretization methods available in the literature. It is time for us to examine these seemingly different methods for discretization and find out how different they really are, what are the key components of a discretization process, how we can improve the current level of research for new development as well as the use of existing methods. This paper aims at a systematic study of discretization methods with their history of development, effect on classification, and trade-off between speed and accuracy. Contributions of this paper are an abstract description summarizing existing discretization methods, a hierarchical framework to categorize the existing methods and pave the way for further development, concise discussions of representative discretization methods, extensive experiments and their analysis, and some guidelines as to how to choose a discretization method under various circumstances. We also identify some issues yet to solve and future research for discretization.",
"genre": "research_article",
"id": "sg:pub.10.1023/a:1016304305535",
"inLanguage": [
"en"
],
"isAccessibleForFree": false,
"isPartOf": [
{
"id": "sg:journal.1041853",
"issn": [
"1384-5810",
"1573-756X"
],
"name": "Data Mining and Knowledge Discovery",
"type": "Periodical"
},
{
"issueNumber": "4",
"type": "PublicationIssue"
},
{
"type": "PublicationVolume",
"volumeNumber": "6"
}
],
"name": "Discretization: An Enabling Technique",
"pagination": "393-423",
"productId": [
{
"name": "readcube_id",
"type": "PropertyValue",
"value": [
"f4eeb1a1b7a7e4aeddf7c74e1ad6b22ad43103b23e547477508b48349e506bc1"
]
},
{
"name": "doi",
"type": "PropertyValue",
"value": [
"10.1023/a:1016304305535"
]
},
{
"name": "dimensions_id",
"type": "PropertyValue",
"value": [
"pub.1051417652"
]
}
],
"sameAs": [
"https://doi.org/10.1023/a:1016304305535",
"https://app.dimensions.ai/details/publication/pub.1051417652"
],
"sdDataset": "articles",
"sdDatePublished": "2019-04-10T20:02",
"sdLicense": "https://scigraph.springernature.com/explorer/license/",
"sdPublisher": {
"name": "Springer Nature - SN SciGraph project",
"type": "Organization"
},
"sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8681_00000537.jsonl",
"type": "ScholarlyArticle",
"url": "http://link.springer.com/10.1023%2FA%3A1016304305535"
}
]
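The JSON-LD record is plain JSON, so it can be read with a standard JSON parser. The following is a minimal sketch of pulling the title, author names, and cited DOIs out of it. The embedded string is a trimmed copy of the record (two of the twenty citations, only the fields used), and the helper names `author_names` and `citation_dois` are hypothetical, chosen for illustration.

```python
import json

# Trimmed copy of the record: only the fields read below are kept,
# and only two of the citation entries are included.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Liu", "givenName": "Huan", "type": "Person"},
      {"familyName": "Hussain", "givenName": "Farhad", "type": "Person"},
      {"familyName": "Tan", "givenName": "Chew Lim", "type": "Person"},
      {"familyName": "Dash", "givenName": "Manoranjan", "type": "Person"}
    ],
    "citation": [
      {"id": "sg:pub.10.1007/bfb0095274",
       "sameAs": ["https://app.dimensions.ai/details/publication/pub.1002863066",
                  "https://doi.org/10.1007/bfb0095274"]},
      {"id": "https://doi.org/10.1145/180139.181016",
       "sameAs": ["https://app.dimensions.ai/details/publication/pub.1007088478"]}
    ],
    "name": "Discretization: An Enabling Technique",
    "pagination": "393-423"
  }
]
"""

DOI_PREFIX = "https://doi.org/"


def author_names(record):
    """Return 'givenName familyName' for each author, in record order."""
    return [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]


def citation_dois(record):
    """Return one DOI per cited work, taken from 'id' or 'sameAs'."""
    dois = []
    for cited in record.get("citation", []):
        # The DOI URL appears either as the entry's id or among its sameAs links.
        candidates = [cited.get("id", "")] + cited.get("sameAs", [])
        for uri in candidates:
            if uri.startswith(DOI_PREFIX):
                dois.append(uri[len(DOI_PREFIX):])
                break
    return dois


# The top-level JSON value is a one-element list holding the record.
record = json.loads(RECORD_JSON)[0]
print(record["name"])
print(author_names(record))
print(citation_dois(record))
```

Checking both `id` and `sameAs` matters because the record stores the DOI URL in either place, depending on whether the cited work has a SciGraph identifier of its own.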
JSON-LD is a popular format for linked data which is fully compatible with JSON.
curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1023/a:1016304305535'
N-Triples is a line-based linked data format ideal for batch operations.
curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1023/a:1016304305535'
Turtle is a human-readable linked data format.
curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1023/a:1016304305535'
RDF/XML is a standard XML format for linked data.
curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1023/a:1016304305535'
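The four curl commands differ only in their `Accept` header, which is ordinary HTTP content negotiation. As a rough sketch, the same requests can be prepared with Python's standard library. The dictionary keys and the `build_request` helper are hypothetical names chosen for illustration, and the requests are only constructed, not sent, since whether the endpoint still responds is not something this record confirms.

```python
from urllib.request import Request

# URL taken from the curl commands above.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1023/a:1016304305535"

# Media types copied from the curl commands; the keys are illustrative labels.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Prepare (but do not send) a GET request asking for one RDF serialization."""
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


requests_by_format = {fmt: build_request(fmt) for fmt in ACCEPT_TYPES}
for fmt, req in requests_by_format.items():
    print(fmt, req.get_method(), req.get_header("Accept"))
```

To actually fetch a serialization, a prepared request can be passed to `urllib.request.urlopen`; that step is left out here because it requires network access.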
The RDF metadata directly associated with this object comprises 147 triples, 21 predicates, 47 URIs, 19 literals, and 7 blank nodes.