Prior Data Quality Management in Data Mining Process


Ontology type: schema:Chapter     


Chapter Info

DATE

2015

AUTHORS

Mamadou S. Camara, Djasrabe Naguingar, Alassane Bah

ABSTRACT

Data Mining (DM) projects are implemented by following the knowledge discovery process. Several techniques for detecting and handling data quality problems such as missing data, outliers, inconsistent data or time-variant data, can be found in the literature of DM and Data Warehousing (DW). Tasks that are related to the quality of data are mostly in the Data Understanding and in the Data Preparation phases of the DM process. The main limitation in the application of the data quality management techniques is the complexity caused by a lack of anticipation in the detection and resolution of the problems. A DM process model designed for the prior management of data quality is proposed in this work. In this model, the DM process is defined in relation to the Software Engineering (SE) process; the two processes are combined in parallel. The main contribution of this DM process is the anticipation and the automation of all activities necessary to remove data quality problems.

PAGES

299-307

Book

TITLE

New Trends in Networking, Computing, E-learning, Systems Sciences, and Engineering

ISBN

978-3-319-06763-6
978-3-319-06764-3

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-319-06764-3_37

DOI

http://dx.doi.org/10.1007/978-3-319-06764-3_37

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1048771278



JSON-LD is the canonical representation for SciGraph data.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Cheikh Anta Diop University", 
          "id": "https://www.grid.ac/institutes/grid.8191.1", 
          "name": [
            "Laboratoire d\u2019Informatique, R\u00e9seaux et T\u00e9l\u00e9coms (LIRT), Ecole Sup\u00e9rieure Polytechnique, Universit\u00e9 Cheikh Anta Diop de Dakar, BP 5085\u00a0dakar-fann, Dakar, Senegal"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Camara", 
        "givenName": "Mamadou S.", 
        "id": "sg:person.07364533341.41", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07364533341.41"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Cheikh Anta Diop University", 
          "id": "https://www.grid.ac/institutes/grid.8191.1", 
          "name": [
            "Laboratoire d\u2019Imagerie M\u00e9dicale et de BioInformatique (LIMBI), Ecole Sup\u00e9rieure Polytechnique, Universit\u00e9 Cheikh Anta Diop de Dakar, BP 5085\u00a0dakar-fann, Dakar, Senegal"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Naguingar", 
        "givenName": "Djasrabe", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Cheikh Anta Diop University", 
          "id": "https://www.grid.ac/institutes/grid.8191.1", 
          "name": [
            "UMI 209, UMMISCO - UCAD, Ecole Sup\u00e9rieure Polytechnique, Universit\u00e9 Cheikh Anta Diop de Dakar, BP 15915\u00a0Dakar-Fann, Senegal"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Bah", 
        "givenName": "Alassane", 
        "id": "sg:person.0720633131.58", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0720633131.58"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1016/j.datak.2007.06.020", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1001547036"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.is.2008.04.003", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1003686169"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.apm.2012.11.015", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1003872549"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/303976.303983", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1009681559"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.eswa.2012.02.044", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1019974423"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.datak.2012.04.002", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1020495800"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/331499.331504", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1026347712"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.jom.2005.03.001", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1027127788"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/1147376.1147391", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1027278261"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf02925480", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1028202005", 
          "https://doi.org/10.1007/bf02925480"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.datak.2009.08.008", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1032807948"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.patrec.2011.07.002", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1035943831"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.artint.2008.07.004", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1046048643"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-94-015-3994-4", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1053674345", 
          "https://doi.org/10.1007/978-94-015-3994-4"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1063/1.2995737", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1057891654"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.4018/978-1-59904-387-6", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1096031270"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2015", 
    "datePublishedReg": "2015-01-01", 
    "description": "Data Mining (DM) projects are implemented by following the knowledge discovery process. Several techniques for detecting and handling data quality problems such as missing data, outliers, inconsistent data or time-variant data, can be found in the literature of DM and Data Warehousing (DW). Tasks that are related to the quality of data are mostly in the Data Understanding and in the Data Preparation phases of the DM process. The main limitation in the application of the data quality management techniques is the complexity caused by a lack of anticipation in the detection and resolution of the problems. A DM process model designed for the prior management of data quality is proposed in this work. In this model, the DM process is defined in relation to the Software Engineering (SE) process; the two processes are combined in parallel. The main contribution of this DM process is the anticipation and the automation of all activities necessary to remove data quality problems.", 
    "editor": [
      {
        "familyName": "Elleithy", 
        "givenName": "Khaled", 
        "type": "Person"
      }, 
      {
        "familyName": "Sobh", 
        "givenName": "Tarek", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-319-06764-3_37", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-319-06763-6", 
        "978-3-319-06764-3"
      ], 
      "name": "New Trends in Networking, Computing, E-learning, Systems Sciences, and Engineering", 
      "type": "Book"
    }, 
    "name": "Prior Data Quality Management in Data Mining Process", 
    "pagination": "299-307", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-319-06764-3_37"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "e8564aa86890eb48b4b90cd8e9000ff4870f0c4f79bf22bc6bef9fbb7cb49189"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1048771278"
        ]
      }
    ], 
    "publisher": {
      "location": "Cham", 
      "name": "Springer International Publishing", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-319-06764-3_37", 
      "https://app.dimensions.ai/details/publication/pub.1048771278"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-15T20:08", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8687_00000273.jsonl", 
    "type": "Chapter", 
    "url": "http://link.springer.com/10.1007/978-3-319-06764-3_37"
  }
]
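
As a minimal sketch of how a record like the one above could be read, the following Python snippet loads a trimmed copy of the JSON-LD and pulls out the author names and the DOIs of the cited works. The helper names `author_names` and `citation_dois` are illustrative assumptions, not part of any SciGraph tooling, and the embedded sample keeps only a few fields from the record.

```python
import json

# Trimmed copy of the record above; only a few fields are kept for brevity.
RECORD_JSON = """
[
  {
    "id": "sg:pub.10.1007/978-3-319-06764-3_37",
    "name": "Prior Data Quality Management in Data Mining Process",
    "pagination": "299-307",
    "author": [
      {"familyName": "Camara", "givenName": "Mamadou S."},
      {"familyName": "Naguingar", "givenName": "Djasrabe"},
      {"familyName": "Bah", "givenName": "Alassane"}
    ],
    "citation": [
      {"id": "https://doi.org/10.1016/j.datak.2007.06.020"},
      {"id": "sg:pub.10.1007/bf02925480",
       "sameAs": ["https://app.dimensions.ai/details/publication/pub.1028202005",
                  "https://doi.org/10.1007/bf02925480"]}
    ]
  }
]
"""

DOI_PREFIX = "https://doi.org/"


def author_names(record):
    """Return 'given family' strings in the order the authors are listed."""
    return [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]


def citation_dois(record):
    """Return the DOI of each cited work, skipping repeated entries.

    A citation carries its DOI either in "id" or in one of the "sameAs" URIs.
    """
    dois = []
    for cite in record.get("citation", []):
        candidates = [cite.get("id", "")] + cite.get("sameAs", [])
        for uri in candidates:
            if uri.startswith(DOI_PREFIX):
                doi = uri[len(DOI_PREFIX):]
                if doi not in dois:
                    dois.append(doi)
                break
    return dois


# The top level of the document is a list holding a single record.
record = json.loads(RECORD_JSON)[0]
print(author_names(record))
print(citation_dois(record))
```

Because the record is plain JSON, the standard `json` module is enough here; a full JSON-LD processor would only be needed to expand the `@context` into complete URIs.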
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-06764-3_37'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-06764-3_37'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-06764-3_37'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-06764-3_37'
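
The four curl commands above differ only in the `Accept` header. A minimal Python sketch of the same content negotiation, using only the standard library, might look like the following. The short format keys (`json-ld`, `nt`, `turtle`, `xml`) and the helper `build_request` are assumptions made for illustration; the media types and the URL are the ones shown in the commands. The request is only constructed here, not sent, since sending it depends on network access and on the service being reachable.

```python
from urllib.request import Request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-319-06764-3_37"

# Media types taken from the curl examples; the short keys are illustrative.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request asking for the given RDF format."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"unknown format: {fmt!r}")
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


req = build_request("turtle")
print(req.full_url)
print(req.get_header("Accept"))

# To fetch the data, pass the request to urllib.request.urlopen(req) and read
# the response body; that step is left out so the sketch runs offline.
```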


 

This table displays all metadata directly associated with this object as RDF triples.

135 TRIPLES      23 PREDICATES      43 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-319-06764-3_37 schema:about anzsrc-for:08
2 anzsrc-for:0806
3 schema:author N328f0585f3ea47c7add43c04142e4c87
4 schema:citation sg:pub.10.1007/978-94-015-3994-4
5 sg:pub.10.1007/bf02925480
6 https://doi.org/10.1016/j.apm.2012.11.015
7 https://doi.org/10.1016/j.artint.2008.07.004
8 https://doi.org/10.1016/j.datak.2007.06.020
9 https://doi.org/10.1016/j.datak.2009.08.008
10 https://doi.org/10.1016/j.datak.2012.04.002
11 https://doi.org/10.1016/j.eswa.2012.02.044
12 https://doi.org/10.1016/j.is.2008.04.003
13 https://doi.org/10.1016/j.jom.2005.03.001
14 https://doi.org/10.1016/j.patrec.2011.07.002
15 https://doi.org/10.1063/1.2995737
16 https://doi.org/10.1145/1147376.1147391
17 https://doi.org/10.1145/303976.303983
18 https://doi.org/10.1145/331499.331504
19 https://doi.org/10.4018/978-1-59904-387-6
20 schema:datePublished 2015
21 schema:datePublishedReg 2015-01-01
22 schema:description Data Mining (DM) projects are implemented by following the knowledge discovery process. Several techniques for detecting and handling data quality problems such as missing data, outliers, inconsistent data or time-variant data, can be found in the literature of DM and Data Warehousing (DW). Tasks that are related to the quality of data are mostly in the Data Understanding and in the Data Preparation phases of the DM process. The main limitation in the application of the data quality management techniques is the complexity caused by a lack of anticipation in the detection and resolution of the problems. A DM process model designed for the prior management of data quality is proposed in this work. In this model, the DM process is defined in relation to the Software Engineering (SE) process; the two processes are combined in parallel. The main contribution of this DM process is the anticipation and the automation of all activities necessary to remove data quality problems.
23 schema:editor Nef5e0a10b3be4372846d5ccacb9b39d5
24 schema:genre chapter
25 schema:inLanguage en
26 schema:isAccessibleForFree false
27 schema:isPartOf N4237b6c114ca49ae9608897dcb237dc0
28 schema:name Prior Data Quality Management in Data Mining Process
29 schema:pagination 299-307
30 schema:productId N043281d51a674c248825cc6fa88ebbc6
31 N670b7853b9cd42738fc8dde6dea5accf
32 Nd53c7972ec1347c38b962dd00ba674b0
33 schema:publisher N49727044fc9e44ef8bc25e6b96d7f9ae
34 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048771278
35 https://doi.org/10.1007/978-3-319-06764-3_37
36 schema:sdDatePublished 2019-04-15T20:08
37 schema:sdLicense https://scigraph.springernature.com/explorer/license/
38 schema:sdPublisher Nc2ca1b6a730e47b9aaf2863bd533df42
39 schema:url http://link.springer.com/10.1007/978-3-319-06764-3_37
40 sgo:license sg:explorer/license/
41 sgo:sdDataset chapters
42 rdf:type schema:Chapter
43 N043281d51a674c248825cc6fa88ebbc6 schema:name doi
44 schema:value 10.1007/978-3-319-06764-3_37
45 rdf:type schema:PropertyValue
46 N328f0585f3ea47c7add43c04142e4c87 rdf:first sg:person.07364533341.41
47 rdf:rest N7ce5f74e6b444232a9ca191551b30d32
48 N4237b6c114ca49ae9608897dcb237dc0 schema:isbn 978-3-319-06763-6
49 978-3-319-06764-3
50 schema:name New Trends in Networking, Computing, E-learning, Systems Sciences, and Engineering
51 rdf:type schema:Book
52 N49727044fc9e44ef8bc25e6b96d7f9ae schema:location Cham
53 schema:name Springer International Publishing
54 rdf:type schema:Organisation
55 N52c706dcafa04cab8974ab4e907ea112 schema:affiliation https://www.grid.ac/institutes/grid.8191.1
56 schema:familyName Naguingar
57 schema:givenName Djasrabe
58 rdf:type schema:Person
59 N609ec7570afa4cb997ae4101941c59e7 schema:familyName Sobh
60 schema:givenName Tarek
61 rdf:type schema:Person
62 N670b7853b9cd42738fc8dde6dea5accf schema:name dimensions_id
63 schema:value pub.1048771278
64 rdf:type schema:PropertyValue
65 N7ce5f74e6b444232a9ca191551b30d32 rdf:first N52c706dcafa04cab8974ab4e907ea112
66 rdf:rest Nc3d6dbce2e564f5b8e043976a2d5dead
67 Nbd146fca3c2945cda4d0ce7d9848d1f7 schema:familyName Elleithy
68 schema:givenName Khaled
69 rdf:type schema:Person
70 Nc2ca1b6a730e47b9aaf2863bd533df42 schema:name Springer Nature - SN SciGraph project
71 rdf:type schema:Organization
72 Nc3d6dbce2e564f5b8e043976a2d5dead rdf:first sg:person.0720633131.58
73 rdf:rest rdf:nil
74 Nd53c7972ec1347c38b962dd00ba674b0 schema:name readcube_id
75 schema:value e8564aa86890eb48b4b90cd8e9000ff4870f0c4f79bf22bc6bef9fbb7cb49189
76 rdf:type schema:PropertyValue
77 Nef10db3aae9946438cc7fa850be91960 rdf:first N609ec7570afa4cb997ae4101941c59e7
78 rdf:rest rdf:nil
79 Nef5e0a10b3be4372846d5ccacb9b39d5 rdf:first Nbd146fca3c2945cda4d0ce7d9848d1f7
80 rdf:rest Nef10db3aae9946438cc7fa850be91960
81 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
82 schema:name Information and Computing Sciences
83 rdf:type schema:DefinedTerm
84 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
85 schema:name Information Systems
86 rdf:type schema:DefinedTerm
87 sg:person.0720633131.58 schema:affiliation https://www.grid.ac/institutes/grid.8191.1
88 schema:familyName Bah
89 schema:givenName Alassane
90 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0720633131.58
91 rdf:type schema:Person
92 sg:person.07364533341.41 schema:affiliation https://www.grid.ac/institutes/grid.8191.1
93 schema:familyName Camara
94 schema:givenName Mamadou S.
95 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07364533341.41
96 rdf:type schema:Person
97 sg:pub.10.1007/978-94-015-3994-4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1053674345
98 https://doi.org/10.1007/978-94-015-3994-4
99 rdf:type schema:CreativeWork
100 sg:pub.10.1007/bf02925480 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028202005
101 https://doi.org/10.1007/bf02925480
102 rdf:type schema:CreativeWork
103 https://doi.org/10.1016/j.apm.2012.11.015 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003872549
104 rdf:type schema:CreativeWork
105 https://doi.org/10.1016/j.artint.2008.07.004 schema:sameAs https://app.dimensions.ai/details/publication/pub.1046048643
106 rdf:type schema:CreativeWork
107 https://doi.org/10.1016/j.datak.2007.06.020 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001547036
108 rdf:type schema:CreativeWork
109 https://doi.org/10.1016/j.datak.2009.08.008 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032807948
110 rdf:type schema:CreativeWork
111 https://doi.org/10.1016/j.datak.2012.04.002 schema:sameAs https://app.dimensions.ai/details/publication/pub.1020495800
112 rdf:type schema:CreativeWork
113 https://doi.org/10.1016/j.eswa.2012.02.044 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019974423
114 rdf:type schema:CreativeWork
115 https://doi.org/10.1016/j.is.2008.04.003 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003686169
116 rdf:type schema:CreativeWork
117 https://doi.org/10.1016/j.jom.2005.03.001 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027127788
118 rdf:type schema:CreativeWork
119 https://doi.org/10.1016/j.patrec.2011.07.002 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035943831
120 rdf:type schema:CreativeWork
121 https://doi.org/10.1063/1.2995737 schema:sameAs https://app.dimensions.ai/details/publication/pub.1057891654
122 rdf:type schema:CreativeWork
123 https://doi.org/10.1145/1147376.1147391 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027278261
124 rdf:type schema:CreativeWork
125 https://doi.org/10.1145/303976.303983 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009681559
126 rdf:type schema:CreativeWork
127 https://doi.org/10.1145/331499.331504 schema:sameAs https://app.dimensions.ai/details/publication/pub.1026347712
128 rdf:type schema:CreativeWork
129 https://doi.org/10.4018/978-1-59904-387-6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1096031270
130 rdf:type schema:CreativeWork
131 https://www.grid.ac/institutes/grid.8191.1 schema:alternateName Cheikh Anta Diop University
132 schema:name Laboratoire d’Imagerie Médicale et de BioInformatique (LIMBI), Ecole Supérieure Polytechnique, Université Cheikh Anta Diop de Dakar, BP 5085 dakar-fann, Dakar, Senegal
133 Laboratoire d’Informatique, Réseaux et Télécoms (LIRT), Ecole Supérieure Polytechnique, Université Cheikh Anta Diop de Dakar, BP 5085 dakar-fann, Dakar, Senegal
134 UMI 209, UMMISCO - UCAD, Ecole Supérieure Polytechnique, Université Cheikh Anta Diop de Dakar, BP 15915 Dakar-Fann, Senegal
135 rdf:type schema:Organization
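
Summary counts like those shown above the table (triples, predicates, literals) can be derived from the N-Triples serialization, which puts one triple on each line. The sketch below is a simplified reader under stated assumptions: it handles only plain single-line statements, the four sample lines are hand-written from values in this record, and the expansion of `schema:` to `http://schema.org/` is assumed rather than confirmed by the page. The function names are hypothetical, and the page may count literals differently (for example, distinct values only).

```python
# Hand-written sample in N-Triples form; predicate URIs are assumed expansions.
SAMPLE = """\
<http://scigraph.springernature.com/pub.10.1007/978-3-319-06764-3_37> <http://schema.org/pagination> "299-307" .
<http://scigraph.springernature.com/pub.10.1007/978-3-319-06764-3_37> <http://schema.org/genre> "chapter" .
<http://scigraph.springernature.com/pub.10.1007/978-3-319-06764-3_37> <http://schema.org/sameAs> <https://doi.org/10.1007/978-3-319-06764-3_37> .
<http://scigraph.springernature.com/pub.10.1007/978-3-319-06764-3_37> <http://schema.org/sameAs> <https://app.dimensions.ai/details/publication/pub.1048771278> .
"""


def parse_ntriple(line):
    """Split one simple N-Triples line into (subject, predicate, object)."""
    body = line.strip()
    if not body.endswith("."):
        raise ValueError(f"not a complete statement: {line!r}")
    body = body[:-1].rstrip()
    # Subject and predicate contain no spaces; the object may (a literal).
    subject, predicate, obj = body.split(None, 2)
    return subject, predicate, obj


def summarize(text):
    """Count triples, distinct predicates, and literal-valued objects."""
    triples = [
        parse_ntriple(line)
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    predicates = {p for _, p, _ in triples}
    literals = [o for _, _, o in triples if o.startswith('"')]
    return {
        "triples": len(triples),
        "predicates": len(predicates),
        "literals": len(literals),
    }


print(summarize(SAMPLE))
```

For real data, an RDF library with a full N-Triples parser would be the safer choice, since this reader does not validate escapes, language tags, or datatypes.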
 



