A Comparative Study of Classification Based Personal E-mail Filtering


Ontology type: schema:Chapter
Open Access: True


Chapter Info

DATE

2000

AUTHORS

Yanlei Diao, Hongjun Lu, Dekai Wu

ABSTRACT

This paper addresses personal E-mail filtering by casting it in the framework of text classification. Modeled as semi-structured documents, E-mail messages consist of a set of fields with predefined semantics and a number of variable length free-text fields. While most work on classification either concentrates on structured data or free text, the work in this paper deals with both of them. To perform classification, a naive Bayesian classifier was designed and implemented, and a decision tree based classifier was implemented. The design considerations and implementation issues are discussed. Using a relatively large amount of real personal E-mail data, a comprehensive comparative study was conducted using the two classifiers. The importance of different features is reported. Results of other issues related to building an effective personal E-mail classifier are presented and discussed. It is shown that both classifiers can perform filtering with reasonable accuracy. While the decision tree based classifier outperforms the Bayesian classifier when features and training size are selected optimally for both, a carefully designed naive Bayesian classifier is more robust.
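The abstract models an e-mail message as semi-structured: fields with fixed semantics plus free text. As a rough illustration only, the sketch below shows a textbook multinomial naive Bayes classifier with add-one (Laplace) smoothing applied to such messages. It is not the authors' implementation: the field-prefixed tokenization, the names `tokenize_message` and `NaiveBayes`, and the toy training data are all hypothetical, and the paper's feature selection and its treatment of structured fields are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def tokenize_message(message):
    """Turn a message (dict of field name -> text) into a list of tokens.

    Hypothetical scheme: each word is prefixed with the field it came from,
    so "deadline" in the subject and "deadline" in the body are distinct features.
    """
    tokens = []
    for field, text in message.items():
        for word in text.lower().split():
            tokens.append(f"{field}:{word}")
    return tokens


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()             # number of training messages per class
        self.token_counts = defaultdict(Counter)  # class -> token -> occurrence count
        self.vocab = set()                        # all tokens seen in training

    def fit(self, messages, labels):
        for message, label in zip(messages, labels):
            self.class_counts[label] += 1
            for token in tokenize_message(message):
                self.token_counts[label][token] += 1
                self.vocab.add(token)
        return self

    def predict(self, message):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Score = log P(class) + sum over tokens of log P(token | class)
            score = math.log(doc_count / total_docs)
            class_total = sum(self.token_counts[label].values())
            for token in tokenize_message(message):
                count = self.token_counts[label][token]  # Counter returns 0 for unseen tokens
                score += math.log((count + 1) / (class_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data: two "work" and two "personal" messages.
train = [
    ({"from": "boss@work.example", "subject": "project deadline", "body": "please send the report"}, "work"),
    ({"from": "boss@work.example", "subject": "meeting moved", "body": "see you at ten"}, "work"),
    ({"from": "friend@mail.example", "subject": "dinner tonight", "body": "pizza at seven"}, "personal"),
    ({"from": "friend@mail.example", "subject": "weekend trip", "body": "bring the tent"}, "personal"),
]
model = NaiveBayes().fit([m for m, _ in train], [y for _, y in train])
print(model.predict({"from": "boss@work.example", "subject": "report", "body": "deadline friday"}))
# prints "work"
```

Prefixing tokens with their field is one simple way to let a single bag-of-words model see both the structured header fields and the free text; a real filter would also need the kind of feature selection the abstract mentions.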

PAGES

408-419

Book

TITLE

Knowledge Discovery and Data Mining. Current Issues and New Applications

ISBN

978-3-540-67382-8
978-3-540-45571-4

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/3-540-45571-x_48

DOI

http://dx.doi.org/10.1007/3-540-45571-x_48

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1025259101



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Hong Kong University of Science and Technology", 
          "id": "https://www.grid.ac/institutes/grid.24515.37", 
          "name": [
            "Department of Computer Science, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Diao", 
        "givenName": "Yanlei", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Hong Kong University of Science and Technology", 
          "id": "https://www.grid.ac/institutes/grid.24515.37", 
          "name": [
            "Department of Computer Science, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Lu", 
        "givenName": "Hongjun", 
        "id": "sg:person.012457171273.13", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012457171273.13"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Hong Kong University of Science and Technology", 
          "id": "https://www.grid.ac/institutes/grid.24515.37", 
          "name": [
            "Department of Computer Science, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Wu", 
        "givenName": "Dekai", 
        "id": "sg:person.012657745057.48", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012657745057.48"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1016/s0306-4573(96)00063-5", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1032831757"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2000", 
    "datePublishedReg": "2000-01-01", 
    "description": "This paper addresses personal E-mail filtering by casting it in the framework of text classification. Modeled as semi-structured documents, E-mail messages consist of a set of fields with predefined semantics and a number of variable length free-text fields. While most work on classification either concentrates on structured data or free text, the work in this paper deals with both of them. To perform classification, a naive Bayesian classifier was designed and implemented, and a decision tree based classifier was implemented. The design considerations and implementation issues are discussed. Using a relatively large amount of real personal E-mail data, a comprehensive comparative study was conducted using the two classifiers. The importance of different features is reported. Results of other issues related to building an effective personal E-mail classifier are presented and discussed. It is shown that both classifiers can perform filtering with reasonable accuracy. While the decision tree based classifier outperforms the Bayesian classifier when features and training size are selected optimally for both, a carefully designed naive Bayesian classifier is more robust.", 
    "editor": [
      {
        "familyName": "Terano", 
        "givenName": "Takao", 
        "type": "Person"
      }, 
      {
        "familyName": "Liu", 
        "givenName": "Huan", 
        "type": "Person"
      }, 
      {
        "familyName": "Chen", 
        "givenName": "Arbee L. P.", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/3-540-45571-x_48", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-540-67382-8", 
        "978-3-540-45571-4"
      ], 
      "name": "Knowledge Discovery and Data Mining. Current Issues and New Applications", 
      "type": "Book"
    }, 
    "name": "A Comparative Study of Classification Based Personal E-mail Filtering", 
    "pagination": "408-419", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/3-540-45571-x_48"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "78558331e45c97960d8133ef909feaa3166631d5339081194b2448ce80844c28"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1025259101"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/3-540-45571-x_48", 
      "https://app.dimensions.ai/details/publication/pub.1025259101"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-15T23:52", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8697_00000258.jsonl", 
    "type": "Chapter", 
    "url": "http://link.springer.com/10.1007/3-540-45571-X_48"
  }
]
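Because every JSON-LD document is also valid JSON, a few fields can be pulled out of this record with an ordinary JSON parser. A minimal Python sketch, assuming a trimmed excerpt of the record above is pasted in as `RECORD`; the helper `summarize` is hypothetical. It ignores `@context`, so terms such as `name` are not expanded to full schema.org IRIs; a JSON-LD processor would be needed for that.

```python
import json

# Trimmed excerpt of the SciGraph record above; only the keys used below are kept.
RECORD = """
[
  {
    "name": "A Comparative Study of Classification Based Personal E-mail Filtering",
    "datePublished": "2000",
    "pagination": "408-419",
    "author": [
      {"familyName": "Diao", "givenName": "Yanlei", "type": "Person"},
      {"familyName": "Lu", "givenName": "Hongjun", "type": "Person"},
      {"familyName": "Wu", "givenName": "Dekai", "type": "Person"}
    ],
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/3-540-45571-x_48"]},
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1025259101"]}
    ]
  }
]
"""


def summarize(jsonld_text):
    """Return (title, year, authors, doi) from a SciGraph-style JSON-LD list."""
    record = json.loads(jsonld_text)[0]  # the payload is a one-element list
    authors = [f"{a['givenName']} {a['familyName']}" for a in record["author"]]
    # productId is a list of name/value pairs; each value is itself a list
    ids = {p["name"]: p["value"][0] for p in record["productId"]}
    return record["name"], record["datePublished"], authors, ids.get("doi")


title, year, authors, doi = summarize(RECORD)
print(f"{', '.join(authors)} ({year}). {title}. doi:{doi}")
# prints "Yanlei Diao, Hongjun Lu, Dekai Wu (2000). A Comparative Study of Classification Based Personal E-mail Filtering. doi:10.1007/3-540-45571-x_48"
```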
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a JSON-based format for linked data; every JSON-LD document is also valid JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/3-540-45571-x_48'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/3-540-45571-x_48'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/3-540-45571-x_48'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/3-540-45571-x_48'
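The same content negotiation can be done with Python's standard library. This sketch mirrors the four curl commands by setting the `Accept` header; the helper names `build_request` and `fetch` and the short format keys are hypothetical, and it assumes the SciGraph endpoint still serves these media types as it did when this page was captured.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/3-540-45571-x_48"

# Media types taken from the curl commands above; the dictionary keys are illustrative.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request that asks for serialization `fmt` via the Accept header."""
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Fetch the record in the requested serialization and return it as text."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

For example, `fetch("n-triples")` would return the record as one triple per line, which suits the batch processing mentioned above; it needs network access and will raise `urllib.error.URLError` or `HTTPError` if the service no longer answers.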


 

This table displays all metadata directly associated with this object as RDF triples.

91 triples, 23 predicates, 28 URIs, 20 literals, 8 blank nodes

Subject Predicate Object
1 sg:pub.10.1007/3-540-45571-x_48 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author N2f6519d3490d4f79975b92bd4425b06e
4 schema:citation https://doi.org/10.1016/s0306-4573(96)00063-5
5 schema:datePublished 2000
6 schema:datePublishedReg 2000-01-01
7 schema:description This paper addresses personal E-mail filtering by casting it in the framework of text classification. Modeled as semi-structured documents, E-mail messages consist of a set of fields with predefined semantics and a number of variable length free-text fields. While most work on classification either concentrates on structured data or free text, the work in this paper deals with both of them. To perform classification, a naive Bayesian classifier was designed and implemented, and a decision tree based classifier was implemented. The design considerations and implementation issues are discussed. Using a relatively large amount of real personal E-mail data, a comprehensive comparative study was conducted using the two classifiers. The importance of different features is reported. Results of other issues related to building an effective personal E-mail classifier are presented and discussed. It is shown that both classifiers can perform filtering with reasonable accuracy. While the decision tree based classifier outperforms the Bayesian classifier when features and training size are selected optimally for both, a carefully designed naive Bayesian classifier is more robust.
8 schema:editor Na154a5f48f87403facef8cd011e1c5ae
9 schema:genre chapter
10 schema:inLanguage en
11 schema:isAccessibleForFree true
12 schema:isPartOf Ne82632fc5a414729a68157c56351ec99
13 schema:name A Comparative Study of Classification Based Personal E-mail Filtering
14 schema:pagination 408-419
15 schema:productId N6c496e8cd2734bc0a12feaa65c21ec19
16 N84036cb7330f4ce5a49d4b664ec5aa78
17 N847c53f56bdb4cb3979adb1233140a4b
18 schema:publisher N8e467d02db5f414e838a83dbd332689a
19 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025259101
20 https://doi.org/10.1007/3-540-45571-x_48
21 schema:sdDatePublished 2019-04-15T23:52
22 schema:sdLicense https://scigraph.springernature.com/explorer/license/
23 schema:sdPublisher Ncc5b8ee367b64d73ba714ffae77644b9
24 schema:url http://link.springer.com/10.1007/3-540-45571-X_48
25 sgo:license sg:explorer/license/
26 sgo:sdDataset chapters
27 rdf:type schema:Chapter
28 N0135c33030be4a81bcd8a061de6e1383 rdf:first Nf2f82f2e126d403398aba638246105c8
29 rdf:rest rdf:nil
30 N2f6519d3490d4f79975b92bd4425b06e rdf:first Ne02b52e5c93f426593c7ac77cdccb37d
31 rdf:rest N66dbc612f62c4340b7aeb573f4edf563
32 N356e9d88fcb0413d8cf8679340918194 rdf:first Nd41f6529ac8b4b6e9d4710afaa382b31
33 rdf:rest N0135c33030be4a81bcd8a061de6e1383
34 N66dbc612f62c4340b7aeb573f4edf563 rdf:first sg:person.012457171273.13
35 rdf:rest Nfa796346bf6042038d6cd2367ea8de13
36 N6c496e8cd2734bc0a12feaa65c21ec19 schema:name dimensions_id
37 schema:value pub.1025259101
38 rdf:type schema:PropertyValue
39 N84036cb7330f4ce5a49d4b664ec5aa78 schema:name doi
40 schema:value 10.1007/3-540-45571-x_48
41 rdf:type schema:PropertyValue
42 N847c53f56bdb4cb3979adb1233140a4b schema:name readcube_id
43 schema:value 78558331e45c97960d8133ef909feaa3166631d5339081194b2448ce80844c28
44 rdf:type schema:PropertyValue
45 N8e467d02db5f414e838a83dbd332689a schema:location Berlin, Heidelberg
46 schema:name Springer Berlin Heidelberg
47 rdf:type schema:Organisation
48 Na154a5f48f87403facef8cd011e1c5ae rdf:first Ndf6a820e27e647ab8e3448e38ba48bcb
49 rdf:rest N356e9d88fcb0413d8cf8679340918194
50 Ncc5b8ee367b64d73ba714ffae77644b9 schema:name Springer Nature - SN SciGraph project
51 rdf:type schema:Organization
52 Nd41f6529ac8b4b6e9d4710afaa382b31 schema:familyName Liu
53 schema:givenName Huan
54 rdf:type schema:Person
55 Ndf6a820e27e647ab8e3448e38ba48bcb schema:familyName Terano
56 schema:givenName Takao
57 rdf:type schema:Person
58 Ne02b52e5c93f426593c7ac77cdccb37d schema:affiliation https://www.grid.ac/institutes/grid.24515.37
59 schema:familyName Diao
60 schema:givenName Yanlei
61 rdf:type schema:Person
62 Ne82632fc5a414729a68157c56351ec99 schema:isbn 978-3-540-45571-4
63 978-3-540-67382-8
64 schema:name Knowledge Discovery and Data Mining. Current Issues and New Applications
65 rdf:type schema:Book
66 Nf2f82f2e126d403398aba638246105c8 schema:familyName Chen
67 schema:givenName Arbee L. P.
68 rdf:type schema:Person
69 Nfa796346bf6042038d6cd2367ea8de13 rdf:first sg:person.012657745057.48
70 rdf:rest rdf:nil
71 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
72 schema:name Information and Computing Sciences
73 rdf:type schema:DefinedTerm
74 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
75 schema:name Artificial Intelligence and Image Processing
76 rdf:type schema:DefinedTerm
77 sg:person.012457171273.13 schema:affiliation https://www.grid.ac/institutes/grid.24515.37
78 schema:familyName Lu
79 schema:givenName Hongjun
80 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012457171273.13
81 rdf:type schema:Person
82 sg:person.012657745057.48 schema:affiliation https://www.grid.ac/institutes/grid.24515.37
83 schema:familyName Wu
84 schema:givenName Dekai
85 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012657745057.48
86 rdf:type schema:Person
87 https://doi.org/10.1016/s0306-4573(96)00063-5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032831757
88 rdf:type schema:CreativeWork
89 https://www.grid.ac/institutes/grid.24515.37 schema:alternateName Hong Kong University of Science and Technology
90 schema:name Department of Computer Science, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
91 rdf:type schema:Organization
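In the table above, a row that starts without a subject appears to continue the subject of the row before it (and, when the predicate is also missing, that predicate too). The author and editor lists are RDF collections: a blank node such as N2f6519d3490d4f79975b92bd4425b06e points to one list item via rdf:first and to the next list cell via rdf:rest, and the chain ends at rdf:nil, which is how author order is preserved. A minimal sketch, assuming a hand-copied subset of the rows as `TRIPLES` (the helpers `index` and `walk_list` are hypothetical), that walks the author list:

```python
# (subject, predicate, object) rows copied from the table above (author list only).
TRIPLES = [
    ("sg:pub.10.1007/3-540-45571-x_48", "schema:author", "N2f6519d3490d4f79975b92bd4425b06e"),
    ("N2f6519d3490d4f79975b92bd4425b06e", "rdf:first", "Ne02b52e5c93f426593c7ac77cdccb37d"),
    ("N2f6519d3490d4f79975b92bd4425b06e", "rdf:rest", "N66dbc612f62c4340b7aeb573f4edf563"),
    ("N66dbc612f62c4340b7aeb573f4edf563", "rdf:first", "sg:person.012457171273.13"),
    ("N66dbc612f62c4340b7aeb573f4edf563", "rdf:rest", "Nfa796346bf6042038d6cd2367ea8de13"),
    ("Nfa796346bf6042038d6cd2367ea8de13", "rdf:first", "sg:person.012657745057.48"),
    ("Nfa796346bf6042038d6cd2367ea8de13", "rdf:rest", "rdf:nil"),
    ("Ne02b52e5c93f426593c7ac77cdccb37d", "schema:familyName", "Diao"),
    ("sg:person.012457171273.13", "schema:familyName", "Lu"),
    ("sg:person.012657745057.48", "schema:familyName", "Wu"),
]


def index(triples):
    """Map (subject, predicate) to object; safe here because each pair occurs once."""
    return {(s, p): o for s, p, o in triples}


def walk_list(idx, head):
    """Follow rdf:first / rdf:rest links from `head` until rdf:nil."""
    items = []
    node = head
    while node != "rdf:nil":
        items.append(idx[(node, "rdf:first")])
        node = idx[(node, "rdf:rest")]
    return items


idx = index(TRIPLES)
head = idx[("sg:pub.10.1007/3-540-45571-x_48", "schema:author")]
print([idx[(person, "schema:familyName")] for person in walk_list(idx, head)])
# prints ['Diao', 'Lu', 'Wu']
```

A plain dictionary is enough for this subset; predicates with several objects (for example schema:about or schema:isbn) would need a mapping to lists instead.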
 






...