Towards Automatic and Optimal Filtering Levels for Feature Selection in Text Categorization


Ontology type: schema:Chapter     


Chapter Info

DATE

2005

AUTHORS

E. Montañés, E. F. Combarro, I. Díaz, J. Ranilla

ABSTRACT

Text Categorization (TC) is an important issue within Information Retrieval (IR). Feature Selection (FS) is a crucial task in TC because irrelevant features degrade performance. FS is usually performed by selecting the features with the highest scores according to certain measures. The disadvantage of these approaches is that the number of features to select must be fixed in advance; it is commonly expressed as the percentage of words removed, called the Filtering Level (FL). It is therefore usual to run a set of experiments manually over several FLs chosen to represent all possible ones. This process does not guarantee that any of the chosen FLs is optimal, or even close to optimal. This paper addresses this difficulty by proposing a method that automatically determines optimal FLs by solving a univariate maximization problem.
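The abstract does not say which optimizer the authors use, so the following is only a sketch of what "solving a univariate maximization problem" over the filtering level could look like. It assumes the performance curve is unimodal on [0, 100] and uses golden-section search, a standard technique chosen here for illustration; `toy_performance` is a hypothetical stand-in for training and evaluating a classifier at a given FL.

```python
import math


def golden_section_max(f, lo, hi, tol=1e-3):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # The maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def toy_performance(fl):
    # Hypothetical performance curve peaking at FL = 80 (% of words removed).
    # In practice this would train and score a classifier at that FL.
    return -(fl - 80.0) ** 2


best_fl = golden_section_max(toy_performance, 0.0, 100.0)
print(f"estimated optimal filtering level: {best_fl:.2f}%")
```

Each iteration costs one new evaluation of the performance function, which matters when every evaluation means retraining a classifier.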

PAGES

239-248

Book

TITLE

Advances in Intelligent Data Analysis VI

ISBN

978-3-540-28795-7
978-3-540-31926-9

Author Affiliations

Artificial Intelligence Center, University of Oviedo, Spain

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/11552253_22

DOI

http://dx.doi.org/10.1007/11552253_22

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1002592515



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of Oviedo", 
          "id": "https://www.grid.ac/institutes/grid.10863.3c", 
          "name": [
            "Artificial Intelligence Center, University of Oviedo, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Monta\u00f1\u00e9s", 
        "givenName": "E.", 
        "id": "sg:person.011600442422.98", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011600442422.98"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Oviedo", 
          "id": "https://www.grid.ac/institutes/grid.10863.3c", 
          "name": [
            "Artificial Intelligence Center, University of Oviedo, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Combarro", 
        "givenName": "E. F.", 
        "id": "sg:person.014120426453.50", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014120426453.50"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Oviedo", 
          "id": "https://www.grid.ac/institutes/grid.10863.3c", 
          "name": [
            "Artificial Intelligence Center, University of Oviedo, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "D\u00edaz", 
        "givenName": "I.", 
        "id": "sg:person.010242453671.42", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010242453671.42"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Oviedo", 
          "id": "https://www.grid.ac/institutes/grid.10863.3c", 
          "name": [
            "Artificial Intelligence Center, University of Oviedo, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Ranilla", 
        "givenName": "J.", 
        "id": "sg:person.011017130042.09", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011017130042.09"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1145/312624.312647", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1000656518"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-1-349-17741-7", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017760680", 
          "https://doi.org/10.1007/978-1-349-17741-7"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/183422.183423", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1021178021"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/505282.505283", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023316280"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/1015330.1015388", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1028749713"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1108/eb046814", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1037275209"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bfb0026683", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1051853845", 
          "https://doi.org/10.1007/bfb0026683"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/mis.2005.49", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061405828"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2005", 
    "datePublishedReg": "2005-01-01", 
    "description": "Text Categorization (TC) is an important issue within Information Retrieval (IR). Feature Selection (FS) becomes a crucial task, because of the presence of irrelevant features causing a loss in the performance. FS is usually performed selecting the features with highest score according to certain measures. However, the disadvantage of these approaches is that they need to determine in advance the number of features that are selected, commonly defined by the percentage of words removed, which is called Filtering Level (FL). In view of that, it is usual to carry out a set of experiments manually taking several FLs representing all possible ones. This process does not guarantee that any of the FLs chosen are the optimal ones, even not an approximation. This paper deals with overcoming this difficulty proposing a method that automatically determines optimal FLs by means of solving a univariate maximization problem.", 
    "editor": [
      {
        "familyName": "Famili", 
        "givenName": "A. Fazel", 
        "type": "Person"
      }, 
      {
        "familyName": "Kok", 
        "givenName": "Joost N.", 
        "type": "Person"
      }, 
      {
        "familyName": "Pe\u00f1a", 
        "givenName": "Jos\u00e9 M.", 
        "type": "Person"
      }, 
      {
        "familyName": "Siebes", 
        "givenName": "Arno", 
        "type": "Person"
      }, 
      {
        "familyName": "Feelders", 
        "givenName": "Ad", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/11552253_22", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-540-28795-7", 
        "978-3-540-31926-9"
      ], 
      "name": "Advances in Intelligent Data Analysis VI", 
      "type": "Book"
    }, 
    "name": "Towards Automatic and Optimal Filtering Levels for Feature Selection in Text Categorization", 
    "pagination": "239-248", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1002592515"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/11552253_22"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "0c5842762204203aa85581045acfbdda841a6d2667dfc68b683cd300a7b8cd0a"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/11552253_22", 
      "https://app.dimensions.ai/details/publication/pub.1002592515"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-16T08:07", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000360_0000000360/records_118305_00000000.jsonl", 
    "type": "Chapter", 
    "url": "https://link.springer.com/10.1007%2F11552253_22"
  }
]
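As a minimal sketch of how a record like the one above can be consumed, the Python below parses a trimmed copy of the JSON-LD with the standard `json` module and pulls out the author names and product identifiers. The helper names `author_names` and `product_ids` are illustrative, not part of any SciGraph API, and the embedded text keeps only two authors and two identifiers for brevity.

```python
import json

# Trimmed copy of the record above; raw string so \u escapes reach json.loads.
record_text = r'''
[
  {
    "author": [
      {"familyName": "Monta\u00f1\u00e9s", "givenName": "E.", "type": "Person"},
      {"familyName": "Combarro", "givenName": "E. F.", "type": "Person"}
    ],
    "name": "Towards Automatic and Optimal Filtering Levels for Feature Selection in Text Categorization",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1002592515"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/11552253_22"]}
    ]
  }
]
'''


def author_names(record):
    """Return 'given family' strings in the order the record lists them."""
    return [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]


def product_ids(record):
    """Map each productId name (e.g. 'doi') to its first value."""
    return {p["name"]: p["value"][0] for p in record.get("productId", [])}


record = json.loads(record_text)[0]  # the document is a one-element array
names = author_names(record)
ids = product_ids(record)
print(names)
print(ids["doi"])
```

Plain `json` is enough here because JSON-LD is valid JSON; a dedicated JSON-LD processor is only needed if the `@context` has to be expanded into full IRIs.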
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/11552253_22'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/11552253_22'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/11552253_22'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/11552253_22'
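The four curl commands differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. `build_request` and `fetch` are illustrative helper names, and the sketch assumes the endpoint shown above still answers these requests, which may no longer be the case.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/"

# Media types taken from the curl commands above.
FORMATS = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(record_id, fmt):
    """Build a GET request that asks for the record in the given format."""
    return urllib.request.Request(
        BASE_URL + record_id, headers={"Accept": FORMATS[fmt]}
    )


def fetch(record_id, fmt, timeout=30):
    """Send the request and return the response body as text (needs network)."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request does not touch the network; fetch() does.
req = build_request("pub.10.1007/11552253_22", "turtle")
print(req.full_url, req.get_header("Accept"))
```

Calling `fetch("pub.10.1007/11552253_22", "json-ld")` would then correspond to the first curl command.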


 

This table displays all metadata directly associated to this object as RDF triples.

132 TRIPLES      23 PREDICATES      35 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/11552253_22 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author Nc880c7b01db24235a15cdae5371595f4
4 schema:citation sg:pub.10.1007/978-1-349-17741-7
5 sg:pub.10.1007/bfb0026683
6 https://doi.org/10.1108/eb046814
7 https://doi.org/10.1109/mis.2005.49
8 https://doi.org/10.1145/1015330.1015388
9 https://doi.org/10.1145/183422.183423
10 https://doi.org/10.1145/312624.312647
11 https://doi.org/10.1145/505282.505283
12 schema:datePublished 2005
13 schema:datePublishedReg 2005-01-01
14 schema:description Text Categorization (TC) is an important issue within Information Retrieval (IR). Feature Selection (FS) becomes a crucial task, because of the presence of irrelevant features causing a loss in the performance. FS is usually performed selecting the features with highest score according to certain measures. However, the disadvantage of these approaches is that they need to determine in advance the number of features that are selected, commonly defined by the percentage of words removed, which is called Filtering Level (FL). In view of that, it is usual to carry out a set of experiments manually taking several FLs representing all possible ones. This process does not guarantee that any of the FLs chosen are the optimal ones, even not an approximation. This paper deals with overcoming this difficulty proposing a method that automatically determines optimal FLs by means of solving a univariate maximization problem.
15 schema:editor Nbb2d510501fc42a585357ce70e3468ac
16 schema:genre chapter
17 schema:inLanguage en
18 schema:isAccessibleForFree false
19 schema:isPartOf Na8b43deb667f4cab8b9cbbb1ddb505f7
20 schema:name Towards Automatic and Optimal Filtering Levels for Feature Selection in Text Categorization
21 schema:pagination 239-248
22 schema:productId N2f8864b0d98e465d82365a72cfa3447b
23 N5033fdcb5c364771b7019aa06f5e3a1b
24 Ne29768b1a1464b00b62a160a3183c596
25 schema:publisher Nac06a1788f6c4d598504e59349e1066f
26 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002592515
27 https://doi.org/10.1007/11552253_22
28 schema:sdDatePublished 2019-04-16T08:07
29 schema:sdLicense https://scigraph.springernature.com/explorer/license/
30 schema:sdPublisher Nd143a01f883040f28e3e508a077c3be0
31 schema:url https://link.springer.com/10.1007%2F11552253_22
32 sgo:license sg:explorer/license/
33 sgo:sdDataset chapters
34 rdf:type schema:Chapter
35 N109f98fe28a8472bb156aa19e9b98d42 schema:familyName Famili
36 schema:givenName A. Fazel
37 rdf:type schema:Person
38 N2f8864b0d98e465d82365a72cfa3447b schema:name doi
39 schema:value 10.1007/11552253_22
40 rdf:type schema:PropertyValue
41 N3577690592b746caa200b0f4c4b1291f schema:familyName Kok
42 schema:givenName Joost N.
43 rdf:type schema:Person
44 N3b9e7cbbe16c42f08997e2b6fabf5bd1 schema:familyName Siebes
45 schema:givenName Arno
46 rdf:type schema:Person
47 N5033fdcb5c364771b7019aa06f5e3a1b schema:name readcube_id
48 schema:value 0c5842762204203aa85581045acfbdda841a6d2667dfc68b683cd300a7b8cd0a
49 rdf:type schema:PropertyValue
50 N55fbaf782d0847a49735e6d65481f562 rdf:first sg:person.011017130042.09
51 rdf:rest rdf:nil
52 N6953d30a35dd4a0d938192630d0b3d3c schema:familyName Feelders
53 schema:givenName Ad
54 rdf:type schema:Person
55 N73a854c53efe4f5c86650bdf3c73e93a rdf:first sg:person.010242453671.42
56 rdf:rest N55fbaf782d0847a49735e6d65481f562
57 N9063e81cb8f4454ea8f223f292e69552 rdf:first Nf4b49de63fd441528bc0915a18a144f4
58 rdf:rest Nce4a87176c6e41e5840e8764559e30b0
59 N93be45dbdfff4fed9da93b925a3bfa57 rdf:first N3577690592b746caa200b0f4c4b1291f
60 rdf:rest N9063e81cb8f4454ea8f223f292e69552
61 N997ddf7ab0e4489a82d96df5ff14fcae rdf:first sg:person.014120426453.50
62 rdf:rest N73a854c53efe4f5c86650bdf3c73e93a
63 Na8b43deb667f4cab8b9cbbb1ddb505f7 schema:isbn 978-3-540-28795-7
64 978-3-540-31926-9
65 schema:name Advances in Intelligent Data Analysis VI
66 rdf:type schema:Book
67 Nac06a1788f6c4d598504e59349e1066f schema:location Berlin, Heidelberg
68 schema:name Springer Berlin Heidelberg
69 rdf:type schema:Organisation
70 Nb0aca6c51b1c414bb43d1da1fd8778b9 rdf:first N6953d30a35dd4a0d938192630d0b3d3c
71 rdf:rest rdf:nil
72 Nbb2d510501fc42a585357ce70e3468ac rdf:first N109f98fe28a8472bb156aa19e9b98d42
73 rdf:rest N93be45dbdfff4fed9da93b925a3bfa57
74 Nc880c7b01db24235a15cdae5371595f4 rdf:first sg:person.011600442422.98
75 rdf:rest N997ddf7ab0e4489a82d96df5ff14fcae
76 Nce4a87176c6e41e5840e8764559e30b0 rdf:first N3b9e7cbbe16c42f08997e2b6fabf5bd1
77 rdf:rest Nb0aca6c51b1c414bb43d1da1fd8778b9
78 Nd143a01f883040f28e3e508a077c3be0 schema:name Springer Nature - SN SciGraph project
79 rdf:type schema:Organization
80 Ne29768b1a1464b00b62a160a3183c596 schema:name dimensions_id
81 schema:value pub.1002592515
82 rdf:type schema:PropertyValue
83 Nf4b49de63fd441528bc0915a18a144f4 schema:familyName Peña
84 schema:givenName José M.
85 rdf:type schema:Person
86 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
87 schema:name Information and Computing Sciences
88 rdf:type schema:DefinedTerm
89 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
90 schema:name Artificial Intelligence and Image Processing
91 rdf:type schema:DefinedTerm
92 sg:person.010242453671.42 schema:affiliation https://www.grid.ac/institutes/grid.10863.3c
93 schema:familyName Díaz
94 schema:givenName I.
95 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010242453671.42
96 rdf:type schema:Person
97 sg:person.011017130042.09 schema:affiliation https://www.grid.ac/institutes/grid.10863.3c
98 schema:familyName Ranilla
99 schema:givenName J.
100 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011017130042.09
101 rdf:type schema:Person
102 sg:person.011600442422.98 schema:affiliation https://www.grid.ac/institutes/grid.10863.3c
103 schema:familyName Montañés
104 schema:givenName E.
105 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011600442422.98
106 rdf:type schema:Person
107 sg:person.014120426453.50 schema:affiliation https://www.grid.ac/institutes/grid.10863.3c
108 schema:familyName Combarro
109 schema:givenName E. F.
110 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014120426453.50
111 rdf:type schema:Person
112 sg:pub.10.1007/978-1-349-17741-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017760680
113 https://doi.org/10.1007/978-1-349-17741-7
114 rdf:type schema:CreativeWork
115 sg:pub.10.1007/bfb0026683 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051853845
116 https://doi.org/10.1007/bfb0026683
117 rdf:type schema:CreativeWork
118 https://doi.org/10.1108/eb046814 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037275209
119 rdf:type schema:CreativeWork
120 https://doi.org/10.1109/mis.2005.49 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061405828
121 rdf:type schema:CreativeWork
122 https://doi.org/10.1145/1015330.1015388 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028749713
123 rdf:type schema:CreativeWork
124 https://doi.org/10.1145/183422.183423 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021178021
125 rdf:type schema:CreativeWork
126 https://doi.org/10.1145/312624.312647 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000656518
127 rdf:type schema:CreativeWork
128 https://doi.org/10.1145/505282.505283 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023316280
129 rdf:type schema:CreativeWork
130 https://www.grid.ac/institutes/grid.10863.3c schema:alternateName University of Oviedo
131 schema:name Artificial Intelligence Center, University of Oviedo, Spain
132 rdf:type schema:Organization
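In the table above, author and editor order is encoded as RDF lists: each blank node carries an `rdf:first` item and an `rdf:rest` pointer, and the chain ends at `rdf:nil`. The sketch below walks the author chain that starts at the `schema:author` object; the dictionary layout and the function name `rdf_list_items` are assumptions made for illustration, while the node identifiers are copied from the table.

```python
# node -> (rdf:first, rdf:rest), copied from the author list in the table above
AUTHOR_NODES = {
    "Nc880c7b01db24235a15cdae5371595f4": (
        "sg:person.011600442422.98", "N997ddf7ab0e4489a82d96df5ff14fcae"),
    "N997ddf7ab0e4489a82d96df5ff14fcae": (
        "sg:person.014120426453.50", "N73a854c53efe4f5c86650bdf3c73e93a"),
    "N73a854c53efe4f5c86650bdf3c73e93a": (
        "sg:person.010242453671.42", "N55fbaf782d0847a49735e6d65481f562"),
    "N55fbaf782d0847a49735e6d65481f562": (
        "sg:person.011017130042.09", "rdf:nil"),
}


def rdf_list_items(nodes, head):
    """Follow rdf:rest from head until rdf:nil, collecting rdf:first values."""
    items = []
    seen = set()
    node = head
    while node != "rdf:nil":
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


authors = rdf_list_items(AUTHOR_NODES, "Nc880c7b01db24235a15cdae5371595f4")
print(authors)
```

The resulting order (Montañés, Combarro, Díaz, Ranilla) matches the `author` array in the JSON-LD, where a JSON array carries the same ordering directly.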
 



