Text Categorization by a Machine-Learning-Based Term Selection


Ontology type: schema:Chapter     


Chapter Info

DATE

2004

AUTHORS

Javier Fernández, Elena Montañés, Irene Díaz, José Ranilla, Elías F. Combarro

ABSTRACT

Term selection is one of the main tasks in Information Retrieval and Text Categorization. It has traditionally been carried out by statistical methods based on how frequently words appear in the documents. This paper presents a method that extracts the relevant words of a document by taking their linguistic information into account. These relevant words are obtained by a Machine Learning algorithm trained on a set of manually selected words. Text Categorization is then performed with Support Vector Machines on the lexica obtained by this technique. The results are compared with one of the most widely used term-selection methods (based solely on statistical information), and the new method is found to perform better, with the additional advantage of selecting the filtering level automatically.
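For contrast with the method described above, here is a minimal sketch of a purely statistical term filter. It is an illustration only: the abstract does not say which statistical method the authors compared against, and document-frequency thresholding is assumed here simply as a common example of selection based on how often words appear. The hypothetical `keep_fraction` parameter stands in for the filtering level, which a filter of this kind leaves for the user to fix by hand; all function names are illustrative, not the authors' code.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return df


def select_terms(docs, keep_fraction):
    """Keep the top `keep_fraction` of the vocabulary ranked by document frequency.

    `keep_fraction` (hypothetical) plays the role of the filtering level,
    which a purely statistical filter needs the user to choose manually.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    df = document_frequency(docs)
    # Highest document frequency first; ties broken alphabetically for determinism.
    ranked = sorted(df, key=lambda term: (-df[term], term))
    n_keep = max(1, math.ceil(len(ranked) * keep_fraction))
    return ranked[:n_keep]


def filter_document(doc, selected):
    """Drop every token that is not in the selected lexicon."""
    keep = set(selected)
    return [tok for tok in doc if tok in keep]


if __name__ == "__main__":
    docs = [["svm", "text", "term"], ["text", "categorization"], ["term", "text", "svm"]]
    print(select_terms(docs, 0.5))  # ['text', 'svm']
```

Changing `keep_fraction` changes the lexicon the classifier sees, which is exactly the manual choice the abstract says the machine-learning approach avoids.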

PAGES

253-262

Book

TITLE

Database and Expert Systems Applications

ISBN

978-3-540-22936-0
978-3-540-30075-5

Author Affiliations

Artificial Intelligence Center, University of Oviedo, Spain

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-540-30075-5_25

DOI

http://dx.doi.org/10.1007/978-3-540-30075-5_25

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1030815242


JSON-LD is the canonical representation for SciGraph data.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of Oviedo", 
          "id": "https://www.grid.ac/institutes/grid.10863.3c", 
          "name": [
            "Artificial Intelligence Center, University of Oviedo, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Fern\u00e1ndez", 
        "givenName": "Javier", 
        "id": "sg:person.011220142546.36", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011220142546.36"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Oviedo", 
          "id": "https://www.grid.ac/institutes/grid.10863.3c", 
          "name": [
            "Artificial Intelligence Center, University of Oviedo, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Monta\u00f1\u00e9s", 
        "givenName": "Elena", 
        "id": "sg:person.011600442422.98", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011600442422.98"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Oviedo", 
          "id": "https://www.grid.ac/institutes/grid.10863.3c", 
          "name": [
            "Artificial Intelligence Center, University of Oviedo, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "D\u00edaz", 
        "givenName": "Irene", 
        "id": "sg:person.010242453671.42", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010242453671.42"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Oviedo", 
          "id": "https://www.grid.ac/institutes/grid.10863.3c", 
          "name": [
            "Artificial Intelligence Center, University of Oviedo, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Ranilla", 
        "givenName": "Jos\u00e9", 
        "id": "sg:person.011017130042.09", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011017130042.09"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Oviedo", 
          "id": "https://www.grid.ac/institutes/grid.10863.3c", 
          "name": [
            "Artificial Intelligence Center, University of Oviedo, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Combarro", 
        "givenName": "El\u00edas F.", 
        "id": "sg:person.014120426453.50", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014120426453.50"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1016/b978-0-08-050058-4.50007-3", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1001305396"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-540-24687-9_96", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1005975224", 
          "https://doi.org/10.1007/978-3-540-24687-9_96"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-540-24687-9_96", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1005975224", 
          "https://doi.org/10.1007/978-3-540-24687-9_96"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1006/ijhc.2002.1002", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1007076379"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/0306-4573(81)90029-7", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1009977786"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/0306-4573(81)90029-7", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1009977786"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/3-540-45486-1_4", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1015602649", 
          "https://doi.org/10.1007/3-540-45486-1_4"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/3-540-45486-1_4", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1015602649", 
          "https://doi.org/10.1007/3-540-45486-1_4"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1023/a:1009976227802", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1049582902", 
          "https://doi.org/10.1023/a:1009976227802"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bfb0026683", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1051853845", 
          "https://doi.org/10.1007/bfb0026683"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.21236/ada273556", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1091558316"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2004", 
    "datePublishedReg": "2004-01-01", 
    "description": "Term selection is one of the main tasks in Information Retrieval and Text Categorization. It has been traditionally carried out by statistical methods based on the frequency of appearance of the words in the documents. In this paper it is presented a method for extracting relevant words of a document by taking into account their linguistic information. These relevant words are obtained by a Machine Learning algorithm which takes manually selected words as training set. With the lexica obtained by this technique Text Categorization is performed by using Support Vector Machines. The results are compared with one of the most used method for term selection (based just on statistical information) and it is found the new method performs better and has the additional advantage of automatically selecting the filtering level.", 
    "editor": [
      {
        "familyName": "Galindo", 
        "givenName": "Fernando", 
        "type": "Person"
      }, 
      {
        "familyName": "Takizawa", 
        "givenName": "Makoto", 
        "type": "Person"
      }, 
      {
        "familyName": "Traunm\u00fcller", 
        "givenName": "Roland", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-540-30075-5_25", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-540-22936-0", 
        "978-3-540-30075-5"
      ], 
      "name": "Database and Expert Systems Applications", 
      "type": "Book"
    }, 
    "name": "Text Categorization by a Machine-Learning-Based Term Selection", 
    "pagination": "253-262", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1030815242"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-540-30075-5_25"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "92ccec62292efc860f1a2a61c944e969902100218c861aaf29a25681770af094"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-540-30075-5_25", 
      "https://app.dimensions.ai/details/publication/pub.1030815242"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-16T08:39", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000365_0000000365/records_71712_00000001.jsonl", 
    "type": "Chapter", 
    "url": "https://link.springer.com/10.1007%2F978-3-540-30075-5_25"
  }
]
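Because JSON-LD is ordinary JSON, the record above can be read with any JSON parser. The sketch below assumes Python's standard `json` module and inlines a trimmed excerpt of the record (only the fields it reads) so that it runs offline; the `summarize` helper is a hypothetical name. The `citation` array in the record repeats three of its entries, so the sketch deduplicates citations by `id` while keeping their original order.

```python
import json

# Trimmed excerpt of the JSON-LD record (only the fields read below).
# The \uXXXX escapes are kept as in the record; json.loads decodes them.
RECORD_JSON = r"""
[
  {
    "name": "Text Categorization by a Machine-Learning-Based Term Selection",
    "author": [
      {"familyName": "Fern\u00e1ndez", "givenName": "Javier"},
      {"familyName": "Monta\u00f1\u00e9s", "givenName": "Elena"},
      {"familyName": "D\u00edaz", "givenName": "Irene"},
      {"familyName": "Ranilla", "givenName": "Jos\u00e9"},
      {"familyName": "Combarro", "givenName": "El\u00edas F."}
    ],
    "citation": [
      {"id": "https://doi.org/10.1016/b978-0-08-050058-4.50007-3"},
      {"id": "sg:pub.10.1007/978-3-540-24687-9_96"},
      {"id": "sg:pub.10.1007/978-3-540-24687-9_96"},
      {"id": "https://doi.org/10.1006/ijhc.2002.1002"},
      {"id": "https://doi.org/10.1016/0306-4573(81)90029-7"},
      {"id": "https://doi.org/10.1016/0306-4573(81)90029-7"},
      {"id": "sg:pub.10.1007/3-540-45486-1_4"},
      {"id": "sg:pub.10.1007/3-540-45486-1_4"},
      {"id": "sg:pub.10.1023/a:1009976227802"},
      {"id": "sg:pub.10.1007/bfb0026683"},
      {"id": "https://doi.org/10.21236/ada273556"}
    ],
    "productId": [
      {"name": "dimensions_id", "value": ["pub.1030815242"]},
      {"name": "doi", "value": ["10.1007/978-3-540-30075-5_25"]}
    ]
  }
]
"""


def summarize(records):
    """Pull the title, DOI, ordered author names and distinct citation ids."""
    chapter = records[0]  # the record is a one-element JSON array
    authors = [f"{a['givenName']} {a['familyName']}" for a in chapter["author"]]
    doi = next(p["value"][0] for p in chapter["productId"] if p["name"] == "doi")
    # dict.fromkeys keeps the first occurrence of each id, in document order
    citations = list(dict.fromkeys(c["id"] for c in chapter["citation"]))
    return {"title": chapter["name"], "doi": doi, "authors": authors, "citations": citations}


summary = summarize(json.loads(RECORD_JSON))
```

After deduplication the eight distinct citation ids match the eight `schema:citation` objects listed in the triple table.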
 

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular linked-data format that is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-30075-5_25'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-30075-5_25'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-30075-5_25'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-30075-5_25'
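The same content negotiation can be done from a program: only the `Accept` header changes between formats. A minimal sketch with Python's standard `urllib.request`, assuming the SciGraph endpoint still answers these `Accept` headers as the curl commands above describe; the `MEDIA_TYPES` table and the `build_request`/`fetch` helpers are hypothetical names.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-540-30075-5_25"

# Accept header for each serialization, as used in the curl commands above.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Return a GET request asking the server for the given RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Download the record in the requested format and return it as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch("turtle")` would return the Turtle serialization as a string, provided the service is reachable. Keeping request construction separate from the network call makes the header logic easy to check without a connection.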


 

This table displays all metadata directly associated with this object as RDF triples.

131 TRIPLES      23 PREDICATES      35 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-540-30075-5_25 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author Nd848b2934d884f6ca1974dd248602c82
4 schema:citation sg:pub.10.1007/3-540-45486-1_4
5 sg:pub.10.1007/978-3-540-24687-9_96
6 sg:pub.10.1007/bfb0026683
7 sg:pub.10.1023/a:1009976227802
8 https://doi.org/10.1006/ijhc.2002.1002
9 https://doi.org/10.1016/0306-4573(81)90029-7
10 https://doi.org/10.1016/b978-0-08-050058-4.50007-3
11 https://doi.org/10.21236/ada273556
12 schema:datePublished 2004
13 schema:datePublishedReg 2004-01-01
14 schema:description Term selection is one of the main tasks in Information Retrieval and Text Categorization. It has been traditionally carried out by statistical methods based on the frequency of appearance of the words in the documents. In this paper it is presented a method for extracting relevant words of a document by taking into account their linguistic information. These relevant words are obtained by a Machine Learning algorithm which takes manually selected words as training set. With the lexica obtained by this technique Text Categorization is performed by using Support Vector Machines. The results are compared with one of the most used method for term selection (based just on statistical information) and it is found the new method performs better and has the additional advantage of automatically selecting the filtering level.
15 schema:editor N41b98fa2be7c4e86873b840e200a73ef
16 schema:genre chapter
17 schema:inLanguage en
18 schema:isAccessibleForFree false
19 schema:isPartOf N7497171a6542490482876c69d723f731
20 schema:name Text Categorization by a Machine-Learning-Based Term Selection
21 schema:pagination 253-262
22 schema:productId N373c7fa39bc94f8abeb005fbea5b6a17
23 N4ef30a9d7d3e44189db5950667968962
24 Ncc9141a58a5a4690806fbcf554d46732
25 schema:publisher N6764d10655804f92bb8af461bc4de561
26 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030815242
27 https://doi.org/10.1007/978-3-540-30075-5_25
28 schema:sdDatePublished 2019-04-16T08:39
29 schema:sdLicense https://scigraph.springernature.com/explorer/license/
30 schema:sdPublisher N1af7aea8ce264c0f96a4996fb7d5343d
31 schema:url https://link.springer.com/10.1007%2F978-3-540-30075-5_25
32 sgo:license sg:explorer/license/
33 sgo:sdDataset chapters
34 rdf:type schema:Chapter
35 N1889229240e041e0ab9acbbfd4884e33 rdf:first sg:person.011017130042.09
36 rdf:rest Nb49d81fe248e48f79aed486282a9476a
37 N1af7aea8ce264c0f96a4996fb7d5343d schema:name Springer Nature - SN SciGraph project
38 rdf:type schema:Organization
39 N1cf5254fe53f44b6928eda77a2388ec9 schema:familyName Traunmüller
40 schema:givenName Roland
41 rdf:type schema:Person
42 N373c7fa39bc94f8abeb005fbea5b6a17 schema:name dimensions_id
43 schema:value pub.1030815242
44 rdf:type schema:PropertyValue
45 N41b98fa2be7c4e86873b840e200a73ef rdf:first Nd4d01b1da9d34df49a25330a5b5c26b4
46 rdf:rest Naaf0eba4b4814cefbc8fe65fe8bf4612
47 N4ef30a9d7d3e44189db5950667968962 schema:name readcube_id
48 schema:value 92ccec62292efc860f1a2a61c944e969902100218c861aaf29a25681770af094
49 rdf:type schema:PropertyValue
50 N6764d10655804f92bb8af461bc4de561 schema:location Berlin, Heidelberg
51 schema:name Springer Berlin Heidelberg
52 rdf:type schema:Organisation
53 N7497171a6542490482876c69d723f731 schema:isbn 978-3-540-22936-0
54 978-3-540-30075-5
55 schema:name Database and Expert Systems Applications
56 rdf:type schema:Book
57 Naaf0eba4b4814cefbc8fe65fe8bf4612 rdf:first Nb54b6638171c41bc9ec27d2889ee1b75
58 rdf:rest Nfb5db7a95b6f4de586f8d19742e07223
59 Nb49d81fe248e48f79aed486282a9476a rdf:first sg:person.014120426453.50
60 rdf:rest rdf:nil
61 Nb54b6638171c41bc9ec27d2889ee1b75 schema:familyName Takizawa
62 schema:givenName Makoto
63 rdf:type schema:Person
64 Ncc9141a58a5a4690806fbcf554d46732 schema:name doi
65 schema:value 10.1007/978-3-540-30075-5_25
66 rdf:type schema:PropertyValue
67 Nd4d01b1da9d34df49a25330a5b5c26b4 schema:familyName Galindo
68 schema:givenName Fernando
69 rdf:type schema:Person
70 Nd848b2934d884f6ca1974dd248602c82 rdf:first sg:person.011220142546.36
71 rdf:rest Neb4b3ec662c349819a9318363177a76f
72 Ne8f2eb83cfb14f96bf470d5a371fb472 rdf:first sg:person.010242453671.42
73 rdf:rest N1889229240e041e0ab9acbbfd4884e33
74 Neb4b3ec662c349819a9318363177a76f rdf:first sg:person.011600442422.98
75 rdf:rest Ne8f2eb83cfb14f96bf470d5a371fb472
76 Nfb5db7a95b6f4de586f8d19742e07223 rdf:first N1cf5254fe53f44b6928eda77a2388ec9
77 rdf:rest rdf:nil
78 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
79 schema:name Information and Computing Sciences
80 rdf:type schema:DefinedTerm
81 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
82 schema:name Artificial Intelligence and Image Processing
83 rdf:type schema:DefinedTerm
84 sg:person.010242453671.42 schema:affiliation https://www.grid.ac/institutes/grid.10863.3c
85 schema:familyName Díaz
86 schema:givenName Irene
87 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010242453671.42
88 rdf:type schema:Person
89 sg:person.011017130042.09 schema:affiliation https://www.grid.ac/institutes/grid.10863.3c
90 schema:familyName Ranilla
91 schema:givenName José
92 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011017130042.09
93 rdf:type schema:Person
94 sg:person.011220142546.36 schema:affiliation https://www.grid.ac/institutes/grid.10863.3c
95 schema:familyName Fernández
96 schema:givenName Javier
97 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011220142546.36
98 rdf:type schema:Person
99 sg:person.011600442422.98 schema:affiliation https://www.grid.ac/institutes/grid.10863.3c
100 schema:familyName Montañés
101 schema:givenName Elena
102 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011600442422.98
103 rdf:type schema:Person
104 sg:person.014120426453.50 schema:affiliation https://www.grid.ac/institutes/grid.10863.3c
105 schema:familyName Combarro
106 schema:givenName Elías F.
107 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014120426453.50
108 rdf:type schema:Person
109 sg:pub.10.1007/3-540-45486-1_4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015602649
110 https://doi.org/10.1007/3-540-45486-1_4
111 rdf:type schema:CreativeWork
112 sg:pub.10.1007/978-3-540-24687-9_96 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005975224
113 https://doi.org/10.1007/978-3-540-24687-9_96
114 rdf:type schema:CreativeWork
115 sg:pub.10.1007/bfb0026683 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051853845
116 https://doi.org/10.1007/bfb0026683
117 rdf:type schema:CreativeWork
118 sg:pub.10.1023/a:1009976227802 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049582902
119 https://doi.org/10.1023/a:1009976227802
120 rdf:type schema:CreativeWork
121 https://doi.org/10.1006/ijhc.2002.1002 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007076379
122 rdf:type schema:CreativeWork
123 https://doi.org/10.1016/0306-4573(81)90029-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009977786
124 rdf:type schema:CreativeWork
125 https://doi.org/10.1016/b978-0-08-050058-4.50007-3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001305396
126 rdf:type schema:CreativeWork
127 https://doi.org/10.21236/ada273556 schema:sameAs https://app.dimensions.ai/details/publication/pub.1091558316
128 rdf:type schema:CreativeWork
129 https://www.grid.ac/institutes/grid.10863.3c schema:alternateName University of Oviedo
130 schema:name Artificial Intelligence Center, University of Oviedo, Spain
131 rdf:type schema:Organization
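In the triples, author and editor order is not stored in numbered fields: `schema:author` and `schema:editor` each point to the head of an RDF collection, a chain of blank nodes whose `rdf:first` holds one item and whose `rdf:rest` points to the next node, ending at `rdf:nil`. The sketch below walks such a chain using rows copied from the table; the helper and variable names are hypothetical, and blank-node labels such as `Nd848b2934d884f6ca1974dd248602c82` are generated per serialization, so a fresh download may use different ones.

```python
RDF_NIL = "rdf:nil"

# (subject, predicate) -> object, copied from the rdf:first / rdf:rest rows above.
LIST_TRIPLES = {
    # schema:author collection
    ("Nd848b2934d884f6ca1974dd248602c82", "rdf:first"): "sg:person.011220142546.36",
    ("Nd848b2934d884f6ca1974dd248602c82", "rdf:rest"): "Neb4b3ec662c349819a9318363177a76f",
    ("Neb4b3ec662c349819a9318363177a76f", "rdf:first"): "sg:person.011600442422.98",
    ("Neb4b3ec662c349819a9318363177a76f", "rdf:rest"): "Ne8f2eb83cfb14f96bf470d5a371fb472",
    ("Ne8f2eb83cfb14f96bf470d5a371fb472", "rdf:first"): "sg:person.010242453671.42",
    ("Ne8f2eb83cfb14f96bf470d5a371fb472", "rdf:rest"): "N1889229240e041e0ab9acbbfd4884e33",
    ("N1889229240e041e0ab9acbbfd4884e33", "rdf:first"): "sg:person.011017130042.09",
    ("N1889229240e041e0ab9acbbfd4884e33", "rdf:rest"): "Nb49d81fe248e48f79aed486282a9476a",
    ("Nb49d81fe248e48f79aed486282a9476a", "rdf:first"): "sg:person.014120426453.50",
    ("Nb49d81fe248e48f79aed486282a9476a", "rdf:rest"): RDF_NIL,
    # schema:editor collection
    ("N41b98fa2be7c4e86873b840e200a73ef", "rdf:first"): "Nd4d01b1da9d34df49a25330a5b5c26b4",
    ("N41b98fa2be7c4e86873b840e200a73ef", "rdf:rest"): "Naaf0eba4b4814cefbc8fe65fe8bf4612",
    ("Naaf0eba4b4814cefbc8fe65fe8bf4612", "rdf:first"): "Nb54b6638171c41bc9ec27d2889ee1b75",
    ("Naaf0eba4b4814cefbc8fe65fe8bf4612", "rdf:rest"): "Nfb5db7a95b6f4de586f8d19742e07223",
    ("Nfb5db7a95b6f4de586f8d19742e07223", "rdf:first"): "N1cf5254fe53f44b6928eda77a2388ec9",
    ("Nfb5db7a95b6f4de586f8d19742e07223", "rdf:rest"): RDF_NIL,
}

# schema:familyName of each list item, from the same table.
FAMILY_NAMES = {
    "sg:person.011220142546.36": "Fernández",
    "sg:person.011600442422.98": "Montañés",
    "sg:person.010242453671.42": "Díaz",
    "sg:person.011017130042.09": "Ranilla",
    "sg:person.014120426453.50": "Combarro",
    "Nd4d01b1da9d34df49a25330a5b5c26b4": "Galindo",
    "Nb54b6638171c41bc9ec27d2889ee1b75": "Takizawa",
    "N1cf5254fe53f44b6928eda77a2388ec9": "Traunmüller",
}


def walk_rdf_list(head, triples):
    """Collect rdf:first values by following rdf:rest links from head to rdf:nil."""
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle in RDF list at {node}")
        seen.add(node)
        items.append(triples[(node, "rdf:first")])
        node = triples[(node, "rdf:rest")]
    return items


authors = [FAMILY_NAMES[p] for p in walk_rdf_list("Nd848b2934d884f6ca1974dd248602c82", LIST_TRIPLES)]
editors = [FAMILY_NAMES[p] for p in walk_rdf_list("N41b98fa2be7c4e86873b840e200a73ef", LIST_TRIPLES)]
```

The recovered order matches the `author` and `editor` arrays of the JSON-LD record, which is the point of encoding them as collections rather than as unordered repeated properties.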
 






...