Extraction of Document Descriptive Terms with a Linguistic-Based Machine Learning Approach


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2004

AUTHORS

Javier Fernández , Elena Montañés , Irene Díaz , José Ranilla , Elías F. Combarro

ABSTRACT

In this paper we present a method for extracting relevant words from a document that takes linguistic information into account. From a set of words manually labelled as relevant or not, and with the aid of a Machine Learning algorithm, we build a classifier that decides which words from an unseen document must be regarded as relevant. This system is compared with some classical methods (based only on statistical information).

PAGES

658-661

References to SciGraph publications

  • 2000-05-19. Using Noun Phrase Heads to Extract Document Keyphrases in ADVANCES IN ARTIFICIAL INTELLIGENCE
  • 2000-05. Learning Algorithms for Keyphrase Extraction in INFORMATION RETRIEVAL JOURNAL
  • Book

    TITLE

    Computational Science - ICCS 2004

    ISBN

    978-3-540-22115-9
    978-3-540-24687-9

    Author Affiliations

    Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/978-3-540-24687-9_96

    DOI

    http://dx.doi.org/10.1007/978-3-540-24687-9_96

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1005975224



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Artificial Intelligence and Image Processing", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "University of Oviedo", 
              "id": "https://www.grid.ac/institutes/grid.10863.3c", 
              "name": [
                "Artificial Intelligence Center, University of Oviedo, Spain"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Fern\u00e1ndez", 
            "givenName": "Javier", 
            "id": "sg:person.011220142546.36", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011220142546.36"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Oviedo", 
              "id": "https://www.grid.ac/institutes/grid.10863.3c", 
              "name": [
                "Artificial Intelligence Center, University of Oviedo, Spain"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Monta\u00f1\u00e9s", 
            "givenName": "Elena", 
            "id": "sg:person.011600442422.98", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011600442422.98"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Oviedo", 
              "id": "https://www.grid.ac/institutes/grid.10863.3c", 
              "name": [
                "Artificial Intelligence Center, University of Oviedo, Spain"
              ], 
              "type": "Organization"
            }, 
            "familyName": "D\u00edaz", 
            "givenName": "Irene", 
            "id": "sg:person.010242453671.42", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010242453671.42"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Oviedo", 
              "id": "https://www.grid.ac/institutes/grid.10863.3c", 
              "name": [
                "Artificial Intelligence Center, University of Oviedo, Spain"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Ranilla", 
            "givenName": "Jos\u00e9", 
            "id": "sg:person.011017130042.09", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011017130042.09"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Oviedo", 
              "id": "https://www.grid.ac/institutes/grid.10863.3c", 
              "name": [
                "Artificial Intelligence Center, University of Oviedo, Spain"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Combarro", 
            "givenName": "El\u00edas F.", 
            "id": "sg:person.014120426453.50", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014120426453.50"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.1016/b978-0-08-050058-4.50007-3", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1001305396"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1006/ijhc.2002.1002", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1007076379"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/3-540-45486-1_4", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1015602649", 
              "https://doi.org/10.1007/3-540-45486-1_4"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/505282.505283", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1023316280"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1108/eb046814", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037275209"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1023/a:1009976227802", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1049582902", 
              "https://doi.org/10.1023/a:1009976227802"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2004", 
        "datePublishedReg": "2004-01-01", 
        "description": "In this paper we present a method for extracting relevant words from a document taking into account linguistic information. Form a set of words manually labelled as relevant or not and with the aid of a Machine Learning algorithm, we build a classifier which is able to decide which words from a unseen document must be regarded as relevant. This system is compared with some classical methods (based just on statistical information).", 
        "editor": [
          {
            "familyName": "Bubak", 
            "givenName": "Marian", 
            "type": "Person"
          }, 
          {
            "familyName": "van Albada", 
            "givenName": "Geert Dick", 
            "type": "Person"
          }, 
          {
            "familyName": "Sloot", 
            "givenName": "Peter M. A.", 
            "type": "Person"
          }, 
          {
            "familyName": "Dongarra", 
            "givenName": "Jack", 
            "type": "Person"
          }
        ], 
        "genre": "chapter", 
        "id": "sg:pub.10.1007/978-3-540-24687-9_96", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isPartOf": {
          "isbn": [
            "978-3-540-22115-9", 
            "978-3-540-24687-9"
          ], 
          "name": "Computational Science - ICCS 2004", 
          "type": "Book"
        }, 
        "name": "Extraction of Document Descriptive Terms with a Linguistic-Based Machine Learning Approach", 
        "pagination": "658-661", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1005975224"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/978-3-540-24687-9_96"
            ]
          }, 
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "1c76512c7dbd79eac73a8da01ca53a3e2bef2d6057764ef42b0180b6ee0384eb"
            ]
          }
        ], 
        "publisher": {
          "location": "Berlin, Heidelberg", 
          "name": "Springer Berlin Heidelberg", 
          "type": "Organisation"
        }, 
        "sameAs": [
          "https://doi.org/10.1007/978-3-540-24687-9_96", 
          "https://app.dimensions.ai/details/publication/pub.1005975224"
        ], 
        "sdDataset": "chapters", 
        "sdDatePublished": "2019-04-16T08:23", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000363_0000000363/records_70040_00000000.jsonl", 
        "type": "Chapter", 
        "url": "https://link.springer.com/10.1007%2F978-3-540-24687-9_96"
      }
    ]
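As a reading aid for the record above, here is a minimal Python sketch that pulls the author names and the DOI out of the JSON-LD. It embeds a trimmed copy of the record (only the fields it reads), and the helper names `author_names` and `product_id` are illustrative assumptions, not part of any SciGraph tooling.

```python
import json

# Trimmed copy of the record above: only the fields this sketch reads.
# The raw string keeps the \uXXXX escapes intact for the JSON parser.
RECORD = json.loads(r'''
[
  {
    "author": [
      {"familyName": "Fern\u00e1ndez", "givenName": "Javier"},
      {"familyName": "Monta\u00f1\u00e9s", "givenName": "Elena"},
      {"familyName": "D\u00edaz", "givenName": "Irene"},
      {"familyName": "Ranilla", "givenName": "Jos\u00e9"},
      {"familyName": "Combarro", "givenName": "El\u00edas F."}
    ],
    "pagination": "658-661",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue",
       "value": ["pub.1005975224"]},
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1007/978-3-540-24687-9_96"]}
    ]
  }
]
''')


def author_names(record):
    """Return 'given family' strings in the order the record lists them."""
    return [f'{a["givenName"]} {a["familyName"]}'
            for a in record.get("author", [])]


def product_id(record, name):
    """Return the first value of the productId entry called `name`, or None."""
    for entry in record.get("productId", []):
        if entry.get("name") == name:
            values = entry.get("value", [])
            return values[0] if values else None
    return None


chapter = RECORD[0]  # the top-level JSON-LD value is a one-element array
print(author_names(chapter))
print(product_id(chapter, "doi"))
```

The record wraps each identifier value in a list, so the sketch returns the first element instead of assuming a bare string.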
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-24687-9_96'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-24687-9_96'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-24687-9_96'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-24687-9_96'
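The four curl calls above differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The `ACCEPT` table and the `build_request` helper are illustrative names; the sketch only constructs the request and does not send it, since this page does not confirm that the endpoint still responds.

```python
import urllib.request

BASE = "https://scigraph.springernature.com/"

# Media types taken from the curl examples above; the short keys are
# labels chosen for this sketch.
ACCEPT = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(record_id, fmt):
    """Build (but do not send) a GET request asking for one RDF format."""
    return urllib.request.Request(
        BASE + record_id,
        headers={"Accept": ACCEPT[fmt]},
    )


req = build_request("pub.10.1007/978-3-540-24687-9_96", "turtle")
print(req.full_url)
print(req.get_header("Accept"))

# To perform the request, assuming the service is reachable:
#     with urllib.request.urlopen(req) as resp:
#         body = resp.read().decode("utf-8")
```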


     

    This table displays all metadata directly associated to this object as RDF triples.

    128 TRIPLES      23 PREDICATES      33 URIs      20 LITERALS      8 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/978-3-540-24687-9_96 schema:about anzsrc-for:08
    2 anzsrc-for:0801
    3 schema:author N244942e243ff4c93a02624837a71d44f
    4 schema:citation sg:pub.10.1007/3-540-45486-1_4
    5 sg:pub.10.1023/a:1009976227802
    6 https://doi.org/10.1006/ijhc.2002.1002
    7 https://doi.org/10.1016/b978-0-08-050058-4.50007-3
    8 https://doi.org/10.1108/eb046814
    9 https://doi.org/10.1145/505282.505283
    10 schema:datePublished 2004
    11 schema:datePublishedReg 2004-01-01
    12 schema:description In this paper we present a method for extracting relevant words from a document taking into account linguistic information. Form a set of words manually labelled as relevant or not and with the aid of a Machine Learning algorithm, we build a classifier which is able to decide which words from a unseen document must be regarded as relevant. This system is compared with some classical methods (based just on statistical information).
    13 schema:editor N081c1631e81c46868a5cf27c794f5daf
    14 schema:genre chapter
    15 schema:inLanguage en
    16 schema:isAccessibleForFree true
    17 schema:isPartOf N192c0b34163b458786367b39bd0cc8fe
    18 schema:name Extraction of Document Descriptive Terms with a Linguistic-Based Machine Learning Approach
    19 schema:pagination 658-661
    20 schema:productId N58a93b33db8349caab904bc09feac2aa
    21 N637b3c769396490d8073ea0f12a45348
    22 N873006b5423f443b85dc9d9d01acfa10
    23 schema:publisher N5fd31e0fa1774e2d8adceb402e64ee0a
    24 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005975224
    25 https://doi.org/10.1007/978-3-540-24687-9_96
    26 schema:sdDatePublished 2019-04-16T08:23
    27 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    28 schema:sdPublisher Na7cfd31e10134133853125ac51678101
    29 schema:url https://link.springer.com/10.1007%2F978-3-540-24687-9_96
    30 sgo:license sg:explorer/license/
    31 sgo:sdDataset chapters
    32 rdf:type schema:Chapter
    33 N081c1631e81c46868a5cf27c794f5daf rdf:first Na36ab7c9ae5c470f9f9f84a4f7e07eec
    34 rdf:rest Nf30fe37b46db4ead8956c7741b238c08
    35 N0a90b4ab67a34361a3522191604920eb schema:familyName Dongarra
    36 schema:givenName Jack
    37 rdf:type schema:Person
    38 N192c0b34163b458786367b39bd0cc8fe schema:isbn 978-3-540-22115-9
    39 978-3-540-24687-9
    40 schema:name Computational Science - ICCS 2004
    41 rdf:type schema:Book
    42 N1b70e50b9ae5463486f5825f7f6d9696 rdf:first sg:person.010242453671.42
    43 rdf:rest N6529871154904c049cadce15849910ee
    44 N244942e243ff4c93a02624837a71d44f rdf:first sg:person.011220142546.36
    45 rdf:rest Ndd112bcb0d96486d84388cdb317058f4
    46 N58a93b33db8349caab904bc09feac2aa schema:name readcube_id
    47 schema:value 1c76512c7dbd79eac73a8da01ca53a3e2bef2d6057764ef42b0180b6ee0384eb
    48 rdf:type schema:PropertyValue
    49 N5fd31e0fa1774e2d8adceb402e64ee0a schema:location Berlin, Heidelberg
    50 schema:name Springer Berlin Heidelberg
    51 rdf:type schema:Organisation
    52 N637b3c769396490d8073ea0f12a45348 schema:name dimensions_id
    53 schema:value pub.1005975224
    54 rdf:type schema:PropertyValue
    55 N6529871154904c049cadce15849910ee rdf:first sg:person.011017130042.09
    56 rdf:rest Na16dd34be6d94ec695d387c5872b9fcd
    57 N6e95abb8406545128fab157fdf52d32c rdf:first Na25bbf27996a42cbb43bb2d4d8ac8a1a
    58 rdf:rest Nb02d6f114bcc4aa7a3da9f7db20e469e
    59 N873006b5423f443b85dc9d9d01acfa10 schema:name doi
    60 schema:value 10.1007/978-3-540-24687-9_96
    61 rdf:type schema:PropertyValue
    62 Na16dd34be6d94ec695d387c5872b9fcd rdf:first sg:person.014120426453.50
    63 rdf:rest rdf:nil
    64 Na25bbf27996a42cbb43bb2d4d8ac8a1a schema:familyName Sloot
    65 schema:givenName Peter M. A.
    66 rdf:type schema:Person
    67 Na36ab7c9ae5c470f9f9f84a4f7e07eec schema:familyName Bubak
    68 schema:givenName Marian
    69 rdf:type schema:Person
    70 Na7cfd31e10134133853125ac51678101 schema:name Springer Nature - SN SciGraph project
    71 rdf:type schema:Organization
    72 Nb02d6f114bcc4aa7a3da9f7db20e469e rdf:first N0a90b4ab67a34361a3522191604920eb
    73 rdf:rest rdf:nil
    74 Nb303774f550248a2a7866e67fecb3fc4 schema:familyName van Albada
    75 schema:givenName Geert Dick
    76 rdf:type schema:Person
    77 Ndd112bcb0d96486d84388cdb317058f4 rdf:first sg:person.011600442422.98
    78 rdf:rest N1b70e50b9ae5463486f5825f7f6d9696
    79 Nf30fe37b46db4ead8956c7741b238c08 rdf:first Nb303774f550248a2a7866e67fecb3fc4
    80 rdf:rest N6e95abb8406545128fab157fdf52d32c
    81 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    82 schema:name Information and Computing Sciences
    83 rdf:type schema:DefinedTerm
    84 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
    85 schema:name Artificial Intelligence and Image Processing
    86 rdf:type schema:DefinedTerm
    87 sg:person.010242453671.42 schema:affiliation https://www.grid.ac/institutes/grid.10863.3c
    88 schema:familyName Díaz
    89 schema:givenName Irene
    90 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010242453671.42
    91 rdf:type schema:Person
    92 sg:person.011017130042.09 schema:affiliation https://www.grid.ac/institutes/grid.10863.3c
    93 schema:familyName Ranilla
    94 schema:givenName José
    95 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011017130042.09
    96 rdf:type schema:Person
    97 sg:person.011220142546.36 schema:affiliation https://www.grid.ac/institutes/grid.10863.3c
    98 schema:familyName Fernández
    99 schema:givenName Javier
    100 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011220142546.36
    101 rdf:type schema:Person
    102 sg:person.011600442422.98 schema:affiliation https://www.grid.ac/institutes/grid.10863.3c
    103 schema:familyName Montañés
    104 schema:givenName Elena
    105 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011600442422.98
    106 rdf:type schema:Person
    107 sg:person.014120426453.50 schema:affiliation https://www.grid.ac/institutes/grid.10863.3c
    108 schema:familyName Combarro
    109 schema:givenName Elías F.
    110 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014120426453.50
    111 rdf:type schema:Person
    112 sg:pub.10.1007/3-540-45486-1_4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015602649
    113 https://doi.org/10.1007/3-540-45486-1_4
    114 rdf:type schema:CreativeWork
    115 sg:pub.10.1023/a:1009976227802 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049582902
    116 https://doi.org/10.1023/a:1009976227802
    117 rdf:type schema:CreativeWork
    118 https://doi.org/10.1006/ijhc.2002.1002 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007076379
    119 rdf:type schema:CreativeWork
    120 https://doi.org/10.1016/b978-0-08-050058-4.50007-3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001305396
    121 rdf:type schema:CreativeWork
    122 https://doi.org/10.1108/eb046814 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037275209
    123 rdf:type schema:CreativeWork
    124 https://doi.org/10.1145/505282.505283 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023316280
    125 rdf:type schema:CreativeWork
    126 https://www.grid.ac/institutes/grid.10863.3c schema:alternateName University of Oviedo
    127 schema:name Artificial Intelligence Center, University of Oviedo, Spain
    128 rdf:type schema:Organization
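In the table above, author order is carried by an RDF collection: `schema:author` points at a blank node, and each node holds one person in `rdf:first` and the next node in `rdf:rest`, ending at `rdf:nil`. The sketch below walks that chain using node identifiers and family names copied from the table; the dictionary layout and the `walk_list` helper are assumptions made for illustration, not a SciGraph API.

```python
RDF_NIL = "rdf:nil"

# node -> (rdf:first, rdf:rest), copied from the author list nodes in the table.
LIST_NODES = {
    "N244942e243ff4c93a02624837a71d44f":
        ("sg:person.011220142546.36", "Ndd112bcb0d96486d84388cdb317058f4"),
    "Ndd112bcb0d96486d84388cdb317058f4":
        ("sg:person.011600442422.98", "N1b70e50b9ae5463486f5825f7f6d9696"),
    "N1b70e50b9ae5463486f5825f7f6d9696":
        ("sg:person.010242453671.42", "N6529871154904c049cadce15849910ee"),
    "N6529871154904c049cadce15849910ee":
        ("sg:person.011017130042.09", "Na16dd34be6d94ec695d387c5872b9fcd"),
    "Na16dd34be6d94ec695d387c5872b9fcd":
        ("sg:person.014120426453.50", RDF_NIL),
}

# schema:familyName values for the same person identifiers.
FAMILY_NAME = {
    "sg:person.011220142546.36": "Fernández",
    "sg:person.011600442422.98": "Montañés",
    "sg:person.010242453671.42": "Díaz",
    "sg:person.011017130042.09": "Ranilla",
    "sg:person.014120426453.50": "Combarro",
}


def walk_list(head, nodes):
    """Follow rdf:rest from `head` and collect each rdf:first, in order."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle at {node}")
        seen.add(node)
        first, node = nodes[node]
        items.append(first)
    return items


# The head node is the object of schema:author in the table.
authors = walk_list("N244942e243ff4c93a02624837a71d44f", LIST_NODES)
print([FAMILY_NAME[a] for a in authors])
```

The same walk applies to the `schema:editor` chain, whose members are blank nodes rather than `sg:person` identifiers.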
     



