Analyzing Document Collections via Context-Aware Term Extraction


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2010

AUTHORS

Daniel A. Keim, Daniela Oelke, Christian Rohrdantz

ABSTRACT

In large collections of documents that are divided into predefined classes, the differences and similarities of those classes are of special interest. This paper presents an approach that is able to automatically extract terms from such document collections which describe what topics discriminate a single class from the others (discriminating terms) and which topics discriminate a subset of the classes against the remaining ones (overlap terms). The importance for real world applications and the effectiveness of our approach are demonstrated by two out of practice examples. In a first application our predefined classes correspond to different scientific conferences. By extracting terms from collections of papers published on these conferences, we determine automatically the topical differences and similarities of the conferences. In our second application task we extract terms out of a collection of product reviews which show what features reviewers commented on. We get these terms by discriminating the product review class against a suitable counter-balance class. Finally, our method is evaluated comparing it to alternative approaches.

PAGES

154-168

Book

TITLE

Natural Language Processing and Information Systems

ISBN

978-3-642-12549-2
978-3-642-12550-8

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-642-12550-8_13

DOI

http://dx.doi.org/10.1007/978-3-642-12550-8_13

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1030179986


JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "University of Konstanz, Germany", 
          "id": "http://www.grid.ac/institutes/grid.9811.1", 
          "name": [
            "University of Konstanz, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Keim", 
        "givenName": "Daniel A.", 
        "id": "sg:person.0635776571.01", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0635776571.01"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Konstanz, Germany", 
          "id": "http://www.grid.ac/institutes/grid.9811.1", 
          "name": [
            "University of Konstanz, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Oelke", 
        "givenName": "Daniela", 
        "id": "sg:person.07667765141.23", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07667765141.23"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Konstanz, Germany", 
          "id": "http://www.grid.ac/institutes/grid.9811.1", 
          "name": [
            "University of Konstanz, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Rohrdantz", 
        "givenName": "Christian", 
        "id": "sg:person.015642516425.63", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015642516425.63"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2010", 
    "datePublishedReg": "2010-01-01", 
    "description": "In large collections of documents that are divided into predefined classes, the differences and similarities of those classes are of special interest. This paper presents an approach that is able to automatically extract terms from such document collections which describe what topics discriminate a single class from the others (discriminating terms) and which topics discriminate a subset of the classes against the remaining ones (overlap terms). The importance for real world applications and the effectiveness of our approach are demonstrated by two out of practice examples. In a first application our predefined classes correspond to different scientific conferences. By extracting terms from collections of papers published on these conferences, we determine automatically the topical differences and similarities of the conferences. In our second application task we extract terms out of a collection of product reviews which show what features reviewers commented on. We get these terms by discriminating the product review class against a suitable counter-balance class. Finally, our method is evaluated comparing it to alternative approaches.", 
    "editor": [
      {
        "familyName": "Horacek", 
        "givenName": "Helmut", 
        "type": "Person"
      }, 
      {
        "familyName": "M\u00e9tais", 
        "givenName": "Elisabeth", 
        "type": "Person"
      }, 
      {
        "familyName": "Mu\u00f1oz", 
        "givenName": "Rafael", 
        "type": "Person"
      }, 
      {
        "familyName": "Wolska", 
        "givenName": "Magdalena", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-642-12550-8_13", 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-642-12549-2", 
        "978-3-642-12550-8"
      ], 
      "name": "Natural Language Processing and Information Systems", 
      "type": "Book"
    }, 
    "keywords": [
      "document collections", 
      "such document collections", 
      "real-world applications", 
      "application tasks", 
      "world applications", 
      "term extraction", 
      "product reviews", 
      "large collection", 
      "different scientific conferences", 
      "collection", 
      "review class", 
      "applications", 
      "task", 
      "documents", 
      "first application", 
      "topic", 
      "practice examples", 
      "alternative approach", 
      "class", 
      "topical differences", 
      "extraction", 
      "similarity", 
      "terms", 
      "effectiveness", 
      "example", 
      "conference", 
      "single class", 
      "method", 
      "subset", 
      "interest", 
      "one", 
      "scientific conferences", 
      "collection of papers", 
      "special interest", 
      "importance", 
      "reviewers", 
      "review", 
      "differences", 
      "paper", 
      "approach"
    ], 
    "name": "Analyzing Document Collections via Context-Aware Term Extraction", 
    "pagination": "154-168", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1030179986"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-642-12550-8_13"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-642-12550-8_13", 
      "https://app.dimensions.ai/details/publication/pub.1030179986"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-10-01T06:54", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20221001/entities/gbq_results/chapter/chapter_219.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-642-12550-8_13"
  }
]
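
As a rough illustration of how a record like the one above might be consumed, the following Python sketch pulls the title, author names, and DOI out of the JSON-LD. The embedded sample is a trimmed copy of the record; the function name `summarize` and the choice of fields are assumptions made for this example, not part of SciGraph.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Keim", "givenName": "Daniel A.", "type": "Person"},
      {"familyName": "Oelke", "givenName": "Daniela", "type": "Person"},
      {"familyName": "Rohrdantz", "givenName": "Christian", "type": "Person"}
    ],
    "name": "Analyzing Document Collections via Context-Aware Term Extraction",
    "pagination": "154-168",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1030179986"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-3-642-12550-8_13"]}
    ]
  }
]
"""


def summarize(record):
    """Return title, author names, and DOI from one SciGraph-style record.

    `summarize` is a hypothetical helper written for this example.
    """
    authors = [
        f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])
    ]
    # productId is a list of name/value pairs; pick the entry named "doi".
    doi = next(
        (p["value"][0] for p in record.get("productId", []) if p.get("name") == "doi"),
        None,
    )
    return {"title": record.get("name"), "authors": authors, "doi": doi}


# The top-level JSON value is a list holding a single record.
summary = summarize(json.loads(RECORD_JSON)[0])
print(summary)
```

Because JSON-LD is plain JSON, the standard `json` module is enough here; a dedicated JSON-LD library would only be needed to expand the `@context` into full IRIs.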
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-12550-8_13'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-12550-8_13'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-12550-8_13'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-12550-8_13'
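
The four curl commands above differ only in their Accept header. A minimal Python sketch of the same content negotiation, using only the standard library, might look like the following. The helper names `build_request` and `fetch` and the short format keys are assumptions for this example, and the sketch assumes the endpoint is still reachable at the URL shown.

```python
from urllib.request import Request, urlopen

# URL taken from the curl commands above.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-642-12550-8_13"

# Short format keys (hypothetical) mapped to the media types used above.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request whose Accept header selects the RDF serialization."""
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text.

    This performs a network call and raises urllib.error.URLError
    if the server cannot be reached.
    """
    with urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch("turtle")` would correspond to the third curl command. The request construction is kept separate from the network call so the header logic can be checked offline.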


 

This table displays all metadata directly associated with this object as RDF triples.

128 TRIPLES      22 PREDICATES      65 URIs      58 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-642-12550-8_13 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author N80cc58c6caee4efca1f367f1589bd186
4 schema:datePublished 2010
5 schema:datePublishedReg 2010-01-01
6 schema:description In large collections of documents that are divided into predefined classes, the differences and similarities of those classes are of special interest. This paper presents an approach that is able to automatically extract terms from such document collections which describe what topics discriminate a single class from the others (discriminating terms) and which topics discriminate a subset of the classes against the remaining ones (overlap terms). The importance for real world applications and the effectiveness of our approach are demonstrated by two out of practice examples. In a first application our predefined classes correspond to different scientific conferences. By extracting terms from collections of papers published on these conferences, we determine automatically the topical differences and similarities of the conferences. In our second application task we extract terms out of a collection of product reviews which show what features reviewers commented on. We get these terms by discriminating the product review class against a suitable counter-balance class. Finally, our method is evaluated comparing it to alternative approaches.
7 schema:editor Nbcc505b76da0434fa90faff97a62f6d4
8 schema:genre chapter
9 schema:isAccessibleForFree true
10 schema:isPartOf Ne59ffc7eaf184e8b881c62078a4159b7
11 schema:keywords alternative approach
12 application tasks
13 applications
14 approach
15 class
16 collection
17 collection of papers
18 conference
19 differences
20 different scientific conferences
21 document collections
22 documents
23 effectiveness
24 example
25 extraction
26 first application
27 importance
28 interest
29 large collection
30 method
31 one
32 paper
33 practice examples
34 product reviews
35 real-world applications
36 review
37 review class
38 reviewers
39 scientific conferences
40 similarity
41 single class
42 special interest
43 subset
44 such document collections
45 task
46 term extraction
47 terms
48 topic
49 topical differences
50 world applications
51 schema:name Analyzing Document Collections via Context-Aware Term Extraction
52 schema:pagination 154-168
53 schema:productId N986572e158e54c87ab0590f141f1facb
54 Nc87101b1cb704a67b9c6530dd0a8fa04
55 schema:publisher Na9761c245375476e9e6a288dda3811d4
56 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030179986
57 https://doi.org/10.1007/978-3-642-12550-8_13
58 schema:sdDatePublished 2022-10-01T06:54
59 schema:sdLicense https://scigraph.springernature.com/explorer/license/
60 schema:sdPublisher N5809464e589b4697b1b708dfdbd2687e
61 schema:url https://doi.org/10.1007/978-3-642-12550-8_13
62 sgo:license sg:explorer/license/
63 sgo:sdDataset chapters
64 rdf:type schema:Chapter
65 N08f1936151344a588d331e002ff65fbe schema:familyName Horacek
66 schema:givenName Helmut
67 rdf:type schema:Person
68 N0a34f720014e4dfca90b8c5b914c6606 schema:familyName Muñoz
69 schema:givenName Rafael
70 rdf:type schema:Person
71 N33ca3760a3d34d358b3c3ac29e5a0b24 schema:familyName Wolska
72 schema:givenName Magdalena
73 rdf:type schema:Person
74 N349104c75938423e9ad135efc2d3e05f rdf:first N33ca3760a3d34d358b3c3ac29e5a0b24
75 rdf:rest rdf:nil
76 N4c1b078be5de411fabb1282a5bae63d1 rdf:first Nd557e3c78bc5418eba61cc9e47c33b82
77 rdf:rest Ne3add5bd8e7b4898aa6db20ea391de73
78 N5809464e589b4697b1b708dfdbd2687e schema:name Springer Nature - SN SciGraph project
79 rdf:type schema:Organization
80 N6317ac89739b432fad680750cba56bea rdf:first sg:person.07667765141.23
81 rdf:rest N74402cec16c940d8871c5bbfc1a948b5
82 N74402cec16c940d8871c5bbfc1a948b5 rdf:first sg:person.015642516425.63
83 rdf:rest rdf:nil
84 N80cc58c6caee4efca1f367f1589bd186 rdf:first sg:person.0635776571.01
85 rdf:rest N6317ac89739b432fad680750cba56bea
86 N986572e158e54c87ab0590f141f1facb schema:name doi
87 schema:value 10.1007/978-3-642-12550-8_13
88 rdf:type schema:PropertyValue
89 Na9761c245375476e9e6a288dda3811d4 schema:name Springer Nature
90 rdf:type schema:Organisation
91 Nbcc505b76da0434fa90faff97a62f6d4 rdf:first N08f1936151344a588d331e002ff65fbe
92 rdf:rest N4c1b078be5de411fabb1282a5bae63d1
93 Nc87101b1cb704a67b9c6530dd0a8fa04 schema:name dimensions_id
94 schema:value pub.1030179986
95 rdf:type schema:PropertyValue
96 Nd557e3c78bc5418eba61cc9e47c33b82 schema:familyName Métais
97 schema:givenName Elisabeth
98 rdf:type schema:Person
99 Ne3add5bd8e7b4898aa6db20ea391de73 rdf:first N0a34f720014e4dfca90b8c5b914c6606
100 rdf:rest N349104c75938423e9ad135efc2d3e05f
101 Ne59ffc7eaf184e8b881c62078a4159b7 schema:isbn 978-3-642-12549-2
102 978-3-642-12550-8
103 schema:name Natural Language Processing and Information Systems
104 rdf:type schema:Book
105 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
106 schema:name Information and Computing Sciences
107 rdf:type schema:DefinedTerm
108 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
109 schema:name Artificial Intelligence and Image Processing
110 rdf:type schema:DefinedTerm
111 sg:person.015642516425.63 schema:affiliation grid-institutes:grid.9811.1
112 schema:familyName Rohrdantz
113 schema:givenName Christian
114 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015642516425.63
115 rdf:type schema:Person
116 sg:person.0635776571.01 schema:affiliation grid-institutes:grid.9811.1
117 schema:familyName Keim
118 schema:givenName Daniel A.
119 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0635776571.01
120 rdf:type schema:Person
121 sg:person.07667765141.23 schema:affiliation grid-institutes:grid.9811.1
122 schema:familyName Oelke
123 schema:givenName Daniela
124 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07667765141.23
125 rdf:type schema:Person
126 grid-institutes:grid.9811.1 schema:alternateName University of Konstanz, Germany
127 schema:name University of Konstanz, Germany
128 rdf:type schema:Organization
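
In the table above, the author and editor orderings are stored as RDF lists: each blank node carries an `rdf:first` item and an `rdf:rest` pointer to the next node, ending at `rdf:nil`. The sketch below walks the author list, starting from the blank node that row 3 gives as the object of `schema:author`. The triples are copied from rows 80-85; the function name `walk_rdf_list` is an assumption for this example.

```python
RDF_NIL = "rdf:nil"

# Blank node that row 3 gives as the object of schema:author.
AUTHOR_LIST_HEAD = "N80cc58c6caee4efca1f367f1589bd186"

# (subject, predicate, object) triples copied from rows 80-85 of the table.
TRIPLES = [
    ("N6317ac89739b432fad680750cba56bea", "rdf:first", "sg:person.07667765141.23"),
    ("N6317ac89739b432fad680750cba56bea", "rdf:rest", "N74402cec16c940d8871c5bbfc1a948b5"),
    ("N74402cec16c940d8871c5bbfc1a948b5", "rdf:first", "sg:person.015642516425.63"),
    ("N74402cec16c940d8871c5bbfc1a948b5", "rdf:rest", "rdf:nil"),
    ("N80cc58c6caee4efca1f367f1589bd186", "rdf:first", "sg:person.0635776571.01"),
    ("N80cc58c6caee4efca1f367f1589bd186", "rdf:rest", "N6317ac89739b432fad680750cba56bea"),
]


def walk_rdf_list(head, triples):
    """Follow rdf:first / rdf:rest links from `head` and return the items in order."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        # Guard against cycles and dangling pointers in malformed data.
        if node in seen or node not in first:
            raise ValueError(f"malformed RDF list at node {node!r}")
        seen.add(node)
        items.append(first[node])
        node = rest.get(node, RDF_NIL)
    return items


authors_in_order = walk_rdf_list(AUTHOR_LIST_HEAD, TRIPLES)
print(authors_in_order)
```

The resulting order (Keim, Oelke, Rohrdantz) matches the author array in the JSON-LD record. The same walk applies to the editor list that starts at the blank node in row 7.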
 



