Collection-Document Summaries


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2018-03-01

AUTHORS

Nils Witt, Michael Granitzer, Christin Seifert

ABSTRACT

Learning something new from a text requires the reader to build on existing knowledge and add new material at the same time. Therefore, we propose collection-document (CDS) summaries that highlight commonalities and differences between a collection (or a single document) and a single document. We devise evaluation metrics that do not require human judgement, and three algorithms for extracting CDS that are based on single-document keyword-extraction methods. Our evaluation shows that different algorithms have different strengths, e.g. TF-IDF based approach best describes document overlap while the adaption of Rake provides keywords with a broad topical coverage. The proposed criteria and procedure can be used to evaluate document-collection summaries without annotated corpora or provide additional insight in an evaluation with human-generated ground truth.

PAGES

638-643

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-319-76941-7_56

DOI

http://dx.doi.org/10.1007/978-3-319-76941-7_56

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1101242765



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/17", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Psychology and Cognitive Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/1701", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Psychology", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "ZBW-Leibniz Information Centre for Economics, D\u00fcsternbrooker Weg 120, 24105, Kiel, Germany", 
          "id": "http://www.grid.ac/institutes/grid.461649.8", 
          "name": [
            "ZBW-Leibniz Information Centre for Economics, D\u00fcsternbrooker Weg 120, 24105, Kiel, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Witt", 
        "givenName": "Nils", 
        "id": "sg:person.011012402743.21", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011012402743.21"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Passau, Innstra\u00dfe 32, 94032, Passau, Germany", 
          "id": "http://www.grid.ac/institutes/grid.11046.32", 
          "name": [
            "University of Passau, Innstra\u00dfe 32, 94032, Passau, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Granitzer", 
        "givenName": "Michael", 
        "id": "sg:person.016026412441.95", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016026412441.95"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Passau, Innstra\u00dfe 32, 94032, Passau, Germany", 
          "id": "http://www.grid.ac/institutes/grid.11046.32", 
          "name": [
            "University of Passau, Innstra\u00dfe 32, 94032, Passau, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Seifert", 
        "givenName": "Christin", 
        "id": "sg:person.010257616672.34", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010257616672.34"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2018-03-01", 
    "datePublishedReg": "2018-03-01", 
    "description": "Learning something new from a text requires the reader to build on existing knowledge and add new material at the same time. Therefore, we propose collection-document (CDS) summaries that highlight commonalities and differences between a collection (or a single document) and a single document. We devise evaluation metrics that do not require human judgement, and three algorithms for extracting CDS that are based on single-document keyword-extraction methods. Our evaluation shows that different algorithms have different strengths, e.g. TF-IDF based approach best describes document overlap while the adaption of Rake provides keywords with a broad topical coverage. The proposed criteria and procedure can be used to evaluate document-collection summaries without annotated corpora or provide additional insight in an evaluation with human-generated ground truth.", 
    "editor": [
      {
        "familyName": "Pasi", 
        "givenName": "Gabriella", 
        "type": "Person"
      }, 
      {
        "familyName": "Piwowarski", 
        "givenName": "Benjamin", 
        "type": "Person"
      }, 
      {
        "familyName": "Azzopardi", 
        "givenName": "Leif", 
        "type": "Person"
      }, 
      {
        "familyName": "Hanbury", 
        "givenName": "Allan", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-319-76941-7_56", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-319-76940-0", 
        "978-3-319-76941-7"
      ], 
      "name": "Advances in Information Retrieval", 
      "type": "Book"
    }, 
    "keywords": [
      "keyword extraction method", 
      "document overlaps", 
      "evaluation metrics", 
      "TF-IDF", 
      "different algorithms", 
      "ground truth", 
      "single document", 
      "human judgment", 
      "algorithm", 
      "same time", 
      "topical coverage", 
      "keywords", 
      "documents", 
      "metrics", 
      "corpus", 
      "text", 
      "collection", 
      "adaption", 
      "different strengths", 
      "evaluation", 
      "highlight commonalities", 
      "truth", 
      "knowledge", 
      "CDS", 
      "commonalities", 
      "method", 
      "coverage", 
      "readers", 
      "time", 
      "summary", 
      "additional insight", 
      "judgments", 
      "criteria", 
      "insights", 
      "procedure", 
      "overlap", 
      "rake", 
      "strength", 
      "new materials", 
      "differences", 
      "materials", 
      "approach"
    ], 
    "name": "Collection-Document Summaries", 
    "pagination": "638-643", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1101242765"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-319-76941-7_56"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-319-76941-7_56", 
      "https://app.dimensions.ai/details/publication/pub.1101242765"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-06-01T22:35", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220601/entities/gbq_results/chapter/chapter_447.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-319-76941-7_56"
  }
]
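As a minimal sketch of how a record of this shape could be consumed, the Python snippet below parses a trimmed copy of the JSON-LD above and pulls out the title, author names, DOI, and page range. The `summarize` helper and the `RECORD_JSON` constant are hypothetical names introduced for illustration; the snippet assumes the record is a one-element JSON array and that each `productId` entry carries a single value, as it does here.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
RECORD_JSON = """
[
  {
    "name": "Collection-Document Summaries",
    "pagination": "638-643",
    "author": [
      {"familyName": "Witt", "givenName": "Nils"},
      {"familyName": "Granitzer", "givenName": "Michael"},
      {"familyName": "Seifert", "givenName": "Christin"}
    ],
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue",
       "value": ["pub.1101242765"]},
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1007/978-3-319-76941-7_56"]}
    ]
  }
]
"""


def summarize(record_json: str) -> dict:
    """Return a small summary of a SciGraph-style chapter record.

    Hypothetical helper: assumes the top-level JSON value is a list whose
    first element is the chapter.
    """
    record = json.loads(record_json)[0]
    # Map identifier names ("doi", "dimensions_id") to their first value.
    identifiers = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    # Authors are listed in publication order in the JSON-LD array.
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]
    return {
        "title": record["name"],
        "authors": authors,
        "doi": identifiers.get("doi"),
        "pages": record.get("pagination"),
    }


summary = summarize(RECORD_JSON)
print(summary)
```

Because JSON-LD is plain JSON, the standard `json` module is enough for simple field access like this; a full JSON-LD processor would only be needed to expand the `@context` into complete IRIs.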
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-76941-7_56'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-76941-7_56'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-76941-7_56'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-76941-7_56'
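The four curl commands differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. The names `ACCEPT_TYPES`, `build_request`, and `fetch` are hypothetical, introduced for illustration; the sketch assumes the endpoint is still reachable and returns UTF-8 text, neither of which is guaranteed by this page. Only the request is built here; `fetch` is defined but not called.

```python
import urllib.request

# Media types taken from the curl commands above, keyed by a short format name.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-319-76941-7_56"


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Build a GET request asking for the given RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt: str, url: str = RECORD_URL, timeout: float = 10.0) -> str:
    """Send the request and return the body as text.

    Assumption: the server is reachable and the response is UTF-8 encoded.
    """
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Inspect the request without touching the network.
request = build_request("turtle")
print(request.full_url)
print(request.get_header("Accept"))
```

Calling `fetch("json-ld")` would then correspond to the first curl command, `fetch("nt")` to the second, and so on.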


 

This table displays all metadata directly associated with this object as RDF triples.

134 TRIPLES      23 PREDICATES      67 URIs      60 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-319-76941-7_56 schema:about anzsrc-for:17
2 anzsrc-for:1701
3 schema:author Nce36688b29ad431697c4c5e939eae189
4 schema:datePublished 2018-03-01
5 schema:datePublishedReg 2018-03-01
6 schema:description Learning something new from a text requires the reader to build on existing knowledge and add new material at the same time. Therefore, we propose collection-document (CDS) summaries that highlight commonalities and differences between a collection (or a single document) and a single document. We devise evaluation metrics that do not require human judgement, and three algorithms for extracting CDS that are based on single-document keyword-extraction methods. Our evaluation shows that different algorithms have different strengths, e.g. TF-IDF based approach best describes document overlap while the adaption of Rake provides keywords with a broad topical coverage. The proposed criteria and procedure can be used to evaluate document-collection summaries without annotated corpora or provide additional insight in an evaluation with human-generated ground truth.
7 schema:editor Ne9c68f4e6eae4746bd422d037a9fb920
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree true
11 schema:isPartOf Nf84f55e4f90d4f92ab0f5627b71f4487
12 schema:keywords CDS
13 TF-IDF
14 adaption
15 additional insight
16 algorithm
17 approach
18 collection
19 commonalities
20 corpus
21 coverage
22 criteria
23 differences
24 different algorithms
25 different strengths
26 document overlaps
27 documents
28 evaluation
29 evaluation metrics
30 ground truth
31 highlight commonalities
32 human judgment
33 insights
34 judgments
35 keyword extraction method
36 keywords
37 knowledge
38 materials
39 method
40 metrics
41 new materials
42 overlap
43 procedure
44 rake
45 readers
46 same time
47 single document
48 strength
49 summary
50 text
51 time
52 topical coverage
53 truth
54 schema:name Collection-Document Summaries
55 schema:pagination 638-643
56 schema:productId N5174e8452d614a14ad04b2a253e7c54e
57 Nfc76aa5b8f1e4bdfb943a22756f037c9
58 schema:publisher Nfb2b2ccc5c7549d4bb2810ad007bdda0
59 schema:sameAs https://app.dimensions.ai/details/publication/pub.1101242765
60 https://doi.org/10.1007/978-3-319-76941-7_56
61 schema:sdDatePublished 2022-06-01T22:35
62 schema:sdLicense https://scigraph.springernature.com/explorer/license/
63 schema:sdPublisher Ncc91c155c42046bc85d9118b6c3d115b
64 schema:url https://doi.org/10.1007/978-3-319-76941-7_56
65 sgo:license sg:explorer/license/
66 sgo:sdDataset chapters
67 rdf:type schema:Chapter
68 N05984464fc92485ca0d7bbb84d66bbd8 schema:familyName Piwowarski
69 schema:givenName Benjamin
70 rdf:type schema:Person
71 N0c84dfa32bf14490930623992ab90be2 schema:familyName Hanbury
72 schema:givenName Allan
73 rdf:type schema:Person
74 N2c0d2d8427144e2db0c1a75220dc77d9 rdf:first N05984464fc92485ca0d7bbb84d66bbd8
75 rdf:rest N68386b95860147689f356e7413060675
76 N4f6120f003f44c438b69dc9878db2cdd schema:familyName Pasi
77 schema:givenName Gabriella
78 rdf:type schema:Person
79 N5174e8452d614a14ad04b2a253e7c54e schema:name dimensions_id
80 schema:value pub.1101242765
81 rdf:type schema:PropertyValue
82 N68386b95860147689f356e7413060675 rdf:first Nd14d1fb00bc84cdc80486121a4c6d801
83 rdf:rest N9cb7b89e52964ab1be0cf8b188613cdd
84 N706fc05e9cac472296f629f01c60e7e6 rdf:first sg:person.010257616672.34
85 rdf:rest rdf:nil
86 N94e6742cd4204b1ca3cab0af71505010 rdf:first sg:person.016026412441.95
87 rdf:rest N706fc05e9cac472296f629f01c60e7e6
88 N9cb7b89e52964ab1be0cf8b188613cdd rdf:first N0c84dfa32bf14490930623992ab90be2
89 rdf:rest rdf:nil
90 Ncc91c155c42046bc85d9118b6c3d115b schema:name Springer Nature - SN SciGraph project
91 rdf:type schema:Organization
92 Nce36688b29ad431697c4c5e939eae189 rdf:first sg:person.011012402743.21
93 rdf:rest N94e6742cd4204b1ca3cab0af71505010
94 Nd14d1fb00bc84cdc80486121a4c6d801 schema:familyName Azzopardi
95 schema:givenName Leif
96 rdf:type schema:Person
97 Ne9c68f4e6eae4746bd422d037a9fb920 rdf:first N4f6120f003f44c438b69dc9878db2cdd
98 rdf:rest N2c0d2d8427144e2db0c1a75220dc77d9
99 Nf84f55e4f90d4f92ab0f5627b71f4487 schema:isbn 978-3-319-76940-0
100 978-3-319-76941-7
101 schema:name Advances in Information Retrieval
102 rdf:type schema:Book
103 Nfb2b2ccc5c7549d4bb2810ad007bdda0 schema:name Springer Nature
104 rdf:type schema:Organisation
105 Nfc76aa5b8f1e4bdfb943a22756f037c9 schema:name doi
106 schema:value 10.1007/978-3-319-76941-7_56
107 rdf:type schema:PropertyValue
108 anzsrc-for:17 schema:inDefinedTermSet anzsrc-for:
109 schema:name Psychology and Cognitive Sciences
110 rdf:type schema:DefinedTerm
111 anzsrc-for:1701 schema:inDefinedTermSet anzsrc-for:
112 schema:name Psychology
113 rdf:type schema:DefinedTerm
114 sg:person.010257616672.34 schema:affiliation grid-institutes:grid.11046.32
115 schema:familyName Seifert
116 schema:givenName Christin
117 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010257616672.34
118 rdf:type schema:Person
119 sg:person.011012402743.21 schema:affiliation grid-institutes:grid.461649.8
120 schema:familyName Witt
121 schema:givenName Nils
122 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011012402743.21
123 rdf:type schema:Person
124 sg:person.016026412441.95 schema:affiliation grid-institutes:grid.11046.32
125 schema:familyName Granitzer
126 schema:givenName Michael
127 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016026412441.95
128 rdf:type schema:Person
129 grid-institutes:grid.11046.32 schema:alternateName University of Passau, Innstraße 32, 94032, Passau, Germany
130 schema:name University of Passau, Innstraße 32, 94032, Passau, Germany
131 rdf:type schema:Organization
132 grid-institutes:grid.461649.8 schema:alternateName ZBW-Leibniz Information Centre for Economics, Düsternbrooker Weg 120, 24105, Kiel, Germany
133 schema:name ZBW-Leibniz Information Centre for Economics, Düsternbrooker Weg 120, 24105, Kiel, Germany
134 rdf:type schema:Organization
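In the table above, the authors and editors are not attached to the chapter directly: schema:author and schema:editor each point to a blank node that heads an RDF list, chained through rdf:first and rdf:rest until rdf:nil. The Python sketch below shows one way to recover the author order from such triples. The `walk_list` helper and the `TRIPLES` list are hypothetical names introduced for illustration; the triples themselves are copied from the table.

```python
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"

CHAPTER = "sg:pub.10.1007/978-3-319-76941-7_56"

# Subset of the triples table above: the author list and the family names.
TRIPLES = [
    (CHAPTER, "schema:author", "Nce36688b29ad431697c4c5e939eae189"),
    ("Nce36688b29ad431697c4c5e939eae189", RDF_FIRST, "sg:person.011012402743.21"),
    ("Nce36688b29ad431697c4c5e939eae189", RDF_REST, "N94e6742cd4204b1ca3cab0af71505010"),
    ("N94e6742cd4204b1ca3cab0af71505010", RDF_FIRST, "sg:person.016026412441.95"),
    ("N94e6742cd4204b1ca3cab0af71505010", RDF_REST, "N706fc05e9cac472296f629f01c60e7e6"),
    ("N706fc05e9cac472296f629f01c60e7e6", RDF_FIRST, "sg:person.010257616672.34"),
    ("N706fc05e9cac472296f629f01c60e7e6", RDF_REST, RDF_NIL),
    ("sg:person.011012402743.21", "schema:familyName", "Witt"),
    ("sg:person.016026412441.95", "schema:familyName", "Granitzer"),
    ("sg:person.010257616672.34", "schema:familyName", "Seifert"),
]


def walk_list(head: str, triples: list) -> list:
    """Follow rdf:first / rdf:rest from `head` and return the items in order."""
    # Assumes each (subject, predicate) pair has a single object, which
    # holds for the list nodes used here.
    index = {(s, p): o for s, p, o in triples}
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(index[(node, RDF_FIRST)])
        node = index[(node, RDF_REST)]
    return items


lookup = {(s, p): o for s, p, o in TRIPLES}
author_ids = walk_list(lookup[(CHAPTER, "schema:author")], TRIPLES)
author_names = [lookup[(a, "schema:familyName")] for a in author_ids]
print(author_names)  # ['Witt', 'Granitzer', 'Seifert']
```

The same traversal applies to the schema:editor list; an RDF library would normally handle this, but the hand-written loop makes the list structure in the table explicit.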
 



