2018-03-01
AUTHORS: Nils Witt, Michael Granitzer, Christin Seifert
ABSTRACT: Learning something new from a text requires the reader to build on existing knowledge and add new material at the same time. We therefore propose collection-document summaries (CDS) that highlight commonalities and differences between a collection (or a single document) and a single document. We devise evaluation metrics that do not require human judgement, and three algorithms for extracting CDS that are based on single-document keyword-extraction methods. Our evaluation shows that different algorithms have different strengths: for example, the TF-IDF-based approach best describes document overlap, while the adaptation of Rake provides keywords with broad topical coverage. The proposed criteria and procedure can be used to evaluate document-collection summaries without annotated corpora, or to provide additional insight in an evaluation with human-generated ground truth.
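To make the commonality/difference idea concrete, here is a minimal sketch of a TF-IDF-style split of a document's terms into those shared with a collection and those absent from it. It is not the authors' algorithm: the tokenizer, the tiny stopword list, the smoothed IDF formula, and the names `cds_keywords` and `tokenize` are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "on", "is"}

def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def cds_keywords(collection, document, k=5):
    """Rank the document's terms by TF-IDF, then split them into terms that
    also occur in the collection (commonalities) and terms that do not
    (differences). Returns two lists of at most k terms each."""
    docs = [set(tokenize(d)) for d in collection]
    tf = Counter(tokenize(document))
    n = len(docs) + 1  # the collection plus the document itself

    def idf(term):
        df = 1 + sum(term in d for d in docs)  # +1 counts the document itself
        return math.log((1 + n) / (1 + df)) + 1  # smoothed, always positive

    scores = {t: c * idf(t) for t, c in tf.items()}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken alphabetically
    seen = set().union(*docs)
    common = [t for t in ranked if t in seen][:k]
    novel = [t for t in ranked if t not in seen][:k]
    return common, novel

collection = ["Neural networks learn representations.", "Networks of neurons fire."]
document = "Neural networks for keyword extraction and keyword ranking."
common, novel = cds_keywords(collection, document)
print(common, novel)  # → ['neural', 'networks'] ['keyword', 'extraction', 'ranking']
```

Terms that are frequent across the collection (here "networks") get a lower IDF and therefore rank lower among the commonalities, while terms unique to the document rise to the top of the differences.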
PAGES: 638-643
Advances in Information Retrieval
ISBN: 978-3-319-76940-0, 978-3-319-76941-7
http://scigraph.springernature.com/pub.10.1007/978-3-319-76941-7_56
DOI: http://dx.doi.org/10.1007/978-3-319-76941-7_56
DIMENSIONS: https://app.dimensions.ai/details/publication/pub.1101242765
JSON-LD is the canonical representation for SciGraph data.
[
{
"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
"about": [
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/17",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Psychology and Cognitive Sciences",
"type": "DefinedTerm"
},
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/1701",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Psychology",
"type": "DefinedTerm"
}
],
"author": [
{
"affiliation": {
"alternateName": "ZBW-Leibniz Information Centre for Economics, D\u00fcsternbrooker Weg 120, 24105, Kiel, Germany",
"id": "http://www.grid.ac/institutes/grid.461649.8",
"name": [
"ZBW-Leibniz Information Centre for Economics, D\u00fcsternbrooker Weg 120, 24105, Kiel, Germany"
],
"type": "Organization"
},
"familyName": "Witt",
"givenName": "Nils",
"id": "sg:person.011012402743.21",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011012402743.21"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "University of Passau, Innstra\u00dfe 32, 94032, Passau, Germany",
"id": "http://www.grid.ac/institutes/grid.11046.32",
"name": [
"University of Passau, Innstra\u00dfe 32, 94032, Passau, Germany"
],
"type": "Organization"
},
"familyName": "Granitzer",
"givenName": "Michael",
"id": "sg:person.016026412441.95",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016026412441.95"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "University of Passau, Innstra\u00dfe 32, 94032, Passau, Germany",
"id": "http://www.grid.ac/institutes/grid.11046.32",
"name": [
"University of Passau, Innstra\u00dfe 32, 94032, Passau, Germany"
],
"type": "Organization"
},
"familyName": "Seifert",
"givenName": "Christin",
"id": "sg:person.010257616672.34",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010257616672.34"
],
"type": "Person"
}
],
"datePublished": "2018-03-01",
"datePublishedReg": "2018-03-01",
"description": "Learning something new from a text requires the reader to build on existing knowledge and add new material at the same time. Therefore, we propose collection-document (CDS) summaries that highlight commonalities and differences between a collection (or a single document) and a single document. We devise evaluation metrics that do not require human judgement, and three algorithms for extracting CDS that are based on single-document keyword-extraction methods. Our evaluation shows that different algorithms have different strengths, e.g. TF-IDF based approach best describes document overlap while the adaption of Rake provides keywords with a broad topical coverage. The proposed criteria and procedure can be used to evaluate document-collection summaries without annotated corpora or provide additional insight in an evaluation with human-generated ground truth.",
"editor": [
{
"familyName": "Pasi",
"givenName": "Gabriella",
"type": "Person"
},
{
"familyName": "Piwowarski",
"givenName": "Benjamin",
"type": "Person"
},
{
"familyName": "Azzopardi",
"givenName": "Leif",
"type": "Person"
},
{
"familyName": "Hanbury",
"givenName": "Allan",
"type": "Person"
}
],
"genre": "chapter",
"id": "sg:pub.10.1007/978-3-319-76941-7_56",
"inLanguage": "en",
"isAccessibleForFree": true,
"isPartOf": {
"isbn": [
"978-3-319-76940-0",
"978-3-319-76941-7"
],
"name": "Advances in Information Retrieval",
"type": "Book"
},
"keywords": [
"keyword extraction method",
"document overlaps",
"evaluation metrics",
"TF-IDF",
"different algorithms",
"ground truth",
"single document",
"human judgment",
"algorithm",
"same time",
"topical coverage",
"keywords",
"documents",
"metrics",
"corpus",
"text",
"collection",
"adaption",
"different strengths",
"evaluation",
"highlight commonalities",
"truth",
"knowledge",
"CDS",
"commonalities",
"method",
"coverage",
"readers",
"time",
"summary",
"additional insight",
"judgments",
"criteria",
"insights",
"procedure",
"overlap",
"rake",
"strength",
"new materials",
"differences",
"materials",
"approach"
],
"name": "Collection-Document Summaries",
"pagination": "638-643",
"productId": [
{
"name": "dimensions_id",
"type": "PropertyValue",
"value": [
"pub.1101242765"
]
},
{
"name": "doi",
"type": "PropertyValue",
"value": [
"10.1007/978-3-319-76941-7_56"
]
}
],
"publisher": {
"name": "Springer Nature",
"type": "Organisation"
},
"sameAs": [
"https://doi.org/10.1007/978-3-319-76941-7_56",
"https://app.dimensions.ai/details/publication/pub.1101242765"
],
"sdDataset": "chapters",
"sdDatePublished": "2022-06-01T22:35",
"sdLicense": "https://scigraph.springernature.com/explorer/license/",
"sdPublisher": {
"name": "Springer Nature - SN SciGraph project",
"type": "Organization"
},
"sdSource": "s3://com-springernature-scigraph/baseset/20220601/entities/gbq_results/chapter/chapter_447.jsonl",
"type": "Chapter",
"url": "https://doi.org/10.1007/978-3-319-76941-7_56"
}
]
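Because the record is plain JSON, its main fields can be read without an RDF toolkit. The following is a minimal sketch using Python's standard `json` module on a trimmed copy of the record above. It treats the JSON-LD as ordinary JSON and ignores the `@context`, so terms are not expanded to full IRIs (a JSON-LD processor would be needed for that). The function name `summarize` is hypothetical.

```python
import json

# A trimmed copy of the record above, keeping only the fields used below.
RECORD = """[{
  "name": "Collection-Document Summaries",
  "author": [
    {"givenName": "Nils", "familyName": "Witt", "type": "Person"},
    {"givenName": "Michael", "familyName": "Granitzer", "type": "Person"},
    {"givenName": "Christin", "familyName": "Seifert", "type": "Person"}
  ],
  "isPartOf": {"isbn": ["978-3-319-76940-0", "978-3-319-76941-7"],
               "name": "Advances in Information Retrieval", "type": "Book"},
  "productId": [
    {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1101242765"]},
    {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-3-319-76941-7_56"]}
  ]
}]"""

def summarize(jsonld_text):
    """Return title, author names, DOI and book title from a chapter record."""
    rec = json.loads(jsonld_text)[0]  # the record is a JSON array with one object
    authors = [f"{a['givenName']} {a['familyName']}" for a in rec.get("author", [])]
    # productId is a list of PropertyValue objects; index them by their "name"
    ids = {p["name"]: p["value"][0] for p in rec.get("productId", [])}
    return {
        "title": rec["name"],
        "authors": authors,
        "doi": ids.get("doi"),
        "book": rec.get("isPartOf", {}).get("name"),
    }

info = summarize(RECORD)
print(info["title"], "|", ", ".join(info["authors"]), "|", info["doi"])
# → Collection-Document Summaries | Nils Witt, Michael Granitzer, Christin Seifert | 10.1007/978-3-319-76941-7_56
```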
JSON-LD is a popular linked-data format that is fully compatible with JSON.
curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-76941-7_56'
N-Triples is a line-based linked data format ideal for batch operations.
curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-76941-7_56'
Turtle is a human-readable linked data format.
curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-76941-7_56'
RDF/XML is a standard XML format for linked data.
curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-76941-7_56'
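The four curl commands differ only in the `Accept` header. A minimal Python sketch of the same content negotiation with the standard `urllib.request` module follows, assuming the endpoint honors the `Accept` header as the curl commands imply. The format keys (`json-ld`, `n-triples`, `turtle`, `rdf-xml`) and the function names `build_request` and `fetch` are hypothetical; the media types are copied from the commands above.

```python
import urllib.request

BASE = "https://scigraph.springernature.com/"

# Accept header values taken from the curl commands above; keys are illustrative.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}

def build_request(record_id, fmt):
    """Build a GET request for a record in the requested RDF serialization."""
    return urllib.request.Request(BASE + record_id, headers={"Accept": MEDIA_TYPES[fmt]})

def fetch(record_id, fmt, timeout=30):
    """Perform the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")

req = build_request("pub.10.1007/978-3-319-76941-7_56", "turtle")
print(req.full_url, req.get_header("Accept"))
```

Separating `build_request` from `fetch` keeps the header logic inspectable without touching the network.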
As RDF, the metadata directly associated with this object comprises 134 triples, 23 predicates, 67 URIs, 60 literals, and 7 blank nodes.
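The counting rules behind these figures are not stated on the page (for example, whether URIs are counted as distinct terms, or whether predicate and datatype IRIs are included). As a rough sketch under our own assumptions, the following counts distinct terms in an N-Triples document (as returned by the `application/n-triples` request above) with a simplified regular expression. It is not a full N-Triples parser, predicate IRIs are counted among the URIs while datatype IRIs are not, and the function name `ntriples_stats` is hypothetical, so its numbers may differ from SciGraph's.

```python
import re

# One N-Triples term: a literal (with optional language tag or datatype),
# an IRI in angle brackets, or a blank-node label. Simplified, not spec-complete.
TERM = re.compile(
    r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?'
    r'|<[^>]*>'
    r'|_:[A-Za-z0-9_-]+'
)

def ntriples_stats(text):
    """Count triples plus distinct predicates, IRIs, literals and blank nodes."""
    triples, predicates, iris, literals, bnodes = 0, set(), set(), set(), set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        terms = TERM.findall(line)
        if len(terms) != 3:
            raise ValueError(f"cannot parse line: {line!r}")
        triples += 1
        predicates.add(terms[1])
        for term in terms:
            if term.startswith("<"):
                iris.add(term)
            elif term.startswith("_:"):
                bnodes.add(term)
            else:
                literals.add(term)  # datatype IRIs stay part of the literal
    return {"triples": triples, "predicates": len(predicates), "uris": len(iris),
            "literals": len(literals), "blank_nodes": len(bnodes)}

SAMPLE = """
<http://ex.org/a> <http://ex.org/p> "lit" .
<http://ex.org/a> <http://ex.org/q> _:b0 .
_:b0 <http://ex.org/p> <http://ex.org/c> .
<http://ex.org/c> <http://ex.org/p> "5"^^<http://www.w3.org/2001/XMLSchema#int> .
"""
print(ntriples_stats(SAMPLE))
```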