Collection-Document Summaries


Ontology type: schema:Chapter     


Chapter Info

DATE

2018-03-01

AUTHORS

Nils Witt, Michael Granitzer, Christin Seifert

ABSTRACT

Learning something new from a text requires the reader to build on existing knowledge and add new material at the same time. Therefore, we propose collection-document (CDS) summaries that highlight commonalities and differences between a collection (or a single document) and a single document. We devise evaluation metrics that do not require human judgement, and three algorithms for extracting CDS that are based on single-document keyword-extraction methods. Our evaluation shows that different algorithms have different strengths, e.g. TF-IDF based approach best describes document overlap while the adaption of Rake provides keywords with a broad topical coverage. The proposed criteria and procedure can be used to evaluate document-collection summaries without annotated corpora or provide additional insight in an evaluation with human-generated ground truth.
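
The idea of separating what a document shares with a collection from what it adds can be illustrated with a small TF-IDF sketch. This is not one of the three algorithms from the chapter, whose details are not given in this record. The tokenizer, the function names `tfidf_scores` and `split_keywords`, the IDF formula, and the toy texts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_scores(document, collection):
    """Score each term of `document` by TF-IDF; IDF is computed over the collection plus the document."""
    docs = [tokenize(d) for d in collection] + [tokenize(document)]
    n_docs = len(docs)
    # Document frequency: in how many texts does each term occur at least once?
    doc_freq = Counter(term for tokens in docs for term in set(tokens))
    term_freq = Counter(docs[-1])
    total = sum(term_freq.values())
    return {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in term_freq.items()}


def split_keywords(document, collection, k=3):
    """Return (shared, novel): top-k document terms that do / do not occur in the collection."""
    scores = tfidf_scores(document, collection)
    collection_vocab = {t for d in collection for t in tokenize(d)}
    # Sort by descending score, breaking ties alphabetically for a stable result.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    shared = [t for t in ranked if t in collection_vocab][:k]
    novel = [t for t in ranked if t not in collection_vocab][:k]
    return shared, novel


# Hypothetical toy data, not taken from the chapter.
collection = ["keyword extraction ranks terms", "summaries compress a document"]
document = "keyword summaries highlight differences"
shared, novel = split_keywords(document, collection)
print("shared:", shared)
print("novel:", novel)
```

Here `shared` stands in for the commonalities and `novel` for the differences. A real system would likely need stop-word handling, phrase extraction, and a more careful weighting scheme.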

PAGES

638-643

References to SciGraph publications

  • 2017-01. Recent automatic text summarization techniques: a survey in ARTIFICIAL INTELLIGENCE REVIEW
  • Book

    TITLE

    Advances in Information Retrieval

    ISBN

    978-3-319-76940-0
    978-3-319-76941-7

    Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/978-3-319-76941-7_56

    DOI

    http://dx.doi.org/10.1007/978-3-319-76941-7_56

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1101242765


    Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
    Incoming Citations Browse incoming citations for this publication using opencitations.net

    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/1701", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Psychology", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/17", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Psychology and Cognitive Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "German National Library of Economics", 
              "id": "https://www.grid.ac/institutes/grid.461649.8", 
              "name": [
                "ZBW-Leibniz Information Centre for Economics, D\u00fcsternbrooker Weg 120, 24105, Kiel, Germany"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Witt", 
            "givenName": "Nils", 
            "id": "sg:person.011012402743.21", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011012402743.21"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Passau", 
              "id": "https://www.grid.ac/institutes/grid.11046.32", 
              "name": [
                "University of Passau, Innstra\u00dfe 32, 94032, Passau, Germany"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Granitzer", 
            "givenName": "Michael", 
            "id": "sg:person.016026412441.95", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016026412441.95"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Passau", 
              "id": "https://www.grid.ac/institutes/grid.11046.32", 
              "name": [
                "University of Passau, Innstra\u00dfe 32, 94032, Passau, Germany"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Seifert", 
            "givenName": "Christin", 
            "id": "sg:person.010257616672.34", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010257616672.34"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/s10462-016-9475-9", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1020351023", 
              "https://doi.org/10.1007/s10462-016-9475-9"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1017/s0007087403005338", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1054024191"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.3115/v1/p14-1119", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1099110705"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2018-03-01", 
        "datePublishedReg": "2018-03-01", 
        "description": "Learning something new from a text requires the reader to build on existing knowledge and add new material at the same time. Therefore, we propose collection-document (CDS) summaries that highlight commonalities and differences between a collection (or a single document) and a single document. We devise evaluation metrics that do not require human judgement, and three algorithms for extracting CDS that are based on single-document keyword-extraction methods. Our evaluation shows that different algorithms have different strengths, e.g. TF-IDF based approach best describes document overlap while the adaption of Rake provides keywords with a broad topical coverage. The proposed criteria and procedure can be used to evaluate document-collection summaries without annotated corpora or provide additional insight in an evaluation with human-generated ground truth.", 
        "editor": [
          {
            "familyName": "Pasi", 
            "givenName": "Gabriella", 
            "type": "Person"
          }, 
          {
            "familyName": "Piwowarski", 
            "givenName": "Benjamin", 
            "type": "Person"
          }, 
          {
            "familyName": "Azzopardi", 
            "givenName": "Leif", 
            "type": "Person"
          }, 
          {
            "familyName": "Hanbury", 
            "givenName": "Allan", 
            "type": "Person"
          }
        ], 
        "genre": "chapter", 
        "id": "sg:pub.10.1007/978-3-319-76941-7_56", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": false, 
        "isPartOf": {
          "isbn": [
            "978-3-319-76940-0", 
            "978-3-319-76941-7"
          ], 
          "name": "Advances in Information Retrieval", 
          "type": "Book"
        }, 
        "name": "Collection-Document Summaries", 
        "pagination": "638-643", 
        "productId": [
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/978-3-319-76941-7_56"
            ]
          }, 
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "8a46511ae5c8f927f0b85592d292a8a05cc66b4830ca6e4a20bdca91cb2d3465"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1101242765"
            ]
          }
        ], 
        "publisher": {
          "location": "Cham", 
          "name": "Springer International Publishing", 
          "type": "Organisation"
        }, 
        "sameAs": [
          "https://doi.org/10.1007/978-3-319-76941-7_56", 
          "https://app.dimensions.ai/details/publication/pub.1101242765"
        ], 
        "sdDataset": "chapters", 
        "sdDatePublished": "2019-04-16T05:02", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000325_0000000325/records_100817_00000000.jsonl", 
        "type": "Chapter", 
        "url": "https://link.springer.com/10.1007%2F978-3-319-76941-7_56"
      }
    ]
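
As a rough illustration of how the JSON-LD record above might be consumed, the following sketch parses a trimmed copy of it with Python's standard `json` module and pulls out the title, the author names, and the DOI. The field names come from the record itself. The helper name `summarize` and the shape of its return value are assumptions.

```python
import json

# Trimmed copy of the record shown above; only the fields used below are kept.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Witt", "givenName": "Nils", "type": "Person"},
      {"familyName": "Granitzer", "givenName": "Michael", "type": "Person"},
      {"familyName": "Seifert", "givenName": "Christin", "type": "Person"}
    ],
    "name": "Collection-Document Summaries",
    "pagination": "638-643",
    "productId": [
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1007/978-3-319-76941-7_56"]},
      {"name": "dimensions_id", "type": "PropertyValue",
       "value": ["pub.1101242765"]}
    ]
  }
]
"""


def summarize(record):
    """Collect a few human-readable fields from one SciGraph-style record."""
    authors = [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]
    # Each productId entry carries a name and a one-element list of values.
    identifiers = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    return {
        "title": record["name"],
        "authors": authors,
        "doi": identifiers.get("doi"),
        "pages": record.get("pagination"),
    }


# The top-level JSON value is a list containing a single record.
summary = summarize(json.loads(RECORD_JSON)[0])
print(summary)
```

Treating the document as plain JSON is enough for simple lookups like these. Resolving the `@context` and prefixed identifiers would need a JSON-LD processor.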
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-76941-7_56'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-76941-7_56'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-76941-7_56'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-319-76941-7_56'
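
The four curl commands above differ only in their `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The URL and media types are copied from the commands above. The short format labels, the function names `build_request` and `fetch`, and the 30-second timeout are assumptions, and the sketch presumes the endpoint is still reachable.

```python
from urllib.request import Request, urlopen

# Record URI copied from the curl examples above.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-319-76941-7_56"

# Media types copied from the curl examples; the short keys are our own labels.
RDF_MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Return an unsent GET request that asks for the given serialization."""
    return Request(url, headers={"Accept": RDF_MEDIA_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building a request does not touch the network; only fetch() does.
request = build_request("turtle")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Calling `fetch("n-triples")` would be the counterpart of the second curl command.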


     

    This table displays all metadata directly associated to this object as RDF triples.

    107 TRIPLES      23 PREDICATES      29 URIs      19 LITERALS      8 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/978-3-319-76941-7_56 schema:about anzsrc-for:17
    2 anzsrc-for:1701
    3 schema:author Nf8176bbf225940d18a37f9811bcbab08
    4 schema:citation sg:pub.10.1007/s10462-016-9475-9
    5 https://doi.org/10.1017/s0007087403005338
    6 https://doi.org/10.3115/v1/p14-1119
    7 schema:datePublished 2018-03-01
    8 schema:datePublishedReg 2018-03-01
    9 schema:description Learning something new from a text requires the reader to build on existing knowledge and add new material at the same time. Therefore, we propose collection-document (CDS) summaries that highlight commonalities and differences between a collection (or a single document) and a single document. We devise evaluation metrics that do not require human judgement, and three algorithms for extracting CDS that are based on single-document keyword-extraction methods. Our evaluation shows that different algorithms have different strengths, e.g. TF-IDF based approach best describes document overlap while the adaption of Rake provides keywords with a broad topical coverage. The proposed criteria and procedure can be used to evaluate document-collection summaries without annotated corpora or provide additional insight in an evaluation with human-generated ground truth.
    10 schema:editor Nb69529b475ef447fbf19ecbb16a34119
    11 schema:genre chapter
    12 schema:inLanguage en
    13 schema:isAccessibleForFree false
    14 schema:isPartOf N4e2d986a5de04083a9120dd410985e0e
    15 schema:name Collection-Document Summaries
    16 schema:pagination 638-643
    17 schema:productId N09691d02a9ff4a79a47add485e10ff14
    18 N6ad60d4a8b874fba97f5893ad94a13c3
    19 N88d81a00d86644599e652bfb47ad4fc6
    20 schema:publisher N38808249f7dc451cbf91acd4358a246c
    21 schema:sameAs https://app.dimensions.ai/details/publication/pub.1101242765
    22 https://doi.org/10.1007/978-3-319-76941-7_56
    23 schema:sdDatePublished 2019-04-16T05:02
    24 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    25 schema:sdPublisher Nb2d0b1afbac448bbb5380913aa8d8845
    26 schema:url https://link.springer.com/10.1007%2F978-3-319-76941-7_56
    27 sgo:license sg:explorer/license/
    28 sgo:sdDataset chapters
    29 rdf:type schema:Chapter
    30 N09691d02a9ff4a79a47add485e10ff14 schema:name doi
    31 schema:value 10.1007/978-3-319-76941-7_56
    32 rdf:type schema:PropertyValue
    33 N2b039214b9d34df0a5b45af86104c318 rdf:first Ncf3e72e4a4614cb0a4349210b68bd799
    34 rdf:rest rdf:nil
    35 N30c94c6d7a8c48008d5dbe2b4c448729 rdf:first sg:person.010257616672.34
    36 rdf:rest rdf:nil
    37 N38808249f7dc451cbf91acd4358a246c schema:location Cham
    38 schema:name Springer International Publishing
    39 rdf:type schema:Organisation
    40 N4e2d986a5de04083a9120dd410985e0e schema:isbn 978-3-319-76940-0
    41 978-3-319-76941-7
    42 schema:name Advances in Information Retrieval
    43 rdf:type schema:Book
    44 N63df89b1ae2248c0a7d2fa9862e91b14 rdf:first Nac383edd1b8e4be4b3fd808d2a8990bf
    45 rdf:rest Nc01afbf109a2403a8d5a10b0d2de7d4f
    46 N6ad60d4a8b874fba97f5893ad94a13c3 schema:name readcube_id
    47 schema:value 8a46511ae5c8f927f0b85592d292a8a05cc66b4830ca6e4a20bdca91cb2d3465
    48 rdf:type schema:PropertyValue
    49 N88d81a00d86644599e652bfb47ad4fc6 schema:name dimensions_id
    50 schema:value pub.1101242765
    51 rdf:type schema:PropertyValue
    52 N9a79b8af20b145189f1e580b7174a490 schema:familyName Azzopardi
    53 schema:givenName Leif
    54 rdf:type schema:Person
    55 Nac383edd1b8e4be4b3fd808d2a8990bf schema:familyName Piwowarski
    56 schema:givenName Benjamin
    57 rdf:type schema:Person
    58 Nb2d0b1afbac448bbb5380913aa8d8845 schema:name Springer Nature - SN SciGraph project
    59 rdf:type schema:Organization
    60 Nb69529b475ef447fbf19ecbb16a34119 rdf:first Ndcaba5d21ea94e3f9fce3e35c3784658
    61 rdf:rest N63df89b1ae2248c0a7d2fa9862e91b14
    62 Nc01afbf109a2403a8d5a10b0d2de7d4f rdf:first N9a79b8af20b145189f1e580b7174a490
    63 rdf:rest N2b039214b9d34df0a5b45af86104c318
    64 Ncdcb8ccf34054315945e5ed21c285b59 rdf:first sg:person.016026412441.95
    65 rdf:rest N30c94c6d7a8c48008d5dbe2b4c448729
    66 Ncf3e72e4a4614cb0a4349210b68bd799 schema:familyName Hanbury
    67 schema:givenName Allan
    68 rdf:type schema:Person
    69 Ndcaba5d21ea94e3f9fce3e35c3784658 schema:familyName Pasi
    70 schema:givenName Gabriella
    71 rdf:type schema:Person
    72 Nf8176bbf225940d18a37f9811bcbab08 rdf:first sg:person.011012402743.21
    73 rdf:rest Ncdcb8ccf34054315945e5ed21c285b59
    74 anzsrc-for:17 schema:inDefinedTermSet anzsrc-for:
    75 schema:name Psychology and Cognitive Sciences
    76 rdf:type schema:DefinedTerm
    77 anzsrc-for:1701 schema:inDefinedTermSet anzsrc-for:
    78 schema:name Psychology
    79 rdf:type schema:DefinedTerm
    80 sg:person.010257616672.34 schema:affiliation https://www.grid.ac/institutes/grid.11046.32
    81 schema:familyName Seifert
    82 schema:givenName Christin
    83 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010257616672.34
    84 rdf:type schema:Person
    85 sg:person.011012402743.21 schema:affiliation https://www.grid.ac/institutes/grid.461649.8
    86 schema:familyName Witt
    87 schema:givenName Nils
    88 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011012402743.21
    89 rdf:type schema:Person
    90 sg:person.016026412441.95 schema:affiliation https://www.grid.ac/institutes/grid.11046.32
    91 schema:familyName Granitzer
    92 schema:givenName Michael
    93 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016026412441.95
    94 rdf:type schema:Person
    95 sg:pub.10.1007/s10462-016-9475-9 schema:sameAs https://app.dimensions.ai/details/publication/pub.1020351023
    96 https://doi.org/10.1007/s10462-016-9475-9
    97 rdf:type schema:CreativeWork
    98 https://doi.org/10.1017/s0007087403005338 schema:sameAs https://app.dimensions.ai/details/publication/pub.1054024191
    99 rdf:type schema:CreativeWork
    100 https://doi.org/10.3115/v1/p14-1119 schema:sameAs https://app.dimensions.ai/details/publication/pub.1099110705
    101 rdf:type schema:CreativeWork
    102 https://www.grid.ac/institutes/grid.11046.32 schema:alternateName University of Passau
    103 schema:name University of Passau, Innstraße 32, 94032, Passau, Germany
    104 rdf:type schema:Organization
    105 https://www.grid.ac/institutes/grid.461649.8 schema:alternateName German National Library of Economics
    106 schema:name ZBW-Leibniz Information Centre for Economics, Düsternbrooker Weg 120, 24105, Kiel, Germany
    107 rdf:type schema:Organization
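
In the table above, the author and editor orderings are encoded as RDF lists: each blank node points to one item via `rdf:first` and to the next node via `rdf:rest`, ending at `rdf:nil`. The following sketch walks the author list using triples copied from the table. The tuple representation and the function names `build_index` and `walk_list` are assumptions made for illustration.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

# (subject, predicate, object) triples copied from the table above (author list only).
TRIPLES = [
    ("sg:pub.10.1007/978-3-319-76941-7_56", "schema:author", "Nf8176bbf225940d18a37f9811bcbab08"),
    ("Nf8176bbf225940d18a37f9811bcbab08", RDF_FIRST, "sg:person.011012402743.21"),
    ("Nf8176bbf225940d18a37f9811bcbab08", RDF_REST, "Ncdcb8ccf34054315945e5ed21c285b59"),
    ("Ncdcb8ccf34054315945e5ed21c285b59", RDF_FIRST, "sg:person.016026412441.95"),
    ("Ncdcb8ccf34054315945e5ed21c285b59", RDF_REST, "N30c94c6d7a8c48008d5dbe2b4c448729"),
    ("N30c94c6d7a8c48008d5dbe2b4c448729", RDF_FIRST, "sg:person.010257616672.34"),
    ("N30c94c6d7a8c48008d5dbe2b4c448729", RDF_REST, RDF_NIL),
    ("sg:person.011012402743.21", "schema:familyName", "Witt"),
    ("sg:person.016026412441.95", "schema:familyName", "Granitzer"),
    ("sg:person.010257616672.34", "schema:familyName", "Seifert"),
]


def build_index(triples):
    """Map (subject, predicate) to object; adequate here because each pair occurs once."""
    return {(s, p): o for s, p, o in triples}


def walk_list(index, head):
    """Follow rdf:first / rdf:rest links from `head` and return the items in order."""
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:  # guard against malformed, cyclic lists
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(index[(node, RDF_FIRST)])
        node = index[(node, RDF_REST)]
    return items


index = build_index(TRIPLES)
head = index[("sg:pub.10.1007/978-3-319-76941-7_56", "schema:author")]
authors = [index[(person, "schema:familyName")] for person in walk_list(index, head)]
print(authors)
```

The same traversal would apply to the `schema:editor` list, whose items are blank nodes rather than `sg:person` identifiers.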
     



