Provenance Context Entity (PaCE): Scalable Provenance Tracking for Scientific RDF Data


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2010

AUTHORS

Satya S. Sahoo, Olivier Bodenreider, Pascal Hitzler, Amit Sheth, Krishnaprasad Thirunarayan

ABSTRACT

The Resource Description Framework (RDF) format is being used by a large number of scientific applications to store and disseminate their datasets. The provenance information, describing the source or lineage of the datasets, is playing an increasingly significant role in ensuring data quality, computing trust value of the datasets, and ranking query results. Current provenance tracking approaches using the RDF reification vocabulary suffer from a number of known issues, including lack of formal semantics, use of blank nodes, and application-dependent interpretation of reified RDF triples. In this paper, we introduce a new approach called Provenance Context Entity (PaCE) that uses the notion of provenance context to create provenance-aware RDF triples. We also define the formal semantics of PaCE through a simple extension of the existing RDF(S) semantics that ensures compatibility of PaCE with existing Semantic Web tools and implementations. We have implemented the PaCE approach in the Biomedical Knowledge Repository (BKR) project at the US National Library of Medicine. The evaluations demonstrate a minimum of 49% reduction in total number of provenance-specific RDF triples generated using the PaCE approach as compared to RDF reification. In addition, performance for complex queries improves by three orders of magnitude and remains comparable to the RDF reification approach for simpler provenance queries.
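
To make the reification overhead mentioned in the abstract concrete, here is a minimal sketch of how the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) expands a single statement once provenance is attached. It is not taken from the chapter and does not implement PaCE. The `ex:` identifiers, the `ex:derivedFrom` property, and the `reify` helper are hypothetical names chosen for illustration only.

```python
from itertools import count

# Counter used to mint fresh blank-node labels (illustrative only).
_blank_ids = count(1)


def reify(triple, source):
    """Return the reification triples for `triple`, plus one provenance triple.

    `triple` is a (subject, predicate, object) tuple of strings.
    `source` is a hypothetical identifier for where the statement came from.
    """
    s, p, o = triple
    stmt = f"_:stmt{next(_blank_ids)}"  # blank node standing for the statement
    return [
        (stmt, "rdf:type", "rdf:Statement"),
        (stmt, "rdf:subject", s),
        (stmt, "rdf:predicate", p),
        (stmt, "rdf:object", o),
        (stmt, "ex:derivedFrom", source),  # assumed provenance property
    ]


reified = reify(("ex:entityA", "ex:affects", "ex:entityB"), "ex:sourceDoc1")
for t in reified:
    print(t)
```

Under these assumptions, recording the source of one statement takes five triples and one blank node, which is the kind of per-statement cost the abstract contrasts PaCE against.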

PAGES

461-470

References to SciGraph publications

  • 2005-05. Relations in biomedical ontologies in GENOME BIOLOGY
  • 2003-06-18. Varieties of Contexts in MODELING AND USING CONTEXT
Book

    TITLE

    Scientific and Statistical Database Management

    ISBN

    978-3-642-13817-1
    978-3-642-13818-8

    Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/978-3-642-13818-8_32

    DOI

    http://dx.doi.org/10.1007/978-3-642-13818-8_32

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1036586839

    PUBMED

    https://www.ncbi.nlm.nih.gov/pubmed/25621321


    JSON-LD is the canonical representation for SciGraph data.

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information Systems", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Wright State University", 
              "id": "https://www.grid.ac/institutes/grid.268333.f", 
              "name": [
                "Kno.e.sis Center, Computer Science and Engineering Department, Wright State University, Dayton, OH, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Sahoo", 
            "givenName": "Satya S.", 
            "id": "sg:person.01000722274.50", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01000722274.50"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "name": [
                "Lister Hill National Center for Biomedical Communications, National Library of Medicine, NIH, Bethesda, MD, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Bodenreider", 
            "givenName": "Olivier", 
            "id": "sg:person.01033503431.82", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01033503431.82"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Wright State University", 
              "id": "https://www.grid.ac/institutes/grid.268333.f", 
              "name": [
                "Kno.e.sis Center, Computer Science and Engineering Department, Wright State University, Dayton, OH, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Hitzler", 
            "givenName": "Pascal", 
            "id": "sg:person.01075256224.32", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01075256224.32"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Wright State University", 
              "id": "https://www.grid.ac/institutes/grid.268333.f", 
              "name": [
                "Kno.e.sis Center, Computer Science and Engineering Department, Wright State University, Dayton, OH, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Sheth", 
            "givenName": "Amit", 
            "id": "sg:person.01322602134.04", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01322602134.04"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Wright State University", 
              "id": "https://www.grid.ac/institutes/grid.268333.f", 
              "name": [
                "Kno.e.sis Center, Computer Science and Engineering Department, Wright State University, Dayton, OH, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Thirunarayan", 
            "givenName": "Krishnaprasad", 
            "id": "sg:person.014506142601.36", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014506142601.36"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1186/gb-2005-6-5-r46", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1015285227", 
              "https://doi.org/10.1186/gb-2005-6-5-r46"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/nar/gkh063", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1017271040"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/3-540-44958-2_14", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1027082927", 
              "https://doi.org/10.1007/3-540-44958-2_14"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1093/nar/gkh061", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1042802800"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2010", 
        "datePublishedReg": "2010-01-01", 
        "description": "The Resource Description Framework (RDF) format is being used by a large number of scientific applications to store and disseminate their datasets. The provenance information, describing the source or lineage of the datasets, is playing an increasingly significant role in ensuring data quality, computing trust value of the datasets, and ranking query results. Current provenance tracking approaches using the RDF reification vocabulary suffer from a number of known issues, including lack of formal semantics, use of blank nodes, and application-dependent interpretation of reified RDF triples. In this paper, we introduce a new approach called Provenance Context Entity (PaCE) that uses the notion of provenance context to create provenance-aware RDF triples. We also define the formal semantics of PaCE through a simple extension of the existing RDF(S) semantics that ensures compatibility of PaCE with existing Semantic Web tools and implementations. We have implemented the PaCE approach in the Biomedical Knowledge Repository (BKR) project at the US National Library of Medicine. The evaluations demonstrate a minimum of 49% reduction in total number of provenance-specific RDF triples generated using the PaCE approach as compared to RDF reification. In addition, performance for complex queries improves by three orders of magnitude and remains comparable to the RDF reification approach for simpler provenance queries.", 
        "editor": [
          {
            "familyName": "Gertz", 
            "givenName": "Michael", 
            "type": "Person"
          }, 
          {
            "familyName": "Lud\u00e4scher", 
            "givenName": "Bertram", 
            "type": "Person"
          }
        ], 
        "genre": "chapter", 
        "id": "sg:pub.10.1007/978-3-642-13818-8_32", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isFundedItemOf": [
          {
            "id": "sg:grant.2541418", 
            "type": "MonetaryGrant"
          }
        ], 
        "isPartOf": {
          "isbn": [
            "978-3-642-13817-1", 
            "978-3-642-13818-8"
          ], 
          "name": "Scientific and Statistical Database Management", 
          "type": "Book"
        }, 
        "name": "Provenance Context Entity (PaCE): Scalable Provenance Tracking for Scientific RDF Data", 
        "pagination": "461-470", 
        "productId": [
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/978-3-642-13818-8_32"
            ]
          }, 
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "ae4909f368ba5638449a91d886435977846bd665ac3d5f60ed1446caa2a9d1b8"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1036586839"
            ]
          }, 
          {
            "name": "pubmed_id", 
            "type": "PropertyValue", 
            "value": [
              "25621321"
            ]
          }
        ], 
        "publisher": {
          "location": "Berlin, Heidelberg", 
          "name": "Springer Berlin Heidelberg", 
          "type": "Organisation"
        }, 
        "sameAs": [
          "https://doi.org/10.1007/978-3-642-13818-8_32", 
          "https://app.dimensions.ai/details/publication/pub.1036586839"
        ], 
        "sdDataset": "chapters", 
        "sdDatePublished": "2019-04-15T20:16", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8687_00000353.jsonl", 
        "type": "Chapter", 
        "url": "http://link.springer.com/10.1007/978-3-642-13818-8_32"
      }
    ]
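
    As a minimal sketch of how a record shaped like the one above could be read, the following Python snippet pulls out author names and product identifiers using only the standard `json` module. `RECORD_JSON` is a trimmed copy of the record (two authors, two identifiers) so that the example is self-contained; the helper names `author_names` and `product_ids` are hypothetical.

    ```python
    import json

    # Trimmed copy of the SciGraph record shown above (illustration only).
    RECORD_JSON = """
    [
      {
        "name": "Provenance Context Entity (PaCE): Scalable Provenance Tracking for Scientific RDF Data",
        "pagination": "461-470",
        "author": [
          {"familyName": "Sahoo", "givenName": "Satya S.", "type": "Person"},
          {"familyName": "Bodenreider", "givenName": "Olivier", "type": "Person"}
        ],
        "productId": [
          {"name": "doi", "type": "PropertyValue",
           "value": ["10.1007/978-3-642-13818-8_32"]},
          {"name": "pubmed_id", "type": "PropertyValue",
           "value": ["25621321"]}
        ]
      }
    ]
    """


    def author_names(record):
        """Return 'Given Family' strings in the order the record lists them."""
        return [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]


    def product_ids(record):
        """Map each productId name to its first value, skipping empty entries."""
        return {
            p["name"]: p["value"][0]
            for p in record.get("productId", [])
            if p.get("value")
        }


    record = json.loads(RECORD_JSON)[0]  # the payload is a one-element list
    print(author_names(record))
    print(product_ids(record)["doi"])
    ```

    This treats the payload as plain JSON and ignores the `@context`; a JSON-LD processor would be needed to expand the keys into full IRIs.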
     

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-13818-8_32'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-13818-8_32'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-13818-8_32'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-13818-8_32'
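
    The four curl commands above differ only in the `Accept` header. A minimal Python sketch of the same content negotiation, using only the standard library, might look like the following. The `ACCEPT` table, `build_request`, and `fetch` are hypothetical names; whether the endpoint still answers these requests is an assumption, so the example only builds a request and does not send it.

    ```python
    from urllib.request import Request, urlopen

    RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-642-13818-8_32"

    # Media types copied from the curl commands above.
    ACCEPT = {
        "json-ld": "application/ld+json",
        "n-triples": "application/n-triples",
        "turtle": "text/turtle",
        "rdf-xml": "application/rdf+xml",
    }


    def build_request(fmt, url=RECORD_URL):
        """Build a GET request asking for the given serialization."""
        return Request(url, headers={"Accept": ACCEPT[fmt]})


    def fetch(fmt, url=RECORD_URL, timeout=30):
        """Send the request and return the body as text (needs network access)."""
        with urlopen(build_request(fmt, url), timeout=timeout) as resp:
            return resp.read().decode("utf-8")


    req = build_request("turtle")
    print(req.full_url)
    print(req.get_header("Accept"))
    ```

    Calling `fetch("turtle")` would perform the equivalent of the third curl command.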


     

    This table displays all metadata directly associated with this object as RDF triples.

    120 TRIPLES      23 PREDICATES      32 URIs      21 LITERALS      9 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/978-3-642-13818-8_32 schema:about anzsrc-for:08
    2 anzsrc-for:0806
    3 schema:author Ne20da5c512dd485b98b8bef91ad64a73
    4 schema:citation sg:pub.10.1007/3-540-44958-2_14
    5 sg:pub.10.1186/gb-2005-6-5-r46
    6 https://doi.org/10.1093/nar/gkh061
    7 https://doi.org/10.1093/nar/gkh063
    8 schema:datePublished 2010
    9 schema:datePublishedReg 2010-01-01
    10 schema:description The Resource Description Framework (RDF) format is being used by a large number of scientific applications to store and disseminate their datasets. The provenance information, describing the source or lineage of the datasets, is playing an increasingly significant role in ensuring data quality, computing trust value of the datasets, and ranking query results. Current provenance tracking approaches using the RDF reification vocabulary suffer from a number of known issues, including lack of formal semantics, use of blank nodes, and application-dependent interpretation of reified RDF triples. In this paper, we introduce a new approach called Provenance Context Entity (PaCE) that uses the notion of provenance context to create provenance-aware RDF triples. We also define the formal semantics of PaCE through a simple extension of the existing RDF(S) semantics that ensures compatibility of PaCE with existing Semantic Web tools and implementations. We have implemented the PaCE approach in the Biomedical Knowledge Repository (BKR) project at the US National Library of Medicine. The evaluations demonstrate a minimum of 49% reduction in total number of provenance-specific RDF triples generated using the PaCE approach as compared to RDF reification. In addition, performance for complex queries improves by three orders of magnitude and remains comparable to the RDF reification approach for simpler provenance queries.
    11 schema:editor N0fcf8ac59e694c649bf8473836c55d1e
    12 schema:genre chapter
    13 schema:inLanguage en
    14 schema:isAccessibleForFree true
    15 schema:isPartOf N0da1e7117642498dad6fbbd49a4b1d56
    16 schema:name Provenance Context Entity (PaCE): Scalable Provenance Tracking for Scientific RDF Data
    17 schema:pagination 461-470
    18 schema:productId N0803c09e802b4c719bde908b50fe1429
    19 N1270be70af7b482aa1c3f54240be06fe
    20 N6553524097684462b9fce836ff2cd2aa
    21 N896c524ed0c94f948f17b60adae7d754
    22 schema:publisher N49e7e85196524317b072a7ce5e4353c3
    23 schema:sameAs https://app.dimensions.ai/details/publication/pub.1036586839
    24 https://doi.org/10.1007/978-3-642-13818-8_32
    25 schema:sdDatePublished 2019-04-15T20:16
    26 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    27 schema:sdPublisher Na9af27bc41da478e8015ac63fd523fb3
    28 schema:url http://link.springer.com/10.1007/978-3-642-13818-8_32
    29 sgo:license sg:explorer/license/
    30 sgo:sdDataset chapters
    31 rdf:type schema:Chapter
    32 N0803c09e802b4c719bde908b50fe1429 schema:name doi
    33 schema:value 10.1007/978-3-642-13818-8_32
    34 rdf:type schema:PropertyValue
    35 N0c0e31b8fe774b13b03b659c375cec8e rdf:first sg:person.01033503431.82
    36 rdf:rest N83eb5925322e4e7d888999c1d6bcb990
    37 N0da1e7117642498dad6fbbd49a4b1d56 schema:isbn 978-3-642-13817-1
    38 978-3-642-13818-8
    39 schema:name Scientific and Statistical Database Management
    40 rdf:type schema:Book
    41 N0fcf8ac59e694c649bf8473836c55d1e rdf:first N5b9f5b74742e481ebd61ad1580b1413a
    42 rdf:rest N3933b0c906b34224b5b6bb2682681f43
    43 N1270be70af7b482aa1c3f54240be06fe schema:name pubmed_id
    44 schema:value 25621321
    45 rdf:type schema:PropertyValue
    46 N16c691e1ce404e06b7e0f677fb7ecd23 schema:familyName Ludäscher
    47 schema:givenName Bertram
    48 rdf:type schema:Person
    49 N2ada1ec88bfb4184a4575feb0ed3f7b5 rdf:first sg:person.01322602134.04
    50 rdf:rest Ne5a60498deaa4faeb37a55906fc41a3a
    51 N3933b0c906b34224b5b6bb2682681f43 rdf:first N16c691e1ce404e06b7e0f677fb7ecd23
    52 rdf:rest rdf:nil
    53 N49e7e85196524317b072a7ce5e4353c3 schema:location Berlin, Heidelberg
    54 schema:name Springer Berlin Heidelberg
    55 rdf:type schema:Organisation
    56 N5b9f5b74742e481ebd61ad1580b1413a schema:familyName Gertz
    57 schema:givenName Michael
    58 rdf:type schema:Person
    59 N6553524097684462b9fce836ff2cd2aa schema:name readcube_id
    60 schema:value ae4909f368ba5638449a91d886435977846bd665ac3d5f60ed1446caa2a9d1b8
    61 rdf:type schema:PropertyValue
    62 N83eb5925322e4e7d888999c1d6bcb990 rdf:first sg:person.01075256224.32
    63 rdf:rest N2ada1ec88bfb4184a4575feb0ed3f7b5
    64 N896c524ed0c94f948f17b60adae7d754 schema:name dimensions_id
    65 schema:value pub.1036586839
    66 rdf:type schema:PropertyValue
    67 Na9af27bc41da478e8015ac63fd523fb3 schema:name Springer Nature - SN SciGraph project
    68 rdf:type schema:Organization
    69 Nd8ca0dfe55384121a21002297fc95038 schema:name Lister Hill National Center for Biomedical Communications, National Library of Medicine, NIH, Bethesda, MD, USA
    70 rdf:type schema:Organization
    71 Ne20da5c512dd485b98b8bef91ad64a73 rdf:first sg:person.01000722274.50
    72 rdf:rest N0c0e31b8fe774b13b03b659c375cec8e
    73 Ne5a60498deaa4faeb37a55906fc41a3a rdf:first sg:person.014506142601.36
    74 rdf:rest rdf:nil
    75 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    76 schema:name Information and Computing Sciences
    77 rdf:type schema:DefinedTerm
    78 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
    79 schema:name Information Systems
    80 rdf:type schema:DefinedTerm
    81 sg:grant.2541418 http://pending.schema.org/fundedItem sg:pub.10.1007/978-3-642-13818-8_32
    82 rdf:type schema:MonetaryGrant
    83 sg:person.01000722274.50 schema:affiliation https://www.grid.ac/institutes/grid.268333.f
    84 schema:familyName Sahoo
    85 schema:givenName Satya S.
    86 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01000722274.50
    87 rdf:type schema:Person
    88 sg:person.01033503431.82 schema:affiliation Nd8ca0dfe55384121a21002297fc95038
    89 schema:familyName Bodenreider
    90 schema:givenName Olivier
    91 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01033503431.82
    92 rdf:type schema:Person
    93 sg:person.01075256224.32 schema:affiliation https://www.grid.ac/institutes/grid.268333.f
    94 schema:familyName Hitzler
    95 schema:givenName Pascal
    96 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01075256224.32
    97 rdf:type schema:Person
    98 sg:person.01322602134.04 schema:affiliation https://www.grid.ac/institutes/grid.268333.f
    99 schema:familyName Sheth
    100 schema:givenName Amit
    101 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01322602134.04
    102 rdf:type schema:Person
    103 sg:person.014506142601.36 schema:affiliation https://www.grid.ac/institutes/grid.268333.f
    104 schema:familyName Thirunarayan
    105 schema:givenName Krishnaprasad
    106 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014506142601.36
    107 rdf:type schema:Person
    108 sg:pub.10.1007/3-540-44958-2_14 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027082927
    109 https://doi.org/10.1007/3-540-44958-2_14
    110 rdf:type schema:CreativeWork
    111 sg:pub.10.1186/gb-2005-6-5-r46 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015285227
    112 https://doi.org/10.1186/gb-2005-6-5-r46
    113 rdf:type schema:CreativeWork
    114 https://doi.org/10.1093/nar/gkh061 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042802800
    115 rdf:type schema:CreativeWork
    116 https://doi.org/10.1093/nar/gkh063 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017271040
    117 rdf:type schema:CreativeWork
    118 https://www.grid.ac/institutes/grid.268333.f schema:alternateName Wright State University
    119 schema:name Kno.e.sis Center, Computer Science and Engineering Department, Wright State University, Dayton, OH, USA
    120 rdf:type schema:Organization
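
    In the table above, the ordered author list is stored as an RDF collection: `schema:author` points at a blank node, and each node carries an `rdf:first` (the item) and an `rdf:rest` (the next node, or `rdf:nil` at the end). The following is a minimal sketch of how that chain can be walked to recover the author order. The triples are copied from rows 35-36, 49-50, 62-63 and 71-74; the helper name `rdf_list_items` is hypothetical.

    ```python
    # rdf:first / rdf:rest rows for the author list, in table order.
    TRIPLES = [
        ("N0c0e31b8fe774b13b03b659c375cec8e", "rdf:first", "sg:person.01033503431.82"),
        ("N0c0e31b8fe774b13b03b659c375cec8e", "rdf:rest", "N83eb5925322e4e7d888999c1d6bcb990"),
        ("N2ada1ec88bfb4184a4575feb0ed3f7b5", "rdf:first", "sg:person.01322602134.04"),
        ("N2ada1ec88bfb4184a4575feb0ed3f7b5", "rdf:rest", "Ne5a60498deaa4faeb37a55906fc41a3a"),
        ("N83eb5925322e4e7d888999c1d6bcb990", "rdf:first", "sg:person.01075256224.32"),
        ("N83eb5925322e4e7d888999c1d6bcb990", "rdf:rest", "N2ada1ec88bfb4184a4575feb0ed3f7b5"),
        ("Ne20da5c512dd485b98b8bef91ad64a73", "rdf:first", "sg:person.01000722274.50"),
        ("Ne20da5c512dd485b98b8bef91ad64a73", "rdf:rest", "N0c0e31b8fe774b13b03b659c375cec8e"),
        ("Ne5a60498deaa4faeb37a55906fc41a3a", "rdf:first", "sg:person.014506142601.36"),
        ("Ne5a60498deaa4faeb37a55906fc41a3a", "rdf:rest", "rdf:nil"),
    ]

    # Object of row 3 (schema:author): the head node of the collection.
    AUTHOR_LIST_HEAD = "Ne20da5c512dd485b98b8bef91ad64a73"


    def rdf_list_items(head, triples):
        """Follow rdf:first/rdf:rest from `head` until rdf:nil; return the items."""
        first = {s: o for s, p, o in triples if p == "rdf:first"}
        rest = {s: o for s, p, o in triples if p == "rdf:rest"}
        items, node, seen = [], head, set()
        while node != "rdf:nil":
            if node in seen or node not in first or node not in rest:
                raise ValueError(f"malformed RDF list at node {node!r}")
            seen.add(node)  # guards against cycles
            items.append(first[node])
            node = rest[node]
        return items


    authors = rdf_list_items(AUTHOR_LIST_HEAD, TRIPLES)
    for person in authors:
        print(person)
    ```

    The recovered order matches the author order given in the chapter info and in the JSON-LD `author` array, even though the table lists the nodes alphabetically by blank-node label.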
     



