This section documents changes to the SciGraph datasets from 2017 onwards.

2019 Q1 (Current release)


Third release. Following up on user feedback, this release includes a complete refactoring of the SciGraph data, aimed at making the model more intuitive and the data easier to work with.

Highlights:
  • New Datasets. Data about Clinical Trials and Patents connected to Springer Nature publications have been added. This data is sourced from Dimensions.ai.
  • New Ontology. Schema.org is now the main model used to represent SciGraph data.
  • References data. Publications data now also include references (i.e. outgoing citations).
  • Simpler Identifiers. URIs for SciGraph objects have been dramatically simplified, reusing common identifiers whenever possible. In particular, all articles and chapters use URIs made of the prefix 'pub.' followed by the DOI (e.g. pub.10.1007/s11199-007-9209-1).
  • JSON-LD. JSON-LD is now the primary serialization format used by SN SciGraph.
  • Downloads. Data dumps are now managed externally on FigShare and are referenceable via DOIs.
  • Continuous updates. New publications data is released on a daily basis. All the other datasets are refreshed on a monthly basis.
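The simplified identifier scheme for articles and chapters can be illustrated with a short Python sketch that derives the 'pub.'-prefixed identifier from a DOI and recovers the DOI from it. The function names `doi_to_scigraph_id` and `scigraph_id_to_doi` are hypothetical, and accepting DOIs written as `https://doi.org/` URLs or with a `doi:` label is an assumption for convenience. The sketch builds only the identifier shown in the example above, not a full URI, because the base domain is not stated here.

```python
# Minimal sketch (assumption): article/chapter identifiers are formed as
# the prefix "pub." followed by the DOI, as in the release notes.
# Function names are hypothetical, not part of any SciGraph API.

PUB_PREFIX = "pub."
# Assumption: DOIs may arrive as resolver URLs or with a "doi:" label.
DOI_INPUT_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def doi_to_scigraph_id(doi: str) -> str:
    """Return the 'pub.' + DOI identifier for a DOI string."""
    doi = doi.strip()
    for prefix in DOI_INPUT_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with the "10." directory indicator.
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return PUB_PREFIX + doi


def scigraph_id_to_doi(identifier: str) -> str:
    """Invert doi_to_scigraph_id by stripping the 'pub.' prefix."""
    if not identifier.startswith(PUB_PREFIX):
        raise ValueError(f"not a publication identifier: {identifier!r}")
    return identifier[len(PUB_PREFIX):]


if __name__ == "__main__":
    ident = doi_to_scigraph_id("https://doi.org/10.1007/s11199-007-9209-1")
    print(ident)                      # pub.10.1007/s11199-007-9209-1
    print(scigraph_id_to_doi(ident))  # 10.1007/s11199-007-9209-1
```

Because the mapping is a plain prefix, it is lossless in both directions, which is what makes the DOI reusable as a join key between SciGraph and other DOI-based sources.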
2018 Q1


Data Migration: the legacy lod.springer.com platform has been retired and its contents have been migrated into SciGraph.

Highlights:
  • Datasets. Information about conferences, conference proceedings and conference series previously hosted on lod.springer.com is now included in SN SciGraph. Pre-existing lod.springer.com URIs are being redirected automatically to the SciGraph domain.
2017 Q4


Second release, including bibliographic metadata for the complete archive of Springer Nature publications.

Highlights:
  • Datasets. These datasets comprise our SciGraph ontology, SKOS taxonomies and instance data covering the complete archive of Springer Nature publications (books and journals, 1801-2017), together with conferences, affiliations, funders, research projects and grants.
  • Size. The dataset consists of almost 1 billion triples (23.2 GB compressed, or 205.2 GB uncompressed). The RDF data is dereferenceable as Turtle, N-Triples and RDF/XML over both HTTP and HTTPS.
  • Distribution. The SciGraph datasets are distributed as RDF in a set of 78 .tar.bz2 archives, each of which includes an N-Triples data file (.nt) and a LICENSE.txt file.
  • Multiple licenses. The majority of SciGraph data is released under a Creative Commons Attribution (CC BY) 4.0 International License, with a small portion of the data (specifically abstracts and grants) separately licensed under a Creative Commons Attribution-NonCommercial (CC BY-NC) 4.0 International License.
  • Schema mappings. To align the SciGraph ontology with other well-known vocabularies, we include several mappings and make extensive use of two external datasets: ANZSRC (Australian and New Zealand Standard Research Classification) Fields of Research codes, and GRID (Global Research Identifier Database) identifiers.
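The 2017 Q4 dump archives can be processed with Python's standard `tarfile` module without unpacking them to disk. The sketch below assumes, per the Distribution note, that each .tar.bz2 archive holds one or more `.nt` members alongside LICENSE.txt. It counts triples by counting non-blank, non-comment lines, which works because N-Triples places exactly one triple on each line. The function names and the directory layout are hypothetical.

```python
import tarfile
from pathlib import Path


def count_triples_in_archive(path: Path) -> int:
    """Count N-Triples statements in every .nt member of a .tar.bz2 archive.

    N-Triples is line-based: each non-blank line that is not a '#' comment
    holds exactly one triple, so a line count is a triple count.
    """
    total = 0
    # "r:bz2" opens a bzip2-compressed tar archive for reading.
    with tarfile.open(path, mode="r:bz2") as archive:
        for member in archive:
            # Skip LICENSE.txt and anything else that is not N-Triples.
            if not member.isfile() or not member.name.endswith(".nt"):
                continue
            stream = archive.extractfile(member)
            if stream is None:
                continue
            with stream:
                for raw in stream:
                    line = raw.decode("utf-8").strip()
                    if line and not line.startswith("#"):
                        total += 1
    return total


def count_triples_in_directory(dump_dir: Path) -> int:
    """Sum triple counts over all .tar.bz2 archives in a directory.

    Hypothetical helper: assumes the downloaded archives sit side by side.
    """
    return sum(
        count_triples_in_archive(p) for p in sorted(dump_dir.glob("*.tar.bz2"))
    )
```

Streaming each member line by line keeps memory use flat even for the larger archives; running `count_triples_in_directory` over a folder holding all 78 files gives a rough cross-check against the stated size of the release.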
2017 Q1


First release, including five years of Springer Nature journal articles (2012-2016) and the SciGraph ontology.

Highlights:
  • Size. 155 million triples (3.7 GB compressed, or 32 GB uncompressed).
  • License. All the data is available under a Creative Commons Attribution-NonCommercial (CC BY-NC) 4.0 International License.
  • Ontology. The Springer Nature SciGraph Core Ontology (v1.0.0) is the main model encoding the semantics of the SN SciGraph 2017 Q1 data release. It is implemented as an OWL 2 ontology with DL expressivity ALHIF(D), and consists of 45 classes and 206 properties.