This section documents changes to the SN SciGraph datasets from 2017 onwards.
2021 Q4 (Current release)
Fourth release. This release improves the general quality and reliability of the SciGraph datasets. All datasets have been updated to reflect the latest Springer Nature and Dimensions source data, in particular by using the Dimensions on Google BigQuery platform to improve data quality.
All datasets are refreshed on a monthly basis, during the first week of the calendar month.
The Datasets at a glance page has been updated to provide more detailed statistics.
SN SciGraph has been integrated with the Springer Nature API Portal.
The RDF/JSON-LD data is now accessible via the Springer Nature REST APIs by using `jsonld` in the path, e.g.,
Third release. This release includes a complete refactoring of SciGraph data that, following up on user feedback, aims to make the model more intuitive and the data easier to work with.
Data about Clinical Trials and Patents connected to Springer Nature publications has been added. This data is sourced from Dimensions.ai.
Schema.org is now the main model used to represent SciGraph data.
Publication data now includes references (that is, outgoing citations).
URIs for SciGraph objects have been dramatically simplified, reusing common identifiers whenever possible. In particular, all articles and chapters use the URI format prefix ('pub.') + DOI (eg
JSON-LD is now the primary serialization format used by SN SciGraph.
Data dumps are now managed externally on FigShare and are referenceable via DOIs.
New publications data is released on a daily basis. All the other datasets are refreshed on a monthly basis.
Data Migration: the legacy lod.springer.com platform has been retired and its contents have been migrated into SciGraph.
Datasets. Information about conferences, conference proceedings and conference series previously hosted on lod.springer.com is now included in SN SciGraph. Pre-existing lod.springer.com URIs are being redirected automatically to the SciGraph domain.
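The simplified identifier scheme for articles and chapters described above (the prefix 'pub.' followed by the DOI) can be illustrated with a minimal sketch. The function names, the `base` parameter, the slash used to join base and identifier, and the sample DOI are all hypothetical; the actual SciGraph base URI is not stated here, so it is left to the caller.

```python
def scigraph_pub_id(doi):
    """Return the local identifier for an article or chapter: 'pub.' + DOI."""
    doi = doi.strip()
    # Assumption: the caller passes a bare DOI (starting with "10."),
    # not a resolver URL or a "doi:" prefixed string.
    if not doi.startswith("10."):
        raise ValueError(f"not a bare DOI: {doi!r}")
    return "pub." + doi


def scigraph_pub_uri(doi, base):
    """Join a caller-supplied base URI with the 'pub.' + DOI identifier.

    Assumption: base and identifier are separated by a single slash.
    """
    return base.rstrip("/") + "/" + scigraph_pub_id(doi)


if __name__ == "__main__":
    # Placeholder base URI and placeholder DOI, for illustration only.
    print(scigraph_pub_uri("10.1000/xyz123", "https://example.org/"))
```

Keeping the base URI as a parameter avoids hard-coding a domain that may change, as happened when lod.springer.com URIs were redirected to the SciGraph domain.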
Second release, including bibliographic metadata for the complete archive of Springer Nature publications.
Datasets. These datasets comprise our SciGraph ontology, SKOS taxonomies and instance data covering the complete archive of Springer Nature publications, i.e. books and journals (1801-2017), conferences, affiliations, funders, research projects and grants.
Size. The dataset consists of almost 1 billion triples (23.2 GB compressed, or 205.2 GB uncompressed). The RDF data is dereferenceable (Turtle, N-Triples, RDF/XML) and both HTTP and HTTPS protocols are supported.
Distribution. The SciGraph datasets are being distributed as RDF data in a set of 78 files (.tar.bz2), which include an N-Triples data file (.nt) and a LICENSE.txt file.
Multiple licenses. The majority of SciGraph data is being released under a Creative Commons Attribution (CC BY) 4.0 International License, with a small portion of the data (specifically abstracts and grants) separately licensed under a Creative Commons Attribution-NonCommercial (CC BY-NC) 4.0 International License.
Schema mappings. To align the SciGraph ontology with other well-known vocabularies, we include several mappings and have made extensive use of two external datasets: ANZSRC (Australian and New Zealand Standard Research Classification) Fields of Research codes, and GRID (Global Research Identifier Database) identifiers.
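Because this release was distributed as .tar.bz2 archives, each containing an N-Triples file (.nt) and a LICENSE.txt file, an archive can be inspected without unpacking it to disk. The following is a minimal sketch using only the Python standard library; the function name and the archive file name in the usage note are hypothetical, and the line-based count assumes well-formed N-Triples with one triple per line.

```python
import tarfile


def count_triples(archive_path):
    """Count triples in every .nt member of a .tar.bz2 archive.

    Returns a dict mapping member name to the number of non-empty,
    non-comment lines. N-Triples is line-based, so each such line is
    treated as one triple.
    """
    counts = {}
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        for member in archive:
            # Skip directories and non-data files such as LICENSE.txt.
            if not member.isfile() or not member.name.endswith(".nt"):
                continue
            stream = archive.extractfile(member)
            total = 0
            for raw in stream:
                line = raw.strip()
                if line and not line.startswith(b"#"):
                    total += 1
            counts[member.name] = total
    return counts
```

For example, `count_triples("some-dataset.tar.bz2")` (a placeholder file name) streams the compressed data line by line, which matters for a release whose uncompressed size is around 205 GB.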
First release. Includes five years of all Springer Nature journal articles (2012-2016) and the SciGraph ontology.
Size. 155 million triples (3.7 GB compressed, or 32 GB uncompressed).
License. All the data is available under a Creative Commons Attribution-NonCommercial (CC BY-NC) 4.0 International License.
Ontology. The Springer Nature SciGraph Core Ontology (v1.0.0) is the main model encoding the semantics of the SN SciGraph 2017 Q1 data release. It is implemented as an OWL 2 ontology with DL expressivity ALHIF(D), and consists of 45 classes and 206 properties.