This section documents changes to the SciGraph datasets from 2017 onwards.
2019 Q1 (Current release)
Third release. This release is a complete refactoring of SciGraph data that, in response to user feedback, aims to make the model more intuitive and the data easier to work with.
Data about clinical trials and patents connected to Springer Nature publications has been added. This data is sourced from Dimensions.ai.
Schema.org is now the main model used to represent SciGraph data.
Publication data now includes references (outgoing citations).
URIs for SciGraph objects have been dramatically simplified, reusing common identifiers whenever possible. In particular, all articles and chapters use the URI format prefix ('pub.') + DOI (e.g.
JSON-LD is now the primary serialization format used by SN SciGraph.
Data dumps are now managed externally on FigShare and are referenceable via DOIs.
New publication data is released daily. All other datasets are refreshed monthly.
Data Migration: the legacy lod.springer.com platform has been retired and its contents have been migrated into SciGraph.
Datasets. Information about conferences, conference proceedings and conference series previously hosted on lod.springer.com is now included in SN SciGraph. Pre-existing lod.springer.com URIs are redirected automatically to the SciGraph domain.
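As a rough illustration of the identifier scheme and the JSON-LD serialization introduced in this release, the following Python sketch builds the 'pub.' + DOI identifier and reads outgoing citations from a JSON-LD record. The sample record, the DOIs in it, the use of the Schema.org `citation` property for references, and the helper names `pub_identifier` and `outgoing_citations` are all assumptions made for illustration; actual SciGraph records may be shaped differently.

```python
import json
from typing import List

PUB_PREFIX = "pub."  # prefix named in the release notes


def pub_identifier(doi: str) -> str:
    """Build the local identifier for an article or chapter: 'pub.' + DOI."""
    return PUB_PREFIX + doi.strip()


# Hypothetical JSON-LD record using Schema.org terms.
# The DOIs and field layout are invented for this example.
SAMPLE = """
{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "@id": "pub.10.1000/example.123",
  "name": "An example article",
  "citation": [
    {"@id": "pub.10.1000/example.001"},
    {"@id": "pub.10.1000/example.002"}
  ]
}
"""


def outgoing_citations(record: dict) -> List[str]:
    """Return the identifiers listed under 'citation' (assumed to hold references)."""
    cites = record.get("citation", [])
    if isinstance(cites, dict):  # JSON-LD allows a single object instead of a list
        cites = [cites]
    return [c["@id"] for c in cites if isinstance(c, dict) and "@id" in c]


record = json.loads(SAMPLE)
references = outgoing_citations(record)
```

Because JSON-LD is plain JSON, the standard `json` module is enough for simple lookups like this; a dedicated JSON-LD or RDF library would be needed to expand contexts or convert the data to triples.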
Second release, including bibliographic metadata for the complete archive of Springer Nature publications.
Datasets. These datasets comprise our SciGraph ontology, SKOS taxonomies and instance data covering the complete archive of Springer Nature publications, i.e. books and journals (1801-2017), conferences, affiliations, funders, research projects and grants.
Size. The dataset consists of almost 1 billion triples (23.2 GB compressed, or 205.2 GB uncompressed). The RDF data is dereferenceable (Turtle, N-Triples, RDF/XML) and both HTTP and HTTPS protocols are supported.
Distribution. The SciGraph datasets are distributed as RDF data in a set of 78 archive files (.tar.bz2), each containing an N-Triples data file (.nt) and a LICENSE.txt file.
Multiple licenses. The majority of SciGraph data is being released under a Creative Commons Attribution (CC BY) 4.0 International License, with a small portion of the data (specifically abstracts and grants) separately licensed under a Creative Commons Attribution-NonCommercial (CC BY-NC) 4.0 International License.
Schema mappings. To align the SciGraph ontology with other well-known vocabularies, we include several mappings and make extensive use of two external datasets: ANZSRC (Australian and New Zealand Standard Research Classification) Fields of Research codes, and GRID (Global Research Identifier Database) identifiers.
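The distribution format described for this release (.tar.bz2 archives holding an N-Triples file alongside a license file) can be inspected without unpacking anything to disk. The sketch below is one possible approach, not an official tool: it streams each .nt member of an archive and counts its triples, relying on the N-Triples rule that every triple occupies a single line. The function name `count_triples` is hypothetical, as is any archive path passed to it.

```python
import tarfile


def count_triples(archive_path: str) -> dict:
    """Count triples in every .nt member of a .tar.bz2 archive.

    N-Triples puts one triple per line, so non-empty lines that are
    not comments (starting with '#') are counted as triples.
    """
    counts = {}
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        for member in archive:
            # Skip directories and non-data files such as LICENSE.txt
            if not (member.isfile() and member.name.endswith(".nt")):
                continue
            stream = archive.extractfile(member)
            triples = 0
            for raw_line in stream:  # streamed line by line, never fully in memory
                line = raw_line.strip()
                if line and not line.startswith(b"#"):
                    triples += 1
            counts[member.name] = triples
    return counts
```

Calling `count_triples` on a downloaded archive returns a dictionary mapping each .nt file name to its triple count. Streaming matters here because the uncompressed data is roughly nine times larger than the compressed download (205.2 GB versus 23.2 GB).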
First release. Includes five years of all Springer Nature journal articles (2012-2016) and the SciGraph ontology.
Size. 155 million triples (3.7 GB compressed, or 32 GB uncompressed).
License. All the data is available under a Creative Commons Attribution-NonCommercial (CC BY-NC) 4.0 International License.
Ontology. The Springer Nature SciGraph Core Ontology (v1.0.0) is the main model encoding the semantics of the SN SciGraph 2017Q1 data release. It is implemented as an OWL 2 ontology with DL expressivity ALHIF(D), and consists of 45 classes and 206 properties.