This page describes the latest SciGraph data release (November 2017), which is distributed as a set of N-Triples files. (This release replaces the earlier February 2017 release. See the Changelog page for a list of changes to the data models.)

File organization

The datasets are primarily organized into 12 logical types (see the diagram below), grouped into ontologies, taxonomies, and instances. Two of these types, articles and book chapters, are further partitioned into 17 year ranges. There is a secondary organization by license: the majority of the data is available under CC BY (9 full datasets plus 34 partial datasets by year range), and a smaller portion under CC BY-NC (1 full dataset plus 34 partial datasets by year range).

This results in 78 separate physical datasets (9 + 34 + 1 + 34). Each dataset is distributed as a bzip2-compressed tar archive (.tar.bz2) containing the data as an N-Triples file (.nt) and the license as a LICENSE.txt file.
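As a minimal sketch using only the Python standard library, the triples in one of these archives can be streamed straight out of the .tar.bz2 without unpacking the multi-gigabyte .nt file to disk. It assumes only what is stated above (one .nt member and one LICENSE.txt member per archive); the member paths inside the archive and the function names are illustrative, not part of the release.

```python
import tarfile
from typing import Iterator


def iter_ntriples(archive_path: str) -> Iterator[str]:
    """Yield the non-empty, non-comment lines of every .nt member in a .tar.bz2 archive.

    Opens the archive in stream mode ("r|bz2"), so the N-Triples data is
    decompressed on the fly and never written to disk.
    """
    with tarfile.open(archive_path, mode="r|bz2") as tar:
        for member in tar:
            # Skip LICENSE.txt and anything else that is not an N-Triples file.
            if not (member.isfile() and member.name.endswith(".nt")):
                continue
            f = tar.extractfile(member)
            if f is None:
                continue
            for raw in f:
                line = raw.decode("utf-8").strip()
                # N-Triples allows blank lines and "#" comments; skip both.
                if line and not line.startswith("#"):
                    yield line


def read_license(archive_path: str) -> str:
    """Return the text of the LICENSE.txt member bundled with a dataset archive."""
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        for member in tar.getmembers():
            if member.name.rsplit("/", 1)[-1] == "LICENSE.txt":
                f = tar.extractfile(member)
                return f.read().decode("utf-8") if f else ""
    raise FileNotFoundError("no LICENSE.txt in " + archive_path)
```

For example, `sum(1 for _ in iter_ntriples("some-dataset.tar.bz2"))` counts the triples in one archive (the file name is a placeholder). Stream mode reads the archive strictly front to back, which is why the license is read in a separate pass.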

Datasets

The file datasets/download-urls.txt lists all the dataset download URLs for easy processing.
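A minimal sketch for scripting downloads from that list, assuming the file holds one URL per line and that the last path segment of each URL is a usable local file name. The keyword filter is a hypothetical convenience; check how dataset names and year ranges actually appear in the URLs before relying on it.

```python
import os
import urllib.request
from typing import Iterable, List


def read_download_urls(path: str) -> List[str]:
    """Read download-urls.txt, assumed to contain one URL per line; blank lines are ignored."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def select(urls: Iterable[str], keyword: str) -> List[str]:
    """Keep only URLs containing `keyword` (hypothetical: assumes names appear in the URL)."""
    return [u for u in urls if keyword in u]


def download_all(urls: Iterable[str], dest_dir: str) -> None:
    """Download each URL into dest_dir, skipping files that are already present."""
    os.makedirs(dest_dir, exist_ok=True)
    for url in urls:
        # Use the last path segment of the URL as the local file name.
        target = os.path.join(dest_dir, url.rstrip("/").rsplit("/", 1)[-1])
        if not os.path.exists(target):
            urllib.request.urlretrieve(url, target)
```

A typical call might be `download_all(select(read_download_urls("datasets/download-urls.txt"), "2017"), "scigraph-data")`, where the keyword and target directory are placeholders. Skipping existing files makes an interrupted run cheap to restart.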

                                                CC BY                                      CC BY-NC
Dataset                   Year range Things*    Triples      File size     Links           Triples     File size     Links
Ontologies
  Ontologies                         47         1,731        19K (299K)    .tar.bz2, .ttl

Taxonomies
  Data Repositories                  112        1,168        18K (210K)    .tar.bz2, .ttl
  Product Market Codes               1,465      13,193       77K (2.2M)    .tar.bz2, .ttl
  Subjects                           2,685      32,008       367K (5.6M)   .tar.bz2, .ttl
  Technical Article Types            19         152          6.9K (28K)    .tar.bz2, .ttl

Instances
  Articles                2017       259,640    39,875,389   789M (7.4G)   .tar.bz2        227,758     81M (321M)    .tar.bz2
                          2016       349,877    55,620,350   1.1G (11G)    .tar.bz2        313,729     110M (440M)   .tar.bz2
                          2015       334,089    50,501,417   994M (9.4G)   .tar.bz2        288,949     101M (404M)   .tar.bz2
                          2014       358,047    50,030,226   985M (9.3G)   .tar.bz2        300,979     105M (412M)   .tar.bz2
                          2013       511,182    48,186,534   937M (8.9G)   .tar.bz2        273,033     92M (363M)    .tar.bz2
                          2012       308,147    42,139,349   824M (7.8G)   .tar.bz2        263,570     94M (373M)    .tar.bz2
                          2011       260,805    34,740,800   679M (6.5G)   .tar.bz2        228,616     76M (298M)    .tar.bz2
                          2009-2010  464,686    55,458,690   1.1G (11G)    .tar.bz2        405,303     131M (514M)   .tar.bz2
                          2006-2008  629,018    66,966,590   1.3G (13G)    .tar.bz2        541,628     167M (665M)   .tar.bz2
                          2001-2005  704,314    68,239,864   1.4G (13G)    .tar.bz2        587,574     179M (696M)   .tar.bz2
                          1996-2000  527,498    48,104,113   950M (9.0G)   .tar.bz2        429,007     127M (495M)   .tar.bz2
                          1991-1995  402,206    36,089,716   698M (6.7G)   .tar.bz2        318,068     87M (346M)    .tar.bz2
                          1981-1990  608,755    48,817,763   943M (9.1G)   .tar.bz2        455,960     117M (466M)   .tar.bz2
                          1971-1980  448,853    30,396,043   586M (5.7G)   .tar.bz2        329,229     78M (317M)    .tar.bz2
                          1951-1970  445,814    23,672,213   454M (4.4G)   .tar.bz2        261,914     61M (253M)    .tar.bz2
                          1901-1950  400,050    16,317,299   316M (3.0G)   .tar.bz2        152,588     38M (150M)    .tar.bz2
                          1801-1900  221,639    6,043,416    114M (1.1G)   .tar.bz2        38,724      11M (41M)     .tar.bz2
  Books                              240,396    7,974,731    172M (1.6G)   .tar.bz2
  Book Chapters           2017       169,500    9,916,607    209M (1.9G)   .tar.bz2        164,491     41M (166M)    .tar.bz2
                          2016       200,502    11,822,553   249M (2.2G)   .tar.bz2        192,820     48M (195M)    .tar.bz2
                          2015       192,789    10,715,212   229M (2.0G)   .tar.bz2        185,282     47M (188M)    .tar.bz2
                          2014       189,569    10,562,713   224M (2.0G)   .tar.bz2        179,595     44M (179M)    .tar.bz2
                          2013       173,440    9,744,338    206M (1.8G)   .tar.bz2        169,155     40M (163M)    .tar.bz2
                          2012       147,443    8,561,880    180M (1.6G)   .tar.bz2        144,180     35M (139M)    .tar.bz2
                          2011       141,508    8,067,536    169M (1.5G)   .tar.bz2        139,629     34M (136M)    .tar.bz2
                          2009-2010  300,936    16,522,736   342M (3.1G)   .tar.bz2        286,473     64M (273M)    .tar.bz2
                          2006-2008  384,332    19,773,720   411M (3.6G)   .tar.bz2        309,066     73M (301M)    .tar.bz2
                          2001-2005  437,011    23,150,356   480M (4.2G)   .tar.bz2        383,663     92M (364M)    .tar.bz2
                          1996-2000  378,907    19,833,601   411M (3.6G)   .tar.bz2        363,878     88M (342M)    .tar.bz2
                          1991-1995  337,594    17,894,179   371M (3.3G)   .tar.bz2        323,407     75M (291M)    .tar.bz2
                          1981-1990  496,231    25,201,901   521M (4.6G)   .tar.bz2        466,938     105M (408M)   .tar.bz2
                          1971-1980  206,727    9,816,984    202M (1.8G)   .tar.bz2        185,462     41M (157M)    .tar.bz2
                          1951-1970  115,305    4,955,368    102M (913M)   .tar.bz2        107,212     24M (94M)     .tar.bz2
                          1901-1950  102,027    3,965,991    81M (727M)    .tar.bz2        94,592      21M (85M)     .tar.bz2
                          1801-1900  13,659     486,595      9.8M (90M)    .tar.bz2        11,381      2.4M (9.6M)   .tar.bz2
  Conferences                        9,825      19,650       425K (3.8M)   .tar.bz2
  Grants                             93,497                                                2,082,407   84M (633M)    .tar.bz2
  Index Checks                       6,497,545  38,985,270   871M (7.7G)   .tar.bz2
  Journals                           5,035      173,072      3.5M (33M)    .tar.bz2

Totals                                          979,393,017  20.5G (194G)                  11,206,260  2.7G (11.2G)
 
Ancillary
Mappings 4,886 mappings/
Shapes 1,200 shapes/
 
External
ANZSRC-FOR taxonomy
GRID ontology
GRID database

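* Things counts the instances of the primary type of each dataset (for example, articles in the Articles dataset).
File sizes are given as the bzip2-compressed size, with the uncompressed size in parentheses.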
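Because articles and book chapters are split into the 17 year ranges shown in the table above, picking the archive for a given publication year is a simple lookup. The following sketch encodes those ranges; `YEAR_RANGES` and `year_range` are hypothetical helper names, and the ranges only cover this release (1801 to 2017).

```python
from typing import List, Tuple

# Year-range partitions for Articles and Book Chapters, copied from the table
# as (first_year, last_year, label) tuples, oldest first.
YEAR_RANGES: List[Tuple[int, int, str]] = [
    (1801, 1900, "1801-1900"),
    (1901, 1950, "1901-1950"),
    (1951, 1970, "1951-1970"),
    (1971, 1980, "1971-1980"),
    (1981, 1990, "1981-1990"),
    (1991, 1995, "1991-1995"),
    (1996, 2000, "1996-2000"),
    (2001, 2005, "2001-2005"),
    (2006, 2008, "2006-2008"),
    (2009, 2010, "2009-2010"),
    (2011, 2011, "2011"),
    (2012, 2012, "2012"),
    (2013, 2013, "2013"),
    (2014, 2014, "2014"),
    (2015, 2015, "2015"),
    (2016, 2016, "2016"),
    (2017, 2017, "2017"),
]


def year_range(year: int) -> str:
    """Return the label of the partition that holds publications from `year`."""
    for first, last, label in YEAR_RANGES:
        if first <= year <= last:
            return label
    raise ValueError(f"{year} is outside the 1801-2017 coverage of this release")
```

Counted this way, two partitioned types under two licenses give 17 × 2 × 2 = 68 partial datasets, which together with the 10 full datasets (9 CC BY, 1 CC BY-NC) make up the 78 physical datasets.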

Datasets at a glance

[Diagram: Public Release 2017-11-07]

Ontologies

This set of datasets provides OWL ontologies. See the Ontology page for further details.

Taxonomies

This set of datasets provides SKOS taxonomies. See the Taxonomies page for further details.
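SKOS expresses the hierarchy of a taxonomy with skos:broader and skos:narrower links between concepts. Assuming the taxonomy .nt files use these standard SKOS properties (the Taxonomies page is the authority on the actual model), a minimal sketch that rebuilds the hierarchy and lists every broader concept of a given concept could look like this. The line parser is deliberately simple and only handles triples whose subject, predicate and object are all IRIs.

```python
import re
from typing import Dict, Iterable, List, Set

SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"
SKOS_NARROWER = "http://www.w3.org/2004/02/skos/core#narrower"

# Matches N-Triples lines whose subject, predicate and object are all IRIs;
# lines with literal objects (labels, notes) simply do not match.
IRI_TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def broader_map(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each concept IRI to the set of its directly broader concepts."""
    parents: Dict[str, Set[str]] = {}
    for line in lines:
        m = IRI_TRIPLE.match(line)
        if not m:
            continue
        s, p, o = m.groups()
        if p == SKOS_BROADER:
            parents.setdefault(s, set()).add(o)
        elif p == SKOS_NARROWER:
            # skos:narrower is the inverse of skos:broader.
            parents.setdefault(o, set()).add(s)
    return parents


def ancestors(concept: str, parents: Dict[str, Set[str]]) -> List[str]:
    """Return all transitively broader concepts, nearest first (breadth-first)."""
    seen: Set[str] = {concept}  # also guards against cycles in the data
    order: List[str] = []
    frontier = sorted(parents.get(concept, ()))
    while frontier:
        nxt: List[str] = []
        for c in frontier:
            if c not in seen:
                seen.add(c)
                order.append(c)
                nxt.extend(sorted(parents.get(c, ())))
        frontier = nxt
    return order
```

Accepting both directions means the sketch works whether a file states broader links, narrower links, or both.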

Instances

This set of datasets provides instance level data.

  • Articles

    This dataset provides bibliographic metadata for the complete set of Springer Nature journal articles. It also includes abstracts and information about authors and their affiliations (as strings). The dataset links to the SciGraph subjects dataset and to ANZSRC Field of Research codes. In some cases, affiliation strings have also been disambiguated with GRID identifiers.

  • Books

    This dataset provides metadata about the complete set of Springer Nature books.

  • Book Chapters

    This dataset provides bibliographic metadata for the complete set of Springer Nature book chapters. It also includes abstracts and information about authors and their affiliations (as strings). The dataset links to ANZSRC Field of Research codes. In some cases, affiliation strings have also been disambiguated with GRID identifiers.

  • Conferences

    This dataset provides metadata about scholarly conferences, i.e. formal meetings of people with a shared interest, typically those that take place over several days.

  • Grants

    This dataset provides metadata about research grants (including abstracts) linked to Springer Nature articles. Research grants are categorized using ANZSRC Field of Research codes. Funding and recipient organizations are also marked with GRID identifiers.

  • Index Checks

    This dataset provides metadata about events which annotate a publication with a public indexing database check.

  • Journals

    This dataset provides metadata about the complete set of Springer Nature journals. Note that our model, like BIBFRAME, maintains a fundamental distinction between the product-level journal (as expressed by ISSNs) and the journal brand (the more abstract work).
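The Things column in the table counts instances of each dataset's primary type. A rough sketch for reproducing such a count from an .nt file, assuming primary-type instances are IRIs declared with rdf:type; the class IRI passed in is a placeholder, not a confirmed SciGraph ontology term (look it up in the Ontologies dataset).

```python
import re
from typing import Iterable, Set

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Subject, predicate and IRI object of an N-Triples line (literal objects are ignored).
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def count_instances(lines: Iterable[str], class_iri: str) -> int:
    """Count distinct IRI subjects declared as `rdf:type <class_iri>`.

    class_iri is supplied by the caller; any IRI used here is an assumption.
    """
    subjects: Set[str] = set()
    for line in lines:
        m = TRIPLE.match(line)
        if m and m.group(2) == RDF_TYPE and m.group(3) == class_iri:
            subjects.add(m.group(1))
    return len(subjects)
```

For example, `with open("articles.nt", encoding="utf-8") as f: print(count_instances(f, CLASS_IRI))`, where the file name and `CLASS_IRI` are placeholders. Collecting subjects in a set rather than incrementing a counter avoids double counting a subject whose type triple appears more than once.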

Ancillary

This set of datasets provides ancillary information. It is hosted on GitHub and is not part of the distribution proper.

  • Mappings

    The mappings/ folder contains an initial set of semantic links to other ontologies and datasets. We are continuously improving it, but feel free to submit a pull request if you think you could help!

  • Shapes

    The shapes/ folder contains examples of the SHACL shapes we used for data extraction. These shapes do not currently include restrictions for validation, but we hope to add them at a later date.