True Randomness from Big Data


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2016-09-26

AUTHORS

Periklis A. Papakonstantinou, David P. Woodruff, Guang Yang

ABSTRACT

Generating random bits is a difficult task, which is important for physical systems simulation, cryptography, and many applications that rely on high-quality random bits. Our contribution is to show how to generate provably random bits from uncertain events whose outcomes are routinely recorded in the form of massive data sets. These include scientific data sets, such as in astronomics, genomics, as well as data produced by individuals, such as internet search logs, sensor networks, and social network feeds. We view the generation of such data as the sampling process from a big source, which is a random variable of size at least a few gigabytes. Our view initiates the study of big sources in the randomness extraction literature. Previous approaches for big sources rely on statistical assumptions about the samples. We introduce a general method that provably extracts almost-uniform random bits from big sources and extensively validate it empirically on real data sets. The experimental findings indicate that our method is efficient enough to handle large enough sources, while previous extractor constructions are not efficient enough to be practical. Quality-wise, our method at least matches quantum randomness expanders and classical world empirical extractors as measured by standardized tests.

PAGES

33740

Identifiers

URI

http://scigraph.springernature.com/pub.10.1038/srep33740

DOI

http://dx.doi.org/10.1038/srep33740

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1025956327

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/27666514



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Mathematical Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0104", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Statistics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Rutgers University, MSIS, 08853, Piscataway, NJ, USA", 
          "id": "http://www.grid.ac/institutes/grid.430387.b", 
          "name": [
            "Rutgers University, MSIS, 08853, Piscataway, NJ, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Papakonstantinou", 
        "givenName": "Periklis A.", 
        "id": "sg:person.014524643545.01", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014524643545.01"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "IBM Research Almaden, 95120, San Jose, CA, USA", 
          "id": "http://www.grid.ac/institutes/grid.481551.c", 
          "name": [
            "IBM Research Almaden, 95120, San Jose, CA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Woodruff", 
        "givenName": "David P.", 
        "id": "sg:person.012727410605.86", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012727410605.86"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Institute for Computing Technology, 100190, CAS, Beijing, China", 
          "id": "http://www.grid.ac/institutes/grid.424936.e", 
          "name": [
            "Institute for Computing Technology, 100190, CAS, Beijing, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Yang", 
        "givenName": "Guang", 
        "id": "sg:person.014126542041.49", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014126542041.49"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1007/bf01940870", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1041202529", 
          "https://doi.org/10.1007/bf01940870"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-22012-8_2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1003761671", 
          "https://doi.org/10.1007/978-3-642-22012-8_2"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1023/a:1020281327116", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1028452745", 
          "https://doi.org/10.1023/a:1020281327116"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/srep01627", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1006347763", 
          "https://doi.org/10.1038/srep01627"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/srep05490", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1039853207", 
          "https://doi.org/10.1038/srep05490"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nature09008", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1004387819", 
          "https://doi.org/10.1038/nature09008"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-03163-2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010701432", 
          "https://doi.org/10.1007/978-3-642-03163-2"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2016-09-26", 
    "datePublishedReg": "2016-09-26", 
    "description": "Generating random bits is a difficult task, which is important for physical systems simulation, cryptography, and many applications that rely on high-quality random bits. Our contribution is to show how to generate provably random bits from uncertain events whose outcomes are routinely recorded in the form of massive data sets. These include scientific data sets, such as in astronomics, genomics, as well as data produced by individuals, such as internet search logs, sensor networks, and social network feeds. We view the generation of such data as the sampling process from a big source, which is a random variable of size at least a few gigabytes. Our view initiates the study of big sources in the randomness extraction literature. Previous approaches for big sources rely on statistical assumptions about the samples. We introduce a general method that provably extracts almost-uniform random bits from big sources and extensively validate it empirically on real data sets. The experimental findings indicate that our method is efficient enough to handle large enough sources, while previous extractor constructions are not efficient enough to be practical. Quality-wise, our method at least matches quantum randomness expanders and classical world empirical extractors as measured by standardized tests.", 
    "genre": "article", 
    "id": "sg:pub.10.1038/srep33740", 
    "isAccessibleForFree": true, 
    "isFundedItemOf": [
      {
        "id": "sg:grant.8292170", 
        "type": "MonetaryGrant"
      }, 
      {
        "id": "sg:grant.8299724", 
        "type": "MonetaryGrant"
      }, 
      {
        "id": "sg:grant.8296787", 
        "type": "MonetaryGrant"
      }
    ], 
    "isPartOf": [
      {
        "id": "sg:journal.1045337", 
        "issn": [
          "2045-2322"
        ], 
        "name": "Scientific Reports", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "6"
      }
    ], 
    "keywords": [
      "random bits", 
      "data sets", 
      "scientific data sets", 
      "massive data sets", 
      "high-quality random bits", 
      "uniform random bits", 
      "search logs", 
      "real data sets", 
      "sensor networks", 
      "big data", 
      "random variables", 
      "statistical assumptions", 
      "social networks", 
      "true randomness", 
      "previous approaches", 
      "system simulation", 
      "large enough source", 
      "least matches", 
      "difficult task", 
      "biggest source", 
      "such data", 
      "extractor constructions", 
      "bits", 
      "general method", 
      "sampling process", 
      "network", 
      "uncertain events", 
      "set", 
      "cryptography", 
      "gigabytes", 
      "extractor", 
      "task", 
      "randomness", 
      "enough sources", 
      "experimental findings", 
      "data", 
      "method", 
      "logs", 
      "simulations", 
      "assumption", 
      "applications", 
      "match", 
      "variables", 
      "approach", 
      "source", 
      "view", 
      "generation", 
      "construction", 
      "process", 
      "form", 
      "contribution", 
      "size", 
      "literature", 
      "genomics", 
      "expander", 
      "events", 
      "samples", 
      "test", 
      "individuals", 
      "study", 
      "standardized tests", 
      "outcomes", 
      "findings"
    ], 
    "name": "True Randomness from Big Data", 
    "pagination": "33740", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1025956327"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1038/srep33740"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "27666514"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1038/srep33740", 
      "https://app.dimensions.ai/details/publication/pub.1025956327"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-11-24T20:59", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20221124/entities/gbq_results/article/article_683.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1038/srep33740"
  }
]
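
As a rough illustration of how the JSON-LD record above can be consumed, the following Python sketch parses a trimmed copy of it with the standard `json` module and pulls out the author names and product identifiers. The helper names `author_names` and `product_ids` are hypothetical and chosen only for this example, and the embedded string keeps only the fields the sketch reads.

```python
import json

# Trimmed copy of the record shown above; only the fields used below are kept.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Papakonstantinou", "givenName": "Periklis A.", "type": "Person"},
      {"familyName": "Woodruff", "givenName": "David P.", "type": "Person"},
      {"familyName": "Yang", "givenName": "Guang", "type": "Person"}
    ],
    "name": "True Randomness from Big Data",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1025956327"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1038/srep33740"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["27666514"]}
    ],
    "type": "ScholarlyArticle"
  }
]
"""


def author_names(record):
    """Return 'given family' strings in the order the record lists them."""
    return [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]


def product_ids(record):
    """Map each productId name (doi, pubmed_id, ...) to its first value."""
    return {
        p["name"]: p["value"][0]
        for p in record.get("productId", [])
        if p.get("value")
    }


# The top level of the document is a JSON array holding a single record.
record = json.loads(RECORD_JSON)[0]
authors = author_names(record)
ids = product_ids(record)

print(record["name"])
print(authors)
print(ids["doi"], ids["pubmed_id"])
```

Because the record is plain JSON, no JSON-LD library is needed for simple field access like this; a dedicated JSON-LD processor would only matter if the `@context` had to be expanded.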
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1038/srep33740'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1038/srep33740'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1038/srep33740'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1038/srep33740'
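
The four curl commands above differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. This is a minimal sketch: the `ACCEPT_TYPES` table and the `build_request` helper are names made up for this example, and the request is only constructed here, not sent, since whether the endpoint still responds is not something this page confirms.

```python
from urllib.request import Request

# URL taken from the curl commands above.
RECORD_URL = "https://scigraph.springernature.com/pub.10.1038/srep33740"

# Short format labels (hypothetical keys) mapped to the MIME types used above.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build (but do not send) a GET request asking for the given RDF format."""
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


req = build_request("turtle")
print(req.full_url)
print(req.get_header("Accept"))

# To actually fetch the data, pass the request to urllib.request.urlopen:
#     from urllib.request import urlopen
#     body = urlopen(req).read().decode("utf-8")
```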


 

This table displays all metadata directly associated to this object as RDF triples.

185 TRIPLES      21 PREDICATES      97 URIs      80 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1038/srep33740 schema:about anzsrc-for:01
2 anzsrc-for:0104
3 anzsrc-for:08
4 anzsrc-for:0806
5 schema:author N37600505a74c499a82ebcb41a084ed0a
6 schema:citation sg:pub.10.1007/978-3-642-03163-2
7 sg:pub.10.1007/978-3-642-22012-8_2
8 sg:pub.10.1007/bf01940870
9 sg:pub.10.1023/a:1020281327116
10 sg:pub.10.1038/nature09008
11 sg:pub.10.1038/srep01627
12 sg:pub.10.1038/srep05490
13 schema:datePublished 2016-09-26
14 schema:datePublishedReg 2016-09-26
15 schema:description Generating random bits is a difficult task, which is important for physical systems simulation, cryptography, and many applications that rely on high-quality random bits. Our contribution is to show how to generate provably random bits from uncertain events whose outcomes are routinely recorded in the form of massive data sets. These include scientific data sets, such as in astronomics, genomics, as well as data produced by individuals, such as internet search logs, sensor networks, and social network feeds. We view the generation of such data as the sampling process from a big source, which is a random variable of size at least a few gigabytes. Our view initiates the study of big sources in the randomness extraction literature. Previous approaches for big sources rely on statistical assumptions about the samples. We introduce a general method that provably extracts almost-uniform random bits from big sources and extensively validate it empirically on real data sets. The experimental findings indicate that our method is efficient enough to handle large enough sources, while previous extractor constructions are not efficient enough to be practical. Quality-wise, our method at least matches quantum randomness expanders and classical world empirical extractors as measured by standardized tests.
16 schema:genre article
17 schema:isAccessibleForFree true
18 schema:isPartOf N48d1450189704fdbb4a68af3e10a7a5a
19 N83f852d7c13a47028fa33063181b948c
20 sg:journal.1045337
21 schema:keywords applications
22 approach
23 assumption
24 big data
25 biggest source
26 bits
27 construction
28 contribution
29 cryptography
30 data
31 data sets
32 difficult task
33 enough sources
34 events
35 expander
36 experimental findings
37 extractor
38 extractor constructions
39 findings
40 form
41 general method
42 generation
43 genomics
44 gigabytes
45 high-quality random bits
46 individuals
47 large enough source
48 least matches
49 literature
50 logs
51 massive data sets
52 match
53 method
54 network
55 outcomes
56 previous approaches
57 process
58 random bits
59 random variables
60 randomness
61 real data sets
62 samples
63 sampling process
64 scientific data sets
65 search logs
66 sensor networks
67 set
68 simulations
69 size
70 social networks
71 source
72 standardized tests
73 statistical assumptions
74 study
75 such data
76 system simulation
77 task
78 test
79 true randomness
80 uncertain events
81 uniform random bits
82 variables
83 view
84 schema:name True Randomness from Big Data
85 schema:pagination 33740
86 schema:productId N275fdcd2ef9548e5b0529dbdbd64e9b2
87 N3f03bf0c0fb0461eb9ad25a391ab7420
88 N6a308e172b7b4c1589aefff8b4d46d23
89 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025956327
90 https://doi.org/10.1038/srep33740
91 schema:sdDatePublished 2022-11-24T20:59
92 schema:sdLicense https://scigraph.springernature.com/explorer/license/
93 schema:sdPublisher Nb64be032e2234486b087bf19eb161ab4
94 schema:url https://doi.org/10.1038/srep33740
95 sgo:license sg:explorer/license/
96 sgo:sdDataset articles
97 rdf:type schema:ScholarlyArticle
98 N275fdcd2ef9548e5b0529dbdbd64e9b2 schema:name pubmed_id
99 schema:value 27666514
100 rdf:type schema:PropertyValue
101 N37600505a74c499a82ebcb41a084ed0a rdf:first sg:person.014524643545.01
102 rdf:rest Nf6bf6972dc8a4e3b914cf2e178b22b88
103 N3f03bf0c0fb0461eb9ad25a391ab7420 schema:name doi
104 schema:value 10.1038/srep33740
105 rdf:type schema:PropertyValue
106 N48d1450189704fdbb4a68af3e10a7a5a schema:volumeNumber 6
107 rdf:type schema:PublicationVolume
108 N6a308e172b7b4c1589aefff8b4d46d23 schema:name dimensions_id
109 schema:value pub.1025956327
110 rdf:type schema:PropertyValue
111 N83f852d7c13a47028fa33063181b948c schema:issueNumber 1
112 rdf:type schema:PublicationIssue
113 Nb4482597ae4d410f825b013c4861d51b rdf:first sg:person.014126542041.49
114 rdf:rest rdf:nil
115 Nb64be032e2234486b087bf19eb161ab4 schema:name Springer Nature - SN SciGraph project
116 rdf:type schema:Organization
117 Nf6bf6972dc8a4e3b914cf2e178b22b88 rdf:first sg:person.012727410605.86
118 rdf:rest Nb4482597ae4d410f825b013c4861d51b
119 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
120 schema:name Mathematical Sciences
121 rdf:type schema:DefinedTerm
122 anzsrc-for:0104 schema:inDefinedTermSet anzsrc-for:
123 schema:name Statistics
124 rdf:type schema:DefinedTerm
125 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
126 schema:name Information and Computing Sciences
127 rdf:type schema:DefinedTerm
128 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
129 schema:name Information Systems
130 rdf:type schema:DefinedTerm
131 sg:grant.8292170 http://pending.schema.org/fundedItem sg:pub.10.1038/srep33740
132 rdf:type schema:MonetaryGrant
133 sg:grant.8296787 http://pending.schema.org/fundedItem sg:pub.10.1038/srep33740
134 rdf:type schema:MonetaryGrant
135 sg:grant.8299724 http://pending.schema.org/fundedItem sg:pub.10.1038/srep33740
136 rdf:type schema:MonetaryGrant
137 sg:journal.1045337 schema:issn 2045-2322
138 schema:name Scientific Reports
139 schema:publisher Springer Nature
140 rdf:type schema:Periodical
141 sg:person.012727410605.86 schema:affiliation grid-institutes:grid.481551.c
142 schema:familyName Woodruff
143 schema:givenName David P.
144 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012727410605.86
145 rdf:type schema:Person
146 sg:person.014126542041.49 schema:affiliation grid-institutes:grid.424936.e
147 schema:familyName Yang
148 schema:givenName Guang
149 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014126542041.49
150 rdf:type schema:Person
151 sg:person.014524643545.01 schema:affiliation grid-institutes:grid.430387.b
152 schema:familyName Papakonstantinou
153 schema:givenName Periklis A.
154 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014524643545.01
155 rdf:type schema:Person
156 sg:pub.10.1007/978-3-642-03163-2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010701432
157 https://doi.org/10.1007/978-3-642-03163-2
158 rdf:type schema:CreativeWork
159 sg:pub.10.1007/978-3-642-22012-8_2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003761671
160 https://doi.org/10.1007/978-3-642-22012-8_2
161 rdf:type schema:CreativeWork
162 sg:pub.10.1007/bf01940870 schema:sameAs https://app.dimensions.ai/details/publication/pub.1041202529
163 https://doi.org/10.1007/bf01940870
164 rdf:type schema:CreativeWork
165 sg:pub.10.1023/a:1020281327116 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028452745
166 https://doi.org/10.1023/a:1020281327116
167 rdf:type schema:CreativeWork
168 sg:pub.10.1038/nature09008 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004387819
169 https://doi.org/10.1038/nature09008
170 rdf:type schema:CreativeWork
171 sg:pub.10.1038/srep01627 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006347763
172 https://doi.org/10.1038/srep01627
173 rdf:type schema:CreativeWork
174 sg:pub.10.1038/srep05490 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039853207
175 https://doi.org/10.1038/srep05490
176 rdf:type schema:CreativeWork
177 grid-institutes:grid.424936.e schema:alternateName Institute for Computing Technology, 100190, CAS, Beijing, China
178 schema:name Institute for Computing Technology, 100190, CAS, Beijing, China
179 rdf:type schema:Organization
180 grid-institutes:grid.430387.b schema:alternateName Rutgers University, MSIS, 08853, Piscataway, NJ, USA
181 schema:name Rutgers University, MSIS, 08853, Piscataway, NJ, USA
182 rdf:type schema:Organization
183 grid-institutes:grid.481551.c schema:alternateName IBM Research Almaden, 95120, San Jose, CA, USA
184 schema:name IBM Research Almaden, 95120, San Jose, CA, USA
185 rdf:type schema:Organization
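
In the table above, the author order is not stored as a flat list but as an RDF collection: `schema:author` points at a blank node, and each blank node carries an `rdf:first` (the item) and an `rdf:rest` (the next node, or `rdf:nil` at the end). The Python sketch below copies those six triples from the table and walks the chain to recover the authors in order. The function name `walk_rdf_list` is hypothetical, and a real application would more likely load the N-Triples or Turtle serialization with an RDF library instead of hand-copied tuples.

```python
# The rdf:first / rdf:rest triples copied from the table above.
TRIPLES = [
    ("N37600505a74c499a82ebcb41a084ed0a", "rdf:first", "sg:person.014524643545.01"),
    ("N37600505a74c499a82ebcb41a084ed0a", "rdf:rest", "Nf6bf6972dc8a4e3b914cf2e178b22b88"),
    ("Nb4482597ae4d410f825b013c4861d51b", "rdf:first", "sg:person.014126542041.49"),
    ("Nb4482597ae4d410f825b013c4861d51b", "rdf:rest", "rdf:nil"),
    ("Nf6bf6972dc8a4e3b914cf2e178b22b88", "rdf:first", "sg:person.012727410605.86"),
    ("Nf6bf6972dc8a4e3b914cf2e178b22b88", "rdf:rest", "Nb4482597ae4d410f825b013c4861d51b"),
]

# Object of the schema:author triple on sg:pub.10.1038/srep33740.
AUTHOR_LIST_HEAD = "N37600505a74c499a82ebcb41a084ed0a"


def walk_rdf_list(head, triples):
    """Follow rdf:rest links from head and collect each node's rdf:first value."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items, seen, node = [], set(), head
    while node != "rdf:nil":
        if node in seen or node not in first or node not in rest:
            raise ValueError(f"malformed RDF list at node {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


authors_in_order = walk_rdf_list(AUTHOR_LIST_HEAD, TRIPLES)
print(authors_in_order)
```

The recovered order matches the author order in the JSON-LD record (Papakonstantinou, Woodruff, Yang), even though the table lists the blank nodes alphabetically.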
 



