Optimal Random Sampling from Distributed Streams Revisited View Full Text


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2011

AUTHORS

Srikanta Tirthapura , David P. Woodruff

ABSTRACT

We give an improved algorithm for drawing a random sample from a large data stream when the input elements are distributed across multiple sites which communicate via a central coordinator. At any point in time the set of elements held by the coordinator represent a uniform random sample from the set of all the elements observed so far. When compared with prior work, our algorithms asymptotically improve the total number of messages sent in the system as well as the computation required of the coordinator. We also present a matching lower bound, showing that our protocol sends the optimal number of messages up to a constant factor with large probability. As a byproduct, we obtain an improved algorithm for finding the heavy hitters across multiple distributed sites. More... »

PAGES

283-297

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-642-24100-0_27

DOI

http://dx.doi.org/10.1007/978-3-642-24100-0_27

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1030017397


Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
Incoming Citations Browse incoming citations for this publication using opencitations.net

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Dept. of ECE, Iowa State University, 50011, Ames, IA, USA", 
          "id": "http://www.grid.ac/institutes/grid.34421.30", 
          "name": [
            "Dept. of ECE, Iowa State University, 50011, Ames, IA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Tirthapura", 
        "givenName": "Srikanta", 
        "id": "sg:person.014015211561.37", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014015211561.37"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "IBM Almaden Research Center, 95120, San Jose, CA, USA", 
          "id": "http://www.grid.ac/institutes/grid.481551.c", 
          "name": [
            "IBM Almaden Research Center, 95120, San Jose, CA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Woodruff", 
        "givenName": "David P.", 
        "id": "sg:person.012727410605.86", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012727410605.86"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2011", 
    "datePublishedReg": "2011-01-01", 
    "description": "We give an improved algorithm for drawing a random sample from a large data stream when the input elements are distributed across multiple sites which communicate via a central coordinator. At any point in time the set of elements held by the coordinator represent a uniform random sample from the set of all the elements observed so far. When compared with prior work, our algorithms asymptotically improve the total number of messages sent in the system as well as the computation required of the coordinator. We also present a matching lower bound, showing that our protocol sends the optimal number of messages up to a constant factor with large probability. As a byproduct, we obtain an improved algorithm for finding the heavy hitters across multiple distributed sites.", 
    "editor": [
      {
        "familyName": "Peleg", 
        "givenName": "David", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-642-24100-0_27", 
    "isAccessibleForFree": true, 
    "isPartOf": {
      "isbn": [
        "978-3-642-24099-7", 
        "978-3-642-24100-0"
      ], 
      "name": "Distributed Computing", 
      "type": "Book"
    }, 
    "keywords": [
      "large data streams", 
      "improved algorithm", 
      "Distributed Streams", 
      "data streams", 
      "heavy hitters", 
      "central coordinator", 
      "uniform random sample", 
      "input elements", 
      "set of elements", 
      "prior work", 
      "algorithm", 
      "optimal number", 
      "constant factor", 
      "messages", 
      "set", 
      "hitters", 
      "streams", 
      "computation", 
      "coordinator", 
      "large probability", 
      "protocol", 
      "system", 
      "number", 
      "work", 
      "random sampling", 
      "elements", 
      "multiple sites", 
      "probability", 
      "point", 
      "time", 
      "total number", 
      "sampling", 
      "random sample", 
      "sites", 
      "factors", 
      "samples", 
      "byproducts"
    ], 
    "name": "Optimal Random Sampling from Distributed Streams Revisited", 
    "pagination": "283-297", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1030017397"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-642-24100-0_27"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-642-24100-0_27", 
      "https://app.dimensions.ai/details/publication/pub.1030017397"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-12-01T06:53", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20221201/entities/gbq_results/chapter/chapter_431.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-642-24100-0_27"
  }
]
 

Download the RDF metadata as:  json-ld nt turtle xml License info

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-24100-0_27'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-24100-0_27'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-24100-0_27'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-642-24100-0_27'


 

This table displays all metadata directly associated to this object as RDF triples.

106 TRIPLES      22 PREDICATES      62 URIs      55 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-642-24100-0_27 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author Ne47562f542294da8be0b1ef85dfe04ef
4 schema:datePublished 2011
5 schema:datePublishedReg 2011-01-01
6 schema:description We give an improved algorithm for drawing a random sample from a large data stream when the input elements are distributed across multiple sites which communicate via a central coordinator. At any point in time the set of elements held by the coordinator represent a uniform random sample from the set of all the elements observed so far. When compared with prior work, our algorithms asymptotically improve the total number of messages sent in the system as well as the computation required of the coordinator. We also present a matching lower bound, showing that our protocol sends the optimal number of messages up to a constant factor with large probability. As a byproduct, we obtain an improved algorithm for finding the heavy hitters across multiple distributed sites.
7 schema:editor Na1488c93f54746f199b8796b69a5c332
8 schema:genre chapter
9 schema:isAccessibleForFree true
10 schema:isPartOf N42507bd0da044becb210af67a991a43e
11 schema:keywords Distributed Streams
12 algorithm
13 byproducts
14 central coordinator
15 computation
16 constant factor
17 coordinator
18 data streams
19 elements
20 factors
21 heavy hitters
22 hitters
23 improved algorithm
24 input elements
25 large data streams
26 large probability
27 messages
28 multiple sites
29 number
30 optimal number
31 point
32 prior work
33 probability
34 protocol
35 random sample
36 random sampling
37 samples
38 sampling
39 set
40 set of elements
41 sites
42 streams
43 system
44 time
45 total number
46 uniform random sample
47 work
48 schema:name Optimal Random Sampling from Distributed Streams Revisited
49 schema:pagination 283-297
50 schema:productId N05daa1e986fd4faca0a539c711857b6f
51 N7195f7cb487a48b481108141cae97035
52 schema:publisher N6b42d31ba64448d18eeb933fe6ff934a
53 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030017397
54 https://doi.org/10.1007/978-3-642-24100-0_27
55 schema:sdDatePublished 2022-12-01T06:53
56 schema:sdLicense https://scigraph.springernature.com/explorer/license/
57 schema:sdPublisher N551203ca86ec494e90525db59f38a1dd
58 schema:url https://doi.org/10.1007/978-3-642-24100-0_27
59 sgo:license sg:explorer/license/
60 sgo:sdDataset chapters
61 rdf:type schema:Chapter
62 N05daa1e986fd4faca0a539c711857b6f schema:name dimensions_id
63 schema:value pub.1030017397
64 rdf:type schema:PropertyValue
65 N2a841fe615924bf08c6fc24526fbb06c schema:familyName Peleg
66 schema:givenName David
67 rdf:type schema:Person
68 N42507bd0da044becb210af67a991a43e schema:isbn 978-3-642-24099-7
69 978-3-642-24100-0
70 schema:name Distributed Computing
71 rdf:type schema:Book
72 N551203ca86ec494e90525db59f38a1dd schema:name Springer Nature - SN SciGraph project
73 rdf:type schema:Organization
74 N6b42d31ba64448d18eeb933fe6ff934a schema:name Springer Nature
75 rdf:type schema:Organisation
76 N7195f7cb487a48b481108141cae97035 schema:name doi
77 schema:value 10.1007/978-3-642-24100-0_27
78 rdf:type schema:PropertyValue
79 N995ed3b3f1f34260829798fd074fc22e rdf:first sg:person.012727410605.86
80 rdf:rest rdf:nil
81 Na1488c93f54746f199b8796b69a5c332 rdf:first N2a841fe615924bf08c6fc24526fbb06c
82 rdf:rest rdf:nil
83 Ne47562f542294da8be0b1ef85dfe04ef rdf:first sg:person.014015211561.37
84 rdf:rest N995ed3b3f1f34260829798fd074fc22e
85 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
86 schema:name Information and Computing Sciences
87 rdf:type schema:DefinedTerm
88 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
89 schema:name Artificial Intelligence and Image Processing
90 rdf:type schema:DefinedTerm
91 sg:person.012727410605.86 schema:affiliation grid-institutes:grid.481551.c
92 schema:familyName Woodruff
93 schema:givenName David P.
94 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012727410605.86
95 rdf:type schema:Person
96 sg:person.014015211561.37 schema:affiliation grid-institutes:grid.34421.30
97 schema:familyName Tirthapura
98 schema:givenName Srikanta
99 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014015211561.37
100 rdf:type schema:Person
101 grid-institutes:grid.34421.30 schema:alternateName Dept. of ECE, Iowa State University, 50011, Ames, IA, USA
102 schema:name Dept. of ECE, Iowa State University, 50011, Ames, IA, USA
103 rdf:type schema:Organization
104 grid-institutes:grid.481551.c schema:alternateName IBM Almaden Research Center, 95120, San Jose, CA, USA
105 schema:name IBM Almaden Research Center, 95120, San Jose, CA, USA
106 rdf:type schema:Organization
 




Preview window. Press ESC to close (or click here)


...