Space-Efficient Estimation of Statistics Over Sub-Sampled Streams View Full Text


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2015-02-18

AUTHORS

Andrew McGregor, A. Pavan, Srikanta Tirthapura, David P. Woodruff

ABSTRACT

In many stream monitoring situations, the data arrival rate is so high that it is not even possible to observe each element of the stream. The most common solution is to sub-sample the data stream and use the sample to infer properties and estimate aggregates of the original stream. However, in many cases, the estimation of aggregates on the original stream cannot be accomplished through simply estimating them on the sampled stream, followed by a normalization. We present algorithms for estimating frequency moments, support size, entropy, and heavy hitters of the original stream, through a single pass over the sampled stream. More... »

PAGES

787-811

References to SciGraph publications

  • 2009. Stream Sampling in ENCYCLOPEDIA OF DATABASE SYSTEMS
  • 2011. Optimal Random Sampling from Distributed Streams Revisited in DISTRIBUTED COMPUTING
  • 2009. Revisiting the Direct Sum Theorem and Space Lower Bounds in Random Order Streams in AUTOMATA, LANGUAGES AND PROGRAMMING
  • Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/s00453-015-9974-0

    DOI

    http://dx.doi.org/10.1007/s00453-015-9974-0

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1026929475


    Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
    Incoming Citations Browse incoming citations for this publication using opencitations.net

    JSON-LD is the canonical representation for SciGraph data.

    TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Mathematical Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0104", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Statistics", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "University of Massachusetts, Amherst, MA, USA", 
              "id": "http://www.grid.ac/institutes/grid.266683.f", 
              "name": [
                "University of Massachusetts, Amherst, MA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "McGregor", 
            "givenName": "Andrew", 
            "id": "sg:person.0764263144.94", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0764263144.94"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Iowa State University, Ames, IA, USA", 
              "id": "http://www.grid.ac/institutes/grid.34421.30", 
              "name": [
                "Iowa State University, Ames, IA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Pavan", 
            "givenName": "A.", 
            "id": "sg:person.012331372077.00", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012331372077.00"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Iowa State University, Ames, IA, USA", 
              "id": "http://www.grid.ac/institutes/grid.34421.30", 
              "name": [
                "Iowa State University, Ames, IA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Tirthapura", 
            "givenName": "Srikanta", 
            "id": "sg:person.014015211561.37", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014015211561.37"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "IBM Almaden, San Jose, CA, USA", 
              "id": "http://www.grid.ac/institutes/grid.481551.c", 
              "name": [
                "IBM Almaden, San Jose, CA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Woodruff", 
            "givenName": "David P.", 
            "id": "sg:person.012727410605.86", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012727410605.86"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/978-3-642-02927-1_43", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1008585028", 
              "https://doi.org/10.1007/978-3-642-02927-1_43"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-642-24100-0_27", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1030017397", 
              "https://doi.org/10.1007/978-3-642-24100-0_27"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-0-387-39940-9_372", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1001459513", 
              "https://doi.org/10.1007/978-0-387-39940-9_372"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2015-02-18", 
        "datePublishedReg": "2015-02-18", 
        "description": "In many stream monitoring situations, the data arrival rate is so high that it is not even possible to observe each element of the stream. The most common solution is to sub-sample the data stream and use the sample to infer properties and estimate aggregates of the original stream. However, in many cases, the estimation of aggregates on the original stream cannot be accomplished through simply estimating them on the sampled stream, followed by a normalization. We present algorithms for estimating frequency moments, support size, entropy, and heavy hitters of the original stream, through a single pass over the sampled stream.", 
        "genre": "article", 
        "id": "sg:pub.10.1007/s00453-015-9974-0", 
        "isAccessibleForFree": true, 
        "isFundedItemOf": [
          {
            "id": "sg:grant.3109340", 
            "type": "MonetaryGrant"
          }
        ], 
        "isPartOf": [
          {
            "id": "sg:journal.1047644", 
            "issn": [
              "0178-4617", 
              "1432-0541"
            ], 
            "name": "Algorithmica", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "2", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "74"
          }
        ], 
        "keywords": [
          "data arrival rate", 
          "frequency moments", 
          "support size", 
          "arrival rate", 
          "common solution", 
          "heavy hitters", 
          "estimation", 
          "estimation of aggregates", 
          "statistics", 
          "algorithm", 
          "moment", 
          "original stream", 
          "solution", 
          "data streams", 
          "hitters", 
          "properties", 
          "single pass", 
          "streams", 
          "cases", 
          "elements", 
          "situation", 
          "size", 
          "normalization", 
          "pass", 
          "samples", 
          "sub", 
          "aggregates", 
          "rate"
        ], 
        "name": "Space-Efficient Estimation of Statistics Over Sub-Sampled Streams", 
        "pagination": "787-811", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1026929475"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/s00453-015-9974-0"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1007/s00453-015-9974-0", 
          "https://app.dimensions.ai/details/publication/pub.1026929475"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-12-01T06:33", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20221201/entities/gbq_results/article/article_678.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1007/s00453-015-9974-0"
      }
    ]
     

    Download the RDF metadata as:  json-ld nt turtle xml License info

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s00453-015-9974-0'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s00453-015-9974-0'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s00453-015-9974-0'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s00453-015-9974-0'


     

    This table displays all metadata directly associated to this object as RDF triples.

    126 TRIPLES      21 PREDICATES      55 URIs      44 LITERALS      6 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/s00453-015-9974-0 schema:about anzsrc-for:01
    2 anzsrc-for:0104
    3 schema:author N47c717a5d95a40f2926562cb796fb2e7
    4 schema:citation sg:pub.10.1007/978-0-387-39940-9_372
    5 sg:pub.10.1007/978-3-642-02927-1_43
    6 sg:pub.10.1007/978-3-642-24100-0_27
    7 schema:datePublished 2015-02-18
    8 schema:datePublishedReg 2015-02-18
    9 schema:description In many stream monitoring situations, the data arrival rate is so high that it is not even possible to observe each element of the stream. The most common solution is to sub-sample the data stream and use the sample to infer properties and estimate aggregates of the original stream. However, in many cases, the estimation of aggregates on the original stream cannot be accomplished through simply estimating them on the sampled stream, followed by a normalization. We present algorithms for estimating frequency moments, support size, entropy, and heavy hitters of the original stream, through a single pass over the sampled stream.
    10 schema:genre article
    11 schema:isAccessibleForFree true
    12 schema:isPartOf N01f05a244fe54386bfbe9b41798b73c1
    13 N7c8c9dc21f2b48f3bcbf54f5cc9864b0
    14 sg:journal.1047644
    15 schema:keywords aggregates
    16 algorithm
    17 arrival rate
    18 cases
    19 common solution
    20 data arrival rate
    21 data streams
    22 elements
    23 estimation
    24 estimation of aggregates
    25 frequency moments
    26 heavy hitters
    27 hitters
    28 moment
    29 normalization
    30 original stream
    31 pass
    32 properties
    33 rate
    34 samples
    35 single pass
    36 situation
    37 size
    38 solution
    39 statistics
    40 streams
    41 sub
    42 support size
    43 schema:name Space-Efficient Estimation of Statistics Over Sub-Sampled Streams
    44 schema:pagination 787-811
    45 schema:productId Na98c29daf3514b9a8a8f5d151190b0ae
    46 Naaa182ee73bb49eea69f337b996716c4
    47 schema:sameAs https://app.dimensions.ai/details/publication/pub.1026929475
    48 https://doi.org/10.1007/s00453-015-9974-0
    49 schema:sdDatePublished 2022-12-01T06:33
    50 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    51 schema:sdPublisher Nbd8328437798403abb3582f554d7edda
    52 schema:url https://doi.org/10.1007/s00453-015-9974-0
    53 sgo:license sg:explorer/license/
    54 sgo:sdDataset articles
    55 rdf:type schema:ScholarlyArticle
    56 N01f05a244fe54386bfbe9b41798b73c1 schema:volumeNumber 74
    57 rdf:type schema:PublicationVolume
    58 N156072997aa643448299fa6dd213c024 rdf:first sg:person.012727410605.86
    59 rdf:rest rdf:nil
    60 N47c717a5d95a40f2926562cb796fb2e7 rdf:first sg:person.0764263144.94
    61 rdf:rest N94fef826bda544e4a3472302652df8dd
    62 N7c8c9dc21f2b48f3bcbf54f5cc9864b0 schema:issueNumber 2
    63 rdf:type schema:PublicationIssue
    64 N84eb4f3b5c6d4457af338bea05d6c632 rdf:first sg:person.014015211561.37
    65 rdf:rest N156072997aa643448299fa6dd213c024
    66 N94fef826bda544e4a3472302652df8dd rdf:first sg:person.012331372077.00
    67 rdf:rest N84eb4f3b5c6d4457af338bea05d6c632
    68 Na98c29daf3514b9a8a8f5d151190b0ae schema:name doi
    69 schema:value 10.1007/s00453-015-9974-0
    70 rdf:type schema:PropertyValue
    71 Naaa182ee73bb49eea69f337b996716c4 schema:name dimensions_id
    72 schema:value pub.1026929475
    73 rdf:type schema:PropertyValue
    74 Nbd8328437798403abb3582f554d7edda schema:name Springer Nature - SN SciGraph project
    75 rdf:type schema:Organization
    76 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
    77 schema:name Mathematical Sciences
    78 rdf:type schema:DefinedTerm
    79 anzsrc-for:0104 schema:inDefinedTermSet anzsrc-for:
    80 schema:name Statistics
    81 rdf:type schema:DefinedTerm
    82 sg:grant.3109340 http://pending.schema.org/fundedItem sg:pub.10.1007/s00453-015-9974-0
    83 rdf:type schema:MonetaryGrant
    84 sg:journal.1047644 schema:issn 0178-4617
    85 1432-0541
    86 schema:name Algorithmica
    87 schema:publisher Springer Nature
    88 rdf:type schema:Periodical
    89 sg:person.012331372077.00 schema:affiliation grid-institutes:grid.34421.30
    90 schema:familyName Pavan
    91 schema:givenName A.
    92 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012331372077.00
    93 rdf:type schema:Person
    94 sg:person.012727410605.86 schema:affiliation grid-institutes:grid.481551.c
    95 schema:familyName Woodruff
    96 schema:givenName David P.
    97 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012727410605.86
    98 rdf:type schema:Person
    99 sg:person.014015211561.37 schema:affiliation grid-institutes:grid.34421.30
    100 schema:familyName Tirthapura
    101 schema:givenName Srikanta
    102 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014015211561.37
    103 rdf:type schema:Person
    104 sg:person.0764263144.94 schema:affiliation grid-institutes:grid.266683.f
    105 schema:familyName McGregor
    106 schema:givenName Andrew
    107 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0764263144.94
    108 rdf:type schema:Person
    109 sg:pub.10.1007/978-0-387-39940-9_372 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001459513
    110 https://doi.org/10.1007/978-0-387-39940-9_372
    111 rdf:type schema:CreativeWork
    112 sg:pub.10.1007/978-3-642-02927-1_43 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008585028
    113 https://doi.org/10.1007/978-3-642-02927-1_43
    114 rdf:type schema:CreativeWork
    115 sg:pub.10.1007/978-3-642-24100-0_27 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030017397
    116 https://doi.org/10.1007/978-3-642-24100-0_27
    117 rdf:type schema:CreativeWork
    118 grid-institutes:grid.266683.f schema:alternateName University of Massachusetts, Amherst, MA, USA
    119 schema:name University of Massachusetts, Amherst, MA, USA
    120 rdf:type schema:Organization
    121 grid-institutes:grid.34421.30 schema:alternateName Iowa State University, Ames, IA, USA
    122 schema:name Iowa State University, Ames, IA, USA
    123 rdf:type schema:Organization
    124 grid-institutes:grid.481551.c schema:alternateName IBM Almaden, San Jose, CA, USA
    125 schema:name IBM Almaden, San Jose, CA, USA
    126 rdf:type schema:Organization
     




    Preview window. Press ESC to close (or click here)


    ...