A General Method for Estimating Correlated Aggregates Over a Data Stream View Full Text


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2014-08-06

AUTHORS

Srikanta Tirthapura, David P. Woodruff

ABSTRACT

On a stream S\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\fancyscript{S}}$$\end{document} of two dimensional data items (x,y)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(x,y)$$\end{document} where x\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x$$\end{document} is an item identifier and y\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y$$\end{document} is a numerical attribute, a correlated aggregate query C(σ,AGG,S)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C(\sigma ,AGG,{\fancyscript{S}})$$\end{document} asks to first apply a selection predicate σ\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma $$\end{document} along the y\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y$$\end{document} dimension, followed by an aggregation AGG\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$AGG$$\end{document} along the x\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x$$\end{document} dimension. For selection predicates of the form (yc)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(y > c)$$\end{document}, where parameter c\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c$$\end{document} is provided at query time, we present new streaming algorithms and lower bounds for estimating correlated aggregates. Our main result is a general method that reduces the estimation of a correlated aggregate AGG\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$AGG$$\end{document} to the streaming computation of AGG\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$AGG$$\end{document} over an entire stream, for an aggregate that satisfies certain conditions. This results in the first sublinear space algorithms for the correlated estimation of a large family of statistics, including frequency moments. Our experimental validation shows that the memory requirements of these algorithms are significantly smaller than existing linear storage solutions, and that these achieve a fast per-record processing time. We also study the setting when items have weights. In the case when weights can be negative, we give a strong space lower bound which holds even if the algorithm is allowed up to a logarithmic number of passes over the data. We complement this with a small space algorithm which uses a logarithmic number of passes. More... »

PAGES

235-260

References to SciGraph publications

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/s00453-014-9917-1

DOI

http://dx.doi.org/10.1007/s00453-014-9917-1

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1007957239


Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
Incoming Citations Browse incoming citations for this publication using opencitations.net

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Iowa State University, Ames, IA, USA", 
          "id": "http://www.grid.ac/institutes/grid.34421.30", 
          "name": [
            "Iowa State University, Ames, IA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Tirthapura", 
        "givenName": "Srikanta", 
        "id": "sg:person.014015211561.37", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014015211561.37"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "IBM Almaden, San Jose, CA, USA", 
          "id": "http://www.grid.ac/institutes/grid.481551.c", 
          "name": [
            "IBM Almaden, San Jose, CA, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Woodruff", 
        "givenName": "David P.", 
        "id": "sg:person.012727410605.86", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012727410605.86"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1007/s00224-004-1156-4", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1018926556", 
          "https://doi.org/10.1007/s00224-004-1156-4"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-38768-5_56", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023298832", 
          "https://doi.org/10.1007/978-3-642-38768-5_56"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-12450-1_5", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1035022927", 
          "https://doi.org/10.1007/978-3-642-12450-1_5"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-540-70918-3_40", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1037637468", 
          "https://doi.org/10.1007/978-3-540-70918-3_40"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s00446-007-0048-7", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1021056418", 
          "https://doi.org/10.1007/s00446-007-0048-7"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2014-08-06", 
    "datePublishedReg": "2014-08-06", 
    "description": "On a stream S\\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$${\\fancyscript{S}}$$\\end{document} of two dimensional data items (x,y)\\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$$(x,y)$$\\end{document} where x\\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$$x$$\\end{document} is an item identifier and y\\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$$y$$\\end{document} is a numerical attribute, a correlated aggregate query C(\u03c3,AGG,S)\\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$$C(\\sigma ,AGG,{\\fancyscript{S}})$$\\end{document} asks to first apply a selection predicate \u03c3\\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$$\\sigma $$\\end{document} along the y\\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$$y$$\\end{document} dimension, followed by an aggregation AGG\\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$$AGG$$\\end{document} along the x\\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$$x$$\\end{document} dimension. For selection predicates of the form (yc)\\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$$(y > c)$$\\end{document}, where parameter c\\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$$c$$\\end{document} is provided at query time, we present new streaming algorithms and lower bounds for estimating correlated aggregates. Our main result is a general method that reduces the estimation of a correlated aggregate AGG\\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$$AGG$$\\end{document} to the streaming computation of AGG\\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$$AGG$$\\end{document} over an entire stream, for an aggregate that satisfies certain conditions. This results in the first sublinear space algorithms for the correlated estimation of a large family of statistics, including frequency moments. Our experimental validation shows that the memory requirements of these algorithms are significantly smaller than existing linear storage solutions, and that these achieve a fast per-record processing time. We also study the setting when items have weights. In the case when weights can be negative, we give a strong space lower bound which holds even if the algorithm is allowed up to a logarithmic number of passes over the data. We complement this with a small space algorithm which uses a logarithmic number of passes.", 
    "genre": "article", 
    "id": "sg:pub.10.1007/s00453-014-9917-1", 
    "isAccessibleForFree": true, 
    "isFundedItemOf": [
      {
        "id": "sg:grant.3092935", 
        "type": "MonetaryGrant"
      }, 
      {
        "id": "sg:grant.3093438", 
        "type": "MonetaryGrant"
      }
    ], 
    "isPartOf": [
      {
        "id": "sg:journal.1047644", 
        "issn": [
          "0178-4617", 
          "1432-0541"
        ], 
        "name": "Algorithmica", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "2", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "73"
      }
    ], 
    "keywords": [
      "streams", 
      "entire stream", 
      "aggregates", 
      "estimation", 
      "strong space", 
      "passes", 
      "time", 
      "certain conditions", 
      "conditions", 
      "statistics", 
      "validation", 
      "setting", 
      "data", 
      "data streams", 
      "attributes", 
      "form", 
      "parameters", 
      "main results", 
      "results", 
      "method", 
      "space", 
      "number", 
      "dimensions", 
      "aggregation", 
      "algorithm", 
      "bounds", 
      "general method", 
      "computation", 
      "space algorithm", 
      "moment", 
      "requirements", 
      "storage solutions", 
      "solution", 
      "cases", 
      "identifiers", 
      "lower bounds", 
      "frequency moments", 
      "weight", 
      "logarithmic number", 
      "data items", 
      "items", 
      "item identifier", 
      "numerical attributes", 
      "correlated aggregate query", 
      "aggregate queries", 
      "selection predicates", 
      "predicates", 
      "query time", 
      "correlated aggregates", 
      "sublinear space algorithms", 
      "large family", 
      "family", 
      "experimental validation", 
      "memory requirements", 
      "processing time", 
      "small-space algorithm", 
      "queries"
    ], 
    "name": "A General Method for Estimating Correlated Aggregates Over a Data Stream", 
    "pagination": "235-260", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1007957239"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/s00453-014-9917-1"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1007/s00453-014-9917-1", 
      "https://app.dimensions.ai/details/publication/pub.1007957239"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-12-01T06:32", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20221201/entities/gbq_results/article/article_634.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1007/s00453-014-9917-1"
  }
]
 

Download the RDF metadata as:  json-ld nt turtle xml License info

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s00453-014-9917-1'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s00453-014-9917-1'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s00453-014-9917-1'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s00453-014-9917-1'


 

This table displays all metadata directly associated to this object as RDF triples.

148 TRIPLES      21 PREDICATES      86 URIs      73 LITERALS      6 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/s00453-014-9917-1 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author Nf3f99f8f460144b39a726a6be2fad2f6
4 schema:citation sg:pub.10.1007/978-3-540-70918-3_40
5 sg:pub.10.1007/978-3-642-12450-1_5
6 sg:pub.10.1007/978-3-642-38768-5_56
7 sg:pub.10.1007/s00224-004-1156-4
8 sg:pub.10.1007/s00446-007-0048-7
9 schema:datePublished 2014-08-06
10 schema:datePublishedReg 2014-08-06
11 schema:description On a stream S\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\fancyscript{S}}$$\end{document} of two dimensional data items (x,y)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(x,y)$$\end{document} where x\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x$$\end{document} is an item identifier and y\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y$$\end{document} is a numerical attribute, a correlated aggregate query C(σ,AGG,S)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C(\sigma ,AGG,{\fancyscript{S}})$$\end{document} asks to first apply a selection predicate σ\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma $$\end{document} along the y\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y$$\end{document} dimension, followed by an aggregation AGG\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$AGG$$\end{document} along the x\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x$$\end{document} dimension. For selection predicates of the form (y<c)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(y < c)$$\end{document} or (y>c)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(y > c)$$\end{document}, where parameter c\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c$$\end{document} is provided at query time, we present new streaming algorithms and lower bounds for estimating correlated aggregates. Our main result is a general method that reduces the estimation of a correlated aggregate AGG\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$AGG$$\end{document} to the streaming computation of AGG\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$AGG$$\end{document} over an entire stream, for an aggregate that satisfies certain conditions. This results in the first sublinear space algorithms for the correlated estimation of a large family of statistics, including frequency moments. Our experimental validation shows that the memory requirements of these algorithms are significantly smaller than existing linear storage solutions, and that these achieve a fast per-record processing time. We also study the setting when items have weights. In the case when weights can be negative, we give a strong space lower bound which holds even if the algorithm is allowed up to a logarithmic number of passes over the data. We complement this with a small space algorithm which uses a logarithmic number of passes.
12 schema:genre article
13 schema:isAccessibleForFree true
14 schema:isPartOf N5153e63495674c71bd0808d1d337b4b6
15 N6ac67d03539f4104b172ec4a88d3b4d5
16 sg:journal.1047644
17 schema:keywords aggregate queries
18 aggregates
19 aggregation
20 algorithm
21 attributes
22 bounds
23 cases
24 certain conditions
25 computation
26 conditions
27 correlated aggregate query
28 correlated aggregates
29 data
30 data items
31 data streams
32 dimensions
33 entire stream
34 estimation
35 experimental validation
36 family
37 form
38 frequency moments
39 general method
40 identifiers
41 item identifier
42 items
43 large family
44 logarithmic number
45 lower bounds
46 main results
47 memory requirements
48 method
49 moment
50 number
51 numerical attributes
52 parameters
53 passes
54 predicates
55 processing time
56 queries
57 query time
58 requirements
59 results
60 selection predicates
61 setting
62 small-space algorithm
63 solution
64 space
65 space algorithm
66 statistics
67 storage solutions
68 streams
69 strong space
70 sublinear space algorithms
71 time
72 validation
73 weight
74 schema:name A General Method for Estimating Correlated Aggregates Over a Data Stream
75 schema:pagination 235-260
76 schema:productId N128d856761454c0c866390917f68c08b
77 Nda87192057094303a7ed00632333b8b6
78 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007957239
79 https://doi.org/10.1007/s00453-014-9917-1
80 schema:sdDatePublished 2022-12-01T06:32
81 schema:sdLicense https://scigraph.springernature.com/explorer/license/
82 schema:sdPublisher Nbb56598b45ed48aba5d788b409828c7e
83 schema:url https://doi.org/10.1007/s00453-014-9917-1
84 sgo:license sg:explorer/license/
85 sgo:sdDataset articles
86 rdf:type schema:ScholarlyArticle
87 N128d856761454c0c866390917f68c08b schema:name dimensions_id
88 schema:value pub.1007957239
89 rdf:type schema:PropertyValue
90 N5153e63495674c71bd0808d1d337b4b6 schema:issueNumber 2
91 rdf:type schema:PublicationIssue
92 N6ac67d03539f4104b172ec4a88d3b4d5 schema:volumeNumber 73
93 rdf:type schema:PublicationVolume
94 Nacce3c0e2f114dff99fe5870846c5983 rdf:first sg:person.012727410605.86
95 rdf:rest rdf:nil
96 Nbb56598b45ed48aba5d788b409828c7e schema:name Springer Nature - SN SciGraph project
97 rdf:type schema:Organization
98 Nda87192057094303a7ed00632333b8b6 schema:name doi
99 schema:value 10.1007/s00453-014-9917-1
100 rdf:type schema:PropertyValue
101 Nf3f99f8f460144b39a726a6be2fad2f6 rdf:first sg:person.014015211561.37
102 rdf:rest Nacce3c0e2f114dff99fe5870846c5983
103 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
104 schema:name Information and Computing Sciences
105 rdf:type schema:DefinedTerm
106 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
107 schema:name Artificial Intelligence and Image Processing
108 rdf:type schema:DefinedTerm
109 sg:grant.3092935 http://pending.schema.org/fundedItem sg:pub.10.1007/s00453-014-9917-1
110 rdf:type schema:MonetaryGrant
111 sg:grant.3093438 http://pending.schema.org/fundedItem sg:pub.10.1007/s00453-014-9917-1
112 rdf:type schema:MonetaryGrant
113 sg:journal.1047644 schema:issn 0178-4617
114 1432-0541
115 schema:name Algorithmica
116 schema:publisher Springer Nature
117 rdf:type schema:Periodical
118 sg:person.012727410605.86 schema:affiliation grid-institutes:grid.481551.c
119 schema:familyName Woodruff
120 schema:givenName David P.
121 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012727410605.86
122 rdf:type schema:Person
123 sg:person.014015211561.37 schema:affiliation grid-institutes:grid.34421.30
124 schema:familyName Tirthapura
125 schema:givenName Srikanta
126 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014015211561.37
127 rdf:type schema:Person
128 sg:pub.10.1007/978-3-540-70918-3_40 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037637468
129 https://doi.org/10.1007/978-3-540-70918-3_40
130 rdf:type schema:CreativeWork
131 sg:pub.10.1007/978-3-642-12450-1_5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035022927
132 https://doi.org/10.1007/978-3-642-12450-1_5
133 rdf:type schema:CreativeWork
134 sg:pub.10.1007/978-3-642-38768-5_56 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023298832
135 https://doi.org/10.1007/978-3-642-38768-5_56
136 rdf:type schema:CreativeWork
137 sg:pub.10.1007/s00224-004-1156-4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1018926556
138 https://doi.org/10.1007/s00224-004-1156-4
139 rdf:type schema:CreativeWork
140 sg:pub.10.1007/s00446-007-0048-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021056418
141 https://doi.org/10.1007/s00446-007-0048-7
142 rdf:type schema:CreativeWork
143 grid-institutes:grid.34421.30 schema:alternateName Iowa State University, Ames, IA, USA
144 schema:name Iowa State University, Ames, IA, USA
145 rdf:type schema:Organization
146 grid-institutes:grid.481551.c schema:alternateName IBM Almaden, San Jose, CA, USA
147 schema:name IBM Almaden, San Jose, CA, USA
148 rdf:type schema:Organization
 




Preview window. Press ESC to close (or click here)


...