Compressed Dictionaries: Space Measures, Data Sets, and Experiments View Full Text


Ontology type: schema:Chapter      Open Access: True


Chapter Info

DATE

2006

AUTHORS

Ankur Gupta , Wing-Kai Hon , Rahul Shah , Jeffrey Scott Vitter

ABSTRACT

In this paper, we present an experimental study of the space-time tradeoffs for the dictionary problem, where we design a data structure to represent set data, which consist of a subset S of n items out of a universe U = {0, 1,...,u – 1} supporting various queries on S. Our primary goal is to reduce the space required for such a dictionary data structure. Many compression schemes have been developed for dictionaries, which fall generally in the categories of combinatorial encodings and data-aware methods and still support queries efficiently. We show that for many (real-world) datasets, data-aware methods lead to a worthwhile compression over combinatorial methods. Additionally, we design a new data-aware building block structure called BSGAP that presents improvements over other data-aware methods. More... »

PAGES

158-169

References to SciGraph publications

  • 1976-12. Design and implementation of an efficient priority queue in MATHEMATICAL SYSTEMS THEORY
  • 1996. Tables in FOUNDATIONS OF SOFTWARE TECHNOLOGY AND THEORETICAL COMPUTER SCIENCE
  • Book

    TITLE

    Experimental Algorithms

    ISBN

    978-3-540-34597-8
    978-3-540-34598-5

    Author Affiliations

    Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/11764298_14

    DOI

    http://dx.doi.org/10.1007/11764298_14

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1007527412


    Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
    Incoming Citations Browse incoming citations for this publication using opencitations.net

    JSON-LD is the canonical representation for SciGraph data.

    TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Artificial Intelligence and Image Processing", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Purdue University", 
              "id": "https://www.grid.ac/institutes/grid.169077.e", 
              "name": [
                "Department of Computer Sciences, Purdue University, 47907-2066, West Lafayette, IN, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Gupta", 
            "givenName": "Ankur", 
            "id": "sg:person.011551523677.07", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011551523677.07"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Purdue University", 
              "id": "https://www.grid.ac/institutes/grid.169077.e", 
              "name": [
                "Department of Computer Sciences, Purdue University, 47907-2066, West Lafayette, IN, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Hon", 
            "givenName": "Wing-Kai", 
            "id": "sg:person.07456324600.70", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07456324600.70"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Purdue University", 
              "id": "https://www.grid.ac/institutes/grid.169077.e", 
              "name": [
                "Department of Computer Sciences, Purdue University, 47907-2066, West Lafayette, IN, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Shah", 
            "givenName": "Rahul", 
            "id": "sg:person.016536034313.19", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016536034313.19"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Purdue University", 
              "id": "https://www.grid.ac/institutes/grid.169077.e", 
              "name": [
                "Department of Computer Sciences, Purdue University, 47907-2066, West Lafayette, IN, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Vitter", 
            "givenName": "Jeffrey Scott", 
            "id": "sg:person.0613677314.28", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0613677314.28"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.1145/301250.301323", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1007611808"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/335305.335344", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1016321982"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/0022-0000(84)90020-5", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1019739354"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1002/(sici)1097-4571(199310)44:9<508::aid-asi2>3.0.co;2-a", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1024742520"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/j.tcs.2007.07.042", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1034459767"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/3-540-62034-6_35", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1051372882", 
              "https://doi.org/10.1007/3-540-62034-6_35"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/bf01683268", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1053048814", 
              "https://doi.org/10.1007/bf01683268"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/bf01683268", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1053048814", 
              "https://doi.org/10.1007/bf01683268"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/tit.1975.1055349", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1061647584"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1137/s0097539795294165", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1062880068"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/dcc.2002.999952", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1093489537"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/dcc.2006.12", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1095051701"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1002/0471749095", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1103193420"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1002/0471749095", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1103193420"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2006", 
        "datePublishedReg": "2006-01-01", 
        "description": "In this paper, we present an experimental study of the space-time tradeoffs for the dictionary problem, where we design a data structure to represent set data, which consist of a subset S of n items out of a universe U = {0, 1,...,u \u2013 1} supporting various queries on S. Our primary goal is to reduce the space required for such a dictionary data structure. Many compression schemes have been developed for dictionaries, which fall generally in the categories of combinatorial encodings and data-aware methods and still support queries efficiently. We show that for many (real-world) datasets, data-aware methods lead to a worthwhile compression over combinatorial methods. Additionally, we design a new data-aware building block structure called BSGAP that presents improvements over other data-aware methods.", 
        "editor": [
          {
            "familyName": "\u00c0lvarez", 
            "givenName": "Carme", 
            "type": "Person"
          }, 
          {
            "familyName": "Serna", 
            "givenName": "Mar\u00eda", 
            "type": "Person"
          }
        ], 
        "genre": "chapter", 
        "id": "sg:pub.10.1007/11764298_14", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": true, 
        "isPartOf": {
          "isbn": [
            "978-3-540-34597-8", 
            "978-3-540-34598-5"
          ], 
          "name": "Experimental Algorithms", 
          "type": "Book"
        }, 
        "name": "Compressed Dictionaries: Space Measures, Data Sets, and Experiments", 
        "pagination": "158-169", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1007527412"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/11764298_14"
            ]
          }, 
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "23dafd108c54b56c5484e56d08d30c440f07f16bc0507374a8c5069e108e1baf"
            ]
          }
        ], 
        "publisher": {
          "location": "Berlin, Heidelberg", 
          "name": "Springer Berlin Heidelberg", 
          "type": "Organisation"
        }, 
        "sameAs": [
          "https://doi.org/10.1007/11764298_14", 
          "https://app.dimensions.ai/details/publication/pub.1007527412"
        ], 
        "sdDataset": "chapters", 
        "sdDatePublished": "2019-04-16T07:29", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000356_0000000356/records_57868_00000000.jsonl", 
        "type": "Chapter", 
        "url": "https://link.springer.com/10.1007%2F11764298_14"
      }
    ]
     

    Download the RDF metadata as:  json-ld nt turtle xml License info

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/11764298_14'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/11764298_14'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/11764298_14'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/11764298_14'


     

    This table displays all metadata directly associated to this object as RDF triples.

    129 TRIPLES      23 PREDICATES      39 URIs      20 LITERALS      8 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/11764298_14 schema:about anzsrc-for:08
    2 anzsrc-for:0801
    3 schema:author Nde86e79650b442ae80b96fbc50ae28b1
    4 schema:citation sg:pub.10.1007/3-540-62034-6_35
    5 sg:pub.10.1007/bf01683268
    6 https://doi.org/10.1002/(sici)1097-4571(199310)44:9<508::aid-asi2>3.0.co;2-a
    7 https://doi.org/10.1002/0471749095
    8 https://doi.org/10.1016/0022-0000(84)90020-5
    9 https://doi.org/10.1016/j.tcs.2007.07.042
    10 https://doi.org/10.1109/dcc.2002.999952
    11 https://doi.org/10.1109/dcc.2006.12
    12 https://doi.org/10.1109/tit.1975.1055349
    13 https://doi.org/10.1137/s0097539795294165
    14 https://doi.org/10.1145/301250.301323
    15 https://doi.org/10.1145/335305.335344
    16 schema:datePublished 2006
    17 schema:datePublishedReg 2006-01-01
    18 schema:description In this paper, we present an experimental study of the space-time tradeoffs for the dictionary problem, where we design a data structure to represent set data, which consist of a subset S of n items out of a universe U = {0, 1,...,u – 1} supporting various queries on S. Our primary goal is to reduce the space required for such a dictionary data structure. Many compression schemes have been developed for dictionaries, which fall generally in the categories of combinatorial encodings and data-aware methods and still support queries efficiently. We show that for many (real-world) datasets, data-aware methods lead to a worthwhile compression over combinatorial methods. Additionally, we design a new data-aware building block structure called BSGAP that presents improvements over other data-aware methods.
    19 schema:editor N9e127708415d4a78bcb02cd451f5c076
    20 schema:genre chapter
    21 schema:inLanguage en
    22 schema:isAccessibleForFree true
    23 schema:isPartOf N449fa402b67141dea87c069b0d4f0923
    24 schema:name Compressed Dictionaries: Space Measures, Data Sets, and Experiments
    25 schema:pagination 158-169
    26 schema:productId N52293c6238774544a6a94adc10c5f969
    27 Na213daeb96cb4c8ba17ed20c4e2dc6c7
    28 Nba91baf1d1df41958afaececa90980cd
    29 schema:publisher Ne1ff69df1a8c47bdabd6e4ce55be69c3
    30 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007527412
    31 https://doi.org/10.1007/11764298_14
    32 schema:sdDatePublished 2019-04-16T07:29
    33 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    34 schema:sdPublisher N888852b2b2a64036b0a622884fbfc661
    35 schema:url https://link.springer.com/10.1007%2F11764298_14
    36 sgo:license sg:explorer/license/
    37 sgo:sdDataset chapters
    38 rdf:type schema:Chapter
    39 N16781bd541f043ad9568744ae0b852ba rdf:first sg:person.0613677314.28
    40 rdf:rest rdf:nil
    41 N3c2ed30f3f1548d38852bf0a96665552 schema:familyName Àlvarez
    42 schema:givenName Carme
    43 rdf:type schema:Person
    44 N449fa402b67141dea87c069b0d4f0923 schema:isbn 978-3-540-34597-8
    45 978-3-540-34598-5
    46 schema:name Experimental Algorithms
    47 rdf:type schema:Book
    48 N52293c6238774544a6a94adc10c5f969 schema:name dimensions_id
    49 schema:value pub.1007527412
    50 rdf:type schema:PropertyValue
    51 N6034a36101844965980bd9445f32480e rdf:first Nc0254e49cda24add8540350e9029830e
    52 rdf:rest rdf:nil
    53 N69d9cbe208db4e228bb73760c79cf939 rdf:first sg:person.016536034313.19
    54 rdf:rest N16781bd541f043ad9568744ae0b852ba
    55 N888852b2b2a64036b0a622884fbfc661 schema:name Springer Nature - SN SciGraph project
    56 rdf:type schema:Organization
    57 N9e127708415d4a78bcb02cd451f5c076 rdf:first N3c2ed30f3f1548d38852bf0a96665552
    58 rdf:rest N6034a36101844965980bd9445f32480e
    59 Na12f9b7ddecb43a08f7f79c9eea112c3 rdf:first sg:person.07456324600.70
    60 rdf:rest N69d9cbe208db4e228bb73760c79cf939
    61 Na213daeb96cb4c8ba17ed20c4e2dc6c7 schema:name doi
    62 schema:value 10.1007/11764298_14
    63 rdf:type schema:PropertyValue
    64 Nba91baf1d1df41958afaececa90980cd schema:name readcube_id
    65 schema:value 23dafd108c54b56c5484e56d08d30c440f07f16bc0507374a8c5069e108e1baf
    66 rdf:type schema:PropertyValue
    67 Nc0254e49cda24add8540350e9029830e schema:familyName Serna
    68 schema:givenName María
    69 rdf:type schema:Person
    70 Nde86e79650b442ae80b96fbc50ae28b1 rdf:first sg:person.011551523677.07
    71 rdf:rest Na12f9b7ddecb43a08f7f79c9eea112c3
    72 Ne1ff69df1a8c47bdabd6e4ce55be69c3 schema:location Berlin, Heidelberg
    73 schema:name Springer Berlin Heidelberg
    74 rdf:type schema:Organisation
    75 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    76 schema:name Information and Computing Sciences
    77 rdf:type schema:DefinedTerm
    78 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
    79 schema:name Artificial Intelligence and Image Processing
    80 rdf:type schema:DefinedTerm
    81 sg:person.011551523677.07 schema:affiliation https://www.grid.ac/institutes/grid.169077.e
    82 schema:familyName Gupta
    83 schema:givenName Ankur
    84 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011551523677.07
    85 rdf:type schema:Person
    86 sg:person.016536034313.19 schema:affiliation https://www.grid.ac/institutes/grid.169077.e
    87 schema:familyName Shah
    88 schema:givenName Rahul
    89 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016536034313.19
    90 rdf:type schema:Person
    91 sg:person.0613677314.28 schema:affiliation https://www.grid.ac/institutes/grid.169077.e
    92 schema:familyName Vitter
    93 schema:givenName Jeffrey Scott
    94 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0613677314.28
    95 rdf:type schema:Person
    96 sg:person.07456324600.70 schema:affiliation https://www.grid.ac/institutes/grid.169077.e
    97 schema:familyName Hon
    98 schema:givenName Wing-Kai
    99 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07456324600.70
    100 rdf:type schema:Person
    101 sg:pub.10.1007/3-540-62034-6_35 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051372882
    102 https://doi.org/10.1007/3-540-62034-6_35
    103 rdf:type schema:CreativeWork
    104 sg:pub.10.1007/bf01683268 schema:sameAs https://app.dimensions.ai/details/publication/pub.1053048814
    105 https://doi.org/10.1007/bf01683268
    106 rdf:type schema:CreativeWork
    107 https://doi.org/10.1002/(sici)1097-4571(199310)44:9<508::aid-asi2>3.0.co;2-a schema:sameAs https://app.dimensions.ai/details/publication/pub.1024742520
    108 rdf:type schema:CreativeWork
    109 https://doi.org/10.1002/0471749095 schema:sameAs https://app.dimensions.ai/details/publication/pub.1103193420
    110 rdf:type schema:CreativeWork
    111 https://doi.org/10.1016/0022-0000(84)90020-5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019739354
    112 rdf:type schema:CreativeWork
    113 https://doi.org/10.1016/j.tcs.2007.07.042 schema:sameAs https://app.dimensions.ai/details/publication/pub.1034459767
    114 rdf:type schema:CreativeWork
    115 https://doi.org/10.1109/dcc.2002.999952 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093489537
    116 rdf:type schema:CreativeWork
    117 https://doi.org/10.1109/dcc.2006.12 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095051701
    118 rdf:type schema:CreativeWork
    119 https://doi.org/10.1109/tit.1975.1055349 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061647584
    120 rdf:type schema:CreativeWork
    121 https://doi.org/10.1137/s0097539795294165 schema:sameAs https://app.dimensions.ai/details/publication/pub.1062880068
    122 rdf:type schema:CreativeWork
    123 https://doi.org/10.1145/301250.301323 schema:sameAs https://app.dimensions.ai/details/publication/pub.1007611808
    124 rdf:type schema:CreativeWork
    125 https://doi.org/10.1145/335305.335344 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016321982
    126 rdf:type schema:CreativeWork
    127 https://www.grid.ac/institutes/grid.169077.e schema:alternateName Purdue University
    128 schema:name Department of Computer Sciences, Purdue University, 47907-2066, West Lafayette, IN, USA
    129 rdf:type schema:Organization
     




    Preview window. Press ESC to close (or click here)


    ...