Heterogeneous Replicas for Multi-dimensional Data Management


Ontology type: schema:Chapter     


Chapter Info

DATE

2020-09-18

AUTHORS

Jialin Qiao, Yuyuan Kang, Xiangdong Huang, Lei Rui, Tian Jiang, Jianmin Wang, Philip S. Yu

ABSTRACT

Multi-dimensional data is widely used in different scenarios, such as cluster monitoring and user behavior analysis for web services. The data is usually managed by distributed databases with a replication strategy, which enhances the availability, fault-tolerance, and I/O throughput. Normally, these replicas share the same physical layout on the disk, which is designed by database administrators according to the target workload. However, it is critical to derive an optimal layout that benefits as many queries as possible, because a layout that accommodates only some queries can negatively impact the others. To tackle this limitation, we propose heterogeneous replicas for multi-dimensional data that provide a higher query throughput without additional disk occupation and without slowing down the writing speed, while still ensuring high availability and load balance. The proposed replication method allows different replicas to be logically identical while having different physical data layouts on the disk. We verified the efficiency of our method in a NoSQL system, Cassandra, with the TPC-H dataset and with a synthetically generated dataset. The results show that our method outperforms state-of-the-art solutions.

PAGES

20-36

Book

TITLE

Database Systems for Advanced Applications

ISBN

978-3-030-59409-1
978-3-030-59410-7

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-030-59410-7_2

DOI

http://dx.doi.org/10.1007/978-3-030-59410-7_2

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1131073835



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Research Center for Big Data, Tsinghua University, Beijing, China", 
          "id": "http://www.grid.ac/institutes/grid.12527.33", 
          "name": [
            "KLiss, MOE; BNRist; School of Software, Tsinghua University, Beijing, China", 
            "Research Center for Big Data, Tsinghua University, Beijing, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Qiao", 
        "givenName": "Jialin", 
        "id": "sg:person.013540351275.06", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013540351275.06"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Research Center for Big Data, Tsinghua University, Beijing, China", 
          "id": "http://www.grid.ac/institutes/grid.12527.33", 
          "name": [
            "KLiss, MOE; BNRist; School of Software, Tsinghua University, Beijing, China", 
            "Research Center for Big Data, Tsinghua University, Beijing, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Kang", 
        "givenName": "Yuyuan", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Research Center for Big Data, Tsinghua University, Beijing, China", 
          "id": "http://www.grid.ac/institutes/grid.12527.33", 
          "name": [
            "KLiss, MOE; BNRist; School of Software, Tsinghua University, Beijing, China", 
            "Research Center for Big Data, Tsinghua University, Beijing, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Huang", 
        "givenName": "Xiangdong", 
        "id": "sg:person.011010233413.90", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011010233413.90"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Research Center for Big Data, Tsinghua University, Beijing, China", 
          "id": "http://www.grid.ac/institutes/grid.12527.33", 
          "name": [
            "KLiss, MOE; BNRist; School of Software, Tsinghua University, Beijing, China", 
            "Research Center for Big Data, Tsinghua University, Beijing, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Rui", 
        "givenName": "Lei", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Research Center for Big Data, Tsinghua University, Beijing, China", 
          "id": "http://www.grid.ac/institutes/grid.12527.33", 
          "name": [
            "KLiss, MOE; BNRist; School of Software, Tsinghua University, Beijing, China", 
            "Research Center for Big Data, Tsinghua University, Beijing, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Jiang", 
        "givenName": "Tian", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Research Center for Big Data, Tsinghua University, Beijing, China", 
          "id": "http://www.grid.ac/institutes/grid.12527.33", 
          "name": [
            "KLiss, MOE; BNRist; School of Software, Tsinghua University, Beijing, China", 
            "Research Center for Big Data, Tsinghua University, Beijing, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Wang", 
        "givenName": "Jianmin", 
        "id": "sg:person.012303351315.43", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012303351315.43"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "University of Illinois, Champaign, IL, USA", 
          "id": "http://www.grid.ac/institutes/grid.35403.31", 
          "name": [
            "University of Illinois, Champaign, IL, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Yu", 
        "givenName": "Philip S.", 
        "id": "sg:person.011016356115.95", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011016356115.95"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2020-09-18", 
    "datePublishedReg": "2020-09-18", 
    "description": "Multi-dimensional data is widely used in different scenarios, such as cluster monitoring and user behavior analysis for web services. The data is usually managed by distributed databases with a replication strategy, which enhances the availability, fault-tolerance, and I/O throughput. Normally, these replicas share the same physical layout on the disk, which is designed by database administrators according to the target workload. However, it is critical to derive an optimal layout that benefits as many queries as possible, because a layout that accommodates only some queries can negatively impact the others. To tackle this limitation, we propose heterogeneous replicas for multi-dimensional data that provide a higher query throughput without additional disk occupation and without slowing down the writing speed, while still ensuring high availability and load balance. The proposed replication method allows different replicas to be logically identical while having different physical data layouts on the disk. We verified the efficiency of our method in a NoSQL system, Cassandra, with the TPC-H dataset and with a synthetically generated dataset. The results show that our method outperforms state-of-the-art solutions.", 
    "editor": [
      {
        "familyName": "Nah", 
        "givenName": "Yunmook", 
        "type": "Person"
      }, 
      {
        "familyName": "Cui", 
        "givenName": "Bin", 
        "type": "Person"
      }, 
      {
        "familyName": "Lee", 
        "givenName": "Sang-Won", 
        "type": "Person"
      }, 
      {
        "familyName": "Yu", 
        "givenName": "Jeffrey Xu", 
        "type": "Person"
      }, 
      {
        "familyName": "Moon", 
        "givenName": "Yang-Sae", 
        "type": "Person"
      }, 
      {
        "familyName": "Whang", 
        "givenName": "Steven Euijong", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-030-59410-7_2", 
    "inLanguage": "en", 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-030-59409-1", 
        "978-3-030-59410-7"
      ], 
      "name": "Database Systems for Advanced Applications", 
      "type": "Book"
    }, 
    "keywords": [
      "multi-dimensional data", 
      "heterogeneous replicas", 
      "multi-dimensional data management", 
      "high query throughput", 
      "user behavior analysis", 
      "physical data layout", 
      "web services", 
      "query throughput", 
      "NoSQL systems", 
      "database administrators", 
      "data layout", 
      "data management", 
      "load balance", 
      "art solutions", 
      "cluster monitoring", 
      "target workload", 
      "different replicas", 
      "high availability", 
      "replication strategy", 
      "queries", 
      "behavior analysis", 
      "replication method", 
      "different scenarios", 
      "dataset", 
      "physical layout", 
      "throughput", 
      "layout", 
      "replicas", 
      "optimal layout", 
      "Cassandra", 
      "workload", 
      "TPC", 
      "services", 
      "scenarios", 
      "database", 
      "data", 
      "method", 
      "availability", 
      "administrators", 
      "writing speed", 
      "system", 
      "speed", 
      "efficiency", 
      "monitoring", 
      "solution", 
      "limitations", 
      "management", 
      "strategies", 
      "results", 
      "state", 
      "disk", 
      "analysis", 
      "balance", 
      "occupation", 
      "same physical layout", 
      "additional disk occupation", 
      "disk occupation", 
      "different physical data layouts"
    ], 
    "name": "Heterogeneous Replicas for Multi-dimensional Data Management", 
    "pagination": "20-36", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1131073835"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-030-59410-7_2"
        ]
      }
    ], 
    "publisher": {
      "name": "Springer Nature", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-030-59410-7_2", 
      "https://app.dimensions.ai/details/publication/pub.1131073835"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2022-01-01T19:15", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220101/entities/gbq_results/chapter/chapter_263.jsonl", 
    "type": "Chapter", 
    "url": "https://doi.org/10.1007/978-3-030-59410-7_2"
  }
]
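
As a rough illustration of how a record like the one above might be consumed, the following Python sketch parses a trimmed excerpt of the JSON-LD and reads a few fields. The excerpt keeps only three of the seven authors for brevity, and the helper names author_names and page_count are hypothetical, not part of SciGraph.

```python
import json

# Trimmed excerpt of the record above; only the fields used below are kept,
# and the author list is shortened to three entries for brevity.
RECORD_JSON = """
[
  {
    "id": "sg:pub.10.1007/978-3-030-59410-7_2",
    "name": "Heterogeneous Replicas for Multi-dimensional Data Management",
    "pagination": "20-36",
    "author": [
      {"familyName": "Qiao", "givenName": "Jialin",
       "id": "sg:person.013540351275.06", "type": "Person"},
      {"familyName": "Kang", "givenName": "Yuyuan", "type": "Person"},
      {"familyName": "Yu", "givenName": "Philip S.",
       "id": "sg:person.011016356115.95", "type": "Person"}
    ]
  }
]
"""


def author_names(record):
    """Return 'Given Family' strings in the order the record lists them."""
    return [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]


def page_count(record):
    """Derive the number of pages from a 'first-last' pagination string."""
    first, last = record["pagination"].split("-")
    return int(last) - int(first) + 1


# The top-level JSON value is a list holding a single record object.
records = json.loads(RECORD_JSON)
chapter = records[0]

print(chapter["name"])
print(author_names(chapter))
print(page_count(chapter))
```

Note that some authors carry no "id" key, so code that needs person identifiers should read that field with a default rather than indexing it directly.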
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-59410-7_2'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-59410-7_2'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-59410-7_2'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-59410-7_2'
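
The same content negotiation can be sketched in Python with the standard library. This is a minimal sketch that assumes the endpoint still honours the Accept header exactly as in the curl commands above; the names MEDIA_TYPES, build_request and fetch are hypothetical helpers introduced here for illustration.

```python
from urllib.request import Request, urlopen

BASE_URL = "https://scigraph.springernature.com/"

# Short format names mapped to the media types used in the curl examples above.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id, fmt):
    """Build (but do not send) a GET request asking for one RDF serialization."""
    return Request(BASE_URL + record_id, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(record_id, fmt, timeout=10.0):
    """Send the request and return the response body as text (needs network)."""
    with urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request is side-effect free, so it can be inspected offline.
req = build_request("pub.10.1007/978-3-030-59410-7_2", "turtle")
print(req.full_url)
print(req.get_header("Accept"))
```

Calling fetch("pub.10.1007/978-3-030-59410-7_2", "json-ld") would perform the actual download; it is kept separate from build_request so the header logic can be checked without network access.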


 

This table displays all metadata directly associated with this object as RDF triples.

186 TRIPLES      23 PREDICATES      83 URIs      76 LITERALS      7 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-030-59410-7_2 schema:about anzsrc-for:08
2 anzsrc-for:0806
3 schema:author N2dc94c7b697d4871a278e4ab3c89ecb4
4 schema:datePublished 2020-09-18
5 schema:datePublishedReg 2020-09-18
6 schema:description Multi-dimensional data is widely used in different scenarios, such as cluster monitoring and user behavior analysis for web services. The data is usually managed by distributed databases with a replication strategy, which enhances the availability, fault-tolerance, and I/O throughput. Normally, these replicas share the same physical layout on the disk, which is designed by database administrators according to the target workload. However, it is critical to derive an optimal layout that benefits as many queries as possible, because a layout that accommodates only some queries can negatively impact the others. To tackle this limitation, we propose heterogeneous replicas for multi-dimensional data that provide a higher query throughput without additional disk occupation and without slowing down the writing speed, while still ensuring high availability and load balance. The proposed replication method allows different replicas to be logically identical while having different physical data layouts on the disk. We verified the efficiency of our method in a NoSQL system, Cassandra, with the TPC-H dataset and with a synthetically generated dataset. The results show that our method outperforms state-of-the-art solutions.
7 schema:editor Nfc24679e2195432a930440e110846d03
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree false
11 schema:isPartOf Nf5bca01d9a354ba48a45fff400a639f0
12 schema:keywords Cassandra
13 NoSQL systems
14 TPC
15 additional disk occupation
16 administrators
17 analysis
18 art solutions
19 availability
20 balance
21 behavior analysis
22 cluster monitoring
23 data
24 data layout
25 data management
26 database
27 database administrators
28 dataset
29 different physical data layouts
30 different replicas
31 different scenarios
32 disk
33 disk occupation
34 efficiency
35 heterogeneous replicas
36 high availability
37 high query throughput
38 layout
39 limitations
40 load balance
41 management
42 method
43 monitoring
44 multi-dimensional data
45 multi-dimensional data management
46 occupation
47 optimal layout
48 physical data layout
49 physical layout
50 queries
51 query throughput
52 replicas
53 replication method
54 replication strategy
55 results
56 same physical layout
57 scenarios
58 services
59 solution
60 speed
61 state
62 strategies
63 system
64 target workload
65 throughput
66 user behavior analysis
67 web services
68 workload
69 writing speed
70 schema:name Heterogeneous Replicas for Multi-dimensional Data Management
71 schema:pagination 20-36
72 schema:productId N243e0ca2917c476ca6f5497bc4b0a509
73 Nc35eb5c1c794480fad8a2d9f329c9e05
74 schema:publisher Nfd2c21922082433f84b6ff9e35fc18a6
75 schema:sameAs https://app.dimensions.ai/details/publication/pub.1131073835
76 https://doi.org/10.1007/978-3-030-59410-7_2
77 schema:sdDatePublished 2022-01-01T19:15
78 schema:sdLicense https://scigraph.springernature.com/explorer/license/
79 schema:sdPublisher N43557ae7aca94de8a4343deedab8be92
80 schema:url https://doi.org/10.1007/978-3-030-59410-7_2
81 sgo:license sg:explorer/license/
82 sgo:sdDataset chapters
83 rdf:type schema:Chapter
84 N03c76d2b14bb437189114903bf16aef5 rdf:first Ne31d378ff8d44f169b315d7b49645910
85 rdf:rest N1647d738c43243bb9b027a49cf25f7b9
86 N0e45581a3b4644b488fcb0cba5fa46df rdf:first N27d012efc5704dc2bb5f5899bbf5b3d9
87 rdf:rest N7ddcfc84456f4d2ba18800d3ef0d03be
88 N10b0e5b173054d8a9f772a59cdd02c63 schema:familyName Whang
89 schema:givenName Steven Euijong
90 rdf:type schema:Person
91 N1647d738c43243bb9b027a49cf25f7b9 rdf:first N5da3ee1446f2408f9b05a45e671fab72
92 rdf:rest Nb4fbff65c30a481487ae167a92e86ecb
93 N1b88e495b8b341859c4aadc2dba9c7b6 schema:affiliation grid-institutes:grid.12527.33
94 schema:familyName Kang
95 schema:givenName Yuyuan
96 rdf:type schema:Person
97 N1ff8c0e4a1e5420fa3a406cf2034e255 schema:familyName Moon
98 schema:givenName Yang-Sae
99 rdf:type schema:Person
100 N243e0ca2917c476ca6f5497bc4b0a509 schema:name doi
101 schema:value 10.1007/978-3-030-59410-7_2
102 rdf:type schema:PropertyValue
103 N248c63622cc642bc9529d5a6e4ac63e9 rdf:first sg:person.011010233413.90
104 rdf:rest N03c76d2b14bb437189114903bf16aef5
105 N27d012efc5704dc2bb5f5899bbf5b3d9 schema:familyName Lee
106 schema:givenName Sang-Won
107 rdf:type schema:Person
108 N2dc94c7b697d4871a278e4ab3c89ecb4 rdf:first sg:person.013540351275.06
109 rdf:rest Nf479c1115e574168a648e5814b0b1d60
110 N315a442429134cc5abb632f697b5d774 rdf:first N1ff8c0e4a1e5420fa3a406cf2034e255
111 rdf:rest Nf089ad3cefdf4a01aea84825f4ecd2be
112 N43557ae7aca94de8a4343deedab8be92 schema:name Springer Nature - SN SciGraph project
113 rdf:type schema:Organization
114 N5da3ee1446f2408f9b05a45e671fab72 schema:affiliation grid-institutes:grid.12527.33
115 schema:familyName Jiang
116 schema:givenName Tian
117 rdf:type schema:Person
118 N7ddcfc84456f4d2ba18800d3ef0d03be rdf:first Ndf0ee6e1a00645839cf0c19fca56e4b5
119 rdf:rest N315a442429134cc5abb632f697b5d774
120 N867d4ab7df784af7a1946b000cd018d1 rdf:first Ne3f3d24a40a6428d9a732c43e2026972
121 rdf:rest N0e45581a3b4644b488fcb0cba5fa46df
122 Na2ac45d62dca4f748d4a8e444ad29c0f rdf:first sg:person.011016356115.95
123 rdf:rest rdf:nil
124 Nb0358aba292e425e81d4c8fd2de75ae7 schema:familyName Nah
125 schema:givenName Yunmook
126 rdf:type schema:Person
127 Nb4fbff65c30a481487ae167a92e86ecb rdf:first sg:person.012303351315.43
128 rdf:rest Na2ac45d62dca4f748d4a8e444ad29c0f
129 Nc35eb5c1c794480fad8a2d9f329c9e05 schema:name dimensions_id
130 schema:value pub.1131073835
131 rdf:type schema:PropertyValue
132 Ndf0ee6e1a00645839cf0c19fca56e4b5 schema:familyName Yu
133 schema:givenName Jeffrey Xu
134 rdf:type schema:Person
135 Ne31d378ff8d44f169b315d7b49645910 schema:affiliation grid-institutes:grid.12527.33
136 schema:familyName Rui
137 schema:givenName Lei
138 rdf:type schema:Person
139 Ne3f3d24a40a6428d9a732c43e2026972 schema:familyName Cui
140 schema:givenName Bin
141 rdf:type schema:Person
142 Nf089ad3cefdf4a01aea84825f4ecd2be rdf:first N10b0e5b173054d8a9f772a59cdd02c63
143 rdf:rest rdf:nil
144 Nf479c1115e574168a648e5814b0b1d60 rdf:first N1b88e495b8b341859c4aadc2dba9c7b6
145 rdf:rest N248c63622cc642bc9529d5a6e4ac63e9
146 Nf5bca01d9a354ba48a45fff400a639f0 schema:isbn 978-3-030-59409-1
147 978-3-030-59410-7
148 schema:name Database Systems for Advanced Applications
149 rdf:type schema:Book
150 Nfc24679e2195432a930440e110846d03 rdf:first Nb0358aba292e425e81d4c8fd2de75ae7
151 rdf:rest N867d4ab7df784af7a1946b000cd018d1
152 Nfd2c21922082433f84b6ff9e35fc18a6 schema:name Springer Nature
153 rdf:type schema:Organisation
154 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
155 schema:name Information and Computing Sciences
156 rdf:type schema:DefinedTerm
157 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
158 schema:name Information Systems
159 rdf:type schema:DefinedTerm
160 sg:person.011010233413.90 schema:affiliation grid-institutes:grid.12527.33
161 schema:familyName Huang
162 schema:givenName Xiangdong
163 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011010233413.90
164 rdf:type schema:Person
165 sg:person.011016356115.95 schema:affiliation grid-institutes:grid.35403.31
166 schema:familyName Yu
167 schema:givenName Philip S.
168 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011016356115.95
169 rdf:type schema:Person
170 sg:person.012303351315.43 schema:affiliation grid-institutes:grid.12527.33
171 schema:familyName Wang
172 schema:givenName Jianmin
173 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012303351315.43
174 rdf:type schema:Person
175 sg:person.013540351275.06 schema:affiliation grid-institutes:grid.12527.33
176 schema:familyName Qiao
177 schema:givenName Jialin
178 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013540351275.06
179 rdf:type schema:Person
180 grid-institutes:grid.12527.33 schema:alternateName Research Center for Big Data, Tsinghua University, Beijing, China
181 schema:name KLiss, MOE; BNRist; School of Software, Tsinghua University, Beijing, China
182 Research Center for Big Data, Tsinghua University, Beijing, China
183 rdf:type schema:Organization
184 grid-institutes:grid.35403.31 schema:alternateName University of Illinois, Champaign, IL, USA
185 schema:name University of Illinois, Champaign, IL, USA
186 rdf:type schema:Organization
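
In the table above, the ordered author list is stored as an RDF collection: schema:author points to a blank node, and each node carries an rdf:first (the item) and an rdf:rest (the next node), ending at rdf:nil. The following Python sketch walks that chain using the node identifiers copied from the table. The dictionary layout and the function name walk_rdf_list are assumptions made for illustration, not a SciGraph API.

```python
RDF_NIL = "rdf:nil"

# node -> (rdf:first, rdf:rest), copied from the author-list rows of the table.
LIST_NODES = {
    "N2dc94c7b697d4871a278e4ab3c89ecb4": ("sg:person.013540351275.06", "Nf479c1115e574168a648e5814b0b1d60"),
    "Nf479c1115e574168a648e5814b0b1d60": ("N1b88e495b8b341859c4aadc2dba9c7b6", "N248c63622cc642bc9529d5a6e4ac63e9"),
    "N248c63622cc642bc9529d5a6e4ac63e9": ("sg:person.011010233413.90", "N03c76d2b14bb437189114903bf16aef5"),
    "N03c76d2b14bb437189114903bf16aef5": ("Ne31d378ff8d44f169b315d7b49645910", "N1647d738c43243bb9b027a49cf25f7b9"),
    "N1647d738c43243bb9b027a49cf25f7b9": ("N5da3ee1446f2408f9b05a45e671fab72", "Nb4fbff65c30a481487ae167a92e86ecb"),
    "Nb4fbff65c30a481487ae167a92e86ecb": ("sg:person.012303351315.43", "Na2ac45d62dca4f748d4a8e444ad29c0f"),
    "Na2ac45d62dca4f748d4a8e444ad29c0f": ("sg:person.011016356115.95", RDF_NIL),
}

# schema:familyName for each person node, also copied from the table.
FAMILY_NAMES = {
    "sg:person.013540351275.06": "Qiao",
    "N1b88e495b8b341859c4aadc2dba9c7b6": "Kang",
    "sg:person.011010233413.90": "Huang",
    "Ne31d378ff8d44f169b315d7b49645910": "Rui",
    "N5da3ee1446f2408f9b05a45e671fab72": "Jiang",
    "sg:person.012303351315.43": "Wang",
    "sg:person.011016356115.95": "Yu",
}


def walk_rdf_list(head, nodes):
    """Follow rdf:rest links from head to rdf:nil, collecting rdf:first items."""
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:  # guard against malformed, cyclic lists
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


# The head node is the object of the schema:author triple (row 3 of the table).
author_nodes = walk_rdf_list("N2dc94c7b697d4871a278e4ab3c89ecb4", LIST_NODES)
authors = [FAMILY_NAMES[n] for n in author_nodes]
print(authors)
```

The recovered order matches the author order given at the top of the record. The editor list under schema:editor is encoded the same way and could be walked with the same function.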
 



