Ontology type: schema:ScholarlyArticle
Open Access: True
2017-07-03
AUTHORS: Nikos Zacheilas, Vana Kalogeraki
ABSTRACT: In recent years, we are observing an increased demand for processing large amounts of data. The MapReduce programming model has been utilized by major computing companies and has been integrated by novel cyber physical systems (CPS) in order to perform large-scale data processing. However, the problem of efficiently scheduling MapReduce workloads in cluster environments, like Amazon’s EC2, can be challenging due to the observed trade-off between the need for performance and the corresponding monetary cost. The problem is exacerbated by the fact that cloud providers tend to charge users based on their I/O operations, increasing dramatically the spending budget. In this paper, we describe our approach for scheduling MapReduce workloads in cluster environments taking into consideration the performance/budget trade-off. Our approach makes the following contributions: (i) we propose a novel Pareto-based scheduler for identifying near-optimal resource allocations for user workloads with respect to performance and monetary cost, and (ii) we develop an automatic configuration of basic tasks’ parameters that allows us to further minimize the user’s spending budget and the jobs’ execution times. Our detailed experimental evaluation using both real and synthetic datasets illustrate that our approach improves the performance of the workloads as much as 50%, compared to its competitors.
PAGES: 29
http://scigraph.springernature.com/pub.10.1186/s13639-017-0077-7
DOI: http://dx.doi.org/10.1186/s13639-017-0077-7
DIMENSIONS: https://app.dimensions.ai/details/publication/pub.1090335791
JSON-LD is the canonical representation for SciGraph data.
[
{
"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
"about": [
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/09",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Engineering",
"type": "DefinedTerm"
},
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0906",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Electrical and Electronic Engineering",
"type": "DefinedTerm"
}
],
"author": [
{
"affiliation": {
"alternateName": "Department of Informatics, Athens University of Economics and Business, Athens, Greece",
"id": "http://www.grid.ac/institutes/grid.16299.35",
"name": [
"Department of Informatics, Athens University of Economics and Business, Athens, Greece"
],
"type": "Organization"
},
"familyName": "Zacheilas",
"givenName": "Nikos",
"id": "sg:person.015502310643.13",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015502310643.13"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Department of Informatics, Athens University of Economics and Business, Athens, Greece",
"id": "http://www.grid.ac/institutes/grid.16299.35",
"name": [
"Department of Informatics, Athens University of Economics and Business, Athens, Greece"
],
"type": "Organization"
},
"familyName": "Kalogeraki",
"givenName": "Vana",
"id": "sg:person.011170521233.42",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011170521233.42"
],
"type": "Person"
}
],
"citation": [
{
"id": "sg:pub.10.1007/11945918_12",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1042193684",
"https://doi.org/10.1007/11945918_12"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-319-46131-1_23",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1044038752",
"https://doi.org/10.1007/978-3-319-46131-1_23"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/s13639-016-0026-x",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1049544096",
"https://doi.org/10.1186/s13639-016-0026-x"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-319-46131-1_5",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1037263776",
"https://doi.org/10.1007/978-3-319-46131-1_5"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/s13639-016-0022-1",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1044202872",
"https://doi.org/10.1186/s13639-016-0022-1"
],
"type": "CreativeWork"
}
],
"datePublished": "2017-07-03",
"datePublishedReg": "2017-07-03",
"description": "In recent years, we are observing an increased demand for processing large amounts of data. The MapReduce programming model has been utilized by major computing companies and has been integrated by novel cyber physical systems (CPS) in order to perform large-scale data processing. However, the problem of efficiently scheduling MapReduce workloads in cluster environments, like Amazon\u2019s EC2, can be challenging due to the observed trade-off between the need for performance and the corresponding monetary cost. The problem is exacerbated by the fact that cloud providers tend to charge users based on their I/O operations, increasing dramatically the spending budget. In this paper, we describe our approach for scheduling MapReduce workloads in cluster environments taking into consideration the performance/budget trade-off. Our approach makes the following contributions: (i) we propose a novel Pareto-based scheduler for identifying near-optimal resource allocations for user workloads with respect to performance and monetary cost, and (ii) we develop an automatic configuration of basic tasks\u2019 parameters that allows us to further minimize the user\u2019s spending budget and the jobs\u2019 execution times. Our detailed experimental evaluation using both real and synthetic datasets illustrate that our approach improves the performance of the workloads as much as 50%, compared to its competitors.",
"genre": "article",
"id": "sg:pub.10.1186/s13639-017-0077-7",
"isAccessibleForFree": true,
"isFundedItemOf": [
{
"id": "sg:grant.5051116",
"type": "MonetaryGrant"
}
],
"isPartOf": [
{
"id": "sg:journal.1136306",
"issn": [
"1687-3955",
"1687-3963"
],
"name": "EURASIP Journal on Embedded Systems",
"publisher": "Springer Nature",
"type": "Periodical"
},
{
"issueNumber": "1",
"type": "PublicationIssue"
},
{
"type": "PublicationVolume",
"volumeNumber": "2017"
}
],
"keywords": [
"cyber-physical systems",
"MapReduce workloads",
"execution time",
"large-scale data processing",
"cluster environment",
"major computing companies",
"MapReduce programming model",
"corresponding monetary cost",
"job execution time",
"detailed experimental evaluation",
"monetary cost",
"optimal resource allocation",
"spending budget",
"cloud providers",
"Amazon EC2",
"automatic configuration",
"user workload",
"O operations",
"novel Pareto",
"following contributions",
"synthetic datasets",
"data processing",
"experimental evaluation",
"programming model",
"basic tasks",
"resource allocation",
"workload",
"EC2",
"scheduler",
"physical systems",
"large amount",
"recent years",
"users",
"performance",
"environment",
"Pareto",
"dataset",
"task",
"cost",
"processing",
"allocation",
"providers",
"budget",
"companies",
"operation",
"system",
"demand",
"need",
"model",
"data",
"competitors",
"order",
"evaluation",
"configuration",
"time",
"amount",
"fact",
"consideration",
"contribution",
"parameters",
"respect",
"years",
"problem",
"approach",
"paper"
],
"name": "A Pareto-based scheduler for exploring cost-performance trade-offs for MapReduce workloads",
"pagination": "29",
"productId": [
{
"name": "dimensions_id",
"type": "PropertyValue",
"value": [
"pub.1090335791"
]
},
{
"name": "doi",
"type": "PropertyValue",
"value": [
"10.1186/s13639-017-0077-7"
]
}
],
"sameAs": [
"https://doi.org/10.1186/s13639-017-0077-7",
"https://app.dimensions.ai/details/publication/pub.1090335791"
],
"sdDataset": "articles",
"sdDatePublished": "2022-08-04T17:04",
"sdLicense": "https://scigraph.springernature.com/explorer/license/",
"sdPublisher": {
"name": "Springer Nature - SN SciGraph project",
"type": "Organization"
},
"sdSource": "s3://com-springernature-scigraph/baseset/20220804/entities/gbq_results/article/article_724.jsonl",
"type": "ScholarlyArticle",
"url": "https://doi.org/10.1186/s13639-017-0077-7"
}
]
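As a minimal sketch of how a record like the one above might be read programmatically, the Python snippet below parses a trimmed copy of the JSON-LD and pulls out the title, the author names, and the identifiers listed under `productId`. The function name `summarize` and the trimmed `RECORD_JSON` constant are illustrative assumptions, not part of SciGraph; only fields that appear in the record above are used.

```python
import json

# Trimmed copy of the record above; most fields are omitted for brevity.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Zacheilas", "givenName": "Nikos", "type": "Person"},
      {"familyName": "Kalogeraki", "givenName": "Vana", "type": "Person"}
    ],
    "name": "A Pareto-based scheduler for exploring cost-performance trade-offs for MapReduce workloads",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1090335791"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/s13639-017-0077-7"]}
    ],
    "type": "ScholarlyArticle"
  }
]
"""


def summarize(record_json: str) -> dict:
    """Return the title, author names, and identifiers of a SciGraph-style record."""
    # The payload is a JSON array holding a single article object.
    article = json.loads(record_json)[0]
    # Each productId entry carries a name ("doi", "dimensions_id") and a one-element value list.
    ids = {entry["name"]: entry["value"][0] for entry in article.get("productId", [])}
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in article.get("author", [])]
    return {"title": article["name"], "authors": authors, "ids": ids}


summary = summarize(RECORD_JSON)
print(summary["authors"])
print(summary["ids"]["doi"])
```

Treating the payload as plain JSON is enough for simple lookups like these; resolving the `@context` or merging records from several sources would call for a dedicated JSON-LD processor instead.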
JSON-LD is a popular format for linked data which is fully compatible with JSON.
curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s13639-017-0077-7'
N-Triples is a line-based linked data format ideal for batch operations.
curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s13639-017-0077-7'
Turtle is a human-readable linked data format.
curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s13639-017-0077-7'
RDF/XML is a standard XML format for linked data.
curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s13639-017-0077-7'
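The four curl commands above differ only in the `Accept` header, which is ordinary HTTP content negotiation. The sketch below shows one way this might be expressed with Python's standard `urllib.request`. The helper `build_request` and the short format keys are assumptions made for illustration; the media types and the URL are copied from the commands above. The request is only constructed, not sent, since whether the endpoint still responds has not been verified here.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/s13639-017-0077-7"

# Media types taken from the curl commands above; the short keys are illustrative.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request asking for one RDF serialization."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(MEDIA_TYPES)}")
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


request = build_request("turtle")
print(request.get_header("Accept"))

# Sending the request needs network access and a live endpoint:
# with urllib.request.urlopen(request, timeout=10) as response:
#     body = response.read().decode("utf-8")
```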
This table displays all metadata directly associated with this object as RDF triples. A ditto mark (″) repeats the value from the row above.
151 TRIPLES
21 PREDICATES
94 URIs
81 LITERALS
6 BLANK NODES
# | Subject | Predicate | Object |
---|---|---|---|
1 | sg:pub.10.1186/s13639-017-0077-7 | schema:about | anzsrc-for:09 |
2 | ″ | ″ | anzsrc-for:0906 |
3 | ″ | schema:author | Nca395c937af2456cba637ef79b1d3767 |
4 | ″ | schema:citation | sg:pub.10.1007/11945918_12 |
5 | ″ | ″ | sg:pub.10.1007/978-3-319-46131-1_23 |
6 | ″ | ″ | sg:pub.10.1007/978-3-319-46131-1_5 |
7 | ″ | ″ | sg:pub.10.1186/s13639-016-0022-1 |
8 | ″ | ″ | sg:pub.10.1186/s13639-016-0026-x |
9 | ″ | schema:datePublished | 2017-07-03 |
10 | ″ | schema:datePublishedReg | 2017-07-03 |
11 | ″ | schema:description | In recent years, we are observing an increased demand for processing large amounts of data. The MapReduce programming model has been utilized by major computing companies and has been integrated by novel cyber physical systems (CPS) in order to perform large-scale data processing. However, the problem of efficiently scheduling MapReduce workloads in cluster environments, like Amazon’s EC2, can be challenging due to the observed trade-off between the need for performance and the corresponding monetary cost. The problem is exacerbated by the fact that cloud providers tend to charge users based on their I/O operations, increasing dramatically the spending budget. In this paper, we describe our approach for scheduling MapReduce workloads in cluster environments taking into consideration the performance/budget trade-off. Our approach makes the following contributions: (i) we propose a novel Pareto-based scheduler for identifying near-optimal resource allocations for user workloads with respect to performance and monetary cost, and (ii) we develop an automatic configuration of basic tasks’ parameters that allows us to further minimize the user’s spending budget and the jobs’ execution times. Our detailed experimental evaluation using both real and synthetic datasets illustrate that our approach improves the performance of the workloads as much as 50%, compared to its competitors. |
12 | ″ | schema:genre | article |
13 | ″ | schema:isAccessibleForFree | true |
14 | ″ | schema:isPartOf | N74fc8a1a199b4e97b0e4a9fc48a0d244 |
15 | ″ | ″ | Nc5571b42c11a48768997661271e4dc6e |
16 | ″ | ″ | sg:journal.1136306 |
17 | ″ | schema:keywords | Amazon EC2 |
18 | ″ | ″ | EC2 |
19 | ″ | ″ | MapReduce programming model |
20 | ″ | ″ | MapReduce workloads |
21 | ″ | ″ | O operations |
22 | ″ | ″ | Pareto |
23 | ″ | ″ | allocation |
24 | ″ | ″ | amount |
25 | ″ | ″ | approach |
26 | ″ | ″ | automatic configuration |
27 | ″ | ″ | basic tasks |
28 | ″ | ″ | budget |
29 | ″ | ″ | cloud providers |
30 | ″ | ″ | cluster environment |
31 | ″ | ″ | companies |
32 | ″ | ″ | competitors |
33 | ″ | ″ | configuration |
34 | ″ | ″ | consideration |
35 | ″ | ″ | contribution |
36 | ″ | ″ | corresponding monetary cost |
37 | ″ | ″ | cost |
38 | ″ | ″ | cyber-physical systems |
39 | ″ | ″ | data |
40 | ″ | ″ | data processing |
41 | ″ | ″ | dataset |
42 | ″ | ″ | demand |
43 | ″ | ″ | detailed experimental evaluation |
44 | ″ | ″ | environment |
45 | ″ | ″ | evaluation |
46 | ″ | ″ | execution time |
47 | ″ | ″ | experimental evaluation |
48 | ″ | ″ | fact |
49 | ″ | ″ | following contributions |
50 | ″ | ″ | job execution time |
51 | ″ | ″ | large amount |
52 | ″ | ″ | large-scale data processing |
53 | ″ | ″ | major computing companies |
54 | ″ | ″ | model |
55 | ″ | ″ | monetary cost |
56 | ″ | ″ | need |
57 | ″ | ″ | novel Pareto |
58 | ″ | ″ | operation |
59 | ″ | ″ | optimal resource allocation |
60 | ″ | ″ | order |
61 | ″ | ″ | paper |
62 | ″ | ″ | parameters |
63 | ″ | ″ | performance |
64 | ″ | ″ | physical systems |
65 | ″ | ″ | problem |
66 | ″ | ″ | processing |
67 | ″ | ″ | programming model |
68 | ″ | ″ | providers |
69 | ″ | ″ | recent years |
70 | ″ | ″ | resource allocation |
71 | ″ | ″ | respect |
72 | ″ | ″ | scheduler |
73 | ″ | ″ | spending budget |
74 | ″ | ″ | synthetic datasets |
75 | ″ | ″ | system |
76 | ″ | ″ | task |
77 | ″ | ″ | time |
78 | ″ | ″ | user workload |
79 | ″ | ″ | users |
80 | ″ | ″ | workload |
81 | ″ | ″ | years |
82 | ″ | schema:name | A Pareto-based scheduler for exploring cost-performance trade-offs for MapReduce workloads |
83 | ″ | schema:pagination | 29 |
84 | ″ | schema:productId | N2521755875d44789a2b77e0a982b8482 |
85 | ″ | ″ | Nf87d3bef57f7490f96afb703408c1764 |
86 | ″ | schema:sameAs | https://app.dimensions.ai/details/publication/pub.1090335791 |
87 | ″ | ″ | https://doi.org/10.1186/s13639-017-0077-7 |
88 | ″ | schema:sdDatePublished | 2022-08-04T17:04 |
89 | ″ | schema:sdLicense | https://scigraph.springernature.com/explorer/license/ |
90 | ″ | schema:sdPublisher | Nc6ece35504224dcba756433a128280f5 |
91 | ″ | schema:url | https://doi.org/10.1186/s13639-017-0077-7 |
92 | ″ | sgo:license | sg:explorer/license/ |
93 | ″ | sgo:sdDataset | articles |
94 | ″ | rdf:type | schema:ScholarlyArticle |
95 | N2521755875d44789a2b77e0a982b8482 | schema:name | doi |
96 | ″ | schema:value | 10.1186/s13639-017-0077-7 |
97 | ″ | rdf:type | schema:PropertyValue |
98 | N74fc8a1a199b4e97b0e4a9fc48a0d244 | schema:issueNumber | 1 |
99 | ″ | rdf:type | schema:PublicationIssue |
100 | Na6d3930111d649879966de44abd0d56e | rdf:first | sg:person.011170521233.42 |
101 | ″ | rdf:rest | rdf:nil |
102 | Nc5571b42c11a48768997661271e4dc6e | schema:volumeNumber | 2017 |
103 | ″ | rdf:type | schema:PublicationVolume |
104 | Nc6ece35504224dcba756433a128280f5 | schema:name | Springer Nature - SN SciGraph project |
105 | ″ | rdf:type | schema:Organization |
106 | Nca395c937af2456cba637ef79b1d3767 | rdf:first | sg:person.015502310643.13 |
107 | ″ | rdf:rest | Na6d3930111d649879966de44abd0d56e |
108 | Nf87d3bef57f7490f96afb703408c1764 | schema:name | dimensions_id |
109 | ″ | schema:value | pub.1090335791 |
110 | ″ | rdf:type | schema:PropertyValue |
111 | anzsrc-for:09 | schema:inDefinedTermSet | anzsrc-for: |
112 | ″ | schema:name | Engineering |
113 | ″ | rdf:type | schema:DefinedTerm |
114 | anzsrc-for:0906 | schema:inDefinedTermSet | anzsrc-for: |
115 | ″ | schema:name | Electrical and Electronic Engineering |
116 | ″ | rdf:type | schema:DefinedTerm |
117 | sg:grant.5051116 | http://pending.schema.org/fundedItem | sg:pub.10.1186/s13639-017-0077-7 |
118 | ″ | rdf:type | schema:MonetaryGrant |
119 | sg:journal.1136306 | schema:issn | 1687-3955 |
120 | ″ | ″ | 1687-3963 |
121 | ″ | schema:name | EURASIP Journal on Embedded Systems |
122 | ″ | schema:publisher | Springer Nature |
123 | ″ | rdf:type | schema:Periodical |
124 | sg:person.011170521233.42 | schema:affiliation | grid-institutes:grid.16299.35 |
125 | ″ | schema:familyName | Kalogeraki |
126 | ″ | schema:givenName | Vana |
127 | ″ | schema:sameAs | https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011170521233.42 |
128 | ″ | rdf:type | schema:Person |
129 | sg:person.015502310643.13 | schema:affiliation | grid-institutes:grid.16299.35 |
130 | ″ | schema:familyName | Zacheilas |
131 | ″ | schema:givenName | Nikos |
132 | ″ | schema:sameAs | https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015502310643.13 |
133 | ″ | rdf:type | schema:Person |
134 | sg:pub.10.1007/11945918_12 | schema:sameAs | https://app.dimensions.ai/details/publication/pub.1042193684 |
135 | ″ | ″ | https://doi.org/10.1007/11945918_12 |
136 | ″ | rdf:type | schema:CreativeWork |
137 | sg:pub.10.1007/978-3-319-46131-1_23 | schema:sameAs | https://app.dimensions.ai/details/publication/pub.1044038752 |
138 | ″ | ″ | https://doi.org/10.1007/978-3-319-46131-1_23 |
139 | ″ | rdf:type | schema:CreativeWork |
140 | sg:pub.10.1007/978-3-319-46131-1_5 | schema:sameAs | https://app.dimensions.ai/details/publication/pub.1037263776 |
141 | ″ | ″ | https://doi.org/10.1007/978-3-319-46131-1_5 |
142 | ″ | rdf:type | schema:CreativeWork |
143 | sg:pub.10.1186/s13639-016-0022-1 | schema:sameAs | https://app.dimensions.ai/details/publication/pub.1044202872 |
144 | ″ | ″ | https://doi.org/10.1186/s13639-016-0022-1 |
145 | ″ | rdf:type | schema:CreativeWork |
146 | sg:pub.10.1186/s13639-016-0026-x | schema:sameAs | https://app.dimensions.ai/details/publication/pub.1049544096 |
147 | ″ | ″ | https://doi.org/10.1186/s13639-016-0026-x |
148 | ″ | rdf:type | schema:CreativeWork |
149 | grid-institutes:grid.16299.35 | schema:alternateName | Department of Informatics, Athens University of Economics and Business, Athens, Greece |
150 | ″ | schema:name | Department of Informatics, Athens University of Economics and Business, Athens, Greece |
151 | ″ | rdf:type | schema:Organization |
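The blank nodes with `rdf:first` and `rdf:rest` in the table encode the author order as an RDF list: each node holds one item and points to the next node, ending at `rdf:nil`. As a rough sketch, the Python snippet below walks that list using plain tuples copied from the relevant rows, with ditto marks expanded. The function `walk_rdf_list` and the dictionary index are illustrative assumptions; an RDF library would normally do this work.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"
ARTICLE = "sg:pub.10.1186/s13639-017-0077-7"

# Rows 3, 100-101, 106-107, 125-126 and 130-131 of the table, ditto marks expanded.
TRIPLES = [
    (ARTICLE, "schema:author", "Nca395c937af2456cba637ef79b1d3767"),
    ("Na6d3930111d649879966de44abd0d56e", RDF_FIRST, "sg:person.011170521233.42"),
    ("Na6d3930111d649879966de44abd0d56e", RDF_REST, RDF_NIL),
    ("Nca395c937af2456cba637ef79b1d3767", RDF_FIRST, "sg:person.015502310643.13"),
    ("Nca395c937af2456cba637ef79b1d3767", RDF_REST, "Na6d3930111d649879966de44abd0d56e"),
    ("sg:person.011170521233.42", "schema:familyName", "Kalogeraki"),
    ("sg:person.011170521233.42", "schema:givenName", "Vana"),
    ("sg:person.015502310643.13", "schema:familyName", "Zacheilas"),
    ("sg:person.015502310643.13", "schema:givenName", "Nikos"),
]


def walk_rdf_list(head: str, index: dict) -> list:
    """Follow rdf:first / rdf:rest links from `head` until rdf:nil is reached."""
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        items.append(index[(node, RDF_FIRST)])
        node = index[(node, RDF_REST)]
    return items


# Each (subject, predicate) pair is unique in this subset, so a dict lookup suffices.
index = {(s, p): o for s, p, o in TRIPLES}
author_ids = walk_rdf_list(index[(ARTICLE, "schema:author")], index)
author_names = [
    f'{index[(a, "schema:givenName")]} {index[(a, "schema:familyName")]}'
    for a in author_ids
]
print(author_names)
```

The list structure is what preserves author order: a bare set of `schema:author` triples pointing directly at the two people would carry no ordering.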