A Pareto-based scheduler for exploring cost-performance trade-offs for MapReduce workloads


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2017-07-03

AUTHORS

Nikos Zacheilas, Vana Kalogeraki

ABSTRACT

In recent years, we are observing an increased demand for processing large amounts of data. The MapReduce programming model has been utilized by major computing companies and has been integrated by novel cyber-physical systems (CPS) in order to perform large-scale data processing. However, the problem of efficiently scheduling MapReduce workloads in cluster environments, like Amazon’s EC2, can be challenging due to the observed trade-off between the need for performance and the corresponding monetary cost. The problem is exacerbated by the fact that cloud providers tend to charge users based on their I/O operations, dramatically increasing the spending budget. In this paper, we describe our approach for scheduling MapReduce workloads in cluster environments taking into consideration the performance/budget trade-off. Our approach makes the following contributions: (i) we propose a novel Pareto-based scheduler for identifying near-optimal resource allocations for user workloads with respect to performance and monetary cost, and (ii) we develop an automatic configuration of basic tasks’ parameters that allows us to further minimize the user’s spending budget and the jobs’ execution times. Our detailed experimental evaluation using both real and synthetic datasets illustrates that our approach improves the performance of the workloads by as much as 50%, compared to its competitors.
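
To make the performance/cost trade-off in the abstract concrete, the following is a minimal sketch of Pareto filtering over candidate resource allocations. It is not the authors' scheduler: the `Allocation` type, the allocation labels, and every runtime and cost figure are hypothetical values chosen purely for illustration. An allocation is kept only if no other candidate is at least as fast and at least as cheap while being strictly better on one of the two.

```python
from typing import List, NamedTuple


class Allocation(NamedTuple):
    """Hypothetical candidate allocation; all fields are illustrative assumptions."""
    name: str         # made-up label for a cluster configuration
    runtime_s: float  # assumed estimate of job execution time, in seconds
    cost_usd: float   # assumed estimate of monetary cost, in US dollars


def dominates(a: Allocation, b: Allocation) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.runtime_s <= b.runtime_s and a.cost_usd <= b.cost_usd
    strictly_better = a.runtime_s < b.runtime_s or a.cost_usd < b.cost_usd
    return no_worse and strictly_better


def pareto_front(candidates: List[Allocation]) -> List[Allocation]:
    """Keep the candidates that no other candidate dominates (input order preserved)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Made-up numbers, not measurements from the paper.
candidates = [
    Allocation("4 small VMs", 900.0, 1.20),
    Allocation("8 small VMs", 500.0, 1.60),
    Allocation("4 large VMs", 450.0, 2.40),
    Allocation("8 large VMs", 460.0, 3.10),   # slower and pricier than "4 large VMs"
    Allocation("2 small VMs", 1800.0, 1.30),  # slower and pricier than "4 small VMs"
]

front = pareto_front(candidates)
for alloc in front:
    print(f"{alloc.name}: {alloc.runtime_s:.0f} s, ${alloc.cost_usd:.2f}")
```

The quadratic scan is adequate for a handful of candidates; a scheduler that evaluates many allocations would likely need a faster search, which this sketch does not attempt.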

PAGES

29

References to SciGraph publications

  • 2006. Advanced Reservation-Based Scheduling of Task Graphs on Clusters in HIGH PERFORMANCE COMPUTING - HIPC 2006
  • 2016-09-03. Intelligent Urban Data Monitoring for Smart Cities in MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES
  • 2016-01-15. Cloud-assisted QoE guarantee scheme based on adaptive cross-layer perceptron of artificial neural network for mobile Internet in EURASIP JOURNAL ON EMBEDDED SYSTEMS
  • 2016-09-03. INSIGHT: Dynamic Traffic Management Using Heterogeneous Urban Data in MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES
  • 2016-03-29. Profit-oriented task scheduling algorithm in Hadoop cluster in EURASIP JOURNAL ON EMBEDDED SYSTEMS
Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1186/s13639-017-0077-7

    DOI

    http://dx.doi.org/10.1186/s13639-017-0077-7

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1090335791



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/09", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Engineering", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0906", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Electrical and Electronic Engineering", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Department of Informatics, Athens University of Economics and Business, Athens, Greece", 
              "id": "http://www.grid.ac/institutes/grid.16299.35", 
              "name": [
                "Department of Informatics, Athens University of Economics and Business, Athens, Greece"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Zacheilas", 
            "givenName": "Nikos", 
            "id": "sg:person.015502310643.13", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015502310643.13"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Department of Informatics, Athens University of Economics and Business, Athens, Greece", 
              "id": "http://www.grid.ac/institutes/grid.16299.35", 
              "name": [
                "Department of Informatics, Athens University of Economics and Business, Athens, Greece"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Kalogeraki", 
            "givenName": "Vana", 
            "id": "sg:person.011170521233.42", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011170521233.42"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/11945918_12", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1042193684", 
              "https://doi.org/10.1007/11945918_12"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-319-46131-1_23", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1044038752", 
              "https://doi.org/10.1007/978-3-319-46131-1_23"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s13639-016-0026-x", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1049544096", 
              "https://doi.org/10.1186/s13639-016-0026-x"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-319-46131-1_5", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037263776", 
              "https://doi.org/10.1007/978-3-319-46131-1_5"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s13639-016-0022-1", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1044202872", 
              "https://doi.org/10.1186/s13639-016-0022-1"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2017-07-03", 
        "datePublishedReg": "2017-07-03", 
        "description": "In recent years, we are observing an increased demand for processing large amounts of data. The MapReduce programming model has been utilized by major computing companies and has been integrated by novel cyber physical systems (CPS) in order to perform large-scale data processing. However, the problem of efficiently scheduling MapReduce workloads in cluster environments, like Amazon\u2019s EC2, can be challenging due to the observed trade-off between the need for performance and the corresponding monetary cost. The problem is exacerbated by the fact that cloud providers tend to charge users based on their I/O operations, increasing dramatically the spending budget. In this paper, we describe our approach for scheduling MapReduce workloads in cluster environments taking into consideration the performance/budget trade-off. Our approach makes the following contributions: (i) we propose a novel Pareto-based scheduler for identifying near-optimal resource allocations for user workloads with respect to performance and monetary cost, and (ii) we develop an automatic configuration of basic tasks\u2019 parameters that allows us to further minimize the user\u2019s spending budget and the jobs\u2019 execution times. Our detailed experimental evaluation using both real and synthetic datasets illustrate that our approach improves the performance of the workloads as much as 50%, compared to its competitors.", 
        "genre": "article", 
        "id": "sg:pub.10.1186/s13639-017-0077-7", 
        "isAccessibleForFree": true, 
        "isFundedItemOf": [
          {
            "id": "sg:grant.5051116", 
            "type": "MonetaryGrant"
          }
        ], 
        "isPartOf": [
          {
            "id": "sg:journal.1136306", 
            "issn": [
              "1687-3955", 
              "1687-3963"
            ], 
            "name": "EURASIP Journal on Embedded Systems", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "2017"
          }
        ], 
        "keywords": [
          "cyber-physical systems", 
          "MapReduce workloads", 
          "execution time", 
          "large-scale data processing", 
          "cluster environment", 
          "major computing companies", 
          "MapReduce programming model", 
          "corresponding monetary cost", 
          "job execution time", 
          "detailed experimental evaluation", 
          "monetary cost", 
          "optimal resource allocation", 
          "spending budget", 
          "cloud providers", 
          "Amazon EC2", 
          "automatic configuration", 
          "user workload", 
          "O operations", 
          "novel Pareto", 
          "following contributions", 
          "synthetic datasets", 
          "data processing", 
          "experimental evaluation", 
          "programming model", 
          "basic tasks", 
          "resource allocation", 
          "workload", 
          "EC2", 
          "scheduler", 
          "physical systems", 
          "large amount", 
          "recent years", 
          "users", 
          "performance", 
          "environment", 
          "Pareto", 
          "dataset", 
          "task", 
          "cost", 
          "processing", 
          "allocation", 
          "providers", 
          "budget", 
          "companies", 
          "operation", 
          "system", 
          "demand", 
          "need", 
          "model", 
          "data", 
          "competitors", 
          "order", 
          "evaluation", 
          "configuration", 
          "time", 
          "amount", 
          "fact", 
          "consideration", 
          "contribution", 
          "parameters", 
          "respect", 
          "years", 
          "problem", 
          "approach", 
          "paper"
        ], 
        "name": "A Pareto-based scheduler for exploring cost-performance trade-offs for MapReduce workloads", 
        "pagination": "29", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1090335791"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1186/s13639-017-0077-7"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1186/s13639-017-0077-7", 
          "https://app.dimensions.ai/details/publication/pub.1090335791"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-08-04T17:04", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20220804/entities/gbq_results/article/article_724.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1186/s13639-017-0077-7"
      }
    ]
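
As a rough illustration of how the JSON-LD record above can be consumed, the sketch below parses a trimmed copy of it with Python's standard `json` module and pulls out the title, authors, DOI, and publication date. The helper name `summarize` and the shape of its return value are assumptions made for this example; only the field names and values come from the record itself.

```python
import json

# Trimmed copy of the record shown above (only the fields used here).
record_json = """
[
  {
    "author": [
      {"familyName": "Zacheilas", "givenName": "Nikos", "type": "Person"},
      {"familyName": "Kalogeraki", "givenName": "Vana", "type": "Person"}
    ],
    "datePublished": "2017-07-03",
    "name": "A Pareto-based scheduler for exploring cost-performance trade-offs for MapReduce workloads",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1090335791"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/s13639-017-0077-7"]}
    ],
    "type": "ScholarlyArticle"
  }
]
"""


def summarize(record: dict) -> dict:
    """Hypothetical helper: flatten the fields most readers look for first."""
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]
    # Each productId entry is a PropertyValue whose "value" is a one-element list.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    return {
        "title": record["name"],
        "authors": authors,
        "doi": ids.get("doi"),
        "published": record.get("datePublished"),
    }


# The payload is a JSON array holding a single article object.
article = json.loads(record_json)[0]
summary = summarize(article)
print(summary)
```

Because JSON-LD is plain JSON, no RDF library is needed for this kind of lookup; a dedicated JSON-LD processor only becomes necessary when the `@context` has to be expanded.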
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s13639-017-0077-7'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s13639-017-0077-7'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s13639-017-0077-7'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s13639-017-0077-7'
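
The four curl commands above differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The names `ACCEPT_BY_FORMAT`, `build_request`, and `fetch` are assumptions made for this example, and `fetch` only works if the SciGraph endpoint is still reachable, which this sketch does not verify.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/s13639-017-0077-7"

# Media types taken from the curl commands above; the short keys are our own labels.
ACCEPT_BY_FORMAT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request asking for one RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_BY_FORMAT[fmt]})


def fetch(fmt: str, url: str = RECORD_URL, timeout: float = 30.0) -> str:
    """Send the request and return the body as text; needs network access."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Inspect the request without touching the network.
req = build_request("turtle")
print(req.full_url, req.get_header("Accept"))
```

Calling `fetch("json-ld")` would then correspond to the first curl command, assuming the server still honours the header.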


     

    This table displays all metadata directly associated to this object as RDF triples.

    151 TRIPLES      21 PREDICATES      94 URIs      81 LITERALS      6 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1186/s13639-017-0077-7 schema:about anzsrc-for:09
    2 anzsrc-for:0906
    3 schema:author Nca395c937af2456cba637ef79b1d3767
    4 schema:citation sg:pub.10.1007/11945918_12
    5 sg:pub.10.1007/978-3-319-46131-1_23
    6 sg:pub.10.1007/978-3-319-46131-1_5
    7 sg:pub.10.1186/s13639-016-0022-1
    8 sg:pub.10.1186/s13639-016-0026-x
    9 schema:datePublished 2017-07-03
    10 schema:datePublishedReg 2017-07-03
    11 schema:description In recent years, we are observing an increased demand for processing large amounts of data. The MapReduce programming model has been utilized by major computing companies and has been integrated by novel cyber physical systems (CPS) in order to perform large-scale data processing. However, the problem of efficiently scheduling MapReduce workloads in cluster environments, like Amazon’s EC2, can be challenging due to the observed trade-off between the need for performance and the corresponding monetary cost. The problem is exacerbated by the fact that cloud providers tend to charge users based on their I/O operations, increasing dramatically the spending budget. In this paper, we describe our approach for scheduling MapReduce workloads in cluster environments taking into consideration the performance/budget trade-off. Our approach makes the following contributions: (i) we propose a novel Pareto-based scheduler for identifying near-optimal resource allocations for user workloads with respect to performance and monetary cost, and (ii) we develop an automatic configuration of basic tasks’ parameters that allows us to further minimize the user’s spending budget and the jobs’ execution times. Our detailed experimental evaluation using both real and synthetic datasets illustrate that our approach improves the performance of the workloads as much as 50%, compared to its competitors.
    12 schema:genre article
    13 schema:isAccessibleForFree true
    14 schema:isPartOf N74fc8a1a199b4e97b0e4a9fc48a0d244
    15 Nc5571b42c11a48768997661271e4dc6e
    16 sg:journal.1136306
    17 schema:keywords Amazon EC2
    18 EC2
    19 MapReduce programming model
    20 MapReduce workloads
    21 O operations
    22 Pareto
    23 allocation
    24 amount
    25 approach
    26 automatic configuration
    27 basic tasks
    28 budget
    29 cloud providers
    30 cluster environment
    31 companies
    32 competitors
    33 configuration
    34 consideration
    35 contribution
    36 corresponding monetary cost
    37 cost
    38 cyber-physical systems
    39 data
    40 data processing
    41 dataset
    42 demand
    43 detailed experimental evaluation
    44 environment
    45 evaluation
    46 execution time
    47 experimental evaluation
    48 fact
    49 following contributions
    50 job execution time
    51 large amount
    52 large-scale data processing
    53 major computing companies
    54 model
    55 monetary cost
    56 need
    57 novel Pareto
    58 operation
    59 optimal resource allocation
    60 order
    61 paper
    62 parameters
    63 performance
    64 physical systems
    65 problem
    66 processing
    67 programming model
    68 providers
    69 recent years
    70 resource allocation
    71 respect
    72 scheduler
    73 spending budget
    74 synthetic datasets
    75 system
    76 task
    77 time
    78 user workload
    79 users
    80 workload
    81 years
    82 schema:name A Pareto-based scheduler for exploring cost-performance trade-offs for MapReduce workloads
    83 schema:pagination 29
    84 schema:productId N2521755875d44789a2b77e0a982b8482
    85 Nf87d3bef57f7490f96afb703408c1764
    86 schema:sameAs https://app.dimensions.ai/details/publication/pub.1090335791
    87 https://doi.org/10.1186/s13639-017-0077-7
    88 schema:sdDatePublished 2022-08-04T17:04
    89 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    90 schema:sdPublisher Nc6ece35504224dcba756433a128280f5
    91 schema:url https://doi.org/10.1186/s13639-017-0077-7
    92 sgo:license sg:explorer/license/
    93 sgo:sdDataset articles
    94 rdf:type schema:ScholarlyArticle
    95 N2521755875d44789a2b77e0a982b8482 schema:name doi
    96 schema:value 10.1186/s13639-017-0077-7
    97 rdf:type schema:PropertyValue
    98 N74fc8a1a199b4e97b0e4a9fc48a0d244 schema:issueNumber 1
    99 rdf:type schema:PublicationIssue
    100 Na6d3930111d649879966de44abd0d56e rdf:first sg:person.011170521233.42
    101 rdf:rest rdf:nil
    102 Nc5571b42c11a48768997661271e4dc6e schema:volumeNumber 2017
    103 rdf:type schema:PublicationVolume
    104 Nc6ece35504224dcba756433a128280f5 schema:name Springer Nature - SN SciGraph project
    105 rdf:type schema:Organization
    106 Nca395c937af2456cba637ef79b1d3767 rdf:first sg:person.015502310643.13
    107 rdf:rest Na6d3930111d649879966de44abd0d56e
    108 Nf87d3bef57f7490f96afb703408c1764 schema:name dimensions_id
    109 schema:value pub.1090335791
    110 rdf:type schema:PropertyValue
    111 anzsrc-for:09 schema:inDefinedTermSet anzsrc-for:
    112 schema:name Engineering
    113 rdf:type schema:DefinedTerm
    114 anzsrc-for:0906 schema:inDefinedTermSet anzsrc-for:
    115 schema:name Electrical and Electronic Engineering
    116 rdf:type schema:DefinedTerm
    117 sg:grant.5051116 http://pending.schema.org/fundedItem sg:pub.10.1186/s13639-017-0077-7
    118 rdf:type schema:MonetaryGrant
    119 sg:journal.1136306 schema:issn 1687-3955
    120 1687-3963
    121 schema:name EURASIP Journal on Embedded Systems
    122 schema:publisher Springer Nature
    123 rdf:type schema:Periodical
    124 sg:person.011170521233.42 schema:affiliation grid-institutes:grid.16299.35
    125 schema:familyName Kalogeraki
    126 schema:givenName Vana
    127 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011170521233.42
    128 rdf:type schema:Person
    129 sg:person.015502310643.13 schema:affiliation grid-institutes:grid.16299.35
    130 schema:familyName Zacheilas
    131 schema:givenName Nikos
    132 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015502310643.13
    133 rdf:type schema:Person
    134 sg:pub.10.1007/11945918_12 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042193684
    135 https://doi.org/10.1007/11945918_12
    136 rdf:type schema:CreativeWork
    137 sg:pub.10.1007/978-3-319-46131-1_23 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044038752
    138 https://doi.org/10.1007/978-3-319-46131-1_23
    139 rdf:type schema:CreativeWork
    140 sg:pub.10.1007/978-3-319-46131-1_5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037263776
    141 https://doi.org/10.1007/978-3-319-46131-1_5
    142 rdf:type schema:CreativeWork
    143 sg:pub.10.1186/s13639-016-0022-1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044202872
    144 https://doi.org/10.1186/s13639-016-0022-1
    145 rdf:type schema:CreativeWork
    146 sg:pub.10.1186/s13639-016-0026-x schema:sameAs https://app.dimensions.ai/details/publication/pub.1049544096
    147 https://doi.org/10.1186/s13639-016-0026-x
    148 rdf:type schema:CreativeWork
    149 grid-institutes:grid.16299.35 schema:alternateName Department of Informatics, Athens University of Economics and Business, Athens, Greece
    150 schema:name Department of Informatics, Athens University of Economics and Business, Athens, Greece
    151 rdf:type schema:Organization
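
In the table above, `schema:author` points at a blank node rather than directly at a person: the author order is stored as an RDF list chained through `rdf:first` and `rdf:rest` until `rdf:nil`. The sketch below walks that chain using the four list triples copied from the table. The tuple representation and the `walk_rdf_list` helper are assumptions made for illustration, not part of any SciGraph tooling.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

# (subject, predicate, object) tuples copied from the author-list rows of the table.
triples = [
    ("Nca395c937af2456cba637ef79b1d3767", RDF_FIRST, "sg:person.015502310643.13"),
    ("Nca395c937af2456cba637ef79b1d3767", RDF_REST, "Na6d3930111d649879966de44abd0d56e"),
    ("Na6d3930111d649879966de44abd0d56e", RDF_FIRST, "sg:person.011170521233.42"),
    ("Na6d3930111d649879966de44abd0d56e", RDF_REST, RDF_NIL),
]


def walk_rdf_list(head: str, triples: list) -> list:
    """Follow rdf:first / rdf:rest from `head` until rdf:nil, collecting the items."""
    lookup = {(s, p): o for s, p, o in triples}
    items = []
    node = head
    while node != RDF_NIL:
        items.append(lookup[(node, RDF_FIRST)])  # the element stored at this cell
        node = lookup[(node, RDF_REST)]          # the next cell, or rdf:nil at the end
    return items


# The head node is the object of schema:author in the table.
authors = walk_rdf_list("Nca395c937af2456cba637ef79b1d3767", triples)
print(authors)
```

The resulting order matches the author order in the JSON-LD record, with Zacheilas first and Kalogeraki second.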
     



