FTLLS: A fault tolerant, low latency, distributed scheduling approach based on sparrow


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2017-07-31

AUTHORS

Wenzhuo Li, Chuang Lin

ABSTRACT

Big data processing systems are moving towards higher degrees of parallelism and shorter task durations in order to achieve lower response times. Scheduling highly parallel tasks that complete in under a second poses a great challenge to traditional centralized schedulers. To meet this challenge, researchers have turned to distributed scheduling approaches, which avoid the throughput limitation of centralized schedulers; Sparrow is a leading design among them. However, little effort has been devoted to the fault tolerance of Sparrow, and there are problems with Sparrow’s sample-based techniques; together these give rise to incomplete jobs and large scheduling latency. We therefore present Fault Tolerant, Low Latency Sparrow (FTLLS), which extends Sparrow with an assistant machine to handle worker failures and to make better scheduling decisions. Simulations show that FTLLS detects worker failures more quickly than a naive timeout approach and makes better scheduling decisions than native Sparrow. Results from an implementation show that FTLLS guarantees no incomplete jobs in the presence of worker failures and reduces scheduling latencies by over 1.5× compared to native Sparrow. In addition, the simplicity of the idea adopted by FTLLS makes it applicable to a wide variety of distributed scheduling approaches.

PAGES

1129-1140

References to SciGraph publications

  • 2013-05-25. Handling partitioning skew in MapReduce using LEEN in PEER-TO-PEER NETWORKING AND APPLICATIONS

Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/s12083-017-0590-4

    DOI

    http://dx.doi.org/10.1007/s12083-017-0590-4

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1090929214



    JSON-LD is the canonical representation for SciGraph data.

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0803", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Computer Software", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Department of Computer Science and Technology, Tsinghua University, 30 Shuangqing Rd, Haidian Qu, Beijing Shi, China", 
              "id": "http://www.grid.ac/institutes/grid.12527.33", 
              "name": [
                "Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, 30 Shuangqing Rd, Haidian Qu, Beijing Shi, China", 
                "Department of Computer Science and Technology, Tsinghua University, 30 Shuangqing Rd, Haidian Qu, Beijing Shi, China"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Li", 
            "givenName": "Wenzhuo", 
            "id": "sg:person.016621423341.22", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016621423341.22"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Department of Computer Science and Technology, Tsinghua University, 30 Shuangqing Rd, Haidian Qu, Beijing Shi, China", 
              "id": "http://www.grid.ac/institutes/grid.12527.33", 
              "name": [
                "Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, 30 Shuangqing Rd, Haidian Qu, Beijing Shi, China", 
                "Department of Computer Science and Technology, Tsinghua University, 30 Shuangqing Rd, Haidian Qu, Beijing Shi, China"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Lin", 
            "givenName": "Chuang", 
            "id": "sg:person.016506017017.91", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016506017017.91"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1007/s12083-013-0213-7", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1034226206", 
              "https://doi.org/10.1007/s12083-013-0213-7"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2017-07-31", 
        "datePublishedReg": "2017-07-31", 
        "description": "Big data processing systems are developing towards larger degrees of parallelism and shorter task durations in order to achieve lower response time. Scheduling highly parallel tasks that complete in sub-seconds poses a great challenge to traditional centralized schedulers. Taking the challenge, researchers turn to distributed scheduling approaches to avoid the throughput limitation of centralized schedulers, among which Sparrow is a leading design. However, little effort is devoted to the fault tolerance of Sparrow and there are problems with Sparrow\u2019s sample-based techniques, which gives rise to incomplete jobs and large scheduling latency. We then present Fault Tolerant, Low Latency Sparrow (FTLLS). It extends Sparrow with an assistant machine to handle worker failures and to make better scheduling decisions. Through simulations, it is proved that FTLLS can detect worker failures more quickly than a naive timeout approach and make better scheduling decisions than native Sparrow. Through implementation, the results show that FTLLS guarantees no incomplete jobs at the presence of worker failures and reduces scheduling latencies by over 1.5 \u00d7 when compared to native Sparrow. In addition, the simplicity of the idea adopted by FTLLS makes it applicable to a wide variety of distributed scheduling approaches.", 
        "genre": "article", 
        "id": "sg:pub.10.1007/s12083-017-0590-4", 
        "isAccessibleForFree": false, 
        "isFundedItemOf": [
          {
            "id": "sg:grant.8297748", 
            "type": "MonetaryGrant"
          }
        ], 
        "isPartOf": [
          {
            "id": "sg:journal.1136039", 
            "issn": [
              "1936-6442", 
              "1936-6450"
            ], 
            "name": "Peer-to-Peer Networking and Applications", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "5", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "11"
          }
        ], 
        "keywords": [
          "better scheduling decisions", 
          "sampling-based techniques", 
          "centralized scheduler", 
          "scheduling approach", 
          "scheduling decisions", 
          "big data processing systems", 
          "worker failures", 
          "data processing system", 
          "low response time", 
          "shorter task durations", 
          "parallel tasks", 
          "scheduling latency", 
          "low latency", 
          "fault tolerance", 
          "Fault-Tolerant", 
          "processing system", 
          "throughput limitations", 
          "scheduler", 
          "task duration", 
          "response time", 
          "little effort", 
          "latency", 
          "great challenge", 
          "parallelism", 
          "machine", 
          "task", 
          "challenges", 
          "decisions", 
          "jobs", 
          "wide variety", 
          "implementation", 
          "tolerant", 
          "faults", 
          "researchers", 
          "system", 
          "simplicity", 
          "idea", 
          "design", 
          "technique", 
          "simulations", 
          "large degree", 
          "limitations", 
          "order", 
          "efforts", 
          "variety", 
          "time", 
          "results", 
          "failure", 
          "addition", 
          "degree", 
          "tolerance", 
          "rise", 
          "approach", 
          "problem", 
          "presence", 
          "duration", 
          "sparrows"
        ], 
        "name": "FTLLS: A fault tolerant, low latency, distributed scheduling approach based on sparrow", 
        "pagination": "1129-1140", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1090929214"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/s12083-017-0590-4"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1007/s12083-017-0590-4", 
          "https://app.dimensions.ai/details/publication/pub.1090929214"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-08-04T17:05", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20220804/entities/gbq_results/article/article_722.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1007/s12083-017-0590-4"
      }
    ]
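
As a rough illustration of how the record above can be read, the following Python sketch pulls a few citation fields out of the JSON-LD array. The embedded `RECORD_JSON` is a trimmed copy of the record (only the fields that are read are kept), and the helper name `summarize` is hypothetical, not part of any SciGraph tooling.

```python
import json

# A trimmed copy of the record above; only the fields read below are kept.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Li", "givenName": "Wenzhuo"},
      {"familyName": "Lin", "givenName": "Chuang"}
    ],
    "datePublished": "2017-07-31",
    "isPartOf": [
      {"name": "Peer-to-Peer Networking and Applications", "type": "Periodical"},
      {"issueNumber": "5", "type": "PublicationIssue"},
      {"type": "PublicationVolume", "volumeNumber": "11"}
    ],
    "name": "FTLLS: A fault tolerant, low latency, distributed scheduling approach based on sparrow",
    "pagination": "1129-1140",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1090929214"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/s12083-017-0590-4"]}
    ]
  }
]
"""


def summarize(record_json: str) -> dict:
    """Pull a few citation fields out of a SciGraph-style JSON-LD array."""
    record = json.loads(record_json)[0]  # the payload is a one-element array
    # isPartOf mixes journal, issue, and volume entries; index them by "type".
    parts = {p["type"]: p for p in record["isPartOf"]}
    # productId is a list of name/value pairs; flatten it to a dict.
    ids = {p["name"]: p["value"][0] for p in record["productId"]}
    return {
        "authors": [f'{a["givenName"]} {a["familyName"]}' for a in record["author"]],
        "journal": parts["Periodical"]["name"],
        "volume": parts["PublicationVolume"]["volumeNumber"],
        "issue": parts["PublicationIssue"]["issueNumber"],
        "pages": record["pagination"],
        "doi": ids["doi"],
    }


summary = summarize(RECORD_JSON)
print(summary)
```

Indexing `isPartOf` by `type` avoids relying on the order of the journal, issue, and volume entries, which the record does not guarantee.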
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s12083-017-0590-4'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s12083-017-0590-4'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s12083-017-0590-4'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s12083-017-0590-4'
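
The four curl commands above differ only in their `Accept` header. A minimal Python sketch of the same content negotiation, using only the standard library, might look like this. The short format labels in `MEDIA_TYPES` and the helper names `build_request` and `fetch` are assumptions made for illustration; the media types and URL are copied from the commands above, and whether the endpoint still responds is not verified here.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/s12083-017-0590-4"

# Hypothetical short labels -> media types taken from the curl commands above.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request asking for one RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt: str, timeout: float = 10.0) -> str:
    """Send the request and return the body as text; needs network access."""
    with urllib.request.urlopen(build_request(fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Inspect the request without touching the network.
req = build_request("turtle")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Calling `fetch("json-ld")` would perform the equivalent of the first curl command; it is left uncalled here so the sketch runs offline.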


     

    This table displays all metadata directly associated with this object as RDF triples.

    128 TRIPLES      21 PREDICATES      82 URIs      73 LITERALS      6 BLANK NODES

    # Subject Predicate Object
    (A row that omits the subject or predicate repeats the one from the row above.)
    1 sg:pub.10.1007/s12083-017-0590-4 schema:about anzsrc-for:08
    2 anzsrc-for:0803
    3 schema:author N3ec3b72d62a34ace96b914d050d9a308
    4 schema:citation sg:pub.10.1007/s12083-013-0213-7
    5 schema:datePublished 2017-07-31
    6 schema:datePublishedReg 2017-07-31
    7 schema:description Big data processing systems are developing towards larger degrees of parallelism and shorter task durations in order to achieve lower response time. Scheduling highly parallel tasks that complete in sub-seconds poses a great challenge to traditional centralized schedulers. Taking the challenge, researchers turn to distributed scheduling approaches to avoid the throughput limitation of centralized schedulers, among which Sparrow is a leading design. However, little effort is devoted to the fault tolerance of Sparrow and there are problems with Sparrow’s sample-based techniques, which gives rise to incomplete jobs and large scheduling latency. We then present Fault Tolerant, Low Latency Sparrow (FTLLS). It extends Sparrow with an assistant machine to handle worker failures and to make better scheduling decisions. Through simulations, it is proved that FTLLS can detect worker failures more quickly than a naive timeout approach and make better scheduling decisions than native Sparrow. Through implementation, the results show that FTLLS guarantees no incomplete jobs at the presence of worker failures and reduces scheduling latencies by over 1.5 × when compared to native Sparrow. In addition, the simplicity of the idea adopted by FTLLS makes it applicable to a wide variety of distributed scheduling approaches.
    8 schema:genre article
    9 schema:isAccessibleForFree false
    10 schema:isPartOf N20051f7b9c9a415cb9f63090a30d5d48
    11 Ne439ae275a99429fb816269c19d7e70c
    12 sg:journal.1136039
    13 schema:keywords Fault-Tolerant
    14 addition
    15 approach
    16 better scheduling decisions
    17 big data processing systems
    18 centralized scheduler
    19 challenges
    20 data processing system
    21 decisions
    22 degree
    23 design
    24 duration
    25 efforts
    26 failure
    27 fault tolerance
    28 faults
    29 great challenge
    30 idea
    31 implementation
    32 jobs
    33 large degree
    34 latency
    35 limitations
    36 little effort
    37 low latency
    38 low response time
    39 machine
    40 order
    41 parallel tasks
    42 parallelism
    43 presence
    44 problem
    45 processing system
    46 researchers
    47 response time
    48 results
    49 rise
    50 sampling-based techniques
    51 scheduler
    52 scheduling approach
    53 scheduling decisions
    54 scheduling latency
    55 shorter task durations
    56 simplicity
    57 simulations
    58 sparrows
    59 system
    60 task
    61 task duration
    62 technique
    63 throughput limitations
    64 time
    65 tolerance
    66 tolerant
    67 variety
    68 wide variety
    69 worker failures
    70 schema:name FTLLS: A fault tolerant, low latency, distributed scheduling approach based on sparrow
    71 schema:pagination 1129-1140
    72 schema:productId N9bd20c000edc46e085ed1d80db0de4a8
    73 Na06060e33b764e83a61b4440aae2b690
    74 schema:sameAs https://app.dimensions.ai/details/publication/pub.1090929214
    75 https://doi.org/10.1007/s12083-017-0590-4
    76 schema:sdDatePublished 2022-08-04T17:05
    77 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    78 schema:sdPublisher N383536e6e0304dd2a23d9fcce53ad4ae
    79 schema:url https://doi.org/10.1007/s12083-017-0590-4
    80 sgo:license sg:explorer/license/
    81 sgo:sdDataset articles
    82 rdf:type schema:ScholarlyArticle
    83 N20051f7b9c9a415cb9f63090a30d5d48 schema:issueNumber 5
    84 rdf:type schema:PublicationIssue
    85 N383536e6e0304dd2a23d9fcce53ad4ae schema:name Springer Nature - SN SciGraph project
    86 rdf:type schema:Organization
    87 N3ec3b72d62a34ace96b914d050d9a308 rdf:first sg:person.016621423341.22
    88 rdf:rest N845c733402ea4731914051bcdf4d9d12
    89 N845c733402ea4731914051bcdf4d9d12 rdf:first sg:person.016506017017.91
    90 rdf:rest rdf:nil
    91 N9bd20c000edc46e085ed1d80db0de4a8 schema:name dimensions_id
    92 schema:value pub.1090929214
    93 rdf:type schema:PropertyValue
    94 Na06060e33b764e83a61b4440aae2b690 schema:name doi
    95 schema:value 10.1007/s12083-017-0590-4
    96 rdf:type schema:PropertyValue
    97 Ne439ae275a99429fb816269c19d7e70c schema:volumeNumber 11
    98 rdf:type schema:PublicationVolume
    99 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    100 schema:name Information and Computing Sciences
    101 rdf:type schema:DefinedTerm
    102 anzsrc-for:0803 schema:inDefinedTermSet anzsrc-for:
    103 schema:name Computer Software
    104 rdf:type schema:DefinedTerm
    105 sg:grant.8297748 http://pending.schema.org/fundedItem sg:pub.10.1007/s12083-017-0590-4
    106 rdf:type schema:MonetaryGrant
    107 sg:journal.1136039 schema:issn 1936-6442
    108 1936-6450
    109 schema:name Peer-to-Peer Networking and Applications
    110 schema:publisher Springer Nature
    111 rdf:type schema:Periodical
    112 sg:person.016506017017.91 schema:affiliation grid-institutes:grid.12527.33
    113 schema:familyName Lin
    114 schema:givenName Chuang
    115 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016506017017.91
    116 rdf:type schema:Person
    117 sg:person.016621423341.22 schema:affiliation grid-institutes:grid.12527.33
    118 schema:familyName Li
    119 schema:givenName Wenzhuo
    120 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016621423341.22
    121 rdf:type schema:Person
    122 sg:pub.10.1007/s12083-013-0213-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1034226206
    123 https://doi.org/10.1007/s12083-013-0213-7
    124 rdf:type schema:CreativeWork
    125 grid-institutes:grid.12527.33 schema:alternateName Department of Computer Science and Technology, Tsinghua University, 30 Shuangqing Rd, Haidian Qu, Beijing Shi, China
    126 schema:name Department of Computer Science and Technology, Tsinghua University, 30 Shuangqing Rd, Haidian Qu, Beijing Shi, China
    127 Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, 30 Shuangqing Rd, Haidian Qu, Beijing Shi, China
    128 rdf:type schema:Organization
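
In the table, the author order is not stored as a plain array: row 3 points `schema:author` at a blank node, and rows 87-90 chain the two authors together with `rdf:first` and `rdf:rest` until `rdf:nil`. The sketch below, with a hypothetical helper `rdf_list`, walks that chain over triples copied from those rows to recover the ordered author identifiers. It is an illustration of the standard RDF collection pattern, not SciGraph code.

```python
# Triples copied from rows 3 and 87-90 of the table, as (subject, predicate, object).
TRIPLES = [
    ("sg:pub.10.1007/s12083-017-0590-4", "schema:author", "N3ec3b72d62a34ace96b914d050d9a308"),
    ("N3ec3b72d62a34ace96b914d050d9a308", "rdf:first", "sg:person.016621423341.22"),
    ("N3ec3b72d62a34ace96b914d050d9a308", "rdf:rest", "N845c733402ea4731914051bcdf4d9d12"),
    ("N845c733402ea4731914051bcdf4d9d12", "rdf:first", "sg:person.016506017017.91"),
    ("N845c733402ea4731914051bcdf4d9d12", "rdf:rest", "rdf:nil"),
]


def rdf_list(head: str, triples: list) -> list:
    """Follow rdf:first / rdf:rest links from `head` until rdf:nil."""
    # Each list node has exactly one rdf:first and one rdf:rest, so a
    # (subject, predicate) -> object lookup is enough for this walk.
    index = {(s, p): o for s, p, o in triples}
    items = []
    node = head
    while node != "rdf:nil":
        items.append(index[(node, "rdf:first")])
        node = index[(node, "rdf:rest")]
    return items


# Find the head of the author list, then walk it.
head = next(o for s, p, o in TRIPLES if p == "schema:author")
authors = rdf_list(head, TRIPLES)
print(authors)
```

The result lists Wenzhuo Li's identifier before Chuang Lin's, matching the author order in the JSON-LD record.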
     



