Running resilient MPI applications on a Dynamic Group of Recommended Processes


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2018-12

AUTHORS

Edson Tavares de Camargo, Elias P. Duarte

ABSTRACT

High-performance computing systems run applications that can take several hours to execute and have to deal with the occurrence of a potentially large number of faults. Most of the existing fault-tolerant strategies for these systems assume crash faults that are permanent events and are easily detected. This is not the case in several real systems, in particular in shared clusters, in which even load variation may cause performance problems that are virtually equivalent to faults. In this work, we present a new model to deal with this problem in which processes execute tests among themselves in order to determine whether the processors (or cores) on which they are running are recommended or non-recommended. Processes classified as recommended form a Dynamic Group of Recommended Processes (DGRP) that runs the application. The DGRP is formed only by processes that have not been tested as non-recommended by all DGRP processes. A process not in the DGRP that is continuously tested as recommended can rejoin the DGRP after a round of consensus executed by DGRP processes. We present experimental results obtained from an MPI-based implementation in which the HyperQuickSort parallel sorting algorithm reconfigures itself at runtime to tolerate up to N − 1 faults (in a system with N processes) while sorting up to 1 billion integers.

PAGES

5

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/s13173-018-0069-z

DOI

http://dx.doi.org/10.1186/s13173-018-0069-z

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1101506990


JSON-LD is the canonical representation for SciGraph data.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0803", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Computer Software", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Federal University of Paran\u00e1", 
          "id": "https://www.grid.ac/institutes/grid.20736.30", 
          "name": [
            "Department of Informatics, Federal University of Paran\u00e1 (UFPR), Curitiba, Brazil", 
            "Federal Technology University of Paran\u00e1 (UTFPR), Toledo, Brazil"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Camargo", 
        "givenName": "Edson Tavares de", 
        "id": "sg:person.012345711215.84", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012345711215.84"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Federal University of Paran\u00e1", 
          "id": "https://www.grid.ac/institutes/grid.20736.30", 
          "name": [
            "Department of Informatics, Federal University of Paran\u00e1 (UFPR), Curitiba, Brazil"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Duarte", 
        "givenName": "Elias P.", 
        "id": "sg:person.012555247627.06", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012555247627.06"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1007/978-3-642-24449-0_29", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1003443911", 
          "https://doi.org/10.1007/978-3-642-24449-0_29"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/2807591.2807672", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1008236935"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s11227-013-0884-0", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1008345250", 
          "https://doi.org/10.1007/s11227-013-0884-0"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-24449-0_40", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1008760863", 
          "https://doi.org/10.1007/978-3-642-24449-0_40"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/11846802_44", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010542189", 
          "https://doi.org/10.1007/11846802_44"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/1122971.1122976", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1011235458"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-11294-2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013696519", 
          "https://doi.org/10.1007/978-3-642-11294-2"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/2063384.2063443", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017468612"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/52324.52356", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1019666286"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/2503210.2503271", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023022613"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-540-74466-5_11", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1024736777", 
          "https://doi.org/10.1007/978-3-540-74466-5_11"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s13173-012-0057-7", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1031862907", 
          "https://doi.org/10.1007/s13173-012-0057-7"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/2807591.2807665", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1031871801"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1023/b:clus.0000039491.64560.8a", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1035687554", 
          "https://doi.org/10.1023/b:clus.0000039491.64560.8a"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1002/cpe.2859", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1035762765"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/3-540-45255-9_47", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1036518029", 
          "https://doi.org/10.1007/3-540-45255-9_47"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s10766-009-0115-8", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1037077187", 
          "https://doi.org/10.1007/s10766-009-0115-8"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-642-40861-8_7", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1037920462", 
          "https://doi.org/10.1007/978-3-642-40861-8_7"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/1122480.1122497", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1038019810"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/1922649.1922659", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1042071984"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/0167-8191(96)00024-5", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1045623305"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/165854.165874", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1048533918"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-319-21903-5", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1051107728", 
          "https://doi.org/10.1007/978-3-319-21903-5"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/1048935.1050204", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1051164176"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/2934664", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1052134007"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/12.364542", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061088054"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/12.656078", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061088763"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/pgec.1967.264748", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061435674"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tc.1984.1676419", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061533044"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tc.1984.1676420", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061533045"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tc.1984.1676475", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061533087"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tpds.2008.58", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061753396"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tr.2013.2284743", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061783752"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1177/1094342004046045", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1063977009"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1177/1094342004046052", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1063977012"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1177/1094342006064482", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1063977078"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1177/1094342013488238", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1063977321"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1177/1094342014522573", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1063977347"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/sc.2012.49", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1093446264"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/dsn.2014.101", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1093469657"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/sc.2014.78", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1093607107"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/dsn.2014.62", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1093682570"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icpads.2011.5", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1093889480"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/hpcc.and.euc.2013.107", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094228735"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/ipdps.2012.113", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094560695"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/dsnw.2012.6264677", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094585993"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icitcs.2014.7021746", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094639946"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/clustr.2002.1137727", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094734609"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icpads.2013.37", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1095416894"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/ipdpsw.2014.165", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1095724544"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2018-12", 
    "datePublishedReg": "2018-12-01", 
    "description": "High-performance computing systems run applications that can take several hours to execute and have to deal with the occurrence of a potentially large number of faults. Most of the existing fault-tolerant strategies for these systems assume crash faults that are permanent events are easily detected. This is not the case in several real systems, in particular in shared clusters, in which even the load variation may cause performance problems that are virtually equivalent to faults. In this work, we present a new model to deal with this problem in which processes execute tests among themselves in order to determine whether the processors (or cores) on which they are running are recommended or non-recommended. Processes classified as recommended form a Dynamic Group of Recommended Processes (DGRP) that runs the application. The DGRP is formed only by processes that have not been tested as non-recommended by all DGRP processes. A process not in the DGRP that is continuously tested as recommended can rejoin the DGRP after a round of consensus executed by DGRP processes. Experimental results are presented obtained from a MPI-based implementation in which the HyperQuickSort parallel sorting algorithm reconfigures itself at runtime to tolerate up to N \u2212 1 faults (in a system with N processes) while sorting up to 1 billion integers.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1186/s13173-018-0069-z", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1136200", 
        "issn": [
          "0104-6500", 
          "1678-4804"
        ], 
        "name": "Journal of the Brazilian Computer Society", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "24"
      }
    ], 
    "name": "Running resilient MPI applications on a Dynamic Group of Recommended Processes", 
    "pagination": "5", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "598acb3b0c60616c296f6745b3367b494371a40ea31fb4a44bee28d9fe02fa4b"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/s13173-018-0069-z"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1101506990"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/s13173-018-0069-z", 
      "https://app.dimensions.ai/details/publication/pub.1101506990"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-11T11:32", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000357_0000000357/records_99328_00000001.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://link.springer.com/10.1186%2Fs13173-018-0069-z"
  }
]
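
As a rough illustration of how a record shaped like the one above might be consumed, the following Python sketch reads the title, the author names, and the distinct cited identifiers from a JSON-LD dump. The function name `summarize_record` and the trimmed `SAMPLE` string are assumptions made for this example and are not part of SciGraph; only the key names ("name", "author", "givenName", "familyName", "citation", "id") come from the record itself.

```python
import json

# Trimmed, hypothetical excerpt that mirrors the key names of the record above.
SAMPLE = """
[
  {
    "name": "Running resilient MPI applications on a Dynamic Group of Recommended Processes",
    "author": [
      {"givenName": "Edson Tavares de", "familyName": "Camargo"},
      {"givenName": "Elias P.", "familyName": "Duarte"}
    ],
    "citation": [
      {"id": "sg:pub.10.1007/978-3-642-24449-0_29"},
      {"id": "sg:pub.10.1007/978-3-642-24449-0_29"},
      {"id": "https://doi.org/10.1145/2807591.2807672"}
    ]
  }
]
"""


def summarize_record(jsonld_text):
    """Return the title, author names, and distinct cited ids of one record."""
    records = json.loads(jsonld_text)
    record = records[0]  # the dump is an array holding a single record
    authors = [
        f"{person['givenName']} {person['familyName']}"
        for person in record.get("author", [])
    ]
    # dict.fromkeys drops repeated ids while keeping first-seen order
    cited = list(dict.fromkeys(item["id"] for item in record.get("citation", [])))
    return {"title": record["name"], "authors": authors, "citations": cited}


summary = summarize_record(SAMPLE)
print(summary["authors"])
print(len(summary["citations"]), "distinct cited works")
```

Keying on "id" means that any repeated entries in the "citation" array collapse to a single cited work, which matches how the RDF triple listing counts them.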
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular linked data format that is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s13173-018-0069-z'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s13173-018-0069-z'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s13173-018-0069-z'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s13173-018-0069-z'
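
The four curl commands above differ only in the Accept header, so the same content negotiation could be sketched in Python with the standard library. This is a minimal sketch under stated assumptions: the helper names `build_request` and `fetch` and the short format labels in `FORMATS` are made up for this example, and the endpoint may not respond the way it did when this page was captured.

```python
import urllib.request

# Base URL taken from the curl examples above.
BASE_URL = "https://scigraph.springernature.com/"

# Short labels are assumptions for this sketch; the media types are the ones
# passed to curl above.
FORMATS = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id, fmt):
    """Build a GET request whose Accept header selects the serialization."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    return urllib.request.Request(
        BASE_URL + record_id, headers={"Accept": FORMATS[fmt]}
    )


def fetch(record_id, fmt, timeout=30):
    """Send the request and return the response body as text (needs network)."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request does not touch the network.
request = build_request("pub.10.1186/s13173-018-0069-z", "turtle")
print(request.full_url, request.get_header("Accept"))
```

Calling `fetch("pub.10.1186/s13173-018-0069-z", "json-ld")` would then correspond to the first curl command, assuming the service is reachable.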


 

This table displays all metadata directly associated to this object as RDF triples.

231 triples, 21 predicates, 77 URIs, 19 literals, 7 blank nodes

Subject Predicate Object
1 sg:pub.10.1186/s13173-018-0069-z schema:about anzsrc-for:08
2 anzsrc-for:0803
3 schema:author Nc5668b2fe7ce4c48b3a8b779a4dc1751
4 schema:citation sg:pub.10.1007/11846802_44
5 sg:pub.10.1007/3-540-45255-9_47
6 sg:pub.10.1007/978-3-319-21903-5
7 sg:pub.10.1007/978-3-540-74466-5_11
8 sg:pub.10.1007/978-3-642-11294-2
9 sg:pub.10.1007/978-3-642-24449-0_29
10 sg:pub.10.1007/978-3-642-24449-0_40
11 sg:pub.10.1007/978-3-642-40861-8_7
12 sg:pub.10.1007/s10766-009-0115-8
13 sg:pub.10.1007/s11227-013-0884-0
14 sg:pub.10.1007/s13173-012-0057-7
15 sg:pub.10.1023/b:clus.0000039491.64560.8a
16 https://doi.org/10.1002/cpe.2859
17 https://doi.org/10.1016/0167-8191(96)00024-5
18 https://doi.org/10.1109/12.364542
19 https://doi.org/10.1109/12.656078
20 https://doi.org/10.1109/clustr.2002.1137727
21 https://doi.org/10.1109/dsn.2014.101
22 https://doi.org/10.1109/dsn.2014.62
23 https://doi.org/10.1109/dsnw.2012.6264677
24 https://doi.org/10.1109/hpcc.and.euc.2013.107
25 https://doi.org/10.1109/icitcs.2014.7021746
26 https://doi.org/10.1109/icpads.2011.5
27 https://doi.org/10.1109/icpads.2013.37
28 https://doi.org/10.1109/ipdps.2012.113
29 https://doi.org/10.1109/ipdpsw.2014.165
30 https://doi.org/10.1109/pgec.1967.264748
31 https://doi.org/10.1109/sc.2012.49
32 https://doi.org/10.1109/sc.2014.78
33 https://doi.org/10.1109/tc.1984.1676419
34 https://doi.org/10.1109/tc.1984.1676420
35 https://doi.org/10.1109/tc.1984.1676475
36 https://doi.org/10.1109/tpds.2008.58
37 https://doi.org/10.1109/tr.2013.2284743
38 https://doi.org/10.1145/1048935.1050204
39 https://doi.org/10.1145/1122480.1122497
40 https://doi.org/10.1145/1122971.1122976
41 https://doi.org/10.1145/165854.165874
42 https://doi.org/10.1145/1922649.1922659
43 https://doi.org/10.1145/2063384.2063443
44 https://doi.org/10.1145/2503210.2503271
45 https://doi.org/10.1145/2807591.2807665
46 https://doi.org/10.1145/2807591.2807672
47 https://doi.org/10.1145/2934664
48 https://doi.org/10.1145/52324.52356
49 https://doi.org/10.1177/1094342004046045
50 https://doi.org/10.1177/1094342004046052
51 https://doi.org/10.1177/1094342006064482
52 https://doi.org/10.1177/1094342013488238
53 https://doi.org/10.1177/1094342014522573
54 schema:datePublished 2018-12
55 schema:datePublishedReg 2018-12-01
56 schema:description High-performance computing systems run applications that can take several hours to execute and have to deal with the occurrence of a potentially large number of faults. Most of the existing fault-tolerant strategies for these systems assume crash faults that are permanent events are easily detected. This is not the case in several real systems, in particular in shared clusters, in which even the load variation may cause performance problems that are virtually equivalent to faults. In this work, we present a new model to deal with this problem in which processes execute tests among themselves in order to determine whether the processors (or cores) on which they are running are recommended or non-recommended. Processes classified as recommended form a Dynamic Group of Recommended Processes (DGRP) that runs the application. The DGRP is formed only by processes that have not been tested as non-recommended by all DGRP processes. A process not in the DGRP that is continuously tested as recommended can rejoin the DGRP after a round of consensus executed by DGRP processes. Experimental results are presented obtained from a MPI-based implementation in which the HyperQuickSort parallel sorting algorithm reconfigures itself at runtime to tolerate up to N − 1 faults (in a system with N processes) while sorting up to 1 billion integers.
57 schema:genre research_article
58 schema:inLanguage en
59 schema:isAccessibleForFree true
60 schema:isPartOf N415df0548659452aafe8b5d349f1991b
61 Ndebe86ac8ce44064b4b7249d0f777806
62 sg:journal.1136200
63 schema:name Running resilient MPI applications on a Dynamic Group of Recommended Processes
64 schema:pagination 5
65 schema:productId N2593bd8530d748dd86ac769d5de79cd1
66 N352dffeb8b6f48f983c119ba2cc6b33b
67 N7e6e6d7d6d3b4e45ae1bb3f32a9a20ab
68 schema:sameAs https://app.dimensions.ai/details/publication/pub.1101506990
69 https://doi.org/10.1186/s13173-018-0069-z
70 schema:sdDatePublished 2019-04-11T11:32
71 schema:sdLicense https://scigraph.springernature.com/explorer/license/
72 schema:sdPublisher N34c5e96f35e54e16a6a82874acbb88f9
73 schema:url https://link.springer.com/10.1186%2Fs13173-018-0069-z
74 sgo:license sg:explorer/license/
75 sgo:sdDataset articles
76 rdf:type schema:ScholarlyArticle
77 N2593bd8530d748dd86ac769d5de79cd1 schema:name readcube_id
78 schema:value 598acb3b0c60616c296f6745b3367b494371a40ea31fb4a44bee28d9fe02fa4b
79 rdf:type schema:PropertyValue
80 N34c5e96f35e54e16a6a82874acbb88f9 schema:name Springer Nature - SN SciGraph project
81 rdf:type schema:Organization
82 N352dffeb8b6f48f983c119ba2cc6b33b schema:name doi
83 schema:value 10.1186/s13173-018-0069-z
84 rdf:type schema:PropertyValue
85 N415df0548659452aafe8b5d349f1991b schema:volumeNumber 24
86 rdf:type schema:PublicationVolume
87 N7e6e6d7d6d3b4e45ae1bb3f32a9a20ab schema:name dimensions_id
88 schema:value pub.1101506990
89 rdf:type schema:PropertyValue
90 Naaed340dde4440c3a5e164f84119c842 rdf:first sg:person.012555247627.06
91 rdf:rest rdf:nil
92 Nc5668b2fe7ce4c48b3a8b779a4dc1751 rdf:first sg:person.012345711215.84
93 rdf:rest Naaed340dde4440c3a5e164f84119c842
94 Ndebe86ac8ce44064b4b7249d0f777806 schema:issueNumber 1
95 rdf:type schema:PublicationIssue
96 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
97 schema:name Information and Computing Sciences
98 rdf:type schema:DefinedTerm
99 anzsrc-for:0803 schema:inDefinedTermSet anzsrc-for:
100 schema:name Computer Software
101 rdf:type schema:DefinedTerm
102 sg:journal.1136200 schema:issn 0104-6500
103 1678-4804
104 schema:name Journal of the Brazilian Computer Society
105 rdf:type schema:Periodical
106 sg:person.012345711215.84 schema:affiliation https://www.grid.ac/institutes/grid.20736.30
107 schema:familyName Camargo
108 schema:givenName Edson Tavares de
109 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012345711215.84
110 rdf:type schema:Person
111 sg:person.012555247627.06 schema:affiliation https://www.grid.ac/institutes/grid.20736.30
112 schema:familyName Duarte
113 schema:givenName Elias P.
114 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012555247627.06
115 rdf:type schema:Person
116 sg:pub.10.1007/11846802_44 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010542189
117 https://doi.org/10.1007/11846802_44
118 rdf:type schema:CreativeWork
119 sg:pub.10.1007/3-540-45255-9_47 schema:sameAs https://app.dimensions.ai/details/publication/pub.1036518029
120 https://doi.org/10.1007/3-540-45255-9_47
121 rdf:type schema:CreativeWork
122 sg:pub.10.1007/978-3-319-21903-5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051107728
123 https://doi.org/10.1007/978-3-319-21903-5
124 rdf:type schema:CreativeWork
125 sg:pub.10.1007/978-3-540-74466-5_11 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024736777
126 https://doi.org/10.1007/978-3-540-74466-5_11
127 rdf:type schema:CreativeWork
128 sg:pub.10.1007/978-3-642-11294-2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013696519
129 https://doi.org/10.1007/978-3-642-11294-2
130 rdf:type schema:CreativeWork
131 sg:pub.10.1007/978-3-642-24449-0_29 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003443911
132 https://doi.org/10.1007/978-3-642-24449-0_29
133 rdf:type schema:CreativeWork
134 sg:pub.10.1007/978-3-642-24449-0_40 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008760863
135 https://doi.org/10.1007/978-3-642-24449-0_40
136 rdf:type schema:CreativeWork
137 sg:pub.10.1007/978-3-642-40861-8_7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037920462
138 https://doi.org/10.1007/978-3-642-40861-8_7
139 rdf:type schema:CreativeWork
140 sg:pub.10.1007/s10766-009-0115-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037077187
141 https://doi.org/10.1007/s10766-009-0115-8
142 rdf:type schema:CreativeWork
143 sg:pub.10.1007/s11227-013-0884-0 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008345250
144 https://doi.org/10.1007/s11227-013-0884-0
145 rdf:type schema:CreativeWork
146 sg:pub.10.1007/s13173-012-0057-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031862907
147 https://doi.org/10.1007/s13173-012-0057-7
148 rdf:type schema:CreativeWork
149 sg:pub.10.1023/b:clus.0000039491.64560.8a schema:sameAs https://app.dimensions.ai/details/publication/pub.1035687554
150 https://doi.org/10.1023/b:clus.0000039491.64560.8a
151 rdf:type schema:CreativeWork
152 https://doi.org/10.1002/cpe.2859 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035762765
153 rdf:type schema:CreativeWork
154 https://doi.org/10.1016/0167-8191(96)00024-5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1045623305
155 rdf:type schema:CreativeWork
156 https://doi.org/10.1109/12.364542 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061088054
157 rdf:type schema:CreativeWork
158 https://doi.org/10.1109/12.656078 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061088763
159 rdf:type schema:CreativeWork
160 https://doi.org/10.1109/clustr.2002.1137727 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094734609
161 rdf:type schema:CreativeWork
162 https://doi.org/10.1109/dsn.2014.101 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093469657
163 rdf:type schema:CreativeWork
164 https://doi.org/10.1109/dsn.2014.62 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093682570
165 rdf:type schema:CreativeWork
166 https://doi.org/10.1109/dsnw.2012.6264677 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094585993
167 rdf:type schema:CreativeWork
168 https://doi.org/10.1109/hpcc.and.euc.2013.107 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094228735
169 rdf:type schema:CreativeWork
170 https://doi.org/10.1109/icitcs.2014.7021746 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094639946
171 rdf:type schema:CreativeWork
172 https://doi.org/10.1109/icpads.2011.5 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093889480
173 rdf:type schema:CreativeWork
174 https://doi.org/10.1109/icpads.2013.37 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095416894
175 rdf:type schema:CreativeWork
176 https://doi.org/10.1109/ipdps.2012.113 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094560695
177 rdf:type schema:CreativeWork
178 https://doi.org/10.1109/ipdpsw.2014.165 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095724544
179 rdf:type schema:CreativeWork
180 https://doi.org/10.1109/pgec.1967.264748 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061435674
181 rdf:type schema:CreativeWork
182 https://doi.org/10.1109/sc.2012.49 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093446264
183 rdf:type schema:CreativeWork
184 https://doi.org/10.1109/sc.2014.78 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093607107
185 rdf:type schema:CreativeWork
186 https://doi.org/10.1109/tc.1984.1676419 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061533044
187 rdf:type schema:CreativeWork
188 https://doi.org/10.1109/tc.1984.1676420 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061533045
189 rdf:type schema:CreativeWork
190 https://doi.org/10.1109/tc.1984.1676475 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061533087
191 rdf:type schema:CreativeWork
192 https://doi.org/10.1109/tpds.2008.58 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061753396
193 rdf:type schema:CreativeWork
194 https://doi.org/10.1109/tr.2013.2284743 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061783752
195 rdf:type schema:CreativeWork
196 https://doi.org/10.1145/1048935.1050204 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051164176
197 rdf:type schema:CreativeWork
198 https://doi.org/10.1145/1122480.1122497 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038019810
199 rdf:type schema:CreativeWork
200 https://doi.org/10.1145/1122971.1122976 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011235458
201 rdf:type schema:CreativeWork
202 https://doi.org/10.1145/165854.165874 schema:sameAs https://app.dimensions.ai/details/publication/pub.1048533918
203 rdf:type schema:CreativeWork
204 https://doi.org/10.1145/1922649.1922659 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042071984
205 rdf:type schema:CreativeWork
206 https://doi.org/10.1145/2063384.2063443 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017468612
207 rdf:type schema:CreativeWork
208 https://doi.org/10.1145/2503210.2503271 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023022613
209 rdf:type schema:CreativeWork
210 https://doi.org/10.1145/2807591.2807665 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031871801
211 rdf:type schema:CreativeWork
212 https://doi.org/10.1145/2807591.2807672 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008236935
213 rdf:type schema:CreativeWork
214 https://doi.org/10.1145/2934664 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052134007
215 rdf:type schema:CreativeWork
216 https://doi.org/10.1145/52324.52356 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019666286
217 rdf:type schema:CreativeWork
218 https://doi.org/10.1177/1094342004046045 schema:sameAs https://app.dimensions.ai/details/publication/pub.1063977009
219 rdf:type schema:CreativeWork
220 https://doi.org/10.1177/1094342004046052 schema:sameAs https://app.dimensions.ai/details/publication/pub.1063977012
221 rdf:type schema:CreativeWork
222 https://doi.org/10.1177/1094342006064482 schema:sameAs https://app.dimensions.ai/details/publication/pub.1063977078
223 rdf:type schema:CreativeWork
224 https://doi.org/10.1177/1094342013488238 schema:sameAs https://app.dimensions.ai/details/publication/pub.1063977321
225 rdf:type schema:CreativeWork
226 https://doi.org/10.1177/1094342014522573 schema:sameAs https://app.dimensions.ai/details/publication/pub.1063977347
227 rdf:type schema:CreativeWork
228 https://www.grid.ac/institutes/grid.20736.30 schema:alternateName Federal University of Paraná
229 schema:name Department of Informatics, Federal University of Paraná (UFPR), Curitiba, Brazil
230 Federal Technology University of Paraná (UTFPR), Toledo, Brazil
231 rdf:type schema:Organization
 



