Failure prediction using machine learning in a virtualised HPC system and application


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2019-03-21

AUTHORS

Bashir Mohammed, Irfan Awan, Hassan Ugail, Muhammad Younas

ABSTRACT

Failure is an increasingly important issue in high performance computing and cloud systems. As large-scale systems continue to grow in scale and complexity, mitigating the impact of failure and providing accurate predictions with sufficient lead time remains a challenging research problem. Existing fault-tolerance strategies such as regular check-pointing and replication are not adequate because of the emerging complexities of high performance computing systems. This creates the need for an effective and proactive failure management approach aimed at minimizing the effect of failure within the system. With the advent of machine learning techniques, the ability to learn from past information to predict future patterns of behaviour makes it possible to predict potential system failure more accurately. Thus, in this paper, we explore the predictive abilities of machine learning by applying a number of algorithms to improve the accuracy of failure prediction. We have developed a failure prediction model using time series and machine learning, and performed comparison-based tests on the prediction accuracy. The primary algorithms we considered are the support vector machine (SVM), random forest (RF), k-nearest neighbors (KNN), classification and regression trees (CART) and linear discriminant analysis (LDA). Experimental results indicate that the average prediction accuracy of our model using SVM when predicting failure is 90%, making it more accurate and effective than the other algorithms. This finding implies that our method can effectively predict all possible future system and application failures within the system.
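As a rough illustration of the kind of time-series classification the abstract describes, the sketch below applies one of the listed algorithms (k-nearest neighbours) to sliding windows of a metric series. It is not the authors' model: the synthetic load trace, the 0.8 failure threshold, the window width and every function name here are assumptions made purely for illustration, and the accuracy it prints says nothing about the paper's results.

```python
import math
import random


def synthetic_trace(n, seed=0):
    """Generate a made-up load series and failure flags (assumed: failure when load > 0.8)."""
    rng = random.Random(seed)
    series, labels = [], []
    load = 0.3
    for _ in range(n):
        # Random walk with a slight upward drift, clamped to [0, 1].
        load = min(1.0, max(0.0, load + rng.uniform(-0.1, 0.12)))
        series.append(load)
        labels.append(1 if load > 0.8 else 0)
        if load > 0.95:
            load = 0.3  # pretend the node was restarted after a failure
    return series, labels


def make_windows(series, labels, width):
    """Pair each window of `width` samples with the failure flag of the next step."""
    X, y = [], []
    for i in range(len(series) - width):
        X.append(series[i:i + width])
        y.append(labels[i + width])
    return X, y


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training windows closest to `query` (Euclidean distance)."""
    nearest = sorted((math.dist(x, query), lab) for x, lab in zip(train_X, train_y))[:k]
    votes = sum(lab for _, lab in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def evaluate(n=600, width=5, k=3, seed=0):
    """Chronological 70/30 split; returns the fraction of correct next-step predictions."""
    series, labels = synthetic_trace(n, seed)
    X, y = make_windows(series, labels, width)
    split = int(len(X) * 0.7)
    train_X, train_y = X[:split], y[:split]
    test_X, test_y = X[split:], y[split:]
    hits = sum(knn_predict(train_X, train_y, q, k) == t for q, t in zip(test_X, test_y))
    return hits / len(test_y)


if __name__ == "__main__":
    print(f"KNN accuracy on the synthetic trace: {evaluate():.2f}")
```

The split is chronological rather than shuffled so that the classifier is only ever evaluated on windows that come after its training data, which is the usual precaution when the samples form a time series.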

PAGES

1-15

References to SciGraph publications

  • 2017-09. Recent advancements in resource allocation techniques for cloud computing environment: a systematic review in CLUSTER COMPUTING
  • 2018-03-30. An adaptive overload threshold selection process using Markov decision processes of virtual machine in cloud data center in CLUSTER COMPUTING
  • 2013. Machine Learning Strategies for Time Series Forecasting in BUSINESS INTELLIGENCE
  • 2017-11-23. Performance prediction of parallel computing models to analyze cloud-based big data applications in CLUSTER COMPUTING
  • 2018-02-07. Software defect prediction techniques using metrics based on neural network classifier in CLUSTER COMPUTING
  • 2018-03-03. Software reliability modeling using increased failure interval with ANN in CLUSTER COMPUTING
  • 2018-03-10. Cost-effective and fault-resilient reusability prediction model by using adaptive genetic algorithm based neural network for web-of-service applications in CLUSTER COMPUTING
  • 2017-09-27. A survey of deep learning-based network anomaly detection in CLUSTER COMPUTING
  • 2017-10. Automatic classification of data-warehouse-data for information lifecycle management using machine learning techniques in INFORMATION SYSTEMS FRONTIERS
  • 2017-12. Adaptive resource provisioning method using application-aware machine learning based on job history in heterogeneous infrastructures in CLUSTER COMPUTING
  • 2019-01-05. A study on performance measures for auto-scaling CPU-intensive containerized applications in CLUSTER COMPUTING
  • 2018-01-19. Deep neural network based hybrid approach for software defect prediction using software metrics in CLUSTER COMPUTING
  • 2018-02-20. High performance feature selection algorithms using filter method for cloud-based recommendation system in CLUSTER COMPUTING
Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1007/s10586-019-02917-1

    DOI

    http://dx.doi.org/10.1007/s10586-019-02917-1

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1112918736



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Artificial Intelligence and Image Processing", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Information and Computing Sciences", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "University of Bradford", 
              "id": "https://www.grid.ac/institutes/grid.6268.a", 
              "name": [
                "School of Electrical Engineering and Computer Science, University of Bradford, BD7 1DP, Bradford, UK"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Mohammed", 
            "givenName": "Bashir", 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Bradford", 
              "id": "https://www.grid.ac/institutes/grid.6268.a", 
              "name": [
                "School of Electrical Engineering and Computer Science, University of Bradford, BD7 1DP, Bradford, UK"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Awan", 
            "givenName": "Irfan", 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "University of Bradford", 
              "id": "https://www.grid.ac/institutes/grid.6268.a", 
              "name": [
                "School of Electrical Engineering and Computer Science, University of Bradford, BD7 1DP, Bradford, UK"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Ugail", 
            "givenName": "Hassan", 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Oxford Brookes University", 
              "id": "https://www.grid.ac/institutes/grid.7628.b", 
              "name": [
                "Department of Computing & Communication Technologies, Oxford Brookes University, OX33 1HX, Oxford, UK"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Younas", 
            "givenName": "Muhammad", 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "https://doi.org/10.1016/s0360-8352(02)00036-0", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1003150782"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10586-016-0684-4", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1014017089", 
              "https://doi.org/10.1007/s10586-016-0684-4"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.2197/ipsjjip.24.371", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1017905621"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1016/j.parco.2015.07.001", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1027314579"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/1807128.1807161", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037053442"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10796-016-9680-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1040681212", 
              "https://doi.org/10.1007/s10796-016-9680-8"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-642-36318-4_3", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1041145295", 
              "https://doi.org/10.1007/978-3-642-36318-4_3"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/72.788640", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1061219233"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/tdsc.2009.4", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1061585235"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1198/106186002317375712", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1064199308"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.2174/1874110x01509010044", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1069226963"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.5120/20435-2768", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1072602064"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1002/spe.2491", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1084516934"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10586-017-1148-1", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1091477548", 
              "https://doi.org/10.1007/s10586-017-1148-1"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10586-017-1117-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1091976884", 
              "https://doi.org/10.1007/s10586-017-1117-8"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10586-017-1385-3", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1092947337", 
              "https://doi.org/10.1007/s10586-017-1385-3"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/services.2014.20", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1094140515"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/ccgrid.2010.112", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1094155836"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/hpcsim.2009.5192685", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1094191003"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/ccdc.2017.7978640", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1094231525"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/hase.2014.24", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1094319838"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/rams.2016.7448033", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1094507983"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/icdcs.2012.56", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1095233613"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/sc.2012.11", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1095358387"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/bigdataservice.2016.10", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1095394676"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/icmcs.2014.6911300", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1095529505"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/dsn.2004.1311948", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1095724577"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1109/ipdpsw.2016.124", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1095763654"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10586-018-1696-z", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1100470435", 
              "https://doi.org/10.1007/s10586-018-1696-z"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10586-018-1730-1", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1100857480", 
              "https://doi.org/10.1007/s10586-018-1730-1"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10586-018-1901-0", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1101127458", 
              "https://doi.org/10.1007/s10586-018-1901-0"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10586-018-1942-4", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1101321932", 
              "https://doi.org/10.1007/s10586-018-1942-4"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10586-018-2359-9", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1101406227", 
              "https://doi.org/10.1007/s10586-018-2359-9"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10586-018-2408-4", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1101845335", 
              "https://doi.org/10.1007/s10586-018-2408-4"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "https://doi.org/10.1145/3217871.3217876", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1104470384"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10586-018-02890-1", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1111158521", 
              "https://doi.org/10.1007/s10586-018-02890-1"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2019-03-21", 
        "datePublishedReg": "2019-03-21", 
        "description": "Failure is an increasingly important issue in high performance computing and cloud systems. As large-scale systems continue to grow in scale and complexity, mitigating the impact of failure and providing accurate predictions with sufficient lead time remains a challenging research problem. Traditional existing fault-tolerance strategies such as regular check-pointing and replication are not adequate because of the emerging complexities of high performance computing systems. This necessitates the importance of having an effective as well as proactive failure management approach in place aimed at minimizing the effect of failure within the system. With the advent of machine learning techniques, the ability to learn from past information to predict future pattern of behaviours makes it possible to predict potential system failure more accurately. Thus, in this paper, we explore the predictive abilities of machine learning by applying a number of algorithms to improve the accuracy of failure prediction. We have developed a failure prediction model using time series and machine learning, and performed comparison based tests on the prediction accuracy. The primary algorithms we considered are the support vector machine (SVM), random forest (RF), k-nearest neighbors (KNN), classification and regression trees (CART) and linear discriminant analysis (LDA). Experimental results indicates that the average prediction accuracy of our model using SVM when predicting failure is 90% accurate and effective compared to other algorithms. This finding implies that our method can effectively predict all possible future system and application failures within the system.", 
        "genre": "research_article", 
        "id": "sg:pub.10.1007/s10586-019-02917-1", 
        "inLanguage": [
          "en"
        ], 
        "isAccessibleForFree": false, 
        "isPartOf": [
          {
            "id": "sg:journal.1046649", 
            "issn": [
              "1386-7857", 
              "1573-7543"
            ], 
            "name": "Cluster Computing", 
            "type": "Periodical"
          }
        ], 
        "name": "Failure prediction using machine learning in a virtualised HPC system and application", 
        "pagination": "1-15", 
        "productId": [
          {
            "name": "readcube_id", 
            "type": "PropertyValue", 
            "value": [
              "f36f7d76cdfc8996715180641f3d592b95a6951c74e0d4e8cb0df9da724daeaf"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1007/s10586-019-02917-1"
            ]
          }, 
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1112918736"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1007/s10586-019-02917-1", 
          "https://app.dimensions.ai/details/publication/pub.1112918736"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2019-04-11T12:54", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000364_0000000364/records_72859_00000001.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://link.springer.com/10.1007%2Fs10586-019-02917-1"
      }
    ]
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s10586-019-02917-1'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s10586-019-02917-1'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s10586-019-02917-1'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s10586-019-02917-1'
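    The same content negotiation can be sketched in Python using only the standard library. The helper names (build_request, fetch, summarise) and the short format keys are assumptions made for this example, not part of any SciGraph client, and the endpoint may no longer respond as it did when this record was published.

```python
import json
import urllib.request

BASE_URL = "https://scigraph.springernature.com/"

# Accept headers taken from the curl commands above; the dictionary keys are
# illustrative short names chosen for this sketch.
MIME_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id, fmt="json-ld"):
    """Build a GET request whose Accept header selects the RDF serialisation."""
    return urllib.request.Request(BASE_URL + record_id, headers={"Accept": MIME_TYPES[fmt]})


def fetch(record_id):
    """Download and decode the JSON-LD form of a record (performs a network call)."""
    with urllib.request.urlopen(build_request(record_id), timeout=30) as resp:
        return json.load(resp)


def summarise(records):
    """Extract a few fields from a JSON-LD record shaped like the one shown on this page."""
    rec = records[0]
    return {
        "title": rec.get("name"),
        "date": rec.get("datePublished"),
        "authors": [f'{a["givenName"]} {a["familyName"]}' for a in rec.get("author", [])],
        # Count distinct citation ids, since the same id can appear more than once.
        "citations": len({c["id"] for c in rec.get("citation", [])}),
    }


if __name__ == "__main__":
    print(summarise(fetch("pub.10.1007/s10586-019-02917-1")))
```

    Keeping request construction separate from the network call means the header logic can be checked offline, and summarise can be run on a saved copy of the JSON-LD if the service is unavailable.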


     

    This table displays all metadata directly associated to this object as RDF triples.

    196 TRIPLES      21 PREDICATES      60 URIs      16 LITERALS      5 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1007/s10586-019-02917-1 schema:about anzsrc-for:08
    2 anzsrc-for:0801
    3 schema:author Ne75bf3ab259b485f9484f01e51f48d2b
    4 schema:citation sg:pub.10.1007/978-3-642-36318-4_3
    5 sg:pub.10.1007/s10586-016-0684-4
    6 sg:pub.10.1007/s10586-017-1117-8
    7 sg:pub.10.1007/s10586-017-1148-1
    8 sg:pub.10.1007/s10586-017-1385-3
    9 sg:pub.10.1007/s10586-018-02890-1
    10 sg:pub.10.1007/s10586-018-1696-z
    11 sg:pub.10.1007/s10586-018-1730-1
    12 sg:pub.10.1007/s10586-018-1901-0
    13 sg:pub.10.1007/s10586-018-1942-4
    14 sg:pub.10.1007/s10586-018-2359-9
    15 sg:pub.10.1007/s10586-018-2408-4
    16 sg:pub.10.1007/s10796-016-9680-8
    17 https://doi.org/10.1002/spe.2491
    18 https://doi.org/10.1016/j.parco.2015.07.001
    19 https://doi.org/10.1016/s0360-8352(02)00036-0
    20 https://doi.org/10.1109/72.788640
    21 https://doi.org/10.1109/bigdataservice.2016.10
    22 https://doi.org/10.1109/ccdc.2017.7978640
    23 https://doi.org/10.1109/ccgrid.2010.112
    24 https://doi.org/10.1109/dsn.2004.1311948
    25 https://doi.org/10.1109/hase.2014.24
    26 https://doi.org/10.1109/hpcsim.2009.5192685
    27 https://doi.org/10.1109/icdcs.2012.56
    28 https://doi.org/10.1109/icmcs.2014.6911300
    29 https://doi.org/10.1109/ipdpsw.2016.124
    30 https://doi.org/10.1109/rams.2016.7448033
    31 https://doi.org/10.1109/sc.2012.11
    32 https://doi.org/10.1109/services.2014.20
    33 https://doi.org/10.1109/tdsc.2009.4
    34 https://doi.org/10.1145/1807128.1807161
    35 https://doi.org/10.1145/3217871.3217876
    36 https://doi.org/10.1198/106186002317375712
    37 https://doi.org/10.2174/1874110x01509010044
    38 https://doi.org/10.2197/ipsjjip.24.371
    39 https://doi.org/10.5120/20435-2768
    40 schema:datePublished 2019-03-21
    41 schema:datePublishedReg 2019-03-21
    42 schema:description Failure is an increasingly important issue in high performance computing and cloud systems. As large-scale systems continue to grow in scale and complexity, mitigating the impact of failure and providing accurate predictions with sufficient lead time remains a challenging research problem. Traditional existing fault-tolerance strategies such as regular check-pointing and replication are not adequate because of the emerging complexities of high performance computing systems. This necessitates the importance of having an effective as well as proactive failure management approach in place aimed at minimizing the effect of failure within the system. With the advent of machine learning techniques, the ability to learn from past information to predict future pattern of behaviours makes it possible to predict potential system failure more accurately. Thus, in this paper, we explore the predictive abilities of machine learning by applying a number of algorithms to improve the accuracy of failure prediction. We have developed a failure prediction model using time series and machine learning, and performed comparison based tests on the prediction accuracy. The primary algorithms we considered are the support vector machine (SVM), random forest (RF), k-nearest neighbors (KNN), classification and regression trees (CART) and linear discriminant analysis (LDA). Experimental results indicates that the average prediction accuracy of our model using SVM when predicting failure is 90% accurate and effective compared to other algorithms. This finding implies that our method can effectively predict all possible future system and application failures within the system.
    43 schema:genre research_article
    44 schema:inLanguage en
    45 schema:isAccessibleForFree false
    46 schema:isPartOf sg:journal.1046649
    47 schema:name Failure prediction using machine learning in a virtualised HPC system and application
    48 schema:pagination 1-15
    49 schema:productId N2f6df08679714c369ffa1531b98109e7
    50 N5a11d32bbf80487f9a7d7d0f602c3cb4
    51 N91b20053f8fb4efaaf2264c80bb4363b
    52 schema:sameAs https://app.dimensions.ai/details/publication/pub.1112918736
    53 https://doi.org/10.1007/s10586-019-02917-1
    54 schema:sdDatePublished 2019-04-11T12:54
    55 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    56 schema:sdPublisher N99eafdc4254443c8a33943f4b3c18f30
    57 schema:url https://link.springer.com/10.1007%2Fs10586-019-02917-1
    58 sgo:license sg:explorer/license/
    59 sgo:sdDataset articles
    60 rdf:type schema:ScholarlyArticle
    61 N0eb5a1cfc3d5407ba426af6eb2dd3d2c schema:affiliation https://www.grid.ac/institutes/grid.6268.a
    62 schema:familyName Mohammed
    63 schema:givenName Bashir
    64 rdf:type schema:Person
    65 N2f6df08679714c369ffa1531b98109e7 schema:name dimensions_id
    66 schema:value pub.1112918736
    67 rdf:type schema:PropertyValue
    68 N5222af36eccc4ee5b03c5980e5630127 rdf:first N8fc0138851984196b22854d82671b606
    69 rdf:rest N69034f8c49154e18a5fa947795312949
    70 N5a11d32bbf80487f9a7d7d0f602c3cb4 schema:name readcube_id
    71 schema:value f36f7d76cdfc8996715180641f3d592b95a6951c74e0d4e8cb0df9da724daeaf
    72 rdf:type schema:PropertyValue
    73 N61c20c690fd34fab965753823aa44508 schema:affiliation https://www.grid.ac/institutes/grid.7628.b
    74 schema:familyName Younas
    75 schema:givenName Muhammad
    76 rdf:type schema:Person
    77 N69034f8c49154e18a5fa947795312949 rdf:first N61c20c690fd34fab965753823aa44508
    78 rdf:rest rdf:nil
    79 N8fc0138851984196b22854d82671b606 schema:affiliation https://www.grid.ac/institutes/grid.6268.a
    80 schema:familyName Ugail
    81 schema:givenName Hassan
    82 rdf:type schema:Person
    83 N91b20053f8fb4efaaf2264c80bb4363b schema:name doi
    84 schema:value 10.1007/s10586-019-02917-1
    85 rdf:type schema:PropertyValue
    86 N99eafdc4254443c8a33943f4b3c18f30 schema:name Springer Nature - SN SciGraph project
    87 rdf:type schema:Organization
    88 Ncb1c6240a9cc4a4a8f4e78d74fb109a6 rdf:first Ne1e178befba44727a168022b0bebd5e2
    89 rdf:rest N5222af36eccc4ee5b03c5980e5630127
    90 Ne1e178befba44727a168022b0bebd5e2 schema:affiliation https://www.grid.ac/institutes/grid.6268.a
    91 schema:familyName Awan
    92 schema:givenName Irfan
    93 rdf:type schema:Person
    94 Ne75bf3ab259b485f9484f01e51f48d2b rdf:first N0eb5a1cfc3d5407ba426af6eb2dd3d2c
    95 rdf:rest Ncb1c6240a9cc4a4a8f4e78d74fb109a6
    96 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
    97 schema:name Information and Computing Sciences
    98 rdf:type schema:DefinedTerm
    99 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
    100 schema:name Artificial Intelligence and Image Processing
    101 rdf:type schema:DefinedTerm
    102 sg:journal.1046649 schema:issn 1386-7857
    103 1573-7543
    104 schema:name Cluster Computing
    105 rdf:type schema:Periodical
    106 sg:pub.10.1007/978-3-642-36318-4_3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1041145295
    107 https://doi.org/10.1007/978-3-642-36318-4_3
    108 rdf:type schema:CreativeWork
    109 sg:pub.10.1007/s10586-016-0684-4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1014017089
    110 https://doi.org/10.1007/s10586-016-0684-4
    111 rdf:type schema:CreativeWork
    112 sg:pub.10.1007/s10586-017-1117-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1091976884
    113 https://doi.org/10.1007/s10586-017-1117-8
    114 rdf:type schema:CreativeWork
    115 sg:pub.10.1007/s10586-017-1148-1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1091477548
    116 https://doi.org/10.1007/s10586-017-1148-1
    117 rdf:type schema:CreativeWork
    118 sg:pub.10.1007/s10586-017-1385-3 schema:sameAs https://app.dimensions.ai/details/publication/pub.1092947337
    119 https://doi.org/10.1007/s10586-017-1385-3
    120 rdf:type schema:CreativeWork
    121 sg:pub.10.1007/s10586-018-02890-1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1111158521
    122 https://doi.org/10.1007/s10586-018-02890-1
    123 rdf:type schema:CreativeWork
    124 sg:pub.10.1007/s10586-018-1696-z schema:sameAs https://app.dimensions.ai/details/publication/pub.1100470435
    125 https://doi.org/10.1007/s10586-018-1696-z
    126 rdf:type schema:CreativeWork
    127 sg:pub.10.1007/s10586-018-1730-1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1100857480
    128 https://doi.org/10.1007/s10586-018-1730-1
    129 rdf:type schema:CreativeWork
    130 sg:pub.10.1007/s10586-018-1901-0 schema:sameAs https://app.dimensions.ai/details/publication/pub.1101127458
    131 https://doi.org/10.1007/s10586-018-1901-0
    132 rdf:type schema:CreativeWork
    133 sg:pub.10.1007/s10586-018-1942-4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1101321932
    134 https://doi.org/10.1007/s10586-018-1942-4
    135 rdf:type schema:CreativeWork
    136 sg:pub.10.1007/s10586-018-2359-9 schema:sameAs https://app.dimensions.ai/details/publication/pub.1101406227
    137 https://doi.org/10.1007/s10586-018-2359-9
    138 rdf:type schema:CreativeWork
    139 sg:pub.10.1007/s10586-018-2408-4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1101845335
    140 https://doi.org/10.1007/s10586-018-2408-4
    141 rdf:type schema:CreativeWork
    142 sg:pub.10.1007/s10796-016-9680-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1040681212
    143 https://doi.org/10.1007/s10796-016-9680-8
    144 rdf:type schema:CreativeWork
    145 https://doi.org/10.1002/spe.2491 schema:sameAs https://app.dimensions.ai/details/publication/pub.1084516934
    146 rdf:type schema:CreativeWork
    147 https://doi.org/10.1016/j.parco.2015.07.001 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027314579
    148 rdf:type schema:CreativeWork
    149 https://doi.org/10.1016/s0360-8352(02)00036-0 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003150782
    150 rdf:type schema:CreativeWork
    151 https://doi.org/10.1109/72.788640 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061219233
    152 rdf:type schema:CreativeWork
    153 https://doi.org/10.1109/bigdataservice.2016.10 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095394676
    154 rdf:type schema:CreativeWork
    155 https://doi.org/10.1109/ccdc.2017.7978640 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094231525
    156 rdf:type schema:CreativeWork
    157 https://doi.org/10.1109/ccgrid.2010.112 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094155836
    158 rdf:type schema:CreativeWork
    159 https://doi.org/10.1109/dsn.2004.1311948 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095724577
    160 rdf:type schema:CreativeWork
    161 https://doi.org/10.1109/hase.2014.24 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094319838
    162 rdf:type schema:CreativeWork
    163 https://doi.org/10.1109/hpcsim.2009.5192685 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094191003
    164 rdf:type schema:CreativeWork
    165 https://doi.org/10.1109/icdcs.2012.56 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095233613
    166 rdf:type schema:CreativeWork
    167 https://doi.org/10.1109/icmcs.2014.6911300 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095529505
    168 rdf:type schema:CreativeWork
    169 https://doi.org/10.1109/ipdpsw.2016.124 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095763654
    170 rdf:type schema:CreativeWork
    171 https://doi.org/10.1109/rams.2016.7448033 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094507983
    172 rdf:type schema:CreativeWork
    173 https://doi.org/10.1109/sc.2012.11 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095358387
    174 rdf:type schema:CreativeWork
    175 https://doi.org/10.1109/services.2014.20 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094140515
    176 rdf:type schema:CreativeWork
    177 https://doi.org/10.1109/tdsc.2009.4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061585235
    178 rdf:type schema:CreativeWork
    179 https://doi.org/10.1145/1807128.1807161 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037053442
    180 rdf:type schema:CreativeWork
    181 https://doi.org/10.1145/3217871.3217876 schema:sameAs https://app.dimensions.ai/details/publication/pub.1104470384
    182 rdf:type schema:CreativeWork
    183 https://doi.org/10.1198/106186002317375712 schema:sameAs https://app.dimensions.ai/details/publication/pub.1064199308
    184 rdf:type schema:CreativeWork
    185 https://doi.org/10.2174/1874110x01509010044 schema:sameAs https://app.dimensions.ai/details/publication/pub.1069226963
    186 rdf:type schema:CreativeWork
    187 https://doi.org/10.2197/ipsjjip.24.371 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017905621
    188 rdf:type schema:CreativeWork
    189 https://doi.org/10.5120/20435-2768 schema:sameAs https://app.dimensions.ai/details/publication/pub.1072602064
    190 rdf:type schema:CreativeWork
    191 https://www.grid.ac/institutes/grid.6268.a schema:alternateName University of Bradford
    192 schema:name School of Electrical Engineering and Computer Science, University of Bradford, BD7 1DP, Bradford, UK
    193 rdf:type schema:Organization
    194 https://www.grid.ac/institutes/grid.7628.b schema:alternateName Oxford Brookes University
    195 schema:name Department of Computing & Communication Technologies, Oxford Brookes University, OX33 1HX, Oxford, UK
    196 rdf:type schema:Organization
     



