Reconfiguration and checkpointing in massively parallel systems


Ontology type: schema:Chapter     


Chapter Info

DATE

1994

AUTHORS

Bernd Bieker , Geert Deconinck , Erik Maehle , Johan Vounckx

ABSTRACT

Despite improvements in hardware design, massively parallel systems lack dependability because of the large number of components they consist of. One way to introduce fault tolerance into such systems is backward error recovery, in which failed modules are replaced by spares. The ESPRIT Project 6731 “A Practical Approach to Fault-Tolerant Massively Parallel Systems” follows this approach and covers error detection, diagnosis, checkpointing and reconfiguration. The target systems are multicomputers consisting of grid-connected modules that communicate by message passing. A first implementation will be made for the Parsytec GCel under PARIX. This paper focuses on recovery by reconfiguration and checkpointing. The project is based on switching in spares and routing around failed components via virtual links (interval routing). For recovery, both a user-driven and a user-transparent approach are provided, based on the new recovery-line-manager.

PAGES

351-370

References to SciGraph publications

Book

TITLE

Dependable Computing — EDCC-1

ISBN

978-3-540-58426-1
978-3-540-48785-2

Author Affiliations

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/3-540-58426-9_141

DOI

http://dx.doi.org/10.1007/3-540-58426-9_141

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1001612096



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0803", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Computer Software", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "name": [
            "FG Datentechnik, Universit\u00e4t-GH Paderborn, Deutschland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Bieker", 
        "givenName": "Bernd", 
        "id": "sg:person.07700515355.50", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07700515355.50"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "KU Leuven", 
          "id": "https://www.grid.ac/institutes/grid.5596.f", 
          "name": [
            "Dept. Elektrotechniek-ESAT, Katholieke Universiteit Leuven, Belgien"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Deconinck", 
        "givenName": "Geert", 
        "id": "sg:person.01022745130.75", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01022745130.75"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "name": [
            "FG Datentechnik, Universit\u00e4t-GH Paderborn, Deutschland"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Maehle", 
        "givenName": "Erik", 
        "id": "sg:person.010731036652.96", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010731036652.96"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "KU Leuven", 
          "id": "https://www.grid.ac/institutes/grid.5596.f", 
          "name": [
            "Dept. Elektrotechniek-ESAT, Katholieke Universiteit Leuven, Belgien"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Vounckx", 
        "givenName": "Johan", 
        "id": "sg:person.011331673473.04", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011331673473.04"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1093/comjnl/30.4.298", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010774065"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf01660031", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1031442291", 
          "https://doi.org/10.1007/bf01660031"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/12.166602", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061087386"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/12.67315", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061088803"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "1994", 
    "datePublishedReg": "1994-01-01", 
    "description": "Despite the improvements in hardware design massively parallel systems lack on dependability due to the huge amount of components these systems consist of. One possibility to introduce fault-tolerance into such systems is backward error recovery where failed modules can be replaced by spares. The ESPRIT Project 6731 \u201cA Practical Approach to Fault-Tolerant Massively Parallel Systems\u201d follows such an approach and covers the aspects of error detection, diagnosis, checkpointing and reconfiguration. Target systems are multi-computers consisting of grid-wise connected modules using message passing. A first implementation will be made for the Parsytec GCel under PARIX. This paper focuses on recovery by reconfiguration and checkpointing. The project is based on switching in spares and routing around failed components via virtual links (interval routing). For the recovery a user-driven as well as a user-transparent approach are provided based on the new recovery-line-manager.", 
    "editor": [
      {
        "familyName": "Echtle", 
        "givenName": "Klaus", 
        "type": "Person"
      }, 
      {
        "familyName": "Hammer", 
        "givenName": "Dieter", 
        "type": "Person"
      }, 
      {
        "familyName": "Powell", 
        "givenName": "David", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/3-540-58426-9_141", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-540-58426-1", 
        "978-3-540-48785-2"
      ], 
      "name": "Dependable Computing \u2014 EDCC-1", 
      "type": "Book"
    }, 
    "name": "Reconfiguration and checkpointing in massively parallel systems", 
    "pagination": "351-370", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/3-540-58426-9_141"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "07adf24cd3bea91b5f3f92c8ce1ce0e0fd42dd8d5eb894d901e3e91849545911"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1001612096"
        ]
      }
    ], 
    "publisher": {
      "location": "Berlin, Heidelberg", 
      "name": "Springer Berlin Heidelberg", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/3-540-58426-9_141", 
      "https://app.dimensions.ai/details/publication/pub.1001612096"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-16T00:46", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8700_00000244.jsonl", 
    "type": "Chapter", 
    "url": "http://link.springer.com/10.1007/3-540-58426-9_141"
  }
]
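As a minimal sketch of how the JSON-LD record above could be consumed, the following Python snippet pulls the title, authors, DOI and page range out of a trimmed copy of the record. The helper names `product_id` and `author_names` and the `summary` dictionary are hypothetical, introduced only for illustration; the field names (`author`, `productId`, `pagination`, `name`) are taken from the record itself.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
RECORD = json.loads("""
[
  {
    "author": [
      {"familyName": "Bieker", "givenName": "Bernd"},
      {"familyName": "Deconinck", "givenName": "Geert"},
      {"familyName": "Maehle", "givenName": "Erik"},
      {"familyName": "Vounckx", "givenName": "Johan"}
    ],
    "name": "Reconfiguration and checkpointing in massively parallel systems",
    "pagination": "351-370",
    "productId": [
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1007/3-540-58426-9_141"]},
      {"name": "dimensions_id", "type": "PropertyValue",
       "value": ["pub.1001612096"]}
    ]
  }
]
""")


def product_id(record, name):
    """Return the first value of the productId entry called `name`, or None."""
    for entry in record.get("productId", []):
        if entry.get("name") == name and entry.get("value"):
            return entry["value"][0]
    return None


def author_names(record):
    """Return authors as 'Given Family' strings, in document order."""
    return [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]


chapter = RECORD[0]  # the top-level JSON value is a one-element list
summary = {
    "title": chapter["name"],
    "authors": author_names(chapter),
    "doi": product_id(chapter, "doi"),
    "pages": chapter["pagination"],
}
print(summary)
```

Note that identifiers live in a list of `PropertyValue` objects rather than in a flat field, so looking one up means scanning `productId` by `name`.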
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/3-540-58426-9_141'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/3-540-58426-9_141'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/3-540-58426-9_141'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/3-540-58426-9_141'
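
The four curl commands differ only in their Accept header, which is ordinary HTTP content negotiation. A rough Python equivalent using only the standard library might look like the sketch below. The short format keys and the helper names `build_request` and `fetch` are assumptions made for illustration; the URL and media types are copied from the commands above. Whether the endpoint still responds is not something this sketch checks.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/pub.10.1007/3-540-58426-9_141"

# Media types taken from the curl commands above; the short keys are
# hypothetical labels chosen for this sketch.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=BASE_URL):
    """Build (but do not send) a GET request asking for serialization `fmt`."""
    if fmt not in ACCEPT:
        raise ValueError(f"unknown format: {fmt!r}")
    return urllib.request.Request(url, headers={"Accept": ACCEPT[fmt]})


def fetch(fmt, url=BASE_URL, timeout=30):
    """Send the request and return the body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building a request has no side effects, so it is safe to inspect offline.
req = build_request("turtle")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Calling `fetch("nt")` would then correspond to the N-Triples curl command; it is kept separate from `build_request` so the header logic can be checked without a network connection.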


 

This table displays all metadata directly associated to this object as RDF triples.

113 TRIPLES      23 PREDICATES      31 URIs      20 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/3-540-58426-9_141 schema:about anzsrc-for:08
2 anzsrc-for:0803
3 schema:author Nc583cc05403e49e49eb7cce3632b1519
4 schema:citation sg:pub.10.1007/bf01660031
5 https://doi.org/10.1093/comjnl/30.4.298
6 https://doi.org/10.1109/12.166602
7 https://doi.org/10.1109/12.67315
8 schema:datePublished 1994
9 schema:datePublishedReg 1994-01-01
10 schema:description Despite the improvements in hardware design massively parallel systems lack on dependability due to the huge amount of components these systems consist of. One possibility to introduce fault-tolerance into such systems is backward error recovery where failed modules can be replaced by spares. The ESPRIT Project 6731 “A Practical Approach to Fault-Tolerant Massively Parallel Systems” follows such an approach and covers the aspects of error detection, diagnosis, checkpointing and reconfiguration. Target systems are multi-computers consisting of grid-wise connected modules using message passing. A first implementation will be made for the Parsytec GCel under PARIX. This paper focuses on recovery by reconfiguration and checkpointing. The project is based on switching in spares and routing around failed components via virtual links (interval routing). For the recovery a user-driven as well as a user-transparent approach are provided based on the new recovery-line-manager.
11 schema:editor N85746a84d53a4fc586a0befaccd8f1d0
12 schema:genre chapter
13 schema:inLanguage en
14 schema:isAccessibleForFree false
15 schema:isPartOf N2612af87420d4190afdedf10e20826bd
16 schema:name Reconfiguration and checkpointing in massively parallel systems
17 schema:pagination 351-370
18 schema:productId N108322f5ddd84ba28c20cd817ec3d5a8
19 N7468e1ac8ff144218de9bb2a4f69caaf
20 N8eb3a92dc7b44260b56aac4d9f34f62e
21 schema:publisher N0f82cb751d5747fbbd64ac60eed41160
22 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001612096
23 https://doi.org/10.1007/3-540-58426-9_141
24 schema:sdDatePublished 2019-04-16T00:46
25 schema:sdLicense https://scigraph.springernature.com/explorer/license/
26 schema:sdPublisher Nc8b2fb6008c14950b28c6712a9a7912f
27 schema:url http://link.springer.com/10.1007/3-540-58426-9_141
28 sgo:license sg:explorer/license/
29 sgo:sdDataset chapters
30 rdf:type schema:Chapter
31 N02fc9f9efc5d45ba99f1070895cea8ee schema:name FG Datentechnik, Universität-GH Paderborn, Deutschland
32 rdf:type schema:Organization
33 N0f82cb751d5747fbbd64ac60eed41160 schema:location Berlin, Heidelberg
34 schema:name Springer Berlin Heidelberg
35 rdf:type schema:Organisation
36 N108322f5ddd84ba28c20cd817ec3d5a8 schema:name readcube_id
37 schema:value 07adf24cd3bea91b5f3f92c8ce1ce0e0fd42dd8d5eb894d901e3e91849545911
38 rdf:type schema:PropertyValue
39 N127b9e94d0254a27a819b8925c7d4e23 rdf:first sg:person.010731036652.96
40 rdf:rest N43e94200fa924dcf9c64ff1137bcfc40
41 N2612af87420d4190afdedf10e20826bd schema:isbn 978-3-540-48785-2
42 978-3-540-58426-1
43 schema:name Dependable Computing — EDCC-1
44 rdf:type schema:Book
45 N43e94200fa924dcf9c64ff1137bcfc40 rdf:first sg:person.011331673473.04
46 rdf:rest rdf:nil
47 N537e0db1d4cf4e5e85e0585698c4e08a schema:familyName Echtle
48 schema:givenName Klaus
49 rdf:type schema:Person
50 N74655986fccb4800b25b37ccd796181f schema:familyName Powell
51 schema:givenName David
52 rdf:type schema:Person
53 N7468e1ac8ff144218de9bb2a4f69caaf schema:name doi
54 schema:value 10.1007/3-540-58426-9_141
55 rdf:type schema:PropertyValue
56 N7827addb74dc4c7f9a5fba41b49c7247 rdf:first N74655986fccb4800b25b37ccd796181f
57 rdf:rest rdf:nil
58 N85746a84d53a4fc586a0befaccd8f1d0 rdf:first N537e0db1d4cf4e5e85e0585698c4e08a
59 rdf:rest Ndbd55fe6c2a742df93baf76fe317e92b
60 N8eb3a92dc7b44260b56aac4d9f34f62e schema:name dimensions_id
61 schema:value pub.1001612096
62 rdf:type schema:PropertyValue
63 N98d716067f2d4ee796740846d9eebef5 schema:familyName Hammer
64 schema:givenName Dieter
65 rdf:type schema:Person
66 Nb150793928254ff2bcb7ec3e1a85c565 schema:name FG Datentechnik, Universität-GH Paderborn, Deutschland
67 rdf:type schema:Organization
68 Nc583cc05403e49e49eb7cce3632b1519 rdf:first sg:person.07700515355.50
69 rdf:rest Nf7fb36c6f88e42e0acd1b15cc03c72f7
70 Nc8b2fb6008c14950b28c6712a9a7912f schema:name Springer Nature - SN SciGraph project
71 rdf:type schema:Organization
72 Ndbd55fe6c2a742df93baf76fe317e92b rdf:first N98d716067f2d4ee796740846d9eebef5
73 rdf:rest N7827addb74dc4c7f9a5fba41b49c7247
74 Nf7fb36c6f88e42e0acd1b15cc03c72f7 rdf:first sg:person.01022745130.75
75 rdf:rest N127b9e94d0254a27a819b8925c7d4e23
76 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
77 schema:name Information and Computing Sciences
78 rdf:type schema:DefinedTerm
79 anzsrc-for:0803 schema:inDefinedTermSet anzsrc-for:
80 schema:name Computer Software
81 rdf:type schema:DefinedTerm
82 sg:person.01022745130.75 schema:affiliation https://www.grid.ac/institutes/grid.5596.f
83 schema:familyName Deconinck
84 schema:givenName Geert
85 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01022745130.75
86 rdf:type schema:Person
87 sg:person.010731036652.96 schema:affiliation Nb150793928254ff2bcb7ec3e1a85c565
88 schema:familyName Maehle
89 schema:givenName Erik
90 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010731036652.96
91 rdf:type schema:Person
92 sg:person.011331673473.04 schema:affiliation https://www.grid.ac/institutes/grid.5596.f
93 schema:familyName Vounckx
94 schema:givenName Johan
95 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011331673473.04
96 rdf:type schema:Person
97 sg:person.07700515355.50 schema:affiliation N02fc9f9efc5d45ba99f1070895cea8ee
98 schema:familyName Bieker
99 schema:givenName Bernd
100 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07700515355.50
101 rdf:type schema:Person
102 sg:pub.10.1007/bf01660031 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031442291
103 https://doi.org/10.1007/bf01660031
104 rdf:type schema:CreativeWork
105 https://doi.org/10.1093/comjnl/30.4.298 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010774065
106 rdf:type schema:CreativeWork
107 https://doi.org/10.1109/12.166602 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061087386
108 rdf:type schema:CreativeWork
109 https://doi.org/10.1109/12.67315 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061088803
110 rdf:type schema:CreativeWork
111 https://www.grid.ac/institutes/grid.5596.f schema:alternateName KU Leuven
112 schema:name Dept. Elektrotechniek-ESAT, Katholieke Universiteit Leuven, Belgien
113 rdf:type schema:Organization
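
In the table above, the ordered author list is not stored as a plain array: schema:author points at a blank node, and each blank node carries an rdf:first (the item) and an rdf:rest (the next node), ending at rdf:nil. A minimal sketch of how such a list can be walked, with the node identifiers copied from the table and the `NODES`, `FAMILY_NAME` and `walk_list` names introduced here purely for illustration:

```python
RDF_NIL = "rdf:nil"

# (rdf:first, rdf:rest) pairs copied from the author list in the table above.
NODES = {
    "Nc583cc05403e49e49eb7cce3632b1519":
        ("sg:person.07700515355.50", "Nf7fb36c6f88e42e0acd1b15cc03c72f7"),
    "Nf7fb36c6f88e42e0acd1b15cc03c72f7":
        ("sg:person.01022745130.75", "N127b9e94d0254a27a819b8925c7d4e23"),
    "N127b9e94d0254a27a819b8925c7d4e23":
        ("sg:person.010731036652.96", "N43e94200fa924dcf9c64ff1137bcfc40"),
    "N43e94200fa924dcf9c64ff1137bcfc40":
        ("sg:person.011331673473.04", RDF_NIL),
}

# schema:familyName values for the same person URIs, also from the table.
FAMILY_NAME = {
    "sg:person.07700515355.50": "Bieker",
    "sg:person.01022745130.75": "Deconinck",
    "sg:person.010731036652.96": "Maehle",
    "sg:person.011331673473.04": "Vounckx",
}


def walk_list(head, nodes):
    """Follow rdf:rest links from `head` and collect the rdf:first items."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:  # guard against malformed, cyclic lists
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


# The head node is the object of schema:author in row 3 of the table.
authors = walk_list("Nc583cc05403e49e49eb7cce3632b1519", NODES)
print([FAMILY_NAME[a] for a in authors])
```

Walking from the head node yields Bieker, Deconinck, Maehle, Vounckx, which matches the author order in the JSON-LD record. The editor list (schema:editor) is encoded the same way.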
 



