Checkpointing and rollback recovery for distributed systems
Bibliographic Details
Main Authors: Koo, Richard (Author), Toueg, Sam (Author)
Format: Book
Language: English
Published: Ithaca, New York 1985
Series: Cornell University (Ithaca, NY), Department of Computer Science, Technical report 706
Abstract: We consider the problem of bringing a distributed system to a consistent state after transient failures. We address the two components of this problem by describing a distributed algorithm to create consistent checkpoints, as well as a rollback-recovery algorithm to recover the system to a consistent state. In contrast to previous algorithms, they tolerate failures that occur during their executions. Furthermore, when a process takes a checkpoint, a minimal number of additional processes are forced to take checkpoints. Similarly, when a process rolls back and restarts after a failure, a minimal number of additional processes are forced to roll back with it. Our algorithms require each process to store at most two checkpoints in stable storage. This storage requirement is shown to be minimal under general assumptions.
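The dependency reasoning behind "forced to take checkpoints" and "forced to roll back" can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the report's algorithm: it assumes each process records which processes it received messages from (and sent messages to) since its last checkpoint, it ignores failures during execution and how checkpoints are committed, and it over-approximates by not tracking whether a sender has already checkpointed since sending. The function `dependency_closure` and the dictionaries `received_from` and `sent_to` are hypothetical names chosen for illustration.

```python
from collections import deque


def dependency_closure(start, edges):
    """Return every process reachable from `start` along `edges`, including `start`.

    `edges` maps a process name to the set of processes it depends on
    (hypothetical representation chosen for illustration).
    """
    reached = {start}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        for q in edges.get(p, set()):
            if q not in reached:
                reached.add(q)
                queue.append(q)
    return reached


# Checkpointing: if p checkpoints after receiving m from q, q must also
# checkpoint (after sending m), or m would be recorded as received but
# never sent. received_from[p] = senders of messages p received since
# its last checkpoint (assumed bookkeeping).
received_from = {"P1": {"P2"}, "P2": {"P3"}, "P3": set(), "P4": {"P1"}}

# Rollback: if p rolls back, any process that received a message p sent
# since its last checkpoint must roll back too, or that message becomes
# an orphan. sent_to[p] = receivers of messages p sent since its last
# checkpoint (assumed bookkeeping, consistent with received_from above).
sent_to = {"P1": {"P4"}, "P2": {"P1"}, "P3": {"P2"}, "P4": set()}

print(sorted(dependency_closure("P1", received_from)))  # ['P1', 'P2', 'P3']
print(sorted(dependency_closure("P1", sent_to)))        # ['P1', 'P4']
```

In this toy trace, a checkpoint by P1 drags in P2 and P3 but not P4, while a rollback of P1 drags in only P4: the two operations follow message dependencies in opposite directions, which is why each can involve a different, smaller subset of processes than a global checkpoint or global rollback would.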
Physical Description: 22 p.

Branch Library Mathematics & Informatics, Reports

Holdings details from Branch Library Mathematics & Informatics, Reports
Call Number: 0111 2001 B 6052-706
Copy 1 Available for loan On Shelf