A comprehensive introduction to reliability and availability modeling, analysis, and design at the system, hardware, and software levels
Reliability of Computer Systems and Networks presents the fundamentals of reliability and availability analysis for computer hardware, software, and networked systems. It focuses on reliability and availability as major objectives in system design, and it treats various redundancy and fault-tolerant techniques, as well as error-correcting coding techniques.
The author proposes a high-level design approach based on apportioning the reliability and availability goals to subsystems and provides various techniques for achieving these subsystem goals. The next step is an efficient, exact optimization approach based on upper and lower bounds to minimize the number of feasible candidates. The most readily applied methods for analysis are utilized and design techniques are derived from basic principles. Analytical simplifications and approximations are developed to validate the results of computer models used for large-scale complex problems.
Topics and features include:
* Coding and decoding schemes for error detection and correction including chip reliability
* Comparison of the reliability and availability of parallel, standby, and majority voting architectures
* Formulation, solution, and interpretation of Markov models for repairable systems
* Introduction and comparison of various RAID memory systems
* The architecture and fault-tolerant principles of TANDEM and STRATUS non-stop computer systems
* Practical and tutorial examples and numerous practice problems
* Appendices that cover the necessary background material on probability, reliability, and architecture
Reliability of Computer Systems and Networks offers students in-depth and up-to-date coverage of reliability and availability, with a focus on important application areas, computer systems, and networks. Professionals in systems and reliability design, as well as computer architecture, will find it a highly useful reference.
About the Author
MARTIN L. SHOOMAN, PhD, served for many years as a Professor of Electrical Engineering and Computer Science at Polytechnic University in Brooklyn, New York. Dr. Shooman has been a Visiting Professor at MIT and Hunter College, and a consultant to Bell Laboratories, NASA, IBM, the US Army, and many other government and commercial organizations. A Fellow of the IEEE, he has received five best paper awards from its Reliability and Computer Societies. Dr. Shooman has contributed over 100 papers and reports to the research literature and has given special courses in Britain, Canada, France, Israel, and throughout the US. The author of Probabilistic Reliability: An Engineering Approach and Software Engineering: Design, Reliability, and Management, he is currently President of the consulting firm Martin L. Shooman & Associates.