ChorusOS 4.0 Hot Restart Programmer's Guide

1.1 What Is Hot Restart?

The ChorusOS(TM) 4.0 hot restart feature was designed and implemented to address the high-availability requirements of ChorusOS system builders. Hot restart provides an advanced mechanism for restarting ChorusOS applications, or the entire system, when a serious error or failure occurs. Traditionally, recovering from such errors or failures involves terminating applications and reloading them from stable storage, or rebooting the system. This causes system downtime and can result in the loss of important application data. Such behavior is unacceptable for system builders seeking '7 by 24' operation or 'five nines' (99.999%) availability.

The ChorusOS 4.0 hot restart feature solves the problem of downtime and data loss by using persistent memory, that is, memory that can persist beyond the lifetime of a particular run-time instance of an actor. When an actor that uses the hot restart feature fails or terminates abnormally, the system uses the actor data stored in persistent memory to reconstruct the actor without accessing stable storage. This reconstruction of an actor from persistent memory instead of from stable storage is known as hot restarting (or simply restarting) the actor.
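The restart pattern described above can be illustrated with a small, language-neutral simulation. This is a conceptual sketch only: it does not use the ChorusOS persistent memory API, and every name in it (`PersistentMemory`, `StableStorage`, `run_actor`, `supervise`, the `"actor_state"` key) is hypothetical. The assumption being modelled is simply that persistent memory is owned by the system rather than by any one actor instance, so a failed instance can be replaced by a new one that picks up its state from memory instead of reloading it from stable storage.

```python
class PersistentMemory:
    """Hypothetical stand-in for persistent memory: a store owned by the
    'system' rather than by any actor instance, so it outlives them."""

    def __init__(self):
        self._blocks = {}

    def get(self, name):
        return self._blocks.get(name)

    def put(self, name, value):
        self._blocks[name] = value


class StableStorage:
    """Simulated (slow) stable storage; counts how often it is read."""

    def __init__(self, initial_state):
        self._state = dict(initial_state)
        self.reads = 0

    def load(self):
        self.reads += 1
        return dict(self._state)


class ActorFailure(Exception):
    """Raised to simulate an actor terminating abnormally."""


def run_actor(pmem, storage, work, fail=False):
    """One run-time instance of a hypothetical actor.

    On the first start, state is built from stable storage (cold start).
    On later starts, it is recovered from persistent memory (hot restart).
    """
    state = pmem.get("actor_state")
    if state is None:
        state = storage.load()          # cold start: slow path
        pmem.put("actor_state", state)  # keep critical data in persistent memory
    for _ in range(work):
        state["counter"] += 1           # updates modify the persistent copy
    if fail:
        raise ActorFailure("simulated crash")
    return state["counter"]


def supervise(pmem, storage, runs):
    """Start a new actor instance after each failure, reusing persistent memory.

    `runs` is a list of (work, fail) pairs, one per actor instance.
    """
    result = None
    for work, fail in runs:
        try:
            result = run_actor(pmem, storage, work, fail)
        except ActorFailure:
            continue                    # restart without reloading from storage
    return result
```

For example, if the first instance performs 5 units of work and then crashes, and the restarted instance performs 3 more, `supervise(pmem, storage, [(5, True), (3, False)])` returns 8 while `storage.reads` stays at 1: the work done before the crash survives, and stable storage is read only once, at the initial cold start.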

Hot restarting one or more actors is significantly faster than conventional failure recovery techniques (application reload or cold system reboot). Because critical information is protected in persistent memory, the failed portions of a system can be reconstructed from memory rather than reloaded from stable storage, with minimal interruption in service.

1.1.1 Feature Services

ChorusOS hot restart comprises an API and run-time architecture which offer the following services:

The combination of these services provides a powerful framework for highly available systems and applications, dramatically reducing the time it takes for a failed system or component to return to service.