As described in Chapter 1, Introduction, a restartable actor is an actor that can be rapidly reconstructed from an actor image (text and data), without accessing stable storage. Restartable actors are managed by a ChorusOS supervisor actor known as the Hot Restart Controller. The Hot Restart Controller is responsible for:
Loading and running restartable actors, and controlling their storage in persistent memory.
Monitoring restartable actors for abnormal termination, and restarting their restart group if such an abnormal termination occurs.
Triggering a site restart if a group is restarted too frequently (based on the system's restart policy, as described in Chapter 2, Getting Started With Hot Restart).
This chapter looks at the API provided by the Hot Restart Controller, and the corresponding restart-related commands provided by the C_INIT actor. Before proceeding to a description of the API, however, it is important to understand how restartable actors are managed within the system.
As explained in Chapter 1, Introduction, actors do not explicitly declare themselves restartable: there is no function call that declares an actor restartable at the start of its main() program. Instead, an actor is run as a restartable actor. More precisely, an actor can be run as either a direct or an indirect restartable actor:
Direct restartable actors are loaded and run using the C_INIT command arun(1M) with the -g option.
Indirect restartable actors are spawned from restartable actors using the hrfexec(2RESTART) family of API calls. hrfexec() calls function similarly to afexec(2K) calls, but provide an additional PmmName parameter used to identify them for the purposes of actor restart:
#include <hr/hr.h>

int hrfexecve(PmmName *baseName,
              const char *path,
              KnCap *cactorcap,
              const AcParam *param,
              char const *argv,
              char const *envp);
(...)
This distinction between direct and indirect actors is important for understanding the automatic restart mechanism provided by the Hot Restart Controller. When an error occurs, the Hot Restart Controller first stops all actors in the affected group, and then restarts only the direct restartable actors of that group. These actors, re-executed from their initial entry point, are responsible for restarting any indirect actors they may have spawned. An illustration of this is provided in Figure 1-3.
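The two-phase behavior described above can be modeled in a few lines of C. This is only a sketch of the logic, not Hot Restart Controller code: the ModelActor type and the model_group_restart() function are hypothetical names introduced for illustration, and the real controller operates on actors, not on an in-memory array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these names come from the ChorusOS API. */
typedef enum { ACTOR_DIRECT, ACTOR_INDIRECT } ActorKind;

typedef struct {
    const char *name;    /* restartable actor name */
    int         group;   /* restart group ID */
    ActorKind   kind;    /* how the actor was started */
    bool        running; /* current state in this model */
} ModelActor;

/*
 * Model of a group restart: stop every actor in the group, then re-run
 * only the direct ones. Returns the number of actors restarted here.
 * Indirect actors stay stopped until their parent spawns them again.
 */
size_t model_group_restart(ModelActor *actors, size_t n, int group)
{
    size_t restarted = 0;

    assert(actors != NULL || n == 0);

    /* Phase 1: stop all actors that belong to the group. */
    for (size_t i = 0; i < n; i++) {
        if (actors[i].group == group)
            actors[i].running = false;
    }

    /* Phase 2: restart only the direct restartable actors of the group. */
    for (size_t i = 0; i < n; i++) {
        if (actors[i].group == group && actors[i].kind == ACTOR_DIRECT) {
            actors[i].running = true;
            restarted++;
        }
    }
    return restarted;
}
```

In this model, an indirect actor remains stopped after the call. That mirrors the rule that a restarted direct actor must itself re-spawn its indirect actors, for example with an hrfexec() call from its main() program.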
Restartable actors, just like traditional ChorusOS actors, are identified in the system by a unique capability and identifier (actor ID). Restartable actors also run in a user group (with a user ID), like traditional ChorusOS actors. The lifetime of each of these credentials is the same as the lifetime of a particular run-time instance of the actor: when a restartable actor is restarted, it is given a new capability, actor ID and user ID.
Hot restartable actors also have two additional credentials, which persist across an actor restart, and serve to characterize them in the Hot Restart Controller:
Each restartable actor has a unique name.
The maximum number of restartable actors (unique names) which can be registered in the Hot Restart Controller is fixed by the system tunable parameter hrCtrl.maxActors.
It is the programmer's responsibility to ensure that each actor running in the system uses a unique name, as this is not checked by the system. Attempting to run two actors which use the same name will give unpredictable results.
Each restartable actor is a member of a restart group. A restart group is uniquely identified in the system by an integer, known as the group's ID. The maximum number of group IDs allowed in the system is fixed by a system tunable parameter, hrCtrl.maxGroups.
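Because the system does not check name uniqueness, an application that launches several restartable actors may want to keep its own table of names in use. The following is a minimal sketch of such a table, which also shows which credentials persist across a restart and which do not. Every identifier in it (ModelCredentials, ModelRegistry, model_register(), model_restart()) is hypothetical, and MODEL_MAX_ACTORS is an arbitrary stand-in for the hrCtrl.maxActors tunable, not its real value.

```c
#include <assert.h>
#include <string.h>

#define MODEL_MAX_ACTORS 8   /* arbitrary stand-in for hrCtrl.maxActors */
#define MODEL_NAME_LEN   32  /* hypothetical limit on name length */

typedef struct {
    /* Persistent credentials: unchanged by an actor restart. */
    char name[MODEL_NAME_LEN];
    int  group_id;
    /* Transient credential: a new value is assigned at every restart. */
    unsigned long actor_id;
} ModelCredentials;

typedef struct {
    ModelCredentials entries[MODEL_MAX_ACTORS];
    int              count;
    unsigned long    next_actor_id;
} ModelRegistry;

/* Register a name. Returns 0 on success, -1 if the name is already in
 * use, is too long, or the table is full. */
int model_register(ModelRegistry *r, const char *name, int group_id)
{
    assert(r != NULL && name != NULL);

    if (strlen(name) >= MODEL_NAME_LEN || r->count >= MODEL_MAX_ACTORS)
        return -1;
    for (int i = 0; i < r->count; i++) {
        if (strcmp(r->entries[i].name, name) == 0)
            return -1;  /* duplicate name: refuse instead of risking a clash */
    }

    ModelCredentials *c = &r->entries[r->count++];
    strcpy(c->name, name);  /* safe: length was checked above */
    c->group_id = group_id;
    c->actor_id = ++r->next_actor_id;
    return 0;
}

/* Model a restart: name and group ID are kept, the actor ID is renewed.
 * Returns 0 on success, -1 if the name is not registered. */
int model_restart(ModelRegistry *r, const char *name)
{
    assert(r != NULL && name != NULL);

    for (int i = 0; i < r->count; i++) {
        if (strcmp(r->entries[i].name, name) == 0) {
            r->entries[i].actor_id = ++r->next_actor_id;
            return 0;
        }
    }
    return -1;
}
```

The model tracks only the actor ID as a transient credential. In the real system the capability and user ID are renewed at restart in the same way.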
As explained in "2.1.2 Memory Requirements and Design Considerations", the system uses persistent memory to store the following data for each executing restartable actor:
The actor's actor image: a copy of the actor's text and initialized data segments from which the actor will be loaded after a restart.
The actor's executing image: a copy of the actor's text and data from which the actor is executed.
This data is stored in three persistent memory blocks: one memory block for the actor image, one memory block for the executed text and one memory block for the actor data. These blocks are allocated and freed by requests from the Hot Restart Controller to the Persistent Memory Manager. Other actors cannot access or free these persistent memory blocks, although restartable actors can place additional blocks which they allocate under the control of the Hot Restart Controller. This is described in "4.2.3 Freeing Persistent Memory".
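As a rough aid for sizing persistent memory, the three blocks can be tallied per actor as sketched below. The sketch assumes that each block is exactly as large as the segments it holds, with no alignment padding or bookkeeping overhead, which may not match what the Persistent Memory Manager actually allocates. The SegmentSizes and PersistentFootprint types and both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes, in bytes, of one actor's segments (hypothetical input type). */
typedef struct {
    size_t text;       /* text segment */
    size_t init_data;  /* initialized data segment */
    size_t data;       /* data from which the actor executes */
} SegmentSizes;

/* One size per persistent memory block used for a restartable actor. */
typedef struct {
    size_t image_block; /* actor image: text plus initialized data */
    size_t text_block;  /* executed text */
    size_t data_block;  /* actor data */
} PersistentFootprint;

/* Derive the three block sizes, ignoring any allocator overhead. */
PersistentFootprint model_footprint(SegmentSizes s)
{
    PersistentFootprint f;

    f.image_block = s.text + s.init_data;
    f.text_block  = s.text;
    f.data_block  = s.data;
    return f;
}

/* Sum the three blocks, checking that the total fits in a size_t. */
size_t model_footprint_total(PersistentFootprint f)
{
    assert(f.image_block <= SIZE_MAX - f.text_block);
    assert(f.image_block + f.text_block <= SIZE_MAX - f.data_block);
    return f.image_block + f.text_block + f.data_block;
}
```

Under these assumptions, the text segment is counted twice, once in the actor image and once in the executed text, which is why a restartable actor needs more memory than its segment sizes alone suggest.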