ChorusOS 5.0 Application Developer's Guide

Using Restartable Processes

This section discusses programming and running restartable processes on the ChorusOS operating system. The following topics are presented:

Introduction

A restartable process can be reconstructed rapidly from a process image (text and data) without accessing stable storage. The management of restartable processes is handled by a ChorusOS supervisor process known as the Hot Restart Controller. The Hot Restart Controller is responsible for:

The following section looks at the API provided by the Hot Restart Controller and the corresponding restart-related commands provided by the C_INIT process. Before proceeding to a description of the API, however, it is important to understand how a restartable process is managed within the system.

Types of Restartable Processes

Processes do not explicitly declare themselves restartable, that is, there is no function call to declare a process restartable at the start of its main() program. Instead, a process can be run as a restartable process. Specifically, a process can be run as either a direct or indirect restartable process:

The distinction between direct and indirect processes is important in understanding the automatic restart mechanism provided by the Hot Restart Controller. When an error occurs, the Hot Restart Controller first stops all processes in the group. After the processes are stopped, only the direct restartable processes will be restarted. These processes (re-executed from their initial entry point) are responsible for restarting any indirect processes they may have spawned.
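The two-step behavior described above (stop every member, then restart only the direct ones) can be sketched as a small host-side model. This is an illustration only: the member type and model_group_restart() are hypothetical names invented for the sketch and are not part of the Hot Restart Controller API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a restart group member; not a ChorusOS type. */
typedef enum { MEMBER_DIRECT, MEMBER_INDIRECT } member_kind;

typedef struct {
    const char *name;
    member_kind kind;
    bool        running;
} member;

/* Model of a group restart. Returns the number of members that the
   controller itself restarts (the direct members). */
size_t model_group_restart(member *group, size_t n)
{
    size_t restarted = 0;

    assert(group != NULL || n == 0);

    /* Step 1: every member of the group is stopped. */
    for (size_t i = 0; i < n; i++)
        group[i].running = false;

    /* Step 2: only direct members are restarted by the controller.
       Indirect members stay stopped until a direct member respawns them. */
    for (size_t i = 0; i < n; i++) {
        if (group[i].kind == MEMBER_DIRECT) {
            group[i].running = true;
            restarted++;
        }
    }
    return restarted;
}
```

In this model an indirect member is left stopped after the restart, which mirrors the rule that direct processes are responsible for restarting the indirect processes they spawned.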

Restartable Process Credentials

Restartable processes, just like traditional ChorusOS processes, are identified in the system by a unique capability and PID. Restartable processes also run in a user group (with a user ID) like traditional ChorusOS processes. The life of each of these credentials is the same as the life of a specific run-time instance of the process -- when a restartable process is restarted, it is given a new capability, PID and user ID.

Hot restartable processes also have two additional credentials which persist across a process restart, and characterize the processes in the Hot Restart Controller:

Restartable Processes and Persistent Memory

As previously discussed in "Memory Requirements and Design Constraints", the system uses persistent memory to store the following data for each executing restartable process:

This data is stored in three persistent memory blocks. One memory block is used for the process image, one is used for the executed text, and the last is used for the process data. These blocks are allocated and freed upon requests from the Hot Restart Controller to the Persistent Memory Manager. Other processes cannot access or free these persistent memory blocks. However, restartable processes can allocate additional blocks under the control of the Hot Restart Controller. This is described in "Freeing Persistent Memory".

The Restartable Process Lifecycle

One way to understand how the Hot Restart Controller API is used is to consider it in the context of the run-time life cycle of a restartable process. A restartable process's code is not executed just once (from the start of the main() program to its final return). The code may be re-executed several times if there are multiple restarts. Data that is initialized during the first execution only needs to be retrieved on subsequent executions, and processes that are loaded during the first execution only need to be restarted. Therefore, it is important to view the restart API first in the context of the initial execution, and then in the context of subsequent executions.

This section looks at the way the Hot Restart Controller API is used in the context of the life-cycle of a typical restartable process.

Initial Load

Use the C_INIT command arun with the -g option, or the function call hrfexec() to load a restartable process from stable storage into persistent memory. Both arun and hrfexec() provide support for specifying the persistent credentials of a restartable process when the process is initially loaded.

Processes that were created directly using acreate(2K) or actorCreate(2K) are not hot restartable and are unable to use the Hot Restart Controller API.

When a process is run as a restartable process, the Hot Restart Controller checks whether a process identified with the specified name is already registered. If this is not the case (as with the initial load), the Hot Restart Controller first solicits the Persistent Memory Manager to allocate the persistent memory blocks which will store the process's process image and executing image. If successful, the Hot Restart Controller registers the name of the new process as a restartable process, running in the specified group.

The subsequent load and start of the persistent process is the same as for a process run using a member of the afexec(2K) function family (see the man page for a description of this process). The only difference is that the process is loaded from its process image (in persistent memory) and not from stable storage.


Note -

A restartable process's name remains registered in the Hot Restart Controller for the life of its process group. The lifespan of the group may extend beyond the lifespan of the process. It is the programmer's responsibility to ensure that no two restartable processes will attempt to register with the same name in the Hot Restart Controller.


After a restartable process has been registered and loaded, it runs under the control of the Hot Restart Controller. If the process fails, the failure will invoke the restart of all direct members of its restart group. These direct processes will be responsible for restarting any indirect processes registered in the group. To query a process's restart group, use hrGetprocessGroup(2RESTART):

#include <hr/hr.h>
hrGetprocessGroup(int pid)

Group Restart

In the context of hot restart, a process is considered to have terminated abnormally (and will therefore invoke the restart of its group) if any of the following occur:

There is no single API call that can explicitly force a group of processes to restart. In cases where it is desirable to provoke a restart (for example, for testing purposes), the easiest way to do so is to deliberately provoke one of the previous cases. In the "hello world" example introduced in the previous chapter, this was done by causing a segmentation fault.

When a process fails, all processes in the failed restart group stop running and the Hot Restart Controller restarts all direct processes in the group from their initial entry point. The direct processes are responsible for restarting any indirect processes, using hrfexec(). When hrfexec() is called with a name that is already registered in the Hot Restart Controller, the Controller recognizes the process name and restarts the process from the process image, instead of loading it from stable storage.
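The name lookup described above can be sketched as a host-side model: the first request for a name registers it and loads from stable storage, and any later request for the same name restarts from the process image. The registry type, the enum values, and model_run_restartable() are hypothetical names used only for this sketch; they do not reproduce the hrfexec() interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_NAMES 8

/* Where the model decides to take the process from. */
typedef enum {
    LOAD_FROM_STABLE_STORAGE,   /* name not yet registered: initial load */
    RESTART_FROM_IMAGE,         /* name already registered: hot restart  */
    LOAD_FAILED                 /* model registry is full                */
} load_source;

/* Hypothetical stand-in for the controller's list of registered names. */
typedef struct {
    const char *names[MODEL_MAX_NAMES];
    size_t      count;
} model_registry;

load_source model_run_restartable(model_registry *reg, const char *name)
{
    assert(reg != NULL && name != NULL);

    /* A registered name means the process image is already in
       persistent memory, so no access to stable storage is needed. */
    for (size_t i = 0; i < reg->count; i++)
        if (strcmp(reg->names[i], name) == 0)
            return RESTART_FROM_IMAGE;

    if (reg->count == MODEL_MAX_NAMES)
        return LOAD_FAILED;

    /* Initial load: register the name, then load from stable storage. */
    reg->names[reg->count++] = name;
    return LOAD_FROM_STABLE_STORAGE;
}
```

The model also shows why two different restartable processes must not share a name: the second one would be treated as a restart of the first.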

A restartable process is always restarted at the same address. Its capability, process ID and user ID are not guaranteed to be the same after restart. All system resources obtained before the restart are lost. In particular, open files are lost, including those that were inherited at the time of initial creation. This may include standard I/O connected to an rsh connection.

A restarted process uses the same arguments and environment parameters that were specified when the process was initially started. For direct restartable processes, a new set of pre-open stdin/stdout/stderr is provided, which is connected to /dev/console. For indirect members, a new set of pre-open stdin/stdout/stderr is provided by the invoker of hrfexec(), just as for afexec(2K).

Freeing Persistent Memory

Just like any process, a restartable process can free persistent memory blocks using pmmFree() or pmmFreeAll(). This is described in "Freeing a Persistent Memory Block".

Restartable processes which allocate memory with pmmAllocate() can also use a basic automatic deallocation mechanism provided by the Hot Restart Controller. This saves the process from having to free its persistent memory explicitly. Instead, the persistent memory will remain allocated for the lifespan of the process's group, and then be freed automatically by the Hot Restart Controller when the last member of the process's restart group terminates cleanly. The disadvantage of this system is that the lifespan of the restart group may extend well beyond the point at which the memory block is no longer required. In this situation, the memory block will take up space in persistent memory unnecessarily.

To mark a persistent memory block for automatic de-allocation by the Hot Restart Controller, pass the macros HR_GROUP_KEY and HR_GROUP_KEYSIZE as the delKey and delKeySize arguments respectively in the call to pmmAllocate(). These macros tie the lifespan of the persistent memory block to the lifespan of the calling process's restart group.

A block marked for automatic de-allocation by the Hot Restart Controller can still be freed explicitly by calling pmmFree() with the block's PmmName. However, calling pmmFreeAll() with the HR_GROUP_KEY and HR_GROUP_KEYSIZE macros is not permitted and results in an error.
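The rules in this section can be summarized with a small host-side model. It is a sketch under stated assumptions: deletion keys are reduced to integers, MODEL_GROUP_KEY stands in for the HR_GROUP_KEY/HR_GROUP_KEYSIZE pair, and all model_ names are hypothetical. None of these functions reproduce the real pmmAllocate(), pmmFree() or pmmFreeAll() signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: keys are plain integers in this model. */
enum { MODEL_GROUP_KEY = 0 };

typedef struct {
    const char *name;      /* stands in for the block's PmmName */
    int         del_key;   /* deletion key given at allocation  */
    bool        allocated;
} model_block;

/* Explicit free by name: allowed for any allocated block,
   including one keyed to the restart group. */
int model_free(model_block *b)
{
    assert(b != NULL);
    if (!b->allocated)
        return -1;
    b->allocated = false;
    return 0;
}

/* Bulk free by key: refused when the caller passes the group key. */
int model_free_all(model_block *blocks, size_t n, int del_key)
{
    assert(blocks != NULL || n == 0);
    if (del_key == MODEL_GROUP_KEY)
        return -1;
    for (size_t i = 0; i < n; i++)
        if (blocks[i].allocated && blocks[i].del_key == del_key)
            blocks[i].allocated = false;
    return 0;
}

/* Called in the model when the last group member terminates cleanly:
   blocks keyed to the group are released automatically. */
void model_group_terminated(model_block *blocks, size_t n)
{
    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (blocks[i].del_key == MODEL_GROUP_KEY)
            blocks[i].allocated = false;
}
```

The trade-off is visible in the model: a group-keyed block needs no explicit free, but it stays allocated until the whole group terminates unless the process frees it by name.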

Clean Termination

Any process that exits before the expected completion of its task is considered to have aborted abnormally and will cause a restart of its process group. This can be useful for cases where the process exits prematurely as a result of an error. This mechanism can also be useful for invoking a process restart where this is required, for example, if an execution problem is detected.

To enable a restartable process to terminate cleanly without causing a restart, use the HR_EXIT_HDL() macro prior to the call to exit(3STDC):

#include <hr/hr.h>
HR_EXIT_HDL();

The preceding macro registers an additional hot restart exit handler through atexit(3STDC). The hot restart exit handler effectively removes the process in question from the Hot Restart Controller's responsibility. After a process has called HR_EXIT_HDL(), the Hot Restart Controller will no longer monitor the process for abnormal termination. As a result, when the process exits, it will terminate cleanly and not trigger a restart.

The HR_EXIT_HDL() macro should be called shortly before the process exits. Calling this macro earlier in the process code will mean that any unexpected exit between the macro call and the final exit will not be detected by the Hot Restart Controller. As a result, the process will not be restarted if it exits abnormally.
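The following host-side sketch illustrates the pattern using the standard atexit() function. It is not the implementation of HR_EXIT_HDL(): the sketch_ names are hypothetical, and the "monitored" flag is only a stand-in for the Hot Restart Controller watching the process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for "the controller treats an exit as abnormal". */
static bool sketch_monitored = true;
static bool sketch_handler_installed = false;
static bool sketch_handler_ran = false;

/* Runs during exit(); a real handler would do the hot restart
   bookkeeping, the sketch only records that it was invoked. */
static void sketch_exit_handler(void)
{
    sketch_handler_ran = true;
}

/* Hypothetical counterpart of HR_EXIT_HDL(): install the exit handler
   once, then stop treating this process's exit as a failure. */
int sketch_exit_hdl(void)
{
    if (!sketch_handler_installed) {
        if (atexit(sketch_exit_handler) != 0)
            return -1;
        sketch_handler_installed = true;
    }
    sketch_monitored = false;
    return 0;
}

/* True when an exit at this point would count as clean in the model. */
bool sketch_exit_is_clean(void)
{
    /* Monitoring is only switched off after the handler is installed. */
    assert(sketch_monitored || sketch_handler_installed);
    return !sketch_monitored;
}
```

In the model, any exit that happens before sketch_exit_hdl() is called counts as abnormal, and any exit after it counts as clean. This is why the call belongs immediately before the final exit rather than near the start of main().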

Cleanly terminating a process does not unregister the process in the Hot Restart Controller or remove the process's process image and executing image from persistent memory. This is because a cleanly terminated process will still be restarted if its group is restarted (because a group is always restarted in its initial state). In other words, when a group is restarted, all direct restartable processes will recommence execution at their initial entry point, regardless of whether or not they had already exited before the restart occurred. This is demonstrated by the following diagram. Both direct process one (DP1) and indirect process two (IP2) terminate cleanly, but are automatically restarted when direct process two (DP2) crashes.

Figure 15-1 Restart of Cleanly Terminated Processes


Because of this behavior, it is useful to record the clean termination of a restartable process that never needs to be fully re-executed during the life of its group. To do so, set a flag in persistent memory when the process terminates cleanly. A restarted process can check the state of this flag at the start of its execution, and so determine whether or not it should re-execute.
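A minimal sketch of this flag check is shown below. The structure stands in for a block held in persistent memory, so it is passed in by the caller rather than obtained from the Persistent Memory Manager; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for data kept in a persistent memory block. In a real
   application this would survive a restart; here the caller keeps it. */
typedef struct {
    bool task_done;    /* set once the process has completed its task */
    int  runs;         /* how many times the entry point was executed */
    int  work_done;    /* how many times the task itself was carried out */
} model_persistent;

/* Body of the entry point of a restartable process.
   Returns true if the task was performed during this execution. */
bool model_entry(model_persistent *pm)
{
    assert(pm != NULL);
    pm->runs++;

    /* Check the flag first: after a group restart the process is
       re-executed even if it had already terminated cleanly. */
    if (pm->task_done)
        return false;

    pm->work_done++;          /* the task itself                     */
    pm->task_done = true;     /* record completion before exiting    */
    return true;
}
```

Each call to model_entry() represents one execution from the initial entry point. Only the first execution performs the task; later executions see the flag and return at once.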

Group Termination

For each group of restartable processes present in a ChorusOS system, the Hot Restart Controller stores a list of the group's processes in a persistent memory block. A process is added to the list when it is first started. When a process cleanly terminates, the Hot Restart Controller notes this in the list. When all processes in the list have terminated cleanly, the Hot Restart Controller performs the following:

A group of processes can only terminate if all of its member processes terminate cleanly. This is important to remember in situations where not all indirect processes are restarted after a group restart. This is a matter of execution flow: if certain conditions in a direct process change the process's flow from one execution to another, the direct process may not restart an indirect process that was running prior to the restart. As a result, the indirect process will never terminate cleanly and so the group will not be able to terminate.

For example, consider the situation in the following diagram. The direct process spawns the indirect process only after certain conditions are met. These conditions are met the first time the direct process runs. After the direct process restarts, the conditions are no longer satisfied, so the indirect process is no longer spawned.

Figure 15-2 Conditional Spawning of a Restartable Process


In the preceding diagram, the process group will not be able to terminate until the indirect process has been rerun using hrfexec(), and has terminated cleanly.
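The termination rule can be sketched as a check over the group's process list. This is a host-side model with hypothetical names; it only illustrates that a single member without a recorded clean exit keeps the whole group alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one entry in the controller's per-group list. */
typedef struct {
    const char *name;
    bool        clean_exit;   /* noted when the member terminates cleanly */
} model_group_entry;

/* Number of members that still prevent the group from terminating. */
size_t model_count_blockers(const model_group_entry *list, size_t n)
{
    size_t blockers = 0;

    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!list[i].clean_exit)
            blockers++;
    return blockers;
}

/* The group can terminate only when every listed member exited cleanly. */
bool model_group_can_terminate(const model_group_entry *list, size_t n)
{
    return model_count_blockers(list, n) == 0;
}
```

In the conditional spawning case, the indirect process stays in the list without a clean exit after the restart, so the model reports one blocker until that process is run again and terminates cleanly.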

When a restart group cannot terminate because of one or more direct processes, the Hot Restart Controller detects this situation and displays the following message on the target console:


HR_CTRL: group gid blocked, some members have 
not terminated: list_of_processes

gid is the ID of the group in question, and list_of_processes provides the name of each process which prevents the group from terminating. When this message is displayed, a common solution is to kill the process group using the akill command with the -g option. However, this solution is useful only if none of the indirect processes need to be run to complete the group's task.

A better solution is to use careful application design. If the preceding situation is likely to occur, flags can be stored in persistent memory to identify indirect processes that have not terminated cleanly. A process can then be made responsible for cleaning up the group, that is, restarting each indirect process that is flagged. This clean-up process can be run using the arun -g command when the Hot Restart Controller notification is displayed on the target console. Alternatively, the group could be designed so that the clean-up process is always run just before the group is expected to terminate. In this case the problem is solved without accessing the C_INIT console.

Killing Restartable Processes

At times it may be necessary to circumvent the automatic restart mechanism provided by the Hot Restart Controller and explicitly terminate (kill) a restartable process. Processes which are killed will not be restarted. Killing a process automatically kills all processes within the process's restart group. This is because a restart group must remain consistent. The restart group may not be able to function correctly if a process is no longer available.

Restartable processes can be explicitly killed using either of the following:

Either method produces the same result: all processes in the associated restart group are killed. The Hot Restart Controller terminates the group as though all processes had exited cleanly (see "Group Termination").

Site Restart

A site restart is a hot restart of the entire system. All data of boot processes are reset to their original values from the previously loaded system image and the system enters its start-up phase again. As C_INIT restarts, sysadm.ini is reexecuted. Any calls to start restartable processes in the sysadm.ini file are ignored for a site restart because all direct restartable processes are automatically restarted by the system after the sysadm.ini file has been read.


Note -

When the system is restarted, previously mounted disks are not automatically remounted. To resolve this issue, ensure that the disks are mounted in the sysadm.ini file, or create a hot restartable process that will automatically mount the disks.


A site restart can be invoked automatically by the Hot Restart Controller, according to the tunable parameters that define the system's restart policy. For more information, see "Tunable Parameters".

To invoke a site restart programmatically, use the sysShutdown(2K) function call with the -i 1 arguments:

int sysShutdown (int argc, char** argv)
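Because sysShutdown() takes an argument count and an argument vector, the -i 1 arguments are passed as strings. The sketch below shows one way to build such a vector. It rests on two assumptions that should be checked against the sysShutdown(2K) man page: that argv[0] holds a command name, as in a conventional argv, and that the vector is NULL-terminated. The function stub_sysShutdown() is a hypothetical host-side stand-in with the same prototype; it only validates the arguments and does not restart anything.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for sysShutdown(2K) so the sketch runs on a
   host. It accepts exactly "<name> -i 1" and returns 0, else -1. */
int stub_sysShutdown(int argc, char **argv)
{
    assert(argv != NULL);
    if (argc != 3)
        return -1;
    if (strcmp(argv[1], "-i") != 0 || strcmp(argv[2], "1") != 0)
        return -1;
    return 0;
}

/* Builds the argument vector for a site restart request.
   Assumption: argv[0] is a command name, as in a conventional argv. */
int request_site_restart(void)
{
    char arg0[] = "shutdown";
    char arg1[] = "-i";
    char arg2[] = "1";
    char *argv[] = { arg0, arg1, arg2, NULL };

    return stub_sysShutdown(3, argv);
}
```

On a target system the call to the stub would be replaced by a call to sysShutdown() itself, with the argument layout adjusted to whatever the man page specifies.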

To provoke a site restart from the C_INIT command-line console, use the shutdown -i 1 command or the restart(1M) command.

Putting It All Together: the restartSpawn Example Program

The following code example, restartSpawn, illustrates many of the function calls covered in this and previous chapters. The example is provided as an overview of the restart mechanism and the use of persistent memory. Specific parts of the example could be used as the basis of a more complex user application that incorporates hot restart.

The restartSpawn example uses two restartable processes: a parent process, HR_parent.r, and a child process, HR_child.r, which is spawned by the parent. Both processes should be compiled as supervisor processes. The source code for the two processes is provided in "Example Application Code". The example can be summarized as follows:

The message is displayed independently of the number of times the parent process crashes, or the site is restarted.