One way to understand how the Hot Restart Controller API is used is to consider it in the context of the run-time life cycle of a restartable process. A restartable process's code is not necessarily executed just once (from the start of main() to its final return); it may be re-executed several times if there are multiple restarts. Data that is initialized and processes that are loaded during the first execution only need to be retrieved or restarted on subsequent executions. It is therefore useful to view the restart API first in the context of the initial execution, and then in the context of subsequent executions.
This section looks at the way the Hot Restart Controller API is used in the context of the life-cycle of a typical restartable process.
Use the C_INIT command arun with the -g option, or the function call hrfexec(), to load a restartable process from stable storage into persistent memory. Both arun and hrfexec() support specifying the persistent credentials of a restartable process when the process is initially loaded.
For a direct process that was run with the arun command, the process name will be system-generated, and the group ID is passed using the -g option. If the group ID is not already in use, a new group is created which contains the direct process. If the group ID already exists, the direct process is added to the corresponding restart group. If no ID is passed after -g, the process is started in the restart group with ID 0.
A restart group can contain any number of direct processes.
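The group-assignment rules for arun -g amount to a small create-or-join decision. The following C sketch is a hypothetical model only: group_table, model_arun_join() and NO_GROUP_ID are illustrative names, not part of the ChorusOS API, and the model assumes that group 0 follows the same create-or-join rule as any other group ID.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of how "arun -g" assigns a direct process to a
   restart group. Not a ChorusOS API; it only mirrors the rules above. */
#define MAX_GROUPS 16
#define NO_GROUP_ID (-1) /* stands for "-g" given without an ID */

struct group_table {
    int ids[MAX_GROUPS];        /* restart group IDs in use */
    size_t members[MAX_GROUPS]; /* direct processes in each group */
    size_t count;               /* number of groups in use */
};

/* Returns the group ID the new direct process joins, or -1 if the table
   is full. *created reports whether a new group had to be created. */
int model_arun_join(struct group_table *t, int requested_gid, bool *created)
{
    /* No ID after -g: the process goes to the restart group with ID 0. */
    int gid = (requested_gid == NO_GROUP_ID) ? 0 : requested_gid;

    for (size_t i = 0; i < t->count; i++) {
        if (t->ids[i] == gid) {   /* ID already in use: join that group */
            t->members[i]++;
            *created = false;
            return gid;
        }
    }
    if (t->count == MAX_GROUPS)
        return -1;
    t->ids[t->count] = gid;       /* unused ID: create a new group */
    t->members[t->count] = 1;
    t->count++;
    *created = true;
    return gid;
}
```

For example, two arun -g 5 commands place both direct processes in group 5, while an arun -g with no ID creates (or joins) group 0.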
For an indirect process run with hrfexec(), the process name is specified using a PmmName structure. An indirect process automatically becomes a member of the same process group as the process that spawned it.
Processes that were created directly using acreate(2K) or actorCreate(2K) are not hot restartable and are unable to use the Hot Restart Controller API.
When a process is run as a restartable process, the Hot Restart Controller checks whether a process with the specified name is already registered. If it is not (as with the initial load), the Hot Restart Controller first asks the Persistent Memory Manager to allocate the persistent memory blocks that will hold the process's process image and executing image. If this succeeds, the Hot Restart Controller registers the name of the new process as a restartable process running in the specified group.
The subsequent load and start of the persistent process is the same as for a process run using a member of the afexec(2K) function family (see the man page for a description of this process). The only difference is that the process is loaded from its process image (in persistent memory) and not from stable storage.
A restartable process's name remains registered in the Hot Restart Controller for the life of its process group. The lifespan of the group may extend beyond the lifespan of the process. It is the programmer's responsibility to ensure that no two restartable processes will attempt to register with the same name in the Hot Restart Controller.
After a restartable process has been registered and loaded, it runs under the control of the Hot Restart Controller. If the process fails, the failure triggers the restart of all direct members of its restart group. These direct processes are responsible for restarting any indirect processes registered in the group. To query a process's restart group, use hrGetprocessGroup(2RESTART):
#include <hr/hr.h>
hrGetprocessGroup(int pid)
In the context of hot restart, a process is considered to have terminated abnormally (and will therefore invoke the restart of its group) if any of the following occur:
Unrecoverable error (division by zero, unresolved page fault, invalid op code, and so forth).
Premature exit call, that is, an exit call prior to the expected completion of the process's task.
The process is killed without using the restart-specific command (akill(1M) with the -g option) or function call (hrKillGroup(2RESTART)) provided for this purpose.
There is no single API call that explicitly forces a group of processes to restart. When a restart is desirable (for example, for testing purposes), the easiest way to provoke one is to deliberately trigger one of the conditions listed above. In the "hello world" example introduced in the previous chapter, this was done by causing a segmentation fault.
When a process fails, all processes in the failed restart group stop running and the Hot Restart Controller restarts all direct processes in the group from their initial entry point. The direct processes are responsible for restarting any indirect processes, using hrfexec(). When hrfexec() is called with a name that is already registered in the Hot Restart Controller, the Controller recognizes the process name and restarts the process from the process image, instead of loading it from stable storage.
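The name-based decision the Hot Restart Controller makes each time a restartable process is run (first registration on the initial load, reuse of the process image afterwards) can be modeled as follows. This is a hypothetical sketch: name_registry, model_run_restartable() and the load_kind values are illustrative and do not correspond to ChorusOS data structures; LOAD_FAILED merely stands in for a failed persistent memory allocation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the controller's name registry; not a ChorusOS API. */
#define MAX_NAMES 16
#define MAX_NAME_LEN 32

enum load_kind {
    LOAD_FROM_STABLE_STORAGE, /* name unknown: register it, load from storage */
    RESTART_FROM_IMAGE,       /* name registered: restart from process image */
    LOAD_FAILED               /* stand-in for a failed allocation */
};

struct name_registry {
    char names[MAX_NAMES][MAX_NAME_LEN];
    size_t count;
};

enum load_kind model_run_restartable(struct name_registry *r, const char *name)
{
    /* Names stay registered for the life of the group, so a second run
       with the same name is treated as a restart. */
    for (size_t i = 0; i < r->count; i++)
        if (strcmp(r->names[i], name) == 0)
            return RESTART_FROM_IMAGE;

    if (r->count == MAX_NAMES || strlen(name) >= MAX_NAME_LEN)
        return LOAD_FAILED;

    strcpy(r->names[r->count], name); /* length checked above */
    r->count++;
    return LOAD_FROM_STABLE_STORAGE;
}
```

The model also shows why two different restartable processes must never use the same name: the second one would be treated as a restart of the first.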
A restartable process is always restarted at the same address. Its capability, process ID and user ID will not necessarily be the same after restart. All system resources obtained before the restart are lost: in particular, open files are lost, including those inherited when the process was initially created. This may include standard I/O connected to an rsh connection.
A restarted process uses the same arguments and environment parameters that were specified when the process was initially started. For direct restartable processes, a new set of pre-opened stdin/stdout/stderr is provided, connected to /dev/console. For indirect members, a new set of pre-opened stdin/stdout/stderr is provided by the caller of hrfexec(), just as for afexec(2K).
Just like any process, a restartable process can free persistent memory blocks using pmmFree() or pmmFreeAll(). This is described in "Freeing a Persistent Memory Block".
Restartable processes which allocate memory with pmmAllocate() can also use a basic automatic deallocation mechanism provided by the Hot Restart Controller. This saves the process from having to free its persistent memory explicitly. Instead, the persistent memory will remain allocated for the lifespan of the process's group, and then be freed automatically by the Hot Restart Controller when the last member of the process's restart group terminates cleanly. The disadvantage of this system is that the lifespan of the restart group may extend well beyond the point at which the memory block is no longer required. In this situation, the memory block will take up space in persistent memory unnecessarily.
To mark a persistent memory block for automatic de-allocation by the Hot Restart Controller, pass the macros HR_GROUP_KEY and HR_GROUP_KEYSIZE as the delKey and delKeySize arguments, respectively, in the call to pmmAllocate(). These macros tie the lifespan of the persistent memory block to the lifespan of the calling process's restart group.
A block marked for automatic de-allocation by the Hot Restart Controller can still be freed explicitly by calling pmmFree() with the block's PmmName. However, calling pmmFreeAll() with the HR_GROUP_KEY and HR_GROUP_KEYSIZE macros is not permitted and results in an error.
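The rules for blocks allocated with the group deletion key can be summarized with a small bookkeeping model. This C sketch is hypothetical and only loosely follows pmmFree() and pmmFreeAll(): block_table, model_allocate(), model_free(), model_free_all() and model_group_terminated() are illustrative names, and MODEL_GROUP_KEY is a stand-in for HR_GROUP_KEY, whose real value and type are not given here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of deletion-key bookkeeping; not a ChorusOS API.
   MODEL_GROUP_KEY stands in for HR_GROUP_KEY. */
#define MAX_BLOCKS 8
#define MODEL_GROUP_KEY (-1)

struct block_model {
    const char *name; /* stands in for the block's PmmName */
    int del_key;      /* deletion key given at allocation time */
    bool allocated;
};

struct block_table {
    struct block_model blocks[MAX_BLOCKS];
    size_t count;
};

int model_allocate(struct block_table *t, const char *name, int del_key)
{
    if (t->count == MAX_BLOCKS)
        return -1;
    t->blocks[t->count] = (struct block_model){ name, del_key, true };
    t->count++;
    return 0;
}

/* Frees every allocated block carrying del_key; returns how many. */
static int free_by_key(struct block_table *t, int del_key)
{
    int freed = 0;
    for (size_t i = 0; i < t->count; i++) {
        if (t->blocks[i].allocated && t->blocks[i].del_key == del_key) {
            t->blocks[i].allocated = false;
            freed++;
        }
    }
    return freed;
}

/* Like pmmFree(): any allocated block, group key or not, can be freed by name. */
int model_free(struct block_table *t, const char *name)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->blocks[i].allocated && strcmp(t->blocks[i].name, name) == 0) {
            t->blocks[i].allocated = false;
            return 0;
        }
    }
    return -1;
}

/* Like pmmFreeAll(): frees by key, but refuses the group key. */
int model_free_all(struct block_table *t, int del_key)
{
    if (del_key == MODEL_GROUP_KEY)
        return -1;
    return free_by_key(t, del_key);
}

/* What the controller does with group-key blocks when the group terminates. */
int model_group_terminated(struct block_table *t)
{
    return free_by_key(t, MODEL_GROUP_KEY);
}
```

In this model, a group-key block can disappear in only two ways: an explicit free by name, or the clean termination of the whole group.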
Any process that exits before the expected completion of its task is considered to have terminated abnormally and causes a restart of its process group. This is useful when the process exits prematurely as a result of an error. The same mechanism can also be used to deliberately invoke a restart when one is required, for example, if an execution problem is detected.
To enable a restartable process to terminate cleanly without causing a restart, use the HR_EXIT_HDL() macro prior to the call to exit(3STDC):
#include <hr/hr.h>
HR_EXIT_HDL();
The preceding macro registers an additional hot restart exit handler with the process's atexit(3STDC) mechanism. This handler effectively removes the process from the Hot Restart Controller's responsibility. After a process has called HR_EXIT_HDL(), the Hot Restart Controller no longer monitors the process for abnormal termination. As a result, when the process exits, it terminates cleanly and does not trigger a restart.
The HR_EXIT_HDL() macro should be called shortly before the process exits. Calling this macro earlier in the process code will mean that any unexpected exit between the macro call and the final exit will not be detected by the Hot Restart Controller. As a result, the process will not be restarted if it exits abnormally.
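The termination rules given so far can be collected into a single decision function. The following C sketch is a hypothetical model, not ChorusOS code: term_cause, termination and triggers_group_restart() are illustrative names. The text does not say how a fault that occurs after HR_EXIT_HDL() is handled, so the model simply treats every fault as abnormal.

```c
#include <stdbool.h>

/* Hypothetical model of how a termination is classified; not a ChorusOS API. */
enum term_cause {
    TERM_FAULT, /* unrecoverable error: division by zero, invalid op code... */
    TERM_EXIT,  /* the process called exit() */
    TERM_KILL   /* the process was killed */
};

struct termination {
    enum term_cause cause;
    bool exit_handler_registered; /* HR_EXIT_HDL() was called earlier */
    bool killed_via_group_kill;   /* akill -g or hrKillGroup() was used */
};

/* Returns true when the termination triggers a restart of the group. */
bool triggers_group_restart(const struct termination *t)
{
    switch (t->cause) {
    case TERM_EXIT:
        /* Without the hot restart exit handler, any exit is premature. */
        return !t->exit_handler_registered;
    case TERM_KILL:
        /* Only the restart-specific kill terminates cleanly. */
        return !t->killed_via_group_kill;
    case TERM_FAULT:
    default:
        /* Modeling assumption: faults are always treated as abnormal. */
        return true;
    }
}
```

The model also illustrates the timing pitfall: once exit_handler_registered is true, any exit is classified as clean, whether or not the task was actually complete.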
Cleanly terminating a process does not unregister the process in the Hot Restart Controller or remove the process's process image and executing image from persistent memory. This is because a cleanly terminated process will still be restarted if its group is restarted (because a group is always restarted in its initial state). In other words, when a group is restarted, all direct restartable processes will recommence execution at their initial entry point, regardless of whether or not they had already exited before the restart occurred. This is demonstrated by the following diagram. Both direct process one (DP1) and indirect process two (IP2) terminate cleanly, but are automatically restarted when direct process two (DP2) crashes.
Because of this behavior, it is useful to record, by setting a flag in persistent memory, the clean termination of any restartable process that should not be completely re-executed during the group's lifetime. A restarted process can check this flag at the start of its execution to decide whether it needs to re-execute.
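A minimal sketch of the completion-flag pattern follows, assuming the flag lives in a block the process keeps in persistent memory. Here task_state and model_entry_point() are hypothetical names, and the block is modeled as an ordinary struct so the pattern can run anywhere; in a real process it would be a persistent memory block allocated with pmmAllocate(), and each call to model_entry_point() stands for one execution of the process from its entry point.

```c
#include <stdbool.h>

/* Hypothetical layout of a block kept in persistent memory. */
struct task_state {
    bool task_done;   /* set once the task has completed */
    unsigned entries; /* times the entry point has been entered */
};

/* Body of a restartable process's entry point. Returns true if the task
   ran, false if a previous execution had already completed it. */
bool model_entry_point(struct task_state *st)
{
    st->entries++;
    if (st->task_done)
        return false; /* restarted after completion: skip the work */

    /* ... perform the task here ... */

    st->task_done = true; /* record completion before the clean exit */
    /* A real process would now call HR_EXIT_HDL() and exit(). */
    return true;
}
```

On the skip path, a real process would still need to call HR_EXIT_HDL() before exiting, so that it terminates cleanly again after the group restart.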
For each group of restartable processes present in a ChorusOS system, the Hot Restart Controller stores a list of the processes for each group in a persistent memory block. A process is added to the list when it is first started. When a process cleanly terminates, the Hot Restart Controller notes this in the list. When all processes in the list have terminated cleanly, the Hot Restart Controller performs the following:
Deallocates the persistent memory blocks used to store the images of the terminated processes, as well as blocks that were allocated using the HR_GROUP_KEY and HR_GROUP_KEYSIZE deletion key macros. The process names used by the processes can then be reused by other restartable processes (which will be loaded into memory as new processes).
Adds the group ID to the list of available IDs for new process groups.
A group of processes can only terminate if all of its member processes terminate cleanly. This is important to remember in situations where not all indirect processes are restarted after a group restart. This is a matter of execution flow: if certain conditions in a direct process change the process's flow from one execution to another, the direct process may not restart an indirect process that was running prior to the restart. As a result, the indirect process will never terminate cleanly and so the group will not be able to terminate.
For example, consider the situation in the following diagram. The direct process spawns the indirect process only after certain conditions are met. These conditions are met the first time the direct process runs. After the direct process restarts, the conditions are no longer satisfied, so the indirect process is no longer spawned.
In the preceding diagram, the process group will not be able to terminate until the indirect process has been rerun using hrfexec(), and has terminated cleanly.
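The per-group list the Hot Restart Controller keeps, and the way a group restart can leave it blocked, can be modeled as follows. This C sketch is hypothetical: group_model, group_add(), group_mark_clean(), group_restart() and group_blocking() are illustrative names, not ChorusOS calls. It assumes, as described above, that a group restart returns every member to its initial state, so earlier clean terminations no longer count.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the controller's per-group member list. */
#define MAX_MEMBERS 8

struct group_model {
    const char *names[MAX_MEMBERS];
    bool clean[MAX_MEMBERS]; /* terminated cleanly since the last (re)start */
    size_t count;
};

int group_add(struct group_model *g, const char *name)
{
    if (g->count == MAX_MEMBERS)
        return -1;
    g->names[g->count] = name;
    g->clean[g->count] = false;
    g->count++;
    return 0;
}

/* Records a clean termination. Returns true when every member has now
   terminated cleanly, that is, when the group itself can terminate. */
bool group_mark_clean(struct group_model *g, const char *name)
{
    bool all_clean = true;
    for (size_t i = 0; i < g->count; i++) {
        if (strcmp(g->names[i], name) == 0)
            g->clean[i] = true;
        if (!g->clean[i])
            all_clean = false;
    }
    return all_clean;
}

/* A group restart brings every member back to its initial state. */
void group_restart(struct group_model *g)
{
    for (size_t i = 0; i < g->count; i++)
        g->clean[i] = false;
}

/* Collects the members still preventing termination into out[];
   returns how many were found. */
size_t group_blocking(const struct group_model *g, const char **out)
{
    size_t n = 0;
    for (size_t i = 0; i < g->count; i++)
        if (!g->clean[i])
            out[n++] = g->names[i];
    return n;
}
```

Replaying the diagram's scenario: the direct process "dp" and the indirect process "ip" both terminate cleanly, the group restarts, "dp" finishes again without respawning "ip", and "ip" is reported as the member blocking the group.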
When a restart group cannot terminate because one or more of its member processes have not terminated cleanly, the Hot Restart Controller detects this situation and displays the following message on the target console:
HR_CTRL: group gid blocked, some members have not terminated: list_of_processes
gid is the ID of the group in question, and list_of_processes provides the name of each process which prevents the group from terminating. When this message is displayed, a common solution is to kill the process group using the akill command with the -g option. However, this solution is useful only if none of the indirect processes need to be run to complete the group's task.
A better solution is to use careful application design. If the preceding situation is likely to occur, flags can be stored in persistent memory to identify indirect processes that have not terminated cleanly. A process can then be made responsible for cleaning up the group, that is, restarting each indirect process that is flagged. This clean-up process can be run using the arun -g command when the Hot Restart Controller notification is displayed on the target console. Alternatively, the group could be designed so that the clean-up process is always run just before the group is expected to terminate. In this case the problem is solved without accessing the C_INIT console.