ChorusOS 4.0 Hot Restart Programmer's Guide

4.2.4 Clean Termination

As described in "4.2.2 Group Restart", any actor that exits before the expected completion of its task is considered to have aborted abnormally and causes its actor group to be restarted. This is the desired behavior when the actor exits prematurely because of an error. The mechanism can also be used to provoke a restart deliberately, for example when the actor detects an execution problem.

To allow a restartable actor to terminate cleanly without causing a restart, use the HR_EXIT_HDL() macro prior to the call to exit(3STDC):

#include <hr/hr.h>
HR_EXIT_HDL();

This macro registers an additional exit handler, the hot restart exit handler, through the actor's atexit(3STDC) function. The hot restart exit handler removes the actor from the Hot Restart Controller's responsibility: once an actor has called HR_EXIT_HDL(), the Hot Restart Controller no longer monitors it for abnormal termination. As a result, when the actor exits, it terminates cleanly and does not trigger a restart.

Call the HR_EXIT_HDL() macro shortly before the actor exits. If the macro is called earlier in the actor code, any unexpected exit between the macro call and the final exit goes undetected by the Hot Restart Controller, so the actor is not restarted even though it exited abnormally.
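The following sketch shows where the call belongs in an actor's code. It assumes a hypothetical CHORUSOS_TARGET build flag: on the target, <hr/hr.h> supplies the real HR_EXIT_HDL() macro; off target, a stand-in macro that only counts calls is defined so the example compiles on a host. The task function do_task() and its input convention are illustrative only.

```c
#include <stdio.h>
#include <stdlib.h>

#if defined(CHORUSOS_TARGET)          /* hypothetical build flag */
#include <hr/hr.h>                    /* real HR_EXIT_HDL() on ChorusOS */
#else
/* Host-only stand-in so the sketch compiles off target: it merely counts
 * invocations instead of registering the hot restart exit handler. */
static int hr_exit_hdl_calls = 0;
#define HR_EXIT_HDL() ((void)++hr_exit_hdl_calls)
#endif

/* Hypothetical task: returns 0 on success, -1 on an execution problem. */
static int do_task(int input)
{
    if (input < 0) {
        fprintf(stderr, "actor: invalid input %d\n", input);
        return -1;
    }
    printf("actor: processed %d\n", input);
    return 0;
}

/* Body of a restartable actor; the caller passes the result to exit(). */
int actor_main(int input)
{
    if (do_task(input) != 0) {
        /* Exiting without HR_EXIT_HDL(): the Hot Restart Controller
         * treats this as abnormal and restarts the actor group. */
        return EXIT_FAILURE;
    }

    /* Task complete: register the hot restart exit handler only now,
     * immediately before exiting, so earlier failures are still caught. */
    HR_EXIT_HDL();
    return EXIT_SUCCESS;
}
```

On the target, main() would typically end with exit(actor_main(input)). Because HR_EXIT_HDL() is reached only after do_task() succeeds, an error return still exits without the hot restart exit handler and therefore still causes the group to be restarted.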

Cleanly terminating an actor does not deregister it from the Hot Restart Controller, nor does it remove its actor image and executing image from persistent memory. This is because a cleanly terminated actor is still restarted if its group is restarted, since a group is always restarted in its initial state. In other words, when a group is restarted, all direct restartable actors recommence execution at their initial entry point, regardless of whether they had already exited before the restart occurred. This is shown in the following diagram. Both direct actor 1 (DA1) and indirect actor 2 (A2) terminate cleanly, but are restarted when direct actor 2 (DA2) crashes.

Figure 4-1 Restart of Cleanly Terminated Actors

Because of this behavior, it can be useful to set a flag in persistent memory when a restartable actor that should never need to be re-executed in full during the group's lifetime terminates cleanly. A restarted actor can check this flag at the start of its execution to determine whether it needs to re-execute.
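A minimal sketch of this pattern follows. It assumes a hypothetical struct actor_pstate that lives in a persistent memory block the actor has already located; how that block is allocated and found again after a restart is not shown. The actor sets the flag only once its one-time work has completed, so a crash before that point still leads to a full re-execution.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical record kept in a persistent memory block so that it
 * survives a group restart (ordinary actor data is reset). */
struct actor_pstate {
    bool done;   /* true once the one-time work has completed */
};

/* Entry logic of a restartable actor. Returns true if the one-time work
 * ran, false if it was skipped because an earlier execution finished it.
 * In either case the actor would then call HR_EXIT_HDL() and exit. */
bool actor_entry(struct actor_pstate *ps)
{
    if (ps->done) {
        /* Restarted with its group after an earlier clean termination. */
        return false;
    }

    puts("actor: performing one-time work");   /* placeholder for real work */

    /* Record completion only after the work succeeded, so a crash during
     * the work still causes a full re-execution on the next restart. */
    ps->done = true;
    return true;
}
```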

4.2.4.1 Group Termination

For each group of restartable actors present in a ChorusOS system, the Hot Restart Controller stores a list of the actors in the group in a persistent memory block. An actor is added to the list when it is first started. When an actor cleanly terminates, the Hot Restart Controller notes this in the list. When all actors in the list have terminated cleanly, the Hot Restart Controller does the following:

A group of actors can only terminate if all of its member actors terminate cleanly. This is important to remember when not all indirect actors are restarted after a group restart, which can happen because of execution flow: if conditions in a direct actor change its flow from one execution to the next, the direct actor may not restart an indirect actor that was running before the restart. That indirect actor then never terminates cleanly, so the group cannot terminate.

For example, consider the situation in the following figure. The direct actor spawns the indirect actor only if a certain condition is fulfilled. This condition is fulfilled the first time the direct actor runs. After the direct actor restarts, the condition is no longer fulfilled, so the indirect actor is no longer spawned.

Figure 4-2 Conditional Spawning of a Restartable Actor

In the situation illustrated above, the actor group will not be able to terminate until the indirect actor has been rerun using hrfexec(), and has terminated cleanly.
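To make the termination rule concrete, here is a toy model of the per-group member list described at the start of this section. It is not the Hot Restart Controller's implementation: struct group, group_add(), group_clean_exit(), group_blockers() and group_can_terminate() are hypothetical names, and the fixed-size array stands in for the persistent memory block.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_MEMBERS 8   /* arbitrary limit for the model */

/* One entry of the (hypothetical) per-group member list. */
struct member {
    char name[32];
    bool terminated_cleanly;
};

/* Toy stand-in for the list the controller keeps in persistent memory. */
struct group {
    int gid;
    struct member members[MAX_MEMBERS];
    size_t count;
};

/* An actor is added to the list when it is first started. */
int group_add(struct group *g, const char *name)
{
    if (g->count >= MAX_MEMBERS)
        return -1;
    struct member *m = &g->members[g->count++];
    snprintf(m->name, sizeof m->name, "%s", name);
    m->terminated_cleanly = false;
    return 0;
}

/* Record that the named actor exited after calling HR_EXIT_HDL(). */
void group_clean_exit(struct group *g, const char *name)
{
    for (size_t i = 0; i < g->count; i++)
        if (strcmp(g->members[i].name, name) == 0)
            g->members[i].terminated_cleanly = true;
}

/* Count the members that have not terminated cleanly and, if out is
 * non-NULL, store up to max of their names in out. */
size_t group_blockers(const struct group *g, const char *out[], size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < g->count; i++) {
        if (!g->members[i].terminated_cleanly) {
            if (out != NULL && n < max)
                out[n] = g->members[i].name;
            n++;
        }
    }
    return n;
}

/* The group can terminate only when no member is still blocking it. */
bool group_can_terminate(const struct group *g)
{
    return group_blockers(g, NULL, 0) == 0;
}
```

In the situation of Figure 4-2, the indirect actor stays in the list without terminating cleanly after the restart, so group_blockers() keeps reporting it until it is rerun and exits cleanly.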

When a restart group cannot terminate because one or more indirect actors are in this situation, the Hot Restart Controller detects this and prints the following message on the target console:


HR_CTRL: group gid blocked, some members have not terminated: list_of_actors

In this message, gid is the ID of the group concerned, and list_of_actors gives the name of each actor that is preventing the group from terminating. When this message appears, a basic solution is to kill the actor group using the akill command with the -g option, as described in "4.3 Killing Restartable Actors". This solution is appropriate only if none of the indirect actors needs to run to complete the group's task.

A better solution is careful application design. If the situation is likely to occur, flags can be stored in persistent memory to mark indirect actors that have not terminated cleanly. An actor can then be made responsible for cleaning up the group, that is, for rerunning each flagged indirect actor. This clean-up actor can be run using the arun -g command when the Hot Restart Controller message appears on the target console. Alternatively, the group can be designed so that the clean-up actor always runs just before the group is expected to terminate, which solves the problem without requiring access to the C_INIT console.
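A sketch of the flagging scheme follows, under several assumptions: struct group_pflags is a hypothetical layout placed in persistent memory so that it survives group restarts, N_INDIRECT is an arbitrary fixed number of indirect actors, and each indirect actor is identified by an index. Each indirect actor sets its flag when it starts and clears it just before calling HR_EXIT_HDL() and exit(3STDC); the clean-up actor collects the flags that are still set and reruns those actors, for example with hrfexec(), whose arguments are not shown here.

```c
#include <stdbool.h>
#include <stddef.h>

#define N_INDIRECT 4   /* hypothetical number of indirect actors */

/* Hypothetical layout of a persistent memory block shared by the group.
 * pending[i] is true while indirect actor i has started but has not yet
 * terminated cleanly. */
struct group_pflags {
    bool pending[N_INDIRECT];
};

/* Called by indirect actor idx at the start of its execution. */
void indirect_started(struct group_pflags *pf, size_t idx)
{
    if (idx < N_INDIRECT)
        pf->pending[idx] = true;
}

/* Called by indirect actor idx just before HR_EXIT_HDL() and exit(). */
void indirect_finished(struct group_pflags *pf, size_t idx)
{
    if (idx < N_INDIRECT)
        pf->pending[idx] = false;
}

/* Used by the clean-up actor: write the indices of indirect actors that
 * still need to be rerun into out (at most max entries) and return how
 * many were written. */
size_t pending_actors(const struct group_pflags *pf, size_t out[], size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < N_INDIRECT && n < max; i++)
        if (pf->pending[i])
            out[n++] = i;
    return n;
}
```

Because the flags live in persistent memory, an indirect actor that was running before a restart and is not respawned afterwards keeps its flag set, which is exactly the case the clean-up actor needs to find.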