ChorusOS 4.0 Hot Restart Programmer's Guide

Chapter 2 Getting Started With Hot Restart

This chapter describes how to set up your ChorusOS 4.0 system to use the hot restart feature. It covers the following:

Configuring your ChorusOS 4.0 system for hot restart: see "2.1 System Configuration".
Running the graphical hot restart demonstration program provided with Sun Embedded Workshop: see "2.2 Running the Hot Restart Demonstration Program".

Note -

This chapter assumes that you have already correctly installed Sun Embedded Workshop on a host machine, and that you have a target machine which can be booted from a network boot server. You should also be familiar with configuring your ChorusOS 4.0 system and building a system image. For more information on these topics, see the related documents cited in the Preface of this guide.

This chapter does not cover hints for linking and building your own hot restartable applications. For information on this topic, see Appendix A, Hot Restart Programming Environment.

2.1 System Configuration

Before beginning to program and run actors which use the hot restart feature, you will need to update and configure your system for hot restart. System configuration for hot restart involves the following steps:

including the necessary ChorusOS optional features in your system
ensuring that the settings for the tunable parameters used by the hot restart feature are suitable for your system

These steps are described in the sections which follow.

2.1.1 Features

To incorporate hot restart in your ChorusOS 4.0 system, use the ews graphical tool or the configurator(1CC) command line utility to include the following optional features in your system profile:

HOT_RESTART. This feature exports the hot restart API and restart mechanism.
ACTOR_EXTENDED_MNGT, LAPSAFE, LAPBIND and ADMIN_SHUTDOWN. These features provide necessary support for the HOT_RESTART feature.

2.1.2 Memory Requirements and Design Considerations

As stated in Chapter 1, Introduction, the hot restart feature implements persistent memory as a portion of physical memory (RAM) on the target device. Although the persistent memory bank does not itself use virtual memory or swapping, hot restart is compatible with all three of the main memory models: flat, protected, and virtual.

The size of the persistent memory bank is defined in bytes by a system tunable parameter, pmm.rambankSize. This parameter is static: it cannot be modified while the system is running. In addition, because the RAM persistent memory bank does not use virtual memory or swapping, objects in persistent memory are locked in memory until they are freed. For these two reasons, it is important to set pmm.rambankSize to a value that is realistic for the amount of data likely to be stored in persistent memory at any one time.

A portion of space reserved for an object in the persistent memory bank is termed a persistent memory block. A block is a contiguous set of memory pages, which means that the size of a block is always a multiple of the page size. Use vmPageSize(2K) to find out the page size for your platform.

For each running restartable actor, the system stores the following data in persistent memory:

The text and initialized data which were loaded into memory from stable storage. This is known as the actor image. The actor image occupies a single block of persistent memory.
The executed text, initialized data and BSS (data initialized to zero), from which the actor is running. This is known as the actor's executing image. The executing image occupies two blocks of persistent memory: one block for the text and one block for the data. The heap and stack for the executing actor are stored in non-persistent memory.

The persistent memory blocks used to store the actor image and executing image will only be freed when the actor's group terminates cleanly (note that this may be some time after the actor itself has terminated). The actor can also allocate its own blocks of persistent memory to store run-time data while it is executing.

Although it can be difficult to predict the likely required value of pmm.rambankSize early in the development cycle, the following rule of thumb, derived from the statements above, may be of use to developers at the system design stage:

Restartable actors occupy an absolute minimum of twice their size in persistent memory. This minimum will accommodate the actor's actor image and executing image (although it does not allow for rounding up of memory block sizes to the nearest page). The actor may also allocate additional portions of memory. pmm.rambankSize should therefore be greater than twice the combined size of the restartable actors expected to run simultaneously on a system.
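
As a rough illustration of this rule of thumb, the following Python sketch estimates a lower bound for pmm.rambankSize on the host. It is not part of ChorusOS: the function names are hypothetical, the 4096-byte page size is an assumption (use vmPageSize(2K) on the target to find the real value), and the sketch assumes that the data block of the executing image holds the initialized data plus the BSS.

```python
def round_up_to_pages(size_bytes, page_size=4096):
    """Round a size up to a whole number of pages (a block is whole pages)."""
    pages = -(-size_bytes // page_size)  # ceiling division
    return pages * page_size


def actor_persistent_bytes(text, data, bss, extra_blocks=(), page_size=4096):
    """Estimate the persistent memory used by one restartable actor.

    text, data, bss: section sizes in bytes (data = initialized data).
    extra_blocks: sizes of blocks the actor allocates itself at run time.
    """
    # Actor image: text and initialized data, in a single block.
    actor_image = round_up_to_pages(text + data, page_size)
    # Executing image: one block for text, one for data (assumed data + BSS).
    exec_text = round_up_to_pages(text, page_size)
    exec_data = round_up_to_pages(data + bss, page_size)
    extra = sum(round_up_to_pages(b, page_size) for b in extra_blocks)
    return actor_image + exec_text + exec_data + extra


def min_rambank_size(actors, page_size=4096):
    """Sum the estimate over (text, data, bss) tuples for all actors."""
    return sum(actor_persistent_bytes(t, d, b, page_size=page_size)
               for (t, d, b) in actors)
```

Under these assumptions, an actor with 10000 bytes of text, 3000 bytes of initialized data and 2000 bytes of BSS needs 36864 bytes of persistent memory, somewhat more than twice its 15000-byte size because each block is rounded up to whole pages. The result is a lower bound only: it can be compared with the current pmm.rambankSize setting, leaving headroom for blocks that actors allocate themselves.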

Note -

Sharing persistent memory blocks between user actors, or between user and supervisor actors, is not supported. Persistent memory blocks can only be shared between supervisor actors.

The default value of the pmm.rambankSize tunable parameter is 1024*1024 bytes, that is, one megabyte.

2.1.3 Tunable Parameters

The HOT_RESTART feature uses a number of system tunable parameters. Each parameter has a default value which can serve as a guideline and is generally suitable for getting started with hot restart programming. All tunable parameters are static: they cannot be modified while the system is running.

Two parameters define limits for persistent memory occupation in the system's persistent memory bank:

pmm.rambankSize is the maximum amount of persistent memory available in the system, in bytes. The default value is one megabyte (0x100000). See the previous section for guidelines on setting this parameter to suit your system. If you want to run the hot restart demonstration program, you will need to increase the value of this parameter to four megabytes (0x400000).
pmm.maxBlocks is the maximum number of recorded persistent memory blocks which can be allocated in the persistent memory bank. A block is a variable number of contiguous pages of RAM. Each time an actor (supervisor or user) issues a request to store a piece of data in persistent memory, a block of the appropriate size, rounded up to the nearest whole page, is allocated. The default value is 30.
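
The block limit can be checked in the same spirit as the bank size. The Python sketch below is a host-side estimate with hypothetical function names. It assumes that the three blocks used for each restartable actor (one for the actor image, two for the executing image) count against pmm.maxBlocks, along with any blocks the actors allocate themselves.

```python
BLOCKS_PER_ACTOR = 3  # 1 actor image block + 2 executing image blocks


def blocks_needed(num_actors, extra_blocks=0):
    """Blocks used by num_actors restartable actors plus their own allocations."""
    return BLOCKS_PER_ACTOR * num_actors + extra_blocks


def max_actors_for(max_blocks=30, extra_blocks_per_actor=0):
    """Largest number of actors that fits within a given pmm.maxBlocks value."""
    return max_blocks // (BLOCKS_PER_ACTOR + extra_blocks_per_actor)
```

With the default value of 30 and this assumption, at most ten restartable actors fit if none of them allocates additional blocks, and six fit if each allocates two blocks of its own.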

Two parameters control the maximum number of restartable actors and restart groups permitted in the system:

hrCtrl.maxActors is the maximum number of hot restartable actors which can be registered in the system. An actor is registered in the system when it is first run, and remains registered until all the actors in its group have terminated normally. The default value is 32. If hrCtrl.maxActors is greater than 65536, 65536 is used instead.
hrCtrl.maxGroups is the maximum number of restart groups which can be present in the system at the same time. Its default value is 32.

Two parameters define the system's restart policy (see "1.2.4 Site Restart"). These parameters are quite sensitive: different values can produce very different behavior in the system. The system manages a restart counter for each restart group. Each time a group is restarted, the system increases its restart counter by one.

hrCtrl.interval is the period, in seconds, at which a group's restart counter is decreased. Every hrCtrl.interval seconds, the system will decrease the group's restart counter by one (until the counter reaches zero). The default value for hrCtrl.interval is 3 seconds.
hrCtrl.maxBadness is the maximum value a group's restart counter can reach before it triggers a site restart. In other words, when a group's restart counter reaches this value, a site restart is automatically performed. The default value is 25. If set to zero, the system never triggers a site restart.
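
Because these two parameters interact, it can help to model the policy before choosing values. The following Python sketch is a simplified simulation for a single restart group, not the ChorusOS implementation. The function name is hypothetical, and the model assumes that the counter is decreased on a fixed tick at every multiple of hrCtrl.interval seconds, and that a tick falling at the same instant as a restart is applied first.

```python
import math


def first_site_restart(restart_times, interval=3, max_badness=25):
    """Return the time of the first site restart, or None if none occurs.

    restart_times: times (in seconds) at which the group is restarted.
    interval, max_badness: stand-ins for hrCtrl.interval and hrCtrl.maxBadness.
    """
    counter = 0
    previous = 0.0
    for t in sorted(restart_times):
        # Apply the periodic decrements that elapsed since the last restart.
        ticks = math.floor(t / interval) - math.floor(previous / interval)
        counter = max(0, counter - ticks)
        # Each group restart increases the counter by one.
        counter += 1
        # A maxBadness of zero means a site restart is never triggered.
        if max_badness > 0 and counter >= max_badness:
            return t
        previous = t
    return None
```

In this model, with an interval of 4 and a maximum of 2, restarts at 1 and 2 seconds trigger a site restart at the second restart, whereas restarts at 1 and 9 seconds do not, because the counter has dropped back to zero in between.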

2.1.4 Building the System Image

Once you have updated your system's features for hot restart and the tunable parameter settings are appropriate for your needs, you are ready to build the system image.

If you want to run the hot restart demonstration and examples, ensure that you include the examples directory and X11 library in your system build paths if they are not already included. For information on building a system image for your particular target platform, see the corresponding document in the ChorusOS 4.0 Target Family Guide collection.

After the system image has been correctly built, copy it to your boot server and reboot your target machine. You are now ready to begin programming and running applications which use the hot restart feature.

2.2 Running the Hot Restart Demonstration Program

Sun Embedded Workshop includes a graphical demonstration of the hot restart feature. The demonstration is based on the well-known program Xmaze, which has been slightly modified to make it hot restartable. Some of the program's data is stored in persistent memory, which means that when the program is restarted, it starts at a point close to the point it had reached prior to the restart. The resulting application is a ChorusOS actor called xdemo_s.

To run the hot restart demonstration program, do the following:

Ensure that your system features are correctly set for hot restart: see "2.1.1 Features".

Adjust the following system tunable parameters to suit the memory requirements of the Xmaze demonstration program, using ews or the configurator(1CC) command line utility:

Tunable parameter	Description	Required value
`pmm.rambankSize`	Size of persistent memory bank, in bytes	0x400000
`kern.exec.dflSysStackSize`	Default system stack size, in bytes	0x8000

Configure your system image build to include the X11 library and ChorusOS examples directory, if this is not already the case.
If you have made changes to the system image since the previous build, rebuild the system, copy the system image to the appropriate location (for example, the boot directory if you are using tftp-based boot), and reboot the target machine.
Ensure that a copy of the xdemo_s actor is present in a directory which is mounted on the target machine. If you use the make root command, a copy of the actor is already stored in build_dir/root/bin/examples. If this directory is not mounted, or you prefer to use a different mounted directory:
$ cp build_dir/BUILD_EXAMPLES/restartDemo/xdemo_s example_directory
Set the target machine's DISPLAY environment variable to the host machine which you are currently working on:
$ rsh target setenv DISPLAY host_IP_address:0.0
Run the restartable actor:
$ rsh target arun -g 0 example_directory/xdemo_s
The actor will be run as a member of the restart group with group ID 0.

The Xmaze demonstration appears on the screen. As the demonstration runs, it periodically stores its state as data in persistent memory. Let the demonstration advance a little, then restart the actor by typing the following on the host console:

$ rsh target akill aid

aid is the actor identifier which is printed on the host console when the actor starts. The actor is restarted, and the Xmaze demonstration continues from a point close to the point it had reached before the restart.

The akill command provoked the restart because it was invoked without the restart-specific option -g. To kill the Xmaze demonstration actor without restarting it, type:

$ rsh target akill -g 0

As the xdemo_s actor is run from the command line, it is a direct actor, and will be started automatically by the system when the site is restarted. To check this, rerun the actor, and then provoke a site restart by typing the following:

$ rsh target restart

When the system has been re-initialized, the demonstration will be restarted.

Of course, this is a very simple illustration of the use of hot restart. The site restart is provoked manually from the command line. As an alternative, try restarting the actor (using akill without the -g option, as described above) sufficiently frequently to trigger an automatic site restart. To do this, you will first need to set the system's restart policy to be more sensitive to actor failure. The following configuration will cause a site restart if the actor is restarted twice in the space of four seconds:

Tunable parameter	Value
`hrCtrl.interval`	4
`hrCtrl.maxBadness`	2