Solaris Resource Manager 1.3 System Administration Guide

Crash Recovery

There are many concerns for an administrator when a Solaris system has a failure, but there are some additional considerations when a Solaris Resource Manager system is being used. They are:

The following sections discuss these in detail and offer suggestions for handling the situation, where appropriate.

Corruption of the Limits Database

Solaris Resource Manager maintenance of the limits database is robust and corruption is unlikely. However, if corruption does occur, it is of major concern since this database is basic to the operation of Solaris Resource Manager. Any potential corruption should be investigated and, if detected, corrected.

Symptoms

No single symptom can reliably be used to determine whether the limits database has been corrupted, but there are a number of indicators that potentially reflect a corrupted limits database:

If an administrator suspects that there is corruption in the limits database, the best way to detect it is to use limreport to request a list of lnodes with attributes that should have values within a known range. If values outside that range are reported, corruption has taken place. limreport could also be used to list lnodes which have a clear flag.real. This will indicate accounts in the password map for which no lnode exists.

Correction

When corruption is detected, the administrator should revert to an uncorrupted version of the limits database. If the corruption is limited to a small section of the limits database, the administrator may be able to save the contents of all other lnodes and reload them into a fresh limits database using the limreport and limadm commands. This would be preferable if no recent copy of the limits database is available since the new limits database would now contain the most recent usage and accrue attributes. The procedure for saving and restoring the limits database is documented in Chapter 5, Managing Lnodes. For simple cases of missing lnodes it could be sufficient to just recreate them by using the limadm command.

Connect-Time Loss by limdaemon

If limdaemon terminates for any reason, all users currently logged in cease to be charged for any connect-time usage. Furthermore, when limdaemon is restarted, any users logged in will continue to use those terminals free of charge. This is because the daemon relies on login notifications from login to establish a Solaris Resource Manager login session record within the internal structures it uses to calculate connect-time usages. Therefore, whenever it starts, there are no Solaris Resource Manager login sessions established until the first notification is received.

Typically this will not be a problem if limdaemon terminated due to a system crash, since the crash will also cause other processes to terminate. Login sessions would then not be able to recommence until the system is restarted.

If limdaemon terminates for some other reason, the administrator has two choices:

  1. Restart the daemon immediately, and ignore the lost charging of terminal connect-time for users who are already logged in. This could mean that a user has free use of a terminal indefinitely unless identified and logged out.

  2. Bring the system back to single-user mode then return to multi-user mode, thus ensuring that all current login sessions are terminated and users can only log in again after the daemon has been restarted.