Chapter 11

Operating Lights Out Management from Solaris

This chapter explains how to use the LOMlite2-specific commands available in Solaris 8 for monitoring and managing a Netra 20 server.

For an introduction to the LOMlite2 device and a description of an alternative user interface to it, see Chapter 10.

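The chapter contains the following sections: Monitoring the System From Solaris; Configuring the LOMlite2 to Restart the Server Automatically After a Lockup; and Other LOM Tasks Performed From Solaris.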

Monitoring the System From Solaris

To use the Lights Out Management (LOM) facilities, either remotely or locally, you need a terminal connection to the LOM console port on the Netra 20 server.

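There are two ways of interrogating the LOMlite2 device or of sending it commands to perform:

1. By typing LOMlite2 shell commands at the lom> prompt. For information about how to do this, see Chapter 10.

2. By typing LOMlite2-specific Solaris commands at the UNIX # prompt. These commands are described in this chapter.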

The Solaris commands described in this section, which are all available from the UNIX # prompt, run the /usr/sbin/lom utility.

Where appropriate, the command lines given in this section are accompanied by typical output from the commands.

Viewing Online LOMlite2 Documentation

To view the manual pages for the LOMlite2 utility, type:

# man lom

Checking the Power Supply Unit (lom -p)

To check that the input lines and the output line for the power supply unit are working normally, type:

# lom -p
PSUs:
1 OK
#



Note - If there are any failures of the PSU that affect more than just the input or output lines, Solaris will not run. However, if standby power is present, you can still use the LOMlite2 shell commands described in Chapter 10.



Checking the Status of the PSU LEDs (lom -L)

To check whether the PSU LEDs are on or off, type:

# lom -L
LOMlite led states:
1      on      Power
2      off     Fault
3      off     Supply A
4      off     Supply B
5      on      PSU ok
6      off     PSU fail
#



Note - The above example is taken from an AC-powered system. The Supply A and Supply B status LEDs relate to the DC power supply, so both are reported as off.
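If you check the LEDs from a monitoring script, you can turn the lom -L listing into a lookup table. The following is a minimal sketch that assumes every LED line has the "<number> <on|off> <name>" layout shown in the example above; parse_led_states is a hypothetical helper, not part of the lom utility.

```python
import re

# Sample `lom -L` output, copied from the example above.
SAMPLE_OUTPUT = """\
LOMlite led states:
1      on      Power
2      off     Fault
3      off     Supply A
4      off     Supply B
5      on      PSU ok
6      off     PSU fail
"""

# Matches "<number> <on|off> <name>"; the name may contain spaces.
LED_LINE = re.compile(r"\s*\d+\s+(on|off)\s+(.+?)\s*$")


def parse_led_states(text):
    """Map each LED name to True (on) or False (off)."""
    states = {}
    for line in text.splitlines():
        match = LED_LINE.match(line)
        if match:
            states[match.group(2)] = match.group(1) == "on"
    return states


leds = parse_led_states(SAMPLE_OUTPUT)
if leds.get("Fault") or leds.get("PSU fail"):
    print("attention: a fault LED is lit")
else:
    print("no fault LEDs lit")
```

On a live system you would pass the output of subprocess.run(["lom", "-L"], capture_output=True, text=True).stdout instead of the sample text.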



Checking the Fans (lom -f)

To check the status of the fans, type:

# lom -f
Fans:
1 OK speed 99%
2 OK speed 95%
3 OK speed 100%
#

To identify each fan, see Fan Identification. If you need to replace a fan, contact your local Sun sales representative and quote the part number of the component you need. For more information, see Appendix A and the Netra 20 Service and System Reference Manual.
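A script can flag fans that report a status other than OK, or that run slower than a threshold you choose. The following is a minimal sketch based on the lom -f output shown above. The layout of a line for a failed fan is not shown in this chapter, so the sketch treats the speed field as optional rather than silently skipping such a line; parse_fans, fans_needing_attention, and the 80% threshold are hypothetical, not documented limits.

```python
import re

# Sample `lom -f` output, copied from the example above.
SAMPLE_OUTPUT = """\
Fans:
1 OK speed 99%
2 OK speed 95%
3 OK speed 100%
"""

# "<number> <status>", optionally followed by "speed <n>%".
FAN_LINE = re.compile(r"\s*(\d+)\s+(\S+)(?:\s+speed\s+(\d+)%)?")


def parse_fans(text):
    """Return (fan_number, status, speed_percent_or_None) tuples."""
    fans = []
    for line in text.splitlines():
        match = FAN_LINE.match(line)
        if match:
            speed = int(match.group(3)) if match.group(3) else None
            fans.append((int(match.group(1)), match.group(2), speed))
    return fans


def fans_needing_attention(fans, min_speed=80):
    """Fans whose status is not OK or whose speed is below min_speed percent.

    min_speed=80 is an arbitrary example value, not a documented limit.
    """
    return [
        fan for fan in fans
        if fan[1] != "OK" or (fan[2] is not None and fan[2] < min_speed)
    ]
```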

Checking the Internal Circuit Breakers (lom -v)

The -v option displays the status of the Netra 20 server's internal circuit breakers. The status of any circuit breaker that has tripped reads faulty. The system contains two circuit breakers: one for the PSU and one for the System Configuration Card reader. If a circuit breaker has tripped, remove the device connected to the relevant port; the circuit breaker then resets automatically. If the circuit breaker for the System Configuration Card reports a fault, you do not have a valid System Configuration Card inserted. Insert a valid card.

To check the status of the supply rails and internal circuit breakers, type:

# lom -v
Supply voltages:
System status flags (circuit breakers):
 1              SCC status=ok
 2              PSU status=ok
# 
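To pick out tripped circuit breakers automatically, a script only needs the lines of the form "<number> <name> status=<value>". A minimal sketch, assuming the lom -v layout shown above and that a tripped breaker reports a value other than ok (the text above says faulty); tripped_breakers is a hypothetical helper.

```python
import re

# Sample `lom -v` output, copied from the example above.
SAMPLE_OUTPUT = """\
Supply voltages:
System status flags (circuit breakers):
 1              SCC status=ok
 2              PSU status=ok
"""

# "<number> <breaker name> status=<value>"
BREAKER_LINE = re.compile(r"\s*\d+\s+(\S+)\s+status=(\S+)")


def tripped_breakers(text):
    """Return the names of circuit breakers whose status is not 'ok'."""
    tripped = []
    for line in text.splitlines():
        match = BREAKER_LINE.match(line)
        if match and match.group(2) != "ok":
            tripped.append(match.group(1))
    return tripped
```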

Checking the Internal Temperature (lom -t)

To check the internal temperature of the system and also the system's warning and shutdown threshold temperatures, type:

# lom -t
System Temperature Sensors:
 1          Ambient 23 degC : warning 67 degC : shutdown 72 degC
 2   CPU0 enclosure 23 degC : warning 59 degC : shutdown 61 degC
 3         CPU0 die 56 degC : warning 90 degC : shutdown 95 degC
 4   CPU1 enclosure 22 degC : warning 59 degC : shutdown 61 degC
 5         CPU1 die 56 degC : warning 90 degC : shutdown 95 degC
System Over-temperature Sensors:
 1                  status=ok
#
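Each sensor line gives the current temperature together with its warning and shutdown thresholds, so the margin to the warning threshold is a simple subtraction. The following minimal sketch assumes the lom -t line layout shown above; parse_temperatures and headroom are hypothetical helpers.

```python
import re

# Sample `lom -t` output, copied from the example above.
SAMPLE_OUTPUT = """\
System Temperature Sensors:
 1          Ambient 23 degC : warning 67 degC : shutdown 72 degC
 2   CPU0 enclosure 23 degC : warning 59 degC : shutdown 61 degC
 3         CPU0 die 56 degC : warning 90 degC : shutdown 95 degC
 4   CPU1 enclosure 22 degC : warning 59 degC : shutdown 61 degC
 5         CPU1 die 56 degC : warning 90 degC : shutdown 95 degC
System Over-temperature Sensors:
 1                  status=ok
"""

# "<number> <sensor name> <t> degC : warning <w> degC : shutdown <s> degC"
TEMP_LINE = re.compile(
    r"\s*\d+\s+(.+?)\s+(-?\d+) degC : warning (-?\d+) degC : shutdown (-?\d+) degC"
)


def parse_temperatures(text):
    """Return {sensor name: (current, warning, shutdown)} in degrees C."""
    readings = {}
    for line in text.splitlines():
        match = TEMP_LINE.match(line)
        if match:
            current, warning, shutdown = (int(g) for g in match.group(2, 3, 4))
            readings[match.group(1)] = (current, warning, shutdown)
    return readings


def headroom(readings):
    """Degrees C remaining before each sensor reaches its warning threshold."""
    return {name: warning - current
            for name, (current, warning, _shutdown) in readings.items()}
```

For the sample output, the ambient sensor is 44 degC below its warning threshold, and each CPU die is 34 degC below its warning threshold.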

Checking the Status of the Fault LED and Alarms (lom -l)

To check whether the Fault LED and alarms are on or off, type:

# lom -l
LOMlite alarm states:
Alarm1=off
Alarm2=off
Alarm3=off
Fault LED=off
# 

Alarms 1, 2, and 3 are software flags. They are associated with no specific conditions but are available to be set by your own processes or from the command line (see Turning Alarms On and Off (lom -A)).

Changing the LOMlite2 Device's Watchdog Configuration (lom -w)

For full information about enabling and using the LOMlite2's watchdog process, see Configuring the LOMlite2 to Restart the Server Automatically After a Lockup.

To find out how the LOMlite2 watchdog is currently configured, type:

# lom -w
LOMlite watchdog (ASR) settings:
Watchdog=off
Hardware reset=off
Timeout=40 s
# 

The LOMlite2 watchdog is enabled by default when Solaris boots. This means that if the watchdog does not receive a "pat" for 40 seconds, it turns on the Fault LED on the front and back panels of the system, generates a LOM event report, and, if configured to do so, performs an automatic server restart. However, the Hardware reset option is not enabled by default when Solaris boots, so by default the LOMlite2 device does not automatically restart the server after a lockup.

To configure the LOMlite2 device to perform an automatic server restart (ASR) after a lockup, you must enable the Hardware reset option as well as the Watchdog option. For more information, see Configuring the LOMlite2 to Restart the Server Automatically After a Lockup.

Viewing the LOMlite2 Configuration (lom -c)

To view the settings of all the configurable variables for the LOMlite2 device, type:

# lom -c
LOMlite configuration settings:
serial escape character=#
serial event reporting=default
Event reporting level=fatal, warning & information
Serial security=enabled
Disable watchdog on break=enabled
Automatic return to console=disabled
alarm3 mode=user controlled
firmware version=4.0
firmware checksum=f92e
product revision=1.4
product ID=Netra 20
# 
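Every setting in the lom -c listing has the form "name=value", so a script can load it into a dictionary and, for example, check the firmware version before deciding whether an upgrade is needed. A minimal sketch, assuming the layout shown above and a firmware version made of dot-separated integers such as 4.0; parse_config and firmware_at_least are hypothetical, and the minimum version you compare against is your own choice rather than a Sun recommendation.

```python
# Sample `lom -c` output, copied from the example above.
SAMPLE_OUTPUT = """\
LOMlite configuration settings:
serial escape character=#
serial event reporting=default
Event reporting level=fatal, warning & information
Serial security=enabled
Disable watchdog on break=enabled
Automatic return to console=disabled
alarm3 mode=user controlled
firmware version=4.0
firmware checksum=f92e
product revision=1.4
product ID=Netra 20
"""


def parse_config(text):
    """Return {setting name: value} for every 'name=value' line."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines without '=', such as the heading
            settings[name.strip()] = value.strip()
    return settings


def firmware_at_least(settings, minimum):
    """Compare the dotted 'firmware version' value with a tuple such as (4, 0)."""
    version = tuple(int(part) for part in settings["firmware version"].split("."))
    return version >= minimum
```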

Viewing All Component Status Data and the LOMlite2 Configuration Data (lom -a)

To view all the status data stored by the LOMlite2 device plus the details of the device's own configuration, type:

# lom -a

Viewing the Event Log (lom -e)

To see the event log, type:

# lom -e n,[x]

where n is the number of reports (up to 128) that you want to see, and x specifies the level of reports you are interested in. There are four levels of events:

1. Fatal events

2. Warning events

3. Information events

4. User events

If you specify a level, you will see reports for that level and above. For example, if you specify level 2, you will see reports of level 2 and level 1 events. If you specify level 3, you will see reports of level 3, level 2, and level 1 events.

If you do not specify a level, you will see reports of level 3, level 2, and level 1 events.
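The level rule above can be stated compactly: specifying level x shows levels 1 through x, and omitting x behaves like x = 3. The following minimal sketch models that rule for reasoning about which reports you will see; it is not how lom itself is implemented, and levels_shown and filter_events are hypothetical names.

```python
LEVEL_NAMES = {1: "fatal", 2: "warning", 3: "information", 4: "user"}
DEFAULT_LEVEL = 3  # levels 3, 2 and 1 are shown when no level is given


def levels_shown(level=None):
    """Event levels displayed for `lom -e n,level`: level and everything more severe."""
    if level is None:
        level = DEFAULT_LEVEL
    if level not in LEVEL_NAMES:
        raise ValueError("level must be 1, 2, 3 or 4")
    return list(range(1, level + 1))


def filter_events(events, level=None):
    """Keep the (level, message) pairs whose level would be displayed."""
    shown = set(levels_shown(level))
    return [event for event in events if event[0] in shown]
```

For example, filter_events([(1, "fan failed"), (4, "user event"), (3, "info")], 2) keeps only the fatal event, while omitting the level also keeps the information event but still drops the user event.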

CODE EXAMPLE 11-1 shows a sample event log display. Note that the first event is the oldest and that each event has a relative time-stamp (for example, +0h39m34s) showing the time elapsed since the system was last booted.

CODE EXAMPLE 11-1 Sample LOMlite2 Device Event Log (Oldest Event Reported First)
# lom -e 10
LOMlite Event Log:
+0h0m21s host reset
6/15/2001 17:35:28 GMT LOM time reference
+0h3m20s  fault led state - ON
+0h3m24s  fault led state - OFF
+0h39m34s Alarm 1 ON
+0h39m40s Alarm 3 ON
+0h39m54s Alarm 3 OFF
+0h40m0s Alarm 1 OFF
+0h48m52s fault led state - OFF
+0h49m39s Fan 1 FATAL FAULT: failed
+0h50m58s fault led state - ON
# 
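Converting the relative time-stamps to seconds makes it easy to measure the interval between events, for example between a fan failure and the Fault LED coming on. A minimal sketch, assuming every relative time-stamp has exactly the +<h>h<m>m<s>s form shown in CODE EXAMPLE 11-1; any other line, such as the LOM time reference, is skipped. parse_event_log and first_event are hypothetical helpers.

```python
import re

# Sample `lom -e 10` output, copied from CODE EXAMPLE 11-1.
SAMPLE_OUTPUT = """\
LOMlite Event Log:
+0h0m21s host reset
6/15/2001 17:35:28 GMT LOM time reference
+0h3m20s  fault led state - ON
+0h3m24s  fault led state - OFF
+0h39m34s Alarm 1 ON
+0h39m40s Alarm 3 ON
+0h39m54s Alarm 3 OFF
+0h40m0s Alarm 1 OFF
+0h48m52s fault led state - OFF
+0h49m39s Fan 1 FATAL FAULT: failed
+0h50m58s fault led state - ON
"""

# "+<hours>h<minutes>m<seconds>s <message>"
STAMP = re.compile(r"\+(\d+)h(\d+)m(\d+)s\s+(.*\S)")


def parse_event_log(text):
    """Return (seconds_since_boot, message) pairs, oldest first."""
    events = []
    for line in text.splitlines():
        match = STAMP.match(line)
        if match:
            hours, minutes, seconds = (int(g) for g in match.group(1, 2, 3))
            events.append((hours * 3600 + minutes * 60 + seconds, match.group(4)))
    return events


def first_event(events, fragment):
    """Time-stamp (seconds) of the first event whose message contains fragment."""
    for seconds, message in events:
        if fragment in message:
            return seconds
    return None
```

In the sample log, the fan failure is recorded at 2979 seconds and the final Fault LED ON event at 3058 seconds, 79 seconds later.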


Configuring the LOMlite2 to Restart the Server Automatically After a Lockup

You can configure the LOMlite2 device to restart the server automatically after a lockup. The LOMlite2 device has a watchdog process that, by default, expects to be patted every 10000 milliseconds (10 seconds). If it does not receive a pat within 40000 milliseconds (40 seconds, the default timeout), the LOMlite2 device turns on the front and back Fault LEDs and generates a LOM event report. However, it does not automatically restart the system unless you have configured it to do so.
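With the default values, the timeout is four pat intervals long (40000 / 10000 = 4), so an occasional late pat is tolerated, while a hung system is detected about 40 seconds after its last pat. The following simplified model can help you reason about timeout and pat settings. It assumes the watchdog is armed at time 0 and expires when strictly more than the timeout elapses without a pat; that boundary behavior is an illustrative assumption, not a description of the LOMlite2 firmware, and first_timeout is a hypothetical name.

```python
def first_timeout(pat_times_ms, now_ms, timeout_ms=40000):
    """Return the time (ms) at which the modelled watchdog expires, or None.

    pat_times_ms: times at which the watchdog was patted.
    now_ms: the time up to which the model is evaluated.
    """
    last_pat = 0  # assumption: arming the watchdog counts as a pat at time 0
    for t in sorted(pat_times_ms):
        if t > now_ms:
            break
        if t - last_pat > timeout_ms:
            return last_pat + timeout_ms  # expired before this pat arrived
        last_pat = t
    if now_ms - last_pat > timeout_ms:
        return last_pat + timeout_ms  # no pat since the last one for too long
    return None
```

For example, if pats arrive every 10000 ms until 30000 ms and then stop, first_timeout([10000, 20000, 30000], now_ms=80000) returns 70000: the model expires 40 seconds after the last pat.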

Configuring the LOMlite2 Watchdog to Restart the System After a Lockup

Remove the hash (#) from the following line in the script file /etc/rc2.d/S25lom to enable the LOMlite2 watchdog process with the Hardware reset option:

priocntl -e -c RT lom -W on,40000,10000 -R on

When you have done this, the LOMlite2 device will restart the server whenever the watchdog times out.

You can turn the Hardware reset option on and off from the UNIX # prompt. For more information, see Setting the Hardware Reset Option From a Script or Command (lom -R on).

However, as long as the -R on option is set in /etc/rc2.d/S25lom, the Hardware reset option is always enabled when you start the system.
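If you manage several servers, you might prefer to script the edit described above instead of making it by hand in a text editor. The following is a minimal sketch that assumes the line appears in /etc/rc2.d/S25lom exactly as shown above, preceded by a hash and optional spaces; the contents of the script on your system may differ, so check the result and keep a backup. enable_asr_line and patch_script are hypothetical helpers.

```python
# The line documented above; this sketch assumes it appears in the script
# commented out as "#" followed by optional spaces and exactly this text.
ASR_LINE = "priocntl -e -c RT lom -W on,40000,10000 -R on"


def enable_asr_line(script_text):
    """Return script_text with the commented-out ASR line uncommented."""
    lines_out = []
    for line in script_text.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith("#") and stripped[1:].strip() == ASR_LINE:
            indent = line[: len(line) - len(stripped)]  # keep any indentation
            ending = "\n" if line.endswith("\n") else ""
            lines_out.append(indent + ASR_LINE + ending)
        else:
            lines_out.append(line)  # every other line is left unchanged
    return "".join(lines_out)


def patch_script(path="/etc/rc2.d/S25lom"):
    """Rewrite the script in place if needed; return True if it changed.

    Run as root, and keep a backup copy of the original file.
    """
    with open(path) as f:
        original = f.read()
    updated = enable_asr_line(original)
    if updated != original:
        with open(path, "w") as f:
            f.write(updated)
    return updated != original
```

Lines such as #!/sbin/sh, or comments with any other text, are left alone, so running the sketch twice has no further effect.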

Enabling the LOMlite2 Watchdog Process From Your Own Script or Command (lom -W on)



Note - You do not normally need to do this. If you want to configure the LOMlite2 device to perform an automatic server restart after a lockup, see Configuring the LOMlite2 Watchdog to Restart the System After a Lockup. Use the lom -W on option on the command line or in another script file only if, for some reason, you have removed the /etc/rc2.d/S25lom script.



The LOMlite2 watchdog process is disabled by default. To enable the watchdog process, type:

# priocntl -e -c RT lom -W on,40000,10000

The number 40000 on this command line indicates the watchdog's timeout period in milliseconds; you can specify a different number. The number 10000 indicates its pat interval in milliseconds; again, you can specify a different number.



Note - Do not specify a watchdog timeout period of less than 5000 milliseconds. If you do, the watchdog might time out frequently even though the server has not locked up, which could cause your server to panic unnecessarily.



If the watchdog process times out (in other words, if it does not receive its expected pat), the LOMlite2 device turns on the server's front and back Fault LEDs and generates a LOM event report. However, it does not automatically reset the system. To make it reset the system, you must also use the -R on option.

If you have no LOMlite2 watchdog process running already and you want the process to run, type the following, or add it to another script file:

# lom -W on,40000,10000

If you want the LOMlite2 device to perform an automatic server restart after a lockup, you must include the -R on option in the command, as follows:

# lom -W on,40000,10000 -R on



Note - Unless you include the lom -W on and -R on options in a script file, you will need to run the lom command every time you reboot the system if you want to use the automatic server restart facility. Otherwise, the watchdog will not run, and the server will not reset after a lockup.
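If you do enable the watchdog from your own script, a small wrapper can enforce the 5000-millisecond minimum timeout before it builds the command line. The following is a minimal sketch; watchdog_argv is a hypothetical helper. The check that the pat interval is shorter than the timeout is an added assumption (a pat interval at or above the timeout would let the watchdog expire between pats), not a documented lom rule.

```python
MIN_TIMEOUT_MS = 5000  # lower limit from the note on watchdog timeouts


def watchdog_argv(timeout_ms=40000, pat_ms=10000, hardware_reset=False, realtime=True):
    """Build the argument list for enabling the LOMlite2 watchdog.

    realtime=True prefixes the command with `priocntl -e -c RT`, as in the
    /etc/rc2.d/S25lom line; hardware_reset=True appends `-R on`.
    """
    if timeout_ms < MIN_TIMEOUT_MS:
        raise ValueError(f"timeout {timeout_ms} ms is below {MIN_TIMEOUT_MS} ms")
    if not 0 < pat_ms < timeout_ms:
        raise ValueError("pat interval must be positive and shorter than the timeout")
    argv = ["lom", "-W", f"on,{timeout_ms},{pat_ms}"]
    if hardware_reset:
        argv += ["-R", "on"]
    if realtime:
        argv = ["priocntl", "-e", "-c", "RT"] + argv
    return argv
```

" ".join(watchdog_argv(hardware_reset=True)) reproduces the line used in /etc/rc2.d/S25lom. To run the command directly, pass the list to subprocess.run(..., check=True) as root on the server.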



Setting the Hardware Reset Option From a Script or Command (lom -R on)

To force the LOMlite2 watchdog to trigger an automatic server restart (ASR) after a lockup, add the -R on option to the command in the /etc/rc2.d/S25lom script file. This is the script that runs the watchdog. For full instructions about how to do this, see Configuring the LOMlite2 Watchdog to Restart the System After a Lockup.

However, if for any reason you are not using the script file provided with your system (/etc/rc2.d/S25lom) but have instead enabled the watchdog from the command line or from another script file, you can turn the Hardware reset option on by typing the following at the command line:

# lom -R on

To turn the Hardware reset option off from the command line, type:

# lom -R off


Other LOM Tasks Performed From Solaris

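This section explains how to turn the alarms and the Fault LED on and off by using the lom command. It also explains how to change the lom> prompt escape sequence, stop the LOMlite2 device from sending reports to the LOM console port, remove driver protection from the LOMlite2 driver, make the LOMlite2 interface backward compatible, and upgrade the LOMlite2 firmware.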

Turning Alarms On and Off (lom -A)

There are three alarms associated with the LOMlite2 device. They are associated with no specific conditions but are software flags available to be set by your own processes or from the command line.

To turn an alarm on from the command line, type:

# lom -A on,n

where n is the number of the alarm you want to set: 1, 2, or 3.

To turn the alarm off again, type:

# lom -A off,n

where n is the number of the alarm you want to turn off: 1, 2, or 3.
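Because the alarms are intended to be set by your own processes, a job can wrap lom -A so that it validates the alarm number before calling the utility. The following is a minimal sketch; alarm_argv and set_alarm are hypothetical names, and set_alarm requires the lom utility and sufficient privileges on the server.

```python
import subprocess


def alarm_argv(state, n):
    """Build the `lom -A state,n` argument list for alarm 1, 2 or 3."""
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    if n not in (1, 2, 3):
        raise ValueError("alarm number must be 1, 2 or 3")
    return ["lom", "-A", f"{state},{n}"]


def set_alarm(state, n):
    """Run the command; raises CalledProcessError if lom reports failure."""
    subprocess.run(alarm_argv(state, n), check=True)
```

For example, a nightly job could call set_alarm("on", 1) when it fails and set_alarm("off", 1) after a successful run; the meaning you give each alarm is up to you.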

Turning the Fault LED On and Off (lom -F)

To turn the Fault LED on, type:

# lom -F on

To turn the Fault LED off again, type:

# lom -F off

Changing the lom> Prompt Escape Sequence (lom -X)

The character sequence #. (hash, dot) enables you to escape from Solaris to the lom> prompt.

To change the first character of this default lom escape sequence, type:

# lom -X x

where x is the alphanumeric character you want to use instead of #.



Note - If you are at the console and you type the first character of the LOM escape sequence (by default, #), there is a one-second delay before the character appears on the screen. This is because the system waits to see whether you type the dot (.) character next. If you do, the lom> prompt appears; if you do not, the # character appears. If you change the LOM escape character, choose a character that rarely appears in console commands. Otherwise, the delay between striking the key and seeing the character on the screen may affect your typing at the console.
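The behavior described in the note can be pictured as a small state machine: the escape character is held back until the next keystroke shows whether it starts the escape sequence. The following simplified model ignores the one-second timer (a character still held at the end of the input is simply released) and is not the LOMlite2 firmware's implementation; split_console_input is a hypothetical name.

```python
def split_console_input(keys, escape="#"):
    """Return (text passed to the console, whether lom> was requested)."""
    passed = []
    held = False  # True while the escape character is being held back
    for ch in keys:
        if held:
            held = False
            if ch == ".":
                return "".join(passed), True  # escape sequence: go to lom>
            passed.append(escape)  # not a dot: release the held character
        if ch == escape:
            held = True
        else:
            passed.append(ch)
    if held:
        passed.append(escape)  # stands in for the delay expiring
    return "".join(passed), False
```

Every occurrence of the escape character in ordinary typing goes through the held state, which is why the note recommends a character that rarely appears in console commands.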



Stopping LOMlite2 from Sending Reports to the LOM Console Port (lom -E off)

LOMlite2 event reports can interfere with information you are attempting to send or receive on the LOM console port.

To stop the LOMlite2 device from sending reports to the LOM console port, type:

# lom -E off

By default, the LOM console port is shared by the console and the LOMlite2 device. The LOMlite2 interrupts the console whenever it needs to send an event report. To prevent the LOMlite2 from interrupting the console on Serial A/LOM, turn serial event reporting off.

To turn serial event reporting on again, type:

# lom -E on

If you want to dedicate the LOM console port to the LOMlite2 device and to use the Serial B port as your console port, see Separating LOMlite2 From the Console on the LOM Console Port.

Removing Driver Protection From the LOMlite2 Driver (lom -U)

By default, the LOMlite2 driver cannot be unloaded, because the driver is required by the watchdog process and by event reporting. If you unload the driver while the system is configured to restart when the watchdog times out, the watchdog will time out, causing a system reset. For information about configuring the system to restart automatically after a lockup, see Configuring the LOMlite2 to Restart the Server Automatically After a Lockup.

To remove driver protection from the LOMlite2 driver so that you can unload the driver:

1. Turn the watchdog process off by typing:

# lom -W off

2. Unload the driver by typing:

# lom -U

Making the LOMlite2 Interface Backward Compatible (lom -B)

If you have scripts written to the LOMlite interface on the Netra t1 Model 100/105 server or the Netra t 1400/1405 server and you want to use these scripts on the Netra 20 server, you can add file system links that make this possible. To do so, type:

# lom -B

When you have done this, you will be able to use the old scripts on the new system.

Upgrading the LOMlite2 Firmware (lom -G filename)

To upgrade the firmware on the LOMlite2 device, obtain the new firmware package from SunSolve or from your local Sun sales representative, and type the following:

# lom -G filename

where filename is the name of the file containing the new firmware.



Note - LOMlite2 firmware upgrades will be released as patches and will include detailed installation instructions.