CHAPTER 3
Administering Your System
You administer your system through the alarm card command-line interface (CLI) and through the MOH application.
The alarm card CLI works with the MOH and PMS applications, and supports Simple Network Management Protocol (SNMP) and Remote Method Invocation (RMI) interfaces. MOH provides the SNMP and RMI interfaces to manage the system and send out events and alerts. The CLI provides a subset of commands that overlaps with MOH, plus commands for the alarm card itself; unlike MOH, the CLI does not send out events and alerts.
This chapter contains the following sections:
The alarm card command-line interface provides commands to control power of the system, control the CPU nodes, administer the system, show status, and set configuration variables. See Accessing the Alarm Card for information on how to access the alarm card.
TABLE 3-1 lists the alarm card command-line interface commands by type, command name, default permission required to use the command, and command description. A -h option with a command indicates that help is available for that command.
Default permission levels are:
The permission level for a user can be changed with the userperm command.
Display a summary of current environmental information, such as fan and power supply status.
Display the current network configuration of the alarm card.
Display the value of serial_mode for the specified port number.
Display the value of serial_baud for the specified port number.
Display the value of serial_parity for the specified port number.
Display the value of serial_stop for the specified port number.
Display the value of serial_data for the specified port number.
Display the value of serial_hwhandshake for the specified port number.
Display the value of ip_netmask for the specified port number.
Display the value of ip_gateway for the specified port number.
Display FRU ID information. Refer to Displaying Netra CT Server FRU ID Information for more information.
Display the value of the alarm card flash update service mode.
Display the board type, power state, and boot state for each CPU board in the system.
Power off the specified CPU node slot, where cpu_node can be 1 to 8 on a Netra CT 810 or 1 to 5 on a Netra CT 410; if no node is specified, power off the whole system.
Power on the specified CPU node slot, where cpu_node can be 1 to 8 on a Netra CT 810 or 1 to 5 on a Netra CT 410; if no node is specified, power on the whole system.
Enter console mode and connect to the specified CPU node, where cpu_node can be 1 to 7 on a Netra CT 810 or 3 to 5 on a Netra CT 410.
Display the contents of the alarm card console run log or orun log. The init option clears both the run and orun logs. The consolehistory command may be abbreviated to chist.
Copy the alarm card console run log (run buffer) into the old log (orun buffer), overwriting the previous contents; then clear the run buffer.
Put the server in debug mode, where cpu_node can be 1 to 8 on a Netra CT 810 or 1 to 5 on a Netra CT 410.
Reset (reboot) a specified server, where cpu_node can be 1 to 8 on a Netra CT 810 or 1 to 5 on a Netra CT 410; ac is the alarm card; host is the host CPU board. reset cpu_node produces a soft reset (reboots the operating system); reset -x produces a hard reset (reboots the board).
Set whether a panic dump is generated when a CPU node is reset.
Show whether or not a panic dump has been set for a specific CPU node.
Set the escape character to end a console session. The default is a ~ (tilde).
Show the health information of a CPU node, where cpu_node can be 1 to 8 on a Netra CT 810 or 1 to 5 on a Netra CT 410.
Display help information on starting, stopping, and controlling the PMS daemon on the alarm card. Refer to Enabling the Processor Management Service Application and to Using the PMS Application for Recovery and Control of CPU Boards for more information.
Add a user account. The default user account is netract. The alarm card supports 16 accounts.
Set or change the permission levels for a specified user account.
Add an MOH user account. The alarm card supports five MOH user accounts.
Set or change the permission levels for a specified MOH user account.
Flash update the alarm card software, where cmsw represents the chassis management software; bcfw represents the boot control firmware; bmcfw represents the BMC firmware; rpdf represents the system configuration repository; and scdf initializes the system configuration variables to their defaults. Refer to Updating the Alarm Card Flash Images for more information.
The primary boot device for the alarm card is always the flash. In case of flash failure, the secondary boot device is used. The default is rarp.
Configure the alarm card to be an NTP client. The NTP server IP address must be on the same subnet as the alarm card. The default is none.
Set FRU ID information. Refer to Specifying Netra CT Server FRU ID Information for more information.
Set whether the alarm card will reset itself if the PMS daemon and/or the MOH application exit. The default is false, that is, the alarm card will not reset itself.
Show the value of the setrecovery action the alarm card takes if the PMS daemon and/or the MOH application exit.
snmpconfig add|del|show access|trap community [readonly|readwrite] [ip_addr]
Configure the alarm card SNMP interface for the MOH application. The default is readonly. Refer to MOH Configuration and SNMP for more information.
Configure the alarm card RMI interface for the MOH application. The default is false. Refer to MOH Configuration and RMI for more information.
Display the name of the process that exited and caused an alarm card reset.
Set the mode of the specified serial port to tty or none. The default for COM2 is none, that is, no services are available on this port.
Set the baud rate of the specified serial port. The default is 9600. Valid values are: 1200, 4800, 9600, 19200, 38400, 56000.
Set the parity bit of the specified serial port. Valid values are none, odd, or even. The default is odd.
Set the stop bit of the specified serial port. Valid values are 1 or 2. The default is 1.
Set the number of data bits of the specified serial port. Valid values are 7 or 8. The default is 7.
Set the hardware handshake of the specified serial port. Valid values are true or false. The default is false.
Set the IP mode of the specified Ethernet port. Choose the IP mode according to the services available in the network (rarp, config) or to configure the port for failover (standby). The default for ENET1 is rarp; the default for ENET2 is none, that is, no services are available on this port. You must reset the server for the changes to take effect.
Set the IP address of the specified Ethernet port. The default is 0.0.0.0. This command is only used if the ipmode is set to config. You must reset the server for the changes to take effect.
Set the IP netmask of the specified Ethernet port. The default is 0.0.0.0. This command is only used if the ipmode is set to config. You must reset the server for the changes to take effect.
Set the IP gateway of Ethernet port 1. The default is 0.0.0.0. You must reset the server for the changes to take effect.
Set the hostname to be used in the CLI prompt. The default is netract. The maximum length is 32 characters.
When the servicemode is set to true, MOH and PMS services are stopped for the alarm card flash update. Refer to Updating the Alarm Card Flash Images for more information.
pmsd start [-p port_num] [-e server_admin_state] [-t tick_interval][-d] |
Start PMS on the alarm card or a CPU board. The -t option can only be used on a CPU board. |
||
Set the IP address for the alarm card to control and monitor a CPU board. |
|||
Print the IP address set with the pmsd slotaddressset command. |
|||
pmsd slotrndaddressadd -s slot_num|all -n ip_addr
|
Add address information for a CPU board to control other CPU boards. |
||
Delete address information added with the pmsd slotrndaddressadd command. |
|||
Print address information added with the pmsd slotrndaddressadd command. |
|||
pmsd operset -s slot_num|all -o maint_config|
|
|||
pmsd recoveryautooperset -s slot_num|all
|
|||
Print the configuration information affected by the recoveryautooperset command. |
|||
pmsd hwoperset -s slot_num|all -o powerdown|powerup|
|
|||
pmsd osoperset -s slot_num|all -o reboot|mon_enable|
|
|||
pmsd appoperset -s slot_num|all -o force_offline|
|
|||
Information on configuring alarm card ports, setting up user accounts, specifying FRU ID information, and starting the PMS daemon using the alarm card CLI is provided in Chapter 2. The PMS daemon commands are described in Using the PMS Application for Recovery and Control of CPU Boards.
A remote command-line session or a console session automatically disconnects after 10 minutes of inactivity.
Security is also provided through the permission levels and passwords set for each account.
You can update the alarm card flash images over the network. TABLE 3-2 shows the alarm card flash options.
There is no required sequence for flashing the alarm card; the following is a typical sequence: cmsw, bcfw, bmcfw, and rpdf.
You can update individual images if you want.
To Update All the Alarm Card Flash Images
2. Set the servicemode to true by entering the following command:
Setting the servicemode to true allows the alarm card to be flash updated; it also stops the MOH and PMS services on the alarm card.
Note - In Step 3, the scdf option is not mandatory. Use it only if you want to initialize the system configuration variables to the defaults.
3. Flash update all the alarm card images, and complete the process by entering the following commands:
where path is nfs://nfs.server.ip.address/directory/filename, the location where the software to be flashed is installed.
After you update rpdf, the alarm card resets itself. If you do not update rpdf, you must reset the alarm card manually.
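The typical sequence and the reset behavior described above can be sketched as a small planning helper. This is an illustrative sketch only: the function name plan_flash_update and the keys of the returned dictionary are hypothetical, the scdf option is left out because it initializes variables rather than flashing an image, and the helper does not issue any alarm card commands.

```python
# Typical flashing order given in this section (no order is strictly required).
TYPICAL_SEQUENCE = ["cmsw", "bcfw", "bmcfw", "rpdf"]


def plan_flash_update(images):
    """Order the requested images and report whether a manual reset is needed.

    `images` is an iterable of image names taken from TYPICAL_SEQUENCE.
    The function name and return shape are hypothetical, for illustration.
    """
    requested = set(images)
    unknown = requested - set(TYPICAL_SEQUENCE)
    if unknown:
        raise ValueError(f"unknown flash image(s): {sorted(unknown)}")
    ordered = [name for name in TYPICAL_SEQUENCE if name in requested]
    # The alarm card resets itself only after an rpdf update; otherwise
    # the reset has to be done manually.
    return {"order": ordered, "manual_reset_needed": "rpdf" not in requested}
```

For example, a plan for only bcfw reports that a manual reset is needed, while any plan that includes rpdf does not.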
To Update an Individual Alarm Card Flash Image
2. Set the servicemode to true by entering the following command:
Setting the servicemode to true allows the alarm card to be flash updated; it also stops the MOH and PMS services on the alarm card.
3. Flash update an alarm card image, and complete the process by entering the following commands:
where option can be cmsw -f path, bcfw -f path, bmcfw -f path, or scdf, and path is nfs://nfs.server.ip.address/directory/filename, the location where the software to be flashed is installed. To update rpdf, you must set the servicemode to false before using the flashupdate command; the alarm card resets itself after it finishes the rpdf update.
The alarm card does not support battery backup time-of-day because battery life cannot be monitored to predict end of life, and drift in system clocks can be common. To provide a consistent system time, set the date and time on the alarm card using one of these methods:
To Set the Alarm Card Date and Time Manually
2. Set the date and time manually:
where mm is the current month; dd is the current day of the month; HH is the current hour of the day; MM is the current minutes past the hour; cc is the current century minus one; and yy is the current year.
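The field definitions above can be illustrated by deriving each field from a timestamp. This is a minimal sketch: the function name alarm_card_date_fields is hypothetical, and the two-digit zero padding of every field is an assumption rather than something this section states.

```python
from datetime import datetime


def alarm_card_date_fields(when):
    """Return the mm, dd, HH, MM, cc, and yy fields for a datetime.

    Two-digit zero padding is assumed here for illustration.
    """
    return {
        "mm": f"{when.month:02d}",        # month
        "dd": f"{when.day:02d}",          # day of the month
        "HH": f"{when.hour:02d}",         # hour of the day
        "MM": f"{when.minute:02d}",       # minutes past the hour
        "cc": f"{when.year // 100:02d}",  # century minus one, 20 for 2004
        "yy": f"{when.year % 100:02d}",   # year within the century
    }


fields = alarm_card_date_fields(datetime(2004, 2, 3, 2, 38))
```

For 3 February 2004 at 02:38, the century field is 20 and the year field is 04.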
To Set the Alarm Card Date and Time as an NTP Client
2. Set the date and time as an NTP client:
where addr is the IP address of the NTP server. The NTP server must be on the same subnet as the alarm card.
This section describes how to use a remote shell to execute CLI commands on the alarm card in batch mode, and how to use the rsh command interactively.
Normally, the alarm card cannot execute batch commands. The alarm card scripting feature enables you to write scripts to execute alarm card CLI commands in batch mode on the alarm card, similar to using scripting in the Solaris OS. You run the scripts from a host or satellite CPU board in the same system as the alarm card.
As an example, using the scripting feature, you can write a script to configure an Ethernet port on the alarm card, and then check to make sure it is configured the way you want. This sample script runs the version command, and the setipmode, setipaddr, showipmode, and showipaddr commands for Ethernet port 2 on the alarm card:
The script includes the rsh command, the alarm card MCNet IP address, and the CLI command(s) to run. For information on the MCNet IP address, refer to Configuring the MCNet Interface; for information on the CLI commands, refer to TABLE 3-1.
All the alarm card CLI commands in TABLE 3-1 are supported in a script except for the following interactive commands: userpassword, mohuserpassword, password, console, and break.
For security reasons, you must be superuser on a host or satellite CPU board in the same system as the alarm card. The commands can be run only over the MCNet interface.
To Run a Script on the Alarm Card
rsh alarm_card_MCNet_ipaddress CLI_command
rsh alarm_card_MCNet_ipaddress CLI_command
rsh alarm_card_MCNet_ipaddress CLI_command
rsh alarm_card_MCNet_ipaddress CLI_command
...
where alarm_card_MCNet_ipaddress is the MCNet IP address of the alarm card, and CLI_command is the CLI command you want to run.
4. As superuser, run the script:
where path is the path to the script and filename is the name of the script.
Before executing the commands in the script, the alarm card verifies that the commands are being run by a root user on a host or satellite CPU board in the same system as the alarm card, and that the commands have been received over the MCNet.
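As an illustration of the script layout described in this procedure, the following sketch generates one rsh line per CLI command and rejects the interactive commands that scripts do not support. The function name build_rsh_script is hypothetical, and the address 192.0.2.10 is a placeholder, not a real MCNet address; the sketch only produces text and does not contact the alarm card.

```python
# Interactive commands that this section says are not supported in scripts.
INTERACTIVE_COMMANDS = {"userpassword", "mohuserpassword", "password", "console", "break"}


def build_rsh_script(mcnet_ip, cli_commands):
    """Return script text with one `rsh <ip> <command>` line per CLI command."""
    lines = []
    for command in cli_commands:
        words = command.split()
        if not words:
            raise ValueError("empty CLI command")
        if words[0] in INTERACTIVE_COMMANDS:
            raise ValueError(f"{words[0]} is interactive and cannot be scripted")
        lines.append(f"rsh {mcnet_ip} {command}")
    return "\n".join(lines) + "\n"


# Placeholder address; substitute the MCNet IP address of your alarm card.
script_text = build_rsh_script("192.0.2.10", ["version", "loghistory"])
```

The generated text can be saved to a file and run as superuser from a host or satellite CPU board, as the procedure describes.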
A root user on a host or satellite CPU board in the same system as the alarm card can use the rsh command interactively with the CLI commands userpassword or mohuserpassword.
where alarm_card_MCNet_ipaddress is the MCNet IP address of the alarm card. After the CLI command is accepted, you are prompted for a username and password.
The alarm card keeps console logs, event logs, and debugging logs.
The alarm card console logs contain messages received from the host CPU board. There are two types of console logs:
The run and orun logs together can contain up to 16 Kbytes of data.
To View Console Logs
2. View a console log with the consolehistory command:
where index n is the number of lines to display from either the oldest log entry forward (positive index) or the most recent log entry back (negative index); and pause x is the number of lines to display before pausing (default pause value is 10 lines). For example, to display the contents of the run log, pausing after 20 lines at a time, enter the following:
If no options are specified, the consolehistory command prints out the entire contents of all nonempty console logs.
You can use the consolehistory -init command to clear both the run and orun logs.
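The meaning of the index and pause values can be modeled on a plain list of log lines. This is only a sketch of the selection rule as described in this section, not the alarm card implementation; the function names are hypothetical, and rejecting an index of zero is an assumption because the text does not define that case.

```python
def select_log_lines(lines, index=None):
    """Select lines the way the index value is described in this section.

    A positive index counts forward from the oldest entry, a negative index
    counts back from the most recent entry, and None returns every line.
    """
    if index is None:
        return list(lines)
    if index == 0:
        raise ValueError("index 0 is not defined in this sketch")
    if index > 0:
        return list(lines[:index])
    return list(lines[index:])


def paginate(lines, pause=10):
    """Split lines into pages of `pause` lines (10 is the consolehistory default)."""
    if pause <= 0:
        raise ValueError("pause must be positive")
    return [list(lines[i:i + pause]) for i in range(0, len(lines), pause)]


sample = [f"line {n}" for n in range(1, 26)]   # 25 made-up log lines
first_three = select_log_lines(sample, 3)
last_two = select_log_lines(sample, -2)
pages = paginate(sample, 20)
```

With 25 sample lines and a pause value of 20, the output is split into one page of 20 lines and one page of 5.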
The alarm card event log contains event history, that is, all events that change the state of the system. The log entries are stored in the circular buffer of the alarm card RAM. The buffer holds up to 2,048 log entries; it is reset if the alarm card is reset.
A log entry includes the time of the event, a hostname, a unique event ID, and a description of the event. For example:
hostname cli> loghistory
Feb 3 02:38:10 netract: 0009: Alarm Card Booted
Feb 3 02:38:11 netract: 0004: ENET2 now DOWN
Feb 3 02:39:57 netract: 0022: User netract Logged on
...
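If you capture loghistory output on another machine, entries in the format shown above can be split into their fields. The following is a minimal sketch: the regular expression is inferred from the three sample entries only, the names parse_entry and make_event_buffer are hypothetical, and the bounded deque merely mimics the 2,048-entry circular buffer.

```python
import re
from collections import deque

# Layout inferred from the samples: month day time hostname: event-ID: description
ENTRY_PATTERN = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<hostname>[^:\s]+):\s+(?P<event_id>\d+):\s+(?P<description>.*)$"
)

MAX_ENTRIES = 2048  # capacity of the event log buffer stated in this section


def parse_entry(line):
    """Return a dict of fields for one log entry, or None if the line differs."""
    match = ENTRY_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def make_event_buffer():
    """Return a buffer that drops the oldest entry once MAX_ENTRIES is reached."""
    return deque(maxlen=MAX_ENTRIES)


entry = parse_entry("Feb 3 02:38:10 netract: 0009: Alarm Card Booted")
```

Lines that do not match the inferred layout, such as the prompt line, return None instead of raising an error.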
To View the Event Log
2. View an event log with the loghistory command:
where index n is the number of lines to display from either the oldest log entry forward (positive index) or the most recent log entry back (negative index); and pause n is the number of lines to display before pausing (the default is to display the entire log without pausing). For example, to display the last 30 lines of the event log, enter the following:
The alarm card debugging log contains the name of the last key process that exited and caused a reset of the alarm card, or it is empty. For example:
Debugging log data remains in the alarm card flash until:
To View the Debugging Log
2. View the debugging log with the debuglog command:
If there is no process information output from this command, the debugging log has been cleared. Otherwise, information on the last process that caused the alarm card reset is displayed.
To Clear the Debugging Log
2. Clear the debugging log with the debuglog reset command:
Host and satellite CPU boards can boot from a local disk or over the network.
By default, the OpenBoot PROM NVRAM boot-device configuration variable is set to disk net, disk being an alias for the path to the local disk, and net being an alias for the path of the primary network. You can set the boot device for CPU boards through the alarm card CLI setfru command. Refer to Configuring a Chassis Slot for a Board for more information on using the setfru command to specify a boot device for a board.
When the alarm card powers on a board in a slot, the OpenBoot PROM firmware checks with the alarm card for a boot device for that slot. The alarm card sends the value from the Boot_Devices field in FRU ID to the OpenBoot PROM firmware; the value is either the boot device list for that slot you set using the setfru command or a null string if you did not set a boot device list for that slot. The value overwrites the NVRAM boot-device value.
In the event of an alarm card fault, a CPU board hot-swap, power cycle, reboot or reset will cause the OpenBoot PROM firmware to default to the value set in the boot-device variable.
You can configure Netra CT CPU boards to boot over DHCP. This process includes setting the CPU board boot device for DHCP, forming the CPU board DHCP client ID, and configuring the DHCP server.
On the Netra CT system, the DHCP client ID is a combination of the system's midplane Sun part number (7 bytes), the system's midplane Sun serial number (6 bytes), and the board's geographical address (slot number) (2 bytes). The parts are separated by a : (colon).
To Configure a CPU Board to Boot Over DHCP
2. Set the boot device for the board to dhcp with the setfru command:
where fru_instance is the slot number of the board to be configured for DHCP and network_devicename is a path or alias to a network device. For example, to set the boot device to dhcp for the CPU board in slot 4, enter the following:
3. Get the Netra CT system part number and the system serial number with the showfru command:
4. Form the three-part client ID by using the system part number, the system serial number, and the slot number, separated by colons.
For example, if the output from the showfru commands in Step 3 is 375-4335 (Sun part number) and 000001 (Sun serial number), and you want to form the client ID for the CPU board in slot 4, the client ID is: 3754335:000001:04.
5. Translate the client ID to its ASCII equivalent. For example:
Thus, the example client ID in ASCII is:
33 37 35 34 33 33 35 3A 30 30 30 30 30 31 3A 30 34.
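Steps 4 and 5 can be checked with a short script. This sketch assumes that the part number is written without its hyphen and that the slot number is zero-padded to two digits, which is how the example in Step 4 is formed; the function names are hypothetical.

```python
def form_client_id(part_number, serial_number, slot):
    """Join the part number, serial number, and two-digit slot with colons."""
    part = part_number.replace("-", "")   # 375-4335 becomes 3754335
    return f"{part}:{serial_number}:{slot:02d}"


def to_ascii_hex(client_id):
    """Return the client ID as space-separated hexadecimal ASCII codes."""
    return " ".join(f"{byte:02X}" for byte in client_id.encode("ascii"))


client_id = form_client_id("375-4335", "000001", 4)
client_id_hex = to_ascii_hex(client_id)
```

With the values from Step 4, client_id is 3754335:000001:04 and client_id_hex matches the ASCII string given in Step 5.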
Refer to the Solaris DHCP Administration Guide for information on how to configure the DHCP server for remote boot and diskless boot clients.
The client ID is retained across a CPU board power cycle, reboot, or reset; the alarm card updates the client ID during a first-time power on or a hot-swap of a CPU board. In the event of an alarm card fault, a CPU board reboot or reset will retrieve the previously written client ID.
The Netra CT system provides the capability to connect to CPU boards and open console sessions from the alarm card.
You begin by logging in to the alarm card through either the serial port or the Ethernet port. Once a console session with a CPU board is established, you can run Solaris system administration commands, such as passwd, read status and error messages, or halt the board in that particular slot.
To enable your system to use multiple consoles, you set several variables, either at the Solaris level or at the OpenBoot PROM level. Set these variables on each CPU board to enable console use.
To Configure Your System for Multiple Consoles
1. Log in as superuser to the CPU board, using the on-board console port ttya.
2. Enter either set of the following commands to enable multiple consoles:
Once you have configured your system for multiple console use, you can log in to the alarm card and open a console for a slot. The Netra CT system allows four console users per slot.
TABLE 3-3 shows the alarm card CLI console-related commands that can be executed from the current login session on the alarm card.
Enter console mode and connect to a specified CPU board, where cpu_node can be 1 to 8 on a Netra CT 810 or 1 to 5 on a Netra CT 410 server. If cpu_node is not specified, connect to the host CPU board.
Put the specified CPU board in debug mode, where cpu_node can be 1 to 8 on a Netra CT 810 or 1 to 5 on a Netra CT 410 server. Debug mode can use OpenBoot PROM or kadb, depending on server configuration.
Set the escape character to be used in all future console sessions. The default is ~ (tilde). Refer to TABLE 3-4 for escape character use.
Most CPU board consoles use the MCNet bus, but a board at the OpenBoot PROM level connects over the IPMI bus. There can be only one console user on the IPMI bus at any one time.
For example, if the board in slot 4 is at the OpenBoot PROM level, the user opening a console session connects to it over the IPMI bus. The IPMI bus is then fully occupied, and no other users can connect over that bus; if they try, an error message is displayed. However, other users can connect to boards in other slots over the MCNet bus. The MCNet bus is faster than the IPMI bus, while the IPMI bus is typically a more stable communication channel than the MCNet bus.
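The bus rule in this example can be expressed as a toy model. This is a simplified sketch for reasoning about the rule only: the class name ConsoleBuses and its methods are hypothetical, and the model ignores the per-slot limit on console users and the ability to toggle buses.

```python
class ConsoleBuses:
    """Toy model: boards at the OpenBoot PROM level use IPMI, one user at a time."""

    def __init__(self):
        self.ipmi_slot = None  # slot whose console currently occupies the IPMI bus

    def connect(self, slot, at_openboot_prom):
        """Return the bus a new console session would use, or raise if blocked."""
        if not at_openboot_prom:
            return "MCNet"
        if self.ipmi_slot is not None:
            raise RuntimeError("IPMI bus is already occupied")
        self.ipmi_slot = slot
        return "IPMI"

    def disconnect(self, slot):
        """Release the IPMI bus if this slot's console was holding it."""
        if self.ipmi_slot == slot:
            self.ipmi_slot = None


buses = ConsoleBuses()
first = buses.connect(4, at_openboot_prom=True)    # slot 4 takes the IPMI bus
second = buses.connect(5, at_openboot_prom=False)  # slot 5 uses MCNet
```

A further IPMI connection attempt fails in this model until the console for slot 4 disconnects.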
Once you have a console connection with a CPU board, you can issue normal Solaris commands. There are several escape character sequences to control the current session. TABLE 3-4 shows these sequences.
Break from the Solaris level and enter the OpenBoot PROM (debug) level.
Determine the status (MCNet or IPMI) of the current console.
To Start a Console Session From the Alarm Card
You can log in to the alarm card through a terminal attached to either the serial port connection or the Ethernet port connection.
2. Open a console session to a board in a slot:
where cpu_node is 1 to 7 on a Netra CT 810 system or 3 to 5 on a Netra CT 410 system. For example, to open a console to the board in slot 4, enter the following:
You now have access to the board in slot 4. Depending on the state of the board in that particular slot, and whether the previous user logged out of the shell, you see one of several prompts:
To Determine the Status of the Current Console
Enter the escape sequence ~g at the start of a new line:
A message displays, indicating the current state of the console connection. The message is either:
This means the console is in Solaris mode or OpenBoot PROM mode.
This means the console is in Solaris mode.
To Toggle Between MCNet and IPMI
Toggling between MCNet and IPMI could be useful for troubleshooting. For example, if the console stops working for some reason, you could try toggling to IPMI (the more reliable communication channel).
1. If the CPU board is in Solaris mode, enter the escape sequence ~t:
The console switches between MCNet and IPMI mode. The console now fully occupies the IPMI bus. No other console may be at the OpenBoot PROM level at the same time. If another user attempts to access a board that is occupying the IPMI bus, the console connection will fail.
2. To return to MCNet mode, enter ~t again and press enter:
To Break into OpenBoot PROM from the Console
At the Solaris prompt, enter the escape sequence ~b:
The console mode switches to IPMI:
You can now debug from the OpenBoot PROM level.
To End the Console Session
1. (Optional) Log out of the Solaris shell.
2. At the prompt, disconnect from the console by entering the escape sequence ~. (tilde period):
Disconnecting from the console does not automatically log you out from the remote host. Unless you log out from the remote host, the next console user who connects to that board sees the shell prompt of your previous session.
To Show the Current Escape Character
At the alarm card prompt, enter the following command:
The current escape character is displayed:
To Change the Default Escape Character
At the alarm card prompt, enter the following command:
where value is any printable character. For example, to change the default escape character from ~ (tilde) to # (pound sign), enter the following:
The pound sign is now the escape character for all future console sessions.
This section describes specifying recovery operations and controlling CPU boards through the alarm card PMS CLI commands.
You specify the recovery configuration of a CPU board by using the command pmsd operset -s slot_num|all (a single slot number or all slots in the Netra CT system containing a CPU board) and the recovery mode for the specified slot(s).
The recovery configuration can be maintenance mode, operational mode, or none mode. Maintenance mode means the alarm card's automatic recovery of a CPU board is disabled, and PMS applications are started in an offline state, so that you can use manual maintenance operations. Operational mode means the alarm card's automatic recovery of a CPU board is enabled; the alarm card will recover the CPU board in the event of a monitoring fault, and start PMS applications in an active state. None mode means the alarm card's automatic recovery mode may be manually enabled or disabled; PMS application states are not enforced.
The mode is stored in persistent storage. You specify the operation to be performed on the specified slot by using the option -o with the parameter maint_config (set the hardware, operating system, and applications into maintenance mode), oper_config (set the hardware, operating system, and applications into operational mode), none_config (set the hardware, operating system, and applications into no enforcement mode), or graceful_reboot (bring the applications offline if needed and then reboot the operating system).
To Specify the Recovery Configuration of a CPU Board
2. Configure the automatic recovery mode with the operset command:
where slot_num can be a slot number from 1 to 8, and all specifies all slots containing CPU boards. For example, to make PMS recovery operational for the entire Netra CT server, enter:
The pmsd infoshow -s slot_num|all command can be used to print the recovery configuration and alarm status for the recovery configuration.
The pmsd historyshow -s slot_num|all command can be used to print a recovery configuration and runtime message log. The log is printed to the terminal performing the operation.
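The operset parameters described in this section can be assembled into a command string with a small helper. This is a sketch for illustration: the helper names are hypothetical, the slot range of 1 to 8 follows the procedure above, and the helper only builds text; it does not run pmsd.

```python
# Parameters of the -o option and their meaning, as described in this section.
RECOVERY_CONFIGS = {
    "maint_config": "maintenance mode: automatic recovery disabled, applications offline",
    "oper_config": "operational mode: automatic recovery enabled, applications active",
    "none_config": "none mode: recovery and application states are not enforced",
    "graceful_reboot": "bring applications offline if needed, then reboot the OS",
}


def format_slot(slot):
    """Accept 'all' or a slot number from 1 to 8 and return it as text."""
    if slot == "all":
        return "all"
    if isinstance(slot, int) and 1 <= slot <= 8:
        return str(slot)
    raise ValueError(f"invalid slot: {slot!r}")


def operset_command(slot, operation):
    """Build a pmsd operset command string for the given slot and operation."""
    if operation not in RECOVERY_CONFIGS:
        raise ValueError(f"unknown operation: {operation!r}")
    return f"pmsd operset -s {format_slot(slot)} -o {operation}"


command = operset_command("all", "oper_config")
```

Here command holds the string for placing every slot that contains a CPU board into operational mode.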
You can perform detailed, manual recovery operations on a board or instruct PMS to perform detailed, automatic recovery operations on a board using the CLI. The operations are performed across the hardware, the operating system, and the applications.
For manual recovery, use the pmsd recoveryoperset -s slot_num|all command. This command can only be run when the board is in maintenance mode or none mode (PMS applications are offline). You specify the recovery operation to be performed on the specified slot by using the option -o with the parameters: pc (power cycle), rst (reset), rstpc (reset, then power cycle), pd (power down), or rb (reboot).
For automatic recovery, use the pmsd recoveryautooperset -s slot_num|all command. This command tells PMS what to do in response to a fault when the board is in operational mode (PMS applications are active).
You specify the automatic recovery operation to be performed on the specified slot by using the option -o with the parameters: pc (power cycle), rst (reset), rstpc (reset, then power cycle), pd (power down), rb (reboot), rbpc (reboot, then power cycle), none (no recovery), or trg (manually simulate a fault to trigger a recovery). Optional parameters for automatic recovery include:
-d startup delay: the time in deciseconds between a fault occurrence and the start of a recovery operation; the default is 0 deciseconds.
-f failure power off|on: whether a power down operation will occur if the recovery operation fails; on specifies that power down will occur and off specifies that power down will not occur; the default is off.
-r retries: the number of times a recovery operation can occur and fail before it is terminated; the default is one try.
-n inter-operation delay: the time in deciseconds between one operation and the next for an operation with multiple retries; the default is 0 deciseconds.
-p reset power-cycle delay: the time in deciseconds to be waited between the reset and power cycle portions of the recovery operation before a failed reset is declared and the power cycle portion of the operation starts; the default is 0 deciseconds.
To Manually Recover a Board
2. Perform manual recovery operations on a board with the recoveryoperset command:
where slot_num can be a slot number from 1 to 8, and all specifies all slots containing CPU boards. For example, to instruct PMS to reboot slot 5 after a fault, enter the following:
To Automatically Recover a Board
2. Perform automatic recovery operations on a board with the recoveryautooperset command:
where slot_num can be a slot number from 1 to 8, and all specifies all slots containing CPU boards. For example, to instruct PMS to automatically reboot slot 5 after a fault, with the default delays, retries, and failure power state, enter the following:
The pmsd recoveryautoinfoshow -s slot_num|all command can be used to print information showing the configuration information affected by the recoveryautooperset command.
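The automatic recovery options described above can be combined into a command string as follows. This is a hedged sketch: the function name is hypothetical, the option letters and parameter names come from this section, and writing each value directly after its option letter is an assumption about the exact syntax. The helper builds text only and does not run pmsd.

```python
# Operations accepted by the -o option for automatic recovery.
AUTO_OPERATIONS = {"pc", "rst", "rstpc", "pd", "rb", "rbpc", "none", "trg"}


def recoveryautooperset_command(slot, operation, startup_delay=None,
                                failure_power=None, retries=None,
                                inter_operation_delay=None,
                                reset_power_cycle_delay=None):
    """Build a pmsd recoveryautooperset command string.

    Delays are in deciseconds. Options left as None are omitted so that
    the defaults described in this section apply.
    """
    if slot != "all" and not (isinstance(slot, int) and 1 <= slot <= 8):
        raise ValueError(f"invalid slot: {slot!r}")
    if operation not in AUTO_OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    if failure_power not in (None, "on", "off"):
        raise ValueError("failure_power must be 'on' or 'off'")
    parts = ["pmsd", "recoveryautooperset", "-s", str(slot), "-o", operation]
    optional = [
        ("-d", startup_delay),
        ("-f", failure_power),
        ("-r", retries),
        ("-n", inter_operation_delay),
        ("-p", reset_power_cycle_delay),
    ]
    for flag, value in optional:
        if value is not None:
            parts += [flag, str(value)]   # assumed form: flag followed by value
    return " ".join(parts)


default_reboot = recoveryautooperset_command(5, "rb")
```

Here default_reboot is the string for rebooting slot 5 after a fault with the default delays, retries, and failure power state.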
PMS can perform operations on a board's hardware, the operating system, and applications. You can specify that PMS performs operations on one of these, rather than all.
The pmsd hwoperset -s slot_num|all command performs operations on the hardware. The operations can only be performed in maintenance or none mode unless the optional -f parameter is used. You specify the operation to be performed on the specified slot by using the option -o with the parameters: powerdown (set the hardware to the power-off state), powerup (set the hardware to the power-on state), reset (reset the hardware), mon_enable (enable health monitoring of the hardware), or mon_disable (disable health monitoring of the hardware). The optional -f parameter can be used to perform the operation even if applications are in the active state, and the slot is in operational mode.
The pmsd hwinfoshow -s slot_num|all command can be used to print PMS system information on the hardware state, monitoring status, and alarm status (whether an alarm was generated).
The pmsd hwhistoryshow -s slot_num|all command can be used to print a short log (one-line descriptions) of messages pertaining to changes in the hardware's operation. The log is printed to the terminal performing the operation.
The pmsd osoperset -s slot_num|all command performs operations on the operating system. The operations can only be performed in maintenance or none mode unless the optional -f parameter is used. You specify the operation to be performed on the specified slot by using the option -o with the parameters: reboot (reboot the operating system), mon_enable (enable health monitoring of the operating system), or mon_disable (disable health monitoring of the operating system). The optional -f parameter can be used to perform the operation even if applications are in the active state, and the slot is in operational mode.
The pmsd osinfoshow -s slot_num|all command can be used to print PMS system information on the operating system state, monitoring status, and alarm status (whether an alarm was generated).
The pmsd oshistoryshow -s slot_num|all command can be used to print a short log (one-line descriptions) of messages pertaining to changes in the operating system's operation. The log is printed to the terminal performing the operation.
The pmsd appoperset -s slot_num|all command performs operations on the applications. You specify the operation to be performed on the specified slot by using the option -o with the parameters: force_offline (force the applications to an offline state), vote_active (move the group of applications to the active state only if all of the applications agree to be moved), or force_active (force the applications to the active state).
The pmsd appinfoshow -s slot_num|all command can be used to print PMS system information on the applications' state and alarm status (whether an alarm was generated).
The pmsd apphistoryshow -s slot_num|all command can be used to print a short log (one-line descriptions) of messages pertaining to changes in the applications' operation. The log is printed to the terminal performing the operation.
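The hwoperset, osoperset, and appoperset commands described above share one shape: a slot selector (-s), an operation (-o), and, for the hardware and operating system commands only, an optional -f. As a rough illustration, the following Python sketch assembles such a command line and rejects any operation that this chapter does not list. The helper name build_operset and the order in which the options are emitted are assumptions made for this example; the helper is not part of the PMS software, and the resulting string must still be entered at the alarm card CLI.

```python
# Operations listed in this chapter for each pmsd *operset command.
OPERATIONS = {
    "hwoperset": {"powerdown", "powerup", "reset", "mon_enable", "mon_disable"},
    "osoperset": {"reboot", "mon_enable", "mon_disable"},
    "appoperset": {"force_offline", "vote_active", "force_active"},
}

# Only the hardware and operating system commands document the -f parameter.
FORCE_ALLOWED = {"hwoperset", "osoperset"}


def build_operset(subcommand, slot, operation, force=False):
    """Return a pmsd command line as a string (hypothetical helper)."""
    if subcommand not in OPERATIONS:
        raise ValueError(f"unknown subcommand: {subcommand}")
    if operation not in OPERATIONS[subcommand]:
        raise ValueError(f"{operation} is not listed for {subcommand}")
    if force and subcommand not in FORCE_ALLOWED:
        raise ValueError(f"-f is not documented for {subcommand}")
    # The slot is either a slot number or the literal string "all".
    if slot != "all" and not (isinstance(slot, int) and slot > 0):
        raise ValueError("slot must be a positive integer or 'all'")
    parts = ["pmsd", subcommand, "-s", str(slot), "-o", operation]
    if force:
        parts.append("-f")
    return " ".join(parts)


print(build_operset("hwoperset", 3, "reset"))
print(build_operset("osoperset", "all", "reboot", force=True))
```

Validating the operation before building the string catches mistakes such as passing reboot to hwoperset, which the text lists only for osoperset.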
The pmsd version command prints the current version of pmsd.
The pmsd usage command prints a synopsis of the pmsd commands.
The Netra High Availability (HA) Suite software provides enhanced services for customer high-availability applications. When installed, it runs on the host and satellite CPU boards. The Netra HA Suite provides reliable (redundant) services across CPU boards; you can fail over from one CPU board in one Netra CT system to another CPU board in another Netra CT system.
The MOH and PMS applications integrate with these Netra HA Suite foundation services: reliable NFS, reliable DHCP/boot server, and CGTP (Carrier-Grade Transport Protocol, providing IP packet services).
The MOH application manages these services through the node manager agent (NMA), for example by monitoring the nfs and tftp daemons. If an NFS failure occurs, the MOH application detects it.
The points of interaction between the Netra CT server software and the Netra HA Suite are:
The Netra HA Suite starts RNFS, RDHCP, and CGTP by default. If you want to change the Netra HA Suite services that are started by default, configure the Process Monitor Daemon (PMD). Refer to the Netra HA Suite documentation for more information on how to do this.
The Netra CT PMS probe brings together the PMS partner list and the Netra HA Suite master and vice-master cluster. Refer to the pms API man pages for more information on partner lists; the man pages are installed by default in /opt/SUNWnetract/mgmt2.0/man.
Refer to the Netra HA Suite documentation for more information on this application.
This section describes various ways to monitor your system.
The alarm card CLI provides many commands to display system status. Refer to the alarm card CLI commands in the section, Using the Alarm Card Command-Line Interface, in particular the show commands, to view system status. The alarm card also keeps several logs; refer to Viewing Alarm Card Logs for more information.
The system status panel is a module designed to give feedback on the status of the key components within the Netra CT server. The system status panel has one set of LEDs for each component within that particular server. FIGURE 3-1 shows the LEDs on the system status panel for the Netra CT 810 server, and FIGURE 3-2 shows the LEDs on the system status panel for the Netra CT 410 server.
TABLE 3-5 describes the system status panel LEDs for the Netra CT 810 server.
TABLE 3-6 describes the system status panel LEDs for the Netra CT 410 server.
I/O boards or satellite CPU boards installed in slots 4 and 5 |
||
Host CPU front transition card or host CPU front termination board |
||
Each major component in the Netra CT 810 server or Netra CT 410 server has a set of LEDs on the system status panel that shows the status of that component. Every component has a green Power LED, paired with either an amber Okay to Remove LED (FIGURE 3-3) or an amber Fault LED (FIGURE 3-4), but never both.
A green system power LED and system power button are also located on the system status panel. When the system is off, the system power LED is unlit. Pressing the system power button when the system is off will start the power-on sequence. Once the system is completely powered on, the system power LED remains on.
When the system is powered on, pressing the system power button for less than 4 seconds starts the orderly power-off sequence, which shuts the system down in a manner that corrupts no persistent operating system data structures. The system power LED blinks while this sequence is in progress. During the orderly power-off, applications in service may be abnormally terminated, and the CPU invokes no further services. Once the CPU reaches a quiescent state (run level 0, as if init 0 had been invoked), the power supplies turn off and the LED changes from blinking to off.
If the button is held down for 4 seconds or longer, the power supplies are turned off without any intervention by the CPU; that is, the "emergency" power-off sequence occurs.
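The button behavior described above reduces to a small decision rule, sketched here in Python purely as a reading aid. The function name power_button_action is hypothetical, and the sketch assumes that any press while the system is off starts the power-on sequence, because the text gives no hold time for that case.

```python
# Threshold from the text: a press of 4 seconds or longer is an emergency power-off.
EMERGENCY_HOLD_SECONDS = 4.0


def power_button_action(system_on, hold_seconds):
    """Return the sequence that a button press starts (hypothetical helper)."""
    if hold_seconds < 0:
        raise ValueError("hold_seconds must not be negative")
    if not system_on:
        # Assumption: the hold time does not matter when the system is off.
        return "power-on"
    if hold_seconds < EMERGENCY_HOLD_SECONDS:
        # The CPU is brought to run level 0 before the power supplies turn off.
        return "orderly power-off"
    # The power supplies are turned off without CPU intervention.
    return "emergency power-off"


print(power_button_action(system_on=True, hold_seconds=1.5))
print(power_button_action(system_on=True, hold_seconds=4.0))
```

Note that the boundary case of exactly 4 seconds falls on the emergency side, matching the wording "4 seconds or longer".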
The MOH collects information about individual field replaceable units (FRUs) in your system and monitors their operational status. MOH can also monitor certain daemons; for example, if you installed the Netra High Availability Suite, MOH monitors daemons through that application.
If you installed the Solaris patches for MOH in a directory other than the default directory, specify that path instead. You must start the MOH application as superuser.
Refer to TABLE 2-6 for the options available with ctmgx start.
Once MOH is running, it interfaces with your SNMP or RMI application to discover network elements, monitor the system, and provide status messages. Refer to the Netra CT Server Software Developer's Guide for information on writing applications to interface with the MOH application.
MOH software generates an SNMP trap if a memory error occurs in a memory module on a CPU board. The trap includes information such as the time stamp, the alarm severity, the specific problem, and a possible response to the particular memory error.
Information for the trap is generated by the cediag tool (/opt/SUNWcest/bin/cediag), which interacts with the dual inline memory modules (DIMMs). The cediag tool provides trap information to the ctmgx agent, which regularly polls the cediag tool for status. The polling period is configurable using the ctmgx.cediag.period parameter in the /opt/SUNWnetract/mgmt2.0/etc/ctmgx.conf file; the default is 1800000 milliseconds (30 minutes). Setting this parameter too low could result in too many processes running.
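Because the polling period is expressed in milliseconds, it can help to convert a candidate value before changing it. The following Python sketch is arithmetic only; it does not read or write ctmgx.conf, and the function names are chosen for this example.

```python
DEFAULT_PERIOD_MS = 1_800_000  # default ctmgx.cediag.period given in the text

MS_PER_MINUTE = 60_000
MS_PER_DAY = 86_400_000


def period_minutes(period_ms):
    """Convert a polling period from milliseconds to minutes."""
    return period_ms / MS_PER_MINUTE


def polls_per_day(period_ms):
    """Return how many complete polling periods fit in 24 hours."""
    return MS_PER_DAY // period_ms


print(period_minutes(DEFAULT_PERIOD_MS))  # 30.0
print(polls_per_day(DEFAULT_PERIOD_MS))   # 48
```

For comparison, a period of 60000 milliseconds would mean 1440 polls per day, which illustrates why a very low value increases the load on the system.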
For additional troubleshooting information, refer to the Netra CT Server Service Manual.
Most FRUs in the Netra CT system are hot-swappable.[1] Hot-swap, a key feature of the PICMG standard, means that a CompactPCI board that meets the PICMG standard can be reliably inserted into or extracted from a powered and operating CompactPCI platform without affecting the other functions of the platform.
The Netra CT system hot-swap modes are shown in TABLE 3-10.
The Netra CT system is configured for full hot-swap by default. You can change the mode of the slot for the CPU boards and I/O boards to basic or full hot-swap using the cfgadm(1M) command. You might want to change the hot-swap state of a slot to basic, for example, if you need to insert or remove a third-party I/O board that does not have full hot-swap support.
Note that whenever you reboot your system or power it off and on, the hot-swap states revert to the default full hot-swap state for all I/O slots. If you configure the alarm card or the CPU boards for basic hot-swap, after a host CPU reboot, alarm card reset, or system power off, the alarm card comes up in a disconnected or unconfigured state; you must reconfigure it with the cfgadm command from the host CPU board.
Complete information on hot-swapping FRUs is contained in the Netra CT Server Service Manual.
By default, the Netra CT server is configured to accept any cPCI FRU unless you specifically set an allowable plug-in for a specific slot. Refer to Configuring a Chassis Slot for a Board for more information.
When a board is inserted into the Netra CT server, the alarm card checks the midplane FRU ID information for allowable FRUs for that slot, then checks the inserted board's FRU ID to make sure the board is allowed in the particular slot. If the board is allowed in the slot, the alarm card powers on the board. If the board is not allowed in the slot, the alarm card does not enable power to the slot.
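The insertion check described above can be pictured as a lookup followed by a membership test. The Python sketch below is a simplified model, not the alarm card's implementation: the dictionary standing in for the midplane FRU ID information, the board type strings, and the function names are all hypothetical. A slot with no entry accepts any board, which mirrors the default behavior for cPCI FRUs.

```python
def board_allowed(allowed_by_slot, slot, board_type):
    """Return True if board_type may be powered on in the given slot."""
    allowed = allowed_by_slot.get(slot)
    if not allowed:
        # No allowable plug-in set for this slot: accept any cPCI FRU.
        return True
    return board_type in allowed


def on_board_inserted(allowed_by_slot, slot, board_type):
    """Model the power decision made when a board is inserted."""
    if board_allowed(allowed_by_slot, slot, board_type):
        return "power on"
    return "power not enabled"


# Hypothetical restriction: slot 4 accepts only a satellite CPU board.
restrictions = {4: {"satellite-cpu"}}

print(on_board_inserted(restrictions, 4, "satellite-cpu"))
print(on_board_inserted(restrictions, 4, "third-party-io"))
print(on_board_inserted(restrictions, 5, "third-party-io"))
```

The third call shows the default case: slot 5 has no restriction in this model, so any board type is powered on.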
If a host or satellite CPU board is in use, that is, has applications currently running, the alarm card CLI power commands, such as poweron or poweroff, will not work for that CPU board.
You might want to change the hot-swap state of a slot from full to basic if you need to insert or remove a third-party I/O board that does not have full hot-swap support.
To determine the current hot-swap state of a slot, use the prtconf(1M) command. To enable or disable a type of hot-swap on a slot, use the cfgadm(1M) command. For many cfgadm commands, you must know the attachment point ID for the I/O slot that you will be working on.
To Determine the Current Hot-Swap State of a Slot
As superuser on the server, enter the command:
For a Netra CT 810 server, the output is similar to the following:
To List Attachment Point IDs for I/O Slots
As superuser on the server, enter the command:
For a Netra CT 810 server, the output is similar to the following:
where the attachment point ID is shown in the first column of the readout; for example, the attachment point ID for I/O slot 2 in a Netra CT 810 server would be IO-2.
For a Netra CT 410 server, the output is similar to the following:
where the attachment point ID is shown in the first column of the readout; for example, the attachment point ID for I/O slot 4 in a Netra CT 410 server would be IO-4.
To Disable Full Hot-Swap and Enable Basic Hot-Swap
As root on the server, enter the command:
where ap_id is the attachment point ID in the server that you want to have basic hot-swap enabled on.
To Re-Enable Full Hot-Swap
As root on the server, enter the command:
where ap_id is the attachment point ID in the server that you want to have full hot-swap enabled on.
Copyright © 2007, Sun Microsystems, Inc. All Rights Reserved.