Sun N1 System Manager 1.2 Administration Guide

Chapter 5 Monitoring Your Servers

This chapter first explains what monitoring means in the context of the N1 System Manager and describes how servers that are part of the N1 System Manager are monitored. It then provides command-line procedures for enabling and disabling monitoring and for managing monitoring thresholds.

This chapter also describes how to manage jobs and event log entries, and how to set up notifications.


Some of these procedures can also be performed by using the browser interface. Those procedures are described in the Sun N1 System Manager browser interface help.

Introduction to Monitoring

Monitoring in the Sun N1 System Manager software enables you to track changes to specific attributes in specific managed objects. Managed objects include server hardware elements, operating systems, file systems, and networks. Attributes are the monitored elements, about which data is obtained and delivered by the N1 System Manager software. Examples of attributes are the average number of queued processes and the percentage of used memory. A list of attributes is provided in Hardware Sensor Attributes and in Table 5–2.

Attributes are associated with three main areas: hardware health, operating system (OS) health, and network reachability.

For a server or a group of servers, hardware health, operating system health, and network connectivity are all monitored by the management server. All comparisons and verifications for monitoring are performed by the N1 System Manager. Provisionable servers serve only as sources of data about their own health and network reachability.

Monitoring is tied to the broadcasting of events for each monitored server or group of servers. Events are generated when certain conditions related to attributes occur. For information about events and when they occur, see Managing Event Log Entries. Monitoring data is stored as events in the N1 System Manager database rather than in log files.

If monitoring is enabled for a server, each event for that server causes the N1 System Manager to emit a notification. If monitoring is disabled for a server, no monitoring events are generated for that server. Lifecycle events, however, continue to be generated even when monitoring is disabled. Lifecycle events include server discovery, server change or deletion, and server group creation. If you have requested notification of this type of event, you still receive notifications for it while monitoring is disabled.

An SNMP agent for data retrieval is provided with the N1 System Manager software. If the management server runs the N1 System Manager on the Solaris OS, this agent is based on the Sun Management Center 3.5 SNMP agent. If the management server runs the N1 System Manager on Linux, this agent is based on the Sun Management Center 3.6 Linux SNMP agent. The agent is deployed when operating systems are deployed on servers that are managed by the N1 System Manager software. The N1 System Manager passively listens for the traps that the SNMP agent generates whenever a threshold is breached. Because traps can be lost, the N1 System Manager also performs two types of polling-based monitoring as a backup: accessibility monitoring and status monitoring. Accessibility monitoring verifies that the N1 System Manager can access the OS agent. Status monitoring periodically retrieves the current status from the SNMP agent and reports any status that is not OK.
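The two backup polls can be pictured as one small check per server. The following POSIX shell sketch is illustrative only: check_access and get_status are hypothetical placeholders for "can the OS agent be reached" and "what status does the SNMP agent report". They are not N1 System Manager commands, and the real polling is performed internally by the management server.

```sh
# Illustrative sketch of the polling-based backup monitoring.
# check_access and get_status are HYPOTHETICAL placeholders that the
# caller must define; they are not N1 System Manager commands.

# poll_server SERVER
#   Returns 0 if the agent is accessible and reports OK,
#   1 if the agent is accessible but reports a non-OK status,
#   2 if the agent cannot be accessed at all.
poll_server() {
    ps_server=$1

    # Accessibility monitoring: can the OS agent be reached?
    if ! check_access "$ps_server"; then
        echo "$ps_server: agent not accessible"
        return 2
    fi

    # Status monitoring: report only when the status is not OK.
    ps_status=$(get_status "$ps_server")
    if [ "$ps_status" != "OK" ]; then
        echo "$ps_server: status is $ps_status"
        return 1
    fi
    return 0
}
```

Separating the two checks mirrors the text: an inaccessible agent is reported differently from an agent that answers with a bad status.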


Note –

The default SNMP port for the agent for the monitoring feature is port 161. Changing the port number from the default is not supported in this release.


Hardware Health Monitoring

The hardware health of discovered servers is monitored. Sensors provided in the hardware are used to monitor temperature, voltage, and fan speed. For more information about associated hardware, see the Sun N1 System Manager Connection Information in Sun N1 System Manager 1.2 Site Preparation Guide.

For SPARC servers, sensor data is retrieved from the service processor through the Advanced Lights Out Manager (ALOM) interface. For x64 servers, sensor data is retrieved through IPMI.


Note –

Servers that use ALOM do not send data to the management server by using traps. Instead, they send management data by email. To ensure that the management server collects data from these servers, configure the management server as an email server. This process is explained in To Configure the ALOM Email Alert Settings in Sun N1 System Manager 1.2 Installation and Configuration Guide.


The following characteristics of server hardware can be monitored: temperature, voltage, and fan speed.

A detailed list of these sensors is provided in Hardware Sensor Attributes.

You can view filtered hardware health monitoring information for all servers by using the show server command:


N1-ok> show server hardwarehealth hardwarehealth

See show server in Sun N1 System Manager 1.2 Command Line Reference Manual for details of possible values of the hardwarehealth filters. For more information and a graphic explaining filtering servers by health state, see To View Failed Servers.

Hardware Sensor Attributes

For x86 servers, the management server software obtains the list of hardware sensor attributes to monitor through IPMI from the service processor of the server. For servers running the SPARC architecture, the ALOM interface is used. The list of hardware sensor attributes can vary from server to server, and between firmware versions. A sample listing for some servers and firmware versions is provided in this section. The attributes depend on the server type and on the number of CPUs that the server has.


Note –

Hardware disk failure and memory failure are not monitored in this version of the N1 System Manager.


The following example lists sensor names and descriptions for a Sun Fire V40z server with firmware version 2.1.0.16.


ambienttemp     Ambient air temp
bulk.v12-0-s0   Bulk 12V S0 voltage at CPU 0
bulk.v12-2-s0   Bulk 12V S0 voltage at CPU 2
bulk.v12-3-s0   Bulk 12V S0 voltage at CPU 3
bulk.v1_8-s0    Bulk 1.8V S0 voltage
bulk.v1_8-s5    Bulk 1.8V S5 voltage
bulk.v2_5-s0    Bulk 2.5V S0 voltage
bulk.v2_5-s0-dc Bulk 2.5V S0 voltage at DC
bulk.v2_5-s5    Bulk 2.5V S5 voltage
bulk.v3_3-s0    Bulk 3.3V S0 voltage
bulk.v3_3-s0-dc Bulk 3.3V S0 voltage at DC
bulk.v3_3-s3    Bulk 3.3V S3 voltage
bulk.v3_3-s5    Bulk 3.3V S5 voltage
bulk.v3_3-s5-dc Aux 3.3V S5 voltage at DC
bulk.v5-s0      Bulk 5V S0 voltage
bulk.v5-s0-dc   Bulk 5V S0 voltage at DC
bulk.v5-s5      Bulk 5V S5 voltage
bulk.v5-s5-dc   Bulk 5V S5 voltage at DC
cpu0.dietemp    CPU 0 Die temperature
cpu0.heartbeat  CPU 0 Heartbeat
cpu0.inlettemp  CPU 0 Inlet temperature
cpu0.memtemp    CPU 0 Memory temperature
cpu0.v2_5-s0    CPU 0 VDDA (2.5V) S0 voltage
cpu0.v2_5-s3    CPU 0 VDD (2.5V) S3 voltage
cpu0.vcore-s0   CPU 0 VCore S0 voltage
cpu0.vid        CPU 0 VID Selection
cpu0.vldt0      CPU 0 LDT0 voltage
cpu0.vtt-s3     CPU 0 DDR VTT S3 voltage
cpu1.dietemp    CPU 1 Die temperature
cpu1.heartbeat  CPU 1 Heartbeat
cpu1.inlettemp  CPU 1 Inlet temperature
cpu1.memtemp    CPU 1 Memory temperature
cpu1.v2_5-s0    CPU 1 VDDA (2.5V) S0 voltage
cpu1.v2_5-s3    CPU 1 VDD (2.5V) S3 voltage
cpu1.vcore-s0   CPU 1 VCore S0 voltage
cpu1.vid        CPU 1 VID Selection
cpu1.vldt1      CPU 1 LDT1 voltage
cpu1.vldt2      CPU 1 LDT2 voltage
cpu1.vtt-s3     CPU 1 DDR VTT S3 voltage
cpu2.dietemp    CPU 2 Die temperature
cpu2.heartbeat  CPU 2 Heartbeat
cpu2.inlettemp  CPU 2 inlet temperature
cpu2.temp       CPU 2 downwind temperature
cpu2.v2_5-s0    CPU 2 VDDA (2.5V) S0 voltage
cpu2.v2_5-s3    CPU 2 VDD (2.5V) S3 voltage
cpu2.vcore-s0   CPU 2 VCore S0 voltage
cpu2.vid        CPU-2 VID Selection
cpu2.vtt-s3     CPU 2 DDR VTT voltage
cpu3.dietemp    CPU 3 Die temperature
cpu3.heartbeat  CPU 3 Heartbeat
cpu3.inlettemp  CPU 3 inlet temperature
cpu3.temp       CPU 3 downwind temperature
cpu3.v2_5-s0    CPU 3 VDDA (2.5V) S0 voltage
cpu3.v2_5-s3    CPU 3 VDD (2.5V) S3 voltage
cpu3.vcore-s0   CPU 3 VCore S0 voltage
cpu3.vid        CPU-3 VID Selection
cpu3.vtt-s3     CPU 3 DDR VTT voltage
fan1.tach       Fan 1 measured speed
fan10.tach      Fan 10 measured speed
fan11.tach      Fan 11 measured speed
fan12.tach      Fan 12 measured speed
fan2.tach       Fan 2 measured speed
fan3.tach       Fan 3 measured speed
fan4.tach       Fan 4 measured speed
fan5.tach       Fan 5 measured speed
fan6.tach       Fan 6 measured speed
fan7.tach       Fan 7 measured speed
fan8.tach       Fan 8 measured speed
fan9.tach       Fan 9 measured speed
faultswitch     System Fault Indication
g0.vldt1        AMD-8131 PCI-X Tunnel 0 LDT1 voltage
g1.vldt1        AMD-8131 PCI-X Tunnel 1 LDT1 voltage
gbeth.temp      Gigabit ethernet local temperature
golem-v1_8-s0   AMD-8131 PCI-X Tunnel 1.8V S0 voltage
identifyswitch  Identify switch
scsibp.temp     SCSI Disk backplane temperature
scsifault       SCSI Disk Fault Switch
sp.temp         SP local temperature
vldt-reg1-dc    LDT Regulator 1 Voltage
vldt-reg2-dc    LDT Regulator 2 Voltage

The following example lists sensor names and descriptions for a Sun Fire V20z server with firmware version 2.1.0.16.


ambienttemp    Ambient air temp
bulk.v12-0-s0  Bulk 12v supply voltage (cpu0)
bulk.v12-1-s0  Bulk 12v supply voltage (cpu1)
bulk.v1_8-s0   Bulk 1.8v S0 voltage
bulk.v1_8-s5   Bulk 1.8v S5 voltage
bulk.v2_5-s0   Bulk 2.5v S0 voltage
bulk.v2_5-s5   Bulk 2.5v S5 voltage
bulk.v3_3-s0   Bulk 3.3v supply
bulk.v3_3-s3   Bulk 3.3v S3 voltage
bulk.v3_3-s5   Bulk 3.3v S5 voltage
bulk.v5-s0     Bulk 5v supply voltage
bulk.v5-s5     Bulk 5v S5 voltage
cpu0.dietemp   CPU 0 die temp
cpu0.heartbeat CPU 0 heartbeat
cpu0.memtemp   CPU 0 memory temp
cpu0.temp      CPU 0 low side temp
cpu0.v2_5-s0   CPU VDDA voltage
cpu0.v2_5-s3   CPU 0 VDDIO voltage
cpu0.vcore-s0  CPU 0 core voltage
cpu0.vid       CPU-0 VID output
cpu0.vldt1     CPU0 HT 1 voltage
cpu0.vldt2     CPU 0 HT 2 voltage
cpu0.vtt-s3    CPU 0 VTT voltage
cpu1.dietemp   CPU 1 die temp
cpu1.heartbeat CPU 1 heartbeat
cpu1.memtemp   CPU 1 memory temp
cpu1.temp      CPU 1 low side temp
cpu1.v2_5-s3   CPU 1 VDDIO voltage
cpu1.vcore-s0  CPU 1 core voltage
cpu1.vid       CPU-1 VID output
cpu1.vtt-s3    CPU 1 VTT voltage
fan1.tach      Fan 1 measured speed
fan2.tach      Fan 2 measured speed
fan3.tach      Fan 3 measured speed
fan4.tach      Fan 4 measured speed
fan5.tach      Fan 5 measured speed
fan6.tach      Fan 6 measured speed
faultswitch    Fault switch (source for eval)
g.vldt1        AMD-8131 PCI-X Tunnel HT 1 voltage
gbeth.temp     Gigabit ethernet temp
golem.temp     PCIX bridge temp
hddbp.temp     Disk drive backplane temp
identifyswitch Identify switch
ps.fanfail     Power Supply fan failure sensor
ps.tempalert   Power Supply too hot sensor
sp.temp        SP temp
thor.temp      AMD-8111 I/O Hub temp

The N1 System Manager retrieves monitoring data from most of these sensors.
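Because per-CPU sensors carry a cpuN. prefix, the number of CPUs reflected in a listing such as the two above can be read off the sensor names. This sketch assumes the listing has been saved as plain text with the sensor name in the first column; the function name is hypothetical and is not part of the N1 System Manager.

```sh
# count_cpus_in_listing
#   Reads a sensor listing ("name  description" per line) on stdin and
#   prints the number of distinct cpuN prefixes in the first column.
count_cpus_in_listing() {
    awk '$1 ~ /^cpu[0-9]+\./ {
             cpu = $1
             sub(/\..*/, "", cpu)    # keep only the prefix, e.g. "cpu0"
             seen[cpu] = 1
         }
         END {
             n = 0
             for (c in seen) n++
             print n
         }'
}
```

Applied to the Sun Fire V40z listing above, it prints 4; applied to the V20z listing, it prints 2. Sensors such as bulk.v12-0-s0 that only mention a CPU in their description are not counted.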

For Sun Fire X4100 and Sun Fire X4200 servers, the following sensors are monitored:


Chassis Sensors:
sys.id               Indicates chassis type
sys.intsw            State of the Chassis Intrusion switch. When the chassis cover to 
                     the CPU area is opened this sensor logs an event
sys.psfail           LED indicator shows state of PS Fail / Rear LED 
                     on the front panel
sys.tempfail         LED indicator shows state of Over Temperature LED 
                     on the front panel
sys.fanfail          LED indicator shows state of Fan Fail LED 
                     on the front panel

Back Panel Sensors
bp.power             LED indicator shows state of the Power LED on the back panel
bp.locate            LED indicator shows state of the Locate LED on the back panel
bp.locate.btn        Monitors the state of the back panel locate button
bp.alert             LED indicator shows state of Alert LED on the back panel

Front Panel Sensors
fp.prsnt             Monitors the presence of the front panel board
fp.ledbd.prsnt       Monitors the presence of the front panel LED board
fp.usbfail           Monitors the front panel USB over current sensor
fp.power             LED indicator shows state of Power LED on the front panel
fp.locate            LED indicator shows state of Locate LED on the front panel
fp.locate.btn        Monitors the state of the front panel locate button
fp.alert             LED indicator shows state of Alert LED on the front panel

I/O Sensors
io.id0.prsnt         Monitors the 2-disk I/O board presence signal
io.id1.prsnt         Monitors the 4-disk I/O board presence signal
io.f0.prsnt          Monitors the physical presence of the rear blower
                     (Sun Fire X4200 chassis only)
io.f0.speed          Monitors the speed of the rear blower 
                     (Sun Fire X4200 chassis only)
io.f0.fail           LED indicator shows state of the I/O fan assembly
io.hdd0.fail         LED indicator shows state of the Hard Disk Drive 0 fault LED
io.hdd1.fail         LED indicator shows state of the Hard Disk Drive 1 fault LED
                     (Unused on the 2-disk Sun Fire X4100)
io.hdd2.fail         LED indicator shows state of the Hard Disk Drive 2 fault LED
                     (Unused on the 2-disk Sun Fire X4100)
io.hdd3.fail         LED indicator shows state of the Hard Disk Drive 3 fault LED
                     (Unused on the 2-disk Sun Fire X4100)

CPU 0 Sensors
p0.fail              LED indicator shows state of the CPU 0 fault LED
                     Illuminated for CPU voltage and temperature events
p0.d0.fail           LED indicator shows state of the CPU 0 DIMM 0 fault LED 
                     Illuminated in response to ECC errors
                     PAIR 0 includes this and p0.d1.fail, both LEDs in the same pair
                     will be illuminated at the same time when one indicates a fault
p0.d1.fail           LED indicator shows state of the CPU 0 DIMM 1 fault LED 
                     Illuminated in response to ECC errors
                     PAIR 0 includes this and p0.d0.fail, both LEDs in the same pair
                     will be illuminated at the same time when one indicates a fault
p0.d2.fail           LED indicator shows state of the CPU 0 DIMM 2 fault LED 
                     Illuminated in response to ECC errors
                     PAIR 1 includes this and p0.d3.fail, both LEDs in the same pair
                     will be illuminated at the same time when one indicates a fault
p0.d3.fail           LED indicator shows state of the CPU 0 DIMM 3 fault LED 
                     Illuminated in response to ECC errors
                     PAIR 1 includes this and p0.d2.fail, both LEDs in the same pair
                     will be illuminated at the same time when one indicates a fault

CPU 1 Sensors
p1.fail              Same as p0.fail, but for CPU 1
p1.d0.fail           Same as p0.d0.fail, but for CPU 1
p1.d1.fail           Same as p0.d1.fail, but for CPU 1
p1.d2.fail           Same as p0.d2.fail, but for CPU 1
p1.d3.fail           Same as p0.d3.fail, but for CPU 1

Power Supply Sensors
ps0.prsnt            Indicates whether Power Supply 0 is present
ps0.vinok            Indicates whether Power Supply 0 is connected to AC power
ps0.pwrok            Indicates whether Power Supply 0 is turned on and powering the system
ps1.prsnt            Indicates whether Power Supply 1 is present
ps1.vinok            Indicates whether Power Supply 1 is connected to AC power
ps1.pwrok            Indicates whether Power Supply 1 is turned on and powering the system

Fan Control Temperature Sensors
fp.t_amb             Monitors front panel ambient temperature
p0.t_core            Monitors CPU 0 core temperature
p1.t_core            Monitors CPU 1 core temperature

Other Temperature Sensors
mb.t_amb             Monitors ambient temperature from the internal temperature
                     sensor in the chip on the mainboard
pdb.t_amb            Monitors the ambient temperature of the power distribution board
io.t_amb             Monitors the ambient temperature from near the I/O area in the chassis

Mainboard Voltage Sensors
mb.v_bat             Monitors the 3V RTC battery on the mainboard
mb.v_+3v3stby        Monitors the 3.3V standby input that powers 
                     the service processor and other standby devices
mb.v_+3v3            Monitors the 3.3V main input that is active when the power is on
mb.v_+5v             Monitors the 5V main input that is active when the power is on
mb.v_+12v            Monitors the 12V main input that is active when the power is on
mb.v_-12v            Monitors the -12V main input that is active when the power is on
mb.+2v5core          Monitors the 2.5V core input that is active when the power is on
mb.+1v8core          Monitors the 1.8V core input that is active when the power is on
mb.+1v2core          Monitors the 1.2V core input that is active when the power is on

CPU 0 Voltage Sensors
p0.v_+1v5            Monitors the CPU 0 1.5V input
p0.v_+2v5core        Monitors the CPU 0 2.5V core input
p0.v_+1v2core        Monitors the CPU 0 1.2V core input

CPU 1 Voltage Sensors
p1.v_+1v5            Monitors the CPU 1 1.5V input 
p1.v_+2v5core        Monitors the CPU 1 2.5V core input
p1.v_+1v2core        Monitors the CPU 1 1.2V core input

Fan Presence Sensors (Sun Fire X4200 chassis only)
ft0.fm0.prsnt        Indicates the presence of Fan Tray 0, Fan Module 0
ft0.fm1.prsnt        Indicates the presence of Fan Tray 0, Fan Module 1
ft0.fm2.prsnt        Indicates the presence of Fan Tray 0, Fan Module 2
ft1.fm0.prsnt        Indicates the presence of Fan Tray 1, Fan Module 0
ft1.fm1.prsnt        Indicates the presence of Fan Tray 1, Fan Module 1
ft1.fm2.prsnt        Indicates the presence of Fan Tray 1, Fan Module 2

Fan Speed Sensors
ft0.fm0.f0.speed     Monitors speed of fan at Fan Tray 0, Fan Module 0, Fan 0
ft0.fm0.f1.speed     Monitors speed of fan at Fan Tray 0, Fan Module 0, Fan 1
                     (Sun Fire X4100 only)
ft0.fm1.f0.speed     Monitors speed of fan at Fan Tray 0, Fan Module 1, Fan 0
ft0.fm1.f1.speed     Monitors speed of fan at Fan Tray 0, Fan Module 1, Fan 1
                     (Sun Fire X4100 only)
ft0.fm2.f0.speed     Monitors speed of fan at Fan Tray 0, Fan Module 2, Fan 0
ft0.fm2.f1.speed     Monitors speed of fan at Fan Tray 0, Fan Module 2, Fan 1
                     (Sun Fire X4100 only)
ft1.fm0.f0.speed     Monitors speed of fan at Fan Tray 1, Fan Module 0, Fan 0
ft1.fm0.f1.speed     Monitors speed of fan at Fan Tray 1, Fan Module 0, Fan 1
                     (Sun Fire X4100 only)
ft1.fm1.f0.speed     Monitors speed of fan at Fan Tray 1, Fan Module 1, Fan 0
ft1.fm1.f1.speed     Monitors speed of fan at Fan Tray 1, Fan Module 1, Fan 1
                     (Sun Fire X4100 only)
ft1.fm2.f0.speed     Monitors speed of fan at Fan Tray 1, Fan Module 2, Fan 0
ft1.fm2.f1.speed     Monitors speed of fan at Fan Tray 1, Fan Module 2, Fan 1
                     (Sun Fire X4100 only)

For Sun Fire X2100 servers, only fan speed, voltage, and temperature sensors are used to retrieve data. The following sensors are monitored:


DDR 2.6V
CPU core Voltage
VCC 3.3V
VCC 5V
VCC 12V
Battery Volt
CPU TEMP
SYS TEMP
CPU FAN
SYSTEM FAN3
SYSTEM FAN1
SYSTEM FAN2

OS Health Monitoring

The N1 System Manager can monitor OS health. When you use the add server feature command, you specify the server's agent IP address with the agentip keyword and provide the credentials for ssh access to the monitored server's operating system with the agentssh keyword. See To Add the OS Monitoring Feature for additional details. This procedure is required for OS health monitoring but not for monitoring hardware health or network reachability.

Adding the OS monitoring feature provides support for OS monitoring and enables monitoring by default. After that, monitoring can be disabled and enabled by use of the set server command. See Enabling and Disabling Monitoring for more information.

All attribute data is retrieved from the server's operating system through ssh and SNMP. Statistics related to the central processing unit (CPU) are provided, as is data related to memory, swap usage, and file systems. For the purposes of monitoring, system load data, memory usage, and swap usage data can be categorized as follows:

A list of these attributes is provided in Table 5–2.

You can filter OS health monitoring information for all servers by using the show server command:


N1-ok> show server oshealth oshealth

See show server in Sun N1 System Manager 1.2 Command Line Reference Manual for details of possible values of the oshealth filters. For more information and a graphic explaining filtering servers by health state, see To View Failed Servers.
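From a system shell, the same filter can be applied to several health states in one pass. This sketch assumes that n1sh accepts the same show server syntax as the N1-ok> prompt, as the n1sh calls in Example 5–1 suggest. The function name is hypothetical, and only the unreachable and unknown states discussed in this chapter are used.

```sh
# show_oshealth_states STATE...
#   Runs "show server oshealth STATE" once per state through n1sh,
#   printing a header line before each result set.
show_oshealth_states() {
    for state in "$@"; do
        echo "== oshealth $state =="
        n1sh show server oshealth "$state"
    done
}

# Example: list servers in the two states discussed in this chapter.
# show_oshealth_states unreachable unknown
```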

The health of an OS resource can be shown as unknown if the server is reachable but the agent for the monitoring feature cannot be contacted on SNMP port 161. The health of an OS resource can be shown as unreachable if the server cannot be reached, for example, because it is in standby mode. See also Understanding the Differences Between Unreachable and Unknown States for Provisionable Servers.

The monitoring of OS health allows you to set specific thresholds for individual monitored servers, or for groups of monitored servers, at the command line by using the set command. See Setting Threshold Values for details.

If you are not interested in the values of some attributes, you can disable the threshold severity for monitoring of those attributes. This action prevents nuisance alarms. Example 5–6 shows how to disable the threshold severity.

Network Reachability Monitoring

All management interfaces of provisionable servers and all platform interfaces are monitored by default by the N1 System Manager. Platform interfaces include the service processor's management interface, such as eth0, and data network interfaces, such as eth1 or eth2.

Reachability is verified for Linux servers and servers running the Solaris OS by using an ICMP ping to the interface IP address. For further information, see Discovery of Servers in the Factory Default State in Sun N1 System Manager 1.2 Installation and Configuration Guide.

The reachability of all network interfaces is verified at regular intervals. The monitoring of network reachability is based on the IP address. If any monitored IP address is unreachable, an event is generated.
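The IP-based reachability check can be approximated by hand with one ICMP probe per address. This is a hypothetical sketch, not the management server's implementation: probe_ip uses Linux-style ping options, and on the Solaris OS the closer equivalent is ping with the address followed by a timeout (for example, ping 10.0.0.10 1).

```sh
# probe_ip IP
#   Sends one ICMP echo request (Linux-style options) and succeeds if a
#   reply arrives. On the Solaris OS, "ping IP 1" is the closer equivalent.
probe_ip() {
    ping -c 1 "$1" >/dev/null 2>&1
}

# check_reachability IP...
#   Probes each address once and prints an "unreachable" line for every
#   address that does not answer. Returns 1 if any address failed.
check_reachability() {
    cr_failed=0
    for ip in "$@"; do
        if ! probe_ip "$ip"; then
            echo "unreachable: $ip"
            cr_failed=1
        fi
    done
    return "$cr_failed"
}
```

To repeat the sweep at regular intervals, wrap it in a loop with sleep; the interval that the N1 System Manager itself uses is not stated here.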

To view network reachability monitoring information for all servers, use the show server command with the appropriate filters. See show server in Sun N1 System Manager 1.2 Command Line Reference Manual for details.

Understanding the Differences Between Unreachable and Unknown States for Provisionable Servers

It is important to distinguish between the unreachable and unknown states of provisionable servers. The following commands illustrate the difference.


N1-ok> show server hardwarehealth unreachable

This command lists all provisionable servers that are unreachable. Any provisionable server returned in the output of this command is unreachable due to a network problem: the server cannot be contacted about its hardware health status. The ping command to the server is unsuccessful. This behavior does not necessarily mean that the server is not transmitting hardware health status information. The server could be in standby mode.


N1-ok> show server hardwarehealth unknown

This command lists all provisionable servers that are not returning any information about hardware health status. The ping command might be successful but servers returned in the output of this command are not returning any hardware health information. The agent for the monitoring feature could not be contacted on port 161.


N1-ok> show server power unreachable

This command lists all provisionable servers that are unreachable. Any server returned in the output of this command is unreachable due to a network problem: the server cannot be contacted about its power status. The ping command to the server is unsuccessful. This behavior does not necessarily mean that the server is not transmitting power status information. The server could be in standby mode.


N1-ok> show server power unknown

This command lists all provisionable servers that are not returning any information about power status. The ping command might be successful but servers returned in the output of this command are not returning any power status information. The agent for the monitoring feature could not be contacted on port 161.


N1-ok> show server oshealth unreachable

This command lists all provisionable servers that are unreachable. Any server returned in the output of this command is unreachable due to a network problem: the server cannot be contacted about its OS health. The ping command to the server is unsuccessful. This behavior does not necessarily mean that the server is not transmitting OS health information. The server could be in standby mode.


N1-ok> show server oshealth unknown

This command lists all provisionable servers that are not returning any information about OS health. The ping command might be successful but servers returned in the output of this command are not returning any OS health information. The agent for the monitoring feature could not be contacted on port 161.
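The decision rule behind the two states can be summarized in a few lines. This is a sketch of the logic described above, not the management server's implementation; the function name and the third label, reachable, are hypothetical.

```sh
# health_state PING_OK AGENT_OK
#   Maps two yes/no observations to the states described above:
#     ping fails                               -> unreachable
#     ping succeeds, agent on port 161 silent  -> unknown
#     both succeed                             -> reachable (hypothetical label)
health_state() {
    if [ "$1" != "yes" ]; then
        echo unreachable
    elif [ "$2" != "yes" ]; then
        echo unknown
    else
        echo reachable
    fi
}
```

The ping result is checked first because, when a server cannot be pinged, the agent's status cannot be determined at all.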

Supporting Monitoring

Before full monitoring of a provisionable server can be enabled, monitoring must be supported for that server. Monitoring is supported for a server when the base management and OS monitoring features are installed on the server.

The base management and OS monitoring features are installed when a provisionable server's OS is installed or updated by use of the load group or load server commands. See load group in Sun N1 System Manager 1.2 Command Line Reference Manual and load server in Sun N1 System Manager 1.2 Command Line Reference Manual for details.


Note –

If the load server or load group command is used to install software on the provisionable server and the provisionable server's networktype attribute is set to dhcp, the feature attribute cannot be used. Therefore, if you want to load the base management and OS monitoring features when loading an OS with the load server or load group commands, set the networktype attribute to static. In addition, if you set the networktype attribute to dhcp, you must change the agent IP address every time the server reboots, as explained in To Modify the Agent IP for a Server.


The base management and OS monitoring features can also be installed or updated when the add server command is used, as explained in Adding and Upgrading Base Management and OS Monitoring Features.

If the OS monitoring feature is not installed and you use the set server monitored command to enable monitoring, only hardware health monitoring is enabled. OS monitoring is not enabled if this command is executed without the OS monitoring feature first being installed. See Enabling and Disabling Monitoring for more information.

Adding and Upgrading Base Management and OS Monitoring Features

The base management and OS monitoring features provide support for monitoring and patching the installed OS profiles, and for executing remote commands. This section describes how to add the base management and OS monitoring features, modify supported attributes, remove feature support, and upgrade the base management and OS monitoring features to the latest versions.

Adding the OS monitoring feature provides support for monitoring and enables monitoring by default. You can subsequently enable and disable monitoring by using the set server command, as explained in Enabling and Disabling Monitoring.


To Add the Base Management Feature

This procedure describes how to add the base management feature on a server with a newly deployed OS. The base management feature is used to enable remote command execution and package deployment.


Note –

Uninstallation of the base management feature is not supported.


The agent IP used in this procedure is the IP address of the provisionable server's data network interface that the management server monitors. The interface can be eth1/bge1 or eth0/bge0, but is usually eth0/bge0. For more information about the server's agent IP address, see To Modify the Agent IP for a Server.


Note –

You can add the base management feature automatically as part of the load server or load group commands. See load server in Sun N1 System Manager 1.2 Command Line Reference Manual or load group in Sun N1 System Manager 1.2 Command Line Reference Manual for details.


Before You Begin
Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Type the following command:


    Note –

    The SSH user account that is used in the following command must have root privileges on the remote machine.



    N1-ok> add server server feature basemanagement agentip agentip agentssh username/password
    

    An Add Base Management Support job is started.

    The necessary packages and scripts are added. See add server in Sun N1 System Manager 1.2 Command Line Reference Manual for details.

  3. After successful completion of the Add Base Management Support job, type the following command:


    N1-ok> show server server
    

    The Base Management Supported field should appear with OK as the value.
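When this check is scripted, the show server output can be tested for the expected value. The exact layout of show server output is not given here, so this sketch ASSUMES that each field is printed on one line as its label followed by its value; the function name and server name are hypothetical. The same check applies to the OS Monitoring Supported field.

```sh
# field_is_ok FIELD
#   Reads "show server" output on stdin and succeeds if a line that
#   begins with FIELD ends with the value OK.
#   ASSUMPTION: each field is printed on one line as "label value".
field_is_ok() {
    awk -v field="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # ignore leading indentation
            if (index(line, field) == 1 && $NF == "OK") found = 1
        }
        END { exit !found }'
}

# Example (hypothetical server name):
# n1sh show server web01 | field_is_ok "Base Management Supported" && echo supported
```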

Next Steps

To Add the OS Monitoring Feature

To Add the OS Monitoring Feature

This procedure describes how to add the OS monitoring feature on a server. You can add the OS monitoring feature to a server that already has the base management feature. Alternatively, you can add the OS monitoring feature to a server with a newly loaded OS, in which case the base management feature is added automatically. The OS monitoring feature is used for OS health monitoring and inventory management. See Chapter 5, Monitoring Your Servers for details.

The add server feature osmonitor command creates an Add OS Monitoring Support job. You can submit multiple overlapping add server feature osmonitor commands and have them run in parallel. However, limit the number of overlapping Add OS Monitoring Support jobs to a maximum of 15.
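One way for a script to respect the limit of 15 overlapping jobs is to start the commands in batches. The following POSIX shell sketch extends the background-job pattern of Example 5–1; the function name and input format are hypothetical. Because it passes agentip and agentssh, it applies only to servers that do not yet have base management support. It also ASSUMES that each n1sh invocation returns only when its job has finished; if n1sh returns as soon as the job is submitted, this caps submissions rather than running jobs.

```sh
# submit_osmonitor_batched [MAX]
#   Reads lines of the form "SERVER AGENTIP USER/PASSWORD" on stdin and
#   starts "add server ... feature osmonitor" for each one in the
#   background, waiting after every MAX submissions (default 15).
#   ASSUMPTION: n1sh returns only when its job has finished.
submit_osmonitor_batched() {
    sb_max=${1:-15}
    sb_count=0
    while read -r sb_server sb_ip sb_cred; do
        [ -n "$sb_server" ] || continue          # skip blank lines
        # </dev/null keeps n1sh from consuming the server list on stdin.
        n1sh add server "$sb_server" feature osmonitor \
            agentip "$sb_ip" agentssh "$sb_cred" </dev/null &
        sb_count=$((sb_count + 1))
        if [ "$sb_count" -ge "$sb_max" ]; then
            wait                                  # let the current batch finish
            sb_count=0
        fi
    done
    wait                                          # wait for the final batch
}

# Example: submit_osmonitor_batched 15 < servers.txt
```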

If you submit add server feature commands by using a script, see Example 5–1 for an example.


Note –

You can add the OS monitoring feature automatically as part of the load server or load group commands. See load server in Sun N1 System Manager 1.2 Command Line Reference Manual or load group in Sun N1 System Manager 1.2 Command Line Reference Manual for details.


Before You Begin
Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. To add the OS monitoring feature, perform one of the following actions:

    • If you have not added the base management feature, type the following command:


      Note –

      The SSH user account that is used in the following command must have root privileges on the remote machine.



      N1-ok> add server server feature osmonitor agentip agentip agentssh username/password
      
    • If you have already added the base management feature, type the following command:


      Note –

      You cannot specify the agent IP or SSH credentials when adding OS monitoring support to a server that has base management support.



      N1-ok> add server server feature osmonitor
      

    An Add OS Monitoring Support job starts.

    See add server in Sun N1 System Manager 1.2 Command Line Reference Manual for details about command syntax.

  3. Track the Add OS Monitoring Support job to completion.

    After the job completes successfully, the Servers table on the System Dashboard tab shows values for OS Usage and OS Resource Health.

    Verify that the OS monitoring feature is supported by issuing the show server command. In the output for the server, the OS Monitoring Supported field should show the value OK.


    Note –

    It can take 5 to 7 minutes for all OS monitoring data to be fully initialized. During this time, CPU idle might be reported as 0.0%, which causes a Failed Critical status for OS Usage. This status should clear within 5 to 7 minutes after you add or upgrade the OS monitoring feature.


    If no monitoring data is available for the server, see Resolving Command Failures Related to OS Monitoring.

    If the provisionable server's IP address changes, use the set server command to update the agent IP address before you enable or disable monitoring. See To Modify the Agent IP for a Server.


Example 5–1 Scripting OS Monitoring Support

The following example script issues multiple add server feature commands in parallel on servers that do not yet have base management feature support:


n1sh add server 10.0.0.10 feature osmonitor agentip 10.0.0.110 agentssh root/admin &
n1sh add server 10.0.0.11 feature osmonitor agentip 10.0.0.111 agentssh root/admin &
n1sh add server 10.0.0.12 feature osmonitor agentip 10.0.0.112 agentssh root/admin &
# Wait for all of the background n1sh commands to exit.
wait
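Because no more than 15 Add OS Monitoring Support jobs should overlap, a longer server list can be submitted in batches. The following is a minimal sketch, not part of the N1 System Manager: submit_osmonitor_batch, MAX_PARALLEL, AGENTSSH, and DRY_RUN are hypothetical names. It assumes that each n1sh add server call keeps running until its job finishes, which the use of & in Example 5–1 suggests but which you should confirm in the Jobs tab. If n1sh returns as soon as the job is queued, this script does not limit the number of overlapping jobs.

```shell
#!/bin/sh
# Sketch only: submit "add server ... feature osmonitor" for many servers,
# keeping at most MAX_PARALLEL n1sh commands running at the same time.
# submit_osmonitor_batch, MAX_PARALLEL, AGENTSSH and DRY_RUN are hypothetical
# names. Assumption: each n1sh call runs until its job finishes.

MAX_PARALLEL=${MAX_PARALLEL:-15}   # documented limit on overlapping jobs

# Reads "<server> <agentip>" pairs, one per line, from standard input.
# If DRY_RUN is non-empty, prints the commands instead of running them.
submit_osmonitor_batch() {
  running=0
  while read -r server agentip; do
    [ -n "$server" ] || continue               # skip blank lines
    if [ -n "${DRY_RUN:-}" ]; then
      echo "n1sh add server $server feature osmonitor agentip $agentip agentssh $AGENTSSH"
    else
      # Redirect stdin so n1sh cannot consume the rest of the server list.
      n1sh add server "$server" feature osmonitor \
        agentip "$agentip" agentssh "$AGENTSSH" < /dev/null &
    fi
    running=$((running + 1))
    if [ "$running" -ge "$MAX_PARALLEL" ]; then
      # Let the current batch exit before starting the next one.
      if [ -n "${DRY_RUN:-}" ]; then echo "wait"; else wait; fi
      running=0
    fi
  done
  # Wait for the last (possibly partial) batch.
  if [ -n "${DRY_RUN:-}" ]; then echo "wait"; else wait; fi
}
```

To review the generated commands before submitting anything, set DRY_RUN=1 and AGENTSSH, then run submit_osmonitor_batch < servers.txt, where servers.txt is a hypothetical file that lists one server name and agent IP address per line. Note that the agentssh password appears on each n1sh command line.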

Troubleshooting

Adding the OS monitoring feature might fail due to stale SSH entries on the management server. If the add server feature osmonitor agentip command fails and no true security breach has occurred, remove the known_hosts file or the specific entry in the file that corresponds to the provisionable server. Then, retry the add server feature osmonitor agentip command. If the management server is running Linux, the known_hosts file is at /root/.ssh/known_hosts. If the management server is running the Solaris OS, the known_hosts file is at /.ssh/known_hosts.

Adding the OS monitoring feature will fail if you specify the agent IP or the SSH credentials in the add server feature osmonitor command when running it on servers that already have the base management feature support. To solve this problem, issue the add server feature osmonitor command without specifying values for the agent IP or for the SSH credentials.
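The stale-entry cleanup described above can be scripted. This is a minimal sketch under stated assumptions: remove_known_host is a hypothetical helper, it matches only plain-text (unhashed) host entries, and it assumes that the entry for the provisionable server is recorded under the address you passed as agentip. If it is available on the management server, OpenSSH's ssh-keygen -R host -f file performs the same cleanup and also handles hashed entries. Remove the entry only when you are sure that no true security breach has occurred.

```shell
#!/bin/sh
# Sketch: remove the known_hosts entries for one host from the given file,
# keeping a backup copy as FILE.bak. remove_known_host is a hypothetical
# helper. Only plain-text (unhashed) entries are matched; use
# "ssh-keygen -R HOST -f FILE" instead if your known_hosts is hashed.
# Usage: remove_known_host HOST FILE
remove_known_host() {
  host=$1
  file=$2
  [ -f "$file" ] || return 1
  cp -p "$file" "$file.bak" || return 1
  # Field 1 is a comma-separated list of names or addresses, possibly in
  # the [host]:port form. Drop the line if any of them equals HOST.
  awk -v h="$host" '
    {
      n = split($1, names, ",")
      for (i = 1; i <= n; i++) {
        name = names[i]
        sub(/^[[]/, "", name)           # strip a leading "["
        sub(/[]]:[0-9]+$/, "", name)    # strip a trailing "]:port"
        if (name == h) next
      }
      print
    }' "$file.bak" > "$file"
}
```

For example, on a Linux management server run remove_known_host 10.0.0.110 /root/.ssh/known_hosts; on a Solaris management server, use /.ssh/known_hosts instead. The original file is kept with a .bak suffix.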

To Remove the OS Monitoring Feature

The remove server feature osmonitor command can remove the OS monitoring feature at two levels. If you do not specify the uninstall keyword, the OS monitoring feature remains installed on the provisionable server, but the feature is no longer supported and the server's OS can no longer be monitored by the N1 System Manager. If you specify the uninstall keyword, the OS monitoring feature is completely uninstalled from the provisionable server, and the feature is therefore no longer supported.

In either case, after the feature is removed, the OS resource health state for the server becomes uninitialized.

If you removed a feature by using the recommended procedure, you can use the add server command to add it back. The Base Management Supported and OS Monitoring Supported fields in the show server output show the current status of a server's features.


Note –

Do not remove the OS monitoring feature manually by deleting the agent. Doing so makes it impossible to reinstall or reuse the OS monitoring feature. Instead, remove the OS monitoring feature by using the remove server command as described in this procedure.


Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Remove the OS monitoring feature.


    N1-ok> remove server server feature osmonitor [uninstall]
    

    The necessary packages and scripts are removed. See remove server in Sun N1 System Manager 1.2 Command Line Reference Manual for details about command syntax.

To Remove the Base Management Feature

The OS monitoring feature must be removed before the base management feature can be removed. See To Remove the OS Monitoring Feature for details.

When you remove the base management feature, the feature is uninstalled from the provisionable server and it is no longer supported.

If you removed a feature by using the recommended procedure, you can use the add server command to add it back. The Base Management Supported and OS Monitoring Supported fields in the show server output show the current status of a server's features.


Note –

Do not remove the base management feature manually by deleting the agent. Doing so makes it impossible to reinstall or reuse the base management feature. Instead, remove the base management feature by using the remove server command as described in this procedure.


Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Remove the base management feature.


    N1-ok> remove server server feature basemanagement
    

    The necessary packages and scripts are removed. See remove server in Sun N1 System Manager 1.2 Command Line Reference Manual for details about command syntax.
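Because the OS monitoring feature must be removed before the base management feature, a script that removes both features should issue the commands in that order. The following minimal sketch only prints the two n1sh commands; remove_features_cmds is a hypothetical helper name, and it assumes that you run the second command only after the job started by the first one has completed.

```shell
#!/bin/sh
# Sketch: print, in the required order, the n1sh commands that remove both
# management features from one server. remove_features_cmds is a
# hypothetical helper; it only prints the commands and does not run them.
# Usage: remove_features_cmds SERVER [uninstall]
remove_features_cmds() {
  server=$1
  # OS monitoring must be removed before base management.
  if [ "${2:-}" = "uninstall" ]; then
    echo "n1sh remove server $server feature osmonitor uninstall"
  else
    echo "n1sh remove server $server feature osmonitor"
  fi
  echo "n1sh remove server $server feature basemanagement"
}
```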

To Modify the Agent IP for a Server

This procedure describes how to modify the agent IP for a server. The agent IP is the IP address of the provisionable server's data network interface to be monitored by the management server. The agent IP is not the same as the server's management network IP address.

The following graphic shows the agent IP address for a server from the results table of a job, displayed in the Jobs tab. The graphic distinguishes the agent IP address for the server from the server's IP address.

[Figure: A job step in the Jobs tab, showing the agent IP address for a server alongside the server's IP address]

Note –

If you change the provisionable server's IP address and credentials, or manually remove services, outside the N1 System Manager, attempts to enable those services will fail. Arbitrary changes made to the OS outside the N1 System Manager require you to rediscover the server and then add the base management and OS monitoring features again.


When the load server or load group command is used to install software on the provisionable server, the provisionable server's networktype attribute might be set to dhcp. This setting means that the server uses DHCP to obtain its provisioning network IP address. If the system reboots and obtains an IP address different from the one that was used for the agentip parameter in the load server, load group, or add server command, then the following features may not work:

In this case, use the set server server agentip command to correct the server's agent IP address as shown in this procedure.

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Run the following command:


    N1-ok> set server server agentip IP
    

    The agent IP is modified. See set server in Sun N1 System Manager 1.2 Command Line Reference Manual for details about command syntax. This operation touches the provisionable server.
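If the provisionable server uses DHCP, a script can compare the agent IP address recorded on the management server with the address that the server currently has, and print the corrective set server command only when they differ. In this minimal sketch, agentip_update_cmd is a hypothetical helper, and you supply both addresses yourself; how you obtain them (for example, from show server output and from the server itself) is outside the sketch.

```shell
#!/bin/sh
# Sketch: print the set server command that corrects a server's agent IP
# when the address recorded by the N1 System Manager no longer matches the
# address the server currently has. agentip_update_cmd is a hypothetical
# helper; both addresses are supplied by the caller.
# Usage: agentip_update_cmd SERVER RECORDED_IP CURRENT_IP
# Returns 1 and prints nothing when no change is needed.
agentip_update_cmd() {
  server=$1
  recorded_ip=$2
  current_ip=$3
  if [ -z "$current_ip" ] || [ "$recorded_ip" = "$current_ip" ]; then
    return 1
  fi
  echo "n1sh set server $server agentip $current_ip"
}
```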

To Modify the Secure Shell Credentials for the Management Features of a Server

This procedure describes how to modify the Secure Shell (SSH) credentials for the base management and OS monitoring features of a provisionable server. These management SSH credentials are required or used by many N1 System Manager commands, including add server, set server, load server, start server, load group, and start group. These credentials, referred to in the examples in this chapter as agentssh credentials, apply only to the base management and OS monitoring features. They are not the same as the SSH credentials required for the server's management network IP address.

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details. You need to have an SSH login and password for this step. Default SSH login/password pairs are provided in Discovering Servers.

  2. Run the following command:


    Note –

    The SSH user account that is used in the following command must have root privileges on the remote machine.



    N1-ok> set server server agentip IP agentssh username/password
    

    The agentssh user name and password are modified. See set server in Sun N1 System Manager 1.2 Command Line Reference Manual for details about command syntax.

To Modify the SNMP Credentials for the Management Features of a Server

This procedure describes how to modify the management feature SNMP credentials for a server. These credentials allow the N1 System Manager to communicate with the Sun Management Center SNMP agent on a provisionable server, and they apply only to the base management and OS monitoring features. These credentials, referred to in the examples in this chapter as agentsnmp credentials, are not the same as the SNMP credentials required for the server's management network IP address.

See Introduction to Monitoring for more information about the SNMP agents for OS monitoring in the N1 System Manager.

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Run the following command to specify the SNMP credentials on a server:


    N1-ok> set server server agentsnmp agentsnmp
    

    The SNMP credentials are modified. See set server in Sun N1 System Manager 1.2 Command Line Reference Manual for details about command syntax.

    This set server operation does not connect to the provisionable server. It only synchronizes the credential data stored on the management server.

To Modify the SNMPv3 Credentials for the Management Features of a Server

This procedure describes how to modify the management feature SNMPv3 credentials for a server. These credentials allow the N1 System Manager to communicate with the Sun Management Center SNMP agent on a provisionable server, and they apply only to the base management and OS monitoring features. These credentials, referred to in the examples in this chapter as agentsnmpv3 credentials, are not the same as the SNMP credentials required for the server's management network IP address.

See Introduction to Monitoring for more information about the SNMP agents for OS monitoring in the N1 System Manager.

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Run the following command to specify the SNMPv3 credentials on a server:


    N1-ok> set server server agentsnmpv3 agentsnmpv3
    

    The SNMPv3 credentials are modified. See set server in Sun N1 System Manager 1.2 Command Line Reference Manual for details about command syntax.

    This set server operation does not connect to the provisionable server. It only synchronizes the credential data stored on the management server.

To Manually Uninstall the Linux OS Monitoring Feature

After successful completion of this procedure, the OS monitoring feature is unsupported for the provisionable server.

Steps
  1. Log in to the provisionable server as root.

  2. Type the following command:


    # /etc/rc.d/rc3.d/S99es_agent stop
    
  3. Issue the following command and follow the prompts.


    # /opt/SUNWsymon/sbin/es-uninst
    

    The agent is uninstalled.

  4. Manually remove the feature.


    # rpm -e n1sm-linux-agent
    

    The feature is removed.

  5. Remove directories related to the feature.


    # rm -rf /var/opt/SUNWsymon
    

    The directories are removed.

To Manually Uninstall the Solaris OS Monitoring Feature

After successful completion of this procedure, the OS monitoring feature will be unsupported for the provisionable server.

Steps
  1. Log in to the provisionable server as root.

  2. Stop the agent.


    # /etc/rc3.d/S81es_agent stop
    
  3. Run the uninstaller.


    # /var/tmp/solx86-agent-installer/disk1/x86/sbin/es-uninst -X
    
  4. Remove the packages.

    For the Solaris OS running on the SPARC architecture:


    # pkgrm SUNWn1smsparcag-1-2
    

    For the Solaris OS running on the x86 architecture:


    # pkgrm SUNWn1smx86ag-1-2
    
  5. Remove associated directories.


    # /bin/rm -rf /opt/SUNWsymon
    # /bin/rm -rf /var/opt/SUNWsymon
    

    The directories are removed.
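Step 4 depends on the processor architecture. On the Solaris OS, uname -p reports sparc on SPARC systems and i386 on x86 systems, so a script can choose the package name automatically. This is a minimal sketch; agent_pkg_for_arch is a hypothetical helper, and the package names are the ones listed in step 4.

```shell
#!/bin/sh
# Sketch: map the processor type reported by "uname -p" on the Solaris OS
# to the agent package removed in step 4. agent_pkg_for_arch is a
# hypothetical helper; the package names come from step 4.
agent_pkg_for_arch() {
  case "$1" in
    sparc) echo "SUNWn1smsparcag-1-2" ;;
    i386)  echo "SUNWn1smx86ag-1-2" ;;
    *)     echo "unsupported processor type: $1" >&2; return 1 ;;
  esac
}

# Example use on the provisionable server, as root:
#   pkg=$(agent_pkg_for_arch "$(uname -p)") && pkgrm "$pkg"
```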

To Upgrade the Base Management Feature on a Server

This procedure describes how to upgrade the base management feature on a single server. The procedure is necessary after you upgrade the N1 System Manager from a previous release, for provisionable servers on which the previous version of the base management feature is still installed. To upgrade the base management feature on multiple servers at once, see Chapter 2, Upgrading the Sun N1 System Manager Software and Provisionable Server Management Agents, in Sun N1 System Manager 1.2 Installation and Configuration Guide.


Note –

If the server was freshly installed using the load server or load group commands from the latest version of the N1 System Manager, and the feature subcommand was used, this procedure is not necessary.


Use the add server feature basemanagement command with the upgrade keyword to upgrade the existing base management feature on a provisionable server to the new version.

If you submit add server feature commands by using a script, see Example 5–1 for an example.

Before You Begin
Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. To upgrade the base management feature, type the following command:


    N1-ok> add server server feature basemanagement upgrade
    

    An Add Base Management Support job starts.

    See add server in Sun N1 System Manager 1.2 Command Line Reference Manual for details about command syntax.

  3. Track the Add Base Management Support job to completion.

    After the job completes successfully, the show server command output for the server shows the Base Management Supported value as OK. In addition, the Base Management Supported field on the Server Details page is marked as Yes. See Enabling and Disabling Monitoring for a graphic that shows this.

Troubleshooting

Upgrading the base management feature might fail due to stale SSH entries on the management server. If the add server feature basemanagement upgrade command fails and no true security breach has occurred, remove the known_hosts file or the specific entry in the file that corresponds to the provisionable server. Then, retry the add server feature basemanagement upgrade command. If the management server is running Linux, the known_hosts file is at /root/.ssh/known_hosts. If the management server is running the Solaris OS, the known_hosts file is at /.ssh/known_hosts.

To Upgrade the OS Monitoring Feature on a Server

This procedure describes how to upgrade the OS monitoring feature on a single server. The procedure is necessary after you upgrade the N1 System Manager from a previous release, for provisionable servers on which the previous version of the OS monitoring feature is still installed. To upgrade the OS monitoring feature on multiple servers at once, see Chapter 2, Upgrading the Sun N1 System Manager Software and Provisionable Server Management Agents, in Sun N1 System Manager 1.2 Installation and Configuration Guide.


Note –

If the server was freshly installed using the load server or load group commands from the latest version of the N1 System Manager, and the feature subcommand was used, this procedure is not necessary.


Use the add server feature osmonitor command with the upgrade keyword to upgrade the existing base management and OS monitoring features on a provisionable server to the new version.

If you submit add server feature commands by using a script, see Example 5–1 for an example.

Before You Begin
Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. To upgrade the OS monitoring feature, type the following command:


    N1-ok> add server server feature osmonitor upgrade
    

    A Modify OS Monitoring Support job starts. This command also upgrades the base management feature.

    See add server in Sun N1 System Manager 1.2 Command Line Reference Manual for details about command syntax.

  3. Track the Modify OS Monitoring Support job to completion.

    After the job completes successfully, the Servers table on the System Dashboard tab appears with values for OS Usage and OS Resource Health.

    Verify that the OS monitoring feature is supported by issuing the show server command. In the output for the server, the OS Monitoring Supported value appears as OK.


    Note –

    It can take 5 to 7 minutes for all OS monitoring data to be fully initialized. During this time, CPU idle might be reported as 0.0%, which causes a Failed Critical status for OS Usage. This status should clear within 5 to 7 minutes after you add or upgrade the OS monitoring feature.


Troubleshooting

Upgrading the OS monitoring feature might fail due to stale SSH entries on the management server. If the add server feature osmonitor upgrade command fails and no true security breach has occurred, remove the known_hosts file or the specific entry in the file that corresponds to the provisionable server. Then, retry the add server feature osmonitor upgrade command. If the management server is running Linux, the known_hosts file is at /root/.ssh/known_hosts. If the management server is running the Solaris OS, the known_hosts file is at /.ssh/known_hosts.

Upgrading the OS monitoring feature will fail if you specify the agent IP or the SSH credentials in the add server feature osmonitor upgrade command when running it on servers that already have base management feature support. To solve this problem, issue the add server feature osmonitor upgrade command without specifying values for the agent IP or the SSH credentials.

Enabling and Disabling Monitoring

Monitored file system and OS health data for a provisionable server is not available unless an operating system is deployed on the provisionable server and the OS monitoring feature has been installed.

Once the OS monitoring feature is installed on a server, monitoring is enabled by default. For information on installing the OS monitoring feature on a server, see Supporting Monitoring.

Use the set server monitored command to enable or disable monitoring, as described in To Monitor a Server or a Server Group and To Disable Monitoring for a Server or a Server Group. If the OS monitoring feature is not installed on a server, or on every server in a group, setting the monitored attribute to true enables only hardware monitoring for the server or group of servers.

The following graphic shows a section of the Server Details page. The server is powered on, an OS has been installed and the base management and OS monitoring features are supported. Monitoring is enabled for the server.

[Figure: Section of the Server Details page showing monitoring enabled, with the base management and OS monitoring features highlighted]

Disabling monitoring with the set server monitored command does not remove the monitoring support provided by the OS monitoring feature, which remains installed on the server. However, it disables both hardware health monitoring and OS health monitoring.

To Monitor a Server or a Server Group

The following procedure describes how to use the command line to enable the monitoring of hardware health and operating system health of a server or a server group. Hardware health and OS health monitoring are both enabled with this command, provided that the OS monitoring feature has been installed on the server or the server group. If the OS monitoring feature has not been installed on the server or server group, then only hardware health monitoring is enabled.

Before You Begin

Add the management features to the server named server, which configures its management agent IP address and security credentials, as explained in Supporting Monitoring.

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Set the monitored attribute to true by using the set server command.


    N1-ok> set server server monitored true
    

    In this procedure, server is the name of the provisionable server that you want to monitor.

    • For a server group, set the monitored attribute to true by using the set group command.


      N1-ok> set group group monitored true
      

      This command applies to every server in the named group. See set group in Sun N1 System Manager 1.2 Command Line Reference Manual for details. In this procedure, group is the name of the group of provisionable servers that you want to monitor.

  3. View the server details.


    N1-ok> show server server
    
    • For a server group, view the server group details to determine if monitoring is enabled for each server in the group.


      N1-ok> show group group
      

    Detailed monitoring information appears in the output, including information about hardware health, OS health, and network reachability. OS health monitoring threshold values are also displayed. Monitoring threshold values are explained in Monitoring Threshold Values.

To Disable Monitoring for a Server or a Server Group

The following procedure describes how to use the command line to disable the monitoring of hardware health and operating system health of a server or a server group. Hardware health and OS health monitoring are both disabled with this command, provided that the OS monitoring feature has been added.

You might want to disable monitoring of a server while you perform maintenance tasks on its hardware, so that the maintenance does not generate events.

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Set the monitored attribute to false by using the set server command.


    N1-ok> set server server monitored false
    

    In this procedure, server is the name of the provisionable server that you want to stop monitoring. This command disables monitoring of the server. While monitoring of a server is disabled, threshold violations by attributes related to that server do not generate events.

    • For a server group, set the monitored attribute to false by using the set group command.


      N1-ok> set group group monitored false
      

      This command applies to every server in the named group. See set group in Sun N1 System Manager 1.2 Command Line Reference Manual for details. In this procedure, group is the name of the group of provisionable servers for which you want to disable monitoring.

  3. View the server details.


    N1-ok> show server server
    

    The output shows that monitoring is disabled.

    If you are not interested in the values of some OS health attributes, you can disable the threshold severity for the monitoring of those attributes while continuing to monitor other OS health attributes. This prevents nuisance alarms. Example 5–6 shows how to accomplish this task. For general information about threshold values, see Monitoring Threshold Values. You can also remove the OS health monitoring feature. See To Remove the OS Monitoring Feature.

    • For a server group, view the server group details to determine if monitoring is disabled for each server in the group.


      N1-ok> show group group
      

Default States of Monitoring

The default status of monitoring in the Sun N1 System Manager for discovered servers and initialized operating systems is as follows:

Default status of hardware monitoring

When a server or other hardware is discovered, monitoring of the server or other hardware is enabled by default. Before a server can be monitored, however, it must be discovered and correctly registered with the N1 System Manager. This process is described in Discovering Servers. The monitoring of hardware sensors is enabled by default for all managed servers. If a server is deleted and then rediscovered, all states related to that server for the purposes of monitoring are lost, regardless of whether monitoring was enabled or disabled for that server when the server was deleted. When the server is rediscovered, monitoring is set to true by default. For more information about discovering servers, see To Discover New Servers.

Default status of OS health monitoring

OS health monitoring is disabled by default. It is enabled when an OS has been successfully provisioned on a provisionable server and the N1 System Manager management features have been added by using the add server feature command with the agentip specified. The OS can be provisioned either through the N1 System Manager or by an external OS installation.

If you are not interested in the values of some OS health attributes, you can disable the threshold severity for the monitoring of those attributes while continuing to monitor other OS health attributes. This prevents nuisance alarms. Example 5–6 shows how to accomplish this task. For general information about threshold values, see Monitoring Threshold Values.

Default status of network reachability monitoring

When the management interface of the provisionable server is discovered, monitoring of the interface is enabled by default. When the management features are added, monitoring of other interfaces is enabled by default.

Monitoring Threshold Values

The value of any given monitored OS health attribute is compared to a threshold value. Low and high threshold values are defined and can be configured.

Attribute data is compared against thresholds at regular intervals.

When a monitored attribute's value is outside the safe range defined by the default or user-defined thresholds, an event is generated and a status is issued. If the value of the attribute is lower than the low threshold or higher than the high threshold, then depending on the severity of the threshold, an event is generated to show a status of nonrecoverable, critical, or warning. Otherwise, the status of the OS health monitored attribute is OK, provided that a value can be obtained.

If no value can be obtained, an event is generated to show that the status of the monitored attribute is unknown. The health of an OS resource can be shown as unknown if the server is reachable but the agent for the monitoring feature cannot be contacted on SNMP port 161. For more information, see Understanding the Differences Between Unreachable and Unknown States for Provisionable Servers.

The values nonrecoverable, critical, and warning are discussed in show server in Sun N1 System Manager 1.2 Command Line Reference Manual.

What Happens When a Threshold Is Broken

If the value of an OS health monitored attribute rises above the warninghigh threshold, a status of Failed Warning is issued. If the value continues to rise and passes the criticalhigh threshold, a status of Failed Critical is issued. If the value continues to rise above the nonrecoverablehigh threshold, a status of nonrecoverablehigh is issued.

If the value then starts to fall, no further events are generated until the value falls below the warninghigh threshold, back into the safe range. At that point, an event is generated to show a status of normal.

If the value of a monitored attribute falls below the warninglow threshold, a status of Failed Warning is issued. If the value continues to fall and passes the criticallow threshold, a status of Failed Critical is issued. If the value continues to fall below the nonrecoverablelow threshold, a status of nonrecoverablelow is issued.

If the value then starts to rise, no further events are generated until the value rises above the warninglow threshold, back into the safe range. At that point, an event is generated to show a status of normal.
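The following minimal sketch models the high-threshold behavior described above, as an illustration only; the N1 System Manager performs the real comparisons. events_high is a hypothetical helper that reads one sample per line and takes the warninghigh, criticalhigh, and nonrecoverablehigh thresholds as arguments. It assumes that a value exactly equal to a threshold is still within the lower level (the defaults in Table 5–1 are written with >), and it reports an event each time the value reaches a more severe level and a normal event when the value falls back below warninghigh. The low-threshold behavior is the mirror image, with the comparisons reversed.

```shell
#!/bin/sh
# Illustration only: models how events are reported as a "high" attribute
# (for example cpustats.pctusage) crosses its thresholds. events_high is a
# hypothetical helper; the N1 System Manager does the real comparisons.
# Usage: events_high WARNINGHIGH CRITICALHIGH NONRECOVERABLEHIGH < samples
events_high() {
  awk -v w="$1" -v c="$2" -v n="$3" '
    BEGIN {
      w += 0; c += 0; n += 0          # force numeric comparison
      level = 0                       # most severe level reported so far
      name[0] = "normal"; name[1] = "warning"
      name[2] = "critical"; name[3] = "nonrecoverable"
    }
    {
      v = $1 + 0
      if (v > n)      s = 3
      else if (v > c) s = 2
      else if (v > w) s = 1
      else            s = 0
      if (s > level) {
        # Rising into a more severe level: report it.
        print v, name[s]
        level = s
      } else if (s == 0 && level > 0) {
        # Back below warninghigh: report a return to normal.
        print v, name[0]
        level = 0
      }
      # Any other change (for example critical -> warning) is not reported.
    }'
}
```

For example, printf '%s\n' 50 85 92 88 97 85 70 | events_high 80 90 95 prints 85 warning, 92 critical, 97 nonrecoverable, and 70 normal, one per line; the drops to 88 and then 85 produce no event.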

Threshold values for OS health attributes can be configured at the command line. This process is explained in Setting Threshold Values. For threshold values that measure percentages, the valid range is 0 to 100%. If you try to set a threshold value outside this range, an error is generated. For attributes that do not measure percentages, appropriate threshold values depend on the number of processors in your system and on the usage characteristics of your installation.
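Before you pass a percentage threshold to the command line, a script can check that it falls in the valid 0 to 100% range. This minimal sketch assumes that both ends of the range are allowed and that values are written as plain decimal numbers without a % sign; valid_pct is a hypothetical helper, and the N1 System Manager still performs its own check.

```shell
#!/bin/sh
# Sketch: succeed only if $1 is a plain decimal number from 0 to 100
# inclusive. valid_pct is a hypothetical helper; the inclusive range and
# the number format (no sign, no "%") are assumptions.
valid_pct() {
  awk -v v="$1" 'BEGIN {
    if (v !~ /^[0-9]+(\.[0-9]+)?$/) exit 1   # reject non-numeric input
    exit !(v + 0 >= 0 && v + 0 <= 100)        # 0 = valid, 1 = out of range
  }'
}
```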

Tuning Threshold Values for Your Installation

After a period of usage, you can develop a sense of which levels to set for OS health attribute values. Once you know which values genuinely justify generating an event and sending an event notification to your pager or email address, you can adjust the thresholds accordingly. For example, you might want to receive event notifications every time a certain attribute reaches its warninghigh threshold level. For more information, see Setting Up Event Notifications.

For important or crucial attributes at your installation, you can set the warninghigh threshold level to a low percentage value so that you are notified about a rising value as early as possible.

To Retrieve Threshold Values for a Server

Before You Begin

Add the management features to the server named server, which configures its management agent IP address and security credentials, as explained in Adding and Upgrading Base Management and OS Monitoring Features.

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Type the show server command:


    N1-ok> show server server
    

    In this procedure, server is the name of the provisionable server for which you want to retrieve threshold values.

    Detailed monitoring threshold values appear in the output, including threshold information for the server's hardware health, OS health, and network reachability. Default values are shown if no specific values have been set.

    See show server in Sun N1 System Manager 1.2 Command Line Reference Manual for details.

    • Threshold information is also available from the Server Details page in the browser interface. This is shown in the following graphic.

      [Figure: OS monitoring information and threshold status on the Server Details page]

Managing Default Threshold Values

Factory-configured default threshold values are provided in the N1 System Manager software for some OS health attributes. Some of these values are percentages; the others are load averages or amounts of memory, swap space, or file system space. Table 5–1 lists the default values for these OS health attributes on a Sun Fire V20z server.


Note –

Setting or modifying threshold values for hardware health attributes is not supported in this version of the Sun N1 System Manager.


Table 5–1 Sun Fire V20z Factory-Configured Default Threshold Values for OS Health Attributes

Attribute Name | Description | Default Warning Threshold | Default Critical Threshold
cpustats.loadavg1min | System load expressed as average number of queued processes over 1 minute | warninghigh >4.00 | criticalhigh >5.00
cpustats.loadavg5min | System load expressed as average number of queued processes over 5 minutes | warninghigh >4.10 | criticalhigh >5.10
cpustats.loadavg15min | System load expressed as average number of queued processes over 15 minutes | warninghigh >4.10 | criticalhigh >5.10
cpustats.pctusage | Percentage of overall CPU usage | warninghigh >80% | criticalhigh >90.1%
cpustats.pctidle | Percentage of CPU idle | warninglow <20% | criticallow <10%
memusage.mbmemfree | Memory free in MB | warninglow <39% | criticallow <29%
memusage.mbmemused | Memory used in MB | warninghigh >1501 | criticalhigh >2001
memusage.pctmemused | Percentage of memory in use | warninghigh >80% | criticalhigh >90%
memusage.pctmemfree | Percentage of memory free | warninglow <20% | criticallow <10%
memusage.kbswapused | Swap space in use in Kb | warninghigh >500000 | criticalhigh >1000000
fsusage.kbspacefree | File system free space in Kb | warninglow <94.0Kb | criticallow <89.0Kb

Specific threshold values can be set at the command line by following the procedures described in Setting Threshold Values.
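The paired values in Table 5–1 can be read as two nested limits: a value beyond the critical threshold is critical, and a value beyond only the warning threshold is a warning. The following Python sketch illustrates that reading. The function name classify and the use of strict comparisons (taken from the > and < signs in the table) are assumptions for illustration; the sketch does not reproduce the N1 System Manager's internal evaluation logic.

```python
from typing import Mapping, Optional


def classify(value: float, thresholds: Mapping[str, Optional[float]]) -> str:
    """Return "critical", "warning", or "normal" for one attribute value.

    Assumption: comparisons are strict, matching the > and < signs in
    Table 5-1. A severity that is missing or mapped to None is unset.
    """
    crit_high = thresholds.get("criticalhigh")
    warn_high = thresholds.get("warninghigh")
    crit_low = thresholds.get("criticallow")
    warn_low = thresholds.get("warninglow")
    # Check the critical limits first so that a critical value is not
    # reported as a mere warning.
    if (crit_high is not None and value > crit_high) or (
            crit_low is not None and value < crit_low):
        return "critical"
    if (warn_high is not None and value > warn_high) or (
            warn_low is not None and value < warn_low):
        return "warning"
    return "normal"


# Defaults for two attributes, copied from Table 5-1.
LOADAVG_1MIN = {"warninghigh": 4.00, "criticalhigh": 5.00}
PCT_IDLE = {"warninglow": 20.0, "criticallow": 10.0}

print(classify(4.5, LOADAVG_1MIN))  # warning
print(classify(8.0, PCT_IDLE))      # critical
```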

Table 5–2 provides the complete list of OS health attributes.

Table 5–2 All OS Health Attributes

Attribute Name          Description                                                                   Warning Threshold     Critical Threshold
cpustats.loadavg1min    System load expressed as average number of queued processes over 1 minute     warninghigh           criticalhigh
cpustats.loadavg5min    System load expressed as average number of queued processes over 5 minutes    warninghigh           criticalhigh
cpustats.loadavg15min   System load expressed as average number of queued processes over 15 minutes   warninghigh           criticalhigh
cpustats.pctusage       Percentage of overall CPU usage                                               warninghigh           criticalhigh
cpustats.pctidle        Percentage of CPU idle                                                        warninglow            criticallow
memusage.pctmemused     Percentage of memory in use                                                   warninghigh           criticalhigh
memusage.pctmemfree     Percentage of memory free                                                     warninglow            criticallow
memusage.mbmemused      Memory in use in MB                                                           warninghigh           criticalhigh
memusage.mbmemfree      Memory free in MB                                                             warninglow            criticallow
memusage.kbswapused     Swap space in use in Kb                                                       warninghigh           criticalhigh
memusage.mbswapfree     Free swap space in MB                                                         warninglow            criticallow
memusage.pctswapfree    Percentage of free swap space                                                 warninglow            criticallow
fsusage.pctused         Percentage of file system space in use                                        warninghigh           criticalhigh
fsusage.kbspacefree     File system free space in Kb                                                  warninghigh           criticalhigh

Setting Threshold Values

Threshold values for OS health attributes can be set on specific servers. If you set specific threshold values at the command line for OS health attributes, those values override any factory-configured threshold values for the attributes.

To Set Threshold Values for a Server

Before You Begin

Enable the management agent IP address and security credentials on the server by adding the management features to the server, as explained in Adding and Upgrading Base Management and OS Monitoring Features.

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Use the set server command with the threshold attribute.

    The syntax requires the threshold keyword to be followed by the attribute for which you are setting a threshold. The attribute is an OS health attribute. OS health attributes are described in OS Health Monitoring and listed in Table 5–2.

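    The threshold is one of criticallow, warninglow, warninghigh, or criticalhigh. The value is a number whose unit depends on the attribute. It is usually a percentage, but some attributes use a load average or an amount in MB or Kb, as shown in Table 5–1.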

    This set server operation does not change anything on the provisionable server. It only updates the threshold data stored on the management server itself.

    • To set one threshold value, type the following:


      N1-ok> set server server threshold attribute threshold value
      
    • To set multiple threshold values for the server, type the following:


      N1-ok> set server server threshold attribute threshold value threshold value
      
    • For a server group, use the set group command with the threshold attribute. To modify one threshold for the server group:


      N1-ok> set group group threshold attribute threshold value
      
    • To modify multiple thresholds for the server group:


      N1-ok> set group group threshold attribute threshold value threshold value
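If you change thresholds on many servers, it can help to generate the command lines and catch misspelled severity keywords before you type them at the N1-ok> prompt. The following Python sketch is hypothetical: threshold_command is not part of the N1 System Manager, and it only assembles the command text using the syntax shown above, including the optional filesystem attribute and the none value used to delete a threshold.

```python
from typing import Mapping, Optional, Union

# Severity keywords accepted after an attribute name, per the syntax above.
SEVERITIES = ("criticallow", "warninglow", "warninghigh", "criticalhigh")


def threshold_command(target: str, attribute: str,
                      thresholds: Mapping[str, Union[int, float, str]],
                      group: bool = False,
                      filesystem: Optional[str] = None) -> str:
    """Return a set server or set group threshold command (hypothetical)."""
    if not thresholds:
        raise ValueError("at least one threshold is required")
    words = ["set", "group" if group else "server", target]
    if filesystem is not None:
        # The examples show the filesystem attribute before "threshold".
        words += ["filesystem", filesystem]
    words += ["threshold", attribute]
    for severity, value in thresholds.items():
        if severity not in SEVERITIES:
            raise ValueError(f"unknown threshold severity: {severity!r}")
        # A value of "none" deletes a previously set threshold.
        words += [severity, str(value)]
    return " ".join(words)


print(threshold_command("serv1", "fsusage.pctused",
                        {"warninghigh": 75, "criticalhigh": 87},
                        filesystem="/usr"))
```

The printed line matches the /usr command shown in Example 5–3.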
      

Example 5–2 Setting Multiple Threshold Values for CPU Percentage Usage on a Server

This example shows how to set the CPU usage warninghigh severity threshold on a provisionable server named serv1 to 53 percent. This example also shows how to set the criticalhigh severity threshold value to 75 percent.


N1-ok> set server serv1 threshold cpustats.pctusage warninghigh 53 criticalhigh 75


Example 5–3 Setting Multiple Threshold Values for File System Percentage Usage On a Server

This example sets the file system percentage usage warninghigh threshold on a provisionable server named serv1 to 75 percent. This example also sets the criticalhigh threshold value to 87 percent. This example sets the threshold for every file system on the server.


N1-ok> set server serv1 threshold fsusage.pctused warninghigh 75 criticalhigh 87

You can also specify the file system for which you want to set multiple threshold values. To set the warninghigh threshold to 75 percent and the criticalhigh threshold value to 87 percent for the /usr file system on the same server, use the filesystem attribute:


N1-ok> set server serv1 filesystem /usr threshold fsusage.pctused warninghigh 75 criticalhigh 87


Example 5–4 Setting a Threshold Value for File System Free Space On a Server

This example sets the warninghigh threshold for file system free space for the /var file system on a provisionable server named serv1 to 150 Kbytes of free space.


N1-ok> set server serv1 filesystem /var threshold fsusage.kbspacefree warninghigh 150


Example 5–5 Setting a Threshold Value for Percentage of Free Memory On a Server

This example sets the criticallow threshold for the percentage of free memory on a provisionable server named serv1 to 5%.


N1-ok> set server serv1 threshold memusage.pctmemfree criticallow 5


Example 5–6 Deleting a Threshold Value for File System Percentage Usage on a Server

This example shows how to delete a value that was set for the warninghigh threshold on a provisionable server named serv1.


N1-ok> set server serv1 threshold fsusage.pctused warninghigh none

In this case, any previously set value for this threshold at this severity is deleted. In effect, monitoring is disabled for the warninghigh threshold for file system usage for this server.



Example 5–7 Setting Multiple Threshold Values for File System Usage on a Server Group

This example shows how to set the file system usage warninghigh threshold to 75 percent on a group of provisionable servers with a group name of grp3. This example also shows how to set the criticalhigh threshold severity value to 87 percent.


N1-ok> set group grp3 threshold fsusage.pctused warninghigh 75 criticalhigh 87

Monitoring MIBs

Two MIBs are provided with the N1 System Manager, in the /opt/sun/n1gc/etc/ directory. These MIBs provide the data structure that third-party monitoring tools can use to retrieve data from the N1 System Manager using SNMP, and to parse the SNMP notifications that the N1 System Manager generates. The MIBs therefore enable you to use any SNMP client to query the N1 System Manager and to listen for events using SNMP. The following MIBs are provided:

SUN-N1SM-INFO-MIB

This MIB describes the information that you can retrieve from the N1 System Manager by querying it using an SNMP client.

SUN-N1SM-TRAP-MIB

This MIB describes all of the events related to the N1 System Manager about which you can receive SNMP traps.

These MIBs are read-only. Using them requires a detailed knowledge of SNMP, although detailed descriptions of each object are provided in the MIBs. How you configure your monitoring system to start receiving traps depends on the nature of your monitoring system.

The MIBs are hardware independent.


Example 5–8 Receiving SNMP Traps

This example shows you how to use the simple UNIX trap listener, the snmptrapd command, to start receiving N1 System Manager traps.


# snmptrapd -m all -M /opt/sun/n1gc/etc:/usr/share/snmp/mibs -P

This example uses the snmptrapd command to listen for SNMP traps on the default port, 162. It also instructs the command to use the MIBs stored in /opt/sun/n1gc/etc and /usr/share/snmp/mibs to parse the contents of SNMP traps.


Managing Jobs

This section describes jobs and their integral role in server monitoring.

Each major action you take in the N1 System Manager starts a job. Use the job log to track the status of a currently running action or to verify that a job has finished. Monitoring jobs is particularly useful because some N1 System Manager actions can take a long time to finish. An example of such an action is installing an OS distribution on one or more provisionable servers.

You can track jobs through the Jobs tab in the browser interface or the show job command. The show job command provides information about most of the following characteristics:

Job ID

Generated unique identifier.

Date

Date on which the job was started.

Job Type

Type of job. See show job in Sun N1 System Manager 1.2 Command Line Reference Manual for details. When using the show job command with the type parameter, jobs can be any of the following types:

  • addbase: Add base management support.

  • addosmonitor: Add OS monitoring support.

  • createos: Create OS distribution from CD/DVD media or ISO files.

  • deletejob: Delete job.

  • discover: Server discovery.

  • loadfirmware: Load firmware update.

  • loados: Load OS.

  • loadupdate: Load OS update.

  • refresh: Server refresh.

  • reset: Server reboot.

  • removeosmonitor: Remove OS monitoring support.

  • setagentip: Modify management feature configuration. Related to the base management and OS monitoring features.

  • start: Server power on.

  • startcommand: Remote command execution.

  • stop: Server power off.

  • unloadupdate: Unload OS update.

State

State of the current job step. Job steps indicate the progress of a job and update results. Each job step has a type, a start time and, when the step completes, a completion time. For the purposes of filtering, job progress is indicated with the following states:

notstarted

The job has not started yet. Jobs in a notstarted state cannot be stopped.

preflight

Preflight checks are being performed for the job. When you select a job by ID and view the details of that job, each step of that job can appear twice: once for the preflight check and once for the execution of the step itself.

running

The job is currently running. Jobs that are currently running cannot be deleted using the delete job command. Jobs that are currently running must finish running or be stopped using the stop job command.

Job completion is indicated with the following results:

completed

Indicates that the job step completed successfully.

warning

Indicates a warning during the job execution. A warning reports an issue that might be severe enough to terminate the job step, and the job, with errors.

stopped

Indicates that the job step stopped before it completed.

pendingstop

Indicates that the job is still running but that the job step cannot complete successfully.

error

Indicates a general error in that job step.

timed_out

Indicates that the job timed out before all of the job steps could complete successfully, or that the next step of the job started before the current step completed successfully.

Complete - Warning is shown as the overall job status in the output if the job completed all of its steps but one or more warning states were issued for steps during the job execution, and these warnings were not severe enough to terminate the job with errors.

You can filter jobs depending on their state. See show job in Sun N1 System Manager 1.2 Command Line Reference Manual for details.

Command

The command that was used to start the job.

Owner

The user who started the job. Also called the job creator.

Job Results

Provides details about the results of a completed job. You can review the standard output of remote command operations and completion statuses for all other job types.
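The rule for the Complete - Warning status can be expressed as a small function over the step results. The following Python sketch is illustrative only: the function name overall_status is hypothetical, and the precedence it applies to non-successful results (the first such step result is reported) is an assumption rather than documented behavior.

```python
from typing import Iterable


def overall_status(step_results: Iterable[str]) -> str:
    """Summarize step results as an overall job status (illustrative only).

    Assumed rules, based on the state descriptions above:
      * all steps completed, no warnings        -> "Completed"
      * all steps completed, one or more warned -> "Complete - Warning"
      * otherwise the first result that is neither completed nor warning,
        for example "Error", "Stopped", or "Timed_out", is returned.
    """
    results = [result.strip().lower() for result in step_results]
    for result in results:
        if result not in ("completed", "warning"):
            return result.capitalize()
    if "warning" in results:
        return "Complete - Warning"
    return "Completed"


print(overall_status(["Completed", "Warning", "Completed"]))  # Complete - Warning
```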

To List Jobs

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. View the list of jobs.


    N1-ok> show job all
    

    A list of all jobs for the N1 System Manager is returned.

    See show job in Sun N1 System Manager 1.2 Command Line Reference Manual for details.


Example 5–9 Listing All Jobs

This example shows that using the show job command with the all option returns a list of jobs by Job ID, together with the date and time at which the job was started. The job type and status are also returned, along with the identity of the user who created the job.


N1-ok> show job all
Job ID          Date                       Type                  Status        Owner
7               2005-09-16T10:51:07-0700   Discovery             Completed      root
6               2005-09-14T14:42:52-0700   Server Reboot         Error          root
5               2005-09-14T14:38:25-0700   Server Power On       Completed      root
4               2005-09-14T14:29:20-0700   Server Power Off      Completed      root
3               2005-09-09T13:01:35-0700   Discovery             Completed      root
2               2005-09-09T12:38:16-0700   Discovery             Completed      root
1               2005-09-09T10:32:40-0700   Discovery             Completed      root
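To act on this listing from a script, for example to find jobs in the Error state before deleting them, you can split each row into columns. The following Python sketch assumes the layout shown in Example 5–9, where columns are separated by two or more spaces; the names parse_job_list and JobRow are hypothetical, and capturing the CLI output is outside the scope of the sketch.

```python
import re
from datetime import datetime
from typing import List, NamedTuple


class JobRow(NamedTuple):
    job_id: int
    date: datetime
    job_type: str
    status: str
    owner: str


# Columns are separated by runs of two or more spaces; single spaces
# occur inside values such as "Server Power On".
_COLUMN_GAP = re.compile(r"\s{2,}")
_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S%z"  # for example 2005-09-16T10:51:07-0700


def parse_job_list(text: str) -> List[JobRow]:
    """Parse the table printed by "show job all" (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        fields = _COLUMN_GAP.split(line.strip())
        # Skip the prompt, the header row, and blank lines.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        job_id, date, job_type, status, owner = fields
        rows.append(JobRow(int(job_id), datetime.strptime(date, _DATE_FORMAT),
                           job_type, status, owner))
    return rows


# Output copied from Example 5-9.
SAMPLE = """\
N1-ok> show job all
Job ID          Date                       Type                  Status        Owner
7               2005-09-16T10:51:07-0700   Discovery             Completed      root
6               2005-09-14T14:42:52-0700   Server Reboot         Error          root
5               2005-09-14T14:38:25-0700   Server Power On       Completed      root
4               2005-09-14T14:29:20-0700   Server Power Off      Completed      root
3               2005-09-09T13:01:35-0700   Discovery             Completed      root
2               2005-09-09T12:38:16-0700   Discovery             Completed      root
1               2005-09-09T10:32:40-0700   Discovery             Completed      root
"""

jobs = parse_job_list(SAMPLE)
failed = [job.job_id for job in jobs if job.status == "Error"]
print(failed)  # [6]
```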

To View a Specific Job

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. View a specific job.


    N1-ok> show job job
    

    Detailed information about the job appears in the output.

    See show job in Sun N1 System Manager 1.2 Command Line Reference Manual for details.


Example 5–10 Viewing Job Details

This example shows that using the show job command with the Job ID returns the date and time at which the job was started, the job type and status, and the identity of the user who created the job. The job in this example is to load an OS profile on a server named 192.168.200.4 using the load server command. Further details are provided for each step of that job, including the time at which the step started and completed and whether the step was successful.


N1-ok> show job 21
Job ID:   21
Date:     2005-10-27T10:09:18-0600
Type:     Load OS
Status:   Completed (2005-10-27T10:37:23-0600)
Command:  load server 192.168.200.4 osprofile SLES9RC5 
bootip=192.168.200.30 networktype=static ip=192.168.200.31
Owner:    root
Errors:   0
Warnings: 0

Steps
ID     Type             Start                      Completion                 Result   
1      Acquire Host     2005-10-27T10:09:19-0600   2005-10-27T10:09:19-0600   Completed
2      Execute Java     2005-10-27T10:09:19-0600   2005-10-27T10:09:19-0600   Completed
3      Acquire Host     2005-10-27T10:09:21-0600   2005-10-27T10:09:21-0600   Completed
4      Execute Java     2005-10-27T10:09:21-0600   2005-10-27T10:37:22-0600   Completed

Results
Result 1: 
Server:   192.168.200.4
Status:   0
Message:  OS deployment using OS Profile SLES9RC5 was successful.
IP address 192.168.200.30 was assigned.
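The Start and Completion columns show where a long job spends its time. In this example, step 4 accounts for almost all of the roughly 28 minutes that the OS deployment took. The following Python sketch computes the duration of each step from a Steps table laid out as above. The function name step_durations is hypothetical, and the sketch assumes that every row already has a completion time.

```python
from datetime import datetime
from typing import List, Tuple

_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def step_durations(steps_text: str) -> List[Tuple[int, str, float]]:
    """Return (step ID, step type, seconds) for each row of a Steps table."""
    durations = []
    for line in steps_text.splitlines():
        fields = line.split()
        # Skip the header row ("ID Type Start ...") and blank lines.
        if not fields or not fields[0].isdigit():
            continue
        # The step type may contain spaces, so read the fixed columns from
        # both ends: ID first; Start, Completion, and Result last.
        start = datetime.strptime(fields[-3], _DATE_FORMAT)
        end = datetime.strptime(fields[-2], _DATE_FORMAT)
        step_type = " ".join(fields[1:-3])
        durations.append((int(fields[0]), step_type, (end - start).total_seconds()))
    return durations


# Steps table copied from Example 5-10.
STEPS = """\
ID     Type             Start                      Completion                 Result
1      Acquire Host     2005-10-27T10:09:19-0600   2005-10-27T10:09:19-0600   Completed
2      Execute Java     2005-10-27T10:09:19-0600   2005-10-27T10:09:19-0600   Completed
3      Acquire Host     2005-10-27T10:09:21-0600   2005-10-27T10:09:21-0600   Completed
4      Execute Java     2005-10-27T10:09:21-0600   2005-10-27T10:37:22-0600   Completed
"""

slowest = max(step_durations(STEPS), key=lambda step: step[2])
print(slowest)  # (4, 'Execute Java', 1681.0)
```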

To Stop a Job

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Stop a specific job.


    N1-ok> stop job job
    

    The job is stopped.

    See stop job in Sun N1 System Manager 1.2 Command Line Reference Manual for details.

  3. View the job details.


    N1-ok> show job job
    

    The Result section of the output shows that the job was stopped.

    In principle, any job can be stopped. In practice, however, only a job that has not yet reached its last step can be stopped, so jobs that have only one step can never be stopped. Jobs in a notstarted state also cannot be stopped. Operations that are performed on large groups of servers can take longer and might include a large number of steps.

    See show job in Sun N1 System Manager 1.2 Command Line Reference Manual for details.
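The stop rules described in step 3 can be summarized as a predicate. The following Python sketch is an illustration under stated assumptions: the function name can_stop is hypothetical, the caller is assumed to know the job's current step and total number of steps (for example, from the show job output), and jobs that have already finished are out of scope.

```python
def can_stop(state: str, current_step: int, total_steps: int) -> bool:
    """Apply the stop rules described above (illustrative sketch).

    state        -- job state keyword, such as "notstarted" or "running"
    current_step -- 1-based number of the step the job is executing
    total_steps  -- number of steps in the job
    Assumes the job has not already finished.
    """
    # Jobs in a notstarted state cannot be stopped.
    if state == "notstarted":
        return False
    # Only a job that has not yet reached its last step can be stopped,
    # which also rules out every single-step job.
    return current_step < total_steps


print(can_stop("running", 3, 4))  # True
print(can_stop("running", 1, 1))  # False
```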


Example 5–11 Stopping a Job

This example shows that using the stop job command with the Job ID returns a message confirming that the request has been received.


N1-ok> stop job 32

Stop Job "32" request received.

This example also shows that the show job command can be used with the Job ID of the stopped job to obtain more information about it. The Status field confirms that the job was stopped, and the Command field shows the command that was used to create the job. Further details are provided for each step of that job, including the time at which the step started and completed and whether the step was successful. The Result column of the Steps section shows that step 4 was stopped.


N1-ok> show job 32
Job ID:   32
Date:     2005-11-02T08:08:37-0700
Type:     Server Refresh
Status:   Stopped (2005-11-02T08:08:48-0700)
Command:  set server 192.168.200.2 refresh
Owner:    root
Errors:   0
Warnings: 0

Steps
ID   Type           Start                      Completion                 Result   
1    Acquire Host   2005-11-02T08:08:38-0700   2005-11-02T08:08:38-0700   Completed
2    Run Command    2005-11-02T08:08:38-0700   2005-11-02T08:08:38-0700   Completed
3    Acquire Host   2005-11-02T08:08:40-0700   2005-11-02T08:08:40-0700   Completed
4    Run Command    2005-11-02T08:08:40-0700   2005-11-02T08:08:47-0700   Stopped

See Also

To Issue Remote Commands on a Server or a Server Group

To Delete a Job

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Determine the job you want to delete.


    N1-ok> show job all
    

    All jobs and job IDs appear in the output.

    See show job in Sun N1 System Manager 1.2 Command Line Reference Manual for details.

  3. Delete the desired job.


    N1-ok> delete job job
    

    The job is deleted.

    See delete job in Sun N1 System Manager 1.2 Command Line Reference Manual for details.

  4. Verify that the job was deleted.


    N1-ok> show job all
    

    The deleted job should not appear in the output.

    See show job in Sun N1 System Manager 1.2 Command Line Reference Manual for details.


Example 5–12 Deleting a Job

This example shows how to delete a job.

First, the show job command is used with the all option, which lists all jobs in descending order.


N1-ok> show job all
Job ID     Date                       Type                Status        Creator
7          2005-02-16T10:51:07-0700   Discovery           Completed     root
6          2005-02-14T14:42:52-0700   Server Reboot       Error         root
5          2005-02-14T14:38:25-0700   Server Power On     Completed     root
4          2005-02-14T14:29:20-0700   Server Power Off    Completed     root
3          2005-02-09T13:01:35-0700   Discovery           Completed     root
2          2005-02-09T12:38:16-0700   Discovery           Completed     root
1          2005-02-09T10:32:40-0700   Discovery           Completed     root

Job ID 6 has an error and can be deleted. The delete job command is now used with the Job ID of the job to be deleted.


N1-ok> delete job 6

The show job command is used again with the all option, which lists all jobs in descending order. The deleted job no longer appears on the list.


N1-ok> show job all
Job ID     Date                       Type               Status        Creator
7          2005-02-16T10:51:07-0700   Discovery          Completed     root
5          2005-02-14T14:38:25-0700   Server Power On    Completed     root
4          2005-02-14T14:29:20-0700   Server Power Off   Completed     root
3          2005-02-09T13:01:35-0700   Discovery          Completed     root
2          2005-02-09T12:38:16-0700   Discovery          Completed     root
1          2005-02-09T10:32:40-0700   Discovery          Completed     root


Example 5–13 Deleting All Jobs

This example shows how to delete all jobs.

First, the show job command is used with the all option, which lists all jobs in descending order.


N1-ok> show job all
Job ID     Date                       Type               Status        Creator
7          2005-09-16T10:51:07-0700   Discovery          Completed     root
6          2005-09-14T14:42:52-0700   Server Reboot      Error         root
5          2005-09-14T14:38:25-0700   Server Power On    Completed     root
4          2005-09-14T14:29:20-0700   Server Power Off   Completed     root
3          2005-09-09T13:01:35-0700   Discovery          Running       root
2          2005-09-09T12:38:16-0700   Discovery          Completed     root
1          2005-09-09T10:32:40-0700   Discovery          Completed     root

The delete job command is now used with the all option, to delete all jobs.


N1-ok> delete job all

Unable to delete job "3"

The show job command is used with the all option, to confirm whether all jobs were successfully deleted.


N1-ok> show job all
Job ID     Date                       Type             Status     Creator
3          2005-09-09T13:01:35-0700   Discovery        Running    root

Job ID 3 was not deleted because it is still running. Jobs that are in a running state when the delete job command is issued must finish running, or must be stopped, before they can be deleted.

To stop the job and then delete it, first the stop job command is used with the ID of the job to be stopped.


N1-ok> stop job 3

Stop Job "3" request received.

The show job command is used to confirm that the job has been stopped.


N1-ok> show job all
Job ID     Date                       Type             Status        Creator
3          2005-09-09T13:02:35-0700   Discovery        Aborted       root

The job has been stopped while running and is in the aborted state. The delete job command is now used with the all option, to delete all jobs.


N1-ok> delete job all

The show job command is used to confirm that all jobs have now been deleted.


N1-ok> show job all
Job ID     Date                      Type              Status        Creator

Job Queueing

Each type of job in the N1 System Manager has a weight associated with it. The weight reflects the load that the job places on system resources. There is also a global limit on the total load that can be placed on the system. The following table lists the weight for each type of user-level job. The maximum load permitted is 5000.

Table 5–3 Job Weight Values

Task                        Weight
OS Deployment               500
Package Deployment          500
Package Uninstall           500
Discovery                   200
Firmware Deployment         500
Remote Command Execution    200
Job Deletion                400
Create OS                   1000
Reset Server                200
Server Power Off            200
Server Power On             200
Server Refresh              200
Set Server Feature          200
Remove Server               100
Add Server                  100

The total load is the sum of the loads of all the current running jobs. The system will compare the current total load with the maximum permitted load at the following points in time:

If the difference between the current total load and the maximum permitted load is great enough to accommodate the job at the head of the job queue, then that job is promoted to a running state. Otherwise, it is left in the queued state. The current total load governs the permissible concurrent running job mix within the system.

For example, only two OS Deployment jobs can be running at one time:

500 + 500 = 1000

Or only one OS Deployment job and two Server Power Off jobs can be running at one time:

500 + 200 + 200 < 1000
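The admission rule can be modeled as a first-in, first-out queue in which the job at the head is promoted only when its weight fits under the remaining capacity. The following Python sketch is a simplified model, not product code. The class name JobQueue is hypothetical, and the sketch assumes that the load is rechecked whenever a job is submitted or finishes. The max_load parameter defaults to 5000, the maximum stated above; passing max_load=1000 reproduces the two worked examples.

```python
from collections import deque
from typing import Deque, List

# Job weights copied from Table 5-3.
WEIGHTS = {
    "OS Deployment": 500, "Package Deployment": 500,
    "Package Uninstall": 500, "Discovery": 200,
    "Firmware Deployment": 500, "Remote Command Execution": 200,
    "Job Deletion": 400, "Create OS": 1000, "Reset Server": 200,
    "Server Power Off": 200, "Server Power On": 200,
    "Server Refresh": 200, "Set Server Feature": 200,
    "Remove Server": 100, "Add Server": 100,
}


class JobQueue:
    """Weight-based FIFO admission model (hypothetical, not product code)."""

    def __init__(self, max_load: int = 5000) -> None:
        self.max_load = max_load
        self.queued: Deque[str] = deque()
        self.running: List[str] = []

    def current_load(self) -> int:
        # The total load is the sum of the weights of all running jobs.
        return sum(WEIGHTS[job] for job in self.running)

    def submit(self, job_type: str) -> None:
        self.queued.append(job_type)
        self._promote()

    def finish(self, job_type: str) -> None:
        self.running.remove(job_type)
        self._promote()

    def _promote(self) -> None:
        # Promote the head job while it fits; otherwise it stays queued
        # and the jobs behind it wait too.
        while (self.queued and
               self.current_load() + WEIGHTS[self.queued[0]] <= self.max_load):
            self.running.append(self.queued.popleft())


queue = JobQueue(max_load=1000)
for _ in range(3):
    queue.submit("OS Deployment")
print(queue.running, list(queue.queued))
# ['OS Deployment', 'OS Deployment'] ['OS Deployment']

blocked = JobQueue(max_load=1000)
for job in ("Discovery", "Create OS", "Server Power On"):
    blocked.submit(job)
print(blocked.running, list(blocked.queued))
# ['Discovery'] ['Create OS', 'Server Power On']
```

Because only the job at the head of the queue is considered, a heavy job such as Create OS can hold back lighter jobs queued behind it, as the second part of the sketch shows.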

Managing Event Log Entries

This section describes events and their integral role in monitoring your servers.

Events are generated when certain conditions related to attributes occur. Each event has an associated topic. For example, when a server is discovered by the management server, an event is generated with the topic Action.Physical.Discovered. For a complete list of event topics, see create notification in Sun N1 System Manager 1.2 Command Line Reference Manual.

Events can be monitored. Monitoring is connected with the broadcasting of events for each monitored server or group of servers. When a monitored attribute's value falls outside the safe range defined by the default or user-defined thresholds, an event is generated and a status is issued.

See Introduction to Monitoring for more information about monitoring.

See Setting Up Event Notifications for more information about event notifications.

Lifecycle events continue to be generated even with monitoring disabled. Lifecycle events include server discovery, server change or deletion, or server group creation. If you have requested notification of this type of event, you can still receive notifications even with monitoring disabled.

Event logs are created when events occur. For example, if any monitored IP address is unreachable, an event is generated. This event creates an event log record, which is visible from the browser interface.


Note –

Machines based on the Advanced Lights Out Manager (ALOM) standard use email to send event notifications to the management server. This must be configured as shown in Configuring the Management Server Mail Service and Account in Sun N1 System Manager 1.2 Site Preparation Guide. Troubleshooting information is provided in Fixing Notifications From ALOM-based Servers.


Event Log Overview

During the installation and configuration of the N1 System Manager, you can configure which events to log and you can also interactively configure severity levels for event topics. See Configuring the N1 System Manager System in Sun N1 System Manager 1.2 Installation and Configuration Guide.

Even if an event is not logged, it can still generate an event notification.

Use the show log command to view the following information about events:

The n1smconfig script can be used to change the number of days for which event logs are kept. Reducing this number reduces the average size of the event log files, which helps ensure that event log file size does not impair performance. The n1smconfig script is stored in /usr/bin on both the Linux and Solaris OS platforms. To configure event logging, specify an event category and a resource category. The following event categories are defined:

Use the all event category to indicate that all events are to be logged. To understand how other event categories relate to actual events, see the event notification topics at create notification in Sun N1 System Manager 1.2 Command Line Reference Manual. General log messages are saved to the syslog file, /var/adm/messages on Solaris or /var/log/messages on Linux.

To View the Event Log

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Type the following command:


    N1-ok> show log [count count]

    The Events log appears with events listed most recent first. The value for the count attribute is the number of events to show in the output. The default value for count is 500. See show log in Sun N1 System Manager 1.2 Command Line Reference Manual for details.

See Also

Event Log Overview

To Filter the Event Log

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Type the following command:


    N1-ok> show log [after after] [before before] [count count] [severity severity]

    The output shows only the events that match the specified criteria. The before and after values must be dates and times in the format used in this example: 2005-07-20T11:53:04. The possible values for severity are as follows:

    • unknown

    • other

    • information

    • warning

    • minor

    • major

    • critical

    • fatal

    See show log in Sun N1 System Manager 1.2 Command Line Reference Manual for details.
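When you filter by time from a script, the after and before values must use the same layout as the example value. The following Python sketch builds a filtered show log command line. The function name show_log_command is hypothetical, and the sketch assumes that timestamps without a time zone offset, as in the example above, are acceptable to the CLI.

```python
from datetime import datetime, timedelta
from typing import Optional

SEVERITIES = ("unknown", "other", "information", "warning",
              "minor", "major", "critical", "fatal")
# Layout of the example value 2005-07-20T11:53:04 (no time zone offset).
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def show_log_command(after: Optional[datetime] = None,
                     before: Optional[datetime] = None,
                     count: Optional[int] = None,
                     severity: Optional[str] = None) -> str:
    """Build a filtered "show log" command line (hypothetical helper)."""
    words = ["show", "log"]
    if after is not None:
        words += ["after", after.strftime(TIME_FORMAT)]
    if before is not None:
        words += ["before", before.strftime(TIME_FORMAT)]
    if count is not None:
        words += ["count", str(count)]
    if severity is not None:
        if severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {severity!r}")
        words += ["severity", severity]
    return " ".join(words)


# Critical events logged in the 24 hours before a fixed reference time.
reference = datetime(2005, 7, 21, 11, 53, 4)
print(show_log_command(after=reference - timedelta(days=1), severity="critical"))
# show log after 2005-07-20T11:53:04 severity critical
```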

To View Event Details

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Type the following command:


    N1-ok> show log log
    

    The details of the event appear in the output. The log variable is the log ID. See show log in Sun N1 System Manager 1.2 Command Line Reference Manual for details.


Example 5–14 Viewing Event Details


N1-ok> show log 72
ID:       72
Date:     2005-03-15T13:35:59-0700
Subject:  RemoteCmdPlan
Topic:    Action.Logical.JobStarted
Severity: Information
Level:    FINE
Source:   Job Service
Role:     root
Message:  RemoteCmdPlan job initiated by root: job ID = 15. 
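Detailed output such as this uses one Key: value pair per line, which makes it easy to process in a script. The following Python sketch is a hypothetical helper, parse_details, that turns such output into a dictionary. It assumes that the key ends at the first colon on each line, so that values containing colons, such as the Message field, are kept whole. The same approach applies to the show notification output in Example 5–15.

```python
from typing import Dict


def parse_details(text: str) -> Dict[str, str]:
    """Split "Key: value" lines into a dictionary (hypothetical helper).

    Only the first colon on each line separates key from value, so values
    that contain colons, such as the Message field, are kept intact.
    Lines without a colon (for example the N1-ok> prompt) are skipped.
    """
    details: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            details[key.strip()] = value.strip()
    return details


# Output copied from Example 5-14.
SAMPLE = """\
N1-ok> show log 72
ID:       72
Date:     2005-03-15T13:35:59-0700
Subject:  RemoteCmdPlan
Topic:    Action.Logical.JobStarted
Severity: Information
Level:    FINE
Source:   Job Service
Role:     root
Message:  RemoteCmdPlan job initiated by root: job ID = 15.
"""

event = parse_details(SAMPLE)
print(event["Topic"])  # Action.Logical.JobStarted
```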

Setting Up Event Notifications

The N1 System Manager provides the ability to set up email or SNMP event notifications when events occur, either within the N1 System Manager itself or when specific events occur on provisionable servers. You can set up customized event notification rules for as many different scenarios as you need. Setting up default notifications for events can be done using the n1smconfig utility at install time. See Configuring the N1 System Manager System in Sun N1 System Manager 1.2 Installation and Configuration Guide for more information about installing and configuring the N1 System Manager.

You can create additional event notifications at the command line. Use the create notification command to create event notification rules for the events that interest you. Each event notification is based on an event topic.

For setting up event notifications using SNMP traps, use the SNMP MIB located at /opt/sun/n1gc/etc/SUN-N1SM-TRAP-MIB.mib. For more information about SNMP MIBs, see Monitoring MIBs.

A notification rule can be used to send a notification of each type of event to a selected destination, using either email or SNMP as the communication medium. For example, you can create a notification rule so that each time a new provisionable server is discovered by the management server, you receive a message on your pager to indicate that the event has happened:


create notification notification destination destination topic topic type type [description description]

See create notification in Sun N1 System Manager 1.2 Command Line Reference Manual for details of the terms used in this command syntax.
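The following Python sketch assembles a create notification command line from the syntax above and rejects arguments that contain spaces, since the CLI separates arguments with spaces. The function name create_notification_command is hypothetical, and quoting a multi-word description in double quotes is an assumption; check create notification in Sun N1 System Manager 1.2 Command Line Reference Manual for the exact rules.

```python
from typing import Optional


def create_notification_command(name: str, destination: str, topic: str,
                                notifier_type: str,
                                description: Optional[str] = None) -> str:
    """Build a create notification command line (hypothetical helper)."""
    # The CLI separates arguments with spaces, so single-word arguments
    # must not be empty or contain whitespace.
    for label, value in (("name", name), ("destination", destination),
                         ("topic", topic), ("type", notifier_type)):
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"{label} must be a single word: {value!r}")
    words = ["create", "notification", name, "destination", destination,
             "topic", topic, "type", notifier_type]
    if description is not None:
        # Assumption: a multi-word description is passed in double quotes.
        words += ["description", f'"{description}"']
    return " ".join(words)


print(create_notification_command("notif2", "nobody@sun.com",
                                  "Lifecycle.Logical.CreateGroup", "email"))
```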

You can configure the SMTP server used for event notifications during the installation and configuration of the N1 System Manager. See Configuring the N1 System Manager System in Sun N1 System Manager 1.2 Installation and Configuration Guide.

Viewing and Modifying Event Notifications

Use the show notification and set notification commands to view and modify event notification details. Type help show notification or help set notification at the N1-ok command line for syntax and parameter details.

To View Event Notifications

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Type the following command:


    N1-ok> show notification all
    

    The event notifications for which you have read privileges appear in the output. See show notification in Sun N1 System Manager 1.2 Command Line Reference Manual for details.

To View Event Notification Details

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Type the following command:


    N1-ok> show notification notification
    

    The specified event notification details appear in the output. See show notification in Sun N1 System Manager 1.2 Command Line Reference Manual for details.


Example 5–15 Viewing Event Notification Details

This example shows how to use the show notification command to display the details about a notification.


N1-ok> show notification notif33
Name:          notif33
Event Topic:   EReport.Physical.ThresholdExceeded
Notifier Type: Email
Destination:   nobody@sun.com
State:         enabled

To Modify an Event Notification

This procedure describes how to change the name, description, or destination of an event notification.

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Type the following command:


    N1-ok> set notification notification name name description description destination destination
    

    The specified event notification attributes are set to the new values specified. See set notification in Sun N1 System Manager 1.2 Command Line Reference Manual for details.


Example 5–16 Modifying an Event Notification Name

This example shows how to use the set notification command with the name option to change a notification name from notif22 to notif23.


N1-ok> set notification notif22 name notif23

Creating, Testing, and Deleting Event Notifications

Use the create notification and delete notification commands to create and delete event notifications.

Use the start notification command with the test keyword to test an event notification.

Type help create notification, help delete notification, or help start notification at the N1-ok command line for syntax and parameter details.

To Create and Test an Event Notification

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

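  2. Type the following command, where notification is the name of the new event notification, topic is the event topic that triggers it, type is the notification type (for example, email or snmp), and destination is the email address or SNMP host that receives it: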


    N1-ok> create notification notification topic topic
    type type destination destination
    

    The event notification is created and enabled. See create notification in Sun N1 System Manager 1.2 Command Line Reference Manual for details and valid topics.

  3. Type the following command:


    N1-ok> start notification notification test
    

    A test notification message is sent. See start notification in Sun N1 System Manager 1.2 Command Line Reference Manual for details.


Example 5–17 Creating an Email Notification

This example shows how to create an event notification to be sent by email if a server group is created. Note that an SMTP email server must first be configured using the n1smconfig utility as described in Configuring the N1 System Manager System in Sun N1 System Manager 1.2 Installation and Configuration Guide.

The event notification is called notif2. The recipient's email address is nobody@sun.com.


N1-ok> create notification notif2 destination nobody@sun.com
topic Lifecycle.Logical.CreateGroup type email

The show notification command can be used to verify that the event notification has been created.


N1-ok> show notification
Name    Event Topic                         Destination       State
notif2  Lifecycle.Logical.CreateGroup       nobody@sun.com    enabled

As a test, you can trigger the event by creating a temporary group:


N1-ok> create group test

If the notification was created successfully, an email is sent to the destination address. Otherwise, the following error message is displayed:


Notification test failed.

If this message appears, verify that the SMTP server is configured correctly and is reachable, and that the email address used in the notification is valid.



Example 5–18 Creating an SNMP Notification

This example shows how to create an event notification to be sent by SNMP if a physical threshold value is exceeded. The event notification is called notif3, and the SNMP destination is sun.com.


N1-ok> create notification notif3 destination sun.com
topic EReport.Physical.ThresholdExceeded type snmp

The show notification command can be used to verify that the event notification has been created.


N1-ok> show notification
Name    Event Topic                         Destination  State
notif3  EReport.Physical.ThresholdExceeded  sun.com      enabled

To display a single event notification, specify its name with the show notification command.


N1-ok> show notification notif3
Name    Event Topic                         Destination  State
notif3  EReport.Physical.ThresholdExceeded  sun.com      enabled
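
Rather than waiting for a threshold to be exceeded, you can send a test message for notif3 by using the start notification command with the test keyword, as described in To Create and Test an Event Notification. This sketch assumes that an SNMP manager at sun.com is listening for the notification.


N1-ok> start notification notif3 test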

To Delete an Event Notification

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Type the following command, where notification is the name of the event notification to delete:


    N1-ok> delete notification notification
    

    The event notification is deleted.
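
For example, the following sketch deletes the notif2 notification created in Example 5–17, then lists the remaining event notifications so that you can confirm notif2 no longer appears.


N1-ok> delete notification notif2
N1-ok> show notification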

Starting and Stopping Event Notifications

Event notifications are enabled (started) by default when they are created. Use the stop notification command to disable an event notification, and the start notification command to enable an event notification that has been disabled. Type help start notification or help stop notification at the N1-ok command line for syntax and parameter details.

To Start an Event Notification

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Type the following command, where notification is the name of the event notification to enable:


    N1-ok> start notification notification
    

    The event notification is enabled. See start notification in Sun N1 System Manager 1.2 Command Line Reference Manual for details.
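
For example, assuming that the notif3 notification from Example 5–18 was previously stopped, the following sketch re-enables it and then displays it so that you can check that its State field reads enabled.


N1-ok> start notification notif3
N1-ok> show notification notif3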

To Stop an Event Notification

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Type the following command, where notification is the name of the event notification to disable:


    N1-ok> stop notification notification
    

    The event notification is disabled. See stop notification in Sun N1 System Manager 1.2 Command Line Reference Manual for details.
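
For example, to suppress messages from the notif3 notification in Example 5–18, such as during planned maintenance, you might disable it as shown in this sketch. Stopping a notification does not delete it, so it can be re-enabled later with the start notification command.


N1-ok> stop notification notif3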