The first section of this chapter explains what monitoring means in the context of the N1 System Manager and describes how to monitor servers that are managed by the N1 System Manager. The chapter provides command-line procedures for enabling and disabling monitoring, and for managing monitoring thresholds and polling intervals.
This chapter also contains information about managing jobs and event log entries, and about setting up notifications.
Some procedures are also possible using the browser interface. These procedures are provided in the Sun N1 System Manager browser interface help.
Monitoring in the Sun N1 System Manager software enables you to track changes to specific attributes in specific managed objects. Managed objects include server hardware elements, operating systems, file systems, and networks. Attributes are the monitored elements, about which data is obtained and delivered by the N1 System Manager software. Examples of attributes are the average number of queued processes and the percentage of used memory. A list of attributes is provided in Hardware Sensor Attributes and in Table 5–2.
Attributes are associated with one of three main areas:
Hardware health attributes. For information about hardware health monitoring, see Hardware Health Monitoring.
OS resource utilization attributes. For information about OS resource utilization monitoring, see OS Resource Utilization Monitoring.
Network connectivity, or reachability. For information about network reachability monitoring, see Network Reachability Monitoring.
For a server or a group of servers, hardware health, OS resource utilization, and network connectivity are all monitored by the management server. All comparisons and verifications for monitoring are performed by the N1 System Manager. Provisionable servers are used only to access data.
An SNMP agent that is used for data retrieval is provided in the N1 System Manager software. If the management server is running the N1 System Manager on the Solaris OS, this agent is based on the Sun Management Center 3.5 software SNMP agent. If the management server is running the N1 System Manager on Linux, this agent is based on the Sun Management Center 3.6 Linux SNMP agent. The agent is deployed when operating systems are deployed on servers that are managed by the N1 System Manager software.
On Linux platforms, the N1 System Manager software only monitors ext3 file systems. Other types of file systems are not monitored for Linux platforms.
Monitoring is connected with the broadcasting of the events for each monitored server or group of servers. Events are generated when certain conditions related to attributes occur. For information about events and when they occur, see Managing Event Log Entries. There are no log files related to monitoring. Instead of log files, monitoring data is stored as events in the N1 System Manager database.
If monitoring is enabled for a server, each event causes a notification to be emitted from the N1 System Manager for that event. If monitoring is disabled for a server, monitoring events are not generated for that server. Lifecycle events continue to be generated, even with monitoring disabled. Lifecycle events include server discovery, server change or deletion, or server group creation. If you have requested notification of this type of event, you can still receive notifications even with monitoring disabled.
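The rule above can be sketched as a small decision function. This is only an illustration of the described behavior, not the N1 System Manager implementation. The function name, the event-kind strings, and the `LIFECYCLE_EVENTS` set are hypothetical names chosen for this sketch.

```python
# Hypothetical event kinds, taken from the lifecycle examples in the text.
LIFECYCLE_EVENTS = {
    "server discovery",
    "server change",
    "server deletion",
    "server group creation",
}


def is_event_generated(event_kind: str, monitored: bool) -> bool:
    """Return True if an event of this kind is generated for a server.

    Lifecycle events are generated whether or not monitoring is enabled.
    Any other event kind is treated here as a monitoring event, which is
    generated only while the server's monitored attribute is true.
    """
    if event_kind in LIFECYCLE_EVENTS:
        return True
    return monitored


# A threshold violation on an unmonitored server produces no event,
# but discovery of that same server still does.
print(is_event_generated("threshold violation", monitored=False))
print(is_event_generated("server discovery", monitored=False))
```

Whether a generated event also leads to a notification depends on the notification rules that you have set up, which this sketch does not model.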
The hardware health of discovered servers is monitored. Sensors provided in the hardware are used to monitor temperature, voltage, and fan speed. For more information about associated hardware, see the Sun N1 System Manager Connection Information in Sun N1 System Manager 1.1 Site Preparation Guide.
Sensor data is retrieved from the service processor for SPARC devices through the Advanced Lights Out Manager (ALOM) interface. Sensor data is retrieved from IPMI for x64 servers.
General management interface data for Sun Fire V20z and Sun Fire V40z machines is obtained through the command line. General management interface data for Sun Fire x4100 and Sun Fire x4200 servers is obtained through IPMI. Data can be retrieved dynamically from the command line.
The following characteristics of server hardware can be monitored:
CPU temperature
Ambient temperature
Fan speed in revolutions per minute
Voltages
LEDs
A detailed list of these sensors is provided in Hardware Sensor Attributes.
You can view filtered hardware health monitoring information for all servers by using the show server command:
N1-ok> show server health health
See show server in Sun N1 System Manager 1.1 Command Line Reference Manual for details of possible values of the health filters.
OS resource utilization is monitored by the N1 System Manager. When you issue the add server feature command with the agentip keyword, you also use the agentssh keyword to provide the credentials for accessing the monitored server's operating system through ssh. See To Add the OS Monitoring Feature for additional details. This procedure is important for OS resource utilization monitoring but not for monitoring hardware health or network reachability.
Access to the operating system through ssh is required primarily for the Remote Command Execution feature. The management features use the same access to retrieve data for OS resource utilization monitoring: all attribute data, including platform OS interface data, is retrieved from the server's operating system by using ssh and SNMP. Statistics related to the central processing unit (CPU) are provided, as is data related to memory, swap usage, and file systems. For the purposes of monitoring, system load data, memory usage, and swap usage data can be broken down as follows:
System usage, including system idle times
System load, expressed as the average number of queued processes over 1, 5, and 15 minutes
Memory usage and memory free statistics, in megabytes and as percentages
Physical load statistics
Swap space used and space available, in megabytes and as percentages
File system used and space available, as percentages
A list of these attributes is provided in Table 5–2.
You can filter OS resource utilization monitoring information for all servers by using the show server command:
N1-ok> show server utilization utilization
N1-ok> show server utilization unreachable
The health of an OS resource can be shown as unknown if the server is reachable but the monitoring agent cannot be contacted on SNMP port 161.
The health of an OS resource can be shown as unreachable if the server is unreachable due to, for example, being in standby mode.
See show server in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
The monitoring of OS resource utilization attributes enables you to modify the default threshold values for all servers being managed by the N1 System Manager, through the creation and editing of a configuration file. See Changing Threshold Values With the Monitoring Configuration File for details.
The monitoring of OS resource utilization attributes also enables you to set specific thresholds for individual monitored servers, or for groups of monitored servers, at the command line by using the set command. See Setting Threshold Values for details.
If you are not interested in the values of some attributes, you can disable the threshold severity for monitoring of those attributes. This action prevents annoyance alarms. Example 5–4 shows you how to accomplish this disabling action.
All management interfaces of provisionable servers and all platform interfaces are monitored by default by the N1 System Manager. Platform interfaces include the service processor's management interface, such as eth0, and data network interfaces, such as eth1 or eth2.
Reachability is verified for Linux servers and servers running the Solaris OS by using an ICMP ping to the interface IP address. For further information, see Discovery of Servers in the Factory Default State in Sun N1 System Manager 1.1 Installation and Configuration Guide.
The reachability of all network interfaces is verified at regular intervals. These polling intervals are configurable. For information about configuring polling intervals, see Setting Polling Intervals. The monitoring of network reachability is based on the IP address. If any monitored IP address is unreachable, an event is generated.
You can filter information for all servers by using the show server command with the appropriate parameters to view monitoring information. See show server in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
It is important to distinguish between the unreachable and unknown states for provisionable servers.
N1-ok> show server health unreachable
This command lists all provisionable servers that are unreachable. Any provisionable server returned in the output of this command is unreachable due to a network problem: the server cannot be contacted about its hardware health status. The ping command to the server is unsuccessful. This does not necessarily mean that the server is not transmitting hardware health status information. The server could be in standby mode.
N1-ok> show server health unknown
This command lists all provisionable servers that are not returning any information about hardware health status. The ping command may be successful but servers returned in the output of this command are not returning any hardware health information. The monitoring agent could not be contacted on port 161.
N1-ok> show server power unreachable
This command lists all provisionable servers that are unreachable. Any server returned in the output of this command is unreachable due to a network problem: the server cannot be contacted about its power status. The ping command to the server is unsuccessful. This does not necessarily mean that the server is not transmitting power status information. The server could be in standby mode.
N1-ok> show server power unknown
This command lists all provisionable servers that are not returning any information about power status. The ping command may be successful but servers returned in the output of this command are not returning any power status information. The monitoring agent could not be contacted on port 161.
N1-ok> show server utilization unreachable
This command lists all provisionable servers that are unreachable. Any server returned in the output of this command is unreachable due to a network problem: the server cannot be contacted about its OS resource utilization. The ping command to the server is unsuccessful. This does not necessarily mean that the server is not transmitting OS resource utilization information. The server could be in standby mode.
N1-ok> show server utilization unknown
This command lists all provisionable servers that are not returning any information about OS resource utilization. The ping command may be successful but servers returned in the output of this command are not returning any OS resource utilization information. The monitoring agent could not be contacted on port 161.
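The distinction between the two states comes down to two checks: whether the server answers a ping, and whether the monitoring agent answers on SNMP port 161. The following sketch shows that decision order only. The function names and the "reporting" label are hypothetical, and the two probe functions are passed in as parameters rather than implemented, because the actual checks are performed inside the N1 System Manager.

```python
from typing import Callable


def classify_server_state(ping_ok: bool, agent_ok: bool) -> str:
    """Map the two checks described in the text to a state name.

    "unreachable": the ping to the server is unsuccessful.
    "unknown":     the ping succeeds but the monitoring agent could not
                   be contacted on port 161.
    "reporting":   hypothetical label for a server that answers both.
    """
    if not ping_ok:
        # The agent check is meaningless if the server cannot be pinged.
        return "unreachable"
    if not agent_ok:
        return "unknown"
    return "reporting"


def probe_server(address: str,
                 ping: Callable[[str], bool],
                 agent_reachable: Callable[[str], bool]) -> str:
    """Run the caller-supplied probes in order and classify the result."""
    if not ping(address):
        return classify_server_state(False, False)
    return classify_server_state(True, agent_reachable(address))


# Example with stub probes: the server pings but the agent is silent.
print(probe_server("192.0.2.10", lambda ip: True, lambda ip: False))
```

The ping is checked first, so a server in standby mode is reported as unreachable and never as unknown.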
For all provisionable servers, that is, for all physical servers that have been discovered by the Sun N1 System Manager software, the management features are supported once the add server command has been used to create monitorable objects. The management features are used to periodically retrieve CPU statistics, file system data, and memory data for monitoring purposes.
Monitored file system data for a provisionable server is not available unless an operating system is deployed on the provisionable server, and the management features have been added by using the add server feature command with the agentip keyword:
N1-ok> add server server-name feature basemanagement agentip agentip agentssh username/password
N1-ok> add server server-name feature osmonitor agentip agentip agentssh username/password
The agentip is the IP address of the provisioning network interface of the provisionable server that you want to monitor. See add server in Sun N1 System Manager 1.1 Command Line Reference Manual for details. See also To Add the Base Management Feature and To Add the OS Monitoring Feature for additional details on the syntax used in these commands.
When you specify or change features, you must use the add server command. The set server command cannot be used to specify a feature.
The add server command is useful for enabling OS resource utilization monitoring and network reachability monitoring, but not for monitoring hardware health. Hardware health is already monitored by default as soon as the Sun N1 System Manager software discovers a physical server.
The polling of network reachability is not possible if OS resource utilization monitoring is not enabled.
For more information about the agentip subcommand, see To Add the OS Monitoring Feature.
The add server command needs to be issued only once for a server and not each time you want to enable or disable monitoring.
If the provisionable server's IP address changes, use the set server command again before enabling or disabling monitoring.
The default status of monitoring in the Sun N1 System Manager for discovered servers and initialized operating systems is as follows:
Hardware health monitoring. Enabled by default when a server or other hardware is discovered. Before a server can be monitored, however, it must be discovered and correctly registered with the N1 System Manager. This process is described in Discovering Servers. The monitoring of hardware sensors is enabled by default for all managed servers. If a server is deleted and then rediscovered, all states related to that server for the purposes of monitoring are lost. This is the case regardless of whether monitoring was enabled or disabled for that server when the server was deleted. When the server is rediscovered, monitoring is set to true by default. For more information about discovering servers, see To Discover New Servers.
OS resource utilization monitoring. Disabled by default. OS resource utilization monitoring is enabled once an OS has been successfully provisioned on a provisionable server and the N1 System Manager management features have been added by using the add server feature command with the agentip keyword. The OS provisioning can be performed either through the N1 System Manager or by an external OS installation.
If you are not interested in the values of some OS resource utilization attributes, you can disable the threshold severity for the monitoring of those attributes, while continuing to monitor other OS resource utilization attributes. This action prevents annoyance alarms. Example 5–4 shows how to accomplish this task. For general information about threshold values, see Monitoring Threshold Values.
Network reachability monitoring. When the management interface of the provisionable server is discovered, monitoring of the interface is enabled by default. When the management features are added, monitoring of other interfaces is enabled by default.
The following procedure describes how to use the command line to enable the monitoring of hardware health, operating system utilization, and network reachability of a server.
To enable the management agent IP and security credentials on a server named server, add the management features on the server as explained in Adding Base and OS Management Features.
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Set the monitored attribute to true by using the set server command.
N1-ok> set server server monitored true
In this procedure, server is the name of the provisionable server that you want to monitor.
View the server details.
N1-ok> show server server
To enable the management agent IP and security credentials on a server named server, add the management features on the server as explained in Adding Base and OS Management Features. This procedure is important for OS resource utilization monitoring but not for monitoring hardware health.
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Set the monitored attribute to true by using the set group command.
N1-ok> set group group monitored true
This command is executed for the group of servers that you have already named. See set group in Sun N1 System Manager 1.1 Command Line Reference Manual for details. In this procedure, group is the name of the group of provisionable servers that you want to monitor.
View the server group details to determine if monitoring is enabled for each server in the group.
N1-ok> show group group
View the specific monitoring details for individual servers in the group.
N1-ok> show server server
Detailed monitoring information appears in the output. Information is displayed about polling intervals and threshold values for the monitoring of hardware health, OS resource utilization and network reachability. Polling intervals are explained in Setting Polling Intervals. Monitoring threshold values are explained in Monitoring Threshold Values.
You might want to disable monitoring of a hardware component to perform maintenance tasks without generating events.
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Set the monitored attribute to false by using the set server command.
N1-ok> set server server monitored false
In this procedure, server is the name of the provisionable server that you want to stop monitoring. Executing this command disables monitoring of the server. While monitoring of a server is disabled, attributes of that server that violate their threshold values do not generate events.
View the server details.
N1-ok> show server server
The output shows that monitoring is disabled.
If you are not interested in the values of some OS resource utilization attributes, you can disable the threshold severity for the monitoring of those attributes, while continuing to monitor other OS resource utilization attributes. This action prevents annoyance alarms. Example 5–4 shows how to accomplish this task. For general information about threshold values, see Monitoring Threshold Values. You can also completely remove the OS resource utilization monitoring feature. See To Remove the OS Monitoring Feature.
This procedure describes how to disable monitoring for a server group. You might want to disable monitoring of hardware components to perform maintenance tasks without generating events.
When you disable monitoring for a server, hardware health monitoring, OS monitoring, and network reachability monitoring are all disabled for that server.
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Set the monitored attribute to false by using the set group command.
N1-ok> set group group monitored false
This command is executed for the group of servers that you have already named. See set group in Sun N1 System Manager 1.1 Command Line Reference Manual for details. In this procedure, group is the name of the group of provisionable servers that you want to stop monitoring. Executing this command disables monitoring for all servers in the group. With monitoring of a server group disabled, the violation of threshold values by attributes related to servers in that group does not generate events.
View the server group details to determine if monitoring is disabled for all servers in the group.
N1-ok> show group group
The value of any given monitored attribute is compared to a threshold value. Low and high threshold values are defined and can be configured.
Attribute data is compared against thresholds at regular intervals. These polling intervals are configurable. For further information about polling intervals, see Setting Polling Intervals.
When a monitored attribute is polled and the value of the attribute is beyond the default or user-defined threshold safe range, an event is generated and a status is issued. If the value of the attribute is lower than the low threshold or higher than the high threshold, then depending on the severity of the threshold, an event is generated to show a status of nonrecoverable, critical, or warning. Otherwise, the status of the monitored attribute is OK, provided that a value can be obtained.
If no value can be obtained, an event is generated to show that the status of the monitored attribute is unknown. The health of an OS resource can be shown as unknown if the server is reachable but the monitoring agent cannot be contacted on SNMP port 161.
The values nonrecoverable, critical, and warning are discussed in show server in Sun N1 System Manager 1.1 Command Line Reference Manual.
If the value of a monitored attribute rises above the warninghigh threshold, a status of warninghigh is issued. If the value continues to rise and passes the criticalhigh threshold, a status of criticalhigh is issued. If the value continues to rise above the nonrecoverablehigh threshold, a status of nonrecoverablehigh is issued.
If the value then falls back toward the safe range, no further events are generated until the value falls below the warninghigh threshold, at which point an event is generated to show a status of normal.
If the value of a monitored attribute falls below the warninglow threshold, a status of warninglow is issued. If the value continues to fall, and passes the criticallow threshold, a status of criticallow is issued. If the value continues to fall below the nonrecoverablelow threshold, a status of nonrecoverablelow is issued.
If the value then rises back toward the safe range, no further events are generated until the value rises above the warninglow threshold, at which point an event is generated to show a status of normal.
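The escalation and recovery behavior described above can be sketched as a small state tracker. This is a minimal illustration under stated assumptions, not the N1 System Manager implementation. The names `classify`, `ThresholdTracker`, and `poll` are hypothetical, and a threshold is treated as violated when the value is strictly above a high threshold or strictly below a low threshold.

```python
from typing import Dict, Optional

# Severity levels in increasing order; "normal" is the safe range.
LEVELS = ["normal", "warning", "critical", "nonrecoverable"]


def classify(value: float, thresholds: Dict[str, float]) -> str:
    """Return the most severe status that value violates, else 'normal'."""
    for level in ("nonrecoverable", "critical", "warning"):
        high = thresholds.get(level + "high")
        low = thresholds.get(level + "low")
        if high is not None and value > high:
            return level + "high"
        if low is not None and value < low:
            return level + "low"
    return "normal"


def rank(status: str) -> int:
    """Position of a status in LEVELS, ignoring the high/low suffix."""
    return next(i for i, name in enumerate(LEVELS) if status.startswith(name))


def side(status: str) -> str:
    """'high', 'low', or '' for the normal status."""
    return "high" if status.endswith("high") else "low" if status.endswith("low") else ""


class ThresholdTracker:
    """Tracks the last reported status of one monitored attribute."""

    def __init__(self, thresholds: Dict[str, float]) -> None:
        self.thresholds = thresholds
        self.reported = "normal"

    def poll(self, value: float) -> Optional[str]:
        """Return the status to report as an event, or None for no event."""
        status = classify(value, self.thresholds)
        if status == self.reported:
            return None
        escalated = rank(status) > rank(self.reported)
        switched = side(status) != side(self.reported)
        if status == "normal" or escalated or switched:
            self.reported = status
            return status
        # Value is recovering but has not yet re-entered the safe range.
        return None


tracker = ThresholdTracker({"warninghigh": 80, "criticalhigh": 90})
for sample in (50, 85, 95, 85, 70):
    print(sample, tracker.poll(sample))
```

With the sample values above, events are reported for the rise past 80 and past 90, nothing is reported when the value drops back to 85, and a status of normal is reported once the value falls below the warninghigh threshold.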
Threshold values for OS resource utilization attributes can be configured at the command line. This process is explained in Setting Threshold Values. For threshold values measuring percentages, the valid range is from 0 to 100%. If you try to set a threshold value outside of this range, an error is generated. For attributes that do not measure percentages, these values depend on the number of processors in your system and on the usage characteristics of your installation.
After a period of usage, you can develop an awareness of what levels to set for OS resource utilization attribute values. You can adjust thresholds once you determine more closely what value indicates a genuine justification for an event to be generated and for a notification to be sent to your pager or email address. For example, you might want to receive notifications every time a certain attribute reaches a warninghigh severity threshold level.
For important or crucial attributes at your installation, you can set the warninghigh threshold level to a low percentage value so that you are notified about a rising value as early as possible.
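The range rule for percentage thresholds can be expressed as a short check. This sketch assumes that percentage attributes can be recognized by a final name component that begins with pct, which holds for the attribute names listed in this chapter but is an observation about those names, not a documented rule. The function names are hypothetical.

```python
def is_percentage_attribute(attribute: str) -> bool:
    """Assumption: names such as fsusage.pctused measure percentages."""
    return attribute.rsplit(".", 1)[-1].startswith("pct")


def validate_threshold(attribute: str, value: float) -> float:
    """Return value unchanged if it is acceptable for the attribute.

    Percentage thresholds must lie in the range 0 to 100. Thresholds for
    other attributes, such as load averages or megabyte counts, are not
    range-checked here because suitable values depend on the installation.
    """
    if is_percentage_attribute(attribute) and not 0 <= value <= 100:
        raise ValueError(
            f"{attribute}: percentage threshold {value} is outside 0 to 100"
        )
    return value


print(validate_threshold("fsusage.pctused", 75))
print(validate_threshold("cpustats.loadavg1min", 12))
```

A value such as 120 for fsusage.pctused raises a ValueError in this sketch, mirroring the error that is generated when a percentage threshold outside the valid range is set.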
To enable the management agent IP and security credentials on a server named server, add the management features on the server as explained in Adding Base and OS Management Features.
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Type the show server command:
N1-ok> show server server
In this procedure, server is the name of the provisionable server for which you want to retrieve threshold values.
Detailed monitoring threshold values appear in the output, including threshold information for the server's hardware health, OS resource utilization, and network reachability. Default values are shown if no specific values have been set.
See show server in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
Factory-configured default threshold values are provided in the N1 System Manager software for some OS resource utilization thresholds. These values are stated as percentages. Table 5–1 lists default values for these OS resource utilization attributes.
Setting or modifying threshold values for hardware health attributes is not supported in this version of the Sun N1 System Manager.
Attribute Name | Description | Default Warning Threshold | Default Critical Threshold |
---|---|---|---|
cpustats.pctusage | Percentage of overall CPU usage | warninghigh 80% | criticalhigh 90% |
cpustats.pctidle | Percentage of CPU idle | warninglow 20% | criticallow 10% |
memusage.pctmemused | Percentage of memory in use | warninghigh 80% | criticalhigh 90% |
memusage.pctmemfree | Percentage of memory free | warninglow 20% | criticallow 10% |
memusage.pctswapused | Percentage of swap space in use | warninghigh 80% | criticalhigh 90% |
fsusage.pctused | Percentage of file system space in use | warninghigh 80% | criticalhigh 90% |
Table 5–2 provides the complete list of OS resource utilization attributes and their default values. Where factory-configured default values exist for attributes, these are shown in parentheses.
Table 5–2 All OS Resource Utilization Attributes
Attribute Name | Description | Supported Warning Threshold (Default) | Supported Critical Threshold (Default) |
---|---|---|---|
cpustats.loadavg1min | System load expressed as average number of queued processes over 1 minute | warninghigh | criticalhigh |
cpustats.loadavg5min | System load expressed as average number of queued processes over 5 minutes | warninghigh | criticalhigh |
cpustats.loadavg15min | System load expressed as average number of queued processes over 15 minutes | warninghigh | criticalhigh |
cpustats.pctusage | Percentage of overall CPU usage | warninghigh (80%) | criticalhigh (90%) |
cpustats.pctidle | Percentage of CPU idle | warninglow (20%) | criticallow (10%) |
memusage.pctmemused | Percentage of memory in use | warninghigh (80%) | criticalhigh (90%) |
memusage.pctmemfree | Percentage of memory free | warninglow (20%) | criticallow (10%) |
memusage.mbmemused | Memory in use in MB | warninghigh | criticalhigh |
memusage.mbmemfree | Memory free in MB | warninglow | criticallow |
memusage.pctswapused | Percentage of swap space in use | warninghigh (80%) | criticalhigh (90%) |
memusage.mbswapfree | Free swap space in MB | warninglow | criticallow |
fsusage.pctused | Percentage of file system space in use | warninghigh (80%) | criticalhigh (90%) |
You can modify default values for thresholds by editing the monitoring.properties configuration file.
If the monitoring.properties configuration file is not present, create and save it in /etc/opt/sun/n1gc/. The monitoring.properties configuration file is not created by default at installation.
Any entries that you make in the monitoring.properties configuration file for the threshold values of the attributes listed in Table 5–1 overwrite the factory-configured defaults for the corresponding threshold values.
The monitoring.properties configuration file should be stored only on the management server and not on provisionable servers.
Modifying or adding new entries to the monitoring.properties configuration file affects all the provisionable servers managed by the N1 System Manager.
Specific threshold values can be set at the command line by following the procedures described in Setting Threshold Values.
Once a default value for a monitored item has been modified by manually adding it in the monitoring.properties configuration file, that modified default value applies to all provisionable servers except those servers for which specific values for the monitored attribute have been set at the command line.
You do not need to reboot the management server or the monitored provisionable server for changes to the monitoring.properties file to take effect.
Monitored attributes for hardware health that are declared as percentages cannot be changed either at the command line or in the monitoring.properties file.
To modify default threshold values, edit the /etc/opt/sun/n1gc/monitoring.properties file. Only those default threshold values that relate to OS resource utilization attributes can be modified. Hardware health attribute default threshold values cannot be modified for servers.
To enable the management agent IP and security credentials on a server named server, add the management features on the server as explained in Adding Base and OS Management Features.
Open the /etc/opt/sun/n1gc/monitoring.properties file.
If the file does not exist, create it.
Modify or add lines in the monitoring.properties file that describe default threshold values.
threshold.attribute.threshold value
The syntax requires the threshold keyword to be followed by the attribute for which you are setting a threshold. The attribute is an OS resource utilization attribute. OS resource utilization attributes are described in OS Resource Utilization Monitoring.
The threshold is either criticallow, warninglow, warninghigh, or criticalhigh.
The value is a numeric figure and usually represents a percentage value.
Save the file.
You do not need to reboot the management server or the provisionable server for the changes to take effect. The modified default threshold values now apply to all servers managed by the N1 System Manager.
This example shows how to modify the default criticalhigh threshold value for file system usage to 75 percent of maximum file system usage capacity. The following line is added to or amended in the /etc/opt/sun/n1gc/monitoring.properties file:
threshold.fsusage.pctused.criticalhigh=75
This value applies to all provisionable servers, unless you have set specific values for the threshold value at the command line, by using the set command as described in Setting Threshold Values.
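The precedence described in this section (a value set at the command line for a server, then a value in monitoring.properties, then the factory default) can be sketched as follows. This is an illustration only. It assumes the key=value line form shown in the example above, treats lines that begin with # as comments, and uses hypothetical function names. The N1 System Manager reads the file itself, so no such script is required.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (attribute, threshold), e.g. ("fsusage.pctused", "criticalhigh")

VALID_THRESHOLDS = ("criticallow", "warninglow", "warninghigh", "criticalhigh")

# Factory defaults copied from Table 5-1 (percentages).
FACTORY_DEFAULTS: Dict[Key, float] = {
    ("cpustats.pctusage", "warninghigh"): 80.0,
    ("cpustats.pctusage", "criticalhigh"): 90.0,
    ("cpustats.pctidle", "warninglow"): 20.0,
    ("cpustats.pctidle", "criticallow"): 10.0,
    ("memusage.pctmemused", "warninghigh"): 80.0,
    ("memusage.pctmemused", "criticalhigh"): 90.0,
    ("memusage.pctmemfree", "warninglow"): 20.0,
    ("memusage.pctmemfree", "criticallow"): 10.0,
    ("memusage.pctswapused", "warninghigh"): 80.0,
    ("memusage.pctswapused", "criticalhigh"): 90.0,
    ("fsusage.pctused", "warninghigh"): 80.0,
    ("fsusage.pctused", "criticalhigh"): 90.0,
}


def parse_monitoring_properties(text: str) -> Dict[Key, float]:
    """Extract threshold.<attribute>.<threshold>=<value> entries from text."""
    overrides: Dict[Key, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key.startswith("threshold."):
            continue
        # The last dotted component is the threshold; the rest is the attribute.
        attribute, _, threshold = key[len("threshold."):].rpartition(".")
        if attribute and threshold in VALID_THRESHOLDS:
            overrides[(attribute, threshold)] = float(value.strip())
    return overrides


def effective_threshold(attribute: str, threshold: str,
                        file_overrides: Dict[Key, float],
                        server_overrides: Dict[Key, float]) -> Optional[float]:
    """Resolve a threshold: server-specific value, then file, then factory."""
    key = (attribute, threshold)
    for source in (server_overrides, file_overrides, FACTORY_DEFAULTS):
        if key in source:
            return source[key]
    return None  # No default exists, for example for load averages.


file_values = parse_monitoring_properties("threshold.fsusage.pctused.criticalhigh=75\n")
print(effective_threshold("fsusage.pctused", "criticalhigh", file_values, {}))
```

A server for which a specific value has been set at the command line keeps that value, which is why the per-server dictionary is consulted first.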
Threshold values can be disabled. This process is shown in Example 5–4.
For x86 servers, the management server software obtains the list of hardware sensor attributes to monitor through IPMI from the service processor of the server. For SPARC based servers, the ALOM interface is used. The list of hardware sensor attributes can vary from server to server and between firmware versions, and depends on the server type and on the number of CPUs that the server has. A sample listing for some servers and firmware versions is provided in this section.
Hardware disk failure and memory failure are not monitored in this version of the N1 System Manager.
The following list contains sensor names and descriptions for a Sun Fire V40z server with firmware version 2.1.0.16:
ambienttemp  Ambient air temp
bulk.v12-0-s0  Bulk 12V S0 voltage at CPU 0
bulk.v12-2-s0  Bulk 12V S0 voltage at CPU 2
bulk.v12-3-s0  Bulk 12V S0 voltage at CPU 3
bulk.v1_8-s0  Bulk 1.8V S0 voltage
bulk.v1_8-s5  Bulk 1.8V S5 voltage
bulk.v2_5-s0  Bulk 2.5V S0 voltage
bulk.v2_5-s0-dc  Bulk 2.5V S0 voltage at DC
bulk.v2_5-s5  Bulk 2.5V S5 voltage
bulk.v3_3-s0  Bulk 3.3V S0 voltage
bulk.v3_3-s0-dc  Bulk 3.3V S0 voltage at DC
bulk.v3_3-s3  Bulk 3.3V S3 voltage
bulk.v3_3-s5  Bulk 3.3V S5 voltage
bulk.v3_3-s5-dc  Aux 3.3V S5 voltage at DC
bulk.v5-s0  Bulk 5V S0 voltage
bulk.v5-s0-dc  Bulk 5V S0 voltage at DC
bulk.v5-s5  Bulk 5V S5 voltage
bulk.v5-s5-dc  Bulk 5V S5 voltage at DC
cd.lp  CDROM Light path location LED
cpu0.dietemp  CPU 0 Die temperature
cpu0.heartbeat  CPU 0 Heartbeat
cpu0.inlettemp  CPU 0 Inlet temperature
cpu0.lp  CPU 0 Light path location LED
cpu0.mem0.lp  CPU 0 Dimm 0 Light path location LED
cpu0.mem1.lp  CPU 0 Dimm 1 Light path location LED
cpu0.mem2.lp  CPU 0 Dimm 2 Light path location LED
cpu0.mem3.lp  CPU 0 Dimm 3 Light path location LED
cpu0.memtemp  CPU 0 Memory temperature
cpu0.memvrm.lp  CPU 0 Memory VRM Light path location LED
cpu0.v2_5-s0  CPU 0 VDDA (2.5V) S0 voltage
cpu0.v2_5-s3  CPU 0 VDD (2.5V) S3 voltage
cpu0.vcore-s0  CPU 0 VCore S0 voltage
cpu0.vid  CPU 0 VID Selection
cpu0.vldt0  CPU 0 LDT0 voltage
cpu0.vrm.lp  CPU 0 VRM Light path location LED
cpu0.vtt-s3  CPU 0 DDR VTT S3 voltage
cpu1.dietemp  CPU 1 Die temperature
cpu1.heartbeat  CPU 1 Heartbeat
cpu1.inlettemp  CPU 1 Inlet temperature
cpu1.lp  CPU 1 Light path location LED
cpu1.mem0.lp  CPU 1 Dimm 0 Light path location LED
cpu1.mem1.lp  CPU 1 Dimm 1 Light path location LED
cpu1.mem2.lp  CPU 1 Dimm 2 Light path location LED
cpu1.mem3.lp  CPU 1 Dimm 3 Light path location LED
cpu1.memtemp  CPU 1 Memory temperature
cpu1.memvrm.lp  CPU 1 Memory VRM Light path location LED
cpu1.v2_5-s0  CPU 1 VDDA (2.5V) S0 voltage
cpu1.v2_5-s3  CPU 1 VDD (2.5V) S3 voltage
cpu1.vcore-s0  CPU 1 VCore S0 voltage
cpu1.vid  CPU 1 VID Selection
cpu1.vldt1  CPU 1 LDT1 voltage
cpu1.vldt2  CPU 1 LDT2 voltage
cpu1.vrm.lp  CPU 1 VRM Light path location LED
cpu1.vtt-s3  CPU 1 DDR VTT S3 voltage
cpu2.dietemp  CPU 2 Die temperature
cpu2.heartbeat  CPU 2 Heartbeat
cpu2.inlettemp  CPU 2 inlet temperature
cpu2.lp  CPU 2 Light path location LED
cpu2.mem0.lp  CPU 2 Dimm 0 Light path location LED
cpu2.mem1.lp  CPU 2 Dimm 1 Light path location LED
cpu2.mem2.lp  CPU 2 Dimm 2 Light path location LED
cpu2.mem3.lp  CPU 2 Dimm 3 Light path location LED
cpu2.memvrm.lp  CPU 2 Memory VRM Light path location LED
cpu2.temp  CPU 2 downwind temperature
cpu2.v2_5-s0  CPU 2 VDDA (2.5V) S0 voltage
cpu2.v2_5-s3  CPU 2 VDD (2.5V) S3 voltage
cpu2.vcore-s0  CPU 2 VCore S0 voltage
cpu2.vid  CPU-2 VID Selection
cpu2.vrm.lp  CPU 2 VRM Light path location LED
cpu2.vtt-s3  CPU 2 DDR VTT voltage
cpu3.dietemp  CPU 3 Die temperature
cpu3.heartbeat  CPU 3 Heartbeat
cpu3.inlettemp  CPU 3 inlet temperature
cpu3.lp  CPU 3 Light path location LED
cpu3.mem0.lp  CPU 3 Dimm 0 Light path location LED
cpu3.mem1.lp  CPU 3 Dimm 1 Light path location LED
cpu3.mem2.lp  CPU 3 Dimm 2 Light path location LED
cpu3.mem3.lp  CPU 3 Dimm 3 Light path location LED
cpu3.memvrm.lp  CPU 3 Memory VRM Light path location LED
cpu3.temp  CPU 3 downwind temperature
cpu3.v2_5-s0  CPU 3 VDDA (2.5V) S0 voltage
cpu3.v2_5-s3  CPU 3 VDD (2.5V) S3 voltage
cpu3.vcore-s0  CPU 3 VCore S0 voltage
cpu3.vid  CPU-3 VID Selection
cpu3.vrm.lp  CPU 3 VRM Light path location LED
cpu3.vtt-s3  CPU 3 DDR VTT voltage
cpuplanar.lp  Daughtercard Light path location LED
fan1.tach  Fan 1 measured speed
fan10.tach  Fan 10 measured speed
fan11.tach  Fan 11 measured speed
fan12.tach  Fan 12 measured speed
fan2.tach  Fan 2 measured speed
fan3.tach  Fan 3 measured speed
fan4.tach  Fan 4 measured speed
fan5.tach  Fan 5 measured speed
fan6.tach  Fan 6 measured speed
fan7.tach  Fan 7 measured speed
fan8.tach  Fan 8 measured speed
fan9.tach  Fan 9 measured speed
faultswitch  System Fault Indication
floppy.lp  Floppy Light path location LED
frontpanel.lp  LCD Light path location
LED g0.vldt1 AMD-8131 PCI-X Tunnel 0 LDT1 voltage g1.vldt1 AMD-8131 PCI-X Tunnel 1 LDT1 voltage gbeth.temp Gigabit ethernet local temperature golem-v1_8-s0 AMD-8131 PCI-X Tunnel 1.8V S0 voltage identifyswitch Identify switch pci1.lp PCI Slot 1 Light path location LED pci2.lp PCI Slot 2 Light path location LED pci3.lp PCI Slot 3 Light path location LED pci4.lp PCI Slot 4 Light path location LED pci5.lp PCI Slot 5 Light path location LED pci6.lp PCI Slot 6 Light path location LED pci7.lp PCI Slot 7 Light path location LED pcifan.lp Fan Board Light path location LED planar.lp Motherboard Light path location LED scsibp.lp SCSI Backplane Light path location LED scsibp.temp SCSI Disk backplane temperature scsifault SCSI Disk Fault Switch sp.temp SP local temperature vldt-reg1-dc LDT Regulator 1 Voltage vldt-reg2-dc LDT Regulator 2 Voltage |
The following list contains sensor names and descriptions for a Sun Fire V20z server with firmware version 2.1.0.16:
| Sensor Name | Description |
|---|---|
| ambienttemp | Ambient air temp |
| bulk.v12-0-s0 | Bulk 12v supply voltage (cpu0) |
| bulk.v12-1-s0 | Bulk 12v supply voltage (cpu1) |
| bulk.v1_8-s0 | Bulk 1.8v S0 voltage |
| bulk.v1_8-s5 | Bulk 1.8v S5 voltage |
| bulk.v2_5-s0 | Bulk 2.5v S0 voltage |
| bulk.v2_5-s5 | Bulk 2.5v S5 voltage |
| bulk.v3_3-s0 | Bulk 3.3v supply |
| bulk.v3_3-s3 | Bulk 3.3v S3 voltage |
| bulk.v3_3-s5 | Bulk 3.3v S5 voltage |
| bulk.v5-s0 | Bulk 5v supply voltage |
| bulk.v5-s5 | Bulk 5v S5 voltage |
| cd.lp | CD-ROM Light path location led |
| cpu0.dietemp | CPU 0 die temp |
| cpu0.heartbeat | CPU 0 heartbeat |
| cpu0.lp | CPU 0 Light path location led |
| cpu0.mem0.lp | CPU 0 Dimm 0 Light path location led |
| cpu0.mem1.lp | CPU 0 Dimm 1 Light path location led |
| cpu0.mem2.lp | CPU 0 Dimm 2 Light path location led |
| cpu0.mem3.lp | CPU 0 Dimm 3 Light path location led |
| cpu0.memtemp | CPU 0 memory temp |
| cpu0.memvrm.lp | CPU 0 Memory VRM Light path location led |
| cpu0.temp | CPU 0 low side temp |
| cpu0.v2_5-s0 | CPU VDDA voltage |
| cpu0.v2_5-s3 | CPU 0 VDDIO voltage |
| cpu0.vcore-s0 | CPU 0 core voltage |
| cpu0.vid | CPU-0 VID output |
| cpu0.vldt1 | CPU0 HT 1 voltage |
| cpu0.vldt2 | CPU 0 HT 2 voltage |
| cpu0.vrm.lp | CPU 0 VRM Light path location led |
| cpu0.vtt-s3 | CPU 0 VTT voltage |
| cpu1.dietemp | CPU 1 die temp |
| cpu1.heartbeat | CPU 1 heartbeat |
| cpu1.lp | CPU 1 Light path location led |
| cpu1.mem0.lp | CPU 1 Dimm 0 Light path location led |
| cpu1.mem1.lp | CPU 1 Dimm 1 Light path location led |
| cpu1.mem2.lp | CPU 1 Dimm 2 Light path location led |
| cpu1.mem3.lp | CPU 1 Dimm 3 Light path location led |
| cpu1.memtemp | CPU 1 memory temp |
| cpu1.memvrm.lp | CPU 1 Memory VRM Light path location led |
| cpu1.temp | CPU 1 low side temp |
| cpu1.v2_5-s3 | CPU 1 VDDIO voltage |
| cpu1.vcore-s0 | CPU 1 core voltage |
| cpu1.vid | CPU-1 VID output |
| cpu1.vrm.lp | CPU 1 VRM Light path location led |
| cpu1.vtt-s3 | CPU 1 VTT voltage |
| fan1.tach | Fan 1 measured speed |
| fan2.tach | Fan 2 measured speed |
| fan3.tach | Fan 3 measured speed |
| fan4.tach | Fan 4 measured speed |
| fan5.tach | Fan 5 measured speed |
| fan6.tach | Fan 6 measured speed |
| faultswitch | Fault switch (source for eval) |
| floppy.lp | Floppy Disk Drive Light path location led |
| frontpanel.lp | LCD Light path location led |
| g.vldt1 | AMD-8131 PCI-X Tunnel HT 1 voltage |
| gbeth.temp | Gigabit ethernet temp |
| golem.temp | PCIX bridge temp |
| hdd1.lp | Hard Disk Drive 1 Light path location led |
| hdd2.lp | Hard Disk Drive 2 Light path location led |
| hddbp.lp | Hard Disk Drive Backplane Light path location led |
| hddbp.temp | Disk drive backplane temp |
| identifyswitch | Identify switch |
| pci1.lp | PCI Slot 1 Light path location led |
| pci2.lp | PCI Slot 2 Light path location led |
| planar.lp | Motherboard Light path location led |
| ps.fanfail | Power Supply fan failure sensor |
| ps.lp | Powersupply Light path location led |
| ps.tempalert | Power Supply too hot sensor |
| sp.temp | SP temp |
| thor.temp | AMD-8111 I/O Hub temp |
The N1 System Manager retrieves monitoring data from many of these sensors. For Sun Fire X4100 and X4200 servers, only analog sensors are used to retrieve data, that is, sensors that describe fan speed, voltage, and temperature. For descriptions of the sensors in the Sun Fire X4100 and X4200 servers, refer to the IPMI reference information in the Sun Fire X4100 and X4200 server product documentation.
Threshold values for monitored objects can be set on specific servers. A threshold value that you set at the command line for an attribute of a monitored object overrides, for that object, any factory-configured threshold value for that attribute. Any entries for that attribute in the monitoring.properties configuration file are also overridden.
To enable the management agent IP and security credentials on a server named server, add the management features on the server as explained in Adding Base and OS Management Features.
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Use the set server command with the threshold attribute.
The syntax requires the threshold keyword to be followed by the attribute for which you are setting a threshold. The attribute is an OS resource utilization attribute. OS resource utilization attributes are described in OS Resource Utilization Monitoring and listed in Table 5–2.
The threshold is either criticallow, warninglow, warninghigh, or criticalhigh. The value is a numeric figure and usually represents a percentage.
This example shows how to set the CPU usage warninghigh severity threshold on a provisionable server named serv1 to 53 percent. This example also shows how to set the criticalhigh severity threshold value to 75 percent.
N1-ok> set server serv1 threshold cpustats.pctusage warninghigh 53 criticalhigh 75 |
These values override the default values stored in the monitoring.properties configuration file on the management server for the server named serv1.
This example sets the file system usage warninghigh threshold on a provisionable server named serv1 to 75 percent. This example also shows how to set the criticalhigh threshold value to 87 percent.
N1-ok> set server serv1 threshold fsusage.pctused warninghigh 75 criticalhigh 87 |
This example shows how to delete a value that was set for the warninghigh threshold on a provisionable server named serv1.
N1-ok> set server serv1 threshold fsusage warninghigh none |
In this case, any previously set value for this threshold at this severity is deleted. The threshold severity value does not revert to the default threshold value, which is stored in the monitoring.properties configuration file, or to the factory-configured default, if such a default exists for the attribute. In effect, monitoring is disabled for the warninghigh threshold for file system usage for this server.
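The following Python sketch is not part of the N1 System Manager software. It illustrates how the four severities might be evaluated against a polled value, and why deleting a severity with none disables the check for that severity only. The comparison rules (a value at or beyond a limit counts as crossing it) and the function name classify are assumptions made for illustration.

```python
def classify(value, thresholds):
    """Return the most severe threshold that `value` crosses, or "ok".

    A severity that is missing from `thresholds`, or set to None, is not
    checked. This mirrors setting the severity to `none` at the command line.
    """
    def limit(name):
        return thresholds.get(name)

    # Check critical severities before warnings so that the worst match wins.
    if limit("criticalhigh") is not None and value >= limit("criticalhigh"):
        return "criticalhigh"
    if limit("criticallow") is not None and value <= limit("criticallow"):
        return "criticallow"
    if limit("warninghigh") is not None and value >= limit("warninghigh"):
        return "warninghigh"
    if limit("warninglow") is not None and value <= limit("warninglow"):
        return "warninglow"
    return "ok"


# Thresholds from the CPU usage example: warninghigh 53, criticalhigh 75.
cpu = {"warninghigh": 53, "criticalhigh": 75}
print(classify(40, cpu))
print(classify(60, cpu))
print(classify(80, cpu))

# Deleting the warninghigh value disables that severity only.
cpu["warninghigh"] = None
print(classify(60, cpu))
print(classify(80, cpu))
```

With warninghigh deleted, a value of 60 percent no longer crosses any threshold, while a value of 80 percent still crosses criticalhigh.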
To enable the management agent IP and security credentials on a server named server, add the management features on the server as explained in Adding Base and OS Management Features.
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Use the set group command with the threshold attribute.
The syntax requires the threshold keyword to be followed by the attribute for which you are setting a threshold. The attribute is an OS resource utilization attribute. OS resource utilization attributes are described in OS Resource Utilization Monitoring and listed in Table 5–2.
The threshold is either criticallow, warninglow, warninghigh, or criticalhigh. The value is a numeric figure, and usually represents a percentage.
This example shows how to set the file system usage warninghigh threshold to 75 percent on a group of provisionable servers with a group name of grp3. This example also shows how to set the criticalhigh threshold severity value to 87 percent.
N1-ok> set group grp3 threshold fsusage.pctused warninghigh 75 criticalhigh 87 |
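If you generate threshold commands from a script, a small helper can keep the syntax consistent. The following Python sketch only composes the command string in the form shown in the examples above. It does not contact the N1 System Manager, and the function name threshold_command is hypothetical.

```python
VALID_SEVERITIES = ("criticallow", "warninglow", "warninghigh", "criticalhigh")


def threshold_command(target_type, name, attribute, **severities):
    """Compose a `set server` or `set group` threshold command string.

    Pass each severity as a keyword argument. A value of None is written
    as `none`, which deletes the value for that severity.
    """
    if target_type not in ("server", "group"):
        raise ValueError("target_type must be 'server' or 'group'")
    if not severities:
        raise ValueError("at least one severity value is required")
    unknown = set(severities) - set(VALID_SEVERITIES)
    if unknown:
        raise ValueError(f"unknown severity: {sorted(unknown)}")

    parts = ["set", target_type, name, "threshold", attribute]
    # Emit severities in a fixed order so that the output is predictable.
    for severity in VALID_SEVERITIES:
        if severity in severities:
            value = severities[severity]
            parts += [severity, "none" if value is None else str(value)]
    return " ".join(parts)


print(threshold_command("group", "grp3", "fsusage.pctused",
                        warninghigh=75, criticalhigh=87))
```

The printed string matches the set group example above and can be pasted at the N1-ok> prompt.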
The monitoring of an object consists of regular checks, or polls, of the monitored object. The frequency of these polls is controlled by setting the polling interval. The appropriate interval length between polls of the monitored object is related to the object being monitored and its environment, and the performance conditions to which the monitored object is being subjected. Default polling intervals are provided for some monitored objects, including server hardware objects such as fans. Default polling intervals apply for those servers or groups of servers for which specific interval values have not been set by using the set command.
You can modify default values for polling intervals for hardware health, OS resource utilization, and network reachability by editing the monitoring.properties configuration file.
The polling of network reachability is not possible if OS monitoring is not enabled.
If the monitoring.properties configuration file is not present, create it and save it as /etc/opt/sun/n1gc/monitoring.properties. The monitoring.properties file is not created by default at installation.
Factory-configured default polling intervals are provided in the N1 System Manager software. These values are stated in seconds. The factory-configured defaults are provided in Table 5–3.
Table 5–3 Factory-Configured Default Polling Intervals
| Type of Monitoring | Default Polling Interval |
|---|---|
| Hardware health | 120 seconds |
| OS resources | 120 seconds |
| Network reachability | 60 seconds |
Any entries you make in the monitoring.properties configuration file overwrite these factory-configured defaults.
The minimum default polling interval that you can set is 60 seconds.
The monitoring.properties configuration file exists only on the management server and not on provisionable servers. Modifying the default polling intervals stored in the monitoring.properties configuration file affects all the provisionable servers managed by the N1 System Manager.
You do not need to reboot the management server or the monitored provisionable server for changes to the monitoring.properties file to take effect.
Default polling intervals stored in the monitoring.properties configuration file apply to all servers unless specific values have been set at the command line for a specific server or group of servers. Set specific polling interval values by using the set command, as described in Setting Polling Intervals.
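The order of precedence described above can be summarized in a short Python sketch: a value set at the command line takes priority, then an entry in monitoring.properties, then the factory-configured default from Table 5–3. The function and parameter names are hypothetical and are used for illustration only.

```python
# Factory-configured defaults from Table 5-3, in seconds.
FACTORY_DEFAULTS = {"hardwarehealth": 120, "osresources": 120, "network": 60}


def effective_interval(monitor, properties=None, command_line_value=None):
    """Return the polling interval, in seconds, that applies to one server.

    `properties` holds the entries read from monitoring.properties, if the
    file exists. `command_line_value` is a value set with the set command
    for the server or its group, if any.
    """
    if command_line_value is not None:
        return command_line_value
    if properties and monitor in properties:
        return properties[monitor]
    return FACTORY_DEFAULTS[monitor]


print(effective_interval("network"))
print(effective_interval("network", properties={"network": 160}))
print(effective_interval("network", properties={"network": 160},
                         command_line_value=250))
```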
After a period of use following installation and deployment, you can develop an awareness of how frequently you should poll hardware health attributes and OS resource utilization attributes, and how often you need to poll network reachability. Your configuration of the N1 System Manager depends on which events are most crucial to you. When setting polling intervals, or when changing default polling intervals, consider the number of servers you are managing with your N1 System Manager software. Consider also the current or expected application loads of your provisionable servers, and the capabilities of your network. Your expected responsiveness to events is also relevant. If you are able to react quickly to events as they occur, polling more frequently is appropriate.
For further information about tuning polling intervals for your installation, see To Increase the N1 System Manager Performance in Sun N1 System Manager 1.1 Installation and Configuration Guide.
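As a rough aid when weighing these factors, the following Python sketch estimates how many polls per hour a set of intervals produces. The figure of 100 servers is a hypothetical example, and the calculation assumes one poll per server per interval for each type of monitoring, which is a simplification.

```python
def polls_per_hour(server_count, interval_seconds):
    """Estimate the number of polls per hour for one type of monitoring."""
    return server_count * 3600 / interval_seconds


# Factory-configured default intervals, in seconds, from Table 5-3.
intervals = {"hardwarehealth": 120, "osresources": 120, "network": 60}
servers = 100  # hypothetical number of managed servers

total = 0
for monitor, seconds in intervals.items():
    rate = polls_per_hour(servers, seconds)
    total += rate
    print(f"{monitor}: {rate:.0f} polls per hour")
print(f"total: {total:.0f} polls per hour")
```

Doubling an interval halves the corresponding polling load, so lengthening the interval for the least critical type of monitoring is usually the first adjustment to consider.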
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Type the show server command:
N1-ok> show server server |
In this procedure, server is the name of the provisionable server for which you want to retrieve polling intervals.
Detailed monitoring polling intervals appear in the output, including polling interval information for the server's hardware health, OS resource utilization, and network reachability.
See show server in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
To enable the management agent IP and security credentials on a server named server, add the management features on the server as explained in Adding Base and OS Management Features.
Open the /etc/opt/sun/n1gc/monitoring.properties file.
If the file does not exist, create it.
Modify or add lines in the monitoring.properties file that describe default polling intervals.
pollinginterval.monitor=value
The syntax requires the pollinginterval keyword.
monitor is either hardwarehealth, osresources or network. The polling of network reachability is not possible unless OS resource monitoring has been enabled, as described in Enabling Monitoring.
The value is in seconds, and the minimum value is 60.
Save the file.
You do not need to reboot the management server or the provisionable server for the changes to take effect. The modified default polling interval values now apply to all servers managed by the N1 System Manager.
This example shows how to set the hardware health monitoring polling interval to 180 seconds, the OS resource utilization monitoring polling interval to 175 seconds, and the network reachability monitoring polling interval to 160 seconds. The following entries are made in the monitoring.properties configuration file.
pollinginterval.hardwarehealth=180
pollinginterval.osresources=175
pollinginterval.network=160
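Before saving the file, you might want to check your entries against the rules stated in this procedure. The following Python sketch parses pollinginterval entries and rejects unknown monitor names and values below 60 seconds. It is an illustration only: the function name is hypothetical, and the handling of blank lines and lines that start with # is an assumption about the file format.

```python
MONITORS = ("hardwarehealth", "osresources", "network")
MINIMUM_SECONDS = 60
PREFIX = "pollinginterval."


def parse_polling_intervals(text):
    """Return a dict that maps each monitor name to its interval in seconds."""
    intervals = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and lines starting with "#" are ignored.
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        key = key.strip()
        if not separator or not key.startswith(PREFIX):
            continue  # not a polling interval entry
        monitor = key[len(PREFIX):]
        if monitor not in MONITORS:
            raise ValueError(f"unknown monitor: {monitor}")
        seconds = int(value.strip())
        if seconds < MINIMUM_SECONDS:
            raise ValueError(f"{monitor}: minimum interval is {MINIMUM_SECONDS} seconds")
        intervals[monitor] = seconds
    return intervals


example = """\
pollinginterval.hardwarehealth=180
pollinginterval.osresources=175
pollinginterval.network=160
"""
print(parse_polling_intervals(example))
```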
This section contains procedures that describe how to set the polling intervals for a server or a server group.
This procedure shows you how to set a polling interval for a server at the command line. Any value set this way overwrites the factory-configured default value or the value in the monitoring.properties configuration file, if the file exists.
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Type the set server command with the monitor attribute.
set server server monitor monitor interval value |
This command is executed for a server that you have already named. In this procedure, this name appears as server. See set server in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
The monitor is either hardwarehealth, osresources, or network.
The value is in seconds.
The minimum polling interval that you can set is 60 seconds.
This example shows how to set a polling interval of 280 seconds for hardware health monitoring of a provisionable server named serv1.
N1-ok> set server serv1 monitor hardwarehealth interval 280 |
Any value set this way overwrites the factory-configured default value or the value in the monitoring.properties configuration file, if the file exists.
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Type the set group command with the monitor attribute.
set group group monitor monitor interval value |
This command is executed for a group of servers that you have already named. In this procedure, this name appears as group. See set group in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
The monitor is either hardwarehealth, osresources, or network.
The value is in seconds.
The minimum polling interval that you can set is 60 seconds.
This example shows how to set a polling interval of 250 seconds for network reachability monitoring of a group of provisionable servers named grp5.
N1-ok> set group grp5 monitor network interval 250 |
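If polling interval commands are generated by a script, the 60-second minimum and the monitor names can be checked before the command is issued. The following Python sketch only composes the command string in the form shown in this section. The function name interval_command is hypothetical.

```python
MONITORS = ("hardwarehealth", "osresources", "network")
MINIMUM_SECONDS = 60


def interval_command(target_type, name, monitor, seconds):
    """Compose a `set server` or `set group` polling interval command."""
    if target_type not in ("server", "group"):
        raise ValueError("target_type must be 'server' or 'group'")
    if monitor not in MONITORS:
        raise ValueError(f"unknown monitor: {monitor}")
    if seconds < MINIMUM_SECONDS:
        raise ValueError(f"minimum interval is {MINIMUM_SECONDS} seconds")
    return f"set {target_type} {name} monitor {monitor} interval {seconds}"


print(interval_command("server", "serv1", "hardwarehealth", 280))
print(interval_command("group", "grp5", "network", 250))
```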
Two MIBs are provided with the N1 System Manager. These MIBs provide the data structure that third-party monitoring tools can use to retrieve data from the N1 System Manager using SNMP, and to parse the SNMP notifications generated by the N1 System Manager. The MIBs can be found at /opt/sun/n1gc/etc/. These MIBs therefore enable you to use any SNMP client to query the N1 System Manager, and to listen for events using SNMP. The following MIBs are provided:
This MIB describes the information that you can retrieve from the N1 System Manager by querying it using an SNMP client.
This MIB describes all of the events related to the N1 System Manager about which you can receive SNMP traps.
These MIBs are read-only. Using them requires a detailed knowledge of SNMP, although detailed descriptions of each object are provided in the MIBs. How you configure your monitoring system to start receiving traps depends on the nature of your monitoring system.
The MIBs are hardware independent.
This example shows you how to use the simple UNIX trap listener, the snmptrapd command, to start receiving N1 System Manager traps.
N1-ok> snmptrapd -m all -M /opt/sun/n1gc/etc:/usr/share/snmp/mibs -P 1010 |
This example uses the snmptrapd command to start monitoring port 1010 for SNMP traps. It also instructs the command to use the MIBs stored at /opt/sun/n1gc/etc and /usr/share/snmp/mibs to parse the contents of SNMP traps.
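To confirm that trap datagrams reach a host before you configure a full monitoring tool, a bare UDP listener can be enough. The following Python sketch uses only the standard socket module. It does not decode traps or load MIBs. It only receives raw datagrams and checks whether the first byte is 0x30, the BER SEQUENCE tag with which SNMP messages begin. The function names are hypothetical, and the port number is taken from the example above.

```python
import socket


def open_listener(host="0.0.0.0", port=1010):
    """Bind a UDP socket on which SNMP traps are expected to arrive."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_datagram(sock, timeout=5.0, max_size=65535):
    """Wait up to `timeout` seconds and return (data, sender_address)."""
    sock.settimeout(timeout)
    return sock.recvfrom(max_size)


def looks_like_snmp(data):
    """Return True if the datagram starts with the BER SEQUENCE tag."""
    return data[:1] == b"\x30"


# Example use (binding a port below 1024 usually requires privileges):
#   listener = open_listener(port=1010)
#   data, sender = receive_datagram(listener, timeout=60.0)
#   print(sender, len(data), looks_like_snmp(data))
```

A datagram that passes this check still needs an SNMP-aware tool, such as snmptrapd with the N1 System Manager MIBs, to be interpreted.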
This section describes jobs and how they are an integral part of server monitoring.
Each major action you take in the N1 System Manager starts a job. Use the job log to track the status of a currently running action or to verify that a job has finished. Monitoring jobs is particularly useful because some N1 System Manager actions can take a long time to finish. An example of such an action is installing an OS distribution on one or more provisionable servers.
You can track jobs through the Jobs tab in the browser interface or the show job command. The show job command provides information about most of the following characteristics:
Generated unique identifier.
Date on which the job was started.
Type of job. See show job in Sun N1 System Manager 1.1 Command Line Reference Manual for details. When using the show job command with the type parameter, jobs can be any of the following types:
addbase – Add base management support.
addbasemonitor – Add OS monitoring support.
createos – Create OS distribution from CD/DVD media or ISO files.
deletejob – Delete job.
discover – Server discovery.
loadfirmware – Load firmware update.
loados – Load OS.
loadupdate – Load OS update.
refresh – Server refresh.
removeosmonitor – Remove OS monitoring support.
setagentip – Modify OS monitoring support.
start – Server power on.
stop – Server power off.
unloadupdate – Unload OS update.
State of the current job step. Job steps indicate the progress of a job and update results. Each job step has a type, a start time and, when the job completes, a completion time. For the purposes of filtering, job progress is indicated with the following states:
Jobs in a notstarted state cannot be stopped.
When you select a job by ID and view the details of that job, each step of that job appears twice – the preflight check and the execution of the step itself.
The job is currently running. Jobs that are currently running cannot be deleted using the delete job command. Jobs that are currently running must finish running or be stopped using the stop job command.
Job completion is indicated with the following results:
Indicates that the job step completed successfully.
Indicates a warning during the job execution. A warning reports an issue that might or might not be severe enough to terminate the job step, and the job, with errors.
Indicates that the job step stopped before it completed.
Indicates that the job is still running but that the job step cannot complete successfully.
Indicates a general error in that job step.
Indicates that the job timed out before all of the job steps could complete successfully, or that the next step of the job started before the current step completed successfully.
Complete - Warning is issued in the output as the overall job status if the job completed all of its steps successfully but one or more WARNING states were issued for steps during the job execution, and these warnings were not severe enough to terminate the job with errors.
You can filter jobs depending on their state. See show job in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
The user who started the job. Also called the job creator.
Provides details about the results of a completed job. You can review the standard output of remote command operations and completion statuses for all other job types.
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
View the list of jobs.
N1-ok> show job all |
A list of all jobs for the N1 System Manager is returned.
See show job in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
This example shows that using the show job command with the all option returns a list of jobs by Job ID, together with the date and time at which the job was started. The job type and status are also returned, along with the identity of the user who created the job.
N1-ok> show job all
Job ID  Date                      Type              Status     Creator
7       2005-09-16T10:51:07-0700  Discovery         Completed  root
6       2005-09-14T14:42:52-0700  Server Reboot     Error      root
5       2005-09-14T14:38:25-0700  Server Power On   Completed  root
4       2005-09-14T14:29:20-0700  Server Power Off  Completed  root
3       2005-09-09T13:01:35-0700  Discovery         Completed  root
2       2005-09-09T12:38:16-0700  Discovery         Completed  root
1       2005-09-09T10:32:40-0700  Discovery         Completed  root
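If you capture this output in a script, each job row can be split into its five columns. The following Python sketch is one way to do so. It assumes that the status and the creator are each a single word, as in the sample output, and that everything between the date and the status is the job type. The function name parse_job_line is hypothetical.

```python
from datetime import datetime


def parse_job_line(line):
    """Split one row of `show job all` output into a dictionary."""
    tokens = line.split()
    if len(tokens) < 5:
        raise ValueError(f"not a job row: {line!r}")
    return {
        "id": int(tokens[0]),
        # The date uses a numeric UTC offset such as -0700.
        "started": datetime.strptime(tokens[1], "%Y-%m-%dT%H:%M:%S%z"),
        # The job type can contain spaces, for example "Server Power On".
        "type": " ".join(tokens[2:-2]),
        "status": tokens[-2],
        "creator": tokens[-1],
    }


job = parse_job_line("5 2005-09-14T14:38:25-0700 Server Power On Completed root")
print(job["id"], job["type"], job["status"], job["creator"])
```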
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
View a specific job.
N1-ok> show job job |
Detailed information about the job appears in the output.
See show job in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
This example shows that using the show job command with the Job ID returns the date and time at which the job was started, the job type and status, and the identity of the user who created the job. Further details are provided for each step of that job, including the time at which the step started and completed and whether the step was successful.
N1-ok> show job 5
Job ID:   5
Date:     2005-02-14T14:38:25-0700
Type:     Server Power On
Status:   Completed
Creator:  root
Errors:   0
Warnings: 0

Step 1:
    Type:        103
    Description: native procedure /bin/sh /opt/sun/n1gc/bin/serverPowerOn.sh :[SERVER_NAME] :[JOBID_KEY]
    Start:       2005-02-14T14:38:25-0700
    Completion:  2005-02-14T14:38:25-0700
    Result:      Complete
    Exception:   No Data Available

Step 2:
    Type:        103
    Description: native procedure /bin/sh /opt/sun/n1gc/bin/serverPowerOn.sh :[SERVER_NAME] :[JOBID_KEY]
    Start:       2005-02-14T14:38:28-0700
    Completion:  2005-02-14T14:38:35-0700
    Result:      Complete
    Exception:   No Data Available

Step 3:
    Type:        135
    Description: connect and lock hosts
    Start:       2005-02-14T14:38:25-0700
    Completion:  2005-02-14T14:38:25-0700
    Result:      Complete
    Exception:   No Data Available

Step 4:
    Type:        135
    Description: connect and lock hosts
    Start:       2005-02-14T14:38:27-0700
    Completion:  2005-02-14T14:38:28-0700
    Result:      Complete
    Exception:   No Data Available

Result 1:
    Server:  192.168.200.3
    Status:  0
    Message: The server operation was successful.

N1-ok>
Each step appears twice in the output. The first appearance of the step in the list is the preflight check, and the second appearance of the step in the list is the actual execution of the step.
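When you process this output in a script, it can help to pair each preflight check with its execution. The following Python sketch groups steps by their description and treats the first occurrence as the preflight check and the second as the execution. Matching on the description text is an assumption made for illustration, and the function name pair_steps is hypothetical.

```python
def pair_steps(steps):
    """Pair preflight and execution steps.

    `steps` is a list of (step_number, description) tuples in the order in
    which the steps appear in the output.
    """
    grouped = {}
    for number, description in steps:
        grouped.setdefault(description, []).append(number)

    pairs = []
    for description, numbers in grouped.items():
        pairs.append({
            "description": description,
            "preflight": numbers[0],
            # A step that appears only once has no execution entry yet.
            "execution": numbers[1] if len(numbers) > 1 else None,
        })
    return pairs


steps = [
    (1, "native procedure"),
    (2, "native procedure"),
    (3, "connect and lock hosts"),
    (4, "connect and lock hosts"),
]
for pair in pair_steps(steps):
    print(pair)
```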
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Stop a specific job.
N1-ok> stop job job |
The job is stopped.
See stop job in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
View the job details.
N1-ok> show job job |
The Result section of the output shows that the job was stopped.
Any job can be stopped. In practice, however, only a job that is not in its last step can be stopped. Some jobs only have one step and so can never be stopped. Jobs in a notstarted state cannot be stopped. Operations that are performed on large groups of servers can take longer and might include a large number of steps.
See show job in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
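The conditions under which a job can be stopped can be expressed as a short check. The following Python sketch encodes the rules from this procedure: a job in the notstarted state cannot be stopped, and a job that has reached its last step cannot be stopped in practice. The state name running, the step counters, and the function name can_stop are assumptions made for illustration.

```python
def can_stop(state, current_step, total_steps):
    """Return True if a stop request is expected to take effect.

    `current_step` is the 1-based number of the step that is executing.
    """
    if state != "running":
        # Jobs in the notstarted state, and finished jobs, cannot be stopped.
        return False
    # A job in its last step, or a job with only one step, cannot be stopped.
    return current_step < total_steps


print(can_stop("notstarted", 0, 4))
print(can_stop("running", 2, 4))
print(can_stop("running", 4, 4))
print(can_stop("running", 1, 1))
```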
This example shows that using the stop job command with the Job ID returns a message confirming that the request has been received.
N1-ok> stop job 9
Stop Job "9" request received.
This example also shows that the show job command can be used with the Job ID of the stopped job to obtain more information about that job. The Status field confirms that the job was stopped, and the Type field shows that the job was a remote command job. Further details are provided for each step of the job, including the time at which the step started and completed and whether the step was successful. The Result section shows that the job was canceled.
N1-ok> show job 9
Job ID:   9
Date:     2005-02-15T16:43:58-0700
Type:     Remote Command
Status:   Stopped
Owner:    root
Errors:   0
Warnings: 0

Step 1:
    Type:        135
    Description: connect and lock hosts
    Start:       2005-02-15T16:43:58-0700
    Completion:  2005-02-15T16:43:58-0700
    Result:      Complete
    Exception:   No Data Available

Step 2:
    Type:        103
    Description: native procedure /bin/sh /opt/sun/n1gc/bin/remotecmd.sh :[RCMD_KEY]
    Start:       2005-02-15T16:43:58-0700
    Completion:  2005-02-15T16:43:58-0700
    Result:      Complete
    Exception:   No Data Available

Step 3:
    Type:        135
    Description: connect and lock hosts
    Start:       2005-02-15T16:44:00-0700
    Completion:  2005-02-15T16:44:00-0700
    Result:      Complete
    Exception:   No Data Available

Step 4:
    Type:        103
    Description: native procedure /bin/sh /opt/sun/n1gc/bin/remotecmd.sh :[RCMD_KEY]
    Start:       2005-02-15T16:44:00-0700
    Completion:  2005-02-15T16:44:49-0700
    Result:      Incomplete - Aborted
    Exception:   No Data Available

Result :
    Server:          server1
    Status:          -1
    Message:         Command running on server1 was canceled.
    Command:         /root/sleep.sh 60
    Standard Output: Sleeping for 60 seconds...
Each step appears twice in the output. The first appearance of the step in the list is the preflight check, and the second appearance of the step in the list is the actual execution of the step.
To Issue Remote Commands on a Server or a Server Group
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Determine the job you want to delete.
N1-ok> show job all |
All jobs and job IDs appear in the output.
See show job in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
Delete the desired job.
N1-ok> delete job job |
The job is deleted.
See delete job in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
Verify that the job was deleted.
N1-ok> show job all |
The deleted job should not appear in the output.
See show job in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
This example shows how to delete a job.
First, the show job command is used with the all option, which lists all jobs in descending order.
N1-ok> show job all
Job ID  Date                      Type              Status     Creator
7       2005-02-16T10:51:07-0700  Discovery         Completed  root
6       2005-02-14T14:42:52-0700  Server Reboot     Error      root
5       2005-02-14T14:38:25-0700  Server Power On   Completed  root
4       2005-02-14T14:29:20-0700  Server Power Off  Completed  root
3       2005-02-09T13:01:35-0700  Discovery         Completed  root
2       2005-02-09T12:38:16-0700  Discovery         Completed  root
1       2005-02-09T10:32:40-0700  Discovery         Completed  root
Job ID 6 has an error and can be deleted. The delete job command is now used with the Job ID of the job to be deleted.
N1-ok> delete job 6 |
The show job command is used again with the all option, which lists all jobs in descending order. The deleted job no longer appears on the list.
N1-ok> show job all
Job ID  Date                      Type              Status     Creator
7       2005-02-16T10:51:07-0700  Discovery         Completed  root
5       2005-02-14T14:38:25-0700  Server Power On   Completed  root
4       2005-02-14T14:29:20-0700  Server Power Off  Completed  root
3       2005-02-09T13:01:35-0700  Discovery         Completed  root
2       2005-02-09T12:38:16-0700  Discovery         Completed  root
1       2005-02-09T10:32:40-0700  Discovery         Completed  root
This example shows how to delete all jobs.
First, the show job command is used with the all option, which lists all jobs in descending order.
N1-ok> show job all
Job ID  Date                      Type              Status     Creator
7       2005-09-16T10:51:07-0700  Discovery         Completed  root
6       2005-09-14T14:42:52-0700  Server Reboot     Error      root
5       2005-09-14T14:38:25-0700  Server Power On   Completed  root
4       2005-09-14T14:29:20-0700  Server Power Off  Completed  root
3       2005-09-09T13:01:35-0700  Discovery         Running    root
2       2005-09-09T12:38:16-0700  Discovery         Completed  root
1       2005-09-09T10:32:40-0700  Discovery         Completed  root
The delete job command is now used with the all option, to delete all jobs.
N1-ok> delete job all
Unable to delete job "3"
The show job command is used with the all option, to confirm whether all jobs were successfully deleted.
N1-ok> show job all
Job ID  Date                      Type       Status   Creator
3       2005-09-09T13:01:35-0700  Discovery  Running  root
Job ID 3 is still running. This is because jobs that were in a running state when the delete job command was issued must finish running, or must be stopped, before they can be deleted.
To stop the job and then delete it, first the stop job command is used with the ID of the job to be stopped.
N1-ok> stop job 3
Stop Job "3" request received.
The show job command is used to confirm that the job has been stopped.
N1-ok> show job all
Job ID  Date                      Type       Status   Creator
3       2005-09-09T13:02:35-0700  Discovery  Aborted  root
The job has been stopped while running and is in the aborted state. The delete job command is now used with the all option, to delete all jobs.
N1-ok> delete job all |
The show job command is used to confirm that all jobs have now been deleted.
N1-ok> show job all
Job ID  Date  Type  Status  Creator
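The behavior shown in this example can be modeled in a few lines. The following Python sketch simulates delete job all on an in-memory list of jobs: every job is deleted except jobs whose status is Running, for which a message is reported. This is an illustration of the rule only, not the N1 System Manager implementation, and the function name delete_all is hypothetical.

```python
def delete_all(jobs):
    """Simulate `delete job all`.

    Returns the jobs that remain and the messages that are reported.
    """
    remaining = []
    messages = []
    for job in jobs:
        if job["status"] == "Running":
            # Running jobs must finish or be stopped before deletion.
            remaining.append(job)
            messages.append(f'Unable to delete job "{job["id"]}"')
    return remaining, messages


jobs = [
    {"id": 7, "status": "Completed"},
    {"id": 6, "status": "Error"},
    {"id": 3, "status": "Running"},
    {"id": 1, "status": "Completed"},
]

jobs, messages = delete_all(jobs)
print(messages, [job["id"] for job in jobs])

# After `stop job 3`, the job is in the Aborted state and can be deleted.
jobs[0]["status"] = "Aborted"
jobs, messages = delete_all(jobs)
print(messages, jobs)
```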
This section describes events and how they are integral to monitoring your servers.
Events are generated when certain conditions related to attributes occur. Each event has an associated topic. For example, when a server is discovered by the management server, an event is generated with the topic Action.Physical.Discovered. For a complete list of event topics, see create notification in Sun N1 System Manager 1.1 Command Line Reference Manual.
Events can be monitored. Monitoring is connected with the broadcasting of events for each monitored server or group of servers. When a monitored attribute is polled and its value is outside the safe range defined by the default or user-defined thresholds, an event is generated and a status is issued.
If monitoring is enabled for a server and a notification rule has been added for the event, the event causes the management server to emit a notification.
If monitoring is disabled for a server, monitoring events are not generated for that server. You might want to disable monitoring of a hardware component to perform maintenance tasks without generating events.
See Introduction to Monitoring for more information about monitoring.
See Setting Up Notifications for more information about notifications.
Lifecycle events continue to be generated, even with monitoring disabled. Lifecycle events include server discovery, server change or deletion, and server group creation. If you have requested notification of this type of event, you can still receive notifications even with monitoring disabled.
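The rules described in this section can be summarized in a short Python model. This is a sketch for illustration only: the class and function names are hypothetical, and the two-sided safe range is an assumption, because this section states only that a value beyond the safe range generates an event.

```python
from dataclasses import dataclass


@dataclass
class SafeRange:
    """Hypothetical safe range for one monitored attribute."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def monitoring_event_generated(value, safe_range, monitoring_enabled):
    """A polled value outside the safe range generates an event,
    but only while monitoring is enabled for the server."""
    return monitoring_enabled and not safe_range.contains(value)


def lifecycle_event_generated(monitoring_enabled):
    """Lifecycle events are generated whether or not monitoring is enabled."""
    return True


def notification_emitted(event_generated, rule_exists):
    """A notification is emitted only if a rule was added for the event."""
    return event_generated and rule_exists


if __name__ == "__main__":
    memory_used = SafeRange(low=0.0, high=80.0)  # illustrative percentages
    event = monitoring_event_generated(93.5, memory_used, monitoring_enabled=True)
    print(notification_emitted(event, rule_exists=True))
```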
Logs are created when events occur. For example, if any monitored IP address is unreachable, an event is generated. This event creates a log record, which is visible from the browser interface.
During the installation and configuration of the N1 System Manager, you can configure which events to log and you can also interactively configure severity levels for event topics. See Configuring the N1 System Manager System in Sun N1 System Manager 1.1 Installation and Configuration Guide.
Even if an event is not saved to the log, the event can still generate a notification.
Use the show command with the log keyword to view the following information about events:
Date – The date and time of the event.
Subject – The server on which the event occurred.
Topic – The topic of the event, which can be useful for setting up notifications. Refer to Setting Up Notifications for information.
Severity – Relative severity of the event.
Level – Relative level of the event.
Source – The name of the component that generated the event. For events that are generated during the execution of a job, the source is the job number.
Role – Role or user name of the user who initiated the event.
Message – Complete text of the event log message.
The n1smconfig script, which is stored at /opt/sun/n1gc/bin, can be used to change the number of days for which logs are kept. Reducing the number of days for which logs are stored reduces the average size of the log files, which helps to ensure that the log file size does not impair performance.
To configure logging, you must specify an event category and a resource category. The following event categories are defined:
Action
Ereport
Lifecycle
List
Problem
Statistic
all
Use the all event category to indicate that all events are to be logged. To understand how other event categories relate to actual events, see the notification topics at create notification in Sun N1 System Manager 1.1 Command Line Reference Manual.
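As a rough illustration of how a category might relate to an event topic, the following Python sketch treats the first dot-separated component of a topic as its category. That mapping is an assumption based on the topic names shown in this chapter (for example, Action.Physical.Discovered), not a documented rule, and the function names are hypothetical. The comparison ignores case because the category is listed as Ereport while topics are written with EReport.

```python
# Event categories as listed in this chapter; "all" is handled separately.
EVENT_CATEGORIES = ("Action", "Ereport", "Lifecycle", "List", "Problem", "Statistic")


def topic_category(topic):
    """Return the first dot-separated component of an event topic."""
    return topic.split(".", 1)[0]


def topic_in_category(topic, category):
    """Return True if the topic falls under the category (assumed mapping)."""
    if category.lower() == "all":
        return True  # the all category covers every event
    return topic_category(topic).lower() == category.lower()


if __name__ == "__main__":
    print(topic_in_category("Action.Physical.Discovered", "Action"))
```

The notification topics listed at create notification in the Command Line Reference Manual remain the authoritative source for the actual mapping.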
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Type the following command:
N1-ok> show log [count count]
The Events log appears with events listed most recent first. The value for the count attribute is the number of events to show in the output. The default value for count is 500. See show log in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Type the following command:
N1-ok> show log [severity severity] [before date] [after date]
The output shows only the events that match the specified criteria. The date variable values must be formatted appropriately, for example, 2005-07-20T11:53:04. The possible values for severity are critical, fatal, information, major, minor, other, unknown, and warning. See show log in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
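The filter syntax above can be assembled and checked before it is typed at the N1-ok prompt. The following Python sketch is an illustration only: the function name build_show_log_command is hypothetical, and the date check accepts only the format of the example value given above (2005-07-20T11:53:04). Whether the command accepts other date formats is not covered here.

```python
from datetime import datetime

# Severity values listed for the show log command.
SEVERITIES = {
    "critical", "fatal", "information", "major",
    "minor", "other", "unknown", "warning",
}

# Format of the example date value, 2005-07-20T11:53:04.
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"


def build_show_log_command(severity=None, before=None, after=None):
    """Build a filtered show log command string from optional criteria."""
    parts = ["show", "log"]
    if severity is not None:
        if severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {severity!r}")
        parts += ["severity", severity]
    for keyword, value in (("before", before), ("after", after)):
        if value is not None:
            # strptime raises ValueError if the date is not formatted as expected.
            datetime.strptime(value, DATE_FORMAT)
            parts += [keyword, value]
    return " ".join(parts)


if __name__ == "__main__":
    print(build_show_log_command(severity="major", after="2005-07-20T11:53:04"))
```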
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Type the following command:
N1-ok> show log log
The details of the event appear in the output. The log variable is the log ID. See show log in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
N1-ok> show log 72
ID:       72
Date:     2005-03-15T13:35:59-0700
Subject:  RemoteCmdPlan
Topic:    Action.Logical.JobStarted
Severity: Information
Level:    FINE
Source:   Job Service
Role:     root
Message:  RemoteCmdPlan job initiated by root: job ID = 15.
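If the event details are captured as text, they can be turned into a dictionary for further processing. The following Python sketch assumes that the output is printed with one "Field: value" pair per line, as in the example above. The function name parse_event_details is hypothetical, and the sample text is copied from this example.

```python
SAMPLE_OUTPUT = """\
ID: 72
Date: 2005-03-15T13:35:59-0700
Subject: RemoteCmdPlan
Topic: Action.Logical.JobStarted
Severity: Information
Level: FINE
Source: Job Service
Role: root
Message: RemoteCmdPlan job initiated by root: job ID = 15.
"""


def parse_event_details(output):
    """Parse 'Field: value' lines into a dictionary of strings."""
    fields = {}
    for line in output.splitlines():
        # Split at the first colon only, because values such as the
        # date and the message contain colons of their own.
        key, separator, value = line.partition(":")
        if separator and key.strip():
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    details = parse_event_details(SAMPLE_OUTPUT)
    print(details["Topic"], details["Severity"])
```

The Topic field is the value to use when you set up a notification for the same kind of event.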
The N1 System Manager provides the ability to set up email or SNMP notifications for events that occur either within the N1 System Manager itself or on provisionable servers. You can set up customized notification rules for as many different scenarios as you need. Setting up notifications can be done only through the command line.
Use the create notification command to create notification rules for the events in which you are interested. Each notification rule is created for a specific event topic.
For setting up notifications using SNMP traps, use the SNMP MIB located at /opt/sun/n1gc/etc/SUN-N1SM-TRAP-MIB.mib. For more information about SNMP MIBs, see Monitoring MIBs.
A notification rule can be used to send a notification of each type of event to a selected destination, using either email or SNMP as the communication medium. For example, you can create a notification rule so that each time a new provisionable server is discovered by the management server, you receive a message on your pager to indicate that the event has happened:
create notification notification destination destination topic topic type type [description description]
See create notification in Sun N1 System Manager 1.1 Command Line Reference Manual for details of the terms used in this command syntax.
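The command syntax above can be filled in programmatically, for example when several similar rules are needed. The following Python sketch is an illustration only: the function name, the rule name discovered-alert, and the destination pager@example.com are hypothetical. The topic Action.Physical.Discovered and the email and snmp types are taken from this chapter. The optional description argument is left out because its quoting rules are not covered here.

```python
# Notification types used in this chapter.
NOTIFIER_TYPES = {"email", "snmp"}


def build_create_notification(name, destination, topic, notifier_type):
    """Build a create notification command string from its four arguments."""
    if notifier_type not in NOTIFIER_TYPES:
        raise ValueError(f"unsupported notification type: {notifier_type!r}")
    for value in (name, destination, topic):
        # Reject empty values and embedded whitespace rather than guess
        # how the N1-ok command line would quote them.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"invalid argument: {value!r}")
    return (
        f"create notification {name} destination {destination} "
        f"topic {topic} type {notifier_type}"
    )


if __name__ == "__main__":
    print(build_create_notification(
        "discovered-alert",            # hypothetical rule name
        "pager@example.com",           # hypothetical destination
        "Action.Physical.Discovered",  # topic for server discovery
        "email",
    ))
```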
You can configure your SMTP server for event notification during the installation and configuration of the N1 System Manager. See Configuring the N1 System Manager System in Sun N1 System Manager 1.1 Installation and Configuration Guide.
Use the show and set commands with the notification option to view and modify notification details. Type help show notification or help set notification at the N1-ok command line for syntax and parameter details.
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Type the following command:
N1-ok> show notification all
The notifications for which you have read privileges appear in the output. See show notification in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Type the following command:
N1-ok> show notification notification
The specified notification details appear in the output. See show notification in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
N1-ok> show notification test2
Name:           test2
Event Topic:    EReport.Physical.ThresholdExceeded
Notifier Type:  Email
Destination:    nobody@sun.com
State:          enabled
This procedure describes how to change the name, description, or destination of a notification.
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Type the following command:
N1-ok> set notification notification name name description description destination destination
The specified notification attributes are set to the new values specified. See set notification in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
This example shows how to use the set notification command with the name option to change a notification name from test2 to test3.
N1-ok> set notification test2 name test3
Use the create or delete command with the notification option to create and delete notifications.
Use the start command with the notification option and the test subcommand to test a notification.
Type help create notification or help delete notification at the N1-ok command line for syntax and parameter details.
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Type the following command:
N1-ok> create notification notification topic topic type type destination destination
The notification is created and enabled. See create notification in Sun N1 System Manager 1.1 Command Line Reference Manual for details and valid topics.
Type the following command:
N1-ok> start notification notification test
A test notification message is sent. See start notification in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
This example shows how to create a notification to be sent by email if a physical threshold value is exceeded. The notification is called test2. The recipient's email address is nobody@sun.com.
N1-ok> create notification test2 destination nobody@sun.com topic EReport.Physical.ThresholdExceeded type email
The show notification command can be used to verify that the notification has been created.
N1-ok> show notification
Name    Event Topic                          Destination      State
test2   EReport.Physical.ThresholdExceeded   nobody@sun.com   enabled
This example shows how to create a notification to be sent by SNMP if a physical threshold value is exceeded. The notification is called test23. The recipient SNMP address is sun.com.
N1-ok> create notification test23 destination sun.com topic EReport.Physical.ThresholdExceeded type snmp
The show notification command can be used to verify that the notification has been created.
N1-ok> show notification
Name     Event Topic                          Destination   State
test23   EReport.Physical.ThresholdExceeded   sun.com       enabled
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Type the following command:
N1-ok> delete notification notification
The notification is deleted.
Notifications are enabled, or started, by default at creation. Use the start command with the notification option to enable a notification that has been disabled. Type help start notification at the N1-ok command line for syntax and parameter details.
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Type the following command:
N1-ok> start notification notification
The notification is enabled. See start notification in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
Log in to the N1 System Manager.
See To Access the N1 System Manager Command Line for details.
Type the following command:
N1-ok> stop notification notification
The notification is disabled. See stop notification in Sun N1 System Manager 1.1 Command Line Reference Manual for details.