Monitoring in the Sun N1 System Manager software enables you to track changes to specific attributes in specific managed objects. Managed objects include server hardware elements, operating systems, file systems, and networks. Attributes are the monitored elements, about which data is obtained and delivered by the N1 System Manager software. Examples of attributes are the average number of queued processes and the percentage of used memory. A list of attributes is provided in Hardware Sensor Attributes and in Table 5–2.
Attributes are associated with one of three main areas:
Hardware health attributes. For information about hardware health monitoring, see Hardware Health Monitoring.
OS resource utilization attributes. For information about OS resource utilization monitoring, see OS Resource Utilization Monitoring.
Network connectivity, or reachability. For information about network reachability monitoring, see Network Reachability Monitoring.
For a server or a group of servers, hardware health, operating system utilization, and network connectivity are all monitored by the management server. All comparisons and verifications for monitoring are performed by the N1 System Manager. Provisionable servers are accessed only to obtain data.
An SNMP agent that is used for data retrieval is provided in the N1 System Manager software. If the management server is running the N1 System Manager on the Solaris OS, this agent is based on the Sun Management Center 3.5 software SNMP agent. If the management server is running the N1 System Manager on Linux, this agent is based on the Sun Management Center 3.6 Linux SNMP agent. The agent is deployed when operating systems are deployed on servers that are managed by the N1 System Manager software.
On Linux platforms, the N1 System Manager software monitors only ext3 file systems. Other file system types are not monitored on Linux platforms.
Monitoring is tied to the broadcasting of events for each monitored server or group of servers. Events are generated when certain conditions related to attributes occur. For information about events and when they occur, see Managing Event Log Entries. There are no log files related to monitoring. Instead, monitoring data is stored as events in the N1 System Manager database.
If monitoring is enabled for a server, each event causes a notification to be emitted from the N1 System Manager for that event. If monitoring is disabled for a server, monitoring events are not generated for that server. Lifecycle events continue to be generated, even with monitoring disabled. Lifecycle events include server discovery, server change or deletion, and server group creation. If you have requested notification of lifecycle events, you can still receive those notifications even with monitoring disabled.
The hardware health of discovered servers is monitored. Sensors provided in the hardware are used to monitor temperature, voltage, and fan speed. For more information about associated hardware, see the Sun N1 System Manager Connection Information in Sun N1 System Manager 1.1 Site Preparation Guide.
Sensor data is retrieved from the service processor for SPARC devices through the Advanced Lights Out Manager (ALOM) interface. Sensor data is retrieved from IPMI for x64 servers.
General management interface data for Sun Fire V20z and Sun Fire V40z machines is obtained through the command line. General management interface data for Sun Fire x4100 and Sun Fire x4200 servers is obtained through IPMI. Data can be retrieved dynamically from the command line.
The following characteristics of server hardware can be monitored:
CPU temperature
Ambient temperature
Fan speed in revolutions per minute
Voltages
LEDs
A detailed list of these sensors is provided in Hardware Sensor Attributes.
You can view filtered hardware health monitoring information for all servers by using the show server command:
N1-ok> show server health health
See show server in Sun N1 System Manager 1.1 Command Line Reference Manual for details of possible values of the health filters.
OS resource utilization is monitored by the N1 System Manager. When you run the add server feature command with the agentip keyword, you also use the agentssh keyword to provide the credentials for accessing the monitored server's operating system through ssh. See To Add the OS Monitoring Feature for additional details. This procedure is important for OS resource utilization monitoring but not for monitoring hardware health or network reachability.
Access to the operating system through this mechanism is required primarily for the Remote Command Execution feature. The management features use the same access to retrieve data for OS resource utilization monitoring: all attribute data, including platform OS interface data, is retrieved from the server's operating system by using ssh and SNMP. Statistics related to the central processing unit (CPU) are provided, as is data related to memory, swap usage, and file systems. For the purposes of monitoring, system load, memory usage, and swap usage data can be broken down as follows:
System usage, including system idle times
System load, expressed as the average number of queued processes over 1, 5, and 15 minutes
Memory usage and memory free statistics, in megabytes and as percentages
Physical load statistics
Swap space used and space available, in megabytes and as percentages
File system used and space available, as percentages
A list of these attributes is provided in Hardware Sensor Attributes.
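To make the quantities in the list above concrete, the following Python sketch gathers comparable figures for the local machine: load averages over 1, 5, and 15 minutes, and file system usage as a percentage. It is an illustration only and is not how the N1 System Manager agent collects data. The function names `percent` and `local_snapshot` and the dictionary keys are hypothetical.

```python
import os
import shutil


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole, or 0.0 when whole is zero."""
    return 0.0 if whole == 0 else 100.0 * part / whole


def local_snapshot(path: str = "/") -> dict:
    """Collect load averages and file system usage for the local machine."""
    # Average number of queued processes over 1, 5, and 15 minutes (Unix only).
    load1, load5, load15 = os.getloadavg()
    # Total, used, and free space in bytes for the file system holding `path`.
    usage = shutil.disk_usage(path)
    return {
        "load_1min": load1,
        "load_5min": load5,
        "load_15min": load15,
        "fs_used_pct": percent(usage.used, usage.total),
        "fs_free_mb": usage.free / (1024 * 1024),
    }
```

Calling `local_snapshot()` on a Linux or Solaris host returns a dictionary whose values correspond loosely to the system load and file system items above. Memory and swap figures are omitted because the Python standard library has no portable call for them.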
You can filter OS resource utilization monitoring information for all servers by using the show server command:
N1-ok> show server utilization utilization
N1-ok> show server utilization unreachable
The health of an OS resource can be shown as unknown if the server is reachable but the monitoring agent cannot be contacted on SNMP port 161.
The health of an OS resource can be shown as unreachable if the server is unreachable due to, for example, being in standby mode.
See show server in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
The monitoring of OS resource utilization attributes enables you to modify the default threshold values for all servers being managed by the N1 System Manager, through the creation and editing of a configuration file. See Changing Threshold Values With the Monitoring Configuration File for details.
The monitoring of OS resource utilization attributes also enables you to set specific thresholds for individual monitored servers, or for groups of monitored servers, at the command line by using the set command. See Setting Threshold Values for details.
If you are not interested in the values of some attributes, you can disable the threshold severity for monitoring of those attributes. This action prevents annoyance alarms. Example 5–4 shows you how to accomplish this disabling action.
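The idea of comparing a sampled attribute value against thresholds, and of disabling a threshold so that it never raises an alarm, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the severity names `"warning"` and `"critical"`, the use of `None` to mean "disabled", and the greater-than-or-equal comparison are all illustrative choices, not the N1 System Manager's actual threshold names or semantics.

```python
from typing import Optional


def severity(value: float,
             warning: Optional[float],
             critical: Optional[float]) -> str:
    """Compare a sampled value against upper thresholds.

    A threshold set to None is treated as disabled and never triggers.
    """
    # Check the most severe threshold first so it takes precedence.
    if critical is not None and value >= critical:
        return "critical"
    if warning is not None and value >= warning:
        return "warning"
    return "ok"
```

For example, `severity(85, 80, 90)` reports a warning, while `severity(85, None, 90)` reports no problem because the warning threshold is disabled.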
All management interfaces of provisionable servers and all platform interfaces are monitored by default by the N1 System Manager. Platform interfaces include the service processor's management interface, such as eth0, and data network interfaces, such as eth1 or eth2.
Reachability is verified for Linux servers and servers running the Solaris OS by using an ICMP ping to the interface IP address. For further information, see Discovery of Servers in the Factory Default State in Sun N1 System Manager 1.1 Installation and Configuration Guide.
The reachability of all network interfaces is verified at regular intervals. These polling intervals are configurable. For information about configuring polling intervals, see Setting Polling Intervals. The monitoring of network reachability is based on the IP address. If any monitored IP address is unreachable, an event is generated.
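The polling behavior described above, in which each monitored IP address is probed and an event is produced for any address that does not answer, can be sketched in Python. This is an illustration, not the product's implementation. The function names, the event dictionary layout, and the use of the system `ping` command with Linux iputils flags (`-c`, `-W`) are assumptions.

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def icmp_ping(ip: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo request with the system ping command (Linux flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def poll_once(ips: Iterable[str],
              probe: Callable[[str], bool] = icmp_ping) -> List[Dict[str, str]]:
    """Probe every monitored IP address once and return one event per failure."""
    events = []
    for ip in ips:
        if not probe(ip):
            # An address that does not answer produces an "unreachable" event.
            events.append({"type": "unreachable", "ip": ip})
    return events
```

A configurable polling interval corresponds to calling `poll_once` in a loop with a sleep of the chosen length between passes. The `probe` parameter lets a different check replace ICMP without changing the loop.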
You can filter information for all servers by using the show server command with the appropriate parameters to view monitoring information. See show server in Sun N1 System Manager 1.1 Command Line Reference Manual for details.
It is important to distinguish between the unreachable and unknown states for provisionable servers.
N1-ok> show server health unreachable
This command lists all provisionable servers that are unreachable. Any provisionable server returned in the output of this command is unreachable due to a network problem: the server cannot be contacted about its hardware health status. The ping command to the server is unsuccessful. This does not necessarily mean that the server is not transmitting hardware health status information. The server could be in standby mode.
N1-ok> show server health unknown
This command lists all provisionable servers that are not returning any information about hardware health status. The ping command might succeed, but the servers listed in the output of this command are not returning any hardware health information. The monitoring agent could not be contacted on port 161.
N1-ok> show server power unreachable
This command lists all provisionable servers that are unreachable. Any server returned in the output of this command is unreachable due to a network problem: the server cannot be contacted about its power status. The ping command to the server is unsuccessful. This does not necessarily mean that the server is not transmitting power status information. The server could be in standby mode.
N1-ok> show server power unknown
This command lists all provisionable servers that are not returning any information about power status. The ping command might succeed, but the servers listed in the output of this command are not returning any power status information. The monitoring agent could not be contacted on port 161.
N1-ok> show server utilization unreachable
This command lists all provisionable servers that are unreachable. Any server returned in the output of this command is unreachable due to a network problem: the server cannot be contacted about its OS resource utilization. The ping command to the server is unsuccessful. This does not necessarily mean that the server is not transmitting OS resource utilization information. The server could be in standby mode.
N1-ok> show server utilization unknown
This command lists all provisionable servers that are not returning any information about OS resource utilization. The ping command might succeed, but the servers listed in the output of this command are not returning any OS resource utilization information. The monitoring agent could not be contacted on port 161.
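The distinction between the unreachable and unknown states comes down to two checks: whether the server answers a ping, and whether the monitoring agent answers on port 161. The following Python sketch encodes that decision. The names `MonitorState` and `classify` are hypothetical, and the `OK` state is an assumed placeholder for any case in which both checks succeed.

```python
from enum import Enum


class MonitorState(Enum):
    OK = "ok"
    UNKNOWN = "unknown"
    UNREACHABLE = "unreachable"


def classify(ping_ok: bool, agent_ok: bool) -> MonitorState:
    """Map the two probe results to a state, following the rules described above."""
    if not ping_ok:
        # No reply to ping: a network problem, or the server is in standby mode.
        return MonitorState.UNREACHABLE
    if not agent_ok:
        # The server answers ping, but the agent on port 161 cannot be contacted.
        return MonitorState.UNKNOWN
    return MonitorState.OK
```

The ping result is checked first, so a server that answers neither probe is classified as unreachable rather than unknown.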