This chapter describes the most common monitoring problems, their causes, and the solution for each problem.
Adding the OS monitoring feature to a managed server that has the base management feature installed might fail. The following job output shows the error:
N1-ok> show job 61
Job ID: 61
Date: 2005-08-16T16:14:27-0400
Type: Modify OS Monitoring Support
Status: Error (2005-08-16T16:14:38-0400)
Command: add server 192.168.2.10 feature osmonitor agentssh root/rootpasswd
Owner: root
Errors: 1
Warnings: 0

Steps
ID  Type           Start                     Completion                Result
1   Acquire Host   2005-08-16T16:14:27-0400  2005-08-16T16:14:28-0400  Completed
2   Run Command    2005-08-16T16:14:28-0400  2005-08-16T16:14:28-0400  Completed
3   Acquire Host   2005-08-16T16:14:29-0400  2005-08-16T16:14:30-0400  Completed
4   Run Command    2005-08-16T16:14:30-0400  2005-08-16T16:14:36-0400  Error

Results
Result 1:
Server: 192.168.2.10
Status: -3
Message: Repeate attempts for this operation are not allowed.
This error indicates that SSH credentials were supplied previously and cannot be changed. To avoid this error, issue the add server feature osmonitor command without the agentssh credentials, for example, add server 192.168.2.10 feature osmonitor.
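If you check job results from a script, the failure can be detected from the captured text. The following Python sketch assumes the show job output has been saved as text with one field per line and one row per job step; that layout is an assumption based only on the single example in this section, not a documented output format. The names summarize_job and JobSummary are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Message text exactly as the example job printed it (spelling included).
REPEAT_ATTEMPT_MSG = "Repeate attempts for this operation are not allowed"

# Assumed step row layout: "<id> <type> <start> <completion> <result>",
# where start and completion are time stamps such as 2005-08-16T16:14:27-0400.
STEP_ROW = re.compile(
    r"^(\d+)\s+(.+?)\s+\d{4}-\d\d-\d\dT\S+\s+\d{4}-\d\d-\d\dT\S+\s+(\w+)$"
)


@dataclass
class JobSummary:
    status: str = ""
    failed_steps: list = field(default_factory=list)
    repeat_attempt_error: bool = False


def summarize_job(output: str) -> JobSummary:
    """Summarize captured `show job` text (the layout is an assumption)."""
    summary = JobSummary()
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Status:") and not summary.status:
            # The first Status: line is the job status, e.g. "Error (2005-...)".
            parts = line.split(":", 1)[1].split()
            if parts:
                summary.status = parts[0]
        row = STEP_ROW.match(line)
        if row and row.group(3) == "Error":
            # Record "<id> <type>" for every step whose result is Error.
            summary.failed_steps.append(f"{row.group(1)} {row.group(2)}")
    summary.repeat_attempt_error = REPEAT_ATTEMPT_MSG in output
    return summary
```

When repeat_attempt_error is true, the remedy described above applies: reissue the command without the agentssh credentials rather than retrying it unchanged.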
Use the grep command as follows to determine whether the OS monitoring agents were successfully installed and are running.
To verify the OS monitoring feature on a Solaris OS managed server, type the following commands:
# pkginfo |grep n1sm
sparc: SUNWn1smsparcag-1-2
solx86: SUNWn1smx86ag-1-2
# ps -ef |grep -i esd
root 23817 1 0 19:57:59 ? 0:01 esd - init agent -dir /var/opt/SUNWsymon -q
To verify the OS monitoring feature on a Linux managed server, type the following commands:
# rpm -qa | grep n1sm-linux-agent
# ps -ef | grep -i esd
root 1940 1 0 Jan28 ? 00:00:14 esd - init agent -dir /var/opt/SUNWsymon -q
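The two checks above can also be scripted. The following Python sketch runs the same commands (pkginfo on Solaris OS, rpm -qa on Linux, and ps -ef on both) and looks for the same strings the grep commands look for. It assumes Python 3.9 or later is available on the managed server and that it runs there; the function names are hypothetical. Matching on "esd - init agent" rather than on "esd" alone also avoids counting the grep process itself, which ps -ef | grep -i esd can report.

```python
import platform
import subprocess


def agent_package_installed(pkg_listing: str, os_name: str) -> bool:
    """Mirror `pkginfo | grep n1sm` (Solaris) or `rpm -qa | grep n1sm-linux-agent`."""
    needle = "n1sm-linux-agent" if os_name == "Linux" else "n1sm"
    return any(needle in line for line in pkg_listing.splitlines())


def esd_agent_running(ps_listing: str) -> bool:
    """Look for the esd agent process line shown in the examples above."""
    return any("esd - init agent" in line for line in ps_listing.splitlines())


def _run(cmd: list) -> str:
    """Return a command's standard output, or "" if the command is missing."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True).stdout
    except FileNotFoundError:
        return ""


def check_local_agent() -> tuple:
    """Return (package_installed, agent_running) for this managed server."""
    os_name = platform.system()  # "SunOS" on Solaris OS, "Linux" on Linux
    pkg_cmd = ["rpm", "-qa"] if os_name == "Linux" else ["pkginfo"]
    packages = _run(pkg_cmd)
    processes = _run(["ps", "-ef"])
    return agent_package_installed(packages, os_name), esd_agent_running(processes)
```

If check_local_agent() returns (True, False), the agent package is present but the agent process is not running.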
Some models of manageable servers use the Advanced Lights Out Manager (ALOM) standard. These servers, detailed in Manageable Server Requirements in Sun N1 System Manager 1.3 Site Preparation Guide, use email instead of SNMP traps to send notifications about hardware events to the management server. For information about other events, see Managing Event Log Entries in Sun N1 System Manager 1.3 Discovery and Administration Guide and Setting Up Event Notifications in Sun N1 System Manager 1.3 Discovery and Administration Guide.
If no hardware event notifications appear from ALOM-based managed servers, the most likely explanation is that all managed servers are healthy. However, if you are using an external mail service instead of the secure internal N1 System Manager mail service, the external mail service might not have been configured correctly as an email server, or its configuration might have been invalidated by other issues such as a network error or a domain name change.
To resolve, do one of the following:
Reconfigure N1 System Manager by running n1smconfig, and choose the secure internal N1 System Manager mail service.
Check and reset your external email server configuration. See Resetting Email Accounts for ALOM-based Managed Servers.
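Before you reset an external mail configuration, it can help to confirm that the external mail server answers at all from the management server. The following Python sketch uses the standard smtplib module to connect and send an SMTP NOOP. It only tests reachability; it does not check the ALOM email account settings. The default port 25 and the function name are assumptions.

```python
import smtplib


def smtp_server_reachable(host: str, port: int = 25, timeout: float = 10.0) -> bool:
    """Return True if an SMTP server at host:port answers NOOP with code 250."""
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as server:
            code, _message = server.noop()
            return code == 250
    except OSError:
        # Covers refused connections, timeouts, name lookup failures, and
        # smtplib.SMTPException, which is a subclass of OSError.
        return False
```

For example, smtp_server_reachable("mail.example.com") returns False if the server (here a placeholder name) cannot be reached, which points to a network or DNS problem rather than an ALOM setting.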
Adding support for the base management feature might fail because of stale SSH entries on the management server. If the add server feature command fails and no actual security breach has occurred, note the name and IP address of the managed server, and then remove the entry for that server as described in To Update the ssh_known_hosts File.
If monitoring is enabled as described in Enabling and Disabling Monitoring in Sun N1 System Manager 1.3 Discovery and Administration Guide, and the status in the output of the show server or show group command is unknown or unreachable, then N1 System Manager cannot reach the server or server group for monitoring.
If the status has been unknown or unreachable for less than 10 minutes, a transient network problem might be the cause. However, if the status remains unknown or unreachable for more than 30 minutes, monitoring might have failed. This failure could result from any of the following issues:
The base monitoring agent on the managed server has stopped running.
The managed server has been powered off or unplugged.
The managed server IP address or name has been changed independently of N1 System Manager.
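The time thresholds above can be applied mechanically when a script watches server status. The following Python sketch classifies how long a server has shown unknown or unreachable. The 10-minute and 30-minute limits come from this section; the section does not say what to conclude between them, so the "keep-watching" label for that range is an assumption, as is the function name.

```python
from datetime import timedelta

# Thresholds from this section: under 10 minutes suggests a transient
# network problem; over 30 minutes suggests that monitoring might have failed.
TRANSIENT_LIMIT = timedelta(minutes=10)
FAILURE_LIMIT = timedelta(minutes=30)


def assess_unreachable(duration: timedelta) -> str:
    """Classify how long a server has shown status unknown or unreachable."""
    if duration < TRANSIENT_LIMIT:
        return "transient"       # likely a short network problem; wait
    if duration > FAILURE_LIMIT:
        return "likely-failure"  # check agent, power, and IP address or name
    return "keep-watching"       # between the two documented thresholds
```

A result of "likely-failure" is the point at which the three causes listed above are worth checking one by one.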
If monitoring traps are lost, a particular threshold status may not be refreshed for up to 30 hours, although the overall status should still be refreshed every 10 minutes.
The monitoring data output includes a time stamp. Comparing this time stamp with the current time can also help you judge whether a problem exists with the monitoring agent.
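The comparison described above can be sketched in Python. This sketch assumes the monitoring time stamp uses the same form as the job output earlier in this chapter (for example, 2005-08-16T16:14:27-0400); the actual monitoring output format may differ. It flags data as stale when it is older than the 10-minute overall refresh interval plus a grace period; the 5-minute grace period and the function names are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed time stamp form, matching the job output in this chapter.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S%z"
# Overall status is expected to be refreshed every 10 minutes.
OVERALL_REFRESH = timedelta(minutes=10)


def monitoring_data_age(timestamp: str, now: Optional[datetime] = None) -> timedelta:
    """Return the age of a time stamp; `now` must be timezone-aware if given."""
    taken = datetime.strptime(timestamp, TIMESTAMP_FORMAT)
    current = now or datetime.now(timezone.utc)
    return current - taken


def looks_stale(timestamp: str, now: Optional[datetime] = None,
                grace: timedelta = timedelta(minutes=5)) -> bool:
    """True if the data is older than one overall refresh plus a grace period."""
    return monitoring_data_age(timestamp, now) > OVERALL_REFRESH + grace
```

Because both datetimes carry a UTC offset, the subtraction is correct even when the management server and managed server use different time zones.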
It can take 5 to 7 minutes for all OS monitoring data to be fully initialized. During this time, CPU idle might be reported as 0.0%, which causes a Failed Critical status for OS usage. This status should clear within 5 to 7 minutes after you add or upgrade the OS monitoring feature on the managed server. At that point, OS monitoring data for the managed server should be available through the show server server command. For further information, see To Add the OS Monitoring Feature in Sun N1 System Manager 1.3 Discovery and Administration Guide.
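A script that raises alerts on OS usage can avoid reporting this start-up condition. The following Python sketch treats a CPU idle reading of exactly 0.0% as a start-up artifact while the feature is still inside the initialization window, using the 7-minute upper bound stated above. How the script learns when the feature was added or upgraded is assumed, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

# OS monitoring data can take 5 to 7 minutes to initialize after the
# feature is added or upgraded; use the upper bound as the window.
INIT_WINDOW = timedelta(minutes=7)


def should_ignore_os_usage_alert(feature_changed_at: datetime, now: datetime,
                                 cpu_idle_percent: float) -> bool:
    """True if a 0.0% CPU idle reading falls inside the initialization window."""
    in_window = now - feature_changed_at < INIT_WINDOW
    return in_window and cpu_idle_percent == 0.0
```

Outside the window, a Failed Critical OS usage status should be investigated normally.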
Adding the base management feature to a managed server might fail due to stale or obsolete SSH entries for that managed server in the known_hosts file on the management server. If the add server server-name feature osmonitor agentip command fails and no true security breach has occurred, remove the entry for that managed server from the known_hosts file as described in To Update the ssh_known_hosts File. Then, retry the add command.
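The procedure To Update the ssh_known_hosts File remains the reference for this step. As an illustration of what removing an entry involves, the following Python sketch deletes plain-text known_hosts lines that list a given host name or IP address, and keeps a .bak copy of the original file. It does not handle hashed entries (lines beginning with |1|) or wildcard patterns; the standard OpenSSH command ssh-keygen -R hostname -f file handles hashed entries as well. The function names are hypothetical, and the file path is whatever your management server uses.

```python
import shutil
from pathlib import Path


def _entry_hosts(host_field: str) -> set:
    """Split a host field such as 'name,10.0.0.5,[name]:2222' into host names."""
    hosts = set()
    for pattern in host_field.split(","):
        if pattern.startswith("[") and "]" in pattern:
            pattern = pattern[1:pattern.index("]")]  # "[host]:port" form
        hosts.add(pattern)
    return hosts


def remove_known_host(path: Path, names: set) -> int:
    """Delete plain-text known_hosts lines that list any of `names`.

    A copy of the original file is kept next to it with a .bak suffix.
    Returns the number of lines removed.
    """
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    kept, removed = [], 0
    for line in path.read_text().splitlines(keepends=True):
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            kept.append(line)  # keep blank lines and comments unchanged
            continue
        # A line may start with a marker such as @cert-authority or @revoked.
        host_field = fields[1] if fields[0].startswith("@") and len(fields) > 1 else fields[0]
        if _entry_hosts(host_field) & names:
            removed += 1
        else:
            kept.append(line)
    path.write_text("".join(kept))
    return removed
```

Pass both the name and the IP address of the managed server, for example remove_known_host(path, {"192.168.2.10", "server-name"}), and only after you have confirmed that no actual security breach has occurred.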
Under certain circumstances, a Sun Blade X8400 server blade is not listed in its chassis group but is instead listed as a separate managed server with the status unreachable.
This problem can be caused by any one or more of the following situations:
The Sun Blade X8400 server blade has been removed from the Sun Blade X8000 chassis.
The Sun Blade X8400 server blade service processor (SP) is not accessible to N1 System Manager because of SP problems, IP address reassignment, or other management network problems.
To resolve this problem:
Ensure that the IP address assigned to the Sun Blade X8400 server blade is correct.
Ensure that the Sun Blade X8400 server blades can be accessed using ssh.
Physically check the blade and, if necessary, power cycle the blade SP. If the blade SP has hung, you must go to the blade itself to power cycle the SP.
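A quick way to test the ssh step from the management server is to confirm that the blade SP accepts a TCP connection on the SSH port and sends an SSH identification string, which SSH servers send when a connection opens (RFC 4253). The following Python sketch does only that; it does not log in, so a successful result does not prove that your credentials work. Port 22 and the function name are assumptions.

```python
import socket


def ssh_port_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if host:port accepts a connection and sends an SSH banner."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            banner = sock.recv(256)
            # SSH servers identify themselves with a line starting "SSH-".
            return banner.startswith(b"SSH-")
    except OSError:
        # Refused, timed out, or unreachable: treat the SP as not accessible.
        return False
```

If this check fails for the blade SP address, the IP address or the SP itself is the likely problem, and the physical check and SP power cycle become the next step.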
After you have verified that the Sun Blade X8400 server blade is accessible using N1 System Manager and standard access protocols, refresh the server blade using either of the following two methods:
Type set server server-id refresh at the command-line prompt of the N1 System Manager browser interface, where server-id is either the Sun Blade X8400 server blade IP address or the name you have assigned to the Sun Blade X8400 server blade.
Type n1sh set server server-id refresh in a root login terminal window on the management server, where server-id is either the Sun Blade X8400 server blade IP address or the name you have assigned to the Sun Blade X8400 server blade.
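The second method can be run from a script on the management server. The following Python sketch builds the n1sh set server ... refresh command as an argument list, so that the server ID is passed as a single argument without shell quoting, and rejects IDs that contain whitespace. It assumes n1sh is on the PATH of the root user running the script; the function names and the example address are hypothetical.

```python
import subprocess


def refresh_command(server_id: str) -> list:
    """Build the argument list for `n1sh set server <server-id> refresh`."""
    if not server_id or any(ch.isspace() for ch in server_id):
        raise ValueError("server_id must be one IP address or server name")
    return ["n1sh", "set", "server", server_id, "refresh"]


def refresh_blade(server_id: str) -> subprocess.CompletedProcess:
    """Run the refresh as root on the management server (n1sh assumed on PATH)."""
    return subprocess.run(refresh_command(server_id), capture_output=True, text=True)
```

For example, refresh_blade("192.168.2.20") runs n1sh set server 192.168.2.20 refresh; check the returned object's returncode and stdout to see whether the refresh was accepted.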