This section describes the most common monitoring problems, their causes, and solutions.
If monitoring is enabled as described in Enabling and Disabling Monitoring in Sun N1 System Manager 1.3 Discovery and Administration Guide, and the status in the output of the show server or show group command is unknown or unreachable, then the server or server group is not being reached successfully for monitoring. If the status remains unknown or unreachable for less than 30 minutes, a transient network problem might be occurring. However, if the status remains unknown or unreachable for more than 30 minutes, monitoring might have failed. A monitoring failure could be the result of any of the following issues.
The base monitoring agent on the managed server has stopped running.
The managed server has been powered off or been unplugged.
The managed server IP address or name has been changed independently of N1 System Manager.
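The decision rule above can be expressed as a small function. This is a minimal Python sketch and is not part of N1 System Manager: the names diagnose, Diagnosis, and LIKELY_CAUSES are hypothetical, minutes_in_status is assumed to be how long you have observed the status yourself, and the 30-minute threshold is taken from the guideline above and exposed as a parameter.

```python
from enum import Enum


class Diagnosis(Enum):
    REACHABLE = "server is being reached for monitoring"
    POSSIBLY_TRANSIENT = "possible transient network problem"
    POSSIBLE_MONITORING_FAILURE = "possible monitoring failure"


# Status values (from show server / show group) that mean the server is not being reached.
UNREACHED_STATUSES = {"unknown", "unreachable"}

# Causes listed in this section, worth checking when monitoring appears to have failed.
LIKELY_CAUSES = (
    "base monitoring agent on the managed server has stopped running",
    "managed server has been powered off or unplugged",
    "managed server IP address or name changed outside N1 System Manager",
)


def diagnose(status: str, minutes_in_status: float,
             threshold_minutes: float = 30.0) -> Diagnosis:
    """Classify a monitoring status by how long it has persisted.

    threshold_minutes is an assumed default based on the 30-minute guideline.
    """
    if status.strip().lower() not in UNREACHED_STATUSES:
        return Diagnosis.REACHABLE
    if minutes_in_status < threshold_minutes:
        return Diagnosis.POSSIBLY_TRANSIENT
    return Diagnosis.POSSIBLE_MONITORING_FAILURE


if __name__ == "__main__":
    result = diagnose("Unreachable", 45)
    print(result.value)
    if result is Diagnosis.POSSIBLE_MONITORING_FAILURE:
        for cause in LIKELY_CAUSES:
            print(" - check:", cause)
```

The threshold is a parameter rather than a constant so that it can be tuned if your network routinely shows longer transient outages.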
If monitoring traps are lost, a particular threshold status might not be refreshed for up to 30 minutes, although the overall status should still be refreshed every 10 minutes.
A time stamp is provided in the monitoring data output. Comparing this time stamp with the current time can also help you judge whether there is an error with the monitoring agent.
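The time stamp comparison can be sketched as follows. This is a minimal Python example under stated assumptions: the time stamp format string ASSUMED_TIMESTAMP_FORMAT is hypothetical (check the format in your actual output), the 20-minute MAX_AGE is an assumed allowance of twice the 10-minute overall refresh interval, and both clocks are assumed to be in the same time zone.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical format; verify against the time stamp in your monitoring output.
ASSUMED_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

# Assumed limit: twice the 10-minute overall status refresh interval.
MAX_AGE = timedelta(minutes=20)


def parse_timestamp(text: str, fmt: str = ASSUMED_TIMESTAMP_FORMAT) -> datetime:
    """Parse a monitoring time stamp string into a datetime."""
    return datetime.strptime(text.strip(), fmt)


def is_stale(timestamp: datetime, now: Optional[datetime] = None,
             max_age: timedelta = MAX_AGE) -> bool:
    """Return True if the time stamp is older than max_age relative to now."""
    if now is None:
        # Use the same tzinfo (or None) as the time stamp so the subtraction is valid.
        now = datetime.now(timestamp.tzinfo)
    return now - timestamp > max_age
```

A stale time stamp does not prove that the agent has failed; it is a signal to check the causes listed earlier in this section.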
It can take 5 to 7 minutes before all OS monitoring data is fully initialized. During that time, CPU idle might be reported as 0.0%, which causes a Failed Critical status for OS usage. This status should clear within 5 to 7 minutes after you add or upgrade the OS monitoring feature on the managed server. At that point, OS monitoring data should be available for the managed server through the show server server-name command. For further information, see To Add the OS Monitoring Feature in Sun N1 System Manager 1.3 Discovery and Administration Guide.
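The check described above, deciding whether a 0.0% CPU idle reading is probably an initialization artifact, can be sketched in Python. The helper names are hypothetical, minutes_since_feature_added is assumed to be tracked by you, and the 7-minute window is the upper bound of the initialization period stated above.

```python
# Assumed upper bound of the 5 to 7 minute initialization period.
INIT_WINDOW_MINUTES = 7.0


def parse_percent(text: str) -> float:
    """Convert a value such as '0.0%' to a float."""
    return float(text.strip().rstrip("%"))


def likely_still_initializing(cpu_idle_pct: float,
                              minutes_since_feature_added: float,
                              window: float = INIT_WINDOW_MINUTES) -> bool:
    """True if a 0.0% CPU idle reading probably reflects incomplete initialization."""
    return cpu_idle_pct == 0.0 and minutes_since_feature_added < window
```

A True result suggests waiting and checking again; a 0.0% reading that persists after the window has passed deserves further investigation.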
Adding the base management feature to a managed server might fail because of stale or obsolete SSH entries for that managed server in the known_hosts file on the management server. If the add server server-name feature osmonitor agentip command fails and no true security breach has occurred, remove the entry for that managed server from the known_hosts file as described in To Update the ssh_known_hosts File. Then retry the add command.
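Removing a stale entry amounts to deleting the lines whose host field names that managed server. The following is a minimal Python sketch, not a replacement for the documented procedure: remove_known_host is a hypothetical helper, the file path is passed in rather than assumed, it only matches plain-text host names and addresses (including the [host]:port form), and it writes a .bak copy first. Hashed entries (lines starting with |1|) are not matched; OpenSSH's ssh-keygen -R can remove those where it is available. You might need to run it once for the host name and once for the IP address.

```python
import shutil
from pathlib import Path
from typing import List


def _host_patterns(line: str) -> List[str]:
    """Return the host patterns of a known_hosts line (empty for comments and blank lines)."""
    fields = line.split()
    if not fields or fields[0].startswith("#"):
        return []
    # Lines may begin with a marker such as @cert-authority or @revoked.
    if fields[0].startswith("@"):
        fields = fields[1:]
    return fields[0].split(",") if fields else []


def _matches(pattern: str, host: str) -> bool:
    """Match a plain host pattern, including the [host]:port form."""
    if pattern.startswith("[") and "]:" in pattern:
        pattern = pattern[1:pattern.index("]:")]
    return pattern == host


def remove_known_host(path: str, host: str) -> int:
    """Remove plain-text entries for host, keeping a .bak copy; return lines removed."""
    p = Path(path)
    shutil.copy2(p, str(p) + ".bak")  # keep a backup before editing
    kept, removed = [], 0
    for line in p.read_text().splitlines(keepends=True):
        if any(_matches(pat, host) for pat in _host_patterns(line)):
            removed += 1
        else:
            kept.append(line)
    p.write_text("".join(kept))
    return removed
```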
The ports of some models of manageable servers use the Advanced Lights Out Manager (ALOM) standard. These servers, detailed in Manageable Server Requirements in Sun N1 System Manager 1.3 Site Preparation Guide, use email instead of SNMP traps to send notifications about hardware events to the management server. For information about other events, see Managing Event Log Entries in Sun N1 System Manager 1.3 Discovery and Administration Guide and Setting Up Event Notifications in Sun N1 System Manager 1.3 Discovery and Administration Guide.
If there are no notifications about hardware events from ALOM-based manageable servers, it could mean that all of those managed servers are healthy. However, if you are using an external mail service instead of the secure internal N1 System Manager mail service, it is possible that the external mail service has not been configured correctly as an email server, or that the email configuration has been invalidated by other issues such as a network error or a domain name change.
To resolve the problem, do one of the following:
Reconfigure the N1 System Manager by running n1smconfig, and choose the secure internal N1 System Manager mail service.
Check and reset your external email server configuration. See Resetting Email Accounts for ALOM-based Managed Servers.
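Before resetting the external email configuration, it can help to confirm that the mail server is still resolvable and answering. This is a minimal Python sketch using the standard socket and smtplib modules; check_mail_server is a hypothetical helper, the host and port are whatever your external mail service uses, and a successful result only shows basic SMTP reachability, not that N1 System Manager is configured correctly.

```python
import smtplib
import socket
from typing import Tuple


def check_mail_server(host: str, port: int = 25, timeout: float = 5.0) -> Tuple[bool, str]:
    """Return (ok, detail) after resolving host and sending an SMTP NOOP."""
    try:
        # Resolution fails here if, for example, the domain name has changed.
        socket.getaddrinfo(host, port)
    except socket.gaierror as exc:
        return False, f"cannot resolve {host}: {exc}"
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            code, _ = smtp.noop()
            if code != 250:
                return False, f"NOOP returned {code}"
            return True, "mail server reachable"
    except (OSError, smtplib.SMTPException) as exc:
        # Covers refused connections, timeouts, and unexpected disconnects.
        return False, f"cannot talk to {host}:{port}: {exc}"
```

A name resolution failure points toward a domain name change, while a connection failure points toward a network error or a mail server that is no longer running.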