Sun N1 System Manager 1.3.1 Troubleshooting Guide

Chapter 1 Guidelines and Considerations

This chapter provides troubleshooting guidelines and background information about the N1 System Manager that can help you diagnose problems.

In this book, the term manageable server is used for a server that is accessible by the N1 System Manager network, but has not yet been discovered by the N1 System Manager. A managed server is a server that has been successfully discovered by the N1 System Manager and is subsequently managed by the N1 System Manager.

Troubleshooting Guidelines and Logs

This section provides general troubleshooting guidelines.


Tip –

Check this manual's index for specific topics and problems.


Installation

Examine the installation log /var/tmp/installer.log.latest to determine the cause of the installation failure. Resolutions for the majority of installation problems are provided in this guide.
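As a starting point for reading a long installation log, the following minimal Python sketch lists the lines that look like failures. The log path is the one given above. The keyword list is an assumption, not a documented log format, so adjust it to match what your log actually contains.

```python
import re
from pathlib import Path

# Path taken from this guide.
INSTALL_LOG = Path("/var/tmp/installer.log.latest")

# Assumption: failures are marked by one of these words. The installer's
# real log format is not documented here.
SUSPECT = re.compile(r"\b(error|fail(ed|ure)?|fatal|cannot)\b", re.IGNORECASE)


def find_suspect_lines(text):
    """Return (line_number, line) pairs for lines that match SUSPECT."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(text.splitlines(), start=1)
        if SUSPECT.search(line)
    ]


if __name__ == "__main__":
    if INSTALL_LOG.exists():
        log_text = INSTALL_LOG.read_text(errors="replace")
        for number, line in find_suspect_lines(log_text):
            print(f"{number}: {line}")
    else:
        print(f"{INSTALL_LOG} not found")
```

The line numbers in the output make it easier to return to the surrounding context in the log itself, which usually explains the failure better than the matching line alone.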

Configuration

The N1 System Manager configuration utility n1smconfig does not generate logs. When n1smconfig is run, the current N1 System Manager configuration is displayed. Examine the displayed configuration and ensure that your N1 System Manager management network, provisioning network, and data network are assigned to the correct management server Ethernet ports. Also ensure that all other configuration settings are correct. If configuration needs to be corrected, reconfigure the N1 System Manager as described in Configuring the N1 System Manager in Sun N1 System Manager 1.3 Installation and Configuration Guide.

Runtime

To determine the cause of an error or a problem, first examine the following items:

  • N1 System Manager event logs: Use the command show log to display the N1 System Manager event logs. For further information, see show log in Sun N1 System Manager 1.3 Command Line Reference Manual.

  • Job details: Use the command show job to display the N1 System Manager jobs. For further information, see show job in Sun N1 System Manager 1.3 Command Line Reference Manual.

  • Management server system logs: Operating system log locations depend on the operating system. Refer to your operating system documentation for the location of the system logs. For example, Solaris OS system messages are stored in the file /var/adm/messages, and Linux system logs are stored in the directory /var/log.

  • Windows RIS server debug log: The file C:\WINDOWS\Debug\binlsvc.log contains information that might be useful when debugging Windows deployment issues.

Considerations and Constraints

This section provides information concerning N1 System Manager operational processes that can assist you in troubleshooting. The following topics are discussed:

DHCP Service Conflict With N1 Grid Service Provisioning System

If you are using both the N1 System Manager and the Sun N1 Service Provisioning System with the OS provisioning plug-in, you must choose which product to use for OS deployment for a given target set of servers. You must then manually shut down the DHCP service supplied by the other product, as the root user, using operating system commands. Failure to shut the service down might result in unreliable OS deployment operations and in network-related problems.

Discovery and Routers

Discovery of manageable servers works across routers if the network services used by the discovery process are not blocked by a firewall. Network services used by the discovery process can include SSH, IPMI, Telnet and SNMP.

For information about which ports and protocols can be configured, see Appendix A, Sun N1 System Manager Protocol, Ports, and Features Reference, in Sun N1 System Manager 1.3 Installation and Configuration Guide.
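To check whether a firewall or router is blocking the TCP-based services, a quick reachability test from the management server can help. The following Python sketch assumes the well-known default ports (SSH on TCP 22, Telnet on TCP 23). Your installation might use different ports. SNMP (UDP 161) and IPMI (UDP 623) use UDP by default, so a TCP connection test cannot verify them, and they are listed only for reference.

```python
import socket
import sys

# Assumption: well-known default ports. The ports actually used in a given
# installation might differ.
TCP_SERVICES = {"SSH": 22, "Telnet": 23}
UDP_SERVICES = {"SNMP": 161, "IPMI": 623}  # reference only, not tested here


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_tcp_services(host, services=TCP_SERVICES, timeout=3.0):
    """Map each service name to True (reachable) or False (not reachable)."""
    return {
        name: tcp_port_open(host, port, timeout)
        for name, port in services.items()
    }


# Runs only when a host name or IP address is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for name, reachable in check_tcp_services(sys.argv[1]).items():
        print(f"{name}: {'reachable' if reachable else 'not reachable'}")
```

A result of "not reachable" does not by itself prove that a firewall is the cause. The service might simply be disabled on the target server, so treat the result as a hint for where to look next.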

Hot-Plugging Sun Blade 8000 Chassis Modules

Because the Sun Blade 8000 chassis systems support hot-pluggable I/O modules, the network boot device list reported by N1 System Manager might be stale if a Sun Blade X8400 Server Blade has not been reset for a very long time. If you select a blade for provisioning using a stale network boot device list and you specify a logical interface using the load server command or use the load server command defaults, then the interface might not map to the expected physical port. Provisioning will fail if the interface you specify does not map to the correct physical port.

Use either of the following two methods to ensure mapping to the proper physical port.

Identifying Hardware and OS Threshold Breaches

If the value of a monitored hardware health attribute or OS resource utilization attribute breaches a threshold value, an event log is immediately created to indicate that the threshold has been breached. The event log is available from the browser interface. A symbol also appears in the monitored data table in the browser interface to indicate the breach, as shown in the graphic at To Retrieve Threshold Values for a Server in Sun N1 System Manager 1.3 Discovery and Administration Guide.

Alternatively, use the show log command to verify that the event log has been generated:


N1-ok> show log
Id            Date                       Severity    Subject     Message
.
. 
10            2005-11-22T01:45:02-0800   WARNING     Sun_V20z_XG041105786
A critical high threshold was violated for server Sun_V20z_XG041105786: Attribute cpu0.vtt-s3 Value 1.32

13            2005-11-22T01:50:08-0800   WARNING     Sun_V20z_XG041105786
A normal low  threshold was violated for server Sun_V20z_XG041105786: Attribute cpu0.vtt-s3 Value 1.2

If monitoring traps are lost, a particular threshold status may not be refreshed for up to 30 hours, although the overall status can still be refreshed every 10 minutes.
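If you save show log output to a file, the threshold messages can be extracted with a short script. The following Python sketch derives its pattern only from the two sample messages shown in this section. Other message wordings, if they exist, will not match. The function name and the field names are illustrative.

```python
import re

# Pattern derived only from the two sample messages in this section.
THRESHOLD_MESSAGE = re.compile(
    r"\bA\s+(?P<level>\w+)\s+(?P<direction>high|low)\s+threshold was violated "
    r"for server\s+(?P<server>[^\s:]+):\s+"
    r"Attribute\s+(?P<attribute>\S+)\s+Value\s+(?P<value>\S+)"
)


def parse_threshold_events(log_text):
    """Return one dict per threshold message found in show log output."""
    return [match.groupdict() for match in THRESHOLD_MESSAGE.finditer(log_text)]


# Sample copied from the show log output above.
SAMPLE = """\
10            2005-11-22T01:45:02-0800   WARNING     Sun_V20z_XG041105786
A critical high threshold was violated for server Sun_V20z_XG041105786: Attribute cpu0.vtt-s3 Value 1.32

13            2005-11-22T01:50:08-0800   WARNING     Sun_V20z_XG041105786
A normal low  threshold was violated for server Sun_V20z_XG041105786: Attribute cpu0.vtt-s3 Value 1.2
"""

for event in parse_threshold_events(SAMPLE):
    print(event["server"], event["attribute"],
          event["level"], event["direction"], event["value"])
```

Grouping the parsed events by server and attribute shows how a monitored value moved between threshold levels over time.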

N1 System Manager Cannot Be Used to Manage System Management Servers

Do not use the N1 System Manager to manage servers that have system management software installed on them, such as Sun Management Center, Sun Control Station, or any other system management application, including the N1 System Manager.

Regenerating Security Keys

The N1 System Manager uses strong encryption techniques and common agent container security keys to ensure secure communication between the management server and each managed server.

The security keys used by the N1 System Manager must be identical across all servers. Under normal operation, the security keys can be left in their default configuration. You should regenerate the security keys if any of the following cases occur:

In each of the above cases, the security keys must be regenerated, and the N1 System Manager management daemon restarted, as described in To Regenerate Common Agent Container Security Keys.