Sun N1 System Manager 1.1 Administration Guide

Chapter 6 Troubleshooting

This chapter provides troubleshooting information on the following topics:

Security

This section provides security-based troubleshooting information.

The Sun N1 System Manager Server uses strong encryption techniques to ensure secure communication between the management server and each managed server.

The keys used by the Sun N1 System Manager are stored under the /etc/opt/sun/cacao/security directory on servers running Linux, and under the /etc/opt/SUNWcacao/security directory on servers running the Solaris OS. These keys should be identical across all servers.
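
Because the keys must be identical, a quick way to spot divergence is to compare checksums between the management server and a managed server. A minimal sketch for two servers running Linux (the exact file names under the security directory vary by release, and managed-server is a placeholder host name):

# find /etc/opt/sun/cacao/security -type f | xargs md5sum
# ssh managed-server 'find /etc/opt/sun/cacao/security -type f | xargs md5sum'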

Under normal operation, these keys can be left in their default configuration. However, you might have to regenerate them, for example, if there is a risk that the root password of the management server has been exposed or compromised.

How to Regenerate Common Agent Container Security Keys

Steps
  1. On the management server as root, stop the common agent container management daemon.

    If the management server is running Linux:


    # /opt/sun/cacao/bin/cacaoadm stop
    

    If the management server is running the Solaris OS:


    # /opt/SUNWcacao/bin/cacaoadm stop
    
  2. Regenerate security keys using the create-keys subcommand.

    If the management server is running Linux:


    # /opt/sun/cacao/bin/cacaoadm create-keys --force
    

    If the management server is running the Solaris OS:


    # /opt/SUNWcacao/bin/cacaoadm create-keys --force
    
  3. As root on the management server, restart the common agent container management daemon.

    If the management server is running Linux:


    # /opt/sun/cacao/bin/cacaoadm start
    

    If the management server is running the Solaris OS:


    # /opt/SUNWcacao/bin/cacaoadm start
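
The three steps can also be scripted to run unchanged on either platform. A minimal sketch, assuming the default installation paths shown above:

#!/bin/sh
# Regenerate the common agent container security keys; run as root.
# Use whichever cacaoadm binary exists: Linux path first, then Solaris OS path.
if [ -x /opt/sun/cacao/bin/cacaoadm ]; then
    CACAOADM=/opt/sun/cacao/bin/cacaoadm
else
    CACAOADM=/opt/SUNWcacao/bin/cacaoadm
fi
$CACAOADM stop
$CACAOADM create-keys --force
$CACAOADM start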
    

General Security Considerations

The following list provides general security considerations that you should be aware of when you are using the N1 System Manager:

Troubleshooting OS Distributions

This section describes scenarios that cause OS deployment to fail and explains how to correct failures.

Distribution Copy Failures

If the creation of an OS distribution fails with a copying files error, check the size of the ISO image and ensure that it is not corrupted. You might see output similar to the following in the job details:


bash-3.00# /opt/sun/n1gc/bin/n1sh show job 25
Job ID:   25
Date:     2005-07-20T14:28:43-0600
Type:     Create OS Distribution
Status:   Error (2005-07-20T14:29:08-0600)
Owner:    root
Errors:   1
Warnings: 0

Steps
ID     Type             Start                      Completion                 Result
1      Acquire Host     2005-07-20T14:28:43-0600   2005-07-20T14:28:43-0600   Completed
2      Run Command      2005-07-20T14:28:43-0600   2005-07-20T14:28:43-0600   Completed
3      Acquire Host     2005-07-20T14:28:46-0600   2005-07-20T14:28:46-0600   Completed
4      Run Command      2005-07-20T14:28:46-0600   2005-07-20T14:29:06-0600   Error 1

Errors
Error 1:
Description: INFO   : Mounting /images/rhel-3-U4-i386-es-disc1.iso at
/mnt/loop23308
INFO   : Version is 3ES, disc is 1
INFO   : Version is 3ES, disc is 1
INFO   : type redhat ver: 3ES
cp: /var/opt/SUNWscs/data/allstart/image/3ES-bootdisk.img: Bad address
INFO   : Could not copy PXE file bootdisk.img
INFO   : umount_exit: mnt is: /mnt/loop23308
INFO   : ERROR: Could not add floppy to the Distro

Results
Result 1:
Server:   -
Status:   -1
Message:  Creating OS rh30u4-es failed.
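
If you see this error, verify that the ISO image was downloaded completely and is not corrupted before you re-create the distribution. For example, check the file size and checksum (the image path is taken from the job output above; compare the checksum against the value published by the distributor):

# ls -l /images/rhel-3-U4-i386-es-disc1.iso
# md5sum /images/rhel-3-U4-i386-es-disc1.iso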

Patching Solaris 9 Distributions

The inability to deploy Solaris 9 OS distributions to servers from a Linux management server is usually caused by a problem with NFS mounts. To solve this problem, apply patches to the miniroot of the Solaris 9 OS distribution. This section provides instructions for applying the required patches. The instructions differ according to the management and patch server configurations shown in the following table.

Table 6–1 Task Map for Patching a Solaris 9 Distribution

Management Server    Patch Server                      Task
Red Hat 3.0 u2       Solaris 9 OS on x86 platform      To Patch a Solaris 9 OS Distribution by Using a Solaris 9 OS on x86 Patch Server
Red Hat 3.0 u2       Solaris 9 OS on SPARC platform    To Patch a Solaris 9 OS Distribution by Using a Solaris 9 OS on SPARC Patch Server

Using a Provisionable Server to Patch OS Distributions

When you use a patch server to perform the following tasks, you need root access to both the management server and the provisionable server at the same time. For some tasks, you first patch the provisionable server, and then mount the management server and patch the distribution.

To Patch a Solaris 9 OS Distribution by Using a Solaris 9 OS on x86 Patch Server

This procedure describes how to patch a Solaris 9 OS distribution in the N1 System Manager. The steps in this procedure need to be performed on both the patch server and the management server. Consider opening two terminal windows to complete the steps. The following steps first guide you through patching the patch server and then provide steps for patching the distribution.

Before You Begin
Steps
  1. Patch the Solaris 9 OS on x86 patch server.

    1. Log in as root.


      % su
      password:password
      

      The root prompt appears.

    2. Reboot the Solaris 9 patch server to single-user mode.


      # reboot -- -s
      
    3. In single-user mode, change to the patch directory.


      # cd /patch
      
    4. Install the patches.


      # patchadd -M . 117172-17
      # patchadd -M . 117468-02
      

      Tip –

      Pressing Control+D returns you to multiuser mode.


  2. Prepare to patch the distribution on the management server.

    1. Log in to the management server as root.


      % su
      password:password
      

      The root prompt appears.

    2. Edit the /etc/exports file.


      # vi /etc/exports
      
    3. Change /js *(ro,no_root_squash) to /js *(rw,no_root_squash).

    4. Save and close the /etc/exports file.

    5. Restart NFS.


      # /etc/init.d/nfs restart
      
  3. Patch the distribution that you copied to the management server.

    1. Log in to the Solaris 9 patch server as root.


      % su
      password:password
      

      The root prompt appears.

    2. Mount the management server.


      # mount -o rw management-server-IP:/js/DISTRO_ID /mnt
      
    3. Install the patches by performing one of the following actions:

      • If you are patching an x86 distribution, type the following commands:


        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 117172-17
        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 117468-02
        
      • If you are patching a SPARC distribution, type the following commands:


        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 117171-17
        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 117175-02
        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 113318-20
        

        Note –

        You will receive a partial error for the first patch installation. Ignore this error.


    4. Unmount the management server.


      # umount /mnt
      
  4. Restart NFS on the management server.

    1. Edit the /etc/exports file.


      # vi /etc/exports
      
    2. Change /js *(rw,no_root_squash) to /js *(ro,no_root_squash).

    3. Restart NFS.


      # /etc/init.d/nfs restart
      

      NFS is restarted.

      The Solaris 9 OS on SPARC distribution is ready for deployment to target servers.

  5. Fix the Solaris 9 OS on x86 distribution.

    1. Change to /js/DISTRO_ID/Solaris_9/Tools/Boot/boot/solaris.


      # cd /js/DISTRO_ID/Solaris_9/Tools/Boot/boot/solaris
      
    2. Re-create the bootenv.rc link.


      # ln -s ../../tmp/root/boot/solaris/bootenv.rc .
      

      The Solaris 9 OS on x86 distribution is ready for deployment to target servers.

Troubleshooting

If you want to patch another distribution, you might have to delete the /patch/117172-17 directory and re-create it using the unzip 117172-17.zip command. When the first distribution is patched, the patchadd command makes a change to the directory that causes problems with the next patchadd command execution.
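
For example, assuming the patch zip file was downloaded to /patch:

# cd /patch
# rm -rf 117172-17
# unzip 117172-17.zip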

To Patch a Solaris 9 OS Distribution by Using a Solaris 9 OS on SPARC Patch Server

This procedure describes how to patch a Solaris 9 OS distribution in the N1 System Manager. The steps in this procedure need to be performed on the provisionable server and the management server. Consider opening two terminal windows to complete the steps. The following steps first guide you through patching the provisionable server and then provide steps for patching the distribution.

Before You Begin
Steps
  1. Set up and patch the Solaris 9 OS on SPARC machine.

    1. Log in to the Solaris 9 machine as root.


      % su
      password:password
      
    2. Reboot the Solaris 9 machine to single-user mode.


      # reboot -- -s
      
    3. In single-user mode, change to the patch directory.


      # cd /patch
      
    4. Install the patches.


      # patchadd -M . 117171-17
      # patchadd -M . 117175-02
      # patchadd -M . 113318-20
      

      Tip –

      Pressing Control+D returns you to multiuser mode.


  2. Patch the distribution that you copied to the management server.

    1. Log in to the Solaris 9 machine as root.


      % su
      password:password
      
    2. Mount the management server.


      # mount -o rw management-server-IP:/js/DISTRO_ID /mnt
      
    3. Install the patches by performing one of the following actions:

      • If you are patching a Solaris OS on x86 software distribution, type the following commands:


        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 117172-17
        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 117468-02
        
      • If you are patching a Solaris OS on SPARC software distribution, type the following commands:


        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 117171-17
        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 117175-02
        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 113318-20
        

        Note –

        You will receive a partial error for the first patch installation. Ignore this error.


    4. Unmount the management server.


      # umount /mnt
      
  3. Restart NFS on the management server.

    1. Edit the /etc/exports file.


      # vi /etc/exports
      
    2. Change /js *(rw,no_root_squash) to /js *(ro,no_root_squash).

    3. Restart NFS.


      # /etc/init.d/nfs restart
      

      NFS is restarted.

      The Solaris 9 OS on SPARC distribution is ready for deployment to target servers.

  4. Fix the Solaris 9 OS on x86 distribution.

    1. Change to /js/DISTRO_ID/Solaris_9/Tools/Boot/boot/solaris.


      # cd /js/DISTRO_ID/Solaris_9/Tools/Boot/boot/solaris
      
    2. Re-create the bootenv.rc link.


      # ln -s ../../tmp/root/boot/solaris/bootenv.rc .
      

      The Solaris 9 OS on x86 distribution is ready for deployment to target servers.

Troubleshooting

If you want to patch another distribution, you might have to delete the /patch/117172-17 directory and re-create it using the unzip 117172-17.zip command. When the first distribution is patched, the patchadd command makes a change to the directory that causes problems with the next patchadd command execution.

OS Profile Deployment Failures

OS profile deployments might fail if any of the following conditions occur:

Use the following graphic as a guide to troubleshooting best practices. The graphic describes steps to take when you initiate provisioning operations. Taking these steps helps you troubleshoot deployments more efficiently.

[Graphic: troubleshooting steps to take when initiating deployments]

To Modify the Default Solaris OS Profile for a Sun Fire V40z or a SPARC v440 Server

This procedure describes how to modify the Solaris OS profile that is created by default. The following modification is required for successful installation of the default Solaris OS profile on a Sun Fire V40z or a SPARC v440 server.

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Clone the default profile.


    N1-ok> create osprofile sol10v40z clone sol10
    
  3. Remove the root partition.


    N1-ok> remove osprofile sol10v40z partition /
    
  4. Remove the swap partition.


    N1-ok> remove osprofile sol10v40z partition swap
    
  5. Add new root parameters.


    N1-ok> add osprofile sol10v40z partition / device c1t0d0s0 sizeoption free
     type ufs
    
  6. Add new swap parameters.


    N1-ok> add osprofile sol10v40z partition swap device c1t0d0s1 size 2000
     type swap sizeoption fixed
    
See Also

To find out how to load the modified OS profile, see To Load an OS Profile on a Server or a Server Group.

To Modify a Solaris 9 OS Profile for a Sun Fire V20z Server With a K2.0 Motherboard

This procedure describes how to create and add a script to your Solaris OS profile. This script installs the Broadcom 5704 NIC driver needed for Solaris 9 x86 to recognize the NIC Ethernet interface on a Sun Fire V20z server with a K2.0 motherboard. Earlier versions of the Sun Fire V20z server use the K1.0 motherboard. Newer versions use the K2.0 motherboard.


Note –

This patch is needed for K2.0 motherboards but can also be used on K1.0 motherboards without negative consequences.


Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line for details.

  2. Type the following command:


    % /opt/sun/n1gc/bin/n1sh show os
    

    The list of available OS distributions appears. Note the name of the Solaris 9 distribution.

  3. Run the as_distro.pl script, and view the output.


    # /scs/sbin/as_distro.pl -l
    
  4. Note the DISTRO_ID for the Solaris 9 distribution.

    You use this ID in the next step.

  5. Type the following command:


    # mkdir /js/DISTRO_ID/patch
    

    A patch directory is created for the Solaris 9 distribution.

  6. Download the 116666-04 patch from http://sunsolve.sun.com to the /js/DISTRO_ID/patch directory.

  7. Change to the /js/DISTRO_ID/patch directory.


    # cd /js/DISTRO_ID/patch
    
  8. Unzip the patch file.


    # unzip 116666-04.zip
    
  9. Type the following command:


    # mkdir /js/scripts
    
  10. In the /js/scripts directory, create a script called patch_sol9_k2.sh that includes the following three lines:


    #!/bin/sh
    echo "Adding patch for bge devices."
    patchadd -R /a -M /cdrom/patch 116666-04

    Note –

    Ensure the script is executable. You can use the chmod 775 patch_sol9_k2.sh command.


  11. Add the script to the Solaris 9 OS profile.


    N1-ok> add osprofile osprofile-name script /js/scripts/patch_sol9_k2.sh type post
    

Example 6–1 Adding a Script to a Solaris OS Profile

This example shows how to add a script to an OS profile. The type attribute specifies that the script is to be run after the installation.


N1-ok> add osprofile sol9K2 script /js/scripts/patch_sol9_k2.sh 
type post

Next Steps

To load the modified Solaris OS profile, see To Load an OS Profile on a Server or a Server Group.

Solaris Deployment Job Times Out or Stops

If you attempt to load a Solaris OS profile and the OS Deploy job times out or stops, check the output in the job details to ensure that the target server completed a PXE boot. For example:


PXE-M0F: Exiting Broadcom PXE ROM.
      Broadcom UNDI PXE-2.1 v7.5.14
     Copyright (C) 2000-2004 Broadcom Corporation
     Copyright (C) 1997-2000 Intel Corporation
     All rights reserved.      
CLIENT MAC ADDR: 00 09 3D 00 A5 FC  GUID: 68D3BE2E 6D5D 11D8 BA9A 0060B0B36963
     DHCP.

If the PXE boot fails, the /etc/dhcpd.conf file on the management server might not have been set up correctly by the N1 System Manager.
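
You can also ask the DHCP daemon itself to validate the file. On a Linux management server running ISC dhcpd, the DHCP server that ships with Red Hat, a syntax-only check looks like this:

# dhcpd -t -cf /etc/dhcpd.conf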


Note –

The best diagnostic technique is to open a console window on the target machine and then run the deployment. See To Open a Server's Serial Console.


If you suspect that the /etc/dhcpd.conf file was configured incorrectly, complete the following procedure to modify the configuration.

To Modify the Network Interface Configuration

Steps
  1. Log in to the management server as root.

  2. Inspect the dhcpd.conf file for errors.


    # vi /etc/dhcpd.conf
    
  3. If errors exist that need to be corrected, run the following command:


    # /usr/bin/n1smconfig
    

    The n1smconfig utility appears.

  4. Modify the provisioning network interface configuration.

    See Configuring the N1 System Manager System in Sun N1 System Manager 1.1 Installation and Configuration Guide for detailed instructions.

  5. Load the OS profile on the target server.

Solaris OS Profile Installation Fails

OS profiles that install only the Core System Support distribution group do not load successfully. Specify “Entire Distribution plus OEM Support” as the value for the distributiongroup parameter. Doing so configures a profile that installs the needed version of SSH and the other tools that are required for servers to be managed by the N1 System Manager.
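
For example, a hedged sketch of creating such a profile from the command line; the profile name sol10full is illustrative, and the exact parameter spelling should be verified against create osprofile in the Sun N1 System Manager 1.1 Command Line Reference Manual:

N1-ok> create osprofile sol10full distro sol10 distributiongroup "Entire Distribution plus OEM Support"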

Invalid Management Server Netmask

If the target server cannot access DHCP information or mount the distribution directories on the management server during a Solaris 10 deployment, you might have network problems caused by an invalid netmask. The console output might be similar to the following:


Booting kernel/unix...
  krtld: Unused kernel arguments: `install'.
  SunOS Release 5.10 Version Generic 32-bit
  Copyright 1983-2005 Sun Microsystems, Inc.  All rights reserved.
  Use is subject to license terms.
  Unsupported Tavor FW version: expected: 0003.0001.0000, actual: 0002.0000.0000
  NOTICE: tavor0: driver attached (for maintenance mode only)
  Configuring devices.
  Using DHCP for network configuration information.
  Beginning system identification...
  Searching for configuration file(s)...
  Using sysid configuration file /sysidcfg
  Search complete.
  Discovering additional network configuration...
  Completing system identification...
  Starting remote procedure call (RPC) services: done.
  System identification complete.
  Starting Solaris installation program...
  Searching for JumpStart directory...
  /sbin/dhcpinfo: primary interface requested but no primary interface is set
  not found
  Warning: Could not find matching rule in rules.ok
  Press the return key for an interactive Solaris install program...

To fix the problem, set the management server netmask value to 255.255.255.0. See To Configure the Sun N1 System Manager System in Sun N1 System Manager 1.1 Installation and Configuration Guide.
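
To confirm the netmask that is currently configured on the management server's provisioning interface, you can inspect the interface directly (eth0 is only an example interface name):

# ifconfig eth0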

Linux Deployment Stops

If you are deploying a Linux OS and the deployment stops, check the console of the target server to see if the installer is in interactive mode. If the installer is in interactive mode, the deployment timed out because of a delay in the transmission of data from the management server to the target server. This delay usually occurs because the switch or switches connecting the two machines have spanning tree enabled. Either turn off spanning tree on the switch or disable spanning tree for the ports that are connected to the management server and the target server.

If spanning tree is already disabled and OS deployment stops, you may have a problem with your network.
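
The commands for disabling spanning tree are vendor-specific. On a Cisco IOS switch, for example, you might enable PortFast on the ports that face the management server and the target server (the interface name is a placeholder):

switch(config)# interface FastEthernet0/1
switch(config-if)# spanning-tree portfast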

Restarting NFS to Resolve Boot Failed Errors


Error: boot: lookup /js/4/Solaris_10/Tools/Boot failed boot: cannot open kernel/sparcv9/unix

Solution:

The message differs depending on the OS that is being deployed. If the management server cannot access files during a Load OS operation, the cause might be a network problem. To correct the problem, try restarting NFS.

On a Solaris system, type the following:


# /etc/init.d/nfs.server stop
# /etc/init.d/nfs.server start
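
If the system runs Solaris 10, where NFS is managed by the Service Management Facility, the equivalent restart is:

# svcadm restart network/nfs/server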

On a Linux system, type the following:


# /etc/init.d/nfs restart

Resolving wget Command Failures Related to OS Monitoring

You must manually install the wget software if the add server feature osmonitor agentip command fails with the following error: Internal error: wget command failed: /usr/bin/wget -O /tmp/hostinstall.pl http://xx.xx.xx.xx/pub/hostinstall.pl, where xx.xx.xx.xx is the IP address of the machine in question.

Adding the feature might also fail due to stale SSH entries on the management server. If the add server server-name feature osmonitor agentip command fails and no true security breach has occurred, remove the /root/.ssh/known_hosts file or the specific entry in the file that corresponds to the provisionable server. Then, retry the add command.
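
For example, to locate and remove only the stale entry on a Linux management server (GNU sed edits the file in place with -i; substitute the provisionable server's host name or IP address for provisionable-server):

# grep -n provisionable-server /root/.ssh/known_hosts
# sed -i '/provisionable-server/d' /root/.ssh/known_hosts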

Additionally, adding the OS monitoring feature to a server that has the base management feature might fail. The following job output shows the error: Repeat attempts for this operation are not allowed. This error indicates that SSH credentials have previously been supplied and cannot be altered. To avoid this error, issue the add server feature osmonitor command without agentssh credentials. See To Add the OS Monitoring Feature for instructions.


N1-ok> show job 61
Job ID: 61
Date: 2005-08-16T16:14:27-0400
Type: Modify OS Monitoring Support
Status: Error (2005-08-16T16:14:38-0400)
Owner: root
Errors: 1
Warnings: 0

Steps
ID Type Start Completion Result
1 Acquire Host 2005-08-16T16:14:27-0400 2005-08-16T16:14:28-0400 Completed
2 Run Command 2005-08-16T16:14:28-0400 2005-08-16T16:14:28-0400 Completed
3 Acquire Host 2005-08-16T16:14:29-0400 2005-08-16T16:14:30-0400 Completed
4 Run Command 2005-08-16T16:14:30-0400 2005-08-16T16:14:36-0400 Error

Results
Result 1:
Server: 192.168.2.10
Status: -3
Message: Repeat attempts for this operation are not allowed.

OS Update Problems

This section describes possible solutions for the following troubleshooting scenarios:

OS Update Creation Failures

The name that is specified when you create a new OS update must be unique. The OS update itself must also be unique: in addition to the uniqueness of the file name for each OS update, the combination of the internal package name, version, release, and file name must be unique.

For example, if test1.rpm is the source for an RPM named test1, another OS update called test2 cannot have the same file name as test1.rpm. To avoid additional naming issues, do not name an OS update with the same name as the internal package name for any other existing packages on the provisionable server.

You can specify an adminfile value when you create an OS update. For the Solaris OS update packages, a default admin file is located at /opt/sun/n1gc/etc/admin.


mail=
instance=unique
partial=nocheck
runlevel=nocheck
idepend=nocheck
rdepend=nocheck
space=quit
setuid=nocheck
conflict=nocheck
action=nocheck
basedir=default
authentication=nocheck

The default admin file setting used for Solaris package deployments in the N1 System Manager is instance=unique. If you want to report errors for duplicated packages, change the admin file setting to instance=quit. This change causes an error to appear in the Load Update job results if a duplicate package is detected.
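
For example, rather than editing the default admin file in place, you can copy it, change the setting in the copy (the destination path is arbitrary), and then supply the copy as the adminfile value when you create the OS update:

# cp /opt/sun/n1gc/etc/admin /tmp/admin-quit
# vi /tmp/admin-quit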

See the admin(4) man page for detailed information about admin file parameter settings. Type man -s4 admin as the root user on a Solaris system to view the man page.

For Solaris packages, a response file might also be needed. For instructions on how to specify an admin file and a response file when you create an OS update, see To Copy an OS Update.

OS Update Deployment Failures

This section describes troubleshooting scenarios and possible solutions for the following categories of failures:

In the following unload command, update can be either the update name in the list that appears when you type show update all, or the actual package name on the target server.


N1-ok> load server server update update

Always check that the package is targeted at the correct architecture. The N1 System Manager does not distinguish between 32-bit and 64-bit for the Solaris (x86 or SPARC) OS, so a package or patch might not install successfully if it is applied to an incompatible OS. If the package or patch does install successfully but performance decreases, check that the architecture of the patch matches the architecture of the OS.
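
For example, you can inspect the architecture that a package was built for before loading it (test1.rpm is reused from the earlier naming example; SUNWpkg is a placeholder Solaris package name):

# rpm -qp --queryformat '%{ARCH}\n' test1.rpm
# pkginfo -l SUNWpkg | grep ARCH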

The following are common failures that can occur before the job is submitted:


Target server is not initialized

Solution:

Check that the add server feature osmonitor command was issued and that it succeeded.


Another running job on the target server

Solution:

Only one job is allowed at a time on a server. Try again after the job completes.


Update is incompatible with operating system on target server

Solution:

Check that the OS type of the target server matches one of the update OS types. Type show update update-name at the N1-ok> prompt to view the OS type for the update.


Target server is not in a good state or is powered off

Solution:

Check that the target server is up and running. Type show server server-name at the N1-ok> prompt to view the server status. Type reset server server-name force to force a reboot.

The following are possible causes for Load Update job failures:

Sometimes, Load Update jobs fail either because the same package already exists or because a higher version of the package exists. If the job fails, ensure that the package does not already exist on the target server.
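
For example, to check whether the package is already present on the target server (test1 is reused from the earlier naming example):

# rpm -q test1
# pkginfo test1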


error: Failed dependencies:


A prerequisite package is required and should be installed.

Solution:

Use an RPM tool to address and resolve Linux RPM dependencies. For a Solaris system, configure the idepend= parameter in the admin file.


Preinstall or postinstall scripts failure: Non-zero status


pkgadd: ERROR: ... script did not complete successfully

Solution:

Check the preinstall or postinstall scripts for possible errors.


Interactive request script supplied by package

Solution:

This message indicates that the response file is missing or that the setting in the admin file is incorrect. Add a response file to correct this error.


patch-name was installed without backing up the original files

Solution:

This message indicates that the Solaris OS update was installed without backing up the original files. No action is needed.


Insufficient disk space

Solution:

Load Update jobs might fail due to insufficient disk space. Check the available disk space by typing df -k, and check the package size. If the package is too large, free more disk space on the target server.

The following failures can occur when you stop a Load Update or Unload Update job:

If you stop a Load Update or Unload Update job and the job does not stop, manually ensure that the following processes are killed on the management server:


# ps -ef | grep swi_pkg_pusher
# ps -ef | grep pkgadd, pkgrm, scp, ...

Then, check any processes that are running on the provisionable server:


# ps -ef | grep pkgadd, pkgrm, ...

The following are common failures for Unload Server and Unload Group jobs:

The rest of this section provides errors and possible solutions for failures related to the following commands: unload server server-name update update-name and unload group group-name update update-name.


Removal of <SUNWssmu> was suspended (interaction required)

Solution:

This message indicates a failed dependency for uninstalling a Solaris package. Check the admin file setting and provide an appropriate response file.


Job step failure without error details

Solution:

This message might indicate that the job was not successfully started internally. Contact a Sun Service Representative for more information.


Job step failure with vague error details: Connection to 10.0.0.xx

Solution:

This message might indicate that the uninstallation failed because some packages or RPMs were not fully installed. In this case, manually install the package in question on the target server. For example:

To manually install an RPM, type the following command:


# rpm -Uvh rpm-name

To manually install a .pkg file, type the following command:


# pkgadd -d pkg-name -a admin-file

To manually install a patch, type the following command:


# patchadd patch-name

Then, run the unload command again.


Job hangs

Solution:

If the job appears to hang, stop the job and manually kill the remaining processes. For example:

To manually kill the job, type the following command:


# n1sh stop job job-ID

Then, find the PID of the RPM process and kill it by typing the following commands:


# ps -ef | grep rpm-name
# kill rpm-PID

Or, find the PID of the pkgadd process and kill it by typing the following commands:


# ps -ef | grep pkgadd
# kill pkgadd-PID

Then run the unload command again.

Downloading V20z and V40z Server Firmware Updates

This section provides detailed information to help you download and prepare the firmware versions that are required to discover Sun Fire V20z and V40z servers.

To Download and Prepare Sun Fire V20z and V40z Server Firmware

Steps
  1. Log in as root to the N1 System Manager management server.

    The root prompt appears.

  2. Create directories into which the V20z and V40z firmware update zip files are to be saved.

    Create a separate directory for each server type's firmware download. For example:


    # mkdir V20z-firmware V40z-firmware
    
  3. In a web browser, go to http://www.sun.com/servers/entry/v20z/downloads.html.

    The Sun Fire V20z/V40z Server downloads page appears.

  4. Click Current Release.

    The Sun Fire V20z/V40z NSV Bundles 2.3.0.11 page appears.

  5. Click Download.

    The download Welcome page appears. Type your username and password, and then click Login.

    The Terms of Use page appears. Read the license agreement carefully. You must accept the terms of the license to continue and download the firmware. Click Accept and then click Continue.

    The Download page appears. Several downloadable files are displayed.

  6. To download the V20z firmware zip file, click V20z BIOS and SP Firmware, English (nsv-v20z-bios-fw_V2_3_0_11.zip).

    Save the 10.21-Mbyte file to the directory that you created for the V20z firmware in Step 2.

  7. To download the V40z firmware zip file, click V40z BIOS and SP Firmware, English (nsv-v40z-bios-fw_V2_3_0_11.zip).

    Save the 10.22-Mbyte file to the directory you created for the V40z firmware in Step 2.

  8. Change to the directory where you downloaded the V20z firmware file.

    1. Type unzip nsv-v20z-bios-fw_V2_3_0_11.zip to unpack the file.

      Type y to continue.

      The sw_images directory is extracted.

      The following files in the sw_images directory are used by the N1 System Manager to update V20z provisionable server firmware:

      • Service Processor:

        sw_images/sp/spbase/V2.3.0.11/install.image

      • BIOS:

        sw_images/platform/firmware/bios/V2.33.5.2/bios.sp

  9. Change to the directory where you downloaded the V40z firmware zip file.

    1. Type unzip nsv-v40z-bios-fw_V2_3_0_11.zip to unpack the zip file.

      The sw_images directory is extracted.

      The following files in the sw_images directory are used by the N1 System Manager to update V40z provisionable server firmware:

      • Service Processor:

        sw_images/sp/spbase/V2.3.0.11/install.image

      • BIOS:

        sw_images/platform/firmware/bios/V2.33.5.2/bios.sp

Next Steps

Handling Threshold Breaches

If a threshold value is breached for a monitored attribute, an event is generated. You can create notification rules to warn you about this type of event. Threshold breaches and warnings are also recorded in the event log, which is most easily viewed through the browser interface.

Notifications can be created by using the create notification command, and the resulting notification can be sent by email or to a pager. See create notification in Sun N1 System Manager 1.1 Command Line Reference Manual for syntax details.

Identifying Hardware and OS Threshold Breaches

If the value of a monitored hardware health attribute or OS resource utilization attribute breaches a threshold value, an event log entry indicates that the threshold has been breached. The entry becomes available from the browser interface within:

t + polling interval

where t is the time at which the breach occurs, and the polling interval is the number of seconds between successive polls of the monitored attribute. See Setting Polling Intervals for more information. Use the show log command to verify that the event log entry has been generated:


N1-ok> show log
Id            Date                       Severity    Subject     Message
.
. 
10            2004-11-22T01:45:02-0800   WARNING     Sun_V20z_XG041105786
A critical high threshold was violated for server Sun_V20z_XG041105786: Attribute cpu0.vtt-s3 Value 1.32

13            2004-11-22T01:50:08-0800   WARNING     Sun_V20z_XG041105786
A normal low  threshold was violated for server Sun_V20z_XG041105786: Attribute cpu0.vtt-s3 Value 1.2

Identifying Network Connectivity Failure

If the IP addresses of the management server, the monitoring agent, or the data network are unavailable, an event indicates that there is a network connectivity problem. This check is part of network reachability monitoring; see Network Reachability Monitoring for more information. The event log entry becomes available from the browser interface within:

t + polling interval

where t is the time at which the failure occurs, and the polling interval is the number of seconds between successive polls of the monitored attribute. See Setting Polling Intervals for more information. Use the show log command to verify that the event log entry has been generated:


N1-ok> show log
.
.
13            2004-11-19T10:24:33-0800   INFORMATION  Sun_V20z_XGserial_number
Ip Address /<ip_address> on server Sun_V20z_XGserial_number is unreachable.

14            2004-11-19T10:24:38-0800   INFORMATION  Sun_V20z_XGserial_number
Ip Address /<ip_address> on server Sun_V20z_XGserial_number is unreachable.

Identifying Monitoring Failure

If monitoring is enabled, as described in Enabling Monitoring, and the status in the output of the show server or show group command is unknown or unreachable, the server or server group is not being reached successfully for monitoring. If the status remains unknown or unreachable for fewer than five polling intervals, a transient network problem might be the cause. If the status persists for more than five polling intervals, monitoring has probably failed, possibly because of a failure in the monitoring agent.

A time stamp is provided in the monitoring data output. The relationship between this time stamp and the polling interval can also be used to judge whether the monitoring agent has failed. If the monitored output for a provisionable server continues to show the same time stamp after several polling intervals have passed, the provisionable server is not being successfully polled and is no longer being monitored. This, too, could be the result of a failure in the monitoring agent.
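
For example, run show server twice, separated by more than one polling interval, and compare the time stamps in the monitoring output (the server name is taken from the earlier log examples):

N1-ok> show server Sun_V20z_XG041105786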