Sun N1 System Manager 1.3 Troubleshooting Guide

OS Distributions and Deployment

OS distribution creation and OS deployment failures can be caused by many different issues. This section provides guidelines and references to help you resolve each issue.

Possible Causes for Distribution and Deployment Failure

OS deployments might fail or fail to complete if any of the following conditions occur:

Use the following graphic as a guide to troubleshooting best practices. The graphic describes steps to take when you initiate provisioning operations. Taking these steps will help you troubleshoot deployments with greater efficiency.

[Figure: troubleshooting steps to take when initiating deployments]

Solaris Deployment Job Times Out or Stops

If you attempt to load a Solaris OS profile and the OS Deploy job times out or stops, check the output in the job details to ensure that the target server completed a PXE boot. For example:


PXE-M0F: Exiting Broadcom PXE ROM.
      Broadcom UNDI PXE-2.1 v7.5.14
     Copyright (C) 2000-2004 Broadcom Corporation
     Copyright (C) 1997-2000 Intel Corporation
     All rights reserved.      
CLIENT MAC ADDR: 00 09 3D 00 A5 FC  GUID: 68D3BE2E 6D5D 11D8 BA9A 0060B0B36963
     DHCP.

If the PXE boot fails, the N1 System Manager might not have set up the /etc/dhcpd.conf file on the management server correctly.


Note –

The best diagnostic tool is to open a console window on the target machine and then run the deployment. See Connecting to the Serial Console for a Managed Server in Sun N1 System Manager 1.3 Discovery and Administration Guide.
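One quick check before you modify the configuration is to look in /etc/dhcpd.conf for the client MAC address that the PXE output reports. The following is a minimal sketch only. It assumes a POSIX shell, and it assumes that the DHCP configuration identifies the target server by its MAC address in colon-separated form. The function names normalize_mac and check_dhcp_entry are hypothetical and are not part of the N1 System Manager.

```shell
# Convert a MAC address as printed by the PXE ROM ("00 09 3D 00 A5 FC")
# to colon-separated lowercase form ("00:09:3d:00:a5:fc").
normalize_mac() {
    printf '%s\n' "$1" | tr 'A-F' 'a-f' | tr -s ' ' ':'
}

# Report whether a DHCP configuration file mentions the given MAC address.
# $1 = MAC address from the PXE output
# $2 = configuration file (assumed default: /etc/dhcpd.conf)
check_dhcp_entry() {
    mac=$(normalize_mac "$1")
    conf=${2:-/etc/dhcpd.conf}
    if grep -i "$mac" "$conf" >/dev/null 2>&1; then
        echo "found $mac in $conf"
    else
        echo "no entry for $mac in $conf"
    fi
}
```

For example, running check_dhcp_entry "00 09 3D 00 A5 FC" as root on the management server reports whether the address from the sample output above is present in the file. A missing entry is only a hint that the file needs attention, not proof of the cause.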


If you suspect that the /etc/dhcpd.conf file was configured incorrectly, complete the following procedure to modify the configuration.

To Modify the Network Interface Configuration

Steps
  1. Log in to the management server as root.

  2. Inspect the dhcpd.conf file for errors.


    # vi /etc/dhcpd.conf
    
  3. If errors exist that need to be corrected, run the following command:


    # /usr/bin/n1smconfig
    

    The n1smconfig utility appears.

  4. Modify the provisioning network interface configuration.

    See Configuring the N1 System Manager in Sun N1 System Manager 1.3 Installation and Configuration Guide for detailed instructions.

  5. Load the OS profile on the target server.
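Because this procedure changes the DHCP configuration, it can help to keep a copy of the file as it was before the change, so that the two versions can be compared if the deployment still fails. The following sketch assumes a POSIX shell. The helper name backup_conf and the .bak naming scheme are hypothetical.

```shell
# Save a timestamped copy of a file and print the name of the copy.
# $1 = file to copy, for example /etc/dhcpd.conf
backup_conf() {
    src=$1
    dst="$src.$(date +%Y%m%d%H%M%S).bak"
    cp -p "$src" "$dst" && echo "$dst"
}

# Possible usage on the management server, as root:
#   saved=$(backup_conf /etc/dhcpd.conf)
#   /usr/bin/n1smconfig
#   diff "$saved" /etc/dhcpd.conf
```

The diff output then shows only the lines that changed after the backup was taken.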

Solaris OS Profile Installation Fails

OS profiles that install only the Core System Support distribution group do not load successfully. Specify “Entire Distribution plus OEM Support” as the value for the distributiongroup parameter. Doing so configures a profile that will install the needed version of SSH and other tools that are required for servers to be managed by the N1 System Manager.

Deploying Solaris OS 9 Update 7 or Lower from a Linux Management Server Fails

Failure to deploy Solaris 9 Update 7 or earlier OS distributions from a Linux management server is usually due to a problem with NFS mounts. To solve this problem, apply a patch to the mini-root of the Solaris 9 OS distribution. The instructions differ according to the management server and patch server configurations listed in the following table. The patch is not required if you are deploying Solaris 9 Update 8 or later.

Table 3–1 Task Map for Patching a Solaris 9 Distribution

Management Server   Patch Server                     Task
Red Hat 3.0 u2      Solaris 9 OS on x86 platform     To Patch a Solaris 9 OS Distribution by Using a Solaris 9 OS on an x86 Patch Server
Red Hat 3.0 u2      Solaris 9 OS on SPARC platform   To Patch a Solaris 9 OS Distribution by Using a Solaris 9 OS on a SPARC Patch Server

Red Hat Linux OS Profile Creation Fails

Building Red Hat OS profiles on the N1 System Manager might require additional analysis to avoid failures. If you have a problem with a custom OS profile, perform the following steps while the problem deployment is still active.

  1. Log in to the management server as root.

  2. Run the following command:


    # cat /var/opt/sun/scs/share/allstart/config/ks*cfg > failed_ks_cfg

The failed_ks_cfg file will contain all of the KickStart parameters, including those that you customized. Verify that the parameters stated in the configuration file are appropriate for the current hardware configuration. Correct any errors and try the deployment again.
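The settings that most often conflict with the hardware are the disk layout, boot loader, and network directives. The following sketch lists those lines from the collected file so that you can review them first. It assumes a POSIX shell and a grep that supports the -E option. The function name show_hw_directives is hypothetical, and the list of directives is a starting point rather than a complete set.

```shell
# Print, with line numbers, the Kickstart directives that describe
# disk layout, boot loader, and network settings.
# $1 = file to inspect (assumed default: failed_ks_cfg in the current directory)
show_hw_directives() {
    grep -n -E '^[[:space:]]*(clearpart|part|partition|raid|bootloader|network)[[:space:]]' "${1:-failed_ks_cfg}"
}
```

Because failed_ks_cfg is a concatenation of every ks*cfg file, the line numbers refer to positions in the combined file, not in the individual configuration files.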

Linux Deployment Stops

If you are deploying a Linux OS and the deployment stops, check the console of the target server to see whether the installer is in interactive mode. If it is, the deployment timed out because of a delay in the transmission of data from the management server to the target server. This delay usually occurs because the switch or switches that connect the two machines have spanning tree enabled. Either turn off spanning tree on the switch, or disable spanning tree for the ports that are connected to the management server and the target server.

If spanning tree is already disabled and OS deployment stops, there may be a problem with your network.


Note –

For Red Hat installations to work with some networking configurations, you must enable spanning tree.


Windows Deployment Fails

Provisioning a Windows distribution to a managed server can fail for several reasons:

Invalid Management Server Netmask

If the target server cannot access DHCP information or mount the distribution directories on the management server during a Solaris 10 deployment, you might have network problems caused by an invalid netmask. The console output might be similar to the following:


Booting kernel/unix...
  krtld: Unused kernel arguments: `install'.
  SunOS Release 5.10 Version Generic 32-bit
  Copyright 1983-2005 Sun Microsystems, Inc.  All rights reserved.
  Use is subject to license terms.
  Unsupported Tavor FW version: expected: 0003.0001.0000, actual: 0002.0000.0000
  NOTICE: tavor0: driver attached (for maintenance mode only)
  Configuring devices.
  Using DHCP for network configuration information.
  Beginning system identification...
  Searching for configuration file(s)...
  Using sysid configuration file /sysidcfg
  Search complete.
  Discovering additional network configuration...
  Completing system identification...
  Starting remote procedure call (RPC) services: done.
  System identification complete.
  Starting Solaris installation program...
  Searching for JumpStart directory...
  /sbin/dhcpinfo: primary interface requested but no primary interface is set
  not found
  Warning: Could not find matching rule in rules.ok
  Press the return key for an interactive Solaris install program...

To fix the problem, set the management server netmask value to 255.255.255.0. See To Configure the N1 System Manager in Sun N1 System Manager 1.3 Installation and Configuration Guide.
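On a Solaris system, ifconfig reports the netmask in hexadecimal form (for example, ffffff00), which makes an incorrect value easy to overlook. The following sketch converts such a value to dotted-decimal form so that it can be compared with 255.255.255.0. It assumes a POSIX shell. The function name hex_to_dotted is hypothetical.

```shell
# Convert an 8-digit hexadecimal netmask, as shown by Solaris ifconfig,
# to dotted-decimal notation.
# $1 = netmask in hexadecimal, without a leading 0x (for example, ffffff00)
hex_to_dotted() {
    hex=$1
    o1=$(printf '%s' "$hex" | cut -c1-2)
    o2=$(printf '%s' "$hex" | cut -c3-4)
    o3=$(printf '%s' "$hex" | cut -c5-6)
    o4=$(printf '%s' "$hex" | cut -c7-8)
    printf '%d.%d.%d.%d\n' "0x$o1" "0x$o2" "0x$o3" "0x$o4"
}
```

For example, hex_to_dotted ffffff00 prints 255.255.255.0, and hex_to_dotted ffff0000 prints 255.255.0.0, a value that does not match the netmask recommended above.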

OS Distribution Creation Fails with a Copying Files Error

If the creation of an OS distribution fails with a copying files error, check the size of the ISO image and ensure that it is not corrupted. You might see output similar to the following in the job details:


bash-3.00# /opt/sun/n1gc/bin/n1sh show job 25
Job ID:   25
Date:     2005-07-20T14:28:43-0600
Type:     Create OS Distribution
Status:   Error (2005-07-20T14:29:08-0600)
Command:  create os RedHat file /images/rhel-3-U4-i386-es-disc1.iso
Owner:    root
Errors:   1
Warnings: 0

Steps
ID     Type             Start                      Completion                 Result
1      Acquire Host     2005-07-20T14:28:43-0600   2005-07-20T14:28:43-0600   Completed
2      Run Command      2005-07-20T14:28:43-0600   2005-07-20T14:28:43-0600   Completed
3      Acquire Host     2005-07-20T14:28:46-0600   2005-07-20T14:28:46-0600   Completed
4      Run Command      2005-07-20T14:28:46-0600   2005-07-20T14:29:06-0600   Error 1

Errors
Error 1:
Description: INFO   : Mounting /images/rhel-3-U4-i386-es-disc1.iso at /mnt/loop23308
INFO   : Version is 3ES, disc is 1
INFO   : Version is 3ES, disc is 1
INFO   : type redhat ver: 3ES
cp: /var/opt/SUNWscs/data/allstart/image/3ES-bootdisk.img: Bad address
INFO   : Could not copy PXE file bootdisk.img
INFO   : umount_exit: mnt is: /mnt/loop23308
INFO   : ERROR: Could not add floppy to the Distro

Results
Result 1:
Server:   -
Status:   -1
Message:  Creating OS rh30u4-es failed.

In the above case, try copying a different set of distribution files to the management server. See To Copy an OS Distribution From CDs or a DVD in Sun N1 System Manager 1.3 Operating System Provisioning Guide or To Copy an OS Distribution From ISO Files in Sun N1 System Manager 1.3 Operating System Provisioning Guide.
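Before you copy a new set of files, you can confirm whether the existing ISO image is corrupted by comparing its checksum with the value published by the OS vendor. The following sketch assumes a Linux management server on which the md5sum command is available, and it assumes that the vendor publishes an MD5 checksum for the image. The function name verify_iso is hypothetical.

```shell
# Compare the MD5 checksum of an ISO image with a published value.
# $1 = path to the ISO image
# $2 = expected MD5 checksum in hexadecimal
verify_iso() {
    actual=$(md5sum < "$1" | cut -d' ' -f1)
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 (got $actual)"
        return 1
    fi
}
```

A mismatch means that the image differs from the vendor's copy and should be replaced. A match suggests that the cause lies elsewhere, for example in the mount point issue described in the next section.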

Mount Point Issues

Distribution copy failures might also occur if file systems are mounted on the /mnt mount point. Move all file systems off the /mnt mount point before you attempt create os command operations.
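To see which file systems are currently mounted at or below /mnt, you can filter the output of the df command. The following sketch assumes a df command that supports the POSIX -P output format, in which the mount point is the last field on each line. The function name mounts_under_mnt is hypothetical.

```shell
# Read "df -P" output on standard input and print every mount point
# that is /mnt itself or lies below /mnt.
mounts_under_mnt() {
    awk 'NR > 1 && ($NF == "/mnt" || index($NF, "/mnt/") == 1) { print $NF }'
}

# Possible usage on the management server:
#   df -P | mounts_under_mnt
```

Run the check when no create os job is active. As the job output in the previous section shows, the job itself temporarily mounts the ISO image under /mnt.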

OS Deployment Fails on a V20z or V40z With internal error Message

If OS deployment fails on a V20z or a V40z with the internal error occurred message provided in the job results, direct the platform console output to the service processor. If the platform console output cannot simply be directed to the service processor, reboot the service processor. To reboot the service processor, log on to the service processor and run the sp reboot command.

To check the console output, log on to the service processor, and run the platform console command. Examine the output during OS deployment to resolve the problem.

Restarting NFS to Resolve Boot Failed Errors


Error: boot: lookup /js/4/Solaris_10/Tools/Boot failed boot: cannot open kernel/sparcv9/unix

Solution:

The message differs depending on the OS that is being deployed. If the management server cannot access files during a Load OS operation, a network problem might be the cause. Restarting NFS might correct the problem.

On a Solaris system, type the following:


# svcadm restart network/nfs/server

On a Linux system, type the following:


# /etc/init.d/nfs restart
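If you maintain both Solaris and Linux management servers, a small helper can select the appropriate restart command. The following sketch assumes a POSIX shell, and a Solaris 10 system on which the NFS server runs as the SMF service network/nfs/server. Note that svcadm takes the restart subcommand before the service name. The function name nfs_restart_command is hypothetical.

```shell
# Print the NFS restart command for the given OS name.
# $1 = OS name as reported by "uname -s" (defaults to the local system)
nfs_restart_command() {
    os=${1:-$(uname -s)}
    case "$os" in
        SunOS) echo "svcadm restart network/nfs/server" ;;
        Linux) echo "/etc/init.d/nfs restart" ;;
        *)     echo "unsupported OS: $os" >&2; return 1 ;;
    esac
}
```

The function only prints the command. Review the output, and then run the command as root.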