Sun N1 System Manager 1.3.1 Troubleshooting Guide

Chapter 6 OS Distribution and Deployment Problems

This chapter lists problems that can occur during OS distribution creation and OS deployment, along with their causes and solutions. The following topics are discussed:

Possible Causes for Distribution and Deployment Failure

OS deployments might fail or fail to complete if any of the following conditions occur:

Possible Causes for Windows Distribution and Deployment Failure

Provisioning a Windows distribution to a managed server can fail for several reasons:

Deploying Solaris OS 9 Update 7 or Previous Distributions From a Linux OS Management Server Fails

The inability to deploy Solaris OS 9 Update 7 and previous Solaris OS 9 distributions to manageable servers from a Linux OS management server is usually due to a problem with NFS mounts. To solve this problem, you need to apply a patch to the mini-root of the Solaris OS 9 distribution. The instructions differ according to the management and patch server configuration scenarios listed in the following table. The patch is not required if you are deploying Solaris OS 9 Update 8 or later.

Table 6–1 Task Map for Patching a Solaris OS 9 Distribution

Management Server   Patch Server                       Task
Red Hat 3.0 u2      Solaris OS 9 on x86 platform       Creating and Using a Solaris OS 9 x86 Patch Server to Patch Solaris OS 9 Update 7 Distributions
Red Hat 3.0 u2      Solaris OS 9 on SPARC platform     Creating and Using a Solaris OS 9 SPARC Patch Server to Patch Solaris OS 9 Update 7 Distributions

Invalid Management Server Netmask

If the target server cannot access DHCP information or mount the distribution directories on the management server during a Solaris OS 10 deployment, you might have network problems caused by an invalid netmask. The console output might be similar to the following:


Booting kernel/unix...
  krtld: Unused kernel arguments: `install'.
  SunOS Release 5.10 Version Generic 32-bit
  Copyright 1983-2005 Sun Microsystems, Inc.  All rights reserved.
  Use is subject to license terms.
  Unsupported Tavor FW version: expected: 0003.0001.0000, actual: 0002.0000.0000
  NOTICE: tavor0: driver attached (for maintenance mode only)
  Configuring devices.
  Using DHCP for network configuration information.
  Beginning system identification...
  Searching for configuration file(s)...
  Using sysid configuration file /sysidcfg
  Search complete.
  Discovering additional network configuration...
  Completing system identification...
  Starting remote procedure call (RPC) services: done.
  System identification complete.
  Starting Solaris installation program...
  Searching for JumpStart directory...
  /sbin/dhcpinfo: primary interface requested but no primary interface is set
  not found
  Warning: Could not find matching rule in rules.ok
  Press the return key for an interactive Solaris install program...

To fix the problem, set the management server netmask value to 255.255.255.0. See To Configure the N1 System Manager in Sun N1 System Manager 1.3 Installation and Configuration Guide.
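
Before changing the configuration, it can help to confirm which netmask is actually in effect on the management server. Solaris ifconfig prints the netmask in hexadecimal (for example ffffff00), whereas Linux prints dotted decimal. The following sketch converts the hexadecimal form so that it can be compared with 255.255.255.0. The function name hex_to_dotted is illustrative and is not part of N1 System Manager.

```shell
#!/bin/sh
# Convert an 8-digit hexadecimal netmask, as printed by Solaris ifconfig
# (for example ffffff00), to dotted-decimal notation.
# hex_to_dotted is an illustrative helper, not an N1 System Manager command.
hex_to_dotted() {
    hex=$1
    o1=$(printf '%s' "$hex" | cut -c1-2)
    o2=$(printf '%s' "$hex" | cut -c3-4)
    o3=$(printf '%s' "$hex" | cut -c5-6)
    o4=$(printf '%s' "$hex" | cut -c7-8)
    # printf accepts C-style hexadecimal constants for %d.
    printf '%d.%d.%d.%d\n' "0x$o1" "0x$o2" "0x$o3" "0x$o4"
}

# Example: compare a hexadecimal netmask with the value recommended here.
mask=$(hex_to_dotted ffffff00)
if [ "$mask" = "255.255.255.0" ]; then
    echo "netmask OK: $mask"
else
    echo "unexpected netmask: $mask"
fi
```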

Linux OS Deployment Stops

If you are deploying a Linux OS and the deployment stops, check the console of the target server to see whether the installer is in interactive mode. If the installer is in interactive mode, the deployment timed out because of a delay in the transmission of data from the management server to the target server. This delay usually occurs because the switch or switches connecting the two machines have spanning tree enabled. Either turn off spanning tree on the switch, or disable spanning tree for the ports that are connected to the management server and the target server.

If spanning tree is already disabled and OS deployment stops, a problem might exist with your network.


Note –

For Red Hat installations to work with some networking configurations, you must enable spanning tree.


Management Server Reboots During load os Operations

If the IP address range specified for discovery includes the management server IP addresses, and the management server service processor port is connected to the management network, the discovery process discovers the management server. Subsequently, it is possible that a load os operation that includes the discovered management server will attempt to load an OS to the management server, thus causing the management server to reboot.

    Solution:

  1. Disconnect the management server's service processor port from the management network.

  2. Delete the management server from the list of discovered servers in N1 System Manager.
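
To avoid rediscovering the management server later, you can check a planned discovery range against the management server addresses before running discovery. The following is a minimal sketch that assumes plain dotted-decimal IPv4 addresses without leading zeros. The function names and the addresses are made up for illustration.

```shell
#!/bin/sh
# Convert a dotted-decimal IPv4 address to a single integer for comparison.
# Assumes octets are written without leading zeros.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into four positional parameters
    IFS=$old_ifs
    echo $(( ($1 * 16777216) + ($2 * 65536) + ($3 * 256) + $4 ))
}

# Succeed (exit status 0) if address $1 lies within the range $2..$3.
ip_in_range() {
    ip=$(ip_to_int "$1")
    lo=$(ip_to_int "$2")
    hi=$(ip_to_int "$3")
    [ "$ip" -ge "$lo" ] && [ "$ip" -le "$hi" ]
}

# Example with made-up addresses: does the discovery range cover the
# management server's service processor?
if ip_in_range 192.168.0.10 192.168.0.1 192.168.0.100; then
    echo "range includes the management server; narrow the range"
fi
```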

Mount Point Issues

Distribution copy failures might also occur if there are file systems on the /mnt mount point. Move all file systems off the /mnt mount point before attempting create os command operations.
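
One way to see whether anything is mounted on /mnt is to read the mount table. The sketch below relies on the second field being the mount point, which holds for /etc/mnttab on Solaris and for /etc/mtab and /proc/mounts on Linux. The function name fs_on_mnt is illustrative.

```shell
#!/bin/sh
# Print every mount-table entry whose mount point is exactly /mnt.
# The second whitespace-separated field is the mount point in
# /etc/mnttab (Solaris), /etc/mtab and /proc/mounts (Linux).
# fs_on_mnt is an illustrative helper name.
fs_on_mnt() {
    awk '$2 == "/mnt" { print $1 " is mounted on " $2 }' "$1"
}

# Use whichever mount table exists on this management server.
for table in /etc/mnttab /etc/mtab /proc/mounts; do
    if [ -r "$table" ]; then
        fs_on_mnt "$table"
        break
    fi
done
```

If the script prints nothing, no file system is mounted directly on /mnt.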

OS Deployment Fails on a Sun Fire V20z or V40z With internal error Message

If OS deployment fails on a Sun Fire V20z or a V40z server with the internal error occurred message provided in the job results, direct the platform console output to the service processor. If the platform console output cannot be directed to the service processor, reboot the service processor. To reboot the service processor, log in to the service processor and run the sp reboot command.

To check the console output, log in to the service processor, and run the platform console command. Examine the output during OS deployment to resolve the problem.

OS Deployment Fails on a Sun Blade X8400 Server Blade That Has Correct Firmware

Provisioning an OS distribution to a Sun Blade X8400 server blade will fail if the following conditions are met:

To provision an OS distribution to a Sun Blade X8400 server blade, use either of the following two methods:

For further information about the bootnetworkdevice and the networkdevice options, see Chapter 3, Provisioning Sun Blade X8400 Server Modules in the Sun Blade 8000 Chassis, in Sun N1 System Manager 1.3.1 What’s New.

OS Distribution Creation Fails With a Copying Files Error

If the creation of an OS distribution fails with a copying files error, check the size of the ISO image and ensure that it is not corrupted. You might see output similar to the following example in the job details:


bash-3.00# /opt/sun/n1gc/bin/n1sh show job 25
Job ID:   25
Date:     2005-07-20T14:28:43-0600
Type:     Create OS Distribution
Status:   Error (2005-07-20T14:29:08-0600)
Command:  create os RedHat file /images/rhel-3-U4-i386-es-disc1.iso
Owner:    root
Errors:   1
Warnings: 0

Steps
ID     Type             Start                      Completion                 Result
1      Acquire Host     2005-07-20T14:28:43-0600   2005-07-20T14:28:43-0600   Completed
2      Run Command      2005-07-20T14:28:43-0600   2005-07-20T14:28:43-0600   Completed
3      Acquire Host     2005-07-20T14:28:46-0600   2005-07-20T14:28:46-0600   Completed
4      Run Command      2005-07-20T14:28:46-0600   2005-07-20T14:29:06-0600   Error 1

Errors
Error 1:
Description: INFO   : Mounting /images/rhel-3-U4-i386-es-disc1.iso at /mnt/loop23308
INFO   : Version is 3ES, disc is 1
INFO   : Version is 3ES, disc is 1
INFO   : type redhat ver: 3ES
cp: /var/opt/SUNWscs/data/allstart/image/3ES-bootdisk.img: Bad address
INFO   : Could not copy PXE file bootdisk.img
INFO   : umount_exit: mnt is: /mnt/loop23308
INFO   : ERROR: Could not add floppy to the Distro

Results
Result 1:
Server:   -
Status:   -1
Message:  Creating OS rh30u4-es failed.

In the above case, try copying a different set of distribution files to the management server. See To Copy an OS Distribution From CDs or a DVD in Sun N1 System Manager 1.3 Operating System Provisioning Guide or To Copy an OS Distribution From ISO Files in Sun N1 System Manager 1.3 Operating System Provisioning Guide.
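
Before copying a new set of files, you can check whether the existing ISO image is intact by comparing its checksum with the value published by the OS vendor. The following sketch assumes that the md5sum utility is installed on the management server. The function name iso_md5_matches and the EXPECTED_MD5 placeholder are illustrative.

```shell
#!/bin/sh
# Succeed if the MD5 checksum of file $1 equals the expected value $2.
# Assumes the md5sum utility (GNU coreutils) is installed; on a Solaris 10
# management server, "digest -a md5 FILE" reports the same checksum.
# iso_md5_matches is an illustrative helper name.
iso_md5_matches() {
    actual=$(md5sum "$1" | awk '{ print $1 }')
    [ -n "$actual" ] && [ "$actual" = "$2" ]
}

# Usage (the path is taken from the job output in this section; replace
# EXPECTED_MD5 with the value from the vendor's published checksum list):
#   ls -l /images/rhel-3-U4-i386-es-disc1.iso
#   iso_md5_matches /images/rhel-3-U4-i386-es-disc1.iso EXPECTED_MD5 \
#       && echo "checksum matches" || echo "image is corrupted or incomplete"
```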

Red Hat Linux OS Deployment Fails on a Sun Blade X8400 Server Blade With Factory-Default Firmware or After a Firmware Update

Linux OS deployment fails after some Sun Blade X8400 server blade BIOS firmware upgrades and on factory default blades. Some BIOS firmware upgrades may cause CMOS checksum errors, prompting you to restore the default CMOS settings after the server blade resets. The default BIOS settings will not work with Linux.

To resolve this problem:

  1. Connect to the server blade service processor, either by logging in directly to the SP or by using connect server from the N1 System Manager browser interface.

  2. Press F2 during the boot sequence to enter the BIOS setup.

  3. Press F9 to load the optimal defaults.

  4. Navigate to Advanced settings.

    1. Choose ACPI Configuration.

    2. Choose Advanced ACPI Configuration.

    3. Set ACPI MCFG Table Select: to No.

  5. Navigate back to Advanced settings.

  6. Set AMD PowerNow Select to Enabled.

  7. Press F10 to save the settings and reboot.

Red Hat Linux OS Profile Creation Fails

Building Red Hat OS profiles on the N1 System Manager might require additional analysis to avoid failures. If you have a problem with a custom OS profile, perform the following steps while the problem deployment is still active.

  1. Log in to the management server as root.

  2. Run the following command:


    # cat /var/opt/sun/scs/share/allstart/config/ks*cfg > failed_ks_cfg

The failed_ks_cfg file will contain all of the KickStart parameters, including those that you customized. Verify that the parameters stated in the configuration file are appropriate for the current hardware configuration. Correct any errors and try the deployment again.
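
When reviewing failed_ks_cfg, it can be quicker to start with the directives that most often depend on the target hardware. The following sketch filters for disk layout, boot loader, and network directives. The directive list is only a starting point, and the function name ks_hw_lines is illustrative.

```shell
#!/bin/sh
# Print the Kickstart directives that most often depend on the target
# hardware: disk layout, boot loader, and network settings.
# ks_hw_lines is an illustrative helper name; the directive list is a
# starting point, not an exhaustive set.
ks_hw_lines() {
    grep -E '^[[:space:]]*(clearpart|part|partition|raid|bootloader|network)[[:space:]]' "$1"
}

# Usage, after creating failed_ks_cfg as described in this section:
#   ks_hw_lines failed_ks_cfg
```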

Restarting NFS to Resolve Boot Failed Errors

Boot Failed messages occur when the management server cannot access files during a Load OS operation, and appear similar to the following example.


Error: boot: lookup /js/4/Solaris_10/Tools/Boot failed
boot: cannot open kernel/sparcv9/unix

Note –

The message differs depending on the OS that is being deployed.


Stale NFS file handles are the most common cause of this problem. Log in to the management server as root (su - root) and restart NFS.
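
The restart command depends on the management server OS. The following sketch prints a typical command for Solaris 10 and for Red Hat Linux. These commands are common defaults rather than values taken from this guide; on SuSE Linux, for example, the init script is usually named nfsserver instead of nfs. The function name nfs_restart_cmd is illustrative.

```shell
#!/bin/sh
# Print the NFS server restart command that is typical for the given
# operating system name (as reported by "uname -s").
# The commands are common defaults, not taken from the N1 System Manager
# documentation; nfs_restart_cmd is an illustrative helper name.
nfs_restart_cmd() {
    case "$1" in
        SunOS) echo "svcadm restart svc:/network/nfs/server" ;;
        Linux) echo "/etc/init.d/nfs restart" ;;
        *)     echo "unknown OS: $1" >&2; return 1 ;;
    esac
}

# Usage: show the command for this host, review it, then run it as root.
#   nfs_restart_cmd "$(uname -s)"
```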

Solaris OS Deployment Job Times Out or Stops

If you attempt to load a Solaris OS profile and the OS Deploy job times out or stops, check the output in the job details to ensure that the target server completed a PXE boot. For example:


PXE-M0F: Exiting Broadcom PXE ROM.
      Broadcom UNDI PXE-2.1 v7.5.14
     Copyright (C) 2000-2004 Broadcom Corporation
     Copyright (C) 1997-2000 Intel Corporation
     All rights reserved.      
CLIENT MAC ADDR: 00 09 3D 00 A5 FC  GUID: 68D3BE2E 6D5D 11D8 BA9A 0060B0B36963
     DHCP.
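
If you have saved the job details or console output to a file, a quick search for the PXE ROM banner shows whether the target server reached the PXE stage. The following is a minimal sketch; the function name pxe_started and the log file name are illustrative.

```shell
#!/bin/sh
# Read console or job output on standard input and report whether the
# target server got as far as the PXE ROM banner shown in this section.
# pxe_started is an illustrative helper name.
pxe_started() {
    grep -q 'CLIENT MAC ADDR:'
}

# Usage with a saved console log (the file name is an example):
#   pxe_started < console.log && echo "PXE boot started" \
#       || echo "no PXE banner found"
```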

If the PXE boot fails, the /etc/dhcpd.conf file on the management server might have erroneous network interface connection entries, which can occur if incorrect information is specified during the N1 System Manager configuration process.


Note –

The best diagnostic approach is to open a console window on the target machine and then run the deployment. See Connecting to the Serial Console for a Managed Server in Sun N1 System Manager 1.3 Discovery and Administration Guide.


If you suspect that the /etc/dhcpd.conf file was configured incorrectly, perform the following steps to modify the configuration.

  1. Log in to the management server as root (su - root).

  2. Inspect the dhcpd.conf file for errors.

  3. If errors exist that need to be corrected, stop N1 System Manager and rerun the configuration process as described in Configuring the N1 System Manager in Sun N1 System Manager 1.3 Installation and Configuration Guide. Ensure that you specify the correct management server Ethernet port for the N1 System Manager management network and provisioning network.

  4. When the configuration process has completed and N1 System Manager has restarted, load the OS profile on the target server.
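
For the inspection in Step 2, listing the subnet declarations makes it easier to compare the file with the intended provisioning network. The following sketch assumes that the file uses ISC dhcpd syntax, in which a subnet is declared as "subnet ADDRESS netmask MASK {". The function name dhcpd_subnets is illustrative. If the installed server is ISC dhcpd, running dhcpd -t -cf /etc/dhcpd.conf also checks the file syntax without starting the service.

```shell
#!/bin/sh
# List the subnet declarations in an ISC dhcpd configuration file so that
# they can be compared with the provisioning network settings.
# dhcpd_subnets is an illustrative helper name.
dhcpd_subnets() {
    awk '$1 == "subnet" && $3 == "netmask" { print $2 "/" $4 }' "$1"
}

# Usage on the management server:
#   dhcpd_subnets /etc/dhcpd.conf
```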

Solaris OS Profile Installation Fails

OS profiles that install only the Core System Support distribution group do not load successfully. Specify “Entire Distribution plus OEM Support” as the value for the distributiongroup parameter. This setting configures a profile that will install the needed version of SSH and other tools that are required for servers to be managed by the N1 System Manager.

SuSE OS Profile Fails to Load on a Sun Fire V20z or Sun Fire V40z

Loading a SuSE OS profile on a Sun Fire X4000 series server modifies the associated SuSE OS distribution, which makes the SuSE OS distribution unusable by Sun Fire V20z and V40z servers.

To avoid this problem, create separate SuSE Linux Enterprise Server 9 and SuSE Linux Enterprise Server 9 SP1 OS distributions and profiles: one set for the Sun Fire V20z and V40z servers, and another set for the Sun Fire X4000 series servers.

Windows Deployment Fails after Upgrade from N1 System Manager 1.3 to 1.3.1

The upgrade from N1 System Manager 1.3 to 1.3.1 does not update the scripts and drivers in the C:\N1SM directory on the Windows RIS server. To update the RIS server for N1 System Manager 1.3.1, you must perform the following tasks:

  1. Delete the C:\N1SM directory on the RIS server.

  2. Delete the RIS server from N1 System Manager.

  3. Re-add the RIS server to N1 System Manager.

The following procedure provides the specific steps required to update the RIS server for N1 System Manager 1.3.1.

Procedure: To Update the RIS Server After Upgrading to N1 System Manager 1.3.1

  1. Log in to the RIS server using an account with administrative privileges.

    Delete the C:\N1SM directory.

    If you specified a different file directory on the RIS server when running N1 System Manager 1.3 n1smconfig, delete that directory.

    The C:\N1SM directory is re-created on the RIS server when you re-add the RIS server to N1 System Manager.

  2. Log in to the management server as root.

  3. Delete the RIS server from N1 System Manager as follows.

    1. Type n1smconfig.

      The current N1 System Manager configuration is displayed.


      Tip –

      Print the current configuration to use as reference in the following steps.


      You are notified that only options that can be changed will be displayed.

    2. Type y to continue.

      Respond to each prompt as appropriate for your network and N1 System Manager configuration.

    3. Type y when prompted Add, Delete, or Modify Windows RIS server? ([n]/y)

      The current RIS server configuration is displayed again, for example:


      Add, Delete, or Modify Windows RIS server? ([n]/y) y
      CURRENT RIS Servers:
      ID: 1
              Name: default
              HostName:
              IP: 192.168.0.100
              Subnet_Address: 192.168.0.0
              OSP_Location: C:\\\\N1SM
              RIS_Share_Path: D:\\RemoteInstall
              Active_dir_domain: mularis.sfbay.sun.com
              Active_dir_user: n1smuser
              ssh_user: n1smuser
      
      Delete this RIS server? ([n]/y)

    4. Type y to delete the RIS server from N1 System Manager.

      Respond to the remaining prompts as appropriate for your network and N1 System Manager configuration.

  4. Add the RIS server to N1 System Manager as follows.

    1. Type n1smconfig.

      The current N1 System Manager configuration is displayed.

      You are notified that only options that can be changed will be displayed.

    2. Type y to continue.

      Respond to each prompt as appropriate for your network and N1 System Manager configuration.

    3. Type y when prompted Add, Delete, or Modify Windows RIS server? ([n]/y).

      Respond to each prompt, specifying the values that were displayed in Step 3, substep 1.

      After RIS server configuration is completed, respond to the remaining prompts as appropriate for your network and N1 System Manager configuration.