OS distribution creation and OS deployment failures can be caused by many different issues. This section provides guidelines and references to help you resolve each issue.
The following topics are discussed:
Deploying Solaris OS 9 Update 7 or Lower from a Linux Management Server Fails
OS Deployment Fails on a V20z or V40z With internal error Message
OS deployments might fail or fail to complete if any of the following conditions occur:
The target RSC technology server was not powered off before discovery was run. RSC servers must remain powered off until discovery is complete and discovery has been confirmed by using the show server command. See Discovery of RSC Servers.
Partitions are not modified to suit a Sun Fire V40z or SPARC V440 server. See To Modify the Default Solaris OS Profile for a Sun Fire V40z or a SPARC V440 Server in Sun N1 System Manager 1.3 Operating System Provisioning Guide.
Scripts are not modified to install the driver needed to recognize the Ethernet interface on a Sun Fire V20z server. See To Modify a Solaris 9 OS Profile for a Sun Fire V20z Server With a K2.0 Motherboard in Sun N1 System Manager 1.3 Operating System Provisioning Guide.
DHCP is not correctly configured. See Solaris Deployment Job Times Out or Stops.
The OS profile installs only the Solaris Core System Support distribution group. See Solaris OS Profile Installation Fails.
The target server cannot access DHCP information or mount distribution directories. See Invalid Management Server Netmask.
The management server cannot access files during a Load OS operation. See Restarting NFS to Resolve Boot Failed Errors.
The Linux deployment stops. See Linux Deployment Stops.
The Red Hat deployment fails. See Red Hat Linux OS Profile Creation Fails.
Use the following graphic as a guide to troubleshooting best practices. The graphic describes steps to take when you initiate provisioning operations. Taking these steps will help you troubleshoot deployments with greater efficiency.
If you attempt to load a Solaris OS profile and the OS Deploy job times out or stops, check the output in the job details to ensure that the target server completed a PXE boot. For example:
PXE-M0F: Exiting Broadcom PXE ROM.
Broadcom UNDI PXE-2.1 v7.5.14
Copyright (C) 2000-2004 Broadcom Corporation
Copyright (C) 1997-2000 Intel Corporation
All rights reserved.
CLIENT MAC ADDR: 00 09 3D 00 A5 FC GUID: 68D3BE2E 6D5D 11D8 BA9A 0060B0B36963
DHCP.
If the PXE boot fails, the /etc/dhcpd.conf file on the management server might not have been set up correctly by the N1 System Manager.
The best diagnostic tool is to open a console window on the target machine and then run the deployment. See Connecting to the Serial Console for a Managed Server in Sun N1 System Manager 1.3 Discovery and Administration Guide.
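The check of the job details described above can be roughly scripted. The following is a minimal sketch, assuming the job output has already been saved to a plain-text file. The function name `pxe_boot_seen` and the saved-file approach are hypothetical and not part of the N1 System Manager. Finding the banner shows only that the PXE ROM started on the target, not that the whole boot succeeded.

```shell
# Sketch: report whether saved job output contains the PXE client banner.
# Assumption: the job details were saved to a text file beforehand.
pxe_boot_seen() {
    job_output_file=$1
    # The PXE ROM prints the client MAC address before it requests DHCP.
    if grep -q 'CLIENT MAC ADDR:' "$job_output_file"; then
        echo "PXE banner found"
        return 0
    fi
    echo "PXE banner not found"
    return 1
}
```

If the banner is missing, the DHCP configuration on the management server is a reasonable next place to look.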
If you suspect that the /etc/dhcpd.conf file was configured incorrectly, complete the following procedure to modify the configuration.
Log in to the management server as root.
Inspect the dhcpd.conf file for errors.
# vi /etc/dhcpd.conf
If errors exist that need to be corrected, run the following command:
# /usr/bin/n1smconfig
The n1smconfig utility appears.
Modify the provisioning network interface configuration.
See Configuring the N1 System Manager in Sun N1 System Manager 1.3 Installation and Configuration Guide for detailed instructions.
Load the OS profile on the target server.
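When inspecting dhcpd.conf in the procedure above, it can help to list the subnets that the file declares and compare them with the provisioning network. The following is a small sketch, assuming the file uses the standard ISC DHCP `subnet <network> netmask <mask>` declaration form. The function name `list_dhcp_subnets` is hypothetical.

```shell
# Sketch: print "<network> <netmask>" for each subnet declaration.
# Assumption: declarations use the form "subnet A.B.C.D netmask W.X.Y.Z {".
list_dhcp_subnets() {
    conf_file=${1:-/etc/dhcpd.conf}
    # Commented-out lines are skipped because their first field is not "subnet".
    awk '$1 == "subnet" && $3 == "netmask" { print $2, $4 }' "$conf_file"
}
```

An empty result, or a subnet that does not match the provisioning interface, suggests that the file needs to be regenerated with n1smconfig.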
OS profiles that install only the Core System Support distribution group do not load successfully. Specify “Entire Distribution plus OEM Support” as the value for the distributiongroup parameter. Doing so configures a profile that will install the needed version of SSH and other tools that are required for servers to be managed by the N1 System Manager.
The inability to deploy Solaris 9 Update 7 or lower OS distributions to servers from a Linux management server is usually due to a problem with NFS mounts. To solve this problem, you need to apply a patch to the mini-root of the Solaris 9 OS distribution. The instructions differ according to the management and patch server configuration scenarios in the following table. The patch is not required if you are deploying Solaris 9 Update 8 or later.
Table 3–1 Task Map for Patching a Solaris 9 Distribution
Management Server | Patch Server | Task |
---|---|---|
Red Hat 3.0 u2 | Solaris 9 OS on x86 platform | To Patch a Solaris 9 OS Distribution by Using a Solaris 9 OS on an x86 Patch Server |
Red Hat 3.0 u2 | Solaris 9 OS on SPARC platform | To Patch a Solaris 9 OS Distribution by Using a Solaris 9 OS on a SPARC Patch Server |
Building Red Hat OS profiles on the N1 System Manager might require additional analysis to avoid failures. If you have a problem with a custom OS profile, perform the following steps while the problem deployment is still active.
Log in to the management server as root.
Run the following command:
# cat /var/opt/sun/scs/share/allstart/config/ks*cfg > failed_ks_cfg
The failed_ks_cfg file will contain all of the KickStart parameters, including those that you customized. Verify that the parameters stated in the configuration file are appropriate for the current hardware configuration. Correct any errors and try the deployment again.
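To focus the review on the settings that most often depend on hardware, the collected file can be filtered for a few standard KickStart directives. The following is a sketch only. The function name `show_ks_hardware_settings` is hypothetical, and the directive list (`part`, `partition`, `clearpart`, `bootloader`, `network`) is an assumption about what is worth checking first, not a complete list.

```shell
# Sketch: show partitioning, boot loader, and network directives
# from the collected KickStart parameters.
show_ks_hardware_settings() {
    ks_file=${1:-failed_ks_cfg}
    grep -E '^[[:space:]]*(part|partition|clearpart|bootloader|network)[[:space:]]' "$ks_file"
}
```

For example, running `show_ks_hardware_settings failed_ks_cfg` prints only the matching lines, which can then be compared with the disks and network interfaces that the target server actually has.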
If you are deploying a Linux OS and the deployment stops, check the console of the target server to see if the installer is in interactive mode. If the installer is in interactive mode, the deployment timed out because of a delay in the transmission of data from the management server to the target server. This delay usually occurs because the switch or switches connecting the two machines have spanning tree enabled. Either turn off spanning tree on the switch or disable spanning tree for the ports that are connected to the management server and the target server.
If spanning tree is already disabled and OS deployment stops, there may be a problem with your network.
For Red Hat installations to work with some networking configurations, you must enable spanning tree.
Provisioning a Windows distribution to a managed server can fail for several reasons:
The Windows operating system might not be compatible with the managed server. For a list of qualified servers, see Manageable Server Requirements in Sun N1 System Manager 1.3 Site Preparation Guide.
The SSH entries for that managed server in the known_hosts file on the management server might be stale or obsolete. Determine the managed server name and IP address, and then remove the entry for that managed server from the known_hosts file as described in To Update the ssh_known_hosts File.
The product key is unique to each release of the Windows OS. To ensure that the correct product key applies, either modify the OS profile to include the correct product key or use the productkey attribute on the load server command.
If you encounter a TFTP error when loading the OS profile, the GUID is likely incorrect. To find the GUID of a system, use the Pre-Boot eXecution Environment (PXE) to boot the system.
If Linux was installed previously on the managed server, Windows will ask about partitions the first time that you try to install Windows on the system. To resolve this issue, delete the partitions on the console, or wipe out the first part of the disk before you install Windows.
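For the stale known_hosts case listed above, the removal of an entry can be sketched as follows. This is an illustration under assumptions, not the documented procedure: it assumes unhashed entries whose first field is a comma-separated list of host names and addresses, and the function name `drop_known_host` is hypothetical. Follow To Update the ssh_known_hosts File where the two differ.

```shell
# Sketch: remove every known_hosts line that names the given host or IP.
# Assumption: entries are not hashed; field 1 is "name[,address...]".
drop_known_host() {
    host=$1
    known_hosts_file=$2
    tmp_file=$(mktemp) || return 1
    awk -v h="$host" '{
        n = split($1, names, ",")
        for (i = 1; i <= n; i++) if (names[i] == h) next   # skip matching line
        print
    }' "$known_hosts_file" > "$tmp_file" && cat "$tmp_file" > "$known_hosts_file"
    rm -f "$tmp_file"
}
```

Run the function once for the server name and once for its IP address if the two are recorded on separate lines.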
If the target server cannot access DHCP information or mount the distribution directories on the management server during a Solaris 10 deployment, you might have network problems caused by an invalid netmask. The console output might be similar to the following:
Booting kernel/unix...
krtld: Unused kernel arguments: `install'.
SunOS Release 5.10 Version Generic 32-bit
Copyright 1983-2005 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Unsupported Tavor FW version: expected: 0003.0001.0000, actual: 0002.0000.0000
NOTICE: tavor0: driver attached (for maintenance mode only)
Configuring devices.
Using DHCP for network configuration information.
Beginning system identification...
Searching for configuration file(s)...
Using sysid configuration file /sysidcfg
Search complete.
Discovering additional network configuration...
Completing system identification...
Starting remote procedure call (RPC) services: done.
System identification complete.
Starting Solaris installation program...
Searching for JumpStart directory...
/sbin/dhcpinfo: primary interface requested but no primary interface is set
not found
Warning: Could not find matching rule in rules.ok
Press the return key for an interactive Solaris install program...
To fix the problem, set the management server netmask value to 255.255.255.0. See To Configure the N1 System Manager in Sun N1 System Manager 1.3 Installation and Configuration Guide.
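To see why a wrong netmask cuts the target off from the management server, it can help to test whether two addresses fall in the same subnet under a given mask. The following is a minimal sketch. The function names `ip_to_int` and `same_subnet` are hypothetical, and the addresses in the usage note are made up for illustration.

```shell
# Sketch: convert a dotted-quad IPv4 address to a single integer.
# The function body runs in a subshell so the IFS change does not leak.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeeds when both addresses share the same network under the netmask.
same_subnet() {
    addr_a=$(ip_to_int "$1")
    addr_b=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( addr_a & mask )) -eq $(( addr_b & mask )) ]
}
```

For example, `same_subnet 192.168.1.10 192.168.1.200 255.255.255.0` succeeds, while the same two addresses with a mask of 255.255.255.128 do not share a subnet, so the call fails.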
If the creation of an OS distribution fails with a copying files error, check the size of the ISO image and ensure that it is not corrupted. You might see output similar to the following in the job details:
bash-3.00# /opt/sun/n1gc/bin/n1sh show job 25
Job ID: 25
Date: 2005-07-20T14:28:43-0600
Type: Create OS Distribution
Status: Error (2005-07-20T14:29:08-0600)
Command: create os RedHat file /images/rhel-3-U4-i386-es-disc1.iso
Owner: root
Errors: 1
Warnings: 0
Steps
ID Type Start Completion Result
1 Acquire Host 2005-07-20T14:28:43-0600 2005-07-20T14:28:43-0600 Completed
2 Run Command 2005-07-20T14:28:43-0600 2005-07-20T14:28:43-0600 Completed
3 Acquire Host 2005-07-20T14:28:46-0600 2005-07-20T14:28:46-0600 Completed
4 Run Command 2005-07-20T14:28:46-0600 2005-07-20T14:29:06-0600 Error 1
Errors
Error 1:
Description: INFO : Mounting /images/rhel-3-U4-i386-es-disc1.iso at /mnt/loop23308
INFO : Version is 3ES, disc is 1
INFO : Version is 3ES, disc is 1
INFO : type redhat ver: 3ES
cp: /var/opt/SUNWscs/data/allstart/image/3ES-bootdisk.img: Bad address
INFO : Could not copy PXE file bootdisk.img
INFO : umount_exit: mnt is: /mnt/loop23308
INFO : ERROR: Could not add floppy to the Distro
Results
Result 1:
Server: -
Status: -1
Message: Creating OS rh30u4-es failed.
In the above case, try copying a different set of distribution files to the management server. See To Copy an OS Distribution From CDs or a DVD in Sun N1 System Manager 1.3 Operating System Provisioning Guide or To Copy an OS Distribution From ISO Files in Sun N1 System Manager 1.3 Operating System Provisioning Guide.
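Before copying a new set of files, the suspect ISO image can be checked against a known checksum. The following is a sketch, assuming a Linux management server where `md5sum` is available and assuming that a reference MD5 value for the image has been obtained from the distribution vendor. The function name `iso_matches_md5` is hypothetical.

```shell
# Sketch: succeed only if the ISO exists, is non-empty, and matches the MD5.
# Assumption: expected_md5 comes from a trusted source for this image.
iso_matches_md5() {
    iso_file=$1
    expected_md5=$2
    if [ ! -s "$iso_file" ]; then
        echo "missing or empty: $iso_file"
        return 1
    fi
    actual_md5=$(md5sum "$iso_file" | awk '{ print $1 }')
    [ "$actual_md5" = "$expected_md5" ]
}
```

A mismatch suggests a truncated or corrupted image, in which case the image should be downloaded or copied again.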
Distribution copy failures might also occur if there are file systems on the /mnt mount point. Move all file systems off of the /mnt mount point before attempting create os command operations.
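One way to spot file systems on /mnt before running create os is to scan the mount table. The following is a sketch, assuming a table in which the second field is the mount point, as in /proc/mounts on Linux or /etc/mnttab on Solaris. The function name `mounts_under_mnt` is hypothetical. It lists anything mounted at /mnt or beneath it, which is slightly broader than the condition described above.

```shell
# Sketch: print mount points that are /mnt itself or a path below it.
# Assumption: field 2 of the mount table is the mount point.
mounts_under_mnt() {
    mount_table=${1:-/proc/mounts}
    awk '$2 == "/mnt" || index($2, "/mnt/") == 1 { print $2 }' "$mount_table"
}
```

If the function prints anything, unmount or relocate those file systems before retrying the operation.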
If OS deployment fails on a V20z or a V40z with the internal error occurred message provided in the job results, direct the platform console output to the service processor. If the platform console output cannot simply be directed to the service processor, reboot the service processor. To reboot the service processor, log on to the service processor and run the sp reboot command.
To check the console output, log on to the service processor, and run the platform console command. Examine the output during OS deployment to resolve the problem.
Error: boot: lookup /js/4/Solaris_10/Tools/Boot failed
boot: cannot open kernel/sparcv9/unix
Solution: The message differs depending on the OS that is being deployed. If the management server cannot access files during a Load OS operation, a network problem might be the cause. To try to correct the problem, restart NFS.
On a Solaris system, type the following:
# svcadm restart network/nfs/server
On a Linux system, type the following:
# /etc/init.d/nfs restart
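The choice between the two restart commands can be sketched as a small helper that selects by operating system name. This is an illustration only. The function name `nfs_restart_command` is hypothetical, the Solaris branch assumes a release that provides SMF and uses the `svcadm restart <FMRI>` form with the NFS server service, and the helper prints the command rather than running it.

```shell
# Sketch: print the NFS restart command for the given OS name.
# Defaults to the name reported by "uname -s" on the current system.
nfs_restart_command() {
    os_name=${1:-$(uname -s)}
    case $os_name in
        SunOS) echo "svcadm restart network/nfs/server" ;;
        Linux) echo "/etc/init.d/nfs restart" ;;
        *)     echo "unsupported OS: $os_name" >&2; return 1 ;;
    esac
}
```

Printing the command first makes it easy to review what will run as root on the management server.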