Sun N1 System Manager 1.3.1 Troubleshooting Guide

Chapter 7 OS Update Problems

This chapter describes the most common OS update problems, their causes, and the solution for each problem. The topics are OS update creation, OS update uninstallation on a managed server, and Solaris and Linux OS update deployment.

OS Update Creation Fails

The file name that is specified when you create a new OS update must be unique. The combination of the internal package name, version, release, and file name that make up the OS update also needs to be unique.

For example, if test1.rpm is the source for an RPM named test1, another OS update called test2 cannot also use the file name test1.rpm. To avoid additional naming issues, do not give an OS update the same name as the internal package name of any other existing package on the manageable server.

You can specify an adminfile value when you create an OS update. For the Solaris OS update packages, a default admin file is located at /opt/sun/n1gc/etc/admin.


mail=
instance=unique
partial=nocheck
runlevel=nocheck
idepend=nocheck
rdepend=nocheck
space=quit
setuid=nocheck
conflict=nocheck
action=nocheck
basedir=default
authentication=nocheck

If you use an adminfile to install an OS update, ensure that the package file name matches the name of the package. If the file name does not match that of the package, and an adminfile is used to install the OS update, a later attempt to uninstall the OS update will fail. See OS Update Uninstall on a Managed Server Fails.

The default admin file setting used for Solaris OS package deployments in the N1 System Manager is instance=unique. If you want to report errors for duplicated packages, change the admin file setting to instance=quit. This change causes an error to appear in the Load Update job results if a duplicate package is detected.
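The instance=quit change can be scripted against a copy of the admin file. The following is a minimal sketch, not a product procedure: the helper name set_admin_param and the path /tmp/admin.quit are hypothetical, and the sketch assumes an awk that supports the -v option (on Solaris, nawk or /usr/xpg4/bin/awk rather than the older /usr/bin/awk).

```shell
#!/bin/sh
# Sketch: set one key=value parameter in a copy of a Solaris admin file.
# set_admin_param is a hypothetical helper, not part of N1 System Manager.

set_admin_param() {
    # $1 = admin file to edit, $2 = parameter name, $3 = new value
    file=$1
    key=$2
    value=$3
    tmp="$file.tmp.$$"
    # Replace the existing line for the key, tolerating leading blanks.
    # Append the parameter if the file does not define it yet.
    awk -v k="$key" -v v="$value" '
        BEGIN { found = 0 }
        {
            line = $0
            sub(/^[ \t]+/, "", line)
            if (index(line, k "=") == 1) { print k "=" v; found = 1 }
            else { print $0 }
        }
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example use (not run here); /tmp/admin.quit is a placeholder path:
#   cp /opt/sun/n1gc/etc/admin /tmp/admin.quit
#   set_admin_param /tmp/admin.quit instance quit
```

Editing a copy leaves the default admin file untouched, so other OS updates that rely on instance=unique are not affected.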

See the admin(4) man page for detailed information about admin file parameter settings. Type man -s4 admin as root user on a Solaris OS system to view the man page.

For Solaris OS packages, a response file might also be needed. For instructions on how to specify an admin file and a response file when you create an OS update, see To Copy an OS Update in Sun N1 System Manager 1.3 Operating System Provisioning Guide.

OS Update Uninstall on a Managed Server Fails

If you cannot uninstall an OS update that was installed on a managed server using an adminfile, the likely cause is that the package file name does not match the name of the package. Log in to the managed server, use the pkginfo command to display the package name, and compare the package name to the name of the package file.

For example:


# ls package-file-name
package-file-name
# pkginfo -d ./package-file-name
application package-name     package-information
# pkginfo -d ./package-file-name | /usr/bin/awk '{print $2}'
package-name

# cp package-file-name new-file-name
# pkginfo -d ./new-file-name
application package-name     package-information
# pkginfo -d ./new-file-name | /usr/bin/awk '{print $2}'
package-name
   

If the name of the package file is not the same as the package name, rename the adminfile in the manageable server's /tmp directory to match the name of the package and try the unload command again. If the package still exists, remove it from the manageable server by using pkgrm.
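The name comparison can be automated on the managed server. The following is a minimal sketch: the helper names pkg_name_from_pkginfo and names_match are hypothetical, and the sketch assumes that pkginfo -d prints the package name in the second column of its first line.

```shell
#!/bin/sh
# Sketch: compare a package file name with the package name that pkginfo reports.
# Both helpers are hypothetical names used only for illustration.

# Print the package name (second column) from `pkginfo -d file` output on stdin.
pkg_name_from_pkginfo() {
    awk 'NF >= 2 { print $2; exit }'
}

# Succeed when the base name of the file equals the package name.
names_match() {
    # $1 = path to the package file, $2 = package name
    [ "$(basename "$1")" = "$2" ]
}

# Example use on a managed server (not run here):
#   name=$(pkginfo -d ./package-file-name | pkg_name_from_pkginfo)
#   names_match ./package-file-name "$name" || echo "mismatch: package is $name"
```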

Solaris OS Update Deployment Failures

This section provides solutions for the following categories of failures that can occur during Solaris OS update deployment:

Failures That Occur Before the Job is Submitted

The following common failures can occur before the job is submitted:


Target server is not initialized

Solution:

Check whether the add server feature osmonitor command was issued and whether it succeeded.


Another running job on the target server

Solution:

Only one job is allowed at a time on a server. Try again after the job completes.


Update is incompatible with operating system on target server

Solution:

Check whether the OS type of the target server matches one of the update OS types. Type show update update-name at the N1-ok> prompt to view the OS type for the update.


Target server is not in a good state or is powered off

Solution:

Check whether the target server is up and running. Type show server server-name at the N1-ok> prompt to view the server status. Type reset server server-name force to force a reboot.

Load Update Job Failures

The following are possible causes for Load Update job failures:


error: Failed dependencies:


A prerequisite package should be installed

Description:

Load Update jobs sometimes fail because the same package, or a later version of the package, already exists on the target server. If the job fails, check whether the package is already installed on the target server.

Solution:

For a Solaris OS system, configure the idepend= parameter in the admin file.


Preinstall or postinstall scripts failure: Non-zero status


pkgadd: ERROR: ... script did not complete successfully

Solution:

Check the preinstallation and postinstallation scripts for errors, and correct any that you find.


Interactive request script supplied by package

Solution:

This message indicates that the response file is missing or that the setting in the admin file is incorrect. Add a response file to correct this error.


patch-name was installed without backing up the original files

Solution:

This message indicates that the Solaris OS update was installed without backing up the original file. No action needs to be taken.


Insufficient diskspace

Solution:

Load Update jobs might fail due to insufficient disk space. Check the available disk space by typing df -k, and compare it with the package size. If the package does not fit, free up disk space on the target server.
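The comparison between free space and package size can be sketched in shell. This is an illustration only: the helper names avail_kb_from_df and enough_space are hypothetical, /var is just an example directory, and the sketch assumes the usual df -k column order with a mount point that contains no spaces.

```shell
#!/bin/sh
# Sketch: compare free space reported by `df -k` with a required size in KB.

# Read `df -k` output on stdin and print the available kilobytes.
# Assumes the column order: file system, kbytes, used, avail, capacity,
# mount point. Counting from the end of the last line also handles output
# that wraps a long file system name onto its own line.
avail_kb_from_df() {
    awk 'NF >= 3 { v = $(NF - 2) } END { print v }'
}

# Succeed when at least $1 kilobytes are available according to stdin.
enough_space() {
    avail=$(avail_kb_from_df)
    [ "$avail" -ge "$1" ]
}

# Example use on the target server (not run here); the paths are placeholders:
#   need=$(du -sk ./package-file-name | awk '{ print $1 }')
#   df -k /var | enough_space "$need" || echo "not enough space in /var"
```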

Unload Update Job Failures

In the following unload command, update could be either the update name that appears in the list when you type show update all, or the actual package name on the target server.


N1-ok> unload server server update update

Always check whether the package is targeted to the correct architecture.


Note –

The N1 System Manager does not distinguish 32-bit from 64-bit for the Solaris OS on either the x86 or SPARC platforms, so the package or patch might not install successfully if it is installed on an incompatible OS.


If the package or patch does install successfully but performance decreases, check whether the architecture of the patch matches the architecture of the OS. 32-bit packages will run under a 64-bit operating system, but this mismatch will decrease performance.

Stop Job Failures for Load and Unload Update

The following are stop job failures for loading and unloading update operations:

If you stop a Load Update or Unload Update job and the job does not stop, manually ensure that the following processes are killed on the management server:


# ps -ef |grep swi_pkg_pusher
# ps -ef |grep pkgadd, pkgrm, scp, ...

Then, check any processes that are running on the manageable server:


# ps -ef |grep pkgadd, pkgrm, ...
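The process checks above can be combined into one filter over ps -ef output. The following is a minimal sketch: the helper name pids_matching is hypothetical, the process names are only the ones listed above, and the sketch assumes an awk that supports the -v option (on Solaris, nawk or /usr/xpg4/bin/awk).

```shell
#!/bin/sh
# Sketch: list the PIDs of leftover update processes from `ps -ef` output.

# Read `ps -ef` output on stdin and print the PID (second column) of every
# line that matches the extended regular expression in $1. Lines that mention
# grep or awk are skipped so that the search does not report itself.
pids_matching() {
    awk -v pat="$1" '$0 ~ pat && $0 !~ /grep|awk/ { print $2 }'
}

# Example use (not run here):
#   On the management server:
#     ps -ef | pids_matching 'swi_pkg_pusher|pkgadd|pkgrm|scp'
#   On the manageable server:
#     ps -ef | pids_matching 'pkgadd|pkgrm'
```

Review the reported PIDs before passing any of them to kill, because the pattern matches anywhere on the line.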

The following are common failures for Unload Server and Unload Group jobs:

Unload Server and Unload Group Failures

The following are possible causes for failures related to the commands unload server server-name update update-name and unload group group-name update update-name.


Removal of <SUNWssmu> was suspended (interaction required)

Solution:

This message indicates a failed dependency for uninstalling a Solaris OS package. Check the admin file setting and provide an appropriate response file.


Job step failure without error details

Solution:

This message might indicate that the job was not successfully started internally. Contact a Sun Service Representative for more information.


Job step failure with vague error details: Connection to 10.0.0.xx

Solution:

This message might indicate that the uninstall failed because some packages were not fully installed. In this case, manually install the package in question on the target server. For example:

To manually install a .pkg file, type the following command:


# pkgadd -d pkg-name -a admin-file

To manually install a patch, type the following command:


# patchadd -d patch-name -a admin-file

Then, run the unload command again.


Job hangs

Solution:

If the job appears to hang, type n1sh stop job job-ID to stop the job.

Next, find the PID of the package process and use the kill command to stop it. The pkill command matches process names rather than PIDs, so use kill when you specify a PID. For example:


# ps -ef | grep pkgadd
root 1235 913 0 Jun 07 ? 0:0 pkgadd
# kill 1235

Last, run the unload command again.

Linux OS Update Deployment Failures

This section provides solutions for the following categories of failures that can occur during Linux OS update deployment:

Failures That Occur Before the Job is Submitted

The following common failures can occur before the job is submitted:


Target server is not initialized

Solution:

Check whether the add server feature osmonitor command was issued and whether it succeeded.


Another running job on the target server

Solution:

Only one job is allowed at a time on a server. Try again after the job completes.


Update is incompatible with operating system on target server

Solution:

Check whether the OS type of the target server matches one of the update OS types. Type show update update-name at the N1-ok> prompt to view the OS type for the update.


Target server is not in a good state or is powered off

Solution:

Check whether the target server is up and running. Type show server server-name at the N1-ok> prompt to view the server status. Type reset server server-name force to force a reboot.

Load Update Job Failures

The following are possible causes for Load Update job failures:


error: Failed dependencies:


A prerequisite package should be installed

Description:

Load Update jobs sometimes fail because the same package, or a later version of the package, already exists on the target server. If the job fails, check whether the package is already installed on the target server.

Solution:

Use an RPM tool to address and resolve Linux OS RPM dependencies.
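When the job output contains a Failed dependencies report, the missing capabilities can be pulled out of it before you look for the RPMs that provide them. The following is a minimal sketch: the helper name missing_deps is hypothetical, and the sketch assumes the usual rpm report lines of the form "capability is needed by package".

```shell
#!/bin/sh
# Sketch: extract missing capabilities from an rpm "Failed dependencies" report.

# Read rpm error output on stdin and print the first word of each line of the
# form "<capability> is needed by <package>". For a versioned requirement such
# as "baz >= 2.0", only the capability name is printed.
missing_deps() {
    awk '/ is needed by / { print $1 }'
}

# Example use on the target server (not run here); --test checks the
# installation without changing the system:
#   rpm -Uvh --test rpm-name 2>&1 | missing_deps
```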


Preinstall or postinstall scripts failure: Non-zero status


ERROR: ... script did not complete successfully

Solution:

Check the preinstallation and postinstallation scripts for errors, and correct any that you find.


Insufficient diskspace

Solution:

Load Update jobs might fail due to insufficient disk space. Check the available disk space by typing df -k, and compare it with the package size. If the package does not fit, free up disk space on the target server.

Unload Update Job Failures

In the following unload command, update could be either the update name that appears in the list when you type show update all, or the actual package name on the target server.


N1-ok> unload server server update update

Stop Job Failures for Load and Unload Update

The following are stop job failures for loading or unloading update operations:

If you stop a Load Update or Unload Update job and the job does not stop, manually ensure that the following processes are killed on the management server:


# ps -ef |grep swi_pkg_pusher
# ps -ef |grep rpm

Then, check any processes that are running on the manageable server:


# ps -ef |grep rpm, ...

Unload Server and Unload Group Failures

The following are possible causes for failures related to the commands unload server server-name update update-name and unload group group-name update update-name.


Job step failure without error details

Solution:

This message might indicate that the job was not successfully started internally. Contact a Sun Service Representative for more information.


Job step failure with vague error details: Connection to 10.0.0.xx

Solution:

This message might indicate that the uninstall failed because some RPMs were not fully installed. In this case, manually install the package in question on the target server. For example:

To manually install an RPM, type the following command:


# rpm -Uvh rpm-name

Then, run the unload command again.


Job hangs

Solution:

If the job appears to hang, type n1sh stop job job-ID to stop the job.

Next, find the PID of the RPM process and use the kill command to stop it. The pkill command matches process names rather than PIDs, so use kill when you specify a PID. For example:


# ps -ef | grep rpm-name
root 1235 913 0 Jun 07 ? 0:0 rpm-name
# kill 1235

Then run the unload command again.