Sun N1 System Manager 1.3 Troubleshooting Guide

Chapter 4 Problem Resolution Procedures

This section provides the procedures for resolving N1 System Manager problems. Many of the procedures provide solutions for more than one problem.

The following topics are discussed:

Checking for OS Monitoring Agents

Adding the OS monitoring feature to a managed server that has the base management feature installed might fail. The following job output shows the error:


N1-ok> show job 61
Job ID: 61
Date: 2005-08-16T16:14:27-0400
Type: Modify OS Monitoring Support
Status: Error (2005-08-16T16:14:38-0400)
Command: add server 192.168.2.10 feature osmonitor agentssh root/rootpasswd
Owner: root
Errors: 1
Warnings: 0

Steps
ID Type Start Completion Result
1 Acquire Host 2005-08-16T16:14:27-0400 2005-08-16T16:14:28-0400 Completed
2 Run Command 2005-08-16T16:14:28-0400 2005-08-16T16:14:28-0400 Completed
3 Acquire Host 2005-08-16T16:14:29-0400 2005-08-16T16:14:30-0400 Completed
4 Run Command 2005-08-16T16:14:30-0400 2005-08-16T16:14:36-0400 Error

Results
Result 1:
Server: 192.168.2.10
Status: -3
Message: Repeat attempts for this operation are not allowed.

This error indicates that SSH credentials have previously been supplied and cannot be altered. To avoid this error, issue the add server feature osmonitor command without agentssh credentials.

Use the grep command as follows to determine whether the OS monitoring agents were successfully installed.

Downloading ALOM 1.5 Firmware Updates

This section provides detailed information to help you download and prepare the firmware versions that are required to discover Sun servers that use ALOM 1.5.

Procedure: To Download and Prepare ALOM 1.5 Firmware

Steps
  1. Log in as root to the N1 System Manager management server.

    The N1-ok prompt appears.

  2. Create directories into which the ALOM firmware update zip files are to be saved.

    Create separate directories for each server type firmware download. For example:


    # mkdir ALOM-firmware
    
  3. In a web browser, go to http://jsecom16.sun.com/ECom/EComActionServlet?StoreId=8.

    The downloads page appears.

  4. To download the ALOM 1.5 firmware zip file, log in and navigate to the ALOM 1.5, All Platforms/SPARC, English, Download link.

    Download the file to the directory you created for the ALOM firmware in Step 2.

  5. Change to the directory where you downloaded the ALOM firmware file and untar the file.


    bash-3.00# tar xvf ALOM_1.5.3_fw.tar
    x README, 9186 bytes, 18 tape blocks
    x copyright, 93 bytes, 1 tape blocks
    x alombootfw, 161807 bytes, 317 tape blocks
    x alommainfw, 5015567 bytes, 9797 tape blocks

    The files are extracted.

Downloading V20z and V40z Server Firmware Updates

This section provides detailed information to help you download and prepare the firmware versions that are required to discover Sun Fire V20z and V40z servers.

Procedure: To Download and Prepare Sun Fire V20z and V40z Server Firmware

Steps
  1. Log in as root to the management server.

    The N1-ok prompt appears.

  2. Create directories into which the V20z and V40z firmware update zip files are to be saved.

    Create separate directories for each server type firmware download. For example:


    # mkdir V20z-firmware V40z-firmware
    
  3. In a web browser, go to http://www.sun.com/servers/entry/v20z/downloads.html.

    The Sun Fire V20z/V40z Server downloads page appears.

  4. Download the Sun Fire V20z Server 2.4.0.8 NSV patch file.

    The download Welcome page appears. Type your username and password, and then click Login.

    The Terms of Use page appears. Read the license agreement carefully. You must accept the terms of the license to continue and download the firmware. Click Accept and then click Continue.

    The Download page appears. Several downloadable files are displayed.

  5. To download the V20z firmware zip file, click V20z BIOS and SP Firmware, English (nsv-v20z-bios-fw_V2.4.0.8.zip).

    Save the file to the directory that you created for the V20z firmware in Step 2.

  6. To download the V40z firmware zip file, click V40z BIOS and SP Firmware, English (nsv-v40z-bios-fw_V2.4.0.8.zip).

    Save the file to the directory you created for the V40z firmware in Step 2.

  7. Change to the directory where you downloaded the V20z firmware file.

    1. Type unzip nsv-v20z-bios-fw_V2.4.0.8.zip to unpack the zip file.

      The sw_images directory is extracted.

      The following files in the sw_images directory are used by the N1 System Manager to update V20z manageable server firmware:

      • Service Processor:

        sw_images/sp/spbase/V2.4.0.8/install.image

      • BIOS:

        sw_images/platform/firmware/bios/V1.34.6.2/bios.sp

  8. Change to the directory where you downloaded the V40z firmware zip file.

    1. Type unzip nsv-v40z-bios-fw_V2.4.0.8.zip to unpack the zip file.

      The sw_images directory is extracted.

      The following files in the sw_images directory are used by the N1 System Manager to update V40z manageable server firmware:

      • Service Processor:

        sw_images/sp/spbase/V2.4.0.8/install.image

      • BIOS:

        sw_images/platform/firmware/bios/V1.34.6.2/bios.sp
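
Before moving on, it can help to confirm that both unpacked directories contain the two image files named in Steps 7 and 8. The following is a minimal sketch, not part of the N1 System Manager tooling; the function name check_fw_images is hypothetical, and the directory names in the usage comment are the example names from Step 2.

```shell
#!/bin/sh
# Verify that an unpacked firmware directory contains the two image files
# named in the procedure. check_fw_images is a hypothetical helper name.
check_fw_images() {
    dir=$1
    status=0
    for rel in sw_images/sp/spbase/V2.4.0.8/install.image \
               sw_images/platform/firmware/bios/V1.34.6.2/bios.sp; do
        if [ -f "$dir/$rel" ]; then
            echo "found:   $dir/$rel"
        else
            echo "missing: $dir/$rel" >&2
            status=1
        fi
    done
    return $status
}

# Example (directory names taken from Step 2):
#   for d in V20z-firmware V40z-firmware; do check_fw_images "$d"; done
```

The function returns a nonzero status if either file is absent, so it can gate a later firmware update step in a script.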

Regenerating Common Agent Container Security Keys

This section provides the procedure for regenerating the N1 System Manager security keys.

Procedure: To Regenerate Common Agent Container Security Keys

Steps
  1. Log in as root on the management server.

  2. Stop N1 System Manager.

    • On a Solaris management server, type svcadm disable -s n1sm.

    • On a Linux management server, type /etc/init.d/n1sminit stop.

    Wait for all processes to stop before continuing.

  3. Regenerate security keys using the create-keys subcommand.

    If the management server is running Linux:


    # /opt/sun/cacao/bin/cacaoadm create-keys --force
    

    If the management server is running the Solaris OS:


    # /opt/SUNWcacao/bin/cacaoadm create-keys --force
    
  4. Restart the N1 System Manager.

    • On a Solaris management server, type svcadm enable n1sm.

    • On a Linux management server, type /etc/init.d/n1sminit start.
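
The procedure differs by platform only in the service commands and the cacaoadm path. The sketch below selects the right sequence from the output of uname -s (which reports SunOS on Solaris and Linux on Linux). The function names are hypothetical, and the script deliberately prints the commands rather than running them, so that they can be reviewed first.

```shell
#!/bin/sh
# Map the output of `uname -s` to the cacaoadm path given in the procedure.
# cacaoadm_path is a hypothetical helper name.
cacaoadm_path() {
    case "$1" in
        Linux) echo /opt/sun/cacao/bin/cacaoadm ;;
        SunOS) echo /opt/SUNWcacao/bin/cacaoadm ;;
        *)     echo "unsupported OS: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the stop, create-keys, and start commands for the OS.
print_regen_steps() {
    os=$1
    cacaoadm=$(cacaoadm_path "$os") || return 1
    if [ "$os" = SunOS ]; then
        echo "svcadm disable -s n1sm"
        echo "$cacaoadm create-keys --force"
        echo "svcadm enable n1sm"
    else
        echo "/etc/init.d/n1sminit stop"
        echo "$cacaoadm create-keys --force"
        echo "/etc/init.d/n1sminit start"
    fi
}

# Example: print_regen_steps "$(uname -s)"
```

On a Linux management server, remember to wait for all processes to stop after the first command before running create-keys, as Step 2 requires.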

Resetting Email Accounts for ALOM-based Managed Servers

If you have configured a separate mail server and account for the N1 System Manager to receive hardware event notifications, and the N1 System Manager is not receiving hardware event notifications from ALOM architecture manageable servers, it is possible that:

  1. The mail server is not configured correctly.

  2. The email configuration has been invalidated by a mail server IP address change.

  3. The email configuration has been invalidated by a mail server domain name change.

  4. The manageable server's email account has been compromised or corrupted.

To resolve issues 1 through 3, log on to the management server as root and run the command n1smconfig -A to start the email reconfiguration process, as described in To Configure the ALOM Email Alert Settings.

To resolve issue 4, proceed as described in To Reset Email Accounts for ALOM-based Managed Servers.

Procedure: To Configure the ALOM Email Alert Settings

Steps
  1. Log in as root to the management server.

  2. Type n1smconfig -A to start the ALOM email alert settings configuration process.

    You are notified that proper settings are required to send email alerts, and the existing values are displayed. You are then asked whether to modify the email alert settings.

  3. Choose whether to modify the email alert settings.

    • Type n to accept the displayed settings. The email alert configuration process exits to the system prompt.

    • Type y to modify the email alert configuration.

      You are prompted for the email alert user name.

  4. Specify the email alert user name.

    Type the account name that is to be used for the email alerts.

    For example: n1smadmin

    You are prompted for the email alert folder.

  5. Specify the email folder in which the email alerts are to be stored.

    Type the name of an email folder for the alert account, for example, inbox.

    You are prompted for the email protocol.

  6. Specify the email alert protocol.

    Type the name of the email protocol used by the management server. Valid entries are pop3 or imap.

    You are prompted for the email alert user account password.

  7. Specify the email alert user account password.

    Type the password for the email alert user account.

    You are prompted for the email alert user account email address.

  8. Type the user account email address.

    For example: n1smadmin@company.com

    You are prompted for the IP address of the email server.

  9. Specify the IP address of the email server.

    • If you have installed and enabled an email server on the management server, type the IP address of the management server management network interface.

    • If you have installed and enabled an email server on a different machine that is accessible by the management server management network interface, type the IP address of that machine.

    The values you have specified are displayed, and you are asked whether you want to use the values.

  10. Choose whether to accept the displayed email alert settings.

    • Type n if the settings are not correct. The ALOM email alert settings process is restarted, and you are prompted to specify the email alert user name.

    • Type y to use the displayed email alert settings.

      The settings are displayed again, and you are asked whether you want to apply the settings.

      Type y to apply the settings, or type n to exit to the command prompt.

Procedure: To Reset Email Accounts for ALOM-based Managed Servers

Before You Begin

This procedure provides the steps required to replace a compromised or corrupt ALOM email account on a managed server. The ALOM email addresses should be reserved for use only by the N1 System Manager.

Confirm that the problem is related to the fact that email alerts are not being received for the server. It is possible that the management server, or some other chosen server that can be accessed by the N1 System Manager, has not been configured correctly as an email server, or that email configuration has been invalidated due to other issues such as network error or domain name change.

Before trying the following procedure, verify that email sent from the ALOM server can be received by the designated email server, by configuring an independent mail client, such as Mozilla, with the same mail server IP, username and password. Then use the telnet command to access an ALOM server, and execute the resetsc -y command to generate a warning message. Check if the mail client is able to receive the ALOM warning message. If it is, you do not need to follow this procedure.

See SP-Based Discovery in Sun N1 System Manager 1.3 Discovery and Administration Guide for information about default telnet login and passwords for servers.

Before trying the following procedure, verify also that the N1 System Manager has access to the designated email server by using the telnet command to access an ALOM server, and executing the showsc command. Make sure the following parameters/values are set as shown:

If you do not see these settings, or if you see incorrect values for the mgt_mailalert email address, follow this procedure.

Steps
  1. Log in to the N1 System Manager.

    See To Access the N1 System Manager Command Line in Sun N1 System Manager 1.3 Discovery and Administration Guide for details.

  2. Switch off monitoring for ALOM-based manageable servers.

    Set the monitored attribute to false by using the set server command.


    N1-ok> set server server monitored false
    

    In this example, server is the name of the ALOM-based manageable server for which you want to reset the email account. Executing this command disables monitoring of the server.

    • If the ALOM-based servers are in the same group, use the set group command to switch off monitoring for the server group.


      N1-ok> set group group monitored false
      

      In this example, group is the name of the group of ALOM-based manageable servers for which you want to reset email accounts. Executing this command disables monitoring of the server group.

  3. Change the email address for the server using the n1smconfig command with the -A option.

    ALOM-based servers support email addresses of up to 33 characters in length.


    Note –

    If you manually configured ALOM-based servers to send event notifications by email to other addresses, using the telnet command and the setsc mgt_mailalert command, those addresses will not be changed by running the n1smconfig command.


  4. Switch on monitoring for the ALOM-based manageable server.

    Set the monitored attribute to true by using the set server command.


    N1-ok> set server server monitored true
    
    • If the ALOM-based servers are in the same group, use the set group command to switch on monitoring for the server group.


      N1-ok> set group group monitored true
      

      In this example, group is the name of the group of ALOM-based manageable servers for which you want to reset email accounts. Executing this command enables monitoring of the server group.
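
Step 3 notes that ALOM-based servers support email addresses of up to 33 characters. A candidate address can be checked before it is entered at the n1smconfig -A prompt. This is a minimal sketch: the function name alom_email_ok is hypothetical, and the test for an "@" character is an added sanity check, not a rule stated in this guide.

```shell
#!/bin/sh
# Return 0 if the address contains "@" (an assumed sanity check) and is
# no longer than the 33-character limit stated in Step 3.
# alom_email_ok is a hypothetical helper name.
alom_email_ok() {
    addr=$1
    case "$addr" in
        *@*) ;;
        *)   return 1 ;;
    esac
    [ "${#addr}" -le 33 ]
}

# Example:
#   alom_email_ok n1smadmin@company.com && echo "address fits"
```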

Updating Management Server System Files

This section provides the procedures for configuring the management server system files.

Procedure: To Update the /etc/hosts File

Steps
  1. Log in as root on the management server.

  2. Edit /etc/hosts and ensure that the entries are similar to the following example:

    # Do not remove the following line, or various programs
    # that require network functionality will fail.
    127.0.0.1           localhost
    111.222.333.44     machine-name loghost

    where 111.222.333.44 is the IP address of the N1 System Manager server, and machine-name is the name of the N1 System Manager management server.

    For example, if the machine name is n1manager, and the assigned IP address for eth0 is 129.123.111.12, then the /etc/hosts file should contain the following settings:

    # Do not remove the following line, or various programs
    # that require network functionality will fail.
    127.0.0.1          localhost.localdomain localhost
    129.123.111.12     n1manager loghost

    You must reboot the system after updating the /etc/hosts file.
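
After editing the file, the required entry can be checked mechanically. The sketch below looks for an uncommented line that starts with the management server IP address and lists both the machine name and loghost. The function name has_loghost_entry is hypothetical, and the values in the usage comment are the example values from Step 2.

```shell
#!/bin/sh
# Return 0 if the hosts file has a non-comment line that starts with the
# given IP address and lists both the machine name and "loghost".
# has_loghost_entry is a hypothetical helper name.
# On Solaris, substitute nawk for awk (the legacy /usr/bin/awk lacks -v).
has_loghost_entry() {
    file=$1
    ip=$2
    name=$3
    awk -v ip="$ip" -v name="$name" '
        /^[ \t]*#/ { next }                 # skip comment lines
        $1 == ip {
            seen_name = 0; seen_log = 0
            for (i = 2; i <= NF; i++) {
                if ($i == name)      seen_name = 1
                if ($i == "loghost") seen_log = 1
            }
            if (seen_name && seen_log) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$file"
}

# Example:
#   has_loghost_entry /etc/hosts 129.123.111.12 n1manager || echo "entry missing"
```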

Procedure: To Update the ssh_known_hosts File

The management server /etc/opt/sun/n1gc/ssh_known_hosts file contains the name, IP address, and encrypted access keys for SSH-accessible servers. A stale or obsolete entry for a server in the /etc/opt/sun/n1gc/ssh_known_hosts file prevents SSH access to that server. The solution is to remove the entry for the server from the /etc/opt/sun/n1gc/ssh_known_hosts file as follows.

Steps
  1. Note the name and IP address of the inaccessible server.

  2. Log in as root on the management server.

  3. Edit the /etc/opt/sun/n1gc/ssh_known_hosts file and delete the entry for the inaccessible server.
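
The deletion in Step 3 can also be scripted. The sketch below assumes the file follows the standard OpenSSH known_hosts layout, in which the first field of each line is a comma-separated list of host names and IP addresses; this guide does not confirm the exact layout, so inspect the file first. The function name remove_known_host is hypothetical. Entries with hashed host names would not be matched.

```shell
#!/bin/sh
# Remove every line whose first field (a comma-separated host list)
# contains the given host name or IP address. The original file is kept
# as FILE.bak. remove_known_host is a hypothetical helper name.
# On Solaris, substitute nawk for awk (the legacy /usr/bin/awk lacks -v).
remove_known_host() {
    file=$1
    host=$2
    cp -p "$file" "$file.bak" || return 1
    awk -v host="$host" '
        {
            n = split($1, names, ",")
            for (i = 1; i <= n; i++)
                if (names[i] == host) next   # drop this entry
            print
        }
    ' "$file.bak" > "$file"
}

# Example:
#   remove_known_host /etc/opt/sun/n1gc/ssh_known_hosts 192.168.2.10
```

Run the function once for the server name and once for its IP address if the two appear on separate lines.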

Procedure: To Update the /etc/resolv.conf File

Step

    Edit /etc/resolv.conf and ensure that the entries are similar to the following:

    nameserver name-server-1-IP-address
    nameserver name-server-2-IP-address
    nameserver name-server-3-IP-address
    domain your-domain-name
    search your-domain-name
    

    For example, assume the IP address of the first DNS server is 129.123.111.12, the second DNS server is 129.123.111.24, and the third DNS server is 129.123.111.36. If your company domain name is mydomain.com, then the /etc/resolv.conf file would contain the following lines.

    nameserver 129.123.111.12
    nameserver 129.123.111.24
    nameserver 129.123.111.36
    domain mydomain.com
    search mydomain.com
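
Each nameserver line in resolv.conf takes exactly one IP address after the keyword. The following sketch flags lines that do not fit that form. The function name check_nameservers is hypothetical, and the pattern covers dotted IPv4 addresses only.

```shell
#!/bin/sh
# Report nameserver lines that are not exactly "nameserver <IPv4 address>".
# Return 0 only if at least one nameserver line exists and all are well formed.
# check_nameservers is a hypothetical helper name; IPv6 is not handled.
check_nameservers() {
    awk '
        $1 == "nameserver" {
            count++
            if (NF != 2 || $2 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) {
                printf "line %d is malformed: %s\n", NR, $0
                bad = 1
            }
        }
        END { exit (count > 0 && !bad) ? 0 : 1 }
    ' "$1"
}

# Example:
#   check_nameservers /etc/resolv.conf || echo "fix /etc/resolv.conf"
```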

Procedure: To Disable Managed Server Automatic Configuration

The following procedure disables the automatic configuration of manageable servers during discovery.

Steps
  1. Log in as root on the management server.

  2. Edit the /etc/opt/sun/n1gc/domain.properties file and add the following line to the file:

    com.sun.hss.domain.internal.discovery.initializeDevice=false

    The N1 System Manager must be restarted for the change to take effect. Note that once automatic configuration is disabled, any servers in a factory default state cannot be discovered until their SSH and IPMI accounts are configured. For further information, see Setting Up Manageable Servers in Sun N1 System Manager 1.3 Site Preparation Guide.
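
If the edit in Step 2 is scripted, it is worth making it idempotent so that repeated runs do not append the line twice. The sketch below appends the property only when no line already sets that key. The function name ensure_property is hypothetical. If the key is already present with a different value, the function leaves the file alone and reports it, so the value can be reviewed by hand.

```shell
#!/bin/sh
# Append "key=value" to the file only if no line already sets that key.
# ensure_property is a hypothetical helper name.
# On Solaris, substitute nawk for awk (the legacy /usr/bin/awk lacks -v).
ensure_property() {
    file=$1
    key=$2
    value=$3
    if awk -v key="$key" '
            { sub(/^[ \t]+/, "") }                    # ignore leading blanks
            index($0, key "=") == 1 { found = 1 }     # literal prefix match
            END { exit !found }
        ' "$file" 2>/dev/null; then
        echo "already set in $file: $key" >&2
        return 0
    fi
    printf '%s=%s\n' "$key" "$value" >> "$file"
}

# Example:
#   ensure_property /etc/opt/sun/n1gc/domain.properties \
#       com.sun.hss.domain.internal.discovery.initializeDevice false
```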

Using a Managed Server to Patch OS Distributions

When you are using a patch server to perform the following tasks, you need to have root access to both the management server and the manageable server at the same time. For some tasks, you need to first patch the manageable server, then mount the management server and patch the distribution.

Procedure: To Patch a Solaris 9 OS Distribution by Using a Solaris 9 OS on an x86 Patch Server

This procedure describes how to patch a Solaris 9 OS distribution in the N1 System Manager. The steps in this procedure need to be performed on both the patch server and the management server. The patches described are necessary for the N1 System Manager to be able to provision Solaris OS 9 update 7 and below. This procedure is not required for Solaris OS 9 update 8 and above.

Consider opening two terminal windows to complete the steps. The following steps first guide you through patching the patch server and then provide steps for patching the distribution.

Steps
  1. Patch the Solaris 9 OS on the x86 patch server.

    1. Log in as root.


      % su
      password:password
      

      The root prompt appears.

    2. Reboot the Solaris 9 patch server to single-user mode.


      # reboot -- -s
      
    3. In single-user mode, change to the patch directory.


      # cd /patch
      
    4. Install the patches.


      # patchadd -M . 117172-17
      # patchadd -M . 117468-02
      

      Tip –

      Pressing Control+D returns you to multiuser mode.


  2. Prepare to patch the distribution on the management server.

    1. Log in to the management server as root.


      % su
      password:password
      

      The root prompt appears.

    2. Edit the /etc/exports file.


      # vi /etc/exports
      
    3. Change /js *(ro,no_root_squash) to /js *(rw,no_root_squash).

    4. Save and close the /etc/exports file.

    5. Restart NFS.


      # /etc/init.d/nfs restart
      
  3. Patch the distribution that you copied to the management server.

    1. Log in to the Solaris 9 patch server as root.


      % su
      password:password
      

      The root prompt appears.

    2. Mount the management server.


      # mount -o rw management-server-IP:/js/DISTRO_ID /mnt
      
    3. Install the patches by performing one of the following actions:

      • If you are patching an x86 distribution, type the following commands:


        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 117172-17
        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 117468-02
        
      • If you are patching a SPARC distribution, type the following commands:


        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 117171-17
        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 117175-02
        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 113318-20
        

        Note –

        You will receive a partial error for the first patch installation. Ignore this error.


    4. Unmount the management server.


      # umount /mnt
      
  4. Restart NFS on the management server.

    1. Edit the /etc/exports file.


      # vi /etc/exports
      
    2. Change /js *(rw,no_root_squash) to /js *(ro,no_root_squash).

    3. Restart NFS.


      # /etc/init.d/nfs restart
      

      NFS is restarted.

      The Solaris 9 OS on SPARC distribution is ready for deployment to target servers.

  5. Fix the Solaris 9 OS on x86 distribution.

    1. Change to /js/<distro_id>/Solaris_9/Tools/Boot/boot/solaris.


      # cd /js/<distro_id>/Solaris_9/Tools/Boot/boot/solaris
      
    2. Re-create the bootenv.rc link.


      # ln -s ../../tmp/root/boot/solaris/bootenv.rc .
      

      The Solaris 9 OS on x86 distribution is ready for deployment to target servers.
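
Steps 2 and 4 switch the /js export between ro and rw by hand in vi. The same change can be made with sed, which reduces the chance of leaving the export writable by mistake. This is a sketch under the assumption that the export line has exactly the form shown in the procedure; the function name set_js_export_mode is hypothetical.

```shell
#!/bin/sh
# Switch the /js export between read-only (ro) and read-write (rw).
# Assumes the line has exactly the form: /js *(ro,no_root_squash)
# The original file is kept as FILE.bak.
# set_js_export_mode is a hypothetical helper name.
set_js_export_mode() {
    file=$1
    mode=$2
    case "$mode" in
        ro) from=rw ;;
        rw) from=ro ;;
        *)  echo "mode must be ro or rw" >&2; return 1 ;;
    esac
    cp -p "$file" "$file.bak" || return 1
    sed "s|^/js \*($from,no_root_squash)|/js *($mode,no_root_squash)|" \
        "$file.bak" > "$file"
}

# Example:
#   set_js_export_mode /etc/exports rw    # before patching the distribution
#   set_js_export_mode /etc/exports ro    # after patching
```

Restart NFS with /etc/init.d/nfs restart after each change, as the procedure describes.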

Troubleshooting

If you want to patch another distribution, you might have to delete the /patch/117172-17 directory and re-create it using the unzip 117172-17.zip command. When the first distribution is patched, the patchadd command makes a change to the directory that causes problems with the next patchadd command execution.

These patches are not needed for Solaris 9 update 8 build 5 (Solaris 9 9/05 s9x_u8wos_05) and later versions of the Solaris OS.

Procedure: To Patch a Solaris 9 OS Distribution by Using a Solaris 9 OS on a SPARC Patch Server

This procedure describes how to patch a Solaris 9 OS distribution in the N1 System Manager. The steps in this procedure need to be performed on the manageable server and the management server. Consider opening two terminal windows to complete the steps. The following steps first guide you through patching the manageable server and then provide steps for patching the distribution.

Steps
  1. Set up and patch the Solaris 9 OS on the SPARC machine.

    1. Log in to the Solaris 9 machine as root.


      % su
      password:password
      
    2. Reboot the Solaris 9 machine to single-user mode.


      # reboot -- -s
      
    3. In single-user mode, change to the patch directory.


      # cd /patch
      
    4. Install the patches.


      # patchadd -M . 117171-17
      # patchadd -M . 117175-02
      # patchadd -M . 113318-20
      

      Tip –

      Pressing Control+D returns you to multiuser mode.


  2. Patch the distribution that you copied to the management server.

    1. Log in to the Solaris 9 machine as root.


      % su
      password:password
      
    2. Mount the management server.


      # mount -o rw management-server-IP:/js/DISTRO_ID /mnt
      
    3. Install the patches by performing one of the following actions:

      • If you are patching a Solaris OS on x86 software distribution, type the following commands:


        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 117172-17
        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 117468-02
        
      • If you are patching a Solaris OS on SPARC software distribution, type the following commands:


        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 117171-17
        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 117175-02
        # patchadd -C /mnt/Solaris_9/Tools/Boot/ -M /patch 113318-20
        

        Note –

        You will receive a partial error for the first patch installation. Ignore this error.


    4. Unmount the management server.


      # umount /mnt
      
  3. Restart NFS on the management server.

    1. Edit the /etc/exports file.


      # vi /etc/exports
      
    2. Change /js *(rw,no_root_squash) to /js *(ro,no_root_squash).

    3. Restart NFS.


      # /etc/init.d/nfs restart
      

      NFS is restarted.

      The Solaris 9 OS on SPARC distribution is ready for deployment to target servers.

  4. Fix the Solaris 9 OS on x86 distribution.

    1. Change to /js/<distro_id>/Solaris_9/Tools/Boot/boot/solaris.


      # cd /js/<distro_id>/Solaris_9/Tools/Boot/boot/solaris
      
    2. Re-create the bootenv.rc link.


      # ln -s ../../tmp/root/boot/solaris/bootenv.rc .
      

      The Solaris 9 OS on x86 distribution is ready for deployment to target servers.

Troubleshooting

If you want to patch another distribution, you might have to delete the /patch/117172-17 directory and re-create it using the unzip 117172-17.zip command. When the first distribution is patched, the patchadd command makes a change to the directory that causes problems with the next patchadd command execution.