CHAPTER 2

Recovering From Installation Problems

For information about how to resolve problems during installation of the Solaris Operating System and the Netra HA Suite, see the following sections. These sections include information about how to resolve problems related to Solaris JumpStart™.


Recovering From General Installation Problems

This section describes how to recover from an error that is not related to the nhinstall tool.

Incorrect Software Is Installed on the Cluster Nodes

If incorrect software is installed on the nodes when you use the nhinstall tool, perform the following procedure.

To Investigate Why Incorrect Software Is Installed on the Cluster Nodes

  1. Search the console of the installation server for the IP address of the boot server. For example, for a Solaris installation:


    Beginning system identification...
    Searching for configuration file(s)...
    Using sysid configuration file 10.101.1.253:/export/nhjumpstart/101/10/sysidcfg
    Search complete.
    

    For a Linux installation:


    IP-Config: Got DHCP answer from 10.101.1.253, my address is \
    10.101.1.10
    IP-Config: Complete:
           device=eth0, addr=10.101.1.10, mask=255.255.255.0, \
           gw=10.101.1.253, host=cp3020-1, domain=, nis-domain=(none),
           bootserver=10.101.1.253, rootserver=10.101.1.253, \
           rootpath=/dist/mvista/CGE_4.0/target_tdp<7>eth0: no IPv6 \
           routers present
    

    The IP address of the boot server in this example is 10.101.1.253.

    Alternatively, use the name of the boot server to find the boot server IP address.

  2. Find the IP address of your installation server.

  3. Compare the address of the boot server with that of the installation server.

    If the addresses are not the same, your node is being booted by the wrong machine. Perform the following steps:

    1. If the boot server is running the Solaris OS, access the console of the wrong boot server and do the following:

      1. Delete the MAC addresses of any nodes in your cluster from the /etc/ethers file.

      2. Delete the node parameters of any nodes in your cluster from the /etc/bootparams file.

      3. Delete the node parameters of any nodes in your cluster from the DHCP configuration files under /var/dhcp.

      4. Restart the installation.

      OR

    2. If the boot server is running Linux, access the console of the wrong boot server and do the following:

      1. Delete the node parameters of any nodes in your cluster from the DHCP configuration file /etc/dhcpd.

      2. Restart the DHCP daemon.

    3. Restart the installation.

  4. If the problem is still not solved, you might have two installation servers for this cluster.

    To confirm the presence of a second installation server, run the /usr/sbin/bpgetfile command on a machine that is on the same local network as your installation server.

    If the problem persists, contact your customer support center.
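
The comparison in Step 1 through Step 3 can be sketched as a small shell script. This is a minimal sketch, not part of the Netra HA Suite: it assumes a POSIX-compatible shell and that the console output has been saved to a file. The function names boot_server_ip and check_boot_server are hypothetical, and the script recognizes only the two console message formats shown in Step 1.

```shell
# Hypothetical helpers, not part of the Netra HA Suite.

# Print the boot server IP address found in a saved console log ($1).
# Only the two message formats shown in Step 1 are recognized:
#   Solaris: "Using sysid configuration file <ip>:/path"
#   Linux:   "bootserver=<ip>,"
boot_server_ip() {
    sed -n \
        -e 's/.*Using sysid configuration file \([0-9][0-9.]*\):.*/\1/p' \
        -e 's/.*bootserver=\([0-9][0-9.]*\),.*/\1/p' \
        "$1" | head -n 1
}

# Compare the boot server address in the log ($1) with the
# installation server address ($2).
# Returns 0 on a match, 1 on a mismatch, 2 if no address was found.
check_boot_server() {
    boot_ip=$(boot_server_ip "$1")
    if [ -z "$boot_ip" ]; then
        echo "no boot server address found in $1"
        return 2
    fi
    if [ "$boot_ip" = "$2" ]; then
        echo "OK: node booted from $boot_ip"
    else
        echo "MISMATCH: node booted from $boot_ip, expected $2"
        return 1
    fi
}
```

For example, with the console output saved to a hypothetical file console.log, `check_boot_server console.log 10.101.1.253` reports a match when the node was booted by the installation server at that address. A mismatch indicates that the cleanup on the wrong boot server described in Step 3 is needed.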


Recovering From nhinstall Problems

This section describes how to recover from error scenarios that can occur during installation using the nhinstall tool.

The nhinstall tool installs the Solaris Operating System and the Netra HA Suite on a cluster. If the nhinstall tool encounters an error, it issues a message and stops without searching for further errors. You must fix the error and relaunch the nhinstall tool, which resumes from the point at which it failed. If the nhinstall tool encounters another error, it stops again.

The nhinstall Tool Stops During Installation

If the nhinstall tool stops during the installation of a node, perform the following procedure.

To Investigate Why the nhinstall Tool Stops During Installation

  1. Identify the problem by using the error message displayed on the installation server.

    Possible problems are:

    • local command failure on the installation server

    • rsh command failure due to a connection or permission problem

    • remote command failure

  2. Correct the problem.

    If the nhinstall tool stops due to a problem with Solaris JumpStart, see Solaris JumpStart Installation Fails During nhinstall Installation.

  3. Restart the nhinstall tool with the same options that you used to launch it the first time:

    On the Solaris OS:


    # /opt/SUNWcgha/sbin/nhinstall -r config_file_directory -l logfile
    

    On Linux:


    # /opt/sun/sbin/nhinstall -r config_file_directory -l logfile
    

    The nhinstall tool resumes from the point at which it stopped.

    If you have modified the cluster_definition.conf file to correct the error, the nhinstall tool displays a warning that the configuration has changed. If your change makes the cluster configuration incoherent, you must reset the installation and start a new one. For information, see the Netra High Availability Suite 3.0 1/08 Foundation Services Manual Installation Guide for the Solaris OS.

  4. If the nhinstall tool stops again, repeat Step 1 through Step 3 until the tool completes successfully.

  5. If you cannot resolve this problem, contact your customer support center.
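
The restart in Step 3 can be wrapped so that the correct path is chosen for the operating system. This is a minimal sketch under stated assumptions: the function names nhinstall_path and restart_nhinstall are hypothetical, the two paths are the ones given in Step 3, and the operating system is detected from the output of `uname -s` (SunOS on the Solaris OS, Linux on Linux).

```shell
# Hypothetical helpers, not part of the Netra HA Suite.

# Print the nhinstall path for a given `uname -s` value ($1).
# The two paths are those listed in Step 3 of this procedure.
nhinstall_path() {
    case "$1" in
        SunOS) echo /opt/SUNWcgha/sbin/nhinstall ;;
        Linux) echo /opt/sun/sbin/nhinstall ;;
        *)     echo "unsupported OS: $1" >&2; return 1 ;;
    esac
}

# Restart nhinstall with the same options used for the first run.
# $1 = configuration file directory, $2 = log file
restart_nhinstall() {
    tool=$(nhinstall_path "$(uname -s)") || return 1
    "$tool" -r "$1" -l "$2"
}
```

Run as superuser, `restart_nhinstall config_file_directory logfile` is then equivalent to the command shown in Step 3 for the current platform. Passing the same directory and log file as in the first run matters because the tool resumes from the point at which it stopped.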

Solaris JumpStart Installation Fails During nhinstall Installation

If the Solaris JumpStart installation of a master-eligible node fails, perform the following procedure.

To Investigate Why Solaris JumpStart Fails During nhinstall Installation

  1. Confirm that the installation server and the first network interface of the master-eligible node are connected to the same switch.

    • If they are, go to Step 2.

    • If they are not, do the following:

      a. Configure your installation server as described in the Netra High Availability Suite 3.0 1/08 Foundation Services Getting Started Guide.

      b. Restart the installation:


      # /opt/SUNWcgha/sbin/nhinstall -r config_file_directory -l logfile
      

      If the Solaris JumpStart software does not restart, delete the /tmp/.install_client.lck file.

  2. Confirm that a router is not connected between the installation server and the master-eligible node.

    • If no router connects the installation server and the master-eligible node, go to Step 3.

    • If a router connects the installation server and the master-eligible node, do the following:

      a. Remove the router.

      b. Reconfigure your hardware as described in the Netra High Availability Suite 3.0 1/08 Foundation Services Getting Started Guide.

      c. Restart the installation:


      # /opt/SUNWcgha/sbin/nhinstall -r config_file_directory -l logfile
      

      If the Solaris JumpStart software does not restart, delete the /tmp/.install_client.lck file.

  3. If you cannot resolve this problem, contact your customer support center.
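
The lock file cleanup mentioned in Step 1 and Step 2 can be sketched as follows. This is a minimal sketch, assuming a POSIX-compatible shell on the installation server. The function name remove_install_lock is hypothetical, and the default path is the one named in this procedure.

```shell
# Hypothetical helper, not part of the Netra HA Suite.

# Remove the JumpStart lock file if it exists.
# $1 = lock file path (optional; defaults to the path named above)
remove_install_lock() {
    lock_file=${1:-/tmp/.install_client.lck}
    if [ -f "$lock_file" ]; then
        rm -f "$lock_file" && echo "removed $lock_file"
    else
        echo "no lock file at $lock_file"
    fi
}
```

Calling `remove_install_lock` with no argument before restarting the nhinstall tool removes /tmp/.install_client.lck only when it is present, so the call is safe to repeat.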