CHAPTER 12

Replacing Hardware in a Cluster

This chapter covers the following topics:

  • Preparing to Replace Hardware in a Cluster
  • Replacing a CPU Board on a Node
  • Replacing a CPU Board on a Diskless Node
  • Replacing Ethernet Cards on a Vice-Master or Dataless Node
  • Replacing Ethernet Cards on a Diskless Node
  • Replacing the Disk on the Vice-Master Node
  • Replacing Disks on Both Master-Eligible Nodes Without Cluster Shutdown
  • Replacing a Dataless Node Disk

Preparing to Replace Hardware in a Cluster

Follow these guidelines before replacing hardware in a cluster:

The following table points to the procedures for replacing CPU boards, Ethernet cards, and disks for each type of node.


TABLE 12-1   Reference for Replacing Hardware

Node Type     Hardware Type   For Information
-----------   -------------   ----------------------------------------------------------
Vice-master   CPU Board       Replacing a CPU Board on a Node
              Ethernet Card   Replacing Ethernet Cards on a Vice-Master or Dataless Node
              Disk            Replacing the Disk on the Vice-Master Node
Diskless      CPU Board       Replacing a CPU Board on a Node or Replacing a CPU Board
                              on a Diskless Node
              Ethernet Card   Replacing Ethernet Cards on a Diskless Node
              Disk            N/A
Dataless      CPU Board       Replacing a CPU Board on a Node
              Ethernet Card   Replacing Ethernet Cards on a Vice-Master or Dataless Node
              Disk            Replacing a Dataless Node Disk


Replacing a CPU Board on a Node

To replace the board on the vice-master node, a diskless node, or a dataless node, perform the following procedure. If the node is a diskless node that is using the DHCP client ID boot policy, perform the procedure in Replacing a CPU Board on a Diskless Node.

To Replace a Board on a Node

  1. Verify that the new board is of the same type as the old board.

  2. Replace the board using information in the hardware documentation at http://www.sun.com/products-n-solutions/hardware/docs/.

  3. Log in to the new node.

  4. Get the ok prompt.

  5. Configure the OpenBoot PROM parameters.



    Note - Tasks that reference OpenBoot PROM commands apply only to the UltraSPARC architecture. For x64 platforms, refer to the hardware documentation for information about performing the equivalent tasks.



    The following examples show the OpenBoot PROM parameters for an UltraSPARC-based diskless node and for an UltraSPARC-based master-eligible or dataless node.

    • An UltraSPARC diskless node has the following OpenBoot PROM parameters:


      ok> setenv local-mac-address? true
      ok> setenv auto-boot? true
      ok> setenv diag-switch? false
      ok> setenv boot-device net:dhcp,,,,,5 net2:dhcp,,,,,5
      

    • An UltraSPARC-based master-eligible node or dataless node has the following OpenBoot PROM parameters:


      ok> setenv local-mac-address? true
      ok> setenv auto-boot? true
      ok> setenv diag-switch? false
      ok> setenv boot-device disk net
      



    Note - If the auto-boot-retry variable exists on your system, it must be set to true. If the variable does not exist on your system, you can disregard this requirement.



  6. Reboot the node:


    ok> boot 
    

  7. Log in to the node as superuser.

  8. Verify that the node is configured correctly:


    # nhadm check
    


Replacing a CPU Board on a Diskless Node

To replace the board on a diskless node that is using the DHCP static boot policy, perform the following procedure.


Replacing Ethernet Cards on a Vice-Master or Dataless Node

To replace the Ethernet cards on the vice-master node or a dataless node, perform the following procedure.

To Replace Ethernet Cards on the Vice-Master Node or a Dataless Node

  1. Verify that the new Ethernet cards are of the same type as the old Ethernet cards.

  2. Replace the Ethernet cards using information in the hardware documentation at http://www.sun.com/products-n-solutions/hardware/docs/.

  3. Power on the node.

  4. Log in to the node as superuser.

  5. Verify that the node is configured correctly:


    # nhadm check
    


Replacing Ethernet Cards on a Diskless Node

To replace the Ethernet cards on diskless nodes with the DHCP dynamic boot policy or the DHCP client ID boot policy, perform the procedure in Replacing Ethernet Cards on a Vice-Master or Dataless Node. To replace the Ethernet cards on diskless nodes with the DHCP static boot policy, perform the following procedure.

To Replace Ethernet Cards on a Diskless Node With the DHCP Static Boot Policy

  1. Verify that the new Ethernet cards are of the same type as the old Ethernet cards.

  2. Identify the IP address and Ethernet address pair for each network interface card that is to be replaced.

  3. Replace the Ethernet cards by using the hardware documentation at http://www.sun.com/products-n-solutions/hardware/docs/.

  4. Record the Ethernet addresses of the new network cards.

    To find the Ethernet addresses of the network cards, perform the following steps:

    1. Get the ok prompt on the diskless node.

    2. Identify the Ethernet address of NIC0:


       ok> banner
      

      The Ethernet address of NIC0 is provided in the output.

      The Ethernet address of NIC1 is derived as follows:

      NIC0 + 0x1

      For example, if the output of the banner command is this:

      Ethernet address 8:0:20:fa:2a:6e, Host ID: 80fa2a6e

      The Ethernet address of NIC0 is 8:0:20:fa:2a:6e, and the Ethernet address of NIC1 is 8:0:20:fa:2a:6f.

      In the DHCP configuration files, the Ethernet addresses of NIC0 and NIC1 are given as 01080020FA2A6E and 01080020FA2A6F, respectively.

      As another example, if the output of the banner command is as follows:

      Ethernet address 8:0:20:f9:b3:60, Host ID: 80f9b360

      In the DHCP configuration files in the /SUNWcgha/remote/var/dhcp/ directory, the Ethernet addresses of NIC0 and NIC1 are given as 01080020F9B360 and 01080020F9B361, respectively.
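
      The conversion from the colon-delimited Ethernet address to the client ID used in the DHCP configuration files (a 01 prefix followed by each octet zero-padded to two uppercase hexadecimal digits, as shown in the examples above) can also be scripted. The following Bourne shell sketch uses the address from the first example; it is provided for illustration only and is not part of any Netra HA Suite tool:


        #!/bin/sh
        # Build the DHCP client ID from a colon-delimited Ethernet address.
        mac="8:0:20:fa:2a:6e"
        clientid="01"
        for octet in `echo $mac | tr ':' ' '`
        do
            clientid="${clientid}`printf '%02X' 0x$octet`"
        done
        echo $clientid        # prints 01080020FA2A6E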

  5. Log in to the master node as superuser.

  6. Modify the DHCP configuration for NIC0:


    # pntadm -M NIC0IP-address -i newEthernet-address \
    -f 'PERMANENT+MANUAL' -m NIC0IP-address subnet1
    

    The parameters of this command are as follows:

    NIC0IP-address is the IP address of the NIC0 interface.
    newEthernet-address is the Ethernet address of the NIC0 interface, in DHCP configuration format.
    subnet1 is the subnet that connects the NIC0 interfaces.

  7. Modify the DHCP configuration for NIC1:


    # pntadm -M NIC1IP-address -i newEthernet-address \
    -f 'PERMANENT+MANUAL' -m NIC1IP-address subnet2
    

    The parameters of this command are as follows:

    NIC1IP-address is the IP address of the NIC1 interface.
    newEthernet-address is the Ethernet address of the NIC1 interface, in DHCP configuration format.
    subnet2 is the subnet that connects the NIC1 interfaces.
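
    As a concrete illustration of Step 6 and Step 7, the following commands use the Ethernet address from the first banner example. The IP addresses and subnet names shown here are placeholders only; replace them with the values used in your cluster:


    # pntadm -M 10.250.1.30 -i 01080020FA2A6E \
    -f 'PERMANENT+MANUAL' -m 10.250.1.30 10.250.1.0
    # pntadm -M 10.250.2.30 -i 01080020FA2A6F \
    -f 'PERMANENT+MANUAL' -m 10.250.2.30 10.250.2.0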

  8. Refresh the DHCP configuration on the master node by sending the SIGHUP signal to the DHCP daemon:


    # pkill -1 in.dhcpd
    

  9. Reboot the diskless node:


    ok> boot 
    

  10. Verify that the node is configured correctly:


    # nhadm check
    


Replacing the Disk on the Vice-Master Node

This section describes how to replace the disk on the vice-master node.

To Replace the Disk on the Vice-Master Node Using IP Replication



Note - This procedure is supported only for the Solaris OS.



  1. Verify that the new disk is of the same hardware type as the old disk and can have the same disk partition configuration.

  2. Replace the hardware by using the hardware documentation at http://www.sun.com/products-n-solutions/hardware/docs/.

  3. Install the Solaris OS on the vice-master node. Keep the original partitioning configuration (reformat the new disk by recreating the format of the old disk).

  4. Install the Netra HA Suite software on the vice-master node.

    For information, see the Netra High Availability Suite 3.0 1/08 Foundation Services Manual Installation Guide for the Solaris OS.

  5. If logical partitioning or IDE disks are used, follow this step and then jump to Step 9. Otherwise, go to Step 6.

    Force a full synchronization:


    # nhcrfsadm -f all

    Power on the vice-master node.

    The master node will resynchronize the vice-master disk automatically.

  6. If SCSI disks are used and logical partitioning is not used, follow this step and the rest of the procedure.

    Power on the vice-master node. The master node detects that the vice-master node is not synchronized. A message is displayed in the system log file, asking whether you want to restart the replication.

  7. Log in to the master node as superuser.

  8. Accept a replication restart:


    # nhcrfsadm -a
    

  9. (All configurations, IDE or SCSI) Verify that the synchronization is complete:

    For versions earlier than the Solaris 10 OS:


    # /usr/opt/SUNWesm/sbin/scmadm -S -M
    

    For the Solaris 10 OS and later:


    # /usr/sbin/dsstat 1
    

    While the synchronization is taking place, the sync label is displayed. When the synchronization is complete, the sync label is replaced by the replicating label.

  10. Verify that the node is configured correctly:


    # nhadm check
    

  11. Power on the dataless nodes or diskless nodes.

To Replace the Disk on the Vice-Master Node Using Shared Disk

Clusters using shared disk are supported only on the Solaris OS. The information presented in this section does not apply to Linux clusters.

  1. Replace the hardware by using the hardware documentation at http://www.sun.com/products-n-solutions/hardware/docs/.

  2. Reformat the new disk by recreating the format of the old disk.

    You must restore the local file system from backup.
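
    One possible way to recreate the partition layout is to copy the volume table of contents (VTOC) from the master node's disk, assuming that both master-eligible nodes use local disks with identical geometry and device names. Both assumptions must be verified for your cluster, and the device name shown below is a placeholder; this is a sketch only, not a required procedure:


    master# prtvtoc /dev/rdsk/c0t0d0s2 > /var/tmp/vtoc
    (copy /var/tmp/vtoc to the vice-master node)
    vice-master# fmthard -s /var/tmp/vtoc /dev/rdsk/c0t0d0s2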

  3. Restore the disk configuration.

    1. Install the Solaris Operating System on the vice-master node.

    2. Install the Netra HA Suite software on the vice-master node.

    For information, see the Netra High Availability Suite 3.0 1/08 Foundation Services Manual Installation Guide for the Solaris OS.

  4. Create the database replicas on the dedicated partition:


    # metadb -a -c 3 -f /dev/rdsk/c0t0d0s7
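
    To confirm that the state database replicas were created on the new disk, you can list them, for example:


    # metadb -i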

  5. Reboot the node in cluster mode.

    The vice-master node joins the cluster.



    Note - Reliable NFS does not detect that the disksets on the vice-master node have not yet been re-created. Do not perform a switchover or failover until you complete the remaining steps of this procedure.



  6. Log in to the master node as superuser.

  7. Remove the vice-master node from the diskset node names:


    # metaset -s nhas_diskset -d -h netraMEN2-cgtp

  8. Re-add the vice-master node to the diskset node names:


    # metaset -s nhas_diskset -a -h netraMEN2-cgtp
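
    To confirm that the vice-master node has been re-added, you can display the diskset configuration, for example:


    # metaset -s nhas_diskset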


Replacing Disks on Both Master-Eligible Nodes Without Cluster Shutdown

Disks in master-eligible nodes can be replaced without fully shutting down a cluster. The new disks may have a different geometry than the old disks; however, the new disk in one node must be identical to the new disk in the other node, and the disks must be replaced sequentially, one node at a time. When you replace disks as described in this section, the cluster is not single-fault tolerant.

This section describes how to replace disks on both master-eligible nodes without fully shutting down the cluster.

To Replace Disks on Both Master-Eligible Nodes Using IP Replication Without Full Cluster Shutdown



Note - For this release of the Netra HA Suite product, this procedure is supported for only master-eligible nodes that are running the Solaris OS.



  1. Replace the disk in the vice-master node using the procedure described in the hardware documentation at:

    http://www.sun.com/products-n-solutions/hardware/docs/

  2. Install the Solaris OS on the vice-master node. Keep the original partitioning configuration as much as possible (reformat the new disk by recreating the format of the old disk). The following conditions must be met:

    • Device names of the new replicated and bitmap slices/partitions must be preserved.

    • Minor and major numbers of these devices must be preserved.

    • New replicated slices/partitions must not be smaller than the original slices/partitions.

    • The new bitmap partitions must be at least 1 Kbyte plus 4 Kbytes per Gbyte of data in the associated new replicated slice/partition. For example, a replicated partition that holds 50 Gbytes of data requires a bitmap partition of at least 1 Kbyte + (50 x 4 Kbytes) = 201 Kbytes.

  3. Install the Netra HA Suite software on the vice-master node.

    For information, see the Netra High Availability Suite 3.0 1/08 Foundation Services Manual Installation Guide for the Solaris OS.

  4. If logical partitioning or IDE disks are used, follow this step and then jump to Step 7. Otherwise, go to Step 5.

    On the master node, force a full synchronization:


    master# nhcrfsadm -f all

    Start the vice-master node with the Foundation Services.

    Remove the /etc/opt/SUNWcgha/not_configured file, which was created automatically during the installation process, and then reboot the vice-master node. The master node will resynchronize the vice-master disk automatically.
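
    For example, on the vice-master node (the reboot command shown is one possible way to reboot the node):


    vice-master# rm /etc/opt/SUNWcgha/not_configured
    vice-master# reboot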

  5. If SCSI disks are used and logical partitioning is not used, follow this step and the rest of the procedure.

    Start the vice-master node with the Foundation Services. Remove the /etc/opt/SUNWcgha/not_configured file, which was created automatically during the installation process, and then reboot the vice-master node. The master node detects that the vice-master node is not synchronized. A message is displayed in the system log file, asking whether you want to restart the replication.

  6. Accept a replication restart by running the following command on the master node:


    master# nhcrfsadm -a
    

  7. (All configurations, IDE or SCSI) On the master node, verify that the synchronization is complete:

    For versions earlier than the Solaris 10 OS:


    master# /usr/opt/SUNWesm/sbin/scmadm -S -M
    

    For the Solaris 10 OS and later:


    master# /usr/sbin/dsstat 1
    

    While the synchronization is taking place, the sync label is displayed. When the synchronization is complete, the sync label is replaced by the replicating label.

  8. Verify that the vice-master node is configured correctly by running the following command on the vice-master node:


    vice-master# nhadm check
    

  9. Abruptly terminate the master node by running the following command on the master node:


    master# uadmin 1 0
    

    Running this command halts the CPUs immediately, and a failover occurs. The vice-master node, which has the upgraded disk, takes the master role. Do not use a switchover instead, because the vice-master node with the old disk might be rejected due to insufficient disk space for replication.

  10. Repeat Step 1 through Step 8 to finish the upgrade of the second node.


Replacing a Dataless Node Disk

This section describes how to replace a dataless node disk.

To Replace a Dataless Node Disk

  1. Verify that the new disk is of the same hardware type as the old disk.

  2. Replace the hardware by referring to information in the hardware documentation at http://www.sun.com/products-n-solutions/hardware/docs/.

  3. Reformat the new disk by recreating the format of the old disk.

  4. Use the disk backup to restore the file system on the new disk.

    For example, if the backup was created on the Solaris OS using the ufsdump command, use the ufsrestore command to restore the file system. If the backup was created on Linux using the dump command, use the restore command to restore the file system.
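
    The following is a minimal sketch of restoring a root file system from a ufsdump backup on the Solaris OS. The device name, mount point, and backup location are placeholders only; adapt them to your configuration. Depending on the platform, you might also need to reinstall the boot block before rebooting:


    # newfs /dev/rdsk/c0t0d0s0
    # mount /dev/dsk/c0t0d0s0 /mnt
    # cd /mnt
    # ufsrestore rf /backup/dataless-root.dump
    # rm restoresymtable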

  5. Reboot the dataless node.

  6. Verify that the node is configured correctly:


    # nhadm check