Sun Open Telecommunications Platform 1.1 Installation and Administration Guide

N*N Topology Administration

This section provides the procedures for adding a new OTP host to an N*N clustered OTP system, and for repairing an OTP host within an N*N clustered OTP system.

The following topics are discussed:

  • Adding a Host to the Existing Cluster

  • Repairing a Host in the Cluster

Adding a Host to the Existing Cluster

This section provides the procedure for adding a host to an existing clustered OTP system.

Procedure: To Add a Host to the Existing Cluster

Before You Begin

Ensure that the sponsoring host (the first OTP host of the cluster) is added to the host list in the service provisioning service. See To Add Hosts to the External OTP Installation Server.

  1. Install the Solaris OS on the new OTP host as described in Installing Solaris 10 Update 2 and the Remote Agent on the OTP Hosts.

  2. Configure the Solaris OS on the new OTP host as described in Configuring Solaris 10 Update 2.

  3. Create a mount point /var/otp on the new OTP host.

    # mkdir -p /var/otp

  4. Add the following entry to the /etc/vfstab file.

    /dev/md/sps-dg/dsk/d0 /dev/md/sps-dg/rdsk/d0 /var/otp ufs 2 no global,logging

  5. Provision OTP on the new OTP host using either the graphical user interface or the command line interface.

    1. Perform the following steps to provision OTP using the graphical user interface.

    2. Perform the following steps to provision OTP through the command line interface.

      • Run the deployOTPMultiNode script with the -addNode option.

        Type the command

        /opt/SUNWotp10/CLI/deployOTPMultiNode -addNode /local-path/inputOTPMultiNode.dat

        where local-path is the path to the file inputOTPMultiNode.dat.

      • Create the metadb (state database replicas) on the host and add the host to the metaset as described in To Create Shared Storage on the Clustered OTP System. An example of these commands is sketched after the note that follows this list.

      • Run the deployOTPMultiNode script with the -addNodeCont option.

        Type the command

        /opt/SUNWotp10/CLI/deployOTPMultiNode -addNodeCont /local-path/inputOTPMultiNode.dat

        where local-path is the path to the file inputOTPMultiNode.dat.


      Note –

      Quorum automatic configuration applies only to two-host clustered OTP systems. If you disable quorum automatic configuration on a two-host cluster by choosing no, you must manually configure the quorum for the two-host cluster and reset the cluster configuration as described in Installing the Open Telecommunications Platform on a Clustered OTP System.

      For further information, see “Quorum and Quorum Devices” in Sun Cluster Concepts Guide for Solaris OS to understand the requirements for Quorum. Reconfigure the quorum as described in “Administering Quorum” in Sun Cluster System Administration Guide for Solaris OS.

      You can use the scsetup(1M) utility to add a node to the node list of an existing quorum device. To modify a quorum device's node list, you must remove the quorum device, modify the physical connections of nodes to the quorum device you removed, then add the quorum device to the cluster configuration again. When a quorum device is added, scconf(1M) automatically configures the node-to-disk paths for all nodes attached to the disk.
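
      The metadb and metaset commands referenced in the storage bullet above are sketched below. This is a minimal sketch only: the replica slice c1t0d0s7, the replica count, and the host name pcl3-ipp2 are assumptions; use the values given in To Create Shared Storage on the Clustered OTP System for your configuration.

      On the new OTP host, create the local state database replicas:

      # metadb -a -f -c 3 c1t0d0s7

      Then, from an OTP host that is already a member of the sps-dg disk set, add the new host to the set:

      # metaset -s sps-dg -a -h pcl3-ipp2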


  6. Set the system property for the otp-system-rg resource group to false.

    Type the command scrgadm -c -g otp-system-rg -y RG_system=false

  7. Determine the current IPMP groups.

    Type scrgadm -pvv | grep otp-lhn:NetIfList | grep value to list the current IPMP groups. For example:


    # scrgadm -pvv | grep otp-lhn:NetIfList | grep value
        (otp-system-rg:otp-lhn:NetIfList) Res property value: sc_ipmp0@1
  8. Determine the node ID value as follows:


    # scconf -pvv | grep pcl3-ipp2 | grep ID
    (pcl3-ipp2) Node ID: 2 

    The IPMP group for the new node in this example would be sc_ipmp0@2.

  9. Add the IPMP group for the newly added host to the Logical Host Name resource.

    Type the command

    scrgadm -c -j otp-lhn -x NetIfList=list-of-IPMP-groups

    where list-of-IPMP-groups is the comma-separated list of IPMP groups, including the group for the newly added host. For example:


    # scrgadm -c -j otp-lhn -x NetIfList=sc_ipmp0@1,sc_ipmp0@2
    
  10. Determine the current node list.

    Type the command scrgadm -pvv | grep otp-system-rg | grep Nodelist. For example:


    # scrgadm -pvv | grep otp-system-rg | grep Nodelist
    (otp-system-rg) Res Group Nodelist: pcl3-ipp1
  11. Add the host to the resource group.

    # scrgadm -c -g resource-group -y Nodelist=node-list

    where node-list is the complete, comma-separated list of hosts, including the newly added host. For example, add the host pcl3-ipp2 to the otp-system-rg resource group:

    # scrgadm -c -g otp-system-rg -y Nodelist=pcl3-ipp1,pcl3-ipp2

  12. Set the system property for the otp-system-rg resource group to true.

    Type the command scrgadm -c -g otp-system-rg -y RG_system=true
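
    After the resource group system property is restored, you can optionally confirm that the resource group and its node list now include the new host. The following checks are a sketch only, reusing commands shown earlier in this procedure; the exact output depends on your configuration.

    # scstat -g | grep otp-system-rg

    # scrgadm -pvv | grep otp-system-rg | grep Nodelist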

Repairing a Host in the Cluster

This section provides the procedure for repairing a failed host in a clustered OTP system. If a host fails in a multi-host cluster setup, the host must be repaired before it can rejoin the cluster. The host repair process involves the following two steps:

  1. Remove the failed host from the cluster.

  2. Add the repaired host back to the cluster as described in Adding a Host to the Existing Cluster.

Procedure: To Remove a Failed Host From the Cluster

In this procedure, the host pcl17-ipp2 is removed from a two-host cluster configuration. The hosts are pcl17-ipp1 and pcl17-ipp2. Substitute your own cluster and host information.


Note –

If the host that is being removed is the first host in the cluster, back up the system management database as described in Backing Up The OTP System Management Service Database and Configuration Files so that the database can be restored to one of the remaining cluster hosts as described in Restoring the OTP System Management Service Database and Configuration Files to Another OTP Host.


  1. Log in as root (su - root) to the active host in the cluster.

    If the cluster has more than two hosts:

    1. Log in as root to an OTP host in the cluster.

    2. Type /usr/cluster/bin/scstat -g | grep Online to determine which host in the cluster is active.

      Make note of the host on which the resource group otp-system-rg is online.

      For example:


      # /usr/cluster/bin/scstat -g | grep Online
          Group: otp-system-rg          pcl17-ipp2   Online
       Resource: otp-lhn                pcl17-ipp2   Online  Online - LogicalHostname online.
       Resource: otp-sps-hastorage-plus pcl17-ipp2   Online  Online
       Resource: otp-spsms-rs           pcl17-ipp2   Online  Online
       Resource: otp-spsra-rs           pcl17-ipp2   Online  Online

      In the above example, the active host is pcl17-ipp2.

    3. Log in as root on the OTP host on which the resource group is active.

  2. Add the cluster binaries path to your $PATH.

    # PATH=$PATH:/usr/cluster/bin

  3. Move all the resource groups and disk device groups to pcl17-ipp1.

    # scswitch -z -g otp-system-rg -h pcl17-ipp1
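
    The command above switches the otp-system-rg resource group. If the sps-dg disk device group is not already mastered by pcl17-ipp1, it can be switched with a similar command. This is a sketch only; the device group name sps-dg is taken from the metaset commands later in this procedure.

    # scswitch -z -D sps-dg -h pcl17-ipp1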

  4. Remove the host from all resource groups.

    # scrgadm -c -g otp-system-rg -y RG_system=false

    # scrgadm -c -g otp-system-rg -y Nodelist=pcl17-ipp1


    Note –

    Nodelist must contain all the node names except the node to be removed.


  5. If the node was set up as a mediator host, remove it from the set.

    # metaset -s sps-dg -d -m pcl17-ipp2
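
    To check whether any mediator hosts are configured for the disk set before removing one, you can use a command such as the following (a sketch; medstat reports the mediator status for the named disk set):

    # medstat -s sps-dg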

  6. Remove the node from the metaset.

    # metaset -s sps-dg -d -h -f pcl17-ipp2
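
    As an optional check, you can display the disk set again to verify that the failed host is no longer listed; the output depends on your configuration.

    # metaset -s sps-dg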

  7. Remove all the disks connected to the node except the quorum disk.

    1. Check the disks connected to the node by typing the following command:

      scconf -pvv |grep pcl17-ipp2|grep Dev


      # scconf -pvv |grep pcl17-ipp2|grep Dev
      (dsk/d12) Device group node list:                pcl17-ipp2
      (dsk/d11) Device group node list:                pcl17-ipp2
      (dsk/d10) Device group node list:                pcl17-ipp2
      (dsk/d9) Device group node list:                 pcl17-ipp2
      (dsk/d8) Device group node list:                 pcl17-ipp2
      (dsk/d7) Device group node list:                 pcl17-ipp1, pcl17-ipp2
      (dsk/d6) Device group node list:                 pcl17-ipp1, pcl17-ipp2
      (dsk/d5) Device group node list:                 pcl17-ipp1, pcl17-ipp2
      (dsk/d1) Device group node list:                 pcl17-ipp1, pcl17-ipp2
    2. Remove the local disks.

      # scconf -c -D name=dsk/d8,localonly=false

      # scconf -c -D name=dsk/d9,localonly=false

      # scconf -c -D name=dsk/d10,localonly=false

      # scconf -c -D name=dsk/d11,localonly=false

      # scconf -c -D name=dsk/d12,localonly=false

      # scconf -r -D name=dsk/d8

      # scconf -r -D name=dsk/d9

      # scconf -r -D name=dsk/d10

      # scconf -r -D name=dsk/d11

      # scconf -r -D name=dsk/d12

    3. Determine which disk is the quorum disk.

      To determine which disk is the quorum disk, type the command scstat -q | grep "Device votes". For example:


      # scstat -q | grep "Device votes"
      Device votes: /dev/did/rdsk/d1s2 1 1 Online
      

      In this example, the quorum disk is dsk/d1.

    4. Remove the shared disks except for the quorum disk.

      # scconf -r -D name=dsk/d5,nodelist=pcl17-ipp2

      # scconf -r -D name=dsk/d6,nodelist=pcl17-ipp2

      # scconf -r -D name=dsk/d7,nodelist=pcl17-ipp2

    5. Check that only the quorum disk is in the list.

      # scconf -pvv |grep pcl17-ipp2|grep Dev


      (dsk/d1) Device group node list:                 pcl17-ipp1, pcl17-ipp2
  8. Shut down the failed node.

    # shutdown -y -g 0 -i 0

  9. Place the failed node in maintenance state.

    # scconf -c -q node=pcl17-ipp2,maintstate
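
    As an optional check, you can confirm the maintenance state by verifying that the quorum vote count of the failed node is now 0. This sketch reuses the scstat command shown earlier in this procedure.

    # scstat -q | grep pcl17-ipp2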

  10. Remove the private interconnect interfaces.

    1. Check the private interconnect interfaces using the following command:

      # scconf -pvv | grep pcl17-ipp2 | grep Transport


      Transport cable:   pcl17-ipp2:ce0@0   switch1@2           Enabled
      Transport cable:   pcl17-ipp2:ce2@0   switch2@2           Enabled
    2. Disable and remove the private interconnect interfaces.

      # scconf -c -m endpoint=pcl17-ipp2:ce0,state=disabled

      # scconf -c -m endpoint=pcl17-ipp2:ce2,state=disabled

      # scconf -r -m endpoint=pcl17-ipp2:ce0

      # scconf -r -m endpoint=pcl17-ipp2:ce2

    3. Remove the private interfaces of the failed node.

      # scconf -r -A name=ce0,node=pcl17-ipp2

      # scconf -r -A name=ce2,node=pcl17-ipp2

  11. Remove the quorum disk from the failed node.

    • For a two-node cluster, type the following commands:

      # scconf -r -D name=dsk/d1,nodelist=pcl17-ipp2

      # scconf -c -q installmode

      # scconf -r -q globaldev=d1

      # scconf -c -q installmodeoff

    • For a three-host or more cluster, type the following commands:

      # scconf -r -D name=dsk/d1,nodelist=pcl17-ipp2

      # scconf -r -q globaldev=d1

  12. Add the quorum devices only to the nodes that will remain in the cluster.

    # scconf -a -q globaldev=d[n],node=node1,node=node2

    where n is the DID number of the quorum disk, and node1 and node2 are the hosts that remain in the cluster and are connected to that disk.
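
    For example, assuming the quorum disk is d1 and that the remaining connected hosts are pcl17-ipp1 and a hypothetical third host pcl17-ipp3:

    # scconf -a -q globaldev=d1,node=pcl17-ipp1,node=pcl17-ipp3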

  13. Remove the failed node from the node authentication list.

    # scconf -r -T node=pcl17-ipp2

  14. Remove the failed node from the cluster node list.

    # scconf -r -h node=pcl17-ipp2

    Perform this step from installmode (scconf -c -q installmode). Otherwise, you will get a warning about possible quorum compromise.
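
    A minimal sketch of this sequence, reusing the install mode commands shown in the previous steps, is shown below. Whether you reset install mode afterward with installmodeoff depends on your remaining quorum configuration.

    # scconf -c -q installmode

    # scconf -r -h node=pcl17-ipp2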

  15. Use the following commands to verify whether the failed node is still in the cluster configuration.

    # scconf -pvv |grep pcl17-ipp2

    # scrgadm -pvv|grep pcl17-ipp2

    If the failed node was successfully removed, both of the above commands produce no output and simply return to the system prompt.

    • If the scconf command failed, the command output will be similar to the following:


      # scconf -pvv | grep pcl17-ipp2
      Cluster nodes: pcl17-ipp1 pcl17-ipp2
      Cluster node name: pcl17-ipp2
      (pcl17-ipp2) Node ID: 1
      (pcl17-ipp2) Node enabled: yes
      (pcl17-ipp2) Node private hostname: clusternode1-priv
      (pcl17-ipp2) Node quorum vote count: 0
      (pcl17-ipp2) Node reservation key: 0x462DC27400000001
      (pcl17-ipp2) Node transport adapters:

    If the scrgadm command output is similar to the following, then Step 4 was not executed.


    # scrgadm -pvv|grep pcl17-ipp2
    (otp-system-rg) Res Group Nodelist: pcl17-ipp1 pcl17-ipp2
  16. Change the RG_system property to true.

    Type scrgadm -c -g otp-system-rg -y RG_system=true

Next Steps

Add the host to the cluster as described in Adding a Host to the Existing Cluster.