5.24.4 Configuring InfiniBand Partitioning across Oracle VM RAC Clusters

The steps for configuring InfiniBand Partitioning across Oracle RAC clusters running in Oracle VM are described here.

In this procedure, the Oracle RAC clusters incur minimal downtime. The downtime occurs when each Oracle RAC cluster is restarted to use the new network interfaces.

Before you start this task, download and extract the file create_pkeys.tar. You can download this file from Implementing InfiniBand Partitioning across OVM RAC clusters on Exadata (My Oracle Support Doc ID 2075398.1). Download the file to one of the management domain (dom0) nodes. You will use this node to run all the scripts in this procedure, and it is referred to as driver_dom0 throughout.

When you extract the file, you should get three files:

  • create_pkeys_on_switch.sh
  • run_create_pkeys.sh
  • create_pkey_files.sh
  1. Allocate IP addresses to be used by the pkey interfaces.

    Plan and allocate sets of IP addresses and netmasks for each Oracle VM RAC cluster, to be used by the cluster pkey interfaces and the storage pkey interfaces when InfiniBand partitioning is implemented in the cluster.

    Refer to the topic About InfiniBand Partitioning Network Configuration for an example.

  2. On the InfiniBand switches, create a dedicated partition (cluster pkey) for each Oracle RAC cluster, to be used by the clusterware. Also create one partition (storage pkey), to be used by all the Oracle VM RAC clusters and the storage cells for communication between the Oracle RAC cluster nodes and the storage cells.

    You assign a pkey to each partition as a simplified means of identifying the partition to the Subnet Manager. Pkeys are 15-bit integers. Values 0x0001 and 0x7fff are default partitions. Use values between 0x0002 and 0x7ffe for your pkeys.
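
    The full-membership form of a pkey is the 15-bit partition value with the most significant bit of the 16-bit pkey field set (a bitwise OR with 0x8000). A minimal sketch of the arithmetic, using a hypothetical pkey value of 0x2000:

      # Full membership sets the top bit of the 16-bit pkey field.
      # Hypothetical 15-bit pkey 0x2000: 0x2000 | 0x8000 = 0xa000
      printf '0x%04x\n' $(( 0x2000 | 0x8000 ))    # prints 0xa000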

    1. Enable password-less ssh equivalence for the root user from the driver_dom0 management domain (dom0) node to all the switches on the InfiniBand fabric.

      Use a command similar to the following where ib_switch_list refers to a file that contains the list of all the InfiniBand switches on the fabric, with each switch name on a separate line.

      # dcli -g ib_switch_list -l root -k
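
      For illustration, the ib_switch_list file might contain entries like the following (hypothetical switch host names):

      # cat ib_switch_list
      exa01sw-ib1
      exa01sw-ib2
      exa01sw-ib3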
    2. Run the script create_pkeys_on_switch.sh from driver_dom0 to create and configure the partition keys on the InfiniBand switches.

      Note:

      Each run of the script create_pkeys_on_switch.sh creates exactly one partition. You must run the script once for each partition to be created. For example, an environment that contains two Oracle VM RAC clusters will have a total of three partitions: one storage partition and two cluster partitions (one per Oracle RAC cluster). In this example, you will need to run create_pkeys_on_switch.sh three times.

      You must run the script on only one node (driver_dom0). The script creates the partitions in all the switches provided as input.

    3. After you finish running the script, verify the partitions were created on all the switches.
      # /usr/local/sbin/smpartition list active no-page

      The following example output shows the default partitions (0x0001 and 0x7fff), and an additional partition, 0x0004. The partition with pkey 0x0004 is configured for IPoIB and has two member ports that are assigned full membership of the partition.

      # Sun DCS IB partition config file
      #! version_number : 12
      Default=0x7fff, ipoib :
      ALL_CAS=full,
      ALL_SWITCHES=full,
      SELF=full;
      SUN_DCS=0x0001, ipoib :
      ALL_SWITCHES=full;
       = 0x0004,ipoib: 
      0x0021280001cf3787=full, 
      0x0021280001cf205b=full; 

      At this stage ensure that you have created all the required partitions.
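
      One way to check every switch at once is a dcli sweep from driver_dom0; this is a sketch, assuming the same ib_switch_list file used earlier:

      # /usr/local/bin/dcli -g ib_switch_list -l root \
        "/usr/local/sbin/smpartition list active no-page"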

  3. On the Oracle VM RAC nodes and on the storage cells, generate all the relevant network configuration files for the new IP over InfiniBand (IPoIB) interfaces.

    Each partition requires a new IPoIB network interface.

    This step makes the following changes on the Oracle RAC cluster nodes:

    • Modifies these files:

      • /etc/sysconfig/network-scripts/ifcfg-ib0
      • /etc/sysconfig/network-scripts/ifcfg-ib1
    • Removes these files:

      • /etc/sysconfig/network-scripts/rule-ib0
      • /etc/sysconfig/network-scripts/rule-ib1
      • /etc/sysconfig/network-scripts/route-ib0
      • /etc/sysconfig/network-scripts/route-ib1
    • Creates the following new files in /etc/sysconfig/network-scripts:

      • ifcfg-clib0, ifcfg-clib1
      • rule-clib0, rule-clib1
      • route-clib0, route-clib1
      • ifcfg-stib0, ifcfg-stib1
      • rule-stib0, rule-stib1
      • route-stib0, route-stib1

    Note:

    If this step fails, before you rerun this step:

    • Restore all the files from /etc/sysconfig/network-scripts/backup-for-pkeys to /etc/sysconfig/network-scripts.
    • Remove the newly created files listed in this step.
    1. Make sure passwordless ssh is set up from the driver_dom0 node to all the Oracle RAC cluster nodes and the storage cells that need to be configured for partition keys.
    2. Make sure run_create_pkeys.sh and create_pkey_files.sh are executable and they are in the same directory on driver_dom0.
    3. Run run_create_pkeys.sh.

      For cluster nodes, you must run the script four times for each cluster node, using a node_type value of compute.

      The syntax for this script is:

      run_create_pkeys.sh node_name interface_name pkey_id 
      node_type pkey_ipaddr pkey_netmask pkey_interfaceType
      • node_name specifies the cluster node.
      • interface_name is either ib0 or ib1.
      • pkey_id specifies the pkey without the 0x prefix. The value used here is the cluster partition key derived from the cluster pkey_id value entered in step 2.
      • node_type is either compute or cell.
      • pkey_ipaddr specifies the IP address.
      • pkey_netmask specifies the netmask in CIDR format, for example, /21.
      • pkey_interfaceType is cluster or storage for compute node types, or storage for cell node types.

      Note:

      The pkey_ipaddr and pkey_netmask of the cluster pkey interface must be on a different subnet from the pkey_ipaddr and pkey_netmask of the storage pkey interface.

      You can use the following command to derive the partition key values to be used for the run_create_pkeys.sh script from the pkey_id value entered in step 2.

      # HexValue is the pkey_id value entered in step 2
      FinalHexValue=$(echo "obase=16;ibase=2;$(expr 1000000000000000 + \
          $(echo "obase=2;ibase=16;$(echo $HexValue | tr '[:lower:]' '[:upper:]')" | bc))" | \
          bc | tr '[:upper:]' '[:lower:]')

      FinalHexValue is the value that you enter in the command here, and HexValue is the pkey_id value entered in step 2.
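
      For example, assuming hypothetical pkey_id values of 2000 (cluster) and 2a00 (storage) were entered in step 2, the command derives the partition keys used in Table 5-4:

      # Hypothetical pkey_id values from step 2 and their derived partition keys:
      #   HexValue=2000  ->  FinalHexValue=a000  (cluster partition key)
      #   HexValue=2a00  ->  FinalHexValue=aa00  (storage partition key)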

      The following table provides an example of the inputs for the four runs for a cluster node:

      Table 5-4 Four Runs for Cluster Nodes

      Run  Interface Name  pkey_id  node_type  pkey_ipaddress  pkey_netmask  pkey_interfaceType
      1    ib0             a000     compute    192.168.12.153  /21           cluster
      2    ib1             a000     compute    192.168.12.154  /21           cluster
      3    ib0             aa00     compute    192.168.114.15  /20           storage
      4    ib1             aa00     compute    192.168.114.16  /20           storage

      You use these values in each run of the script, denoted by the Run column, as shown in this example, where vm-guest-1 is the name of the cluster node.

      # ./run_create_pkeys.sh vm-guest-1 ib0 a000 compute 192.168.12.153 /21 cluster
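
      If you prefer to issue the four runs for one node together, a minimal sketch using the Table 5-4 values and the hypothetical node name vm-guest-1:

      # Apply the four pkey configurations from Table 5-4 to one cluster node.
      node=vm-guest-1
      ./run_create_pkeys.sh "$node" ib0 a000 compute 192.168.12.153 /21 cluster
      ./run_create_pkeys.sh "$node" ib1 a000 compute 192.168.12.154 /21 cluster
      ./run_create_pkeys.sh "$node" ib0 aa00 compute 192.168.114.15 /20 storage
      ./run_create_pkeys.sh "$node" ib1 aa00 compute 192.168.114.16 /20 storage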
      

    At this stage all the required networking files listed at the beginning of this step have been created for the new pkey-enabled network interfaces on the Oracle VM RAC cluster nodes.

    Oracle Grid Infrastructure has also been modified to make use of the new network interfaces upon restart. The output of the command $GRID_HOME/bin/oifcfg getif should list clib0 and clib1 in the list of interfaces to be used for the cluster interconnect.
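
    For illustration, the getif output might resemble the following; the 192.168.8.0 subnet corresponds to the /21 cluster pkey addresses in Table 5-4, and your subnets and public interface entries will differ:

      # $GRID_HOME/bin/oifcfg getif
      bondeth0  10.128.0.0  global  public
      clib0  192.168.8.0  global  cluster_interconnect
      clib1  192.168.8.0  global  cluster_interconnect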

  4. Modify Oracle ASM and Oracle RAC CLUSTER_INTERCONNECTS parameter.
    1. Log in to each of the Oracle ASM instances in the Oracle RAC cluster using SQL*Plus as SYS, and run the following command:
      ALTER SYSTEM SET cluster_interconnects='<cluster_pkey_IP_address_of_ib0>:<cluster_pkey_IP_address_of_ib1>'
        scope=spfile sid='<name_of_current_ASM_instance>';

      For example:

      ALTER SYSTEM SET cluster_interconnects='192.168.12.153:192.168.12.154'
        scope=spfile  sid='+ASM1';
    2. Log in to each of the database instances in the Oracle RAC cluster using SQL*Plus, and run the same command for the Oracle RAC instance.

      For example:

      ALTER SYSTEM SET cluster_interconnects='192.168.12.153:192.168.12.154'
        scope=spfile  sid='RACDB1';
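
      Optionally, before restarting, confirm the pending spfile value on each instance; a small verification sketch using the v$spparameter view:

      SQL> SELECT sid, value FROM v$spparameter WHERE name = 'cluster_interconnects';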
    3. Shut down and disable CRS auto-start on all the Oracle RAC cluster nodes.
      # Grid_home/bin/crsctl stop crs
      
      # Grid_home/bin/crsctl disable crs

    At this stage Oracle Grid Infrastructure, the Oracle ASM instances, and the Oracle Database instances have been modified to make use of the newly created network interfaces.

  5. Modify cellip.ora and cellinit.ora on all the cluster nodes (user domains).

    Perform these steps on any one database server node of the cluster (user domain for an Oracle VM RAC cluster).

    1. Make a backup of the cellip.ora and cellinit.ora files.
      # cd /etc/oracle/cell/network-config
      # cp cellip.ora cellip.ora-bak
      # cp cellinit.ora cellinit.ora-bak
    2. Modify the cellip.ora-bak file to replace the existing IP address with the two storage pkey IP addresses of every storage cell that will be set up in step 7.
      The two IP addresses are separated by a semicolon (;).
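
      For illustration, an entry might change as follows; the original address is hypothetical, and the replacement uses the storage pkey addresses from Table 5-5:

      Before:  cell="192.168.10.5"
      After:   cell="192.168.114.1;192.168.114.2"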
    3. Make sure ssh equivalence is set up for the root user to all the cluster nodes from this cluster node.
    4. Replace the cellip.ora file on all the cluster nodes.

      Use the following commands to back up and then replace the cellip.ora file on all the cluster nodes. In this example, cluster_nodes refers to a file containing the names of all the Oracle RAC cluster nodes of the Oracle VM RAC cluster, with each node on a separate line.

      # /usr/local/bin/dcli -g cluster_nodes -l root \
        "/bin/cp /etc/oracle/cell/network-config/cellip.ora /etc/oracle/cell/network-config/cellip-orig.ora"
      
      # /usr/local/bin/dcli -g cluster_nodes -l root -f cellip.ora-bak \
        -d /etc/oracle/cell/network-config/cellip.ora
      
    5. Manually edit the /etc/oracle/cell/network-config/cellinit.ora-bak file to replace the existing IP addresses and netmask with the two storage pkey IP addresses and netmask of the cluster node, which were used in step 3.
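
      For example, using the storage pkey addresses and netmask from the third and fourth runs in Table 5-4, the edited entries might look like this (illustrative):

      ipaddress1=192.168.114.15/20
      ipaddress2=192.168.114.16/20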
    6. Make sure ssh equivalence is set up for the root user to all the cluster nodes from this cluster node.
    7. Replace the cellinit.ora file on all the cluster nodes.

      These are the IP addresses and netmask that were used in the third and fourth runs of step 3.

      Use the following commands to back up and then replace the cellinit.ora file on all the cluster nodes. In this example, cluster_nodes refers to a file containing the names of all the Oracle RAC cluster nodes of the Oracle VM RAC cluster, with each node on a separate line.

      # /usr/local/bin/dcli -g cluster_nodes -l root \
        "/bin/cp /etc/oracle/cell/network-config/cellinit.ora /etc/oracle/cell/network-config/cellinit-orig.ora"
      
      # /usr/local/bin/dcli -g cluster_nodes -l root -f cellinit.ora-bak \
        -d /etc/oracle/cell/network-config/cellinit.ora
      
  6. In the management domains (dom0s), modify the user domain configuration file for each user domain to use the partition key applicable to that user domain.

    Modify all the relevant vm.cfg files in the management domain. This step is applicable only for Oracle VM environments. Log in to all the management domains and manually edit /EXAVMIMAGES/GuestImages/user_domain_name/vm.cfg to include the partition keys created in step 2.

    For example, modify the line:

    ib_pkeys = [{'pf':'40:00.0','port':'1','pkey':['0xffff',]},
                {'pf':'40:00.0','port':'2','pkey':['0xffff',]},]

    to:

    ib_pkeys = [{'pf':'40:00.0','port':'1','pkey':['0xa000','0xaa00',]},
                {'pf':'40:00.0','port':'2','pkey':['0xa000','0xaa00',]},]

    In this example, 0xa000 is the cluster partition key derived from the cluster pkey_id value entered in step 2, and 0xaa00 is the storage partition key derived from the storage pkey_id value.

    You can use the following command to derive the partition key values to use in vm.cfg from the pkey_id values entered in step 2.

    # HexValue is the pkey_id value entered in step 2
    FinalHexValue=$(echo "obase=16;ibase=2;$(expr 1000000000000000 + \
        $(echo "obase=2;ibase=16;$(echo $HexValue | tr '[:lower:]' '[:upper:]')" | bc))" | \
        bc | tr '[:upper:]' '[:lower:]')

    FinalHexValue is the value that you enter in vm.cfg and HexValue is the value entered in step 2 for pkey_id.
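
    Continuing the earlier hypothetical example, pkey_id values of 2000 and 2a00 entered in step 2 derive to 0xa000 and 0xaa00, the values shown in the vm.cfg example above.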

    Note:

    If your environment has multiple Oracle VM RAC clusters, perform the next two steps (step 7 and step 8) only once, after steps 3 through 6 have been completed on all the Oracle VM RAC clusters.
  7. Modify the storage cells to use the newly created IPoIB interfaces.
    1. Make sure run_create_pkeys.sh and create_pkey_files.sh are available and that they are in the same directory on the same driver_dom0 node used in the previous steps.
    2. Make sure passwordless ssh is set up from the driver_dom0 node to all the storage cells that need to be configured for partition keys.
    3. Run run_create_pkeys.sh.

      For storage servers, you must run the script twice for each storage server, using a node_type value of cell.

      The syntax for this script is:

      run_create_pkeys.sh node_name interface_name pkey_id 
      node_type pkey_ipaddr pkey_netmask pkey_interfaceType
      • node_name specifies the storage server.
      • interface_name is either ib0 or ib1.
      • pkey_id specifies the pkey without the 0x prefix. The value used here is the storage partition key derived from the storage pkey_id value entered in step 2.
      • node_type is either compute or cell.
      • pkey_ipaddr specifies the IP address.
      • pkey_netmask specifies the netmask in CIDR format, for example, /21.
      • pkey_interfaceType is cluster or storage for compute node types, or storage for cell node types.

      You can use the following command to derive the partition key values to be used for the run_create_pkeys.sh script from the pkey_id value entered in step 2.

      # HexValue is the pkey_id value entered in step 2
      FinalHexValue=$(echo "obase=16;ibase=2;$(expr 1000000000000000 + \
          $(echo "obase=2;ibase=16;$(echo $HexValue | tr '[:lower:]' '[:upper:]')" | bc))" | \
          bc | tr '[:upper:]' '[:lower:]')

      FinalHexValue is the value that you enter in the command here, and HexValue is the pkey_id value entered in step 2.

      The following table provides an example of the inputs for the two runs for a storage server:

      Table 5-5 Two Runs for Storage Servers

      Run  Interface Name  pkey_id  node_type  pkey_ipaddress  pkey_netmask  pkey_interfaceType
      1    ib0             aa00     cell       192.168.114.1   /20           storage
      2    ib1             aa00     cell       192.168.114.2   /20           storage

      You use these values in each run of the script, denoted by the Run column, as shown in this example, where cell01 is the name of the storage server.

      # ./run_create_pkeys.sh cell01 ib0 aa00 cell 192.168.114.1 /20 storage
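
      Both runs for one storage server can be issued together; a minimal sketch using the Table 5-5 values and the hypothetical server name cell01:

      # Apply the two pkey configurations from Table 5-5 to one storage server.
      cell=cell01
      ./run_create_pkeys.sh "$cell" ib0 aa00 cell 192.168.114.1 /20 storage
      ./run_create_pkeys.sh "$cell" ib1 aa00 cell 192.168.114.2 /20 storage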
      

      Note:

      You can ignore the following messages from the script. The restart of the storage cells at the end of this task will take care of these issues.

      Network configuration altered. Please issue the following commands 
      as root to restart the network and open IB stack: 
        service openibd restart
        service network restart
      A restart of all services is required to put new network configuration into 
      effect. MS-CELLSRV communication may be hampered until restart.

    At this stage the storage servers (cells) have been modified to use the new network interfaces upon restart.

  8. Modify the /opt/oracle.cellos/cell.conf file on each storage server and restart the storage servers.
    1. Make a backup of the /opt/oracle.cellos/cell.conf file.
      # cd /opt/oracle.cellos
      # cp cell.conf cell.conf-prepkey
    2. Change the Pkey configuration lines in /opt/oracle.cellos/cell.conf.

      Change this line:

      <Pkeyconfigured>no</Pkeyconfigured>

      to:

      <Pkeyconfigured>yes</Pkeyconfigured>

      Change this line for the two private interfaces, ib0 and ib1:

      <IP_enabled>yes</IP_enabled>

      to:

      <IP_enabled>no</IP_enabled>
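
      The single <Pkeyconfigured> flag can be flipped with a scripted edit such as the following sketch. Do not apply a similar global substitution to <IP_enabled>, because that element must change only within the ib0 and ib1 interface sections; edit those occurrences by hand.

      # Flip the Pkeyconfigured flag in place (a backup was taken in step 8.1).
      sed -i 's|<Pkeyconfigured>no</Pkeyconfigured>|<Pkeyconfigured>yes</Pkeyconfigured>|' \
          /opt/oracle.cellos/cell.conf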
    3. Make sure Oracle Grid Infrastructure is stopped on all Oracle VM RAC nodes.
    4. Restart all the storage cell servers.
      # shutdown -r now
    5. Verify that the new pkey-enabled network interfaces are in use.
      # cellcli -e list cell detail | egrep 'interconnect|ipaddress'

      The output should show the new pkey-enabled interfaces (stib0 and stib1) along with the new set of IP addresses.
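
      For illustration, the relevant lines might resemble the following, using the addresses from Table 5-5:

      # cellcli -e list cell detail | egrep 'interconnect|ipaddress'
               interconnect1:       stib0
               interconnect2:       stib1
               ipaddress1:          192.168.114.1/20
               ipaddress2:          192.168.114.2/20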

  9. Restart the Oracle RAC clusters.
    1. Log in to the corresponding management domain of each of the user domain nodes.
    2. Run the following commands:
      # xm shutdown user_domain_name
      
      # xm create /EXAVMIMAGES/GuestImages/user_domain_name/vm.cfg
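
      If several user domains of the cluster are hosted on the same management domain, they can be restarted in sequence; a sketch with hypothetical domain names, where xm shutdown -w waits for the shutdown to complete before the domain is recreated:

      # Restart each user domain with its updated vm.cfg.
      for dom in vm-guest-1 vm-guest-2; do
        xm shutdown -w "$dom"
        xm create /EXAVMIMAGES/GuestImages/"$dom"/vm.cfg
      done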
  10. Start the Oracle Grid Infrastructure stack on all the cluster nodes and verify that it is fully started.
    1. Start and enable auto-start of the Oracle Grid Infrastructure stack on all the Oracle RAC cluster nodes.
      # $GRID_HOME/bin/crsctl start crs
      
      # $GRID_HOME/bin/crsctl enable crs
    2. After Oracle Grid Infrastructure has started on all the nodes, verify that the cluster_interconnects parameter is set to use the newly configured pkey interfaces.

      Log in to a database instance and run the following query:

      SQL> SELECT inst_id, value FROM gv$parameter
           WHERE name = 'cluster_interconnects';
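
      The value on every instance should show the cluster pkey addresses, for example (the second node's addresses are hypothetical):

         INST_ID VALUE
      ---------- -----------------------------
               1 192.168.12.153:192.168.12.154
               2 192.168.12.155:192.168.12.156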
    3. Remove the old cluster interconnect interfaces from the Oracle Cluster Registry (OCR).
      # Grid_home/bin/oifcfg delif -global ib0/<old subnet>
      
      # Grid_home/bin/oifcfg delif -global ib1/<old subnet>