5.24.5 Implementing InfiniBand Partitioning across OVM RAC Clusters: Setting up Limited Membership

The 12.1.0.2 October 2016 Database Bundle Patch introduces a security enhancement feature where the GUIDs of the database nodes can be assigned to the storage pkey with limited membership instead of full membership, as was the case prior to the 12.1.0.2 October 2016 Bundle Patch. This addresses a security concern where one RAC node from one RAC cluster could talk to a RAC node from another RAC cluster using the storage pkey interfaces.

Full Membership and Limited Membership

An InfiniBand partition defines a group of InfiniBand nodes that are allowed to communicate with one another. With InfiniBand partitioning, you define custom or unique partition keys that are managed by the master subnet manager, and assign members to the custom partition keys. Members with the same partition key can only communicate amongst themselves. A member of one partition key cannot communicate with a member that has a different partition key, regardless of membership type. The OVM RAC cluster nodes of one cluster are assigned one partition key for clusterware communication and another partition key for communication with storage cells. This way, the nodes of one RAC cluster will not be able to communicate with the nodes of another RAC cluster, which have a different partition key assigned to them. This is very similar conceptually to tagged VLANs in the Ethernet world.

Partition keys (pkeys) are 15-bit integers and have a value of 0x1 to 0x7FFF. An additional bit, the membership bit, identifies the membership of a member of the partition. Memberships can be:

  • Full: The membership bit is set to 1. Members with full membership can communicate with each other as well as with members with limited membership within the same partition key.

  • Limited: The membership bit is set to 0. Members with limited membership within a partition cannot communicate with each other. However, they can communicate with members with full membership within the same partition.

Combined together, the pkey and the membership bit comprise a 16-bit integer. The most significant bit is the membership bit.
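
As an illustration of this layout, the following bash sketch splits a 16-bit value into its membership bit and its 15-bit pkey. The function names pkey_membership and pkey_base are hypothetical helpers introduced only for this example; they are not part of any Exadata tooling.

```shell
#!/usr/bin/env bash
# Sketch: split a 16-bit pkey value into membership bit and 15-bit base pkey.
# pkey_membership and pkey_base are hypothetical helper names.

# Print "full" if the most significant bit is set, otherwise "limited".
pkey_membership() {
    local value=$(( $1 ))      # accepts 0x-prefixed hex, for example 0xffff
    if (( value & 0x8000 )); then
        echo full
    else
        echo limited
    fi
}

# Print the 15-bit pkey (membership bit cleared) as 0x-prefixed hex.
pkey_base() {
    printf '0x%04x\n' $(( $1 & 0x7fff ))
}

for value in 0xffff 0x7fff 0xa001; do
    echo "$value: base=$(pkey_base "$value") membership=$(pkey_membership "$value")"
done
```

For 0xffff and 0x7fff the base pkey is the same (0x7fff); only the membership differs.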

By default, the InfiniBand subnet manager provides a single partition and it is identified by the partition key 0x7FFF (limited membership) or 0xFFFF (full membership).

An HCA port can participate in a maximum of 128 partitions. Each partition key provides a new IPoIB network interface. For example, InfiniBand port 1 with partition key 0xa001 will result in a new network interface. These interfaces are given meaningful names through the ifcfg-<interface> file parameters.

An InfiniBand node can be a member of multiple partitions. When a packet arrives at a database node, the partition key (pkey) of the packet is matched with the Subnet Manager configuration. This validation prevents a database node from communicating with another database node outside of the partitions of which it is a member.
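
The membership rules described above can be summarized in a short bash sketch. It models only the rule itself (same pkey, and at least one side with full membership), not the hardware check. The function name can_communicate and the pkey values are hypothetical, and the comments assume that the storage cells keep full membership while the database nodes are limited members.

```shell
#!/usr/bin/env bash
# Sketch of the partition membership rule; can_communicate is a
# hypothetical helper. Arguments are 16-bit values including the
# membership bit.
can_communicate() {
    local a=$(( $1 )) b=$(( $2 ))
    # Different 15-bit pkeys: never allowed, regardless of membership.
    (( (a & 0x7fff) == (b & 0x7fff) )) || return 1
    # Same pkey: allowed unless both members have limited membership.
    (( (a | b) & 0x8000 )) || return 1
    return 0
}

check() {
    if can_communicate "$1" "$2"; then
        echo "$1 <-> $2: allowed"
    else
        echo "$1 <-> $2: blocked"
    fi
}

check 0xaa00 0x2a00   # full member and limited member, same pkey
check 0x2a00 0x2a00   # two limited members, same pkey
check 0xaa00 0xaa01   # two full members, different pkeys
```

The second case is the one this enhancement relies on: two database nodes that are both limited members of the storage pkey cannot reach each other over it.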

Every node within the InfiniBand fabric has a partition key table, which you can see in /sys/class/infiniband/mlx4_0/ports/[1-2]/pkeys. Every Queue Pair (QP) of the node has an index (pkey) associated with it that maps to an entry in that table. Whenever a packet is sent from the QP’s send queue, the indexed pkey is attached to it. Whenever a packet is received on the QP’s receive queue, the indexed pkey is compared with that of the incoming packet. If they do not match, the packet is silently discarded: the receiving Channel Adapter does not know that it arrived, and the sending Channel Adapter receives no acknowledgement that it was received. The sent packet simply appears as a lost packet. Only when the pkey of the incoming packet matches the indexed pkey of the QP’s receive queue is a handshake made, the packet accepted, and an acknowledgment sent to the sending Channel Adapter. This is how members of the same partition are able to communicate only with each other and not with hosts that are not members of that partition (that is, hosts that do not have that pkey in their partition table).
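
To inspect a partition key table, a sketch such as the following can be used. It assumes that each file in the pkeys directory holds one 0x-prefixed value and that unused slots read as 0x0000. The function name list_pkeys is hypothetical, and the demonstration runs against a temporary mock directory with made-up values so that it does not depend on InfiniBand hardware.

```shell
#!/usr/bin/env bash
# Sketch: list the non-empty entries of a port's pkey table.
# list_pkeys is a hypothetical helper.
# $1: directory with one file per table index; on a node this would be,
#     for example, /sys/class/infiniband/mlx4_0/ports/1/pkeys
list_pkeys() {
    local dir=$1 entry value membership
    for entry in "$dir"/*; do
        [ -f "$entry" ] || continue
        value=$(<"$entry")
        # Assumption: unused table slots read as 0x0000; skip them.
        if (( value == 0 )); then
            continue
        fi
        if (( value & 0x8000 )); then
            membership=full
        else
            membership=limited
        fi
        printf 'index %s: %s base=0x%04x %s\n' \
            "${entry##*/}" "$value" $(( value & 0x7fff )) "$membership"
    done
}

# Demonstration against a mock table (values are made up).
demo=$(mktemp -d)
echo 0xffff > "$demo/0"
echo 0x2a00 > "$demo/1"
echo 0x0000 > "$demo/2"
list_pkeys "$demo"
rm -rf "$demo"
```

On a database node, the same function could be pointed at the sysfs path for each port instead of the mock directory.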

The steps below describe how to set up this enhancement on a pkey-enabled environment that has the 12.1.0.2 October 2016 Database Bundle Patch applied. There are two possible scenarios, as described below:

Case 1. Implementing the feature on a pkey-enabled environment in a rolling manner

In this case, you have already applied the 12.1.0.2 October 2016 Database Bundle Patch.

Perform the steps below on one node at a time.

  1. Shut down the Grid Infrastructure on the node.

    # $GI_HOME/bin/crsctl stop crs
  2. Determine the two port GUIDs of the dom0 (control domain) which manages this user domain OVM RAC cluster node.

    # /usr/sbin/ibstat | grep Port
  3. Log in as root to the InfiniBand switch where the master subnet manager (SM) is running.

  4. Run the commands below on the InfiniBand switch.

    # /usr/local/sbin/smpartition start
    
    # /usr/local/sbin/smpartition modify -n <storage pkey name> -port <Port GUID1 of the dom0 from step 2> -m limited
    
    # /usr/local/sbin/smpartition modify -n <storage pkey name> -port <Port GUID2 of the dom0 from step 2> -m limited
    
    # /usr/local/sbin/smpartition commit
  5. Modify the vm.cfg file for this OVM RAC user domain node in the dom0.

    1. Log in to the dom0 as root.

    2. Edit /EXAVMIMAGES/GuestImages/<user domain name>/vm.cfg and modify the partition keys as shown in the example below.

      Modify this line:

      ib_pkeys = [{'pf':'40:00.0','port':'1','pkey':[ '0xclpkey','0x<stpkey>',]},{'pf':'40:00.0','port':'2','pkey':[ '0xclpkey','0x<stpkey>',]},]

      to this:

      ib_pkeys = [{'pf':'40:00.0','port':'1','pkey':[ '0xclpkey','0x<mod_stpkey>',]},{'pf':'40:00.0','port':'2','pkey':[ '0xclpkey','0x<mod_stpkey>',]},]

      <mod_stpkey> is derived from <stpkey> by clearing the membership bit (the most significant of the 16 bits), using the formula below:

      mod_stpkey=$(echo "obase=16;ibase=2;$(expr $(echo "obase=2;ibase=16;$(echo $stpkey|tr '[:lower:]' '[:upper:]')"|bc) - 1000000000000000)"|bc|tr '[:upper:]' '[:lower:]')

      Note that <stpkey> and <mod_stpkey> in the formula above are specified without the "0x" prefix.

  6. Modify the /etc/sysconfig/network-scripts/ifcfg-stib* files on the user domain RAC nodes.

    Edit the PKEY_ID in those files using the formula below:

    mod_stpkey=$(echo "obase=16;ibase=2;$(expr $(echo "obase=2;ibase=16;$(echo $stpkey|tr '[:lower:]' '[:upper:]')"|bc) - 1000000000000000)"|bc|tr '[:upper:]' '[:lower:]')

    mod_stpkey is the new PKEY_ID, and stpkey is the old PKEY_ID.

    Note that <stpkey> and <mod_stpkey> in the formula above are specified without the "0x" prefix.

  7. Modify /opt/oracle.cellos/pkey.conf on the user domain RAC nodes.

    Edit the Pkey for the storage network pkey interfaces (stib*):

    Change:

    <Pkey>0xstpkey</Pkey>

    to:

    <Pkey>0xmod_stpkey</Pkey>

    mod_stpkey is derived from stpkey using the formula below:

    mod_stpkey=$(echo "obase=16;ibase=2;$(expr $(echo "obase=2;ibase=16;$(echo $stpkey|tr '[:lower:]' '[:upper:]')"|bc) - 1000000000000000)"|bc|tr '[:upper:]' '[:lower:]')

    stpkey and mod_stpkey used in the formula above are specified without the "0x" prefix.

  8. Restart the OVM RAC user domain node.

    1. Log in to the dom0 as root.

    2. Run the following commands:

      # xm shutdown <user domain name>
      
      # xm create /EXAVMIMAGES/GuestImages/<user domain name>/vm.cfg
  9. Verify that the Grid Infrastructure stack is fully up on the cluster node.

  10. Repeat the steps on the remaining cluster nodes, one node at a time.
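
The formula used in steps 5 through 7 subtracts binary 1000000000000000 from the storage pkey, which clears the membership bit. The same result can be sketched with bash arithmetic, as an alternative to the bc formula rather than a replacement for it. The function name limited_pkey and the value aa00 are hypothetical. As in the steps above, the input and output are hex digits without the "0x" prefix.

```shell
#!/usr/bin/env bash
# Sketch: clear the membership bit of a full-membership storage pkey.
# limited_pkey is a hypothetical helper.
# $1: stpkey as hex digits without the "0x" prefix.
limited_pkey() {
    local stpkey=$1
    # Refuse values whose membership bit is already clear; subtracting
    # the bit again would not produce a valid pkey.
    if (( (16#$stpkey & 0x8000) == 0 )); then
        echo "pkey $stpkey already has limited membership" >&2
        return 1
    fi
    # Print the result in lowercase hex, as the bc formula does.
    printf '%x\n' $(( 16#$stpkey & 0x7fff ))
}

stpkey=aa00                            # hypothetical storage pkey
mod_stpkey=$(limited_pkey "$stpkey")
echo "$mod_stpkey"                     # 2a00
```

With this example value, the ib_pkeys entry in vm.cfg would change from '0xaa00' to '0x2a00', and the same new value would be used for PKEY_ID and in pkey.conf.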

Case 2. Implementing the feature on a pkey-enabled environment while you apply the 12.1.0.2 October 2016 Database Bundle Patch in a rolling manner

Perform the steps below on one node at a time.

  1. Apply the 12.1.0.2 October 2016 Database Bundle Patch on the cluster node.

  2. Perform steps 1 through 10 from Case 1 above on the node where the patch was applied.

  3. Move on to the next cluster node and repeat steps 1 and 2 above.

Note:

Once the dom0 GUIDs are converted to limited membership, deployment of any new cluster will have the October 2016 Database Bundle Patch as a prerequisite.