10.3 Set Up Ethernet Over InfiniBand (EoIB) on Oracle Solaris

This section includes the following procedures to set up EoIB on Oracle Solaris:

10.3.1 Set Up Ethernet Over InfiniBand on Oracle Solaris 11.1

You can set up Ethernet over InfiniBand connectivity for Exalogic compute nodes running Oracle Solaris 11.1 by doing the following:

  1. Use an SSH client, such as PuTTY, to log in to a Sun Network QDR InfiniBand Gateway Switch as root. For example, log in to el01gw04 as root.

  2. At the command prompt, run the following command:

    el01gw04# listlinkup | grep Bridge
    

    A section of the output of this command is as follows:

    Connector 0A-ETH Present
      Bridge-0 Port 0A-ETH-1 (Bridge-0-2) up (Enabled)
      Bridge-0 Port 0A-ETH-2 (Bridge-0-2) up (Enabled)
      Bridge-0 Port 0A-ETH-3 (Bridge-0-1) up (Enabled)
      Bridge-0 Port 0A-ETH-4 (Bridge-0-1) up (Enabled)
      Bridge-0 Port 1A-ETH-1 (Bridge-1-2) down (Enabled)
      Bridge-0 Port 1A-ETH-2 (Bridge-1-2) down (Enabled)
      Bridge-0 Port 1A-ETH-3 (Bridge-1-1) up (Enabled)
      Bridge-0 Port 1A-ETH-4 (Bridge-1-1) up (Enabled)
    

    From this output, identify the uplinks whose state is up. In this example, you can use any of the following Ethernet connectors for creating a VNIC:

    • 0A-ETH-1

    • 0A-ETH-2

    • 0A-ETH-3

    • 0A-ETH-4

    • 1A-ETH-3

    • 1A-ETH-4

      Note:

      This example procedure uses 1A-ETH-3.
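
The uplink selection in step 2 can also be done with a filter. The following is a minimal sketch, assuming the listlinkup output keeps the layout of the example in this step. The sample text is copied from that example, and the function name list_up_connectors is hypothetical.

```shell
#!/bin/sh
# Sample text copied from the example listlinkup output in this step.
listlinkup_out='  Bridge-0 Port 0A-ETH-1 (Bridge-0-2) up (Enabled)
  Bridge-0 Port 0A-ETH-2 (Bridge-0-2) up (Enabled)
  Bridge-0 Port 0A-ETH-3 (Bridge-0-1) up (Enabled)
  Bridge-0 Port 0A-ETH-4 (Bridge-0-1) up (Enabled)
  Bridge-0 Port 1A-ETH-1 (Bridge-1-2) down (Enabled)
  Bridge-0 Port 1A-ETH-2 (Bridge-1-2) down (Enabled)
  Bridge-0 Port 1A-ETH-3 (Bridge-1-1) up (Enabled)
  Bridge-0 Port 1A-ETH-4 (Bridge-1-1) up (Enabled)'

# Print the connector name (third field) of every bridge port
# whose state (fifth field) is "up".
list_up_connectors() {
    awk '$2 == "Port" && $5 == "up" { print $3 }'
}

up_connectors=$(printf '%s\n' "$listlinkup_out" | list_up_connectors)
printf '%s\n' "$up_connectors"
```

On the switch itself, the same filter could be fed from the live command instead of the sample variable.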

  3. Determine GUIDs of an Exalogic compute node as follows:

    1. On the compute node that requires the VNIC, log in as root, and run the dladm show-ib command on the command line. For example, log in to el01cn02 as root. This command displays port information, as in the following example output:

      el01cn02# dladm show-ib
      LINK     HCAGUID            PORTGUID           PORT  STATE  PKEYS
      ibp0     21280001A0A694     21280001A0A695     1     up     FFFF
      ibp1     21280001A0A694     21280001A0A696     2     up     FFFF
      

      In the output, information about two ports is displayed. From this output, you must determine which port GUID to use. This example procedure uses the port GUID 21280001A0A695 (port 1).

    2. On the same compute node, run the following command on the command line to report information about all active links in the InfiniBand fabric:

      el01cn02# iblinkinfo.pl -R | grep hostname
      

      Where hostname is the name of the compute node. For example, el01cn02.

      The following is the example output of this command:

      el01cn02# iblinkinfo.pl -R | grep el01cn02
      65   15[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>    121   2[  ] "el01cn02 EL-C 192.168.10.29 HCA-1" (Could be 5.0 Gbps)
      64   15[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>    120   1[  ] "el01cn02 EL-C 192.168.10.29 HCA-1" (Could be 5.0 Gbps)
      

      From this example output, note down the switch lid values. The switch lid of port 1 is 64 (the first column in the output). The switch lid of port 2 is 65.
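
Reading the switch lids out of the link listing can be scripted. The following is a sketch under the assumption that each link line has the layout shown in the example output (switch lid first, then the node lid and HCA port after the "==>" marker). The function name hca_port_to_switch_lid is hypothetical.

```shell
#!/bin/sh
# Sample text copied from the example iblinkinfo.pl output in this step.
iblinkinfo_out='65   15[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>    121   2[  ] "el01cn02 EL-C 192.168.10.29 HCA-1" (Could be 5.0 Gbps)
64   15[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>    120   1[  ] "el01cn02 EL-C 192.168.10.29 HCA-1" (Could be 5.0 Gbps)'

# For each link line, print "<HCA port> <switch lid>".
# The switch lid is the first number before the arrow; the HCA port is
# the second number after it, with the trailing bracket removed.
hca_port_to_switch_lid() {
    awk '/==>/ {
        split($0, side, "==>")
        split(side[1], left, " ")
        split(side[2], right, " ")
        port = right[2]
        sub(/\[.*/, "", port)
        print port, left[1]
    }'
}

port_lids=$(printf '%s\n' "$iblinkinfo_out" | hca_port_to_switch_lid)
printf '%s\n' "$port_lids"
```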

  4. Determine which gateway switch is associated with the switch lids by comparing the first column of the iblinkinfo output to the lid value of the ibswitches command as follows:

    1. On the compute node, run the ibswitches command on the command line. The example output of this command is as follows:

      el01cn02# ibswitches
      Switch  : 0x002128548042c0a0 ports 36 "SUN IB QDR GW switch el01gw03" enhanced port 0 lid 63 lmc 0
      Switch  : 0x002128547f22c0a0 ports 36 "SUN IB QDR GW switch el01gw02" enhanced port 0 lid 6 lmc 0
      Switch  : 0x00212856d0a2c0a0 ports 36 "SUN IB QDR GW switch el01gw04" enhanced port 0 lid 65 lmc 0
      Switch  : 0x00212856d162c0a0 ports 36 "SUN IB QDR GW switch el01gw05" enhanced port 0 lid 64 lmc 0
      
    2. In this example output, identify the switches that lid values 64 and 65 are associated with. In this example, the switch lid 64 of the gateway switch el01gw05 with GUID 0x00212856d162c0a0 is associated with port 1 of the HCA in the compute node el01cn02.

      Note:

      This example procedure uses LID 64 of this gateway switch.
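
The lookup of a gateway switch by its lid can be sketched as follows. This assumes the ibswitches output has the layout shown in the example, with the switch name following the word "switch" inside the quoted description. The function name switch_for_lid is hypothetical.

```shell
#!/bin/sh
# Sample text copied from the example ibswitches output in this step.
ibswitches_out='Switch  : 0x002128548042c0a0 ports 36 "SUN IB QDR GW switch el01gw03" enhanced port 0 lid 63 lmc 0
Switch  : 0x002128547f22c0a0 ports 36 "SUN IB QDR GW switch el01gw02" enhanced port 0 lid 6 lmc 0
Switch  : 0x00212856d0a2c0a0 ports 36 "SUN IB QDR GW switch el01gw04" enhanced port 0 lid 65 lmc 0
Switch  : 0x00212856d162c0a0 ports 36 "SUN IB QDR GW switch el01gw05" enhanced port 0 lid 64 lmc 0'

# Print "<switch name> <switch GUID>" for the switch whose lid equals $1.
switch_for_lid() {
    awk -v want="$1" '$1 == "Switch" {
        name = ""; lid = ""
        for (i = 1; i < NF; i++) {
            if ($i == "switch") name = $(i + 1)
            if ($i == "lid")    lid  = $(i + 1)
        }
        gsub(/"/, "", name)
        if (lid == want) print name, $3
    }'
}

gateway=$(printf '%s\n' "$ibswitches_out" | switch_for_lid 64)
printf '%s\n' "$gateway"
```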

  5. Define a dummy MAC address in the following format:

    <last three octets from el01gw05 switch ib GUID> : <last three octets of the administrative IP of the compute node in hexadecimal format>

    Example:

    GUID of switch el01gw05: 00:21:28:56:d1:62:c0:a0

    Last three octets of the switch GUID: 62:c0:a0

    Administrative IP address of compute node: 192.168.1.5

    Last three octets of the compute node's IP address: 168.1.5

    Last three octets in hexadecimal notation: a8:01:05.

    MAC address of the VNIC: 62:c0:a0:a8:01:05

    Note:

    Each MAC address should be unique. Only even numbers are supported for the most significant byte of the MAC address (unicast). The above address is an example only.
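
The MAC address derivation in step 5 is simple arithmetic and can be sketched in a few lines of shell. The function names make_vnic_mac and mac_is_unicast are hypothetical helpers, not switch or Solaris commands, and the inputs are the example values from this step.

```shell
#!/bin/sh
# Build the dummy MAC: last three octets of the switch GUID, followed by
# the last three octets of the IPv4 address written in hexadecimal.
# $1 = switch GUID (with or without "0x" prefix or colons), $2 = IPv4 address.
make_vnic_mac() {
    hex=$(printf '%s\n' "${1#0x}" | tr -d ':')
    guid_part=$(printf '%s\n' "$hex" | sed 's/.*\(..\)\(..\)\(..\)$/\1:\2:\3/')
    ip_part=$(printf '%s\n' "$2" | awk -F. '{ printf "%02x:%02x:%02x", $2, $3, $4 }')
    printf '%s:%s\n' "$guid_part" "$ip_part"
}

# Succeed only if the first (most significant) byte of the MAC is even.
mac_is_unicast() {
    first=${1%%:*}
    [ $(( 0x$first % 2 )) -eq 0 ]
}

mac=$(make_vnic_mac 00:21:28:56:d1:62:c0:a0 192.168.1.5)
printf '%s\n' "$mac"
mac_is_unicast "$mac" && printf 'first byte is even\n'
```

A check like mac_is_unicast is useful because the first byte comes from the switch GUID and is not guaranteed to be even for every switch.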

  6. Log in as root to el01gw05, the gateway switch that you identified in step 4. Use its IP address or host name to log in.

  7. Upon login, to permit the configuration of VNICs, run the following command:

    el01gw05# allowhostconfig
    
  8. To create a VLAN, run the following command:

    el01gw05# createvlan 1A-ETH-3 -vlan 1706 -pkey default
    
  9. Note the ID of the VLAN you created by running the showvlan command as follows:

    # showvlan
      Connector/LAG  VLN   PKEY
      -------------  ---   ----
      1A-ETH-3        0    ffff
      1A-ETH-3        1706 ffff
    

    In this example, the VLAN ID is 1706.

  10. Run the following command to create a VNIC on the switch:

    el01gw05# createvnic 1A-ETH-3 -guid 00:21:28:00:01:A0:A6:95 -mac 62:c0:a0:a8:01:05 -pkey default
    

    Note:

    This new resource is not tagged with any VLAN.

    A VNIC is created.
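
The -guid argument in step 10 is the port GUID from step 3, padded to 16 hexadecimal digits and written as colon-separated octets. A minimal sketch of that conversion follows; the function name guid_to_colon is hypothetical.

```shell
#!/bin/sh
# Convert a port GUID as printed by dladm show-ib (for example 21280001A0A695)
# into the colon-separated, 8-octet form used in the createvnic example.
guid_to_colon() {
    # Left-pad with zeros to 16 hex digits, then insert a colon after
    # every two digits and drop the trailing colon.
    padded=$(printf '%16s' "${1#0x}" | tr ' ' '0')
    printf '%s\n' "$padded" | sed 's/../&:/g; s/:$//'
}

guid_to_colon 21280001A0A695
```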

  11. To verify that the VNIC was created, run the showvnics command. The following example output is displayed:

    ID  STATE FLG IOA_GUID                NODE                        IID  MAC               VLN PKEY GW
    --- ----- --- ----------------------- --------------------------- ---- ----------------- --- ---- --------
    0   UP    N   00:21:28:00:01:A0:A6:95 el01cn02 EL-C 192.168.10.29 0000 62:c0:a0:a8:01:05 NO  ffff 1A-ETH-3
    
  12. On the compute node, run the following command to display the list of VNICs available on the compute node:

    el01cn02# dladm show-phys | grep eoib
    

    This command displays the name of the new interface, as seen on the compute node, such as eoib0. Note the corresponding link, such as net7. It also displays the state of the interface.

    Note:

    You may repeat the above steps to create more network-administered tagless VNICs on the same compute node as long as a unique {ETH connector, port GUID} tuple is chosen each time. When this second VNIC is configured in the same manner, the VNIC is seen on the compute node (for example, as the eoib1 interface with the link net8). It is recommended that you configure these two Ethernet over InfiniBand (EoIB) interfaces in an IPMP group, such as bond1.

    To create a host-administered VNIC on a {ETH connector, port GUID} tuple with a network-administered tagless VNIC already created on it, complete the steps described in Oracle Solaris: Creating VNICs and Associating Them with VLANs.

  13. Create another VNIC for the same compute node, using a connector on a different gateway switch, by following steps 1 to 12. Note the name of this interface and its corresponding link. For example, eoib1 interface with the link net8.

  14. Delete the following files:

    • /etc/hostname.bond1

    • /etc/hostname.eoib0

    • /etc/hostname.eoib1

  15. Restart the compute node by running the reboot command.

  16. Create the VNICs that you created in steps 10 and 13 again on the compute node by running the following command:

    hostname# dladm create-vnic -l link_name [-v vlan_id] interface_name
    

    Example:

    el01cn02# dladm create-vnic -l net7 eoib0
    el01cn02# dladm create-vnic -l net8 eoib1
    

    If you are creating a VLAN tagged VNIC, use the -v option to add the VLAN ID as follows:

    el01cn02# dladm create-vnic -l net7 -v 1706 eoib0
    el01cn02# dladm create-vnic -l net8 -v 1706 eoib1
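
To review the commands before running them, the tagged and untagged forms in step 16 can be generated by a small dry-run helper. This is only a sketch: vnic_command is a hypothetical function that prints the dladm command line and does not execute it.

```shell
#!/bin/sh
# Print (but do not run) the dladm command for one VNIC.
# $1 = underlying link, $2 = VNIC name, $3 = optional VLAN ID.
vnic_command() {
    if [ -n "${3:-}" ]; then
        printf 'dladm create-vnic -l %s -v %s %s\n' "$1" "$3" "$2"
    else
        printf 'dladm create-vnic -l %s %s\n' "$1" "$2"
    fi
}

vnic_command net7 eoib0
vnic_command net8 eoib1 1706
```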
    
  17. You can verify if the VNICs were created by using the dladm show-vnic command as follows:

    hostname# dladm show-vnic
    
  18. To configure eoib0 and eoib1 in an IPMP group for high availability purposes, do the following:

    1. Identify the data links associated with the VNICs you created on the InfiniBand switch by running the following command:

      el01cn02# dladm show-phys -m
      

      Identify the link names associated with the VNICs you created, such as net7 and net8.

    2. Create the IPMP group by running the following command:

      hostname# ipadm create-ipmp bond_name
      

      Example:

      el01cn02# ipadm create-ipmp bond1
      
    3. Create the IP interfaces for the two links you noted in step 18.a by running the ipadm create-ip command as follows:

      hostname# ipadm create-ip link_name
      

      Example:

      el01cn02# ipadm create-ip net7
      el01cn02# ipadm create-ip net8
      
    4. Create interfaces for the VNICs you created in step 16 by running the following commands:

      hostname# ipadm create-ip interface_name 
      

      Example:

      el01cn02# ipadm create-ip eoib0
      el01cn02# ipadm create-ip eoib1
      
    5. Set one of the interfaces as a standby for the bonded interface, by running the following command:

      hostname# ipadm set-ifprop -p standby=on -m ip interface_name
      

      Example:

      el01cn02# ipadm set-ifprop -p standby=on -m ip eoib1
      
    6. Add the two interfaces to the ipmp bond you created in step 18.b, by running the following command:

      hostname# ipadm add-ipmp -i interface_name1 -i interface_name2 bond_name
      

      Example:

      el01cn02# ipadm add-ipmp -i eoib0 -i eoib1 bond1
      
    7. Set an IP address for the bonded interface you created, by running the following command:

      hostname# ipadm create-addr -T static -a local=ipv4_address/CIDR_netmask bond_name/v4
      

      Example:

      el01cn02# ipadm create-addr -T static -a local=10.100.44.68/22 bond1/v4
      
    8. Verify that your bonded interface is up, by running the following command:

      hostname# ipadm show-if
      IFNAME     CLASS    STATE    ACTIVE OVER
      lo0        loopback ok       yes    --
      net0       ip       ok       yes    --
      net4       ip       ok       yes    --
      net8       ip       down     no     --
      net9       ip       down     no     --
      bond0_0    ip       ok       yes    --
      bond0_1    ip       ok       no     --
      bond1      ipmp     ok       yes    eoib1 eoib0
      eoib1      ip       ok       no     --
      eoib0      ip       ok       yes    --
      
    9. Verify that your bonded interface was given an IP address by running the following command:

      # ipadm show-addr
      ADDROBJ           TYPE     STATE        ADDR
      lo0/v4            static   ok           127.0.0.1/8
      net0/v4           static   ok           138.3.2.87/21
      net4/v4           static   ok           169.254.182.77/24
      bond0/v4          static   ok           192.168.14.101/24
      bond1/v4          static   ok           138.3.48.35/22
      bond1/v4a         static   ok           138.3.51.1/22
      lo0/v6            static   ok           ::1/128
      net0/v6           addrconf ok          fe80::221:28ff:fed7:e944/10
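
The two verification sub-steps can be automated by checking the command output. The following sketch assumes the column layout shown in the examples and uses rows copied from them as sample input. The function names ipmp_group_ok and addresses_of are hypothetical.

```shell
#!/bin/sh
# Rows copied from the example ipadm show-if and ipadm show-addr output.
show_if_out='IFNAME     CLASS    STATE    ACTIVE OVER
bond1      ipmp     ok       yes    eoib1 eoib0
eoib1      ip       ok       no     --
eoib0      ip       ok       yes    --'
show_addr_out='ADDROBJ           TYPE     STATE        ADDR
bond0/v4          static   ok           192.168.14.101/24
bond1/v4          static   ok           138.3.48.35/22
bond1/v4a         static   ok           138.3.51.1/22'

# Succeed only if interface $1 is listed as an IPMP group in state "ok".
ipmp_group_ok() {
    awk -v ifname="$1" '
        $1 == ifname && $2 == "ipmp" && $3 == "ok" { found = 1 }
        END { exit !found }'
}

# Print every address whose address object belongs to interface $1.
addresses_of() {
    awk -v ifname="$1" 'index($1, ifname "/") == 1 { print $4 }'
}

printf '%s\n' "$show_if_out" | ipmp_group_ok bond1 && printf 'bond1 is ok\n'
printf '%s\n' "$show_addr_out" | addresses_of bond1
```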
      

10.3.2 Set Up Ethernet Over InfiniBand on Oracle Solaris 11.2

To set up Ethernet over InfiniBand connectivity for Exalogic compute nodes running Oracle Solaris 11.2 Base Image of EECS 2.0.6.2.0, perform the following procedure:

  1. Use an SSH client, such as PuTTY, to log in to a compute node as root. For this example, log in to el01cn16 as root.
  2. Run the following command to verify that the image version is EECS 2.0.6.2.0 or later and that the kernel version is SunOS 11.2:
    root@el01cn16:~# imageinfo
    

    A section of the output of this command is as follows:

    Exalogic 2.0.6.2.0 (build:r240216)
    Image version       : 2.0.6.2.0
    . . .
    Kernel version      : SunOS 11.2
    . . .
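
The "2.0.6.2.0 or later" check can be made explicit with a dotted-version comparison. This is a sketch, assuming the "Image version" line has the form shown in the example output; version_ge is a hypothetical helper.

```shell
#!/bin/sh
# Lines copied from the example imageinfo output in this step.
imageinfo_out='Exalogic 2.0.6.2.0 (build:r240216)
Image version       : 2.0.6.2.0
Kernel version      : SunOS 11.2'

# Succeed if dotted version $1 is greater than or equal to dotted version $2.
# Components are compared numerically from left to right; missing
# components count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = x[i] + 0; yi = y[i] + 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

image_version=$(printf '%s\n' "$imageinfo_out" | awk -F': *' '/^Image version/ { print $2 }')
if version_ge "$image_version" 2.0.6.2.0; then
    printf 'image %s meets the minimum\n' "$image_version"
fi
```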
    
  3. Get the names of the InfiniBand (IB) datalinks by running the following command:
    root@el01cn16:~# dladm show-phys
    

    The following is a section of the command output, which shows net4 and net5 as the names of the IB datalinks:

    LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
    . . .
    net4              Infiniband           up         32000  unknown   ibp0
    net5              Infiniband           up         32000  unknown   ibp1
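
The IB datalinks can be picked out of the full listing by filtering on the MEDIA column. A minimal sketch follows, using the rows from the example output; ib_datalinks is a hypothetical function name.

```shell
#!/bin/sh
# Rows copied from the example dladm show-phys output in this step.
show_phys_out='LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net4              Infiniband           up         32000  unknown   ibp0
net5              Infiniband           up         32000  unknown   ibp1'

# Print "<link> <device>" for every datalink whose media type is Infiniband.
ib_datalinks() {
    awk '$2 == "Infiniband" { print $1, $6 }'
}

ib_links=$(printf '%s\n' "$show_phys_out" | ib_datalinks)
printf '%s\n' "$ib_links"
```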
    
  4. Open a second terminal and log in as root to the switch net5 is connected to (el01sw-ib02 for this example).

    Run the showvlan command to verify that VLAN 0 is associated with the default partition and that the VLAN ID is created on the correct IB partition. The following example output shows that the VLAN ID is 3066 and that it is associated with the correct IB partition:

    [root@el01sw-ib02 ~]# showvlan
       Connector/LAG  VLN   PKEY
       -------------  ---   ------
       0A-ETH-1        0    0xffff
       0A-ETH-1        3066 0x8206
    
  5. Repeat the previous step on the switch net4 is connected to (el01sw-ib03 for this example).
  6. From the compute node session, use the dladm command to verify that the compute node GUIDs are included in the IB partition that the VLAN 3066 is using.

    See the following extract of the command output for net4 and net5:

    root@el01cn16:~# dladm show-ib net4
    LINK  HCAGUID    PORTGUID       PORT STATE  GWNAME      GWPORT   PKEYS
    net4  2128000... 21280001EFF369 1    up     el01sw-ib02 0a-eth-1 7FFF,8206,FFFF
                                                el01sw-ib02 0a-eth-2
                                                el01sw-ib02 0a-eth-3
                                                . . .
    
    root@el01cn16:~# dladm show-ib net5
    LINK  HCAGUID    PORTGUID       PORT STATE  GWNAME      GWPORT   PKEYS
    net5  2128000... 21280001EFF36A 2    up     el01sw-ib02 0a-eth-1 7FFF,8206,FFFF
                                                el01sw-ib02 0a-eth-2
                                                el01sw-ib02 0a-eth-3
                                                . . .
    

    Make a note of the data that the command displays in the PORTGUID column for both IB datalinks.

  7. Run the commands iblinkinfo, ibswitches, and ibstat to determine the mapping among the IB HCA ports, IB datalinks and IB switches. See the following section of the first command output:
    root@el01cn16:~# iblinkinfo|grep cn16
    . . .
       14   33[  ] ==(...)==>  72    2[  ] "el01cn16 EL-C  192.168.10.16 HCA-1" ( )
       15   33[  ] ==(...)==>  71    1[  ] "el01cn16 EL-C  192.168.10.16 HCA-1" ( )
    

    The output of the command displays a pair of value sets:

    • switch lid 14, base lid 72, port 2

    • switch lid 15, base lid 71, port 1

    The following is an extract of the second command:

    root@el01cn16:~# ibswitches
    . . .
    Switch  : 0x0010e00b4520c0a0 ports 36 "SUN IB QDR GW switch el01sw-ib02 10.128.21.186 leaf:1" enhanced port 0 lid 14 lmc 0
    . . .
    Switch  : 0x0010e00b6d80c0a0 ports 36 "SUN IB QDR GW switch el01sw-ib03 10.128.21.187 leaf:2" enhanced port 0 lid 15 lmc 0
    

    The lid and port data from the previous command output matches the following switches:

    • switch lid 14, base lid 72 matches el01sw-ib02

    • switch lid 15, base lid 71 matches el01sw-ib03

    The following is an extract of the third and last command:

    root@el01cn16:~# ibstat
    . . .
            Port 1:
                    . . .
                    Base lid: 71
                    . . .
                    Port GUID: 0x0021280001eff369
                    Link layer: IB
            Port 2:
                    . . .
                    Base lid: 72
                    . . .
                    Port GUID: 0x0021280001eff36a
                    Link layer: IB
    

    With this information, you can see that 0x0021280001eff369 is the port GUID for net4 and 0x0021280001eff36a is the port GUID for net5 (see step 6). You can now determine the following mappings:

    • port1 -> net4 -> el01sw-ib03

    • port2 -> net5 -> el01sw-ib02
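
The cross-referencing in step 7 is a chain of lookups: HCA port to base lid and port GUID (ibstat), base lid to switch lid (iblinkinfo), switch lid to switch name (ibswitches), and port GUID to datalink (dladm show-ib). The following sketch joins the four example outputs in one pass. It assumes the layouts shown in steps 6 and 7, and the function name map_ports is hypothetical.

```shell
#!/bin/sh
# Sample lines copied from the example outputs in steps 6 and 7.
ibswitches_out='Switch  : 0x0010e00b4520c0a0 ports 36 "SUN IB QDR GW switch el01sw-ib02 10.128.21.186 leaf:1" enhanced port 0 lid 14 lmc 0
Switch  : 0x0010e00b6d80c0a0 ports 36 "SUN IB QDR GW switch el01sw-ib03 10.128.21.187 leaf:2" enhanced port 0 lid 15 lmc 0'
iblinkinfo_out='   14   33[  ] ==(...)==>  72    2[  ] "el01cn16 EL-C  192.168.10.16 HCA-1" ( )
   15   33[  ] ==(...)==>  71    1[  ] "el01cn16 EL-C  192.168.10.16 HCA-1" ( )'
show_ib_out='net4  2128000... 21280001EFF369 1    up     el01sw-ib02 0a-eth-1 7FFF,8206,FFFF
net5  2128000... 21280001EFF36A 2    up     el01sw-ib02 0a-eth-1 7FFF,8206,FFFF'
ibstat_out='        Port 1:
                Base lid: 71
                Port GUID: 0x0021280001eff369
        Port 2:
                Base lid: 72
                Port GUID: 0x0021280001eff36a'

# Reads the four outputs concatenated in the order
# ibswitches, iblinkinfo, dladm show-ib, ibstat.
map_ports() {
    awk '
    # Normalize a GUID: lowercase, no 0x prefix, no leading zeros.
    function norm(g) { g = tolower(g); sub(/^0x/, "", g); sub(/^0+/, "", g); return g }
    $1 == "Switch" {                      # switch lid -> switch name
        for (i = 1; i < NF; i++) {
            if ($i == "switch") name = $(i + 1)
            if ($i == "lid")    lid  = $(i + 1)
        }
        gsub(/"/, "", name)
        switch_name[lid] = name
        next
    }
    /==>/ {                               # node base lid -> switch lid
        split($0, side, "==>")
        split(side[1], left, " ")
        split(side[2], right, " ")
        switch_lid[right[1]] = left[1]
        next
    }
    $1 ~ /^net[0-9]+$/ { link[norm($3)] = $1; next }   # port GUID -> datalink
    $1 == "Port" && $2 ~ /^[0-9]+:$/ { port = $2; sub(/:/, "", port); next }
    $1 == "Base" && $2 == "lid:"     { base[port] = $3; next }
    $1 == "Port" && $2 == "GUID:" {
        print "port" port " -> " link[norm($3)] " -> " switch_name[switch_lid[base[port]]]
    }'
}

mapping=$(printf '%s\n' "$ibswitches_out" "$iblinkinfo_out" "$show_ib_out" "$ibstat_out" | map_ports)
printf '%s\n' "$mapping"
```

Normalizing the GUIDs matters because dladm prints them in uppercase without leading zeros, while ibstat prints them in lowercase with a 0x prefix.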

  8. Run the following commands to create the EoIB datalink over net4 and display the results of the procedure:
    root@el01cn16:~# dladm create-eoib  -l net4 -g  el01sw-ib03 -c 0A-ETH-1 eoib0
    
    root@el01cn16:~# dladm show-eoib
    LINK       GWNAME       GWPORT    GWID FLAGS  SPEED  MACADDRESS      OVER
    eoib0      el01sw-ib03  0a-eth-1  506  aH---- 10000  0:0:0:0:0:0     net4
    
  9. Open another terminal and log in as root to the switch net4 is connected to (el01sw-ib03 for this example).

    Run the following commands to create a VNIC with no VLAN tag and display the results of the procedure. The following is an example of the commands' output:

    [root@el01sw-ib03 ~]# createvnic 0A-ETH-1 -guid 0021280001EFF369 -mac 80:C0:A0:09:16:01
    vNIC created
    
    [root@el01sw-ib03 ~]# showvnics |grep cn16
    105 WAIT-IOA    N 0021280001EFF369        el01cn16 EL-C  192.168.10.16  0000 80:C0:A0:09:16:01 NO  0xffff 0A-ETH-1
    
  10. From the compute node session create a host-based VNIC with a VLAN tag. Run the following commands:
    root@el01cn16:~# dladm create-vnic -l eoib0 -v 3066 vnic3066_0
    
    root@el01cn16:~# dladm show-vnic
    

    The following is an example of the show-vnic command output:

    LINK                OVER              SPEED  MACADDRESS        MACADDRTYPE VIDS
    vnic3066_0          eoib0             10000  2:8:20:42:a1:f1   random      3066
    
  11. Run the following commands to create the EoIB datalink over net5:
    root@el01cn16:~# dladm create-eoib  -l net5 -g  el01sw-ib02 -c 0A-ETH-1 eoib1
    
    root@el01cn16:~# dladm show-eoib
    

    The following is an example of the show-eoib command output:

    LINK        GWNAME      GWPORT   GWID FLAGS  SPEED  MACADDRESS        OVER
    eoib0       el01sw-ib03 0a-eth-1 506  aHnU-- 10000  80:c0:a0:9:16:1   net4
    eoib1       el01sw-ib02 0a-eth-1 286  aH---- 10000  0:0:0:0:0:0       net5
    
  12. Log into the switch that net5 is connected to and run the following command to create a VNIC with no VLAN tag:
    [root@el01sw-ib02 ~]# createvnic 0A-ETH-1 -guid 0021280001eff36a -mac 00:14:4F:09:16:02
    vNIC created
    

    Run the following command to display the result of the creation of the VNIC:

    [root@el01sw-ib02 ~]# showvnics|grep cn16
    108 WAIT-IOA    N 0021280001EFF36A        el01cn16 EL-C  192.168.10.16  0000 00:14:4F:09:16:02 NO  0xffff 0A-ETH-1
    
  13. From the compute node session run the following command to create a VNIC with a VLAN tag:
    root@el01cn16:~# dladm create-vnic -l eoib1 -v 3066 vnic3066_1
    

    Run the following command to display the result of the creation of the VNIC:

    root@el01cn16:~# dladm show-vnic
    LINK                OVER              SPEED  MACADDRESS        MACADDRTYPE VIDS
    vnic3066_0          eoib0             10000  2:8:20:42:a1:f1   random      3066
    vnic3066_1          eoib1             10000  2:8:20:10:7f:d3   random      3066
    
  14. Run the following commands to create the IPMP bond1 group:
    root@el01cn16:~# ipadm create-ip vnic3066_0
    
    root@el01cn16:~# ipadm create-ip vnic3066_1
    
    root@el01cn16:~# ipadm delete-ipmp bond1
    
    root@el01cn16:~# ipadm create-ipmp -i vnic3066_0 -i vnic3066_1 bond1
    
    root@el01cn16:~# ipadm create-addr -T static -a 192.168.100.16/24 bond1/v4
    
    root@el01cn16:~# ipadm set-ifprop -p standby=on -m ip vnic3066_1
    
    root@el01cn16:~# ipmpstat -i
    INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
    vnic3066_1  no      bond1       is-----   up        disabled  ok
    vnic3066_0  yes     bond1       --mbM--   up        disabled  ok
    bond0_1     no      bond0       is-----   up        disabled  ok
    bond0_0     yes     bond0       --mbM--   up        disabled  ok
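
To confirm which member of the group is carrying traffic, the ipmpstat output can be filtered by group and by the ACTIVE column. The following sketch uses the example output from this step as sample input; active_in_group is a hypothetical function name.

```shell
#!/bin/sh
# Rows copied from the example ipmpstat -i output in this step.
ipmpstat_out='INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
vnic3066_1  no      bond1       is-----   up        disabled  ok
vnic3066_0  yes     bond1       --mbM--   up        disabled  ok
bond0_1     no      bond0       is-----   up        disabled  ok
bond0_0     yes     bond0       --mbM--   up        disabled  ok'

# Print the active interface(s) of IPMP group $1.
active_in_group() {
    awk -v grp="$1" '$3 == grp && $2 == "yes" { print $1 }'
}

active=$(printf '%s\n' "$ipmpstat_out" | active_in_group bond1)
printf '%s\n' "$active"
```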
    
  15. From the session to switch el01sw-ib03 run the following command to verify that the active VNIC is up on the switch el01sw-ib03:
    [root@el01sw-ib03 ~]# showvnics |grep cn16
    105 UP          N 0021280001EFF369        el01cn16 EL-C  192.168.10.16  31744 80:C0:A0:09:16:01 NO  0xffff 0A-ETH-1
    106 UP          H 0021280001EFF369        el01cn16 EL-C  192.168.10.16  64513 02:08:20:42:A1:F1 3066 0x8206 0A-ETH-1
    
  16. On the switch el01sw-ib02, the passive VNIC is not expected to appear until the IPMP group failover process runs. From the session to the switch el01sw-ib02 run the following command to verify the state of the VNIC:
    [root@el01sw-ib02 ~]# showvnics|grep cn16
    108 UP          N 0021280001EFF36A        el01cn16 EL-C  192.168.10.16  31744  00:14:4F:09:16:02 NO  0xffff 0A-ETH-1