Cabling Two RoCE Network Fabric Racks Together with Down Time using Oracle Exadata System Software Release 20.1.0 or Later

Use this simpler procedure to cable together two racks with RoCE Network Fabric when some downtime can be tolerated and you are using Oracle Exadata System Software release 20.1.0 or later.

In this procedure, the existing rack is R1, and the new rack is R2.

Throughout this procedure, use the cabling tables that apply to your system configuration.

  1. Ensure the new rack is near the existing rack.
    The RDMA Network Fabric cables must be able to reach the servers in each rack.
  2. Ensure you have a backup of the current switch configuration for each switch in the existing and new rack.

    See Backing Up Settings on the RoCE Network Fabric Switch in Oracle Exadata Database Machine Maintenance Guide.

  3. Shut down all servers on both the new rack (R2) and the existing rack (R1).
    The switches should remain available.
  4. Update the firmware to the latest available release on all of the RoCE Network Fabric switches.

    For this step, treat all of the switches as if they belong to a single rack system.

    See Updating RoCE Network Fabric Switch Firmware in Oracle Exadata Database Machine Maintenance Guide.

  5. Apply the multi-rack golden configuration settings on the RoCE Network Fabric switches.

    Use the procedure described in Applying Golden Configuration Settings on RoCE Network Fabric Switches, in Oracle Exadata Database Machine Maintenance Guide.

  6. Perform the physical cabling of the switches.
    1. In Rack 2, remove the existing inter-switch connections between the two leaf switches, R2UL and R2LL.
    2. In Rack 2, cable each leaf switch to the spine switches using the applicable cabling tables.
    3. In Rack 1, remove the existing inter-switch connections between the two leaf switches, R1UL and R1LL.
    4. In Rack 1, cable each leaf switch to the spine switches using the applicable cabling tables.
  7. Confirm each switch is available and connected.

    For each of the six switches, confirm that the output of the show interface status command shows connected and 100G for each connected inter-switch port. Use the appropriate cabling tables to identify the ports that should be connected.

    In the following examples, the inter-switch ports on the leaf switches are Eth1/4 to Eth1/7 and Eth1/30 to Eth1/33. The inter-switch ports on the spine switches are Eth1/5 to Eth1/20.

    When run from a spine switch, the output should be similar to the following:

    rack1sw-roces0# show interface status
    --------------------------------------------------------------------------------
    Port          Name               Status    Vlan      Duplex  Speed   Type
    --------------------------------------------------------------------------------
    mgmt0         --                 connected routed    full    1000    -- 
    --------------------------------------------------------------------------------
    Port          Name               Status    Vlan      Duplex  Speed   Type
    --------------------------------------------------------------------------------
    ...
    Eth1/5        RouterPort5        connected routed    full    100G    QSFP-100G-CR4
    Eth1/6        RouterPort6        connected routed    full    100G    QSFP-100G-SR4
    Eth1/7        RouterPort7        connected routed    full    100G    QSFP-100G-CR4
    Eth1/8        RouterPort8        connected routed    full    100G    QSFP-100G-SR4
    Eth1/9        RouterPort9        connected routed    full    100G    QSFP-100G-CR4
    Eth1/10       RouterPort10       connected routed    full    100G    QSFP-100G-SR4
    Eth1/11       RouterPort11       connected routed    full    100G    QSFP-100G-CR4
    Eth1/12       RouterPort12       connected routed    full    100G    QSFP-100G-SR4
    Eth1/13       RouterPort13       connected routed    full    100G    QSFP-100G-CR4
    Eth1/14       RouterPort14       connected routed    full    100G    QSFP-100G-SR4
    Eth1/15       RouterPort15       connected routed    full    100G    QSFP-100G-CR4
    Eth1/16       RouterPort16       connected routed    full    100G    QSFP-100G-SR4
    Eth1/17       RouterPort17       connected routed    full    100G    QSFP-100G-CR4
    Eth1/18       RouterPort18       connected routed    full    100G    QSFP-100G-SR4
    Eth1/19       RouterPort19       connected routed    full    100G    QSFP-100G-CR4
    Eth1/20       RouterPort20       connected routed    full    100G    QSFP-100G-SR4
    Eth1/21       RouterPort21       xcvrAbsen routed    full    100G    --
    ...

    When run from a leaf switch, the output should be similar to the following:

    rack1sw-rocea0# show interface status
    --------------------------------------------------------------------------------
    Port          Name               Status    Vlan      Duplex  Speed   Type
    --------------------------------------------------------------------------------
    mgmt0         --                 connected routed    full    1000    -- 
    --------------------------------------------------------------------------------
    Port          Name               Status    Vlan      Duplex  Speed   Type
    --------------------------------------------------------------------------------
    ...
    Eth1/4        RouterPort1        connected routed    full    100G    QSFP-100G-CR4
    Eth1/5        RouterPort2        connected routed    full    100G    QSFP-100G-CR4
    Eth1/6        RouterPort3        connected routed    full    100G    QSFP-100G-CR4
    Eth1/7        RouterPort4        connected routed    full    100G    QSFP-100G-CR4
    Eth1/8        celadm14           connected 3888      full    100G    QSFP-100G-CR4
    ...
    Eth1/29       celadm01           connected 3888      full    100G    QSFP-100G-CR4
    Eth1/30       RouterPort5        connected routed    full    100G    QSFP-100G-SR4
    Eth1/31       RouterPort6        connected routed    full    100G    QSFP-100G-SR4
    Eth1/32       RouterPort7        connected routed    full    100G    QSFP-100G-SR4
    Eth1/33       RouterPort8        connected routed    full    100G    QSFP-100G-SR4
    ...
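
The port check in this step can also be run against captured output. The following is a minimal Python sketch, not part of any Oracle or switch tooling. It assumes the text of show interface status has been saved to a string and that the Name column contains no spaces. The helper names and the LEAF_ISL_PORTS list are illustrative only; take the real port list from the applicable cabling tables.

```python
# Illustrative sketch only: parse_status and check_isl_ports are hypothetical
# helpers, not part of any Oracle or switch tooling.

# Captured leaf switch output (rows copied from the leaf example in this step).
LEAF_SAMPLE = """\
Eth1/4        RouterPort1        connected routed    full    100G    QSFP-100G-CR4
Eth1/5        RouterPort2        connected routed    full    100G    QSFP-100G-CR4
Eth1/6        RouterPort3        connected routed    full    100G    QSFP-100G-CR4
Eth1/7        RouterPort4        connected routed    full    100G    QSFP-100G-CR4
Eth1/8        celadm14           connected 3888      full    100G    QSFP-100G-CR4
Eth1/29       celadm01           connected 3888      full    100G    QSFP-100G-CR4
Eth1/30       RouterPort5        connected routed    full    100G    QSFP-100G-SR4
Eth1/31       RouterPort6        connected routed    full    100G    QSFP-100G-SR4
Eth1/32       RouterPort7        connected routed    full    100G    QSFP-100G-SR4
Eth1/33       RouterPort8        connected routed    full    100G    QSFP-100G-SR4
"""

# Inter-switch ports used in the leaf example: Eth1/4-7 and Eth1/30-33.
LEAF_ISL_PORTS = [f"Eth1/{n}" for n in (*range(4, 8), *range(30, 34))]


def parse_status(text):
    """Map each Eth port to its (status, speed) pair.

    Assumes whitespace-separated columns in the order
    Port, Name, Status, Vlan, Duplex, Speed, Type, with no spaces in Name.
    """
    ports = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("Eth"):
            ports[fields[0]] = (fields[2], fields[5])
    return ports


def check_isl_ports(text, expected_ports):
    """Return (port, status, speed) for each expected port that is not
    reported as connected at 100G, or that is absent from the output."""
    ports = parse_status(text)
    problems = []
    for port in expected_ports:
        status, speed = ports.get(port, ("missing", "-"))
        if status != "connected" or speed != "100G":
            problems.append((port, status, speed))
    return problems


# An empty list means every expected inter-switch port looks healthy.
print(check_isl_ports(LEAF_SAMPLE, LEAF_ISL_PORTS))
```

For a spine switch, the same function could be called with the spine port range from the example (Eth1/5 to Eth1/20) and the output captured from that switch.
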
  8. Check the neighbor discovery for every switch in racks R1 and R2.

    Log in to each switch and use the show lldp neighbors command. Make sure that all switches are visible and check the switch ports assignment against the applicable cabling tables.

    A spine switch should see the two leaf switches in each rack, but not the other spine switch. The output for a spine switch should be similar to the following:

    Note: The interfaces shown in the Port ID column differ for each switch, depending on the applicable cabling tables.

    rack1sw-roces0# show lldp neighbors
    ...
    Device ID            Local Intf      Hold-time  Capability  Port ID
    rack1-adm0           mgmt0           120        BR          Ethernet1/47
    rack1sw-roceb0       Eth1/5          120        BR          Ethernet1/5
    rack2sw-roceb0       Eth1/6          120        BR          Ethernet1/5
    rack1sw-roceb0       Eth1/7          120        BR          Ethernet1/7
    rack2sw-roceb0       Eth1/8          120        BR          Ethernet1/7
    rack1sw-roceb0       Eth1/9          120        BR          Ethernet1/4
    rack2sw-roceb0       Eth1/10         120        BR          Ethernet1/4
    rack1sw-roceb0       Eth1/11         120        BR          Ethernet1/6
    rack2sw-roceb0       Eth1/12         120        BR          Ethernet1/6
    rack1sw-rocea0       Eth1/13         120        BR          Ethernet1/5
    rack2sw-rocea0       Eth1/14         120        BR          Ethernet1/5
    rack1sw-rocea0       Eth1/15         120        BR          Ethernet1/7
    rack2sw-rocea0       Eth1/16         120        BR          Ethernet1/7
    rack1sw-rocea0       Eth1/17         120        BR          Ethernet1/4
    rack2sw-rocea0       Eth1/18         120        BR          Ethernet1/4
    rack1sw-rocea0       Eth1/19         120        BR          Ethernet1/6
    rack2sw-rocea0       Eth1/20         120        BR          Ethernet1/6
    Total entries displayed: 17

    Each leaf switch should see the two spine switches, but not the other leaf switches. The output for a leaf switch should be similar to the following:

    Note: The interfaces shown in the Port ID column differ for each switch, depending on the applicable cabling tables.

    rack1sw-rocea0# show lldp neighbors
    ...
    Device ID            Local Intf      Hold-time  Capability  Port ID
    switch               mgmt0           120        BR          Ethernet1/46
    rack1sw-roces0       Eth1/4          120        BR          Ethernet1/17
    rack1sw-roces0       Eth1/5          120        BR          Ethernet1/13
    rack1sw-roces0       Eth1/6          120        BR          Ethernet1/19
    rack1sw-roces0       Eth1/7          120        BR          Ethernet1/15
    rack2sw-roces0       Eth1/30         120        BR          Ethernet1/17
    rack2sw-roces0       Eth1/31         120        BR          Ethernet1/13
    rack2sw-roces0       Eth1/32         120        BR          Ethernet1/19
    rack2sw-roces0       Eth1/33         120        BR          Ethernet1/15
    rocetoi-ext-sw       Eth1/36         120        BR          Ethernet1/49
    Total entries displayed: 10
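
The neighbor rule in this step (a spine switch sees only leaf switches, and a leaf switch sees only spine switches) can be checked against captured show lldp neighbors output. The following Python sketch is illustrative only. It assumes that switch roles can be inferred from the host-name suffixes used in the examples (sw-roces0 for spine, sw-rocea0 and sw-roceb0 for leaf), which may not match the naming at your site. All function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: roles are inferred from the host-name suffixes in the examples.
SPINE_NAME = re.compile(r"sw-roces\d+$")
LEAF_NAME = re.compile(r"sw-roce[ab]\d+$")

# Captured leaf switch output (rows copied from the leaf example in this step).
LEAF_LLDP = """\
switch               mgmt0           120        BR          Ethernet1/46
rack1sw-roces0       Eth1/4          120        BR          Ethernet1/17
rack1sw-roces0       Eth1/5          120        BR          Ethernet1/13
rack1sw-roces0       Eth1/6          120        BR          Ethernet1/19
rack1sw-roces0       Eth1/7          120        BR          Ethernet1/15
rack2sw-roces0       Eth1/30         120        BR          Ethernet1/17
rack2sw-roces0       Eth1/31         120        BR          Ethernet1/13
rack2sw-roces0       Eth1/32         120        BR          Ethernet1/19
rack2sw-roces0       Eth1/33         120        BR          Ethernet1/15
rocetoi-ext-sw       Eth1/36         120        BR          Ethernet1/49
"""


def eth_neighbors(text):
    """Return (device, local_intf, port_id) for each neighbor on an Eth port.

    Assumes five whitespace-separated columns per row; header, mgmt0, and
    summary lines are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[1].startswith("Eth"):
            rows.append((fields[0], fields[1], fields[4]))
    return rows


def same_role_peers(text, role):
    """Return neighbors that have the same role as the local switch.

    role is "spine" or "leaf" and describes the switch the output came from.
    A non-empty result suggests a cabling mistake.
    """
    if role not in ("spine", "leaf"):
        raise ValueError("role must be 'spine' or 'leaf'")
    pattern = SPINE_NAME if role == "spine" else LEAF_NAME
    return [row for row in eth_neighbors(text) if pattern.search(row[0])]


def links_per_peer(text):
    """Count how many Eth links lead to each neighboring device."""
    return Counter(device for device, _intf, _port in eth_neighbors(text))


print(same_role_peers(LEAF_LLDP, "leaf"))
print(links_per_peer(LEAF_LLDP))
```

The per-peer link counts are only a quick sanity check. The port assignments themselves still need to be compared with the applicable cabling tables.
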
  9. Power on all servers in racks R1 and R2.
  10. For each rack, confirm the multi-rack cabling by running the verify_roce_cables.py script.

    Refer to My Oracle Support Doc ID 2587717.1 for download and usage instructions.

    Check the output of the verify_roce_cables.py script against the applicable cabling tables. Also check that every entry in the CABLE OK? columns shows OK.

    When running the script, two input files are used, one for nodes and one for switches. Each file should contain the servers or switches on separate lines. Use fully qualified domain names or IP addresses for each server and switch.

    The following output is a partial example of the command results:

    # ./verify_roce_cables.py -n nodes.rack1 -s switches.rack1
    SWITCH PORT (EXPECTED PEER)  LEAF-1 (rack1sw-rocea0)     : CABLE OK?  LEAF-2 (rack1sw-roceb0)    : CABLE OK?
    ----------- --------------   --------------------------- : --------   -----------------------    : ---------
    Eth1/4 (ISL peer switch)   : rack1sw-roces0 Ethernet1/17 : OK         rack1sw-roces0 Ethernet1/9 : OK
    Eth1/5 (ISL peer switch)   : rack1sw-roces0 Ethernet1/13 : OK         rack1sw-roces0 Ethernet1/5 : OK
    Eth1/6 (ISL peer switch)   : rack1sw-roces0 Ethernet1/19 : OK         rack1sw-roces0 Ethernet1/11: OK
    Eth1/7 (ISL peer switch)   : rack1sw-roces0 Ethernet1/15 : OK         rack1sw-roces0 Ethernet1/7 : OK
    Eth1/12 (celadm10)         : rack1celadm10 port-1        : OK         rack1celadm10 port-2       : OK
    Eth1/13 (celadm09)         : rack1celadm09 port-1        : OK         rack1celadm09 port-2       : OK
    Eth1/14 (celadm08)         : rack1celadm08 port-1        : OK         rack1celadm08 port-2       : OK
    ...
    Eth1/15 (adm08)            : rack1dbadm08 port-1         : OK         rack1dbadm08 port-2        : OK
    Eth1/16 (adm07)            : rack1dbadm07 port-1         : OK         rack1dbadm07 port-2        : OK
    Eth1/17 (adm06)            : rack1dbadm06 port-1         : OK         rack1dbadm06 port-2        : OK
    ...
    Eth1/30 (ISL peer switch)  : rack2sw-roces0 Ethernet1/17 : OK         rack2sw-roces0 Ethernet1/9 : OK
    Eth1/31 (ISL peer switch)  : rack2sw-roces0 Ethernet1/13 : OK         rack2sw-roces0 Ethernet1/5 : OK
    Eth1/32 (ISL peer switch)  : rack2sw-roces0 Ethernet1/19 : OK         rack2sw-roces0 Ethernet1/11: OK
    Eth1/33 (ISL peer switch)  : rack2sw-roces0 Ethernet1/15 : OK         rack2sw-roces0 Ethernet1/7 : OK
    
    # ./verify_roce_cables.py -n nodes.rack2 -s switches.rack2
    SWITCH PORT (EXPECTED PEER)  LEAF-1 (rack2sw-rocea0)     : CABLE OK?  LEAF-2 (rack2sw-roceb0)    : CABLE OK?
    ----------- --------------   --------------------------- : --------   -----------------------    : ---------
    Eth1/4 (ISL peer switch)   : rack1sw-roces0 Ethernet1/18 : OK         rack1sw-roces0 Ethernet1/10: OK
    ...
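
Scanning the CABLE OK? columns by eye is error-prone on a full rack, so the check can be sketched in Python against saved script output. This is an illustration under stated assumptions, not a feature of verify_roce_cables.py. It assumes each port row has the colon-separated layout shown in the example, and it treats any status other than OK as a problem because the text of other status values is not shown here. The function names are hypothetical.

```python
# Rows copied from the rack 1 example output in this step.
CABLE_SAMPLE = """\
Eth1/4 (ISL peer switch)   : rack1sw-roces0 Ethernet1/17 : OK         rack1sw-roces0 Ethernet1/9 : OK
Eth1/6 (ISL peer switch)   : rack1sw-roces0 Ethernet1/19 : OK         rack1sw-roces0 Ethernet1/11: OK
Eth1/12 (celadm10)         : rack1celadm10 port-1        : OK         rack1celadm10 port-2       : OK
"""


def cable_statuses(text):
    """Return (port, leaf1_status, leaf2_status) for each port row.

    Assumes the layout 'PORT (peer) : leaf-1 peer : STATUS leaf-2 peer : STATUS'.
    Rows that start with Eth but do not split into four colon-separated
    parts are reported with '?' so that they are not silently ignored.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("Eth"):
            continue  # header, separator, elided, or blank line
        port = line.split()[0]
        parts = line.split(":")
        if len(parts) != 4:
            rows.append((port, "?", "?"))
            continue
        leaf1 = (parts[2].split() or ["?"])[0]  # first token after 2nd colon
        leaf2 = parts[3].strip() or "?"         # everything after 3rd colon
        rows.append((port, leaf1, leaf2))
    return rows


def not_ok(text):
    """Return the rows where either CABLE OK? column is not exactly OK."""
    return [row for row in cable_statuses(text) if row[1:] != ("OK", "OK")]


print(not_ok(CABLE_SAMPLE))
```

An empty result only shows that the script reported OK for every parsed row. The peer names and ports still need to be compared with the applicable cabling tables.
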
  11. Verify the RoCE Network Fabric operation by using the infinicheck command.

    Use the following recommended command sequence. In each command, hosts.lst contains a list of database server host names or RoCE Network Fabric IP addresses (2 RoCE Network Fabric IP addresses for each database server), and cells.lst contains a list of RoCE Network Fabric IP addresses for the storage servers (2 RoCE Network Fabric IP addresses for each storage server).

    • Use infinicheck with the -z option to clear the files that were created during the last run of the infinicheck command. For example:

      # /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hosts.lst -c cells.lst -z
    • Use infinicheck with the -s option to set up user equivalence for password-less SSH across the RoCE Network Fabric. For example:

      # /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hosts.lst -c cells.lst -s
    • Finally, verify the RoCE Network Fabric operation by using infinicheck with the -b option, which is recommended on newly imaged machines where it is acceptable to suppress the cellip.ora and cellinit.ora configuration checks. For example:

      # /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hosts.lst -c cells.lst -b
      
      INFINICHECK                    
              [Network Connectivity, Configuration and Performance]        
                     
                ####  FABRIC TYPE TESTS  #### 
      System type identified: RoCE
      Verifying User Equivalance of user=root from all DBs to all CELLs.
           ####  RoCE CONFIGURATION TESTS  ####       
           Checking for presence of RoCE devices on all DBs and CELLs 
      [SUCCESS].... RoCE devices on all DBs and CELLs look good
           Checking for RoCE Policy Routing settings on all DBs and CELLs 
      [SUCCESS].... RoCE Policy Routing settings look good
           Checking for RoCE DSCP ToS mapping on all DBs and CELLs 
      [SUCCESS].... RoCE DSCP ToS settings look good
           Checking for RoCE PFC settings and DSCP mapping on all DBs and CELLs
      [SUCCESS].... RoCE PFC and DSCP settings look good
           Checking for RoCE interface MTU settings. Expected value : 2300
      [SUCCESS].... RoCE interface MTU settings look good
           Verifying switch advertised DSCP on all DBs and CELLs ports ( )
      [SUCCESS].... Advertised DSCP settings from RoCE switch looks good  
          ####  CONNECTIVITY TESTS  ####
          [COMPUTE NODES -> STORAGE CELLS] 
            (60 seconds approx.)       
          (Will walk through QoS values: 0-6) [SUCCESS]..........Results OK
      [SUCCESS]....... All  can talk to all storage cells          
          [COMPUTE NODES -> COMPUTE NODES]               
      ...
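
The three infinicheck invocations in this step always run in the same order, so the sequence can be captured in a small wrapper. The following Python sketch is illustrative only. It uses just the path and the options shown above (-g, -c, -z, -s, -b); the function names and the default file names are assumptions. Whether infinicheck reports test failures through its exit status is not stated here, so the printed output still needs to be reviewed.

```python
import subprocess

# Path and options are those shown in the commands above.
INFINICHECK = "/opt/oracle.SupportTools/ibdiagtools/infinicheck"


def infinicheck_sequence(hosts_file="hosts.lst", cells_file="cells.lst"):
    """Build the recommended command sequence as argument lists.

    -z clears files from the previous run, -s sets up SSH user equivalence,
    and -b runs the verification.
    """
    base = [INFINICHECK, "-g", hosts_file, "-c", cells_file]
    return [base + [flag] for flag in ("-z", "-s", "-b")]


def run_sequence(commands):
    """Run each command in order and stop at the first non-zero exit status.

    Hypothetical helper: it is defined here but not called, because it only
    makes sense on a database server where infinicheck is installed.
    """
    for argv in commands:
        subprocess.run(argv, check=True)


# Print the commands for review instead of running them.
for argv in infinicheck_sequence():
    print(" ".join(argv))
```

Building the argument lists separately from running them makes it possible to review the exact commands first, and to call run_sequence(infinicheck_sequence()) only when the servers are ready.
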