Cabling Several RoCE Network Fabric Racks Together using Oracle Exadata System Software Release 19.3 or Earlier
Use this procedure to add another rack to an existing multi-rack system with RoCE Network Fabric using Oracle Exadata System Software Release 19.3 or earlier.
This procedure is for systems with RoCE Network Fabric (X8M or later).
In this procedure, the existing racks are R1, R2, …, Rn, and the new rack is Rn+1. In the following steps, these example switch names are used:
- rack5sw-roces0: Rack 5 Spine switch (R5SS)
- rack5sw-rocea0: Rack 5 Lower Leaf switch (R5LL)
- rack5sw-roceb0: Rack 5 Upper Leaf switch (R5UL)
Note:
Cabling three or more racks together requires no downtime for the existing racks R1, R2, …, Rn. Only the new rack, Rn+1, is powered down.
- Ensure the new rack is near the existing racks R1, R2, …, Rn. The RDMA Network Fabric cables must be able to reach the servers in each rack.
- Ensure you have a backup of the current switch configuration for each switch in the existing racks and the new rack.
For each switch, complete the steps in the Oracle Exadata Database Machine Maintenance Guide, section Backing Up Settings on the RoCE Network Fabric Switch.
- Shut down all servers in the new rack Rn+1.
Refer to Powering Off Oracle Exadata Rack. The switches must remain online and available.
- Apply the multi-rack spine switch configuration to the spine switch in the new rack Rn+1:
- Log in to the server that has downloaded the latest RDMA Network Fabric patch ZIP file.
To find the available RDMA Network Fabric patches, search for 'RDMA network switch' in My Oracle Support document 888828.1. Download and use the latest patch for your Oracle Exadata System Software release.
- Unzip the RDMA Network Fabric patch ZIP file and change directories to the location of the patchmgr utility.
- Make a copy of the golden configuration file for the new spine switch.
Run the following command from the patch directory, where n+1 is the number of the new rack:
# cp roce_switch_templates/roce_spine_switch_multi.cfg roce_spine_switch_multi_Rn+1SS.cfg
- Edit the copy of the spine switch configuration file.
Using a text editor, replace the three occurrences of %SPINE_LOOPBACK_IP0% with the correct IP address for the switch, as indicated in the table below, using the value that matches Rn+1 for your environment.

Switch                       SPINE_LOOPBACK_IP0
Rack 3 spine switch (R3SS)   100.64.0.203
Rack 4 spine switch (R4SS)   100.64.0.204
Rack 5 spine switch (R5SS)   100.64.0.205
Rack 6 spine switch (R6SS)   100.64.0.206
Rack 7 spine switch (R7SS)   100.64.0.207
Rack 8 spine switch (R8SS)   100.64.0.208

For example, if you are adding a rack to an existing 4-rack system (where n+1=5), then use IP address 100.64.0.205 as the SPINE_LOOPBACK_IP0 for the spine switch in the new rack (R5SS).

! Define loopback interface for underlay OSPF routing
interface loopback0
  description Routing loopback interface
  !ip address 100.64.0.201/32
  ip address 100.64.0.205/32
  ip router ospf UNDERLAY area 0.0.0.0
! Configure OSPF as the underlay network
router ospf UNDERLAY
  router-id 100.64.0.205
! change ECMP hash rotate value from default 32 to 40 for better
! router port utilization for upto parallel flows via the 8
! available router ports
ip load-sharing address source-destination port source-destination rotate 40
! Create BGP route reflector to exchange routes across VTEPs
! Use CIDR block of IPs for neighbor range
! - log-neighbor-changes: Enables the generation of logging messages
!   generated when the status of a BGP neighbor changes.
! - address-family ipv4 unicast: Enters address family configuration
!   mode and Specifies IP Version 4 unicast address prefixes.
router bgp 65502
  router-id 100.64.0.205
  log-neighbor-changes
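The placeholder substitution above lends itself to scripting. The following is a minimal sketch, not part of the official procedure: a miniature template stands in for the real golden file, and the spine IP is derived from the table pattern (rack n spine = 100.64.0.(200 + n)).

```shell
# Sketch only: automate the %SPINE_LOOPBACK_IP0% substitution for rack 5.
# A miniature template stands in for the real golden file.
RACK_NUM=5
SPINE_IP="100.64.0.$((200 + RACK_NUM))"        # table above: R5SS -> 100.64.0.205
TEMPLATE=/tmp/roce_spine_switch_multi.cfg
CFG="/tmp/roce_spine_switch_multi_R${RACK_NUM}SS.cfg"
cat > "$TEMPLATE" <<'EOF'
ip address %SPINE_LOOPBACK_IP0%/32
router-id %SPINE_LOOPBACK_IP0%
router-id %SPINE_LOOPBACK_IP0%
EOF
sed "s/%SPINE_LOOPBACK_IP0%/${SPINE_IP}/g" "$TEMPLATE" > "$CFG"
grep -c "$SPINE_IP" "$CFG"    # should report 3, matching the verification step below
```

The final grep mirrors the manual verification step: exactly three lines should carry the new loopback IP.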
- Verify the three replacements in the spine switch configuration file.
For example, if you are adding a 5th rack, then check for IP address 100.64.0.205 in the spine switch configuration file:
$ grep 100.64 roce_spine_switch_multi_R5SS.cfg | grep -v 'neighbor' | grep -v '!'
ip address 100.64.0.205/32
router-id 100.64.0.205
router-id 100.64.0.205
- Apply the updated multi-rack configuration file to the spine switch in the new rack Rn+1:
-
Log in to the switch in the new rack Rn+1, and remove the existing configuration file, if it exists. For example, if you are adding a 5th rack, you would use the following command:
rack5sw-roces0# delete bootflash:roce_spine_switch_multi.cfg
Do you want to delete "/roce_spine_switch_multi.cfg" ? (yes/no/abort) [y] y
rack5sw-roces0#
-
Log in to the server that contains the modified configuration file for the spine switch, and copy the file to the spine switch in the new rack. For example, if you are adding a 5th rack:
# scp roce_spine_switch_multi_R5SS.cfg admin@R5SS_IP_Address:/
-
Verify the modified file was copied successfully to the spine switch. For example, if you are adding a 5th rack, log in to the spine switch on the new rack Rn+1 again and use the following command:
rack5sw-roces0# dir bootflash:roce_spine_switch_multi_R5SS.cfg
      27360    Nov 20 12:12:50 2019  roce_spine_switch_multi_R5SS.cfg
Usage for bootflash://sup-local
  1829572608 bytes used
114893496320 bytes free
116723068928 bytes total
-
Copy the modified configuration into flash.
For example, if you are adding a 5th rack, you would use the following commands:
rack5sw-roces0# run-script bootflash:roce_spine_switch_multi_R5SS.cfg | grep 'none'
rack5sw-roces0# copy running-config startup-config
Note:
The run-script command for a spine switch can take approximately 2 minutes to complete.
- Apply the multi-rack leaf switch configuration to the leaf switches in the new rack Rn+1:
For each leaf switch, complete the following steps, where SW# represents the values Rn+1LL or Rn+1UL, depending on which switch you are configuring.
- Log in to the server that has downloaded the RDMA Network Fabric patch ZIP file (from Step 4.a) for the Oracle Exadata System Software release used by the existing racks.
- Change directories to the location of the patchmgr utility.
- Make a copy of the golden configuration file for each leaf switch.
You can copy either the roce_leaf_switch_multi.cfg file or the roce_qinq_leaf_switch_multi.cfg file if you want to enable Secure Fabric on the rack. Run the following command twice from the patch directory, substituting for SW# the values Rn+1LL and Rn+1UL.
# cp roce_switch_templates/roce_leaf_switch_multi.cfg roce_leaf_switch_multi_SW#.cfg
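The two copies can also be made with a loop. A sketch under stated assumptions: rack 5 is the new rack, and /tmp with a one-line placeholder file stands in for the real patch directory and golden template.

```shell
# Sketch only: copy the golden leaf template once per leaf switch (SW# = R5LL, R5UL).
# /tmp and the one-line file stand in for the real patch directory and template.
mkdir -p /tmp/roce_switch_templates
printf '%s\n' '%LEAF_LOOPBACK_IP0%' > /tmp/roce_switch_templates/roce_leaf_switch_multi.cfg
for SW in R5LL R5UL; do
  cp /tmp/roce_switch_templates/roce_leaf_switch_multi.cfg "/tmp/roce_leaf_switch_multi_${SW}.cfg"
done
ls /tmp/roce_leaf_switch_multi_R5LL.cfg /tmp/roce_leaf_switch_multi_R5UL.cfg
```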
- Edit each copy of the leaf switch configuration file to replace the loopback IP addresses:
Using a text editor, replace the three occurrences of %LEAF_LOOPBACK_IP0% and one occurrence of %LEAF_LOOPBACK_IP1% with the correct IP addresses for the leaf switch, as indicated in the table below.

Switch                            LEAF_LOOPBACK_IP0   LEAF_LOOPBACK_IP1
Rack 3 Lower Leaf switch (R3LL)   100.64.0.105        100.64.1.105
Rack 3 Upper Leaf switch (R3UL)   100.64.0.106        100.64.1.106
Rack 4 Lower Leaf switch (R4LL)   100.64.0.107        100.64.1.107
Rack 4 Upper Leaf switch (R4UL)   100.64.0.108        100.64.1.108
Rack 5 Lower Leaf switch (R5LL)   100.64.0.109        100.64.1.109
Rack 5 Upper Leaf switch (R5UL)   100.64.0.110        100.64.1.110
Rack 6 Lower Leaf switch (R6LL)   100.64.0.111        100.64.1.111
Rack 6 Upper Leaf switch (R6UL)   100.64.0.112        100.64.1.112
Rack 7 Lower Leaf switch (R7LL)   100.64.0.113        100.64.1.113
Rack 7 Upper Leaf switch (R7UL)   100.64.0.114        100.64.1.114
Rack 8 Lower Leaf switch (R8LL)   100.64.0.115        100.64.1.115
Rack 8 Upper Leaf switch (R8UL)   100.64.0.116        100.64.1.116
For example, if you are adding a 5th rack to an existing 4-rack system, then the configuration file for the lower leaf switch on rack 5 (R5LL) would look like the following:
! Define loopback interface for IGP protocol for VTEP reachability
interface loopback0
  description Routing loopback interface
  !ip address 100.64.0.101/32
  ip address 100.64.0.109/32
  ip router ospf UNDERLAY area 0.0.0.0
! Define loopback interface for associating with local VTEP
interface loopback1
  description VTEP loopback interface
  !ip address 100.64.1.101/32
  ip address 100.64.1.109/32
  ip router ospf UNDERLAY area 0.0.0.0
! Configure OSPF as the underlay network
router ospf UNDERLAY
  router-id 100.64.0.109
! change ECMP hash rotate value from default 32 to 40 for better
! router port utilization for upto parallel flows via the 8
! available router ports
ip load-sharing address source-destination port source-destination rotate 40
! - Create BGP route reflector to exchange routes across VTEPs
!   Define max config 8 neighbor spines using their loopback IPs
! - BGP peers are located in an autonomous system (AS) that uses
!   4-byte AS numbers. Cisco recommends to pick a high value such
!   as 65502 to avoid conflict with future bgp peers.
! - Create a template 'BasePolicy' that defines a peer policy
!   template to define attributes for a particular address family.
router bgp 65502
  router-id 100.64.0.109
  log-neighbor-changes
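As with the spine switch, the leaf placeholder substitutions can be scripted. A minimal sketch, assuming the R5LL addresses from the table above; a miniature stand-in template replaces the real golden file.

```shell
# Sketch only: substitute both leaf placeholders for R5LL (values from the table).
# A miniature template stands in for the real golden file.
IP0="100.64.0.109"    # %LEAF_LOOPBACK_IP0%, three occurrences
IP1="100.64.1.109"    # %LEAF_LOOPBACK_IP1%, one occurrence
TEMPLATE=/tmp/leaf_template.cfg
CFG=/tmp/roce_leaf_switch_multi_R5LL.cfg
cat > "$TEMPLATE" <<'EOF'
ip address %LEAF_LOOPBACK_IP0%/32
ip address %LEAF_LOOPBACK_IP1%/32
router-id %LEAF_LOOPBACK_IP0%
router-id %LEAF_LOOPBACK_IP0%
EOF
sed -e "s/%LEAF_LOOPBACK_IP0%/${IP0}/g" -e "s/%LEAF_LOOPBACK_IP1%/${IP1}/g" "$TEMPLATE" > "$CFG"
grep -c "$IP0" "$CFG"    # expect 3
grep -c "$IP1" "$CFG"    # expect 1
```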
- Verify the IP address replacements in each leaf switch configuration file.
For example, if you are adding a 5th rack, then check for IP addresses 100.64.0.109 and 100.64.1.109 in the lower leaf switch configuration file (R5LL), and for IP addresses 100.64.0.110 and 100.64.1.110 in the upper leaf switch configuration file (R5UL):
$ grep 100.64. roce_leaf_switch_multi_R5LL.cfg | grep -v neighbor | grep -v '!'
ip address 100.64.0.109/32
ip address 100.64.1.109/32
router-id 100.64.0.109
router-id 100.64.0.109

$ grep 100.64. roce_leaf_switch_multi_R5UL.cfg | grep -v neighbor | grep -v '!'
ip address 100.64.0.110/32
ip address 100.64.1.110/32
router-id 100.64.0.110
router-id 100.64.0.110
- Apply the updated multi-rack configuration files to each corresponding leaf switch in the new rack:
-
Log in to each leaf switch, and remove the existing configuration file. For example:
rack5sw-rocea0# delete bootflash:roce_leaf_switch.cfg
Do you want to delete "/roce_leaf_switch.cfg" ? (yes/no/abort) [y] y
rack5sw-rocea0# delete bootflash:roce_leaf_switch_multi.cfg
No such file or directory

rack5sw-roceb0# delete bootflash:roce_leaf_switch.cfg
Do you want to delete "/roce_leaf_switch.cfg" ? (yes/no/abort) [y] y
rack5sw-roceb0# delete bootflash:roce_leaf_switch_multi.cfg
No such file or directory
-
Log in to the server that contains the modified configuration files, and copy each file to its corresponding leaf switch. For example:
# scp roce_leaf_switch_multi_R5LL.cfg admin@rack5sw-rocea0:/
User Access Verification
Password:
roce_leaf_switch_multi_R5LL.cfg   100%  167KB  487.6KB/s  00:00

# scp roce_leaf_switch_multi_R5UL.cfg admin@rack5sw-roceb0:/
User Access Verification
Password:
roce_leaf_switch_multi_R5UL.cfg
-
Log in to each leaf switch and verify that the modified files were copied successfully. For example:
rack5sw-rocea0# dir bootflash:roce_leaf_switch_multi_R5LL.cfg
     171387    Nov 20 14:41:52 2019  roce_leaf_switch_multi_R5LL.cfg
Usage for bootflash://sup-local
  2583580672 bytes used
114139488256 bytes free
116723068928 bytes total

rack5sw-roceb0# dir bootflash:roce_leaf_switch_multi_R5UL.cfg
     171387    Nov 20 21:41:50 2019  roce_leaf_switch_multi_R5UL.cfg
Usage for bootflash://sup-local
  2579836928 bytes used
114143232000 bytes free
116723068928 bytes total
-
Copy the modified configuration file into flash. For example:
rack5sw-rocea0# run-script bootflash:roce_leaf_switch_multi_R5LL.cfg | grep 'none'
rack5sw-rocea0# copy running-config startup-config

rack5sw-roceb0# run-script bootflash:roce_leaf_switch_multi_R5UL.cfg | grep 'none'
rack5sw-roceb0# copy running-config startup-config
Note:
The run-script command for a leaf switch can take approximately 6 minutes to complete.
- Use patchmgr to verify the configuration of the RDMA Network Fabric switches against the golden configuration files.
- Log in to the server that has downloaded the RDMA Network Fabric patch ZIP file (from Step 4.a).
- Change directories to the location of the patchmgr utility.
- Create a file that contains the host name or IP address of the leaf and spine switches on all racks.
For example, create a file named switches.lst. The file contains the host name or IP address for the spine switch and both leaf switches on each rack, with each switch on a new line.
- Run patchmgr with the --verify-config option.
In the following command, switches.lst is a file that contains the switches to be queried.

$ ./patchmgr --roceswitches switches.lst --verify-config -log_dir /tmp

2019-11-20 14:12:27 -0800 :Working: Initiate config verify on RoCE switches from . Expect up to 6 minutes for each switch

2019-11-20 14:12:30 -0800 1 of 15 :Verifying config on switch rack1sw-rocea0
2019-11-20 14:12:30 -0800: [INFO    ] Dumping current running config locally as file: /tmp/run.rack1sw-rocea0.cfg
2019-11-20 14:12:33 -0800: [SUCCESS ] Backed up switch config successfully
2019-11-20 14:12:33 -0800: [INFO    ] Validating running config against template [1/3]: /tmp/patch_switch_19.3.1.0.0.191018/roce_switch_templates/roce_leaf_switch.cfg
2019-11-20 14:12:33 -0800: [INFO    ] Validating running config against template [2/3]: /tmp/patch_switch_19.3.1.0.0.191018/roce_switch_templates/roce_leaf_switch_multi.cfg
2019-11-20 14:12:33 -0800: [INFO    ] Config matches template: /tmp/patch_switch_19.3.1.0.0.191018/roce_switch_templates/roce_leaf_switch_multi.cfg
2019-11-20 14:12:33 -0800: [SUCCESS ] Config validation successful!

2019-11-20 14:12:33 -0800 2 of 15 :Verifying config on switch rack1sw-roceb0
...
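Building switches.lst can itself be scripted when your switch names follow the rackNsw-roce{s,a,b}0 convention shown in this procedure. A sketch for a 5-rack system; substitute your actual host names or IP addresses.

```shell
# Sketch only: generate switches.lst for a 5-rack system using the
# rackNsw-roce{s,a,b}0 naming convention from this procedure.
: > /tmp/switches.lst
for r in 1 2 3 4 5; do
  for suffix in s a b; do          # spine, lower leaf, upper leaf
    echo "rack${r}sw-roce${suffix}0" >> /tmp/switches.lst
  done
done
wc -l < /tmp/switches.lst          # 15 switches, one per line
```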
- Perform the physical cabling of the switches in the new rack Rn+1.
Caution:
Cabling within a live network must be done carefully in order to avoid potentially serious disruptions.
- Remove the eight existing inter-switch connections between the two leaf switches in the new rack Rn+1 (ports 4, 5, 6, 7 and 30, 31, 32, 33).
- Cable the leaf switches in the new rack according to the appropriate table in Multi-Rack Cabling Tables.
For example, if you are adding a 5th rack and rack Rn+1 is R5, then use "Table 21-14 Leaf Switch Connections for the Fifth Rack in a Five-Rack System".
- Add the new rack to the switches in the existing racks (R1 to Rn).
- For an existing rack (Rx), cable the lower leaf switch RxLL according to the appropriate table in Multi-Rack Cabling Tables.
- For the same rack, cable the upper leaf switch RxUL according to the appropriate table in Multi-Rack Cabling Tables.
- Repeat these steps for each existing rack, R1 to Rn.
- Confirm each switch is available and connected.
For each switch in racks R1, R2, …, Rn, Rn+1, confirm that the output of the show interface status command shows connected and 100G. In the following examples, the leaf switch inter-switch ports are Eth1/4 to Eth1/7 and Eth1/30 to Eth1/33, and the spine switch ports are Eth1/5 to Eth1/20.

When run from a spine switch, the output should be similar to the following:
rack1sw-roces0# show interface status
--------------------------------------------------------------------------------
Port          Name               Status     Vlan     Duplex   Speed    Type
--------------------------------------------------------------------------------
mgmt0         --                 connected  routed   full     1000     --
--------------------------------------------------------------------------------
Port          Name               Status     Vlan     Duplex   Speed    Type
--------------------------------------------------------------------------------
...
Eth1/5        RouterPort5        connected  routed   full     100G     QSFP-100G-CR4
Eth1/6        RouterPort6        connected  routed   full     100G     QSFP-100G-SR4
Eth1/7        RouterPort7        connected  routed   full     100G     QSFP-100G-CR4
Eth1/8        RouterPort8        connected  routed   full     100G     QSFP-100G-SR4
Eth1/9        RouterPort9        connected  routed   full     100G     QSFP-100G-CR4
Eth1/10       RouterPort10       connected  routed   full     100G     QSFP-100G-SR4
Eth1/11       RouterPort11       connected  routed   full     100G     QSFP-100G-CR4
Eth1/12       RouterPort12       connected  routed   full     100G     QSFP-100G-SR4
Eth1/13       RouterPort13       connected  routed   full     100G     QSFP-100G-CR4
Eth1/14       RouterPort14       connected  routed   full     100G     QSFP-100G-SR4
Eth1/15       RouterPort15       connected  routed   full     100G     QSFP-100G-CR4
Eth1/16       RouterPort16       connected  routed   full     100G     QSFP-100G-SR4
Eth1/17       RouterPort17       connected  routed   full     100G     QSFP-100G-CR4
Eth1/18       RouterPort18       connected  routed   full     100G     QSFP-100G-SR4
Eth1/19       RouterPort19       connected  routed   full     100G     QSFP-100G-CR4
Eth1/20       RouterPort20       connected  routed   full     100G     QSFP-100G-SR4
Eth1/21       RouterPort21       xcvrAbsen  routed   full     100G     --
...
When run from a leaf switch, the output should be similar to the following:
rack1sw-rocea0# show interface status
--------------------------------------------------------------------------------
Port          Name               Status     Vlan     Duplex   Speed    Type
--------------------------------------------------------------------------------
mgmt0         --                 connected  routed   full     1000     --
--------------------------------------------------------------------------------
Port          Name               Status     Vlan     Duplex   Speed    Type
--------------------------------------------------------------------------------
...
Eth1/4        RouterPort1        connected  routed   full     100G     QSFP-100G-CR4
Eth1/5        RouterPort2        connected  routed   full     100G     QSFP-100G-CR4
Eth1/6        RouterPort3        connected  routed   full     100G     QSFP-100G-CR4
Eth1/7        RouterPort4        connected  routed   full     100G     QSFP-100G-CR4
Eth1/8        celadm14           connected  3888     full     100G     QSFP-100G-CR4
...
Eth1/29       celadm01           connected  3888     full     100G     QSFP-100G-CR4
Eth1/30       RouterPort5        connected  routed   full     100G     QSFP-100G-SR4
Eth1/31       RouterPort6        connected  routed   full     100G     QSFP-100G-SR4
Eth1/32       RouterPort7        connected  routed   full     100G     QSFP-100G-SR4
Eth1/33       RouterPort8        connected  routed   full     100G     QSFP-100G-SR4
...
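Rather than eyeballing long listings, a saved copy of the show interface status output can be screened mechanically. A sketch using awk; the sample lines below stand in for a real capture.

```shell
# Sketch only: flag any Eth1/ port in saved 'show interface status' output that
# is not both 'connected' and running at '100G'. Sample lines are illustrative.
awk '$1 ~ /^Eth1\// && ($3 != "connected" || $6 != "100G") { print "CHECK:", $1, $3, $6 }' <<'EOF'
Eth1/5   RouterPort5   connected routed full 100G QSFP-100G-CR4
Eth1/8   celadm14      connected 3888   full 100G QSFP-100G-CR4
Eth1/21  RouterPort21  xcvrAbsen routed full 100G --
EOF
```

Only the xcvrAbsen line is reported; in practice you would feed the command a file captured from each switch.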
- Check the neighbor discovery for every switch in racks R1, R2, …, Rn, Rn+1.
Log in to each switch and use the show lldp neighbors command. Make sure that all switches are visible and check the switch port assignments (leaf switches: ports Eth1/4 - Eth1/7, Eth1/30 - Eth1/33; spine switches: ports Eth1/5 - Eth1/20) against the appropriate table in Multi-Rack Cabling Tables.

Each spine switch should see all the leaf switches in each rack, but not the other spine switches. The output for a spine switch should be similar to the following:
Note:
The interfaces in the rightmost output column (for example, Ethernet1/5) are different for each switch based on the applicable cabling tables.

rack1sw-roces0# show lldp neighbors | grep roce
rack1sw-roceb0      Eth1/5   120  BR  Ethernet1/5
rack2sw-roceb0      Eth1/6   120  BR  Ethernet1/5
rack1sw-roceb0      Eth1/7   120  BR  Ethernet1/7
rack2sw-roceb0      Eth1/8   120  BR  Ethernet1/7
rack1sw-roceb0      Eth1/9   120  BR  Ethernet1/4
rack2sw-roceb0      Eth1/10  120  BR  Ethernet1/4
rack3sw-roceb0      Eth1/11  120  BR  Ethernet1/5
rack3sw-roceb0      Eth1/12  120  BR  Ethernet1/7
rack1sw-rocea0      Eth1/13  120  BR  Ethernet1/5
rack2sw-rocea0      Eth1/14  120  BR  Ethernet1/5
rack1sw-rocea0      Eth1/15  120  BR  Ethernet1/7
rack2sw-rocea0      Eth1/16  120  BR  Ethernet1/7
rack3sw-rocea0      Eth1/17  120  BR  Ethernet1/5
rack2sw-rocea0      Eth1/18  120  BR  Ethernet1/4
rack3sw-rocea0      Eth1/19  120  BR  Ethernet1/7
rack3sw-rocea0      Eth1/20  120  BR  Ethernet1/4
Each leaf switch should see the spine switch in every rack, but not the other leaf switches. The output for a leaf switch should be similar to the following:
Note:
The interfaces in the rightmost output column (for example, Ethernet1/13) are different for each switch based on the applicable cabling tables.

rack1sw-rocea0# show lldp neighbors | grep roce
rack3sw-roces0      Eth1/4   120  BR  Ethernet1/13
rack1sw-roces0      Eth1/5   120  BR  Ethernet1/13
rack3sw-roces0      Eth1/6   120  BR  Ethernet1/15
rack1sw-roces0      Eth1/7   120  BR  Ethernet1/15
rack2sw-roces0      Eth1/30  120  BR  Ethernet1/17
rack2sw-roces0      Eth1/31  120  BR  Ethernet1/13
rack3sw-roces0      Eth1/32  120  BR  Ethernet1/17
rack2sw-roces0      Eth1/33  120  BR  Ethernet1/15
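The leaf-side LLDP rule (spine neighbors only) can also be checked mechanically against saved output. A sketch; the sample lines are illustrative, and for a spine switch you would invert the test so only leaf peers are accepted.

```shell
# Sketch only: from a leaf switch, every LLDP neighbor should be a spine
# (*sw-roces0); report anything else. Sample lines are illustrative.
awk '/sw-roce/ && $1 !~ /roces0$/ { print "UNEXPECTED PEER:", $1, "on", $2 }' <<'EOF'
rack3sw-roces0  Eth1/4   120  BR  Ethernet1/13
rack1sw-roces0  Eth1/5   120  BR  Ethernet1/13
rack2sw-roces0  Eth1/33  120  BR  Ethernet1/15
EOF
```

No output means every listed neighbor is a spine switch, as expected on a leaf.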
- Power on all the servers in the new rack, Rn+1.
- For each rack, confirm the multi-rack cabling by running the verify_roce_cables.py script.
Refer to My Oracle Support Doc ID 2587717.1 for download and usage instructions.
Check the output of the verify_roce_cables.py script against the applicable tables in Multi-Rack Cabling Tables for ZDLRA Rack X8M. Also, check that the output in the CABLE OK? columns contains the OK status.
When running the script, two input files are used, one for nodes and one for switches. Each file should contain the servers or switches on separate lines. Use fully qualified domain names or IP addresses for each server and switch.
The following output is a partial example of the command results:
# ./verify_roce_cables.py -n nodes.rack1 -s switches.rack1

SWITCH PORT (EXPECTED PEER)   LEAF-1 (rack1sw-rocea0)      : CABLE OK?   LEAF-2 (rack1sw-roceb0)      : CABLE OK?
----------- --------------    ---------------------------  : --------    ---------------------------  : ---------
Eth1/4 (ISL peer switch)    : rack1sw-roces0 Ethernet1/17  : OK          rack1sw-roces0 Ethernet1/9   : OK
Eth1/5 (ISL peer switch)    : rack1sw-roces0 Ethernet1/13  : OK          rack1sw-roces0 Ethernet1/5   : OK
Eth1/6 (ISL peer switch)    : rack1sw-roces0 Ethernet1/19  : OK          rack1sw-roces0 Ethernet1/11  : OK
Eth1/7 (ISL peer switch)    : rack1sw-roces0 Ethernet1/15  : OK          rack1sw-roces0 Ethernet1/7   : OK
Eth1/12 (celadm10)          : rack1celadm10 port-1         : OK          rack1celadm10 port-2         : OK
Eth1/13 (celadm09)          : rack1celadm09 port-1         : OK          rack1celadm09 port-2         : OK
Eth1/14 (celadm08)          : rack1celadm08 port-1         : OK          rack1celadm08 port-2         : OK
...
Eth1/15 (adm08)             : rack1dbadm08 port-1          : OK          rack1dbadm08 port-2          : OK
Eth1/16 (adm07)             : rack1dbadm07 port-1          : OK          rack1dbadm07 port-2          : OK
Eth1/17 (adm06)             : rack1dbadm06 port-1          : OK          rack1dbadm06 port-2          : OK
...
Eth1/30 (ISL peer switch)   : rack2sw-roces0 Ethernet1/17  : OK          rack2sw-roces0 Ethernet1/9   : OK
Eth1/31 (ISL peer switch)   : rack2sw-roces0 Ethernet1/13  : OK          rack2sw-roces0 Ethernet1/5   : OK
Eth1/32 (ISL peer switch)   : rack2sw-roces0 Ethernet1/19  : OK          rack2sw-roces0 Ethernet1/11  : OK
Eth1/33 (ISL peer switch)   : rack2sw-roces0 Ethernet1/15  : OK          rack2sw-roces0 Ethernet1/7   : OK
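Saved verify_roce_cables.py output can be screened for rows that are not fully OK. A sketch; the two sample rows are illustrative, and the FAIL value is hypothetical.

```shell
# Sketch only: list cable rows that do not show OK in both CABLE OK? columns.
# Sample rows are illustrative; the FAIL value is hypothetical.
grep '^Eth' <<'EOF' | grep -v 'OK.*OK'
Eth1/4 (ISL peer switch) : rack1sw-roces0 Ethernet1/17 : OK   rack1sw-roces0 Ethernet1/9 : OK
Eth1/12 (celadm10)       : rack1celadm10 port-1 : OK          rack1celadm10 port-2 : FAIL
EOF
```

Only the row missing a second OK is printed, so an empty result means all listed cables verified.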
- Verify the RoCE Network Fabric operation by using the infinicheck command.
Use the following recommended command sequence. In each command, hosts.lst is the name of an input file that contains a comma-delimited list of database server host names or RoCE Network Fabric IP addresses, and cells.lst is the name of an input file that contains a list of RoCE Network Fabric IP addresses for the storage servers.
- Use infinicheck with the -z option to clear the files that were created during the last run of the infinicheck command. For example:
# /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hosts.lst -c cells.lst -z
- Use infinicheck with the -s option to set up user equivalence for password-less SSH across the RoCE Network Fabric. For example:
# /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hosts.lst -c cells.lst -s
- Finally, verify the RoCE Network Fabric operation by using infinicheck with the -b option, which is recommended on newly imaged machines where it is acceptable to suppress the cellip.ora and cellinit.ora configuration checks. For example:

# /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hosts.lst -c cells.lst -b

INFINICHECK
        [Network Connectivity, Configuration and Performance]

          #### FABRIC TYPE TESTS ####
System type identified: RoCE
Verifying User Equivalance of user=root from all DBs to all CELLs.
     #### RoCE CONFIGURATION TESTS ####
     Checking for presence of RoCE devices on all DBs and CELLs
[SUCCESS].... RoCE devices on all DBs and CELLs look good
     Checking for RoCE Policy Routing settings on all DBs and CELLs
[SUCCESS].... RoCE Policy Routing settings look good
     Checking for RoCE DSCP ToS mapping on all DBs and CELLs
[SUCCESS].... RoCE DSCP ToS settings look good
     Checking for RoCE PFC settings and DSCP mapping on all DBs and CELLs
[SUCCESS].... RoCE PFC and DSCP settings look good
     Checking for RoCE interface MTU settings. Expected value : 2300
[SUCCESS].... RoCE interface MTU settings look good
     Verifying switch advertised DSCP on all DBs and CELLs ports ( )
[SUCCESS].... Advertised DSCP settings from RoCE switch looks good
    #### CONNECTIVITY TESTS ####
    [COMPUTE NODES -> STORAGE CELLS]
      (60 seconds approx.)
    (Will walk through QoS values: 0-6)
[SUCCESS]..........Results OK
[SUCCESS]....... All can talk to all storage cells
    [COMPUTE NODES -> COMPUTE NODES]
...