4.5.8 Verifying RoCE Network Fabric Operation
Verify the RoCE Network Fabric is operating properly after making modifications to the underlying hardware.
If hardware maintenance has taken place with any component in the RoCE Network Fabric, including replacing an RDMA Network Fabric Adapter on a server, a switch, or a cable, or if the operation of the RoCE Network Fabric is suspected to be substandard, then verify the RoCE Network Fabric is operating properly. The following procedure describes how to verify network operation:
- Complete the steps in Verifying the RoCE Network Fabric Configuration.
- Prepare for infinicheck.
  You may need to run the following commands before you can use the infinicheck command to perform RoCE Network Fabric configuration, connectivity, and performance checks.
  - If required, use the -s option to set up user equivalence for password-less SSH across the RoCE Network Fabric. For example:
      # /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hostips -c cellips -s
  - You can use the -z option to clear the files that were created during the last run of the infinicheck command. For example:
      # /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hostips -c cellips -z
  In the previous commands, hostips is the name of an input file that contains a list of RoCE Network Fabric IP addresses for the database servers, and cellips is the name of an input file that contains a list of RoCE Network Fabric IP addresses for the storage servers.
- Run the infinicheck command to perform RoCE Network Fabric configuration, connectivity, and performance checks.
  On a properly configured system, you can run the infinicheck command on any database server with minimal arguments. For example:
      # /opt/oracle.SupportTools/ibdiagtools/infinicheck
  By default, the infinicheck command performs a group of configuration and connectivity checks on the RoCE Network Fabric. You can use the -p option to run the optional performance tests. Or, use the -a option to perform all checks, including the performance tests. For example:
      # /opt/oracle.SupportTools/ibdiagtools/infinicheck -a
  Note: System performance may be impacted when the infinicheck command performs performance stress tests. Consequently, only run the infinicheck performance tests when required and preferably when there is no workload on the system.
  You can also specify the servers in your system explicitly by using the -g option to specify the database servers and the -c option to specify the storage servers. For example:
      # /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hostips -c cellips
  In the previous example, hostips is the name of an input file that contains a list of RoCE Network Fabric IP addresses for the database servers, and cellips is the name of an input file that contains a list of RoCE Network Fabric IP addresses for the storage servers.
  Instead of listing the database servers and storage servers in input files, you can supply a comma-separated list of IP addresses on the command line.
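As a rough sketch of the two ways to name servers, the following shell fragment writes a pair of input files and then collapses each into a comma-separated list. The one-address-per-line file layout, the 192.0.2.x placeholder addresses, and the `to_csv` helper are assumptions made for illustration, as is passing the list to the same -g and -c options; check the infinicheck help output on your system for the input format it actually accepts.

```shell
#!/bin/sh
# Hypothetical example: file names match the text above; the addresses are
# placeholders from the 192.0.2.0/24 documentation range, not real servers.
workdir=$(mktemp -d)

# Assumed layout: one RoCE Network Fabric IP address per line.
printf '%s\n' 192.0.2.11 192.0.2.12 > "$workdir/hostips"
printf '%s\n' 192.0.2.21 192.0.2.22 192.0.2.23 > "$workdir/cellips"

# to_csv (hypothetical helper): join the non-empty lines of a file with commas.
to_csv() {
    grep -v '^[[:space:]]*$' "$1" | paste -sd, -
}

host_list=$(to_csv "$workdir/hostips")
cell_list=$(to_csv "$workdir/cellips")

# Print the command instead of running it, so the sketch is safe to try anywhere.
echo "/opt/oracle.SupportTools/ibdiagtools/infinicheck -g $host_list -c $cell_list"

rm -rf "$workdir"
```

Keeping the addresses in files and deriving the command-line form from them means both invocation styles draw on a single list.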
The following example displays typical terminal output from the infinicheck command.

    # /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hostips -c cellips

    INFINICHECK
    [Network Connectivity, Configuration and Performance]

    #### FABRIC TYPE TESTS ####
    System type identified: RoCE
    Verifying User Equivalence of user=root from all DBs to all CELLs.

    #### RoCE CONFIGURATION TESTS ####
    Checking for presence of RoCE devices on all DBs and CELLs
    [SUCCESS].... RoCE devices on all DBs and CELLs look good
    Checking for RoCE Policy Routing settings on all DBs and CELLs
    [SUCCESS].... RoCE Policy Routing settings look good
    Checking for RoCE DSCP ToS mapping on all DBs and CELLs
    [SUCCESS].... RoCE DSCP ToS settings look good
    Checking for RoCE PFC settings and DSCP mapping on all DBs and CELLs
    [SUCCESS].... RoCE PFC and DSCP settings look good
    Checking for RoCE interface MTU settings. Expected value : 2300
    [SUCCESS].... RoCE interface MTU settings look good
    Verifying switch advertised DSCP on all DBs and CELLs ports ( ~ 2 min )
    [SUCCESS].... Advertised DSCP settings from RoCE switch looks good

    #### CONNECTIVITY TESTS ####
    [COMPUTE NODES -> STORAGE CELLS]
    (60 seconds approx.)
    (Will walk through QoS values: 0-6)
    [SUCCESS]..............Results OK
    [SUCCESS]....... All can talk to all storage cells
    [COMPUTE NODES -> COMPUTE NODES]
    (60 seconds approx.)
    (Will walk through QoS values: 0-6)
    [SUCCESS]..............Results OK
    [SUCCESS]....... All hosts can talk to all other nodes
    Verifying Subnet Masks on all nodes
    [SUCCESS] ......... Subnet Masks is same across the network
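When the output is captured to a file, a small filter can make a long run easier to review. The sketch below counts the `[SUCCESS]` result lines and prints any other result line it finds. Only the `[SUCCESS]` tag is taken from the output above; the idea that a problem would appear under some other one-word bracketed tag is an assumption, and the `summarize_log` helper and the sample log (including its invented `[EXAMPLETAG]` line) are hypothetical.

```shell
#!/bin/sh
# summarize_log (hypothetical helper): review a saved copy of infinicheck output.
# Assumption: each result line starts with a one-word bracketed tag followed by
# dots, as the [SUCCESS] lines do. Tags other than [SUCCESS] are not documented here.
summarize_log() {
    status_re='^[[:space:]]*\[[A-Z]+\][[:space:]]*\.'
    total=$(grep -Ec "$status_re" "$1")
    ok=$(grep -Ec '^[[:space:]]*\[SUCCESS\][[:space:]]*\.' "$1")
    echo "result lines: $total, success: $ok"
    # Show result lines whose tag is not SUCCESS; grep exits 1 when there are none.
    grep -E "$status_re" "$1" | grep -v '\[SUCCESS\]' || true
}

# Sample log: the first four lines are copied from the output above;
# the last line is invented purely to exercise the filter.
log=$(mktemp)
cat > "$log" <<'EOF'
[COMPUTE NODES -> STORAGE CELLS]
[SUCCESS]..............Results OK
[SUCCESS]....... All can talk to all storage cells
[SUCCESS] ......... Subnet Masks is same across the network
[EXAMPLETAG].... invented line, not real infinicheck output
EOF

summary=$(summarize_log "$log")
echo "$summary"
rm -f "$log"
```

Section headers such as `[COMPUTE NODES -> STORAGE CELLS]` contain spaces inside the brackets, so the pattern skips them and counts only result lines. For a real run, one option is to pipe the infinicheck output through `tee` into a file of your choosing and pass that file to the helper.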