6.11 Verifying the InfiniBand Network Fabric

This procedure describes how to verify the InfiniBand Network Fabric.

  1. Visually check all the RDMA Network Fabric cable connections within the rack. The port lights and the LEDs should all be on. Do not press each connector to verify connectivity.

  2. Log in as the root user on any component in the rack.

  3. Verify the InfiniBand Network Fabric topology using the following commands:

    # cd /opt/oracle.SupportTools/ibdiagtools
    # ./verify-topology [-t rack_size]

    The following example shows the output when the network components are correct.

    [DB Machine Infiniband Cabling Topology Verification Tool ]
    Is every external switch connected to every internal switch......[SUCCESS ]
    Are any external switches connected to each other................[SUCCESS ]
    Are any hosts connected to spine switch..........................[SUCCESS ]
    Check if all hosts have 2 CAs to different switches..............[SUCCESS ]
    Leaf switch check:cardinality and even distribution..............[SUCCESS ]
    Check if each rack has an valid internal ring....................[SUCCESS ]
    

    In the preceding command, rack_size is the size of the rack. The -t rack_size option is needed when the rack is an Oracle Exadata Half Rack or an Oracle Exadata Quarter Rack; use the value halfrack or quarterrack, respectively.

    The following example shows the output when there is a bad RDMA Network Fabric switch-to-cable connection:

    # ./verify-topology
    [DB Machine Infiniband Cabling Topology Verification Tool ]
    Is every external switch connected to every internal switch......[SUCCESS ]
    Are any external switches connected to each other................[SUCCESS ]
    Are any hosts connected to spine switch..........................[SUCCESS ]
    Check if all hosts have 2 CAs to different switches..............[SUCCESS ]
    Leaf switch check:cardinality and even distribution..............[SUCCESS ]
    Check if each rack has an valid internal ring....................[ERROR ]
    
    Switches 0x21283a87cba0a0 0x21283a87b8a0a0 have 6 connections between them.
    They should have at least 7 links between them
    

    The following example shows the output when there is a bad RDMA Network Fabric cable on a database server:

    # ./verify-topology
    [DB Machine Infiniband Cabling Topology Verification Tool ]
    Is every external switch connected to every internal switch......[SUCCESS ]
    Are any external switches connected to each other................[SUCCESS ]
    Are any hosts connected to spine switch..........................[SUCCESS ]
    Check if all hosts have 2 CAs to different switches..............[ERROR ]
    Node db01 has 1 endpoints.(Should be 2)
    Port 2 of this node is not connected to any switch
    --------fattree End Point Cabling verification failed-----
    Leaf switch check:cardinality and even distribution..............[ERROR ]
    Internal QDR Switch 0x21283a87b8a0a0 has fewer than 4 compute nodes
    It has only 3 links belonging to compute nodes                  [SUCCESS ]
    Check if each rack has an valid internal ring...................[SUCCESS ]
    

    The following example shows the output when there are bad connections on both a switch and a database server:

    # ./verify-topology
    [DB Machine Infiniband Cabling Topology Verification Tool ]
    Is every external switch connected to every internal switch......[SUCCESS ]
    Are any external switches connected to each other................[SUCCESS ]
    Are any hosts connected to spine switch..........................[SUCCESS ]
    Check if all hosts have 2 CAs to different switches..............[ERROR ]
    
    Node burxdb01 has 1 endpoints.(Should be 2) 
    Port 2 of this node is not connected to any switch
    --------fattree End Point Cabling verifation failed-----
    Leaf switch check:cardinality and even distribution..............[ERROR ]
    Internal QDR Switch 0x21283a87b8a0a0 has fewer than 4 compute nodes 
    It has only 3 links belonging to compute nodes...................[SUCCESS ]
    Check if each rack has an valid internal ring....................[ERROR ]
    
    Switches 0x21283a87cba0a0 0x21283a87b8a0a0 have 6 connections between them
    They should have at least 7 links between them
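
The checks above can also be screened with a small shell wrapper. The following is a minimal sketch, not part of the Oracle tooling: the function names topology_command and failed_checks are hypothetical, the rack size keyword "full" is an assumption used here only to mean "no -t option", and the parsing assumes that a failed check line ends in [ERROR ] as in the preceding examples.

```shell
#!/bin/sh
# Hypothetical helper: print the verify-topology command line for a rack size.
# halfrack and quarterrack are the documented -t values; "full" is an assumed
# keyword in this sketch that simply omits the -t option.
topology_command() {
    case "${1:-full}" in
        halfrack|quarterrack) echo "./verify-topology -t $1" ;;
        full)                 echo "./verify-topology" ;;
        *) echo "unknown rack size: $1" >&2; return 2 ;;
    esac
}

# Hypothetical helper: read verify-topology output on stdin and print the name
# of every check whose line ends in "[ERROR ]". Returns 1 if any check failed,
# 0 otherwise. Detail lines that follow a failed check are not printed.
failed_checks() {
    awk '
        /\[ERROR *\] *$/ {
            sub(/\.*\[ERROR *\] *$/, "")   # drop the dot leader and the marker
            print
            bad = 1
        }
        END { exit bad + 0 }
    '
}
```

On a rack component, after changing to /opt/oracle.SupportTools/ibdiagtools as root, the output of ./verify-topology -t halfrack could be piped into failed_checks. An empty result with exit status 0 would correspond to the all-SUCCESS output shown first in this procedure. The full tool output should still be read for the detail lines, such as the switch GUIDs and link counts.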