Checking the Health of an InfiniBand Switch

To check the health of an InfiniBand switch:

  1. Open the fabric management shell:

    -> show /SYS/Fabric_Mgmt
    NOTE: show on Fabric_Mgmt will launch a restricted Linux shell.
    User can execute switch diagnosis, SM Configuration and IB
    monitoring commands in the shell. To view the list of commands,
    use "help" at rsh prompt.
    Use exit command at rsh prompt to revert back to
    ILOM shell.
    FabMan@hostname->
    

    The prompt changes from -> to FabMan@hostname->

  2. Check the general health of the switch:

    FabMan@ra1sw-iba-> showunhealthy
    OK - No unhealthy sensors
    
  3. Check the general environment:

    FabMan@ra1sw-iba-> env_test
    NM2 Environment test started:
    Starting Voltage test:
    Voltage ECB OK
    Measured 3.3V Main = 3.28 V
    Measured 3.3V Standby = 3.42 V
    Measured 12V = 12.06 V
         .
         .
         .
    

    The report should show that fans 1, 2, and 3 are present, and fans 0 and 4 are not present. All OK and Passed results indicate that the environment is normal.

  4. Determine the current InfiniBand subnet manager priority of the switch. Leaf switches must have an smpriority of 5, and spine switches must have an smpriority of 8. The sample output shown here indicates the correct priority for a leaf switch.

    FabMan@ra1sw-iba-> setsmpriority list
    Current SM settings:
    smpriority 5
    controlled_handover TRUE
    subnet_prefix 0xfe80000000000000
    
  5. If the priority setting is incorrect, then reset it:

    1. Disable the subnet manager:

      FabMan@ra1sw-iba-> disablesm
      Stopping partitiond daemon.             [ OK ]
      Stopping IB Subnet Manager..            [ OK ]
      
    2. Reset the priority. This example sets the priority on a leaf switch:

      FabMan@ra1sw-iba-> setsmpriority 5
      Current SM settings:
      smpriority 5
      controlled_handover TRUE
      subnet_prefix 0xfe80000000000000
      
    3. Restart the subnet manager:

      FabMan@ra1sw-iba-> enablesm
      Starting IB Subnet Manager.             [ OK ]
      Starting partitiond daemon.             [ OK ]
      
  6. Log out of the fabric management shell and the Oracle ILOM shell:

    FabMan@ra1sw-iba-> exit
    -> exit
    
  7. Log in to Linux as root and restart the switch:

    localhost: root
    password: welcome1
    [root@localhost ~]# reboot
    
  8. Disconnect your laptop from the InfiniBand switch.

  9. Repeat steps 1 through 8 for the second InfiniBand leaf switch.
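
If you save the output of setsmpriority list to a text file on your laptop, the priority check in step 4 can be automated. The following Python sketch is one possible way to do that and is not part of the switch software. The function names and the "leaf" and "spine" role labels are hypothetical, and the parser assumes each setting is printed as a name followed by a value on one line, as in the sample output in step 4.

```python
# Expected subnet manager priorities, taken from step 4 of the procedure.
# The role labels "leaf" and "spine" are illustrative names, not switch settings.
EXPECTED_SMPRIORITY = {"leaf": 5, "spine": 8}


def parse_sm_settings(output):
    """Turn captured `setsmpriority list` text into a dict of name -> value.

    Assumes one "<name> <value>" pair per line, as in the sample output.
    """
    settings = {}
    for line in output.splitlines():
        if "->" in line:
            continue  # skip shell prompt lines such as "FabMan@...-> ..."
        parts = line.split()
        if len(parts) == 2:  # skips blank lines and the "Current SM settings:" header
            settings[parts[0]] = parts[1]
    return settings


def smpriority_is_correct(output, role):
    """Return True if the reported smpriority matches the expected value for role."""
    value = parse_sm_settings(output).get("smpriority")
    if value is None or not value.isdigit():
        return False
    return int(value) == EXPECTED_SMPRIORITY[role]


# Sample text copied from the output shown in step 4.
sample = """FabMan@ra1sw-iba-> setsmpriority list
Current SM settings:
smpriority 5
controlled_handover TRUE
subnet_prefix 0xfe80000000000000
"""

print(smpriority_is_correct(sample, "leaf"))   # True
print(smpriority_is_correct(sample, "spine"))  # False
```

A False result means that the priority must be reset as described in step 5.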