12.1 Using Sun Network QDR InfiniBand Gateway Switches

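This section contains the following topics:

  12.1.1 Physical Specifications
  12.1.2 Access the Command-Line Interface (CLI) of a Gateway Switch
  12.1.3 Verify the Status of a Gateway Switch
  12.1.4 Start the Subnet Manager Manually
  12.1.5 Check Link Status
  12.1.6 Verify the InfiniBand Fabric
  12.1.7 Monitor a Gateway Switch Using Web Interface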

12.1.1 Physical Specifications

This section introduces Sun Network QDR InfiniBand Gateway Switches, which are also referred to as leaf switches in this guide.

Table 12-1 provides the physical specifications of the Sun Network QDR InfiniBand Gateway Switch.

Table 12-1 NM2-GW Specifications

Dimension   Measurement
Width       17.52 in. (445.0 mm)
Depth       24 in. (609.6 mm)
Height      1.75 in. (44.5 mm)
Weight      23.0 lbs (11.4 kg)

12.1.2 Access the Command-Line Interface (CLI) of a Gateway Switch

With power applied, you can access the CLI of a gateway switch in your Exalogic machine.

The number of gateway switches in your Exalogic machine depends on your purchased Exalogic machine rack configuration. You must access the command-line interfaces of these gateway switches individually.

To access the CLI of a gateway switch, complete the following steps:

  1. If you are using a network management port, begin network communication with the CLI using the ssh command and the host name configured for the gateway switch:

    % ssh -l root gateway-name
    root@gateway-name's password: password
    #
    

    where gateway-name is the host name configured for the gateway switch.

    If you do not see this output or prompt, there is a problem with the network communication, host name, or CLI.

  2. If you are using a USB management port, begin serial communication with the CLI as follows:

    1. Connect a serial terminal, terminal server, or workstation with a TIP connection to the USB-to-serial adapter. Configure the terminal or terminal emulator with these settings:

      115200 baud, 8 bits, No parity, 1 Stop bit, and No handshaking

    2. Press the Return or Enter key on the serial device several times to synchronize the connection. You might see text similar to the following:

      …
      CentOS release 5.2 (Final)
      Kernel 2.6.27.13-nm2 on an i686
      
      gateway-name login: root
      Password: password
      #
      

      where gateway-name is the host name assigned to the gateway switch.

      If you do not see this output or prompt, there is a problem with the serial connection, host name, or command-line interface (CLI).

Note:

Repeat these steps to access the CLI for the other gateway switches in your Exalogic machine.
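When an Exalogic machine has several gateway switches, the same CLI command often has to be run on each of them. The following Python sketch runs one CLI command on a list of switches over SSH and collects the output per host. It assumes that key-based SSH authentication is configured for the root user and that the SSH server on the switch accepts a command passed on the ssh command line; the host names in GATEWAY_SWITCHES and the function names are hypothetical.

```python
import subprocess
from typing import Dict, List, Sequence

# Hypothetical host names: replace them with the host names configured
# for the gateway switches in your Exalogic machine.
GATEWAY_SWITCHES = ["gateway-name-1", "gateway-name-2"]


def build_ssh_command(host: str, cli_command: str, user: str = "root") -> List[str]:
    """Return the argv that runs one CLI command on a switch over SSH.

    Equivalent to `ssh -l root gateway-name` from the steps above, with the
    command passed on the ssh command line instead of typed at the prompt.
    """
    return ["ssh", "-l", user, host, cli_command]


def run_on_all(hosts: Sequence[str], cli_command: str,
               timeout: float = 60.0) -> Dict[str, str]:
    """Run cli_command on each host in turn and return its output per host.

    stdout and stderr are concatenated so that SSH connection or login
    errors show up next to the command output. Raises
    subprocess.TimeoutExpired if a switch does not answer within timeout.
    """
    results: Dict[str, str] = {}
    for host in hosts:
        completed = subprocess.run(
            build_ssh_command(host, cli_command),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        results[host] = completed.stdout + completed.stderr
    return results
```

For example, run_on_all(GATEWAY_SWITCHES, "env_test") returns a dictionary that maps each host name to the text that env_test printed on that switch. Running the switches one after another keeps the output easy to attribute when a connection fails.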

12.1.3 Verify the Status of a Gateway Switch

For each gateway switch, you can use the following commands on the CLI to check the status of the power supplies, fans, and switch chip, and to verify that the voltage and temperature values of the gateway switch are within specification:

# showunhealthy
# env_test

If either command reports a failure, the affected component has a hardware fault. A voltage or temperature that deviates by more than 10% from its specified value also indicates a problem with that component.

For example, on the CLI of one of the gateway switches, enter the following command to check its status:

# env_test

This command performs a set of checks and displays the overall status of the gateway switch, as in the following example:

Environment test started:
Starting Voltage test:
Voltage ECB OK
Measured 3.3V Main = 3.28 V
Measured 3.3V Standby = 3.37 V
Measured 12V = 12.06 V
Measured 5V = 5.03 V
Measured VBAT = 3.25 V
Measured 1.0V = 1.01 V
Measured I4 1.2V = 1.22 V
Measured 2.5V = 2.52 V
Measured V1P2 DIG = 1.17 V
Measured V1P2 AND = 1.16 V
Measured 1.2V BridgeX = 1.21 V
Measured 1.8V = 1.80 V
Measured 1.2V Standby = 1.20 V
Voltage test returned OK
Starting PSU test:
PSU 0 present
PSU 1 present
PSU test returned OK
Starting Temperature test:
Back temperature 23.00
Front temperature 32.62
SP temperature 26.12
Switch temperature 45, maxtemperature 45
Bridge-0 temperature 41, maxtemperature 42
Bridge-1 temperature 43, maxtemperature 44
Temperature test returned OK
Starting FAN test:
Fan 0 not present
Fan 1 running at rpm 11212
Fan 2 running at rpm 11313
Fan 3 running at rpm 11521
Fan 4 not present
FAN test returned OK
Starting Connector test:
Connector test returned OK
Starting onboard ibdevice test:
Switch OK
Bridge-0 OK
Bridge-1 OK
All Internal ibdevices OK
Onboard ibdevice test returned OK
Environment test PASSED

When the status is operational, you can start the Subnet Manager (SM).

Note:

Repeat these commands to verify the status of the other gateway switches in your Exalogic machine.
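The 10% rule described above can be applied to saved env_test output with a short script. The following Python sketch is illustrative only: it reads the "Measured ... = ... V" lines, infers each nominal voltage from the sensor label (for example, 3.3 from "3.3V Main"), and flags readings that deviate by more than 10%. Reading a label such as "V1P2 DIG" as a 1.2 V rail is an assumption, labels without a recognizable nominal value (such as VBAT) are skipped, and temperatures are not checked because the output does not state their specified values. The function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# "Measured 3.3V Main = 3.28 V" -> label "3.3V Main", value "3.28"
MEASURED_RE = re.compile(r"^Measured (?P<label>.+?) = (?P<value>\d+(?:\.\d+)?) V$")
# Nominal value written as "3.3V" or "12V" inside the label.
NOMINAL_RE = re.compile(r"(\d+(?:\.\d+)?)V\b")
# Assumption: a label such as "V1P2" denotes a nominal 1.2 V rail.
VXPY_RE = re.compile(r"\bV(\d+)P(\d+)\b")


def nominal_voltage(label: str) -> Optional[float]:
    """Infer the nominal voltage from a sensor label, or None if unknown."""
    match = NOMINAL_RE.search(label)
    if match:
        return float(match.group(1))
    match = VXPY_RE.search(label)
    if match:
        return float(f"{match.group(1)}.{match.group(2)}")
    return None  # for example "VBAT"


def out_of_tolerance(env_test_output: str,
                     tolerance: float = 0.10) -> List[Tuple[str, float, float]]:
    """Return (label, nominal, measured) for readings outside the tolerance."""
    flagged = []
    for line in env_test_output.splitlines():
        match = MEASURED_RE.match(line.strip())
        if not match:
            continue
        label = match.group("label")
        measured = float(match.group("value"))
        nominal = nominal_voltage(label)
        if nominal is None:
            continue  # no nominal value to compare against
        if abs(measured - nominal) > tolerance * nominal:
            flagged.append((label, nominal, measured))
    return flagged


def env_test_passed(env_test_output: str) -> bool:
    """True if env_test printed its overall PASSED line."""
    return "Environment test PASSED" in env_test_output
```

Applied to the sample output in this section, out_of_tolerance returns an empty list, because every measured voltage lies within 10% of the value in its label.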

12.1.4 Start the Subnet Manager Manually

By default, the Subnet Manager (SM) is enabled on the gateway switches in a single Exalogic rack configuration.

However, if the SM is not running on the InfiniBand switches, you can start it and set its priority as follows:

  1. On the CLI of a switch, start the SM by running the following command:

    # enablesm

  2. Set the SM priority by running the following command:

    # setsmpriority priority

    Note:

    For information about the switches on which the SM should run in various rack configurations and the SM priorities for the switches, see Subnet Manager Operation in Different Rack Configurations.

    For example, to set the SM on a gateway switch to priority 5, run the following command:

    # setsmpriority 5

    The following output is displayed:

    -------------------------------------------------
    OpenSM 3.2.6_20090717
      Reading Cached Option File: /etc/opensm/opensm.conf
      Loading Cached Option:routing_engine = ftree
      Loading Cached Option:sminfo_polling_timeout = 1000
      Loading Cached Option:polling_retry_number = 3
    Command Line Arguments:
      Priority = 5
      Creating config file template '/tmp/osm.conf'.
      Log File: /var/log/opensm.log
    -------------------------------------------------
    

    For the changes to take effect, restart the SM as follows:

    # disablesm

    # enablesm
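The order of the commands in this procedure matters: the new priority takes effect only after the SM is restarted. The following Python sketch returns the commands in the order given above and reads the Priority value back from setsmpriority output such as the example shown. The 0 to 15 range check reflects the 4-bit priority field of the SMInfo attribute in the InfiniBand specification; the switch firmware might accept a narrower range. The function names are hypothetical.

```python
import re
from typing import List, Optional

# The SMInfo priority field in the InfiniBand specification is 4 bits wide
# (0-15). Assumption: the switch CLI may accept a narrower range.
MAX_SM_PRIORITY = 15

# Matches the "Priority = 5" line printed under "Command Line Arguments:".
PRIORITY_RE = re.compile(r"^\s*Priority = (\d+)\s*$", re.MULTILINE)


def sm_priority_commands(priority: int) -> List[str]:
    """Return the CLI commands, in order, that start the SM, set its
    priority, and restart it so that the new priority takes effect."""
    if not 0 <= priority <= MAX_SM_PRIORITY:
        raise ValueError(f"SM priority must be between 0 and {MAX_SM_PRIORITY}")
    return ["enablesm", f"setsmpriority {priority}", "disablesm", "enablesm"]


def reported_priority(setsmpriority_output: str) -> Optional[int]:
    """Read back the priority from setsmpriority output, or None if absent."""
    match = PRIORITY_RE.search(setsmpriority_output)
    return int(match.group(1)) if match else None
```

Checking the echoed priority before restarting the SM catches a mistyped value early.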

12.1.5 Check Link Status

After starting the SM, you can verify that the Link LEDs for cabled links are green. If the Link LED is dark, the link is down. If the Link LED flashes, there are symbol errors.

To check the link status of the cables, run the following command:

# listlinkup

If the link for a connector is reported as not present, the link at either end of the cable is down. If a port is down, use the enableswitchport 0 portnumber command to bring the port up. Alternatively, use the ibdevreset command to reset the switch chip.

For more information, see "Enable a Switch Chip Port" and "Reset the Switch Chip" in the Sun Network QDR InfiniBand Gateway Switch Administration Guide.

After making sure that the link is up, you can verify the InfiniBand fabric.

The following example shows output from the listlinkup command:

# listlinkup
Connector  0A Present <-> Switch Port 20 up (Enabled)
Connector  1A Present <-> Switch Port 22 up (Enabled)
Connector  2A Present <-> Switch Port 24 up (Enabled)
.
.
.
Connector 15A Not present
Connector 0A-ETH Present
 Bridge-0-1 Port 0A-ETH-1 up (Enabled)
 Bridge-0-1 Port 0A-ETH-2 up (Enabled)
 Bridge-0-0 Port 0A-ETH-3 up (Enabled)
 Bridge-0-0 Port 0A-ETH-4 up (Enabled)
Connector 1A-ETH Present
 Bridge-1-1 Port 1A-ETH-1 up (Enabled)
 Bridge-1-1 Port 1A-ETH-2 up (Enabled)
 Bridge-1-0 Port 1A-ETH-3 up (Enabled)
 Bridge-1-0 Port 1A-ETH-4 up (Enabled)
Connector 0B Present <-> Switch Port 19 up (Enabled)
Connector 1B Present <-> Switch Port 21 up (Enabled)
.
.
.
Connector 15B Not present
#

12.1.6 Verify the InfiniBand Fabric

Use the following commands on the command-line interface (CLI) to verify that the InfiniBand fabric is operational:

  1. ibnetdiscover

    Discovers and displays the InfiniBand fabric topology and connections. See Discover the InfiniBand Network Topology.

  2. ibdiagnet

    Performs diagnostics on the InfiniBand fabric and reports status. See Perform Diagnostics on the InfiniBand Fabric.

  3. ibcheckerrors

    Checks the entire InfiniBand fabric for errors. See Validate and Check Errors in the InfiniBand Fabric.

12.1.6.1 Discover the InfiniBand Network Topology

To discover the InfiniBand network topology and build a topology file which is used by the OpenSM Subnet Manager, run the following command on the command-line interface (CLI) of a gateway switch:

# ibnetdiscover

The topology file is used by InfiniBand commands to scan the InfiniBand fabric, validate the connectivity described in the topology file, and report errors indicated by the port counters.

The output is displayed, as in the following example:

# Topology file: generated on Sat Apr 13 22:28:55 2002
#
# Max of 1 hops discovered
# Initiated from node 0021283a8389a0a0 port 0021283a8389a0a0
vendid=0x2c9
devid=0xbd36
sysimgguid=0x21283a8389a0a3
switchguid=0x21283a8389a0a0(21283a8389a0a0)
Switch   36 "S-0021283a8389a0a0" # "Sun DCS 36 QDR switch localhost" enhanced port 0 lid 15 lmc 0
[23]    "H-0003ba000100e388"[2](3ba000100e38a) # "nsn33-43 HCA-1" lid 14 4xQDR
vendid=0x2c9
devid=0x673c
sysimgguid=0x3ba000100e38b
caguid=0x3ba000100e388
Ca   2 "H-0003ba000100e388" # "nsn33-43 HCA-1"
[2](3ba000100e38a)   "S-0021283a8389a0a0"[23] # lid 14 lmc 0 "Sun DCS 36 QDR switch localhost" lid 15 4xQDR

Note:

The actual output for your InfiniBand fabric will differ from that in the example.

12.1.6.2 Perform Diagnostics on the InfiniBand Fabric

To run a collection of tests on the InfiniBand fabric and generate several files that describe its parameters and other aspects, run the following command on the command-line interface (CLI) of a gateway switch:

# ibdiagnet

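In the following example, ibdiagnet is run with a minimal set of checks: the -skip all option skips the optional checks, and the -lw 4x and -ls 10 options report any links whose width is not 4x or whose speed is not 10: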

# ibdiagnet -lw 4x -ls 10 -skip all

Loading IBDIAGNET from: /usr/lib/ibdiagnet1.2
-W- Topology file is not specified.
 Reports regarding cluster links will use direct routes.
Loading IBDM from: /usr/lib/ibdm1.2
-I- Using port 0 as the local port.
-I- Discovering ... 2 nodes (1 Switches & 1 CA-s) discovered.
.
.
.
-I- Links With links width != 4x (as set by -lw option)
-I---------------------------------------------------
-I- No unmatched Links (with width != 4x) were found
-I---------------------------------------------------
-I- Links With links speed != 10 (as set by -ls option)
-I---------------------------------------------------
-I- No unmatched Links (with speed != 10) were found
.
.
.
-I- Stages Status Report:
 STAGE                          Errors  Warnings
 Bad GUIDs/LIDs Check           0       0
 Link State Active Check        0       0
 Performance Counters Report    0       0
 Specific Link Width Check      0       0
 Specific Link Speed Check      0       0
 Partitions Check               0       0
 IPoIB Subnets Check            0       0
Please see /tmp/ibdiagnet.log for complete log
----------------------------------------------------------------
-I- Done. Run time was 1 seconds.

Note:

The actual output for your InfiniBand fabric will differ from that in the example.
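To act on the Stages Status Report without reading the whole log, you can extract only the stages that report errors or warnings. The following Python sketch assumes the table layout shown in the example above, where each row is a stage name followed by an error count and a warning count; the function names are hypothetical.

```python
import re
from typing import Dict, Tuple

# " Bad GUIDs/LIDs Check               0   0 " -> stage, errors, warnings
STAGE_RE = re.compile(r"^\s*(?P<stage>.+?)\s+(?P<errors>\d+)\s+(?P<warnings>\d+)\s*$")


def stage_report(output: str) -> Dict[str, Tuple[int, int]]:
    """Return {stage: (errors, warnings)} from the Stages Status Report."""
    report: Dict[str, Tuple[int, int]] = {}
    in_report = False
    for line in output.splitlines():
        if "Stages Status Report" in line:
            in_report = True
            continue
        if not in_report:
            continue
        match = STAGE_RE.match(line)
        if match:
            report[match.group("stage")] = (int(match.group("errors")),
                                            int(match.group("warnings")))
        elif report:
            break  # the first non-matching line after the rows ends the table
    return report


def problem_stages(output: str) -> Dict[str, Tuple[int, int]]:
    """Keep only the stages that report at least one error or warning."""
    return {stage: counts for stage, counts in stage_report(output).items()
            if counts != (0, 0)}
```

For the example output above, problem_stages returns an empty dictionary because every stage reports 0 errors and 0 warnings.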

12.1.6.3 Validate and Check Errors in the InfiniBand Fabric

The ibcheckerrors command uses the topology file to scan the InfiniBand fabric, validate the connectivity described in the topology file, and report errors indicated by the port counters.

On the command-line interface (CLI), enter the following command:

# ibcheckerrors

## Summary: 4 nodes checked, 0 bad nodes found
##      34 ports checked, 0 ports have errors beyond threshold

Note:

The actual output for your InfiniBand fabric will differ from that in the example.
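A script that checks the fabric periodically might reduce the ibcheckerrors summary to a pass or fail result. The following Python sketch assumes the summary wording shown in the example above and treats a missing summary as a failure rather than a pass; the function names are hypothetical.

```python
import re
from typing import Dict

NODES_RE = re.compile(r"(\d+) nodes checked, (\d+) bad nodes found")
PORTS_RE = re.compile(r"(\d+) ports checked, (\d+) ports have errors beyond threshold")


def parse_summary(output: str) -> Dict[str, int]:
    """Extract the node and port counts from the ibcheckerrors summary."""
    summary: Dict[str, int] = {}
    nodes = NODES_RE.search(output)
    if nodes:
        summary["nodes_checked"] = int(nodes.group(1))
        summary["bad_nodes"] = int(nodes.group(2))
    ports = PORTS_RE.search(output)
    if ports:
        summary["ports_checked"] = int(ports.group(1))
        summary["ports_with_errors"] = int(ports.group(2))
    return summary


def fabric_clean(output: str) -> bool:
    """True only if both summary lines are present and report no problems."""
    summary = parse_summary(output)
    # .get() returns None for a missing line, so a missing summary is not clean.
    return summary.get("bad_nodes") == 0 and summary.get("ports_with_errors") == 0
```

For the example summary above, parse_summary reports 4 nodes and 34 ports checked with no bad nodes and no ports with errors, so fabric_clean returns True.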

12.1.7 Monitor a Gateway Switch Using Web Interface

  1. Open a web browser and go to the following URL:

    http://gateway-IP

    where gateway-IP is the IP address of a gateway switch.

  2. Log in to the interface as the root user.
  3. Click the Switch/Fabric Monitoring Tools tab.
  4. Click Launch Sun DCS GW Monitor.

    The Fabric Monitor is displayed.