12 Using the InfiniBand Gateway Switches and Managing the InfiniBand Network Using Subnet Manager

This chapter describes how to use the Sun Network QDR InfiniBand Gateway switches in your Exalogic machine. The number of gateway switches depends on your purchased Exalogic machine rack configuration.

It also describes how to manage the InfiniBand network using Subnet Manager.

12.1 Using Sun Network QDR InfiniBand Gateway Switches

12.1.1 Physical Specifications

This section introduces Sun Network QDR InfiniBand Gateway Switches, which are also referred to as leaf switches in this guide.

Table 12-1 provides the physical specifications of the Sun Network QDR InfiniBand Gateway Switch.

Table 12-1 NM2-GW Specifications

Dimension   Measurement
Width       17.52 in. (445.0 mm)
Depth       24 in. (609.6 mm)
Height      1.75 in. (44.5 mm)
Weight      23.0 lbs (11.4 kg)


12.1.2 Accessing the Command-Line Interface (CLI) of a Gateway Switch

With power applied, you can access the CLI of a gateway switch in your Exalogic machine.

The number of gateway switches in your Exalogic machine depends on your purchased Exalogic machine rack configuration. You must access the command-line interfaces of these gateway switches individually.

To access the CLI of a gateway switch, use one of the following methods, depending on the management port you are using:

  1. If you are using a network management port, begin network communication with the CLI using the ssh command and the host name configured for the gateway switch:

    % ssh -l root gateway-name
    root@gateway-name's password: password
    #
    

    where gateway-name is the host name configured for the gateway switch.

    If you do not see this output or prompt, there is a problem with the network communication, host name, or CLI.

  2. If you are using a USB management port, begin serial communication with the CLI as follows:

    1. Connect a serial terminal, terminal server, or workstation with a TIP connection to the USB-to-serial adapter. Configure the terminal or terminal emulator with these settings:

      115200 baud, 8 bits, No parity, 1 Stop bit, and No handshaking

    2. Press the Return or Enter key on the serial device several times to synchronize the connection. You might see text similar to the following:

      …
      CentOS release 5.2 (Final)
      Kernel 2.6.27.13-nm2 on an i686
      
      gateway-name login: root
      Password: password
      #
      

      where gateway-name is the host name assigned to the gateway switch.

      If you do not see this output or prompt, there is a problem with the serial communication, host name, or command-line interface (CLI).

Note:

Repeat these steps to access the CLI for the other gateway switches in your Exalogic machine.
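
Because each gateway switch must be accessed individually, it can help to script the repetition. The following Python sketch builds the same ssh invocation shown in step 1 and runs one command on each switch in turn. It assumes non-interactive (key-based) ssh authentication is already set up; the function names and the host names in the usage note are hypothetical.

```python
import subprocess


def ssh_argv(gateway_name, command, user="root"):
    """Build the argument list for running one command on a gateway switch."""
    # Mirrors "ssh -l root gateway-name", with the remote command appended.
    return ["ssh", "-l", user, gateway_name, command]


def run_on_gateways(gateway_names, command, runner=subprocess.run):
    """Run the same command on each gateway switch, one switch at a time.

    Returns a dict that maps each host name to the captured standard output.
    The runner parameter exists so that the function can be exercised
    without a real network connection.
    """
    results = {}
    for name in gateway_names:
        completed = runner(ssh_argv(name, command), capture_output=True, text=True)
        results[name] = completed.stdout
    return results
```

For example, run_on_gateways(["gw-leaf1", "gw-leaf2"], "env_test") would collect the env_test output of two switches, where gw-leaf1 and gw-leaf2 stand in for the host names configured in your rack.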

12.1.3 Verifying the Status of a Gateway Switch

For each gateway switch, you can check the status of the CLI, power supplies, fans, and switch chip. Verify that the voltage and temperature values of the gateway switch are within specification:

# showunhealthy
# env_test

Unfavorable output from either command indicates a hardware fault in the affected component. A voltage or temperature that deviates more than 10% from the provided specification indicates a problem with the respective component.

For example, on the CLI of one of the gateway switches, enter the following command to check its status:

# env_test

This command performs a set of checks and displays the overall status of the gateway switch, as in the following example:

Environment test started:
Starting Voltage test:
Voltage ECB OK
Measured 3.3V Main = 3.28 V
Measured 3.3V Standby = 3.37 V
Measured 12V = 12.06 V
Measured 5V = 5.03 V
Measured VBAT = 3.25 V
Measured 1.0V = 1.01 V
Measured I4 1.2V = 1.22 V
Measured 2.5V = 2.52 V
Measured V1P2 DIG = 1.17 V
Measured V1P2 AND = 1.16 V
Measured 1.2V BridgeX = 1.21 V
Measured 1.8V = 1.80 V
Measured 1.2V Standby = 1.20 V
Voltage test returned OK
Starting PSU test:
PSU 0 present
PSU 1 present
PSU test returned OK
Starting Temperature test:
Back temperature 23.00
Front temperature 32.62
SP temperature 26.12
Switch temperature 45, maxtemperature 45
Bridge-0 temperature 41, maxtemperature 42
Bridge-1 temperature 43, maxtemperature 44
Temperature test returned OK
Starting FAN test:
Fan 0 not present
Fan 1 running at rpm 11212
Fan 2 running at rpm 11313
Fan 3 running at rpm 11521
Fan 4 not present
FAN test returned OK
Starting Connector test:
Connector test returned OK
Starting onboard ibdevice test:
Switch OK
Bridge-0 OK
Bridge-1 OK
All Internal ibdevices OK
Onboard ibdevice test returned OK
Environment test PASSED

When the status is operational, you can start the Subnet Manager (SM).

Note:

Repeat these steps to verify the status of the other gateway switches in your Exalogic machine.
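
The 10% rule can be checked mechanically against saved env_test output. The following Python sketch parses the "Measured ... = ... V" lines and flags any rail whose measured value deviates more than 10% from its nominal value. It assumes that the nominal value is the number spelled out in the rail label (for example, 12 for "12V"); rails whose label does not state a nominal value in that form, such as VBAT or V1P2 DIG, are skipped. The function names are hypothetical.

```python
import re

# "Measured 3.3V Main = 3.28 V" -> label "3.3V Main", value "3.28"
MEASURED = re.compile(r"^Measured\s+(?P<label>.+?)\s*=\s*(?P<value>\d+(?:\.\d+)?)\s*V$")
# Nominal voltage as written in the label, for example "3.3V" or "12V".
NOMINAL = re.compile(r"(\d+(?:\.\d+)?)V")


def voltage_deviations(env_test_output, tolerance=0.10):
    """Return {label: (nominal, measured)} for rails outside the tolerance."""
    out_of_spec = {}
    for line in env_test_output.splitlines():
        measured_match = MEASURED.match(line.strip())
        if not measured_match:
            continue
        label = measured_match.group("label")
        nominal_match = NOMINAL.search(label)
        if not nominal_match:
            continue  # Assumption: no nominal in the label, so nothing to compare.
        nominal = float(nominal_match.group(1))
        measured = float(measured_match.group("value"))
        if abs(measured - nominal) > tolerance * nominal:
            out_of_spec[label] = (nominal, measured)
    return out_of_spec


def env_test_passed(env_test_output):
    """Return True if the overall verdict line reports a pass."""
    return "Environment test PASSED" in env_test_output
```

Applied to the example output above, voltage_deviations finds no rail outside the tolerance and env_test_passed returns True.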

12.1.4 Starting the Subnet Manager Manually

By default, the Subnet Manager (SM) is enabled on the gateway switches in a single-rack Exalogic configuration.

However, if the SM is not running on the InfiniBand switches, you can start and activate the SM as follows:

  1. On the CLI of a switch, start the SM by running the following command:

    # enablesm

  2. Set the SM priority within the command-line interface (CLI) as follows:

    # setsmpriority priority

    Note:

    For information about the switches on which the SM should run in various rack configurations and the SM priorities for the switches, see Section 12.3.2, "Running the Subnet Manager in Different Rack Configurations."

    For example, to set the SM on a gateway switch to priority 5, run the following command:

    # setsmpriority 5

    The following output is displayed:

    -------------------------------------------------
    OpenSM 3.2.6_20090717
      Reading Cached Option File: /etc/opensm/opensm.conf
      Loading Cached Option:routing_engine = ftree
      Loading Cached Option:sminfo_polling_timeout = 1000
      Loading Cached Option:polling_retry_number = 3
    Command Line Arguments:
      Priority = 5
      Creating config file template '/tmp/osm.conf'.
      Log File: /var/log/opensm.log
    -------------------------------------------------
    

    For the changes to take effect, restart the SM as follows:

    # disablesm

    # enablesm

12.1.5 Checking Link Status

After starting the SM, you can verify that the Link LEDs for cabled links are green. If the Link LED is dark, the link is down. If the Link LED flashes, there are symbol errors.

To check the link status of the cables:

# listlinkup

If the link for a connector is reported as not present, the link at either end of the cable is down. If a port is down, use the enableswitchport 0 portnumber command to bring the port up. Alternatively, use the ibdevreset command to reset the switch chip.

See the Sun Network QDR InfiniBand Gateway Switch Administration Guide, "Enable a Switch Chip Port" and "Reset the Switch Chip".

After making sure that the link is up, you can verify the InfiniBand fabric.

The following is example output of the listlinkup command:

# listlinkup
Connector  0A Present <-> Switch Port 20 up (Enabled)
Connector  1A Present <-> Switch Port 22 up (Enabled)
Connector  2A Present <-> Switch Port 24 up (Enabled)
.
.
.
Connector 15A Not present
Connector 0A-ETH Present
 Bridge-0-1 Port 0A-ETH-1 up (Enabled)
 Bridge-0-1 Port 0A-ETH-2 up (Enabled)
 Bridge-0-0 Port 0A-ETH-3 up (Enabled)
 Bridge-0-0 Port 0A-ETH-4 up (Enabled)
Connector 1A-ETH Present
 Bridge-1-1 Port 1A-ETH-1 up (Enabled)
 Bridge-1-1 Port 1A-ETH-2 up (Enabled)
 Bridge-1-0 Port 1A-ETH-3 up (Enabled)
 Bridge-1-0 Port 1A-ETH-4 up (Enabled)
Connector 0B Present <-> Switch Port 19 up (Enabled)
Connector 1B Present <-> Switch Port 21 up (Enabled)
.
.
.
Connector 15B Not present
#
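
When many connectors are cabled, scanning the listlinkup output by eye is error-prone. The following Python sketch parses the "Connector" lines of saved listlinkup output and returns the connectors that are reported as not present or whose switch port is in a state other than up. It relies only on the line layout shown in the example above; the exact wording printed for a port that is down is an assumption, and the function name is hypothetical.

```python
import re

# Matches "Connector  0A Present <-> Switch Port 20 up (Enabled)"
# and "Connector 15A Not present".
CONNECTOR = re.compile(
    r"^Connector\s+(\S+)\s+(Present|Not present)"
    r"(?:\s+<->\s+Switch Port\s+(\d+)\s+(\S+))?"
)


def links_needing_attention(listlinkup_output):
    """Return (connector, switch_port, state) tuples for problem links."""
    problems = []
    for line in listlinkup_output.splitlines():
        match = CONNECTOR.match(line.strip())
        if not match:
            continue  # Bridge port lines, prompts, and ellipsis lines are ignored.
        connector, presence, port, state = match.groups()
        if presence == "Not present":
            problems.append((connector, None, "not present"))
        elif port is not None and state != "up":
            problems.append((connector, int(port), state))
    return problems
```

A switch port number returned by this function can then be passed to the enableswitchport command described above.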

12.1.6 Verifying the InfiniBand Fabric

Use the following commands on the command-line interface (CLI) to verify that the InfiniBand fabric is operational:

  1. ibnetdiscover

    Discovers and displays the InfiniBand fabric topology and connections. See Discovering the InfiniBand Network Topology.

  2. ibdiagnet

    Performs diagnostics upon the InfiniBand fabric and reports status. See Performing Diagnostics on the InfiniBand Fabric.

  3. ibcheckerrors

    Checks the entire InfiniBand fabric for errors. See Validating and Checking Errors in the InfiniBand Fabric.

12.1.6.1 Discovering the InfiniBand Network Topology

To discover the InfiniBand network topology and build a topology file which is used by the OpenSM Subnet Manager, run the following command on the command-line interface (CLI) of a gateway switch:

# ibnetdiscover

InfiniBand commands use the topology file to scan the InfiniBand fabric, validate the connectivity described in the file, and report errors indicated by the port counters.

The output is displayed, as in the following example:

# Topology file: generated on Sat Apr 13 22:28:55 2002
#
# Max of 1 hops discovered
# Initiated from node 0021283a8389a0a0 port 0021283a8389a0a0
vendid=0x2c9
devid=0xbd36
sysimgguid=0x21283a8389a0a3
switchguid=0x21283a8389a0a0(21283a8389a0a0)
Switch   36 "S-0021283a8389a0a0" # "Sun DCS 36 QDR switch localhost" enhanced port 0 lid 15 lmc 0
[23]    "H-0003ba000100e388"[2](3ba000100e38a) # "nsn33-43 HCA-1" lid 14 4xQDR
vendid=0x2c9
devid=0x673c
sysimgguid=0x3ba000100e38b
caguid=0x3ba000100e388
Ca   2 "H-0003ba000100e388" # "nsn33-43 HCA-1"
[2](3ba000100e38a)   "S-0021283a8389a0a0"[23] # lid 14 lmc 0 "Sun DCS 36 QDR switch localhost" lid 15 4xQDR

Note:

The actual output for your InfiniBand fabric will differ from that in the example.
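
To compare the discovered topology with the cabling you expect, the link lines can be extracted from saved ibnetdiscover output. The following Python sketch is based only on the line layout in the example above: a "Switch" or "Ca" line starts a node, and each following line that begins with a bracketed port number describes one link from that node. Because every link is listed once from each end, links are stored as unordered pairs. The function name is hypothetical.

```python
import re

# 'Switch   36 "S-0021283a8389a0a0" ...' or 'Ca   2 "H-0003ba000100e388" ...'
NODE = re.compile(r'^(Switch|Ca)\s+(\d+)\s+"([^"]+)"')
# '[23]    "H-0003ba000100e388"[2](3ba000100e38a) ...'
# '[2](3ba000100e38a)   "S-0021283a8389a0a0"[23] ...'
LINK = re.compile(r'^\[(\d+)\](?:\([0-9a-f]+\))?\s+"([^"]+)"\[(\d+)\]')


def parse_links(ibnetdiscover_output):
    """Return a set of links, each a frozenset of two (node_id, port) pairs."""
    links = set()
    current_node = None
    for raw_line in ibnetdiscover_output.splitlines():
        line = raw_line.strip()
        node_match = NODE.match(line)
        if node_match:
            current_node = node_match.group(3)
            continue
        link_match = LINK.match(line)
        if link_match and current_node:
            local_end = (current_node, int(link_match.group(1)))
            remote_end = (link_match.group(2), int(link_match.group(3)))
            # The same link appears under both nodes; the set removes the duplicate.
            links.add(frozenset((local_end, remote_end)))
    return links
```

For the example output, the result contains a single link between port 23 of the switch and port 2 of the HCA.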

12.1.6.2 Performing Diagnostics on the InfiniBand Fabric

To perform a collection of tests on the InfiniBand fabric and generate several files that contain parameters and aspects of the InfiniBand fabric, run the following command on the command-line interface (CLI) on a gateway switch:

# ibdiagnet

In the following example, the ibdiagnet command is restricted to a minimal set of checks (-skip all) and reports any links whose width is not 4x (-lw 4x) or whose speed is not 10 (-ls 10):

# ibdiagnet -lw 4x -ls 10 -skip all

Loading IBDIAGNET from: /usr/lib/ibdiagnet1.2
-W- Topology file is not specified.
 Reports regarding cluster links will use direct routes.
Loading IBDM from: /usr/lib/ibdm1.2
-I- Using port 0 as the local port.
-I- Discovering ... 2 nodes (1 Switches & 1 CA-s) discovered.
.
.
.
-I- Links With links width != 4x (as set by -lw option)
-I---------------------------------------------------
-I- No unmatched Links (with width != 4x) were found
-I---------------------------------------------------
-I- Links With links speed != 10 (as set by -ls option)
-I---------------------------------------------------
-I- No unmatched Links (with speed != 10) were found
.
.
.
-I- Stages Status Report:
 STAGE                                    Errors Warnings
 Bad GUIDs/LIDs Check                     0      0
 Link State Active Check                  0      0
 Performance Counters Report              0      0
 Specific Link Width Check                0      0
 Specific Link Speed Check                0      0
 Partitions Check                         0      0
 IPoIB Subnets Check                      0      0
Please see /tmp/ibdiagnet.log for complete log
----------------------------------------------------------------
-I- Done. Run time was 1 seconds.

Note:

The actual output for your InfiniBand fabric will differ from that in the example.
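
The Stages Status Report at the end of the ibdiagnet output summarizes the run. The following Python sketch reads saved ibdiagnet output and returns only the stages that reported at least one error or warning. It assumes the report layout shown in the example above (stage name followed by two integer columns, ending at the "Please see" line); the function name is hypothetical.

```python
import re

# "Bad GUIDs/LIDs Check               0   0" -> stage, errors, warnings
STAGE = re.compile(r"^(?P<stage>.+?)\s+(?P<errors>\d+)\s+(?P<warnings>\d+)$")


def failing_stages(ibdiagnet_output):
    """Return {stage: (errors, warnings)} for stages with a non-zero count."""
    failing = {}
    in_report = False
    for raw_line in ibdiagnet_output.splitlines():
        line = raw_line.strip()
        if "Stages Status Report" in line:
            in_report = True
            continue
        if not in_report:
            continue
        if line.startswith("Please see"):
            break  # End of the report table.
        match = STAGE.match(line)
        if not match:
            continue  # The column header line has no counts and is skipped.
        errors = int(match.group("errors"))
        warnings = int(match.group("warnings"))
        if errors or warnings:
            failing[match.group("stage")] = (errors, warnings)
    return failing
```

An empty result corresponds to a report in which every stage shows 0 errors and 0 warnings, as in the example.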

12.1.6.3 Validating and Checking Errors in the InfiniBand Fabric

The ibcheckerrors command uses the topology file to scan the InfiniBand fabric, validates the connectivity described in that file, and reports errors indicated by the port counters.

On the command-line interface (CLI), enter the following command:

# ibcheckerrors

## Summary: 4 nodes checked, 0 bad nodes found
##      34 ports checked, 0 ports have errors beyond threshold

Note:

The actual output for your InfiniBand fabric will differ from that in the example.
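
The two summary lines are enough to decide whether the fabric is clean. The following Python sketch extracts the four counts from saved ibcheckerrors output, assuming the summary wording shown in the example above. The function names and dictionary keys are hypothetical.

```python
import re


def parse_ibcheckerrors_summary(output):
    """Extract node and port counts from the ibcheckerrors summary lines."""
    nodes = re.search(r"(\d+) nodes checked, (\d+) bad nodes found", output)
    ports = re.search(
        r"(\d+) ports checked, (\d+) ports have errors beyond threshold", output
    )
    if not (nodes and ports):
        raise ValueError("ibcheckerrors summary lines not found")
    return {
        "nodes_checked": int(nodes.group(1)),
        "bad_nodes": int(nodes.group(2)),
        "ports_checked": int(ports.group(1)),
        "ports_with_errors": int(ports.group(2)),
    }


def fabric_is_clean(output):
    """Return True if no bad nodes and no ports beyond threshold are reported."""
    summary = parse_ibcheckerrors_summary(output)
    return summary["bad_nodes"] == 0 and summary["ports_with_errors"] == 0
```

For the example output, the summary reports 4 nodes and 34 ports checked, and fabric_is_clean returns True.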

12.1.7 Monitoring a Gateway Switch Using Web Interface

  1. Open a web browser and go to the following URL:

    http://gateway-IP

    where gateway-IP is the IP address of a gateway switch.

  2. Log in to the interface as the root user.

  3. Click the Switch/Fabric Monitoring Tools tab.

  4. Click Launch Sun DCS GW Monitor.

    The Fabric Monitor is displayed.

12.2 Understanding Administrative Commands

The following topics provide an overview of administrative tasks and the command sets to perform those tasks. Administering the gateway requires accessing the command-line interface (CLI), which is also referred to as the management controller.

12.2.1 Hardware Command Overview

The CLI (management controller) uses a simplified Linux operating system and file system. From the # prompt on the CLI, you can type hardware commands to perform some administrative and management tasks. Hardware commands are user-friendly and can perform some testing upon the switch chip, enabling greater control of a gateway switch and its operation.

After you log in to the root account, the shell prompt (#) appears, and you can enter shell commands. Enter the hardware commands in the following format:

# command [arguments][arguments]...

12.2.2 InfiniBand Command Overview

The InfiniBand commands are a means of monitoring and controlling aspects of the InfiniBand fabric. These commands are also installed on and run from the CLI, which is also the host of the Subnet Manager. Use of these commands requires thorough knowledge of InfiniBand architecture and technology.

After you log in to the root account, the shell prompt (#) appears, and you can enter shell commands. Enter the InfiniBand commands in the following format:

# command [option][option] ...

12.3 Managing InfiniBand Network Using Subnet Manager

12.3.1 Overview of Subnet Manager

The subnet manager (SM) manages all operational characteristics of the InfiniBand network, such as the following:

  • Discovering the network topology

  • Assigning a local identifier (LID) to all ports connected to the network

  • Calculating and programming switch forwarding tables

  • Programming Partition Key (PKEY) tables at HCAs and switches

  • Programming QoS tables (Service Level to Virtual Lane mapping tables, and Virtual Lane arbitration tables)

  • Monitoring changes in the fabric

The InfiniBand network typically has more than one SM, but only one SM is active at a time. The active SM is the master SM; the others are standby SMs. If the master SM shuts down or fails, a standby SM automatically becomes the master SM.

Note:

In the Exalogic machine, the InfiniBand switches (both leaf and spine) are automatically configured to separate the IP over InfiniBand (IPoIB) traffic and the Ethernet over InfiniBand (EoIB) traffic.

12.3.2 Running the Subnet Manager in Different Rack Configurations

Table 12-2 provides information about the switches on which the subnet manager should run in different rack configurations.

Table 12-2 Running the Subnet Manager in Different Rack Configurations

Rack Configuration                                                                         SM Should Run On    SM Priority
Single Exalogic machine                                                                    All leaf switches   All leaf switches: 5
Two half- or full-rack Exalogic machines                                                   Spine switches      Spine switch: 8
Two quarter-rack Exalogic machines                                                         All leaf switches   All leaf switches: 5
Three or more Exalogic machines                                                            Spine switches      Spine switch: 8
Half- or full-rack Exalogic machine connected to a half- or full-rack Exadata machine (*)  Spine switches      Spine switch: 8
Quarter-rack Exalogic machine connected to a quarter-rack Exadata machine (*)              All leaf switches   All leaf switches: 5
Two or more Exalogic machines connected to two or more Exadata machines (*)                Spine switches      Spine switch: 8

(*) See also: "Running the SM in Configurations with Varying Switch Firmware Versions"


Running the SM in Configurations with Varying Switch Firmware Versions

In a multirack configuration consisting of both Exalogic and Exadata machines, if firmware upgrades result in switches with varying firmware versions across the configuration, the SM should run on only the switches with the latest firmware version. This is necessary to benefit from the features of the latest firmware.

Note that the SM should run on at least two switches in the fabric.

Consider a configuration that consists of three or more spine switches—for example, two Exalogic machines connected to two Exadata machines—but with varying firmware versions.

  • If two or more of the available spine switches, across the configuration, have the highest firmware version, the SM should run on those spine switches, with the priority set to 8.

  • If only one of the spine switches in the entire configuration has the highest firmware version:

    • The SM should run on that spine switch. The SM priority should be set to 8.

    • In addition, the SM should run on one or more leaf switches having the latest firmware version. The SM priority of the leaf switches should be set to 5.

    In this case, running the SM on one or more leaf switches, besides running it on the spine switch, is necessary to fulfill the requirement that at least two SMs should be running in the fabric.
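
The firmware rules above can be expressed as a small planning function. The following Python sketch takes an inventory of switches and returns the switches on which the SM should run, with the priority for each. The inventory format, the function name, and the firmware version tuples are hypothetical; the case in which no spine switch has the latest firmware is not covered by the rules above, and the sketch simply falls back to the leaf switches with the latest firmware.

```python
SPINE_PRIORITY = 8
LEAF_PRIORITY = 5


def plan_sm_placement(switches):
    """Return {switch_name: sm_priority} for a mixed-firmware configuration.

    Each switch is a dict with the keys "name", "role" ("spine" or "leaf"),
    and "firmware" (a tuple of integers, so that versions compare correctly).
    """
    latest = max(switch["firmware"] for switch in switches)
    latest_spines = [
        s for s in switches if s["role"] == "spine" and s["firmware"] == latest
    ]
    latest_leaves = [
        s for s in switches if s["role"] == "leaf" and s["firmware"] == latest
    ]

    plan = {s["name"]: SPINE_PRIORITY for s in latest_spines}
    if len(latest_spines) < 2:
        # Fewer than two spine switches have the latest firmware: add the leaf
        # switches that have it, so that at least two SMs run in the fabric.
        plan.update({s["name"]: LEAF_PRIORITY for s in latest_leaves})

    if len(plan) < 2:
        raise ValueError("fewer than two switches run the latest firmware")
    return plan
```

With one spine switch and one leaf switch at the latest firmware version, the plan assigns priority 8 to the spine switch and priority 5 to the leaf switch.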

For more information about running the subnet manager, see Section 12.3.3, "Monitoring the Subnet Manager," and Section 12.3.4, "Controlling the Subnet Manager."

12.3.3 Monitoring the Subnet Manager

12.3.3.1 Displaying the Subnet Manager Status

If you want to quickly determine your Subnet Manager's priority and state, you can use the sminfo command.

On the command-line interface (CLI), run the following command:

# sminfo

The output is displayed, as in the following example:

sminfo: sm lid 15 sm guid 0x21283a8389a0a0, activity count 32046 priority 8 state 3 SMINFO_MASTER

In the example output, the Subnet Manager's hosting HCA has LID 15 and GUID 0x21283a8389a0a0. The Subnet Manager has a priority of 8 (high) and its state is 3 (master).
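
If you collect sminfo output from several switches, the fields can be extracted with a regular expression. The following Python sketch parses one sminfo line in the format shown in the example above into a dictionary. The function name and dictionary keys are hypothetical.

```python
import re

SMINFO = re.compile(
    r"sm lid (?P<lid>\d+) sm guid (?P<guid>0x[0-9a-fA-F]+), "
    r"activity count (?P<activity>\d+) priority (?P<priority>\d+) "
    r"state\s*(?P<state>\d+) (?P<state_name>\S+)"
)


def parse_sminfo(line):
    """Parse one line of sminfo output into a dict of typed fields."""
    match = SMINFO.search(line)
    if not match:
        raise ValueError("unrecognized sminfo output")
    return {
        "lid": int(match.group("lid")),
        "guid": match.group("guid"),
        "activity_count": int(match.group("activity")),
        "priority": int(match.group("priority")),
        "state": int(match.group("state")),
        "state_name": match.group("state_name"),
        "is_master": match.group("state_name") == "SMINFO_MASTER",
    }
```

For the example line, the parsed priority is 8, the state is 3, and is_master is True.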

12.3.3.2 Displaying Recent Subnet Manager Activity

On the command-line interface (CLI), run the following command:

# getmaster -l

The output is displayed, as in the following example:

# getmaster -l
Last ring buffer history listed:
whereismaster-daemon is running
20091204 15:00:53 whereismaster started
20091204 15:00:55 No OpenSM Master seen in the system
20091204 15:06:19 OpenSM Master on Switch : 0x0002c9000100d050 ports 36 Sun DCS 36 QDR switch o4nm2-36p-2.norway.test.com enhanced port 0 lid 7 lmc 0
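
The history entries are timestamped, so saved getmaster -l output can be turned into a list of events, for example to see when the master SM last changed. The following Python sketch assumes the "YYYYMMDD HH:MM:SS message" layout shown in the example above and treats any non-empty line that follows an entry without a timestamp as a continuation of that entry. The function name is hypothetical.

```python
import re
from datetime import datetime

# "20091204 15:00:53 whereismaster started"
ENTRY = re.compile(r"^(\d{8} \d{2}:\d{2}:\d{2}) (.*)$")


def parse_getmaster_history(output):
    """Return a list of (timestamp, message) tuples from getmaster -l output."""
    entries = []
    for raw_line in output.splitlines():
        line = raw_line.strip()
        match = ENTRY.match(line)
        if match:
            timestamp = datetime.strptime(match.group(1), "%Y%m%d %H:%M:%S")
            entries.append([timestamp, match.group(2)])
        elif entries and line:
            # A wrapped line belongs to the most recent timestamped entry.
            entries[-1][1] += " " + line
        # Header lines that precede the first entry are ignored.
    return [(timestamp, message) for timestamp, message in entries]
```

For the example output, the function returns three entries, the last of which reports the switch that hosts the master SM.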

12.3.4 Controlling the Subnet Manager

12.3.4.1 Identifying the Location of Master Subnet Manager

From any InfiniBand switch in the network (leaf switch or spine switch), log in as root and run the getmaster command to obtain the location of the master SM as follows:

# getmaster

This command displays the host name or IP address of the switch where the master SM is running.

12.3.4.2 Relocating Master Subnet Manager

You are required to relocate the master SM from a leaf switch (Sun Network QDR InfiniBand Gateway Switch) to the spine switch (Sun Datacenter InfiniBand Switch 36) when you are connecting more than one Exalogic machine. This step is also necessary when you are connecting an Exalogic machine to an Oracle Exadata Database Machine.

Relocating the master SM does not affect the availability of the InfiniBand network. You can perform this task while normal workload is running.

To relocate the master SM from a leaf switch (Sun Network QDR InfiniBand Gateway Switch) to the spine switch (Sun Datacenter InfiniBand Switch 36):

  1. Identify the location of the master SM, as described in Identifying the Location of Master Subnet Manager.

  2. If the master SM is not running on a spine switch, log in as a root user to the leaf switch where the master SM is located.

  3. Disable SM on the switch, as described in Disabling Subnet Manager on a Switch. This step relocates the master SM to another switch in the network.

  4. Repeat steps 1 through 3 until the master SM relocates to the spine switch (Sun Datacenter InfiniBand Switch 36).

  5. Enable SM on the leaf switches where SM was disabled during this procedure. For information about enabling SM on a switch, see Enabling Subnet Manager on a Switch.
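
The procedure above is a loop, which can be sketched as follows. This Python outline takes three caller-supplied functions, all hypothetical: get_master returns the name of the switch that currently hosts the master SM (for example, by running getmaster), and disable_sm and enable_sm run disablesm and enablesm on the named switch. The sketch does not wait for the fabric to elect a new master; a real get_master would need to allow for that.

```python
def relocate_master_to_spine(get_master, disable_sm, enable_sm,
                             spine_switches, max_rounds=10):
    """Move the master SM to a spine switch and return the switches cycled.

    Steps 1 through 3 are repeated until the master SM is on a spine switch.
    Step 5 then re-enables the SM on every switch where it was disabled.
    """
    disabled = []
    for _ in range(max_rounds):
        master = get_master()            # Step 1: locate the master SM.
        if master in spine_switches:
            break                        # Step 4: stop once a spine is master.
        disable_sm(master)               # Steps 2 and 3: disable SM there.
        disabled.append(master)
    else:
        # The loop ended without the master reaching a spine switch.
        raise RuntimeError("master SM did not relocate to a spine switch")

    for switch in disabled:              # Step 5: re-enable SM where disabled.
        enable_sm(switch)
    return disabled
```

The max_rounds limit is a safeguard added for this sketch so that the loop cannot run indefinitely; it is not part of the documented procedure.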

12.3.4.3 Enabling Subnet Manager on a Switch

To enable SM on a switch:

  1. Log in as a root user.

  2. At the command prompt, run the following command:

    # enablesm

12.3.4.4 Disabling Subnet Manager on a Switch

To disable SM on a switch:

  1. Log in as a root user.

  2. At the command prompt, run the following command:

    # disablesm

12.4 Working with the Default Rack-Level InfiniBand Partition

12.4.1 Partition in Exalogic Machine

By default, the Exalogic machine includes a single partition at the rack level. All Exalogic compute nodes and the storage appliance are full members of this default partition.

Note:

Oracle recommends that you create IP subnets over the default IP over InfiniBand (IPoIB) link to isolate application deployments in the Exalogic environment. Each IP subnet will have a single multicast domain. When you create IP subnets in addition to the default IPoIB subnet, ensure that the interfaces for each additional subnet on each Exalogic compute node are bonded, for high availability (HA) purposes.

For more information, see the "Application Isolation by Subnetting over IPoIB" topic in the Oracle Exalogic Enterprise Deployment Guide.

12.4.2 Verifying the Default Partition

You can verify the default partition and the partition key by running the smpartition list command on the command-line interface (CLI) for one of the gateway switches.

12.5 What Next?

The Sun Network QDR InfiniBand Gateway Switch is installed with the default vNICs (vnic0 and vnic1) configured on separate Sun Network QDR InfiniBand Gateway Switches for the Ethernet over InfiniBand (EoIB) BOND1 interface.

Optionally, you can create VLANs and vNICs using the InfiniBand gateway switches.