CHAPTER 2

Replacing the Sun Blade 6048 Switched InfiniBand Network ExpressModule

This chapter describes how to replace a Sun Blade 6048 Switched InfiniBand Network ExpressModule (IB NEM) in a powered-on Sun Blade 6048 Series Chassis. This chapter also includes instructions to verify that the replacement IB NEM has been installed correctly.

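This chapter contains the following sections:

Section 2.1, Replacing IB NEM Hardware
Section 2.2, Verifying Installation
Section 2.3, Verify Installation on Linux
Section 2.4, Troubleshooting a Hot-Remove Operation
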
Caution - Damage to the IB NEM can occur as the result of careless handling or electrostatic discharge (ESD). Always handle an IB NEM with care to avoid damage to electrostatic-sensitive components. To minimize the possibility of ESD-related damage, Sun strongly recommends using both a workstation antistatic mat and an ESD wrist strap. You can get an ESD wrist strap from any reputable electronics store or from Sun as part number 250-1007.


You can install the IB NEM in the following Sun Blade 6048 Series Chassis:


2.1 Replacing IB NEM Hardware

If an IB NEM fails or if you choose to change the I/O configuration, you will need to replace the IB NEM. You can replace an IB NEM in a powered-on Sun Blade 6048 Series Chassis using a hot-plug operation.

If you are removing but not replacing the IB NEM, you must install both IB NEM filler panels to meet FCC limits for electromagnetic interference (EMI) and to ensure proper airflow and cooling.

If you encounter a problem replacing the IB NEM, see Section 2.4, Troubleshooting a Hot-Remove Operation.



Note - If you are installing an IB NEM in a Sun Blade 6048 Series Chassis that has not been powered on, see the Sun Blade 6048 Series Installation Guide (Sun part number 820-2312).


The IB NEMs are customer-replaceable units (CRUs).

2.1.1 Replace IB NEM in a Powered-On Chassis

1. Identify which IB NEM to replace.

If the amber Service Action Required LED is lit, this indicates a problem with a specific IB NEM. Otherwise, you can choose any IB NEM to replace if, for example, you want to change the I/O configuration.

2. Prepare the IB NEM for a hot-plug procedure. Use either of these methods:

The green OK LED will blink for up to one minute, indicating that the IB NEM is being prepared for removal.

To abort the operation, press the Attention button again within five seconds.

Once the green LED goes dark and the blue LED is illuminated, you can safely remove the IB NEM.

If the IB NEM fails the hot-plug preparation and its Ready-to-Remove indicator does not light, see Section 2.4, Troubleshooting a Hot-Remove Operation.

3. When the blue Ready-to-Remove LED is lit, physically remove the IB NEM as follows:

a. Remove all cables from the IB NEM.

b. Press the latch on both ejector levers inward at the same time.

c. Swing out the ejector levers to their fully open position.

d. Slide the IB NEM out of its slot.

Support the weight of the IB NEM with one hand at the bottom of the IB NEM.
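
The wait described in step 2 (up to one minute of blinking, then the Ready-to-Remove state) can be sketched as a polling loop with a timeout. This is only an illustration under stated assumptions: read_status is a hypothetical caller-supplied function that returns the IB NEM's prepare_to_remove_status value (for example, by querying the CMM), and the 60-second default simply mirrors the "up to one minute" figure above.

```python
import time


def wait_for_ready_to_remove(read_status, timeout_s=60.0, poll_s=2.0,
                             sleep=time.sleep, clock=time.monotonic):
    """Poll until read_status() returns "Ready" or the timeout expires.

    read_status is a hypothetical callable supplied by the caller; it is
    assumed to return the prepare_to_remove_status value as a string.
    Returns True if "Ready" was seen, False if the timeout expired first.
    """
    deadline = clock() + timeout_s
    while True:
        if read_status() == "Ready":
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

If the function returns False, the IB NEM did not reach the Ready-to-Remove state in time, which is the situation covered in Section 2.4, Troubleshooting a Hot-Remove Operation.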

2.1.2 Install IB NEM in a Powered-On Chassis

1. Align the replacement IB NEM with the bottom IB NEM slot.

Ensure that the port connectors are facing toward you and that the ejector levers are fully open.

2. Align the replacement IB NEM with the chassis guidance system, and slide the IB NEM into its slot until the ejector levers engage and start to close.

Ensure that the IB NEM engages with the chassis guidance system. Failure to align the IB NEM correctly can result in damage to the IB NEM's internal connections to the chassis midplane.

3. Close the levers to secure the IB NEM in its slot. The levers click when locked.

Ensure that the back plate on the module mounts flush with the chassis panel opening.

The green OK indicator on the IB NEM should be in Standby Blink mode.

4. Connect the InfiniBand cables to the IB NEM port connectors.

5. Ensure that the connectors are properly engaged.

The connectors click when locked.



Caution - Avoid putting unnecessary stress on the connection. Do not bend or twist the cable near the connectors, and avoid sharp cable bends of more than 90 degrees.


6. If you have not already done so, connect the other end of the InfiniBand cables to the appropriate ports on an InfiniBand switch.

7. Press the Attention button to notify the Sun Blade Server Modules (host operating systems) of the IB NEM.

After you physically install the IB NEM, the Chassis Monitoring Module (CMM) automatically detects the presence of the IB NEM. The green OK indicator on the IB NEM transitions from Standby Blink to Steady On when the IB NEM is operational.



Note - If you are replacing an IB NEM, you do not need to install the InfiniBand software packages. The appropriate software package will have been installed and configured as part of the initial IB NEM installation.


8. Verify that the IB NEM is working properly.

See Section 2.2.1, Verify Hardware Installation.


2.2 Verifying Installation

If you have not installed the IB NEM in the chassis and connected it to an operational InfiniBand switch, do so before you attempt to verify the installation. The InfiniBand switch should automatically recognize InfiniBand servers when the servers are connected to the fabric.

2.2.1 Verify Hardware Installation

1. Once you have physically installed the IB NEM and ensured that the cables are connected to the IB NEM and switches, ensure that an IB subnet manager is running on the connected InfiniBand fabric (network).

If the green port LED is illuminated, you have successfully completed the hardware installation and you can proceed to verification through the ILOM interfaces. The green LED indicates that the port is enabled, that is, that a physical link to a remote switch (or possibly an HCA) has been established.

If the port LEDs are not illuminated, one possible cause might be that the InfiniBand drivers are not installed. You cannot verify a complete installation on Linux until you install these drivers.

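2. You can now examine hardware status through one of the ILOM (Integrated Lights Out Manager) interfaces. Use one of the following procedures:

Section 2.2.2, Verify Installation Using the ILOM Web Interface
Section 2.2.3, Verify Installation Using the ILOM CLI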

For a description of the possible states of the IB NEM LEDs, see Section 2.2.4, Verify Component Status Using the LEDs.

2.2.2 Verify Installation Using the ILOM Web Interface

1. Log in to the ILOM web interface using the IP address of the active CMM.

The initial page of the ILOM web interface appears, providing visual verification of successful hardware installation. Note the image of the installed IB NEM in the view of the back of the chassis.


2. In the left navigation pane, select CMM.

The ILOM Version Information page appears.

3. Select the System Information tab and then select the Components tab.

The Component Management page appears.


4. Select the IB NEM component name.

You might need to scroll down in the Component Management Status page.

The ILOM page showing the IB NEM status details appears.


Note that the IB NEM prepare_to_remove status is NotReady and that the bladen_link_status is Connected, indicating successful hardware installation.

5. If you are physically near the IB NEM, you can examine its LEDs to verify that it has returned the expected feedback.

See Section 2.2.4, Verify Component Status Using the LEDs.

2.2.3 Verify Installation Using the ILOM CLI

1. Log in to the ILOM CLI.

2. To find the IB NEM in your system, enter:

> show /CH/

3. To verify that the IB NEM is installed, that is, that the Ready To Remove status is Not Ready, enter:

> show /CH/component

where component is NEMn. The system returns the following message:


> show /CH/NEM0
  Targets:
   SP
   SEEPROM

  Properties:
   type = Network Express Module
   fru_part_number = 375-3551-03
   fru_serial_number = 0000000-0748000416
   fru_name = (none)

  Commands:
   cd
   show

Note that the IB NEM prepare_to_remove status is NotReady and that the bladen_link_status is Connected, indicating successful hardware installation.

4. If you are physically near the IB NEM, you can examine its LEDs to verify that it has returned the expected feedback.

See Section 2.2.4, Verify Component Status Using the LEDs.
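
The two conditions named in step 3 can be expressed as a small check. The sketch below is illustrative only: it assumes the properties reported by the show command have already been collected into a Python dict keyed by property name (the dict and the function name are hypothetical), and it assumes that one Connected blade link is enough to count as success, since empty blade slots report Not_present.

```python
import re


def check_installed(props):
    """Return a list of problems; an empty list means both checks passed.

    props is assumed to map ILOM property names to their string values.
    """
    problems = []

    # This chapter spells the value both "NotReady" and "Not Ready",
    # so spaces are ignored when comparing.
    status = props.get("prepare_to_remove_status", "")
    if status.replace(" ", "").lower() != "notready":
        problems.append(
            f"prepare_to_remove_status is {status!r}, expected NotReady")

    # Collect every bladeN_link_status property.
    links = {key: value for key, value in props.items()
             if re.fullmatch(r"blade\d+_link_status", key)}
    # Assumption: at least one blade link should report Connected.
    if not any(value == "Connected" for value in links.values()):
        problems.append("no bladeN_link_status reports Connected")

    return problems
```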

2.2.4 Verify Component Status Using the LEDs

Verify the status of the IB NEM using the LEDs.

TABLE 2-1 lists the possible combinations for the IB NEM LEDs and the status of the IB NEM indicated by these combinations.


TABLE 2-1 LED Combinations and IB NEM Status

Blue (Top)   Amber (Middle)   Green (Bottom)    IB NEM Status
----------   --------------   ---------------   ---------------------------------------------
On           Off              Off               Ready to remove
Off          On               Off               Service attention required
Off          Off              Very Slow Blink   Standby
Off          Off              Slow Blink        Connecting (or disconnecting) links to blades
Off          Off              On                Links connected to blades


To verify successful IB NEM insertion, check the following LEDs and the ILOM interface status:

1. At module insertion, the green OK indicator goes to Standby Blink.

2. When you press the Attention button to activate the IB NEM, the green OK indicator transitions to Slow Blink.

3. When all links to active blades have been made, the green OK indicator transitions to Steady On.

4. In the ILOM interfaces, the Ready to Remove status shows Not Ready.
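
The LED combinations in TABLE 2-1 can also be written as a lookup table, which may be convenient when logging observations during a replacement. The following is a minimal sketch; the function and variable names are hypothetical, and any combination not listed in TABLE 2-1 is reported as unknown rather than guessed.

```python
# Keys are (blue, amber, green) LED states, taken from TABLE 2-1.
LED_STATUS = {
    ("on", "off", "off"): "Ready to remove",
    ("off", "on", "off"): "Service attention required",
    ("off", "off", "very slow blink"): "Standby",
    ("off", "off", "slow blink"): "Connecting (or disconnecting) links to blades",
    ("off", "off", "on"): "Links connected to blades",
}


def nem_status(blue, amber, green):
    """Return the IB NEM status for the observed LED states.

    The comparison ignores case and surrounding whitespace.
    """
    key = tuple(state.strip().lower() for state in (blue, amber, green))
    return LED_STATUS.get(key, "Unknown combination")
```

For example, nem_status("Off", "Off", "On") corresponds to the last row of the table, the state expected once all links to active blades have been made.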


2.3 Verify Installation on Linux

To determine whether the IB NEM is visible to the Linux OS, enter the lspci command.

Output similar to the following appears.


> lspci
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
...
01:01.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064 PCI-X Fusion-MPT SAS (rev 02)
02:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
02:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
80:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
80:01.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
80:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
0a:00.0 InfiniBand: Mellanox Technologies Unknown device 634a (rev a0)

The last entry in the sample output (InfiniBand: Mellanox Technologies) verifies the hardware installation and confirms the IB NEM’s availability to the Linux host.
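
The same check can be scripted by scanning the lspci output for a device whose class name begins with "InfiniBand". This is a sketch rather than a supported tool: the function names are hypothetical, and the class name is matched case-insensitively as a prefix on the assumption that other lspci versions may label the class slightly differently from the sample above.

```python
import subprocess


def find_infiniband_devices(lspci_output):
    """Return the lspci lines whose device class starts with "InfiniBand"."""
    matches = []
    for line in lspci_output.splitlines():
        # Each device line has the form "<bus:dev.fn> <class>: <description>".
        _address, _, rest = line.strip().partition(" ")
        class_name = rest.partition(":")[0]
        if class_name.lower().startswith("infiniband"):
            matches.append(line.strip())
    return matches


def read_lspci():
    """Run lspci on the local host and return its output as text."""
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    return result.stdout
```

On a blade, find_infiniband_devices(read_lspci()) returns an empty list if no InfiniBand device is visible to the host.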

 


2.4 Troubleshooting a Hot-Remove Operation

Because IB NEMs are shared resources, all Sun Blade Server Modules must respond favorably to the PCI hot-remove request. However, a blade might not relinquish its link to an IB NEM if, for instance, it has busy NFS-mounted volumes or file transfers in progress.

To determine the state of the IB NEM-to-blade connections, you can use the ILOM web interface or the ILOM command-line interface, as described in the following procedures.

2.4.1 Troubleshoot Using the ILOM Web Interface

1. To verify IB NEM-to-blade connections, log in to the ILOM web interface for the CMM.

2. In the left navigation pane, select CMM.

The ILOM Version Information page appears.

3. Select the System Information tab and then select the Components tab.

The Component Management page appears.

4. Click on the IB NEM component name.

A page displaying properties and values for the selected IB NEM appears.


As shown, the system responds with bladen_link_status entries for each blade (where n is the blade module number). Any blade not reporting a Not_present status needs intervention from the host OS on that blade. This intervention from the host OS depends entirely on the OS that is active on the blade. Each supported OS has a different method for managing attached devices.

5. Perform the appropriate host OS procedure for releasing the IB NEM from the blade.

6. Re-execute the steps in Section 2.1.1, Replace IB NEM in a Powered-On Chassis.

2.4.2 Troubleshoot Using the ILOM CLI

1. To verify IB NEM-to-blade connections, log in to the ILOM CLI.

2. Enter the following command, where n is the number of the IB NEM in question.


> show /CH/NEMn
  Targets:
   SERVICE
   OK2RM
   LOCATE
   OK
   ATTN_BTN
   LOCATE_BTN
   T_AMB
   T_CORE

  Properties:
   type = Network ExpressModule FRU
   board_part_number = 501-7460-04
   board_serial_number = 0060HSV-0649123404
   board_product_name = ASSY,ANDY,NEM,IB_PASS_THROUGH_MODULE
   product_name = SUN Blade 6000 NEM IB DDR 10PT UNSWITCHED
   product_manufacturer = SUN MICROSYSTEMS
   product_version = (none)
   product_part_number = (none)
   product_serial_number = (none)
   fault_state = OK
   clear_fault_action = (none)
   prepare_to_remove_status = Ready
   prepare_to_remove_action = (none)
   return_to_service_action = (none)
   blade0_link_status = Not_present
   blade2_link_status = Not_present
   blade4_link_status = Not_present

As shown, the system responds with bladen_link_status entries for each blade (where n is the blade module number). Any blade not reporting a Not_present status needs intervention from the host OS on that blade. This intervention from the host OS depends entirely on the OS that is active on the blade. Each supported OS has a different method for managing attached devices.

3. Perform the appropriate host OS procedure for releasing the IB NEM from the blade.

4. Re-execute the steps in Section 2.1.1, Replace IB NEM in a Powered-On Chassis.
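
The check in step 2 (finding every blade that does not report Not_present) can be sketched as a short parser over saved show output. The sample text below is hypothetical and only mimics the property layout shown above; the function names are likewise illustrative.

```python
import re

# Hypothetical excerpt in the same "name = value" layout as the output above.
SAMPLE = """\
  Properties:
   fault_state = OK
   prepare_to_remove_status = NotReady
   blade0_link_status = Not_present
   blade2_link_status = Connected
   blade4_link_status = Not_present
"""


def parse_properties(show_output):
    """Collect "name = value" lines from show output into a dict."""
    props = {}
    for line in show_output.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            props[key.strip()] = value.strip()
    return props


def blades_holding_links(props):
    """Return the blade numbers whose link status is not Not_present."""
    blades = []
    for key, value in props.items():
        match = re.fullmatch(r"blade(\d+)_link_status", key)
        if match and value != "Not_present":
            blades.append(int(match.group(1)))
    return sorted(blades)
```

Each blade number returned identifies a blade whose host OS still needs to release the IB NEM before the hot-remove operation is retried.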