CHAPTER 11

Removing and Replacing Boards

The Sun Fire 6800/4810/4800/3800 Systems Service Manual and the Sun Fire E6900/E4900 Systems Service Manual provide instructions for physically removing and replacing boards. However, board removal and replacement also involves firmware steps that must be performed before a board is removed from the system and after a new board replaces the old one. This chapter discusses the firmware steps involved with the removal and replacement of the following boards, cards, and assemblies: CPU/Memory boards and I/O assemblies, CompactPCI and PCI cards, Repeater boards, the System Controller board, and the ID board and centerplane.

This chapter also discusses how to unassign a board from a domain and disable the board.

To troubleshoot board and component failures, see Board and Component Failures. To remove and install the FrameManager, ID board, power supplies, and fan trays, refer to the Sun Fire 6800/4810/4800/3800 Systems Service Manual and the Sun Fire E6900/E4900 Systems Service Manual.

Before you begin, have the following books available:

You will need these books for Solaris operating environment steps and the hardware removal and installation steps. The Sun Hardware Platform Guide and the Sun Fire Midrange Systems Dynamic Reconfiguration User Guide are available with your Solaris operating environment release.


CPU/Memory Boards and I/O Assemblies

The following procedures describe the software steps involved with:

Refer to the Sun Fire Midrange Systems Dynamic Reconfiguration User Guide for details on:


To Remove and Replace a System Board

This procedure does not involve dynamic reconfiguration commands.

1. Access the domain that contains the board or assembly to be removed by performing the following:

a. Connect to the domain console.

For details on accessing the domain console, see To Navigate Between The Platform Shell And a Domain and To Go From a Domain Shell To a Domain Console.

b. Halt the Solaris operating environment from the domain console as superuser.


root# init 0
ok

c. Type the escape sequence to get to the domain shell prompt.

By default, the escape sequence is #. (a pound sign followed by a period).


ok #.
schostname:A>

The domain shell prompt is displayed.

2. Turn the domain keyswitch to the standby position with the setkeyswitch standby command, and then power off the board or assembly.


schostname:A> setkeyswitch standby
schostname:A> poweroff board_name

where board_name is sb0-sb5 or ib6-ib9.


Verify that the green power LED is off.

3. Remove the board or assembly and replace with a new board or assembly.

Refer to the Sun Fire 6800/4810/4800/3800 Systems Service Manual or the Sun Fire E6900/E4900 Systems Service Manual.

4. Power on the board or assembly.


schostname:SC> poweron board_name

where board_name is sb0-sb5 or ib6-ib9.

5. Check the version of the firmware that is installed on the board by using the showboards command:


schostname:SC> showboards -p version

The firmware version of the new replacement board must be compatible with the system controller firmware.

6. If the firmware version of the replacement board or assembly is not compatible with the SC firmware, update the firmware on the board.

a. Use the flashupdate -c command to update the firmware from another board in the current domain.


schostname:SC> flashupdate -c source_board destination_board

For details on the flashupdate command syntax, refer to the command description in the Sun Fire Midrange System Controller Command Reference Manual.

b. If the showboards output shows the board in a Failed state after the flashupdate command has updated the board firmware to a compatible version, power off the board to clear the Failed state.

7. Before you bring an I/O assembly back to the Solaris operating environment, test the I/O assembly in a spare domain that contains at least one CPU/Memory board with a minimum of one CPU.

a. Enter a spare domain.

b. Test the I/O assembly.

See Testing an I/O Assembly.

8. Turn the domain keyswitch to the on position with the setkeyswitch on command.


schostname:A> setkeyswitch on

This command turns the domain on and boots the Solaris operating environment if the OpenBoot PROM parameters are set as follows:

If the Solaris operating environment did not boot automatically, continue with Step 9. If the appropriate OpenBoot PROM parameters are not set up to take you to the login: prompt, you will see the ok prompt. For more information on the OpenBoot PROM parameters, refer to the OpenBoot documentation included in the Sun Hardware Documentation Set.

9. At the ok prompt, type the boot command:


ok boot

After the Solaris operating environment is booted, the login: prompt is displayed.
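The command order in this procedure can be summarized with a short Python sketch. It is only an illustration: the helper names are hypothetical, the code builds text and never contacts a system controller, and it leaves out the escape sequence and the conditional flashupdate step. The prompts and commands are the ones shown in the steps above.

```python
import re

# Board names given in the procedure: sb0-sb5 (CPU/Memory boards)
# or ib6-ib9 (I/O assemblies).
_BOARD_NAME = re.compile(r"sb[0-5]|ib[6-9]")


def is_valid_board_name(name: str) -> bool:
    """Return True if name is one of sb0-sb5 or ib6-ib9."""
    return _BOARD_NAME.fullmatch(name) is not None


def replacement_commands(board_name: str) -> list[tuple[str, str]]:
    """Return (prompt, command) pairs in the order the procedure uses them.

    Hypothetical helper for planning only; it does not run anything.
    """
    if not is_valid_board_name(board_name):
        raise ValueError(f"unexpected board name: {board_name!r}")
    return [
        ("root#", "init 0"),
        ("schostname:A>", "setkeyswitch standby"),
        ("schostname:A>", f"poweroff {board_name}"),
        # The physical removal and replacement happens here (service manual).
        ("schostname:SC>", f"poweron {board_name}"),
        ("schostname:SC>", "showboards -p version"),
        ("schostname:A>", "setkeyswitch on"),
    ]


if __name__ == "__main__":
    for prompt, command in replacement_commands("sb2"):
        print(prompt, command)
```

Rejecting a board name before the list is built keeps a typing mistake such as sb6 out of a poweroff command line.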


To Unassign a Board From a Domain or Disable a System Board

If a CPU/Memory board or I/O assembly fails, perform one of the following tasks:


To Hot-Swap a CPU/Memory Board Using DR

1. Use DR to unconfigure and disconnect the CPU/Memory board from the domain.

Refer to the Sun Fire Midrange Systems Dynamic Reconfiguration User Guide.

2. Verify the state of the LEDs on the board.

Refer to the CPU/Memory board chapter of the Sun Fire 6800/4810/4800/3800 Systems Service Manual or the Sun Fire E6900/E4900 Systems Service Manual.

3. Remove and replace the board.

Refer to the CPU/Memory board chapter of the Sun Fire 6800/4810/4800/3800 Systems Service Manual or the Sun Fire E6900/E4900 Systems Service Manual.

4. Power on the board:


schostname:SC> poweron board_name

where board_name is sb0-sb5 or ib6-ib9.

5. Check the version of the firmware that is installed on the board by using the showboards command:


schostname:SC> showboards -p version

The firmware version of the new replacement board must be compatible with the system controller firmware.

6. If the firmware version of the replacement board or assembly is not compatible with the SC firmware, use the flashupdate -c command to update the firmware from another board in the current domain.


schostname:SC> flashupdate -c source_board destination_board

For a description of command syntax, refer to the flashupdate command in the Sun Fire Midrange System Controller Command Reference Manual.

7. Use DR to connect and configure the board back into the domain.

Refer to the Sun Fire Midrange Systems Dynamic Reconfiguration User Guide.

8. Verify the state of the LEDs on the board.

Refer to the CPU/Memory board chapter of the Sun Fire 6800/4810/4800/3800 Systems Service Manual or the Sun Fire E6900/E4900 Systems Service Manual.
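Step 6 copies firmware from another board in the current domain. The following Python sketch shows one way to pick a source board and build the flashupdate -c line. It rests on two assumptions that are not stated in this chapter: that the source should be a board of the same kind as the destination (same two-letter prefix), and that the caller passes only boards whose firmware is already known to be compatible. The function name is hypothetical; check the command reference before relying on the result.

```python
from typing import Iterable, Optional


def flashupdate_copy_command(
    domain_boards: Iterable[str], destination: str
) -> Optional[str]:
    """Build a 'flashupdate -c source_board destination_board' line.

    Assumption: a usable source is any other board in the domain whose
    name has the same two-letter prefix (sb or ib) as the destination.
    Returns None when no such board is available.
    """
    candidates = sorted(
        board
        for board in domain_boards
        if board != destination and board[:2] == destination[:2]
    )
    if not candidates:
        return None
    # Pick the lowest-numbered candidate so the choice is deterministic.
    return f"flashupdate -c {candidates[0]} {destination}"


if __name__ == "__main__":
    print(flashupdate_copy_command(["sb0", "sb2", "ib6"], "sb2"))
```

When the function returns None, no other board of that kind is in the domain, so the copy form of the command does not apply.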


To Hot-Swap an I/O Assembly Using DR

The following procedure describes how to hot-swap an I/O assembly and test it in a spare domain that is not running the Solaris operating environment.

1. Use DR to unconfigure and disconnect the I/O assembly from the domain.

Refer to the Sun Fire Midrange Systems Dynamic Reconfiguration User Guide.

2. Verify the state of the LEDs on the assembly.

Refer to the I/O assembly chapter of the Sun Fire 6800/4810/4800/3800 Systems Service Manual or the Sun Fire E6900/E4900 Systems Service Manual.

3. Remove and replace the assembly.

Refer to the I/O assembly chapter of the Sun Fire 6800/4810/4800/3800 Systems Service Manual or the Sun Fire E6900/E4900 Systems Service Manual.

4. Power on the assembly:


schostname:SC> poweron board_name

5. Check the version of the firmware that is installed on the assembly by using the showboards command:


schostname:SC> showboards -p version

The firmware version of the new replacement board must be compatible with the system controller firmware.

6. If the firmware version of the replacement board or assembly is not compatible with the SC firmware, use the flashupdate -c command to update the firmware from another board in the current domain:


schostname:SC> flashupdate -c source_board destination_board

For details on the flashupdate command syntax, refer to the command description in the Sun Fire Midrange System Controller Command Reference Manual.

7. Before you bring the board back to the Solaris operating environment, test the I/O assembly in a spare domain that contains at least one CPU/Memory board with a minimum of one CPU.

a. Enter a spare domain.

b. Test the I/O assembly.

For details, see Testing an I/O Assembly.

8. Use DR to connect and configure the assembly back into the domain running the Solaris operating environment.

Refer to the Sun Fire Midrange Systems Dynamic Reconfiguration User Guide.


CompactPCI and PCI Cards

If you need to remove and replace a CompactPCI or PCI card, use the procedures that follow. These procedures do not involve DR commands. For additional information on replacing CompactPCI and PCI cards, refer to the Sun Fire 6800/4810/4800/3800 Systems Service Manual or the Sun Fire E6900/E4900 Systems Service Manual.


To Remove and Replace a PCI Card

1. Halt the Solaris operating environment in the domain, power off the I/O assembly, and remove it from the system.

Complete Step 1 and Step 2 in To Remove and Replace a System Board.

2. Remove and replace the card.

Refer to the Sun Fire 6800/4810/4800/3800 Systems Service Manual or the Sun Fire E6900/E4900 Systems Service Manual.

3. Replace the I/O assembly and power it on.

Complete Step 3 and Step 4 in To Remove and Replace a System Board.

4. Reconfigure booting of the Solaris operating environment in the domain.

At the ok prompt, type boot -r.


ok boot -r


To Remove and Replace a CompactPCI Card

1. Halt the Solaris operating environment in the domain, power off the I/O assembly, and remove it from the system.

Complete Step 1 and Step 2 in To Remove and Replace a System Board.

2. Remove the CompactPCI card from the I/O assembly and replace it.

For details, refer to the Sun Fire 6800/4810/4800/3800 Systems Service Manual or the Sun Fire E6900/E4900 Systems Service Manual.

3. Reconfigure booting of the Solaris operating environment in the domain.

At the ok prompt, type boot -r.


ok boot -r


Repeater Boards

This section discusses the firmware steps necessary to remove and replace a Repeater board. Only the Sun Fire E6900/E4900/6800/4810/4800 systems have Repeater boards. The Sun Fire 3800 system has the equivalent of two Repeater boards on the active centerplane.


To Remove and Replace a Repeater Board

1. Determine which domains are active by typing the showplatform -p status system controller command from the platform shell.

2. Determine which Repeater boards are connected to each domain (TABLE 11-1).


TABLE 11-1 Repeater Boards and Domains

System                           Partition Mode    Repeater Board Names  Domain IDs
Sun Fire E6900 and 6800 systems  Single partition  RP0, RP1, RP2, RP3    A, B
Sun Fire E6900 and 6800 systems  Dual partition    RP0, RP1              A, B
Sun Fire E6900 and 6800 systems  Dual partition    RP2, RP3              C, D
Sun Fire 4810 system             Single partition  RP0, RP2              A, B
Sun Fire 4810 system             Dual partition    RP0                   A
Sun Fire 4810 system             Dual partition    RP2                   C
Sun Fire E4900 and 4800 systems  Single partition  RP0, RP2              A, B
Sun Fire E4900 and 4800 systems  Dual partition    RP0                   A
Sun Fire E4900 and 4800 systems  Dual partition    RP2                   C
Sun Fire 3800 system                               Equivalent of two Repeater boards integrated into the active centerplane.


3. Complete Step 1 through Step 3 in To Power Off the System.

4. Power off the Repeater board with the poweroff command.


schostname:SC> poweroff board_name

where board_name is the name of the Repeater board (rp0, rp1, rp2, or rp3).


5. Verify that the green power LED is off.



Caution - Be sure you are properly grounded before you remove and replace the Repeater board.



6. Remove and replace the Repeater board.

Refer to the Sun Fire 6800/4810/4800/3800 Systems Service Manual and the Sun Fire E6900/E4900 Systems Service Manual.

7. Boot each domain using the boot procedure described in To Power On the System.
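To see which domains must be halted before a given Repeater board is powered off, the rows of TABLE 11-1 can be encoded as a lookup. The Python sketch below is a minimal illustration: the dictionary keys ("E6900/6800", "single", and so on) and the function name are hypothetical labels chosen for this example, while the board and domain values are copied from the table. The Sun Fire 3800 system is omitted because its Repeater function is part of the active centerplane.

```python
# (system, partition mode) -> rows of (Repeater boards, domain IDs),
# transcribed from TABLE 11-1.
REPEATER_TABLE = {
    ("E6900/6800", "single"): [(("RP0", "RP1", "RP2", "RP3"), ("A", "B"))],
    ("E6900/6800", "dual"): [
        (("RP0", "RP1"), ("A", "B")),
        (("RP2", "RP3"), ("C", "D")),
    ],
    ("4810", "single"): [(("RP0", "RP2"), ("A", "B"))],
    ("4810", "dual"): [(("RP0",), ("A",)), (("RP2",), ("C",))],
    ("E4900/4800", "single"): [(("RP0", "RP2"), ("A", "B"))],
    ("E4900/4800", "dual"): [(("RP0",), ("A",)), (("RP2",), ("C",))],
}


def domains_for_repeater(system: str, mode: str, board: str) -> list[str]:
    """Return the domain IDs that TABLE 11-1 lists for a Repeater board.

    The board name may be given as rp0 or RP0. An empty list means the
    table has no row for that board in the given system and mode.
    """
    for boards, domains in REPEATER_TABLE[(system, mode)]:
        if board.upper() in boards:
            return list(domains)
    return []


if __name__ == "__main__":
    print(domains_for_repeater("E6900/6800", "dual", "rp2"))
```

In dual-partition mode the lookup returns only the domains of the partition served by that board, which is why the other partition does not appear in the result.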


System Controller Board

This section discusses how to remove and replace a System Controller board.


To Remove and Replace the System Controller Board in a Single SC Configuration



Note - This procedure assumes that your system controller has failed and that there is no spare system controller.



1. For each active domain, use an SSH or Telnet session to access the domain (see Chapter 2 for details), and halt the Solaris operating environment in the domain.




Caution - Because you do not have access to the console, you will not be able to determine when the operating environment is completely halted. Wait until you can best judge that the operating environment has halted.



2. Turn off the system completely.




Caution - Be sure to power off the circuit breakers and the power supply switches for the Sun Fire 3800 system. Make sure you power off all the hardware components of the system.



Refer to the "Powering Off and On" chapter in the Sun Fire 6800/4810/4800/3800 Systems Service Manual or the Sun Fire E6900/E4900 Systems Service Manual.

3. Remove the defective System Controller board and replace it with the new System Controller board.

Refer to the "System Controller Board" chapter in the Sun Fire 6800/4810/4800/3800 Systems Service Manual or the Sun Fire E6900/E4900 Systems Service Manual.

4. Check the firmware version of the new replacement board by using the showsc command:


schostname:SC> showsc

The firmware version of the new System Controller board must be compatible with other components in the system. If the firmware version is not compatible, use the flashupdate command to upgrade or downgrade the firmware on the new system controller board. Refer to the Install.info file for instructions on upgrading or downgrading system controller firmware.

5. Power on the redundant transfer units (RTUs), AC input boxes, and the power supply switches.

Refer to the "Powering Off and On" chapter in the Sun Fire 6800/4810/4800/3800 Systems Service Manual or the Sun Fire E6900/E4900 Systems Service Manual. When the specified hardware is powered on, the System Controller board will automatically power on.

6. Do one of the following:

You must have saved the latest platform and domain configurations of your system with the dumpconfig command in order to restore the latest platform and domain configurations with the restoreconfig command. For command syntax and examples, see the restoreconfig command in the Sun Fire Midrange System Controller Command Reference Manual.



Note - When you insert a new System Controller board into the system, it is set to the default values of the setupplatform command. By default it is set to DHCP, which means the system controller will use DHCP to obtain its network settings.

If DHCP is not available (after a 120-second timeout), the System Controller board will boot, and you must configure the network (setupplatform -p net) before you can type the restoreconfig command.



7. Check the date and time for the platform and each domain.

Type the showdate command in the platform shell and in each domain shell.

If you need to reset the date or time, go to Step 8. Otherwise, skip to Step 9.

8. Set the date and time for the platform and for each domain (if needed).

a. Set the date and time for the platform shell.

See the setdate command in the Sun Fire Midrange System Controller Command Reference Manual.

b. Set the date and time for each domain shell.

9. Check the configuration for the platform by typing showplatform at the platform shell. If necessary, run the setupplatform command to configure the platform.

See To Configure Platform Parameters.

10. Check the configuration for each domain by typing showdomain in each domain shell. If necessary, run the setupdomain command to configure each domain.

See To Configure Domain-Specific Parameters.

11. Boot the Solaris operating environment in each domain you want powered on.

12. Complete Step 4 and Step 5 in To Power On the System.
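The ordering constraint in the note for Step 6, and the checks in Step 7 through Step 10, can be laid out as a small planning function. This Python sketch is an assumption-laden illustration, not part of the SC firmware: the function name and its two flags are hypothetical, command arguments are omitted (see the command reference for the restoreconfig syntax), and the branch for a system without a saved configuration simply relies on the checks in Step 9 and Step 10.

```python
def sc_restore_plan(dhcp_available: bool, config_was_dumped: bool) -> list[str]:
    """List the SC commands to consider, in order, after the new SC boots.

    dhcp_available:    True if the new SC obtained its network settings
                       through DHCP.
    config_was_dumped: True if dumpconfig was used earlier to save the
                       platform and domain configurations.
    """
    plan = []
    if config_was_dumped:
        if not dhcp_available:
            # The network must be configured before restoreconfig can be used.
            plan.append("setupplatform -p net")
        plan.append("restoreconfig")
    # Step 7 through Step 10: check date, platform, and domain settings.
    plan.append("showdate")
    plan.append("showplatform")
    plan.append("showdomain")
    return plan


if __name__ == "__main__":
    for command in sc_restore_plan(dhcp_available=False, config_was_dumped=True):
        print(command)
```

The show commands come last in every case because the procedure checks the date and configuration whether or not a saved configuration was restored.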


To Remove and Replace a System Controller Board in a Redundant SC Configuration



Note - When you replace a pair of System Controller boards with SC V2s (enhanced-memory SCs), replace the spare SC first, perform a manual failover, and then replace the other SC as described in the steps below. Mixed SC versions are not supported, except during the brief period in which the main and spare SCs are upgraded to SC V2s.



1. Run the showsc or showfailover -v command to determine which system controller (SC) is the main.

2. If the working SC (the one that is not to be replaced) is not the main, perform a manual failover:


schostname:SC> setfailover force

The working system controller becomes the main SC.

3. Power off the system controller to be replaced:


schostname:SC> poweroff component_name

where component_name is the name of the System Controller board to be replaced, either SSC0 or SSC1.

The System Controller board is powered off, and the hot-plug LED is illuminated. A message indicates when you can safely remove the system controller.

4. Remove the System Controller board to be replaced and insert the new System Controller board.

The new System Controller board powers on automatically.

5. Verify that the firmware on the new system controller matches the firmware on the working SC.

You can use the showsc command to check the firmware version (the ScApp version) running on the system controller. If the firmware versions do not match, use the flashupdate command to upgrade or downgrade the firmware on the new system controller so that it matches the firmware version of the other SC. Refer to the Install.info file for details.

6. Re-enable SC failover by running the following command on the main or spare SC:


schostname:SC> setfailover on 
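The decision in Step 2 (whether a manual failover is needed) depends only on which SC is currently the main and which one is being replaced. The following Python sketch lists the commands from this procedure in order for both cases. The function name is hypothetical, and the code only returns text; the physical swap and any flashupdate step happen between the poweroff and showsc entries.

```python
def sc_replacement_commands(main_sc: str, sc_to_replace: str) -> list[str]:
    """Return the SC commands for replacing one SC in a redundant pair.

    main_sc:       the current main SC, ssc0 or ssc1 (from showsc or
                   showfailover -v).
    sc_to_replace: the SC board that will be removed, ssc0 or ssc1.
    """
    valid = {"ssc0", "ssc1"}
    main_sc = main_sc.lower()
    sc_to_replace = sc_to_replace.lower()
    if main_sc not in valid or sc_to_replace not in valid:
        raise ValueError("SC names must be ssc0 or ssc1")

    commands = []
    if sc_to_replace == main_sc:
        # The working SC is the spare, so force it to become the main first.
        commands.append("setfailover force")
    commands.append(f"poweroff {sc_to_replace}")
    # After the new board is inserted, compare firmware versions.
    commands.append("showsc")
    commands.append("setfailover on")
    return commands


if __name__ == "__main__":
    print(sc_replacement_commands("ssc0", "ssc0"))
```

When the board to be replaced is already the spare, the list starts directly with poweroff, which matches the condition in Step 2.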


ID Board and Centerplane

This section explains how to remove and replace an ID board and centerplane.


To Remove and Replace an ID Board and Centerplane

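1. Before you begin, be sure to have a terminal connected to the serial port of the system controller and have the following information available (it will be used later in this procedure): the system serial number, the model number, the MAC address for domain A, the host ID for domain A, and whether the system is a Capacity on Demand (COD) system.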

You can find this information on labels affixed to the system. Refer to the Sun Fire 6800/4810/4800/3800 Systems Service Manual or the Sun Fire E6900/E4900 Systems Service Manual for more information on label placement.

In most cases, when only the ID board and centerplane are replaced, the original System Controller board will be used. The above information was already cached by the system controller and will be used to program the replacement ID board. You will be asked to confirm the above information.

2. Complete the steps to remove and replace the centerplane and ID board.

Refer to the "Centerplane and ID Boards" chapter of the Sun Fire 6800/4810/4800/3800 Systems Service Manual or the Sun Fire E6900/E4900 Systems Service Manual.



Note - The ID board can be written only once. Manage this replacement process carefully. Any errors may require a new ID board.



3. After removing and replacing the ID board, make every attempt to use the original System Controller board installed in slot ssc0 in this system.

Using the same System Controller board allows the system controller to automatically prompt with the correct information.

4. Power on the hardware components.

Refer to the "Powering Off and On" chapter of the Sun Fire 6800/4810/4800/3800 Systems Service Manual or the Sun Fire E6900/E4900 Systems Service Manual.

The system controller boots automatically.

5. If you have a serial port connection, access the console for the system controller because the system will prompt you to confirm the board ID information (CODE EXAMPLE 11-1).

The prompting will not occur with a remote connection (SSH or Telnet).


CODE EXAMPLE 11-1 Confirming Board ID Information
It appears that the ID Board has been replaced.
Please confirm the ID information:
(Model, System Serial Number, Mac Address Domain A, HostID Domain A, COD Status)
Sun Fire 4800, 45H353F, 08:00:20:d8:a7:dd, 80d8a7dd, non-COD
Is the information above correct? (yes/no):

If you have a new System Controller board, skip Step 6 and go to Step 7.

6. Compare the information collected in Step 1 with the information you have been prompted with in Step 5.

7. If you answer no to the confirmation question shown in Step 5, or if you are replacing both the ID board and the System Controller board at the same time, you will be prompted to enter the ID information manually.



Note - Enter this information carefully, as you have only one opportunity to do so. Use the information collected in Step 1 to answer the questions prompted for in CODE EXAMPLE 11-2. Be aware that you must specify the MAC address and Host ID of domain A (not the SC).




CODE EXAMPLE 11-2 ID Information to Enter Manually
Please enter System Serial Number: xxxxxxxx
Please enter the model number (3800/4800/4810/6800/E4900/E6900): xxx
MAC address for Domain A: xx:xx:xx:xx:xx:xx 
Host ID for Domain A: xxxxxxxx
Is COD (Capacity on Demand) system ? (yes/no): xx
Programming Replacement ID Board
Caching ID information

8. Complete Step 3 and Step 4 in To Power On the System.
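Because the ID board can be written only once, it can help to check the values collected in Step 1 before typing them at the prompts in CODE EXAMPLE 11-2. The Python sketch below is a hypothetical pre-check, not a tool shipped with the system. The accepted formats are assumptions inferred from the two code examples: a model from the list in the prompt, a MAC address of six colon-separated hexadecimal pairs, an eight-digit hexadecimal host ID, and a yes or no COD answer. The serial number is only checked for being non-empty.

```python
import re

# Model numbers offered by the prompt in CODE EXAMPLE 11-2.
MODELS = {"3800", "4800", "4810", "6800", "E4900", "E6900"}

# Formats inferred from the examples (assumptions, not a specification).
MAC_RE = re.compile(r"[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5}")
HOSTID_RE = re.compile(r"[0-9A-Fa-f]{8}")


def check_id_info(serial: str, model: str, mac: str, hostid: str, cod: str) -> list[str]:
    """Return a list of problems; an empty list means the formats look plausible."""
    problems = []
    if not serial.strip():
        problems.append("system serial number is empty")
    if model not in MODELS:
        problems.append(f"unknown model number: {model!r}")
    if MAC_RE.fullmatch(mac) is None:
        problems.append(f"MAC address is not xx:xx:xx:xx:xx:xx: {mac!r}")
    if HOSTID_RE.fullmatch(hostid) is None:
        problems.append(f"host ID is not eight hex digits: {hostid!r}")
    if cod not in ("yes", "no"):
        problems.append(f"COD answer must be yes or no: {cod!r}")
    return problems


if __name__ == "__main__":
    # Values taken from CODE EXAMPLE 11-1 (a non-COD Sun Fire 4800).
    print(check_id_info("45H353F", "4800", "08:00:20:d8:a7:dd", "80d8a7dd", "no"))
```

A format check of this kind catches typing mistakes only. It cannot confirm that the values belong to this system, so they still need to be compared against the labels.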