APPENDIX C

Troubleshooting and Operational Procedures

This appendix contains the following section:


Troubleshooting and Servicing the Array

The Sun StorageTek Common Array Manager Service Advisor provides array troubleshooting information and removal and replacement procedures for customer-replaceable units (CRUs). Service Advisor also includes field-replaceable units (FRUs) that can be replaced by Sun field engineers or by Sun-trained customer administrators.

To access Service Advisor:

1. Log in to Sun StorageTek Common Array Manager (CAM).

2. From the Storage System Summary page, click the name of the array you want to service.

3. At the top right of the page, click Service Advisor.

The Service Advisor opens in a new window.

4. Expand the Removal/Replacement procedure for the component you want to replace, and select the failed component.

The corresponding procedure is displayed in the right pane.

Service Advisor guides you through the following procedures:


Controller Tray Components

FIGURE C-1 shows the components of the controller tray with the front bezel removed.

FIGURE C-1 Sun Storage 6580 and 6780 Array Controller Tray (Front View)


Figure showing controller tray front view with component and LED locations.


Figure Legend

1  Interconnect battery canister
2  Power supply-fan assemblies
3  Power supply-fan LEDs
4  Interconnect battery LEDs


TABLE C-1 Controller Tray Front LED Descriptions

| LED | Symbol | Location | Function |
| --- | --- | --- | --- |
| Power | | Power-fan, Interconnect battery | On: The canister has power. Off: The canister does not have power. Note: The controller canisters do not have a power LED. They receive their power from the power supplies inside the power-fan canisters. |
| Battery Needs Attention | | Interconnect battery | On: A problem exists with the battery. |
| Service Action Allowed | | Power-fan, Interconnect battery | On: You can remove the canister safely. See Service Action Allowed LED. |
| Service Action Required (Fault) | | Power-fan, Interconnect battery | On: A problem exists with the canister. |
| Locate | | Interconnect battery | On: A tray is located. |


Service Action Allowed LED

Each controller canister, power-fan canister, and interconnect-battery canister has a Service Action Allowed LED, which is a blue LED. The Service Action Allowed LED lets you know when you can remove a canister safely.



Caution - Possible loss of data access. Never remove a controller canister, a power-fan canister, or an interconnect-battery canister unless the Service Action Allowed LED is turned on.


If a controller canister or a power-fan canister fails and must be replaced, the Service Action Required (Fault) LED (an amber LED) on that canister comes on to indicate that service action is required. The Service Action Allowed LED also comes on if it is safe to remove the canister. If data availability dependencies or other conditions dictate that a canister should not be removed, the Service Action Allowed LED stays off.

The Service Action Allowed LED automatically comes on or goes off as conditions change. In most cases, the Service Action Allowed LED comes on when the Service Action Required (Fault) LED comes on for a canister.



Note - If the Service Action Required (Fault) LED comes on but the Service Action Allowed LED is off for a particular canister, you might need to service another canister first. Check your storage management software to determine the action that you should take.
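The LED rules in this section can be summarized as a small decision function. The following Python sketch is illustrative only: the function name and the returned wording are hypothetical, and it encodes nothing beyond the guidance stated above.

```python
def canister_service_advice(fault_led_on: bool, allowed_led_on: bool) -> str:
    """Summarize the guidance for one canister from its two LED states.

    fault_led_on   -- Service Action Required (Fault) LED, amber
    allowed_led_on -- Service Action Allowed LED, blue
    """
    if allowed_led_on:
        # The blue LED is the only indication that removal is safe.
        return "safe to remove"
    if fault_led_on:
        # Fault without permission: another canister might need service first.
        return "do not remove; check the storage management software"
    return "do not remove"
```

For example, `canister_service_advice(True, False)` returns the advice to check the storage management software, which matches the note above.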


Controller Tray Diagnostic Codes


TABLE C-2 Diagnostic Codes

| Code | Description |
| --- | --- |
| L0 | The controller types are mismatched. |
| L1 | The interconnect-battery CRU is missing. |
| L2 | A persistent memory error has occurred. |
| L3 | A persistent hardware error has occurred. |
| L4 | A persistent data protection error has occurred. |
| L5 | The auto-code synchronization (ACS) has failed. |
| L6 | An unsupported host interface card is installed. |
| L7 | The sub-model identifier is not set or is mismatched. |
| L8 | A memory configuration error has occurred. |


Figure showing cable connections from controller tray to sixteen expansion trays.


About the Controller Tray ID Numeric Display and Diagnostic Display

The Sun Storage 6580 and 6780 controllers have a pair of 7-segment displays located at the back of the controller tray that form a 2-digit display. This section defines the indicators and what conditions they represent when activated.


TABLE C-3 FC 4Gb Host Card LED Link Rate Indicators

| L1 | L2 | Definition |
| --- | --- | --- |
| Off | Off | No connection or link down |
| On | Off | 1 Gb link rate |
| Off | On | 2 Gb link rate |
| On | On | 4 Gb link rate |
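As an illustration, the two link rate LEDs in TABLE C-3 can be read as a two-bit code. The Python sketch below is a hypothetical helper (the names `LINK_RATE` and `link_rate` are not part of any Sun software) that maps the observed L1 and L2 states to the definition in the table.

```python
# Keys are (L1 on?, L2 on?) as observed on the FC 4Gb host card.
LINK_RATE = {
    (False, False): "No connection or link down",
    (True, False): "1 Gb link rate",
    (False, True): "2 Gb link rate",
    (True, True): "4 Gb link rate",
}


def link_rate(l1_on: bool, l2_on: bool) -> str:
    """Return the TABLE C-3 definition for the observed LED pair."""
    return LINK_RATE[(bool(l1_on), bool(l2_on))]
```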


Each digit has a decimal point, and is rotated 180 degrees relative to the other digit as shown in FIGURE C-2. With this orientation, the display looks the same regardless of controller orientation.

The decimal point for the lower digit is defined as the Diagnostic Light. The decimal point for the upper digit is defined as the Heartbeat Light.

FIGURE C-2 Tray ID Display


Illustration showing the upper and lower digits of the tray id display.

The values on each display (Controller A and Controller B) are shown as if the digits had the same orientation. For example, if the tray ID is set to 43, the top controller display might appear as shown in FIGURE C-3, while the bottom controller display would then appear as shown in FIGURE C-4.

FIGURE C-3 Controller A Tray ID Example


Illustration of controller A tray identifier.

FIGURE C-4 Controller B Tray ID Example


Illustration of controller B tray identifier.
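To make the 180-degree rotation concrete, the following Python sketch computes which segments of a physically rotated digit must be lit so that a character still reads upright. The a-g segment labels are the conventional seven-segment naming, which is an assumption here; the array documentation does not specify segment names or wiring.

```python
# Conventional labels (assumed): a=top, b=upper right, c=lower right,
# d=bottom, e=lower left, f=upper left, g=middle.
DIGIT_SEGMENTS = {
    "0": "abcdef", "1": "bc", "2": "abdeg", "3": "abcdg", "4": "bcfg",
    "5": "acdfg", "6": "acdefg", "7": "abc", "8": "abcdefg", "9": "abcdfg",
}

# A 180-degree rotation swaps top/bottom and the diagonal side segments;
# the middle segment stays in place.
ROTATE_180 = {"a": "d", "b": "e", "c": "f", "d": "a", "e": "b", "f": "c", "g": "g"}


def rotated_segments(digit: str) -> str:
    """Segments to light on a rotated display so `digit` reads upright."""
    return "".join(sorted(ROTATE_180[s] for s in DIGIT_SEGMENTS[digit]))
```

Under this labeling, showing an upright 6 on the rotated digit lights the same segments that an unrotated digit uses for 9, which is why each digit needs its own pattern to keep the two-digit display readable.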

Alphanumeric characters are represented on the display as shown in FIGURE C-5. During normal operation, the tray ID display on each controller is used to display the enclosure tray ID. The display is also used for diagnostic codes. The Diagnostic Light indicates current usage. The Diagnostic Light is off when the display is used to show the current tray ID.

FIGURE C-5 Seven-Segment Alphanumeric Characters


Illustration of how alphanumeric characters appear in the display.

The tray ID is an attribute of the enclosure. In other words, both controllers will always display the same tray ID. It is possible, however, that one controller may display the tray ID, while the other controller displays a diagnostic code.

Sequence Category Codes

TABLE C-4 defines the sequence category codes and their associated detail codes. Startup errors and operational states can be displayed in sequences by themselves. If the display is used to identify a component failure, information about the controller state in which the error was identified will also be displayed, as indicated in TABLE C-5.



Note - If the Sun Storage 6580 or 6780 controller module is powered on when the interconnect canister is missing, or if Controller B is inserted when the interconnect canister is missing, the values shown on the Controller B tray ID display will be inverted.



TABLE C-4 Seven-Segment Display Sequence Code Definitions

| Category | Category Code (notation described in the notes at the end of this table) | Detail Codes |
| --- | --- | --- |
| Startup Error | SE+ | 88+ Power-on default; dF+ Power-on diagnostic fault |
| Operational Error | OE+ | Lx+ Lock-down codes (Note 3) |
| Operational State | OS+ | OL+ Offline (held in reset, Note 11); bb+ Battery Backup (operating on batteries); CF+ Component failure (Note 12) |
| Component Failure | CF+ | dx+ Processor/Cache DIMM (x = location, Note 6); Cx+ Cache DIMM (x = location, Note 7); Px+ Processor DIMM (x = location, Note 8); Hx+ Host card (x = location); Fx+ Flash drive (x = location) |
| Category Delimiter | dash+ | Separator between category-detail code pairs (Notes 4, 9) |
| End-of-Sequence Delimiter | blank- | End-of-sequence indicator (Notes 5, 10) |

Note -

  1. xy+ 2-digit code with the Diagnostic light ON.
  2. xy- 2-digit code with the Diagnostic light OFF.
  3. Lx+ Lock-down codes (see Seven-Segment Display Lock-Down Codes).
  4. dash+ All segments off except for the middle segments and with the Diagnostic light ON.
  5. blank- All segments off with the Diagnostic light OFF.
  6. dx+ Used when there is a single memory system for processor and data cache.
  7. Cx+ Used when there are separate processor and data cache memory systems.
  8. Px+ Used when there are separate processor and data cache memory systems.
  9. Category-Detail separator used when there is more than one category-detail code pair in the sequence. See TABLE C-5 for examples.
  10. End-of-Sequence indicator automatically inserted by hardware at the end of the sequence. Example: SE+ 88+ blank- (repeat)
  11. If a tray ID is being displayed, this sequence is programmed to display if the controller is subsequently held in reset.
  12. The tray ID is nominally displayed during normal operation. This operational state is displayed if an internal controller component failure occurs while the controller is online. An additional detail code identifies the failed component as defined for the Component Failure category. This sequence will continue to be displayed even if the controller is subsequently placed offline (held in reset) to service the failed component.
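The notation above can be read mechanically: dash+ separates category-detail groups and blank- ends the sequence. The following Python sketch is a hypothetical helper, not part of any Sun tool, that splits a written-out sequence into (category, detail codes) pairs using the category codes from TABLE C-4.

```python
CATEGORIES = {
    "SE": "Startup Error",
    "OE": "Operational Error",
    "OS": "Operational State",
    "CF": "Component Failure",
}


def parse_sequence(sequence: str):
    """Split a sequence such as 'SE+ dF+ dash+ CF+ P1+ blank-' into groups."""
    tokens = sequence.split()
    if tokens and tokens[-1] == "blank-":
        tokens = tokens[:-1]          # drop the end-of-sequence delimiter
    groups, current = [], []
    for token in tokens:
        if token == "dash+":          # category delimiter starts a new group
            groups.append(current)
            current = []
        else:
            current.append(token.rstrip("+-"))  # strip Diagnostic Light marker
    groups.append(current)
    result = []
    for group in groups:
        if not group:
            continue
        category, details = group[0], group[1:]
        result.append((CATEGORIES.get(category, category), details))
    return result
```

For instance, `parse_sequence("OS+ CF+ H1+ blank-")` yields one group whose category is Operational State and whose detail codes are CF and H1.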

 


TABLE C-5 Seven-Segment Display Sequence Use Cases

| Use Case | Repeating Sequence |
| --- | --- |
| Controller power-on | |
| Normal power-on or controller insertion | SE+ 88+ blank- |
| Controller inserted while held in reset | SE+ 88+ blank- |
| Operational states | |
| Normal operation | xy- (static controller tray ID) |
| Controller placed in reset while displaying tray ID | OS+ OL+ blank- |
| Controller is operating on batteries (cache backup) | OS+ bb+ blank- |
| Component failure when the controller is operational (Notes 1, 2) | |
| Failed host card | OS+ CF+ Hx+ blank- |
| Failed flash drive | OS+ CF+ Fx+ blank- |
| Power-on diagnostic failure (Note 1) | |
| Non-FRU component failure | SE+ dF+ blank- |
| Processor DIMM failure | SE+ dF+ dash+ CF+ Px+ blank- |
| Cache memory DIMM failure | SE+ dF+ dash+ CF+ Cx+ blank- |
| Processor/cache DIMM failure | SE+ dF+ dash+ CF+ dx+ blank- |
| Controller is suspended and there are no other errors to report | |
| All lock-down conditions | OE+ Lx+ blank- |
| Controller is suspended due to component errors | |
| Persistent processor DIMM ECC errors | OE+ L2+ dash+ CF+ Px+ blank- |
| Persistent cache DIMM ECC errors | OE+ L2+ dash+ CF+ Cx+ blank- |
| Persistent processor/cache DIMM ECC errors | OE+ L2+ dash+ CF+ dx+ blank- |
| Controller is suspended due to persistent cache backup configuration errors | |
| Write-protect switch set during cache restore | OE+ LC+ blank- |
| Memory size changed with dirty data in flash drives | OE+ LC+ dd+ blank- |

Note -

  1. If more than one component failure occurs, only the first component failure detected will be identified on the seven-segment display.
  2. If a component failure is indicated on the seven-segment display while the controller is operational, other event notification (MEL events, recovery guru procedures, etc.) that normally occurs for that condition will continue to occur.

Seven-Segment Display Lock-Down Codes

Diagnostic codes are used to indicate controller state information. In general, these codes are displayed only when the controller is in a non-operational state. The controller might be non-operational due to a configuration problem (such as mismatched controller types), or it might be non-operational due to a hardware fault. If the controller is non-operational due to system configuration, the controller Fault Light will be off. If the controller is non-operational due to a hardware fault, the controller Fault Light will be on.

TABLE C-6 provides a definition of the diagnostic lock-down codes. The code is displayed as a sequence.


TABLE C-6 Tray ID Display Diagnostic Codes

| Value | Controller State | Description |
| --- | --- | --- |
| L0 | Suspended | Mismatched controller types |
| L1 | Suspended | Missing interconnect canister |
| L2 | Suspended | Persistent memory errors |
| L3 | Suspended | Persistent hardware errors |
| L4 | Suspended | Persistent data protection errors |
| L5 | Suspended | ACS failure |
| L6 | Suspended | Unsupported host card |
| L7 | Suspended | Submodel identifier not set or mismatched |
| L8 | Suspended | Memory configuration error |
| L9 | Suspended | Link speed mismatch |
| LA | Suspended | Reserved |
| Lb | Suspended | Host card configuration error |
| LC | Suspended | Persistent cache backup configuration error |
| Ld | Suspended | Mixed cache memory DIMMs |
| LE | Suspended | Uncertified cache memory DIMM sizes |
| LF | Suspended | Lock-down with limited SYMbol support |
| LH | Suspended | Controller firmware mismatch |
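For quick reference during troubleshooting, the lock-down codes in TABLE C-6 can be kept in a lookup table. The Python sketch below is illustrative: the names `LOCKDOWN_CODES` and `describe_lockdown` are hypothetical, and the descriptions are copied from the table. Keys preserve the mixed case in which the codes appear on the display.

```python
LOCKDOWN_CODES = {
    "L0": "Mismatched controller types",
    "L1": "Missing interconnect canister",
    "L2": "Persistent memory errors",
    "L3": "Persistent hardware errors",
    "L4": "Persistent data protection errors",
    "L5": "ACS failure",
    "L6": "Unsupported host card",
    "L7": "Submodel identifier not set or mismatched",
    "L8": "Memory configuration error",
    "L9": "Link speed mismatch",
    "LA": "Reserved",
    "Lb": "Host card configuration error",
    "LC": "Persistent cache backup configuration error",
    "Ld": "Mixed cache memory DIMMs",
    "LE": "Uncertified cache memory DIMM sizes",
    "LF": "Lock-down with limited SYMbol support",
    "LH": "Controller firmware mismatch",
}


def describe_lockdown(code: str) -> str:
    """Accept 'L2' or 'L2+' (Diagnostic Light on) and return the description."""
    return LOCKDOWN_CODES.get(code.rstrip("+"), "Unknown lock-down code")
```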



Expansion Tray LED Status Codes

The following status codes can appear on the numeric LED display of the 6140 expansion trays:

FF - ESM Boot Diagnostic executing

88 - This ESM is being held in Reset by the other ESM

AA - ESM-A application is booting up

bb - ESM-B application is booting up

L0 - Mismatched ESM types

L2 - Persistent memory errors

L3 - Persistent hardware errors

L9 - Over Temperature

H1 - SFP Speed Mismatch (2 Gb/s SFP installed when operating at 4 Gb/s)

H2 - Invalid/Incomplete Configuration

H3 - Maximum Reboot Attempts Exceeded

H4 - Cannot Communicate with Other ESM

H5 - Midplane Harness Failure

H6 - Firmware Failure

H7 - Current Enclosure Fibre Channel Rate Different than Rate Switch

H8 - SFP(s) Present in Currently Unsupported Slot (2A or 2B)
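The expansion tray codes listed above can be held in the same kind of lookup table. This Python sketch uses hypothetical names (`ESM_STATUS_CODES`, `describe_esm_code`) and takes its descriptions directly from the list.

```python
ESM_STATUS_CODES = {
    "FF": "ESM Boot Diagnostic executing",
    "88": "This ESM is being held in Reset by the other ESM",
    "AA": "ESM-A application is booting up",
    "bb": "ESM-B application is booting up",
    "L0": "Mismatched ESM types",
    "L2": "Persistent memory errors",
    "L3": "Persistent hardware errors",
    "L9": "Over Temperature",
    "H1": "SFP Speed Mismatch (2 Gb/s SFP installed when operating at 4 Gb/s)",
    "H2": "Invalid/Incomplete Configuration",
    "H3": "Maximum Reboot Attempts Exceeded",
    "H4": "Cannot Communicate with Other ESM",
    "H5": "Midplane Harness Failure",
    "H6": "Firmware Failure",
    "H7": "Current Enclosure Fibre Channel Rate Different than Rate Switch",
    "H8": "SFP(s) Present in Currently Unsupported Slot (2A or 2B)",
}


def describe_esm_code(code: str) -> str:
    """Return the meaning of a two-character expansion tray status code."""
    return ESM_STATUS_CODES.get(code, "Unknown status code")
```

Note that L9 means Over Temperature on an expansion tray but Link speed mismatch on the controller tray (TABLE C-6), so the two tables should not be merged.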


Powering Off the Array

The array rarely needs to be powered off. Remove power only when you plan to physically move the array to another location.

To power off the array, do the following:

1. Stop all I/O from the hosts, if connected, to the array.

2. Wait approximately 2 minutes until all disk drive LEDs have stopped flashing.

After a 2-minute period, data residing in cache is written to disk and the battery mechanisms are disengaged.



Note - If Media Scan is enabled (default), the disk drive LEDs will continue to flash after the 2-minute period has elapsed. However, the LED flash rate during a media scan (slow, periodic blink) is different from the flash rate of I/O (fast, random).


3. Check the Cache Active LED on the controller to determine if any outstanding cache needs to be written.

If the LED is on, there is still data that needs to be flushed and written to disk.



Note - Ensure that the Cache Active LED is no longer flashing before powering off the array.


4. Press each power switch at the back of the controller tray to the Off position.

5. Press the power switches at the back of each expansion tray to the Off position.
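The checks in steps 1 through 3 can be expressed as a simple pre-power-off checklist. The following Python sketch is a hypothetical aid for an operator's runbook; it does not query the array, and the caller supplies the observed values by hand. The 120-second threshold reflects the approximate 2-minute wait described above.

```python
def power_off_blockers(io_stopped: bool,
                       seconds_since_io_stopped: float,
                       cache_active_led_on: bool) -> list:
    """Return the unmet preconditions; an empty list means proceed to step 4."""
    blockers = []
    if not io_stopped:
        blockers.append("stop all host I/O to the array")
    if seconds_since_io_stopped < 120:
        blockers.append("wait approximately 2 minutes after I/O stops")
    if cache_active_led_on:
        blockers.append("wait for the Cache Active LED to go off")
    return blockers
```

For example, `power_off_blockers(True, 150, False)` returns an empty list, indicating that the power switches can be set to the Off position.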