CHAPTER 2

Diagnostics

This chapter describes the diagnostics that are available for monitoring and troubleshooting the Sun Blade T6340 server module. This chapter is intended for technicians, service personnel, and system administrators who service and repair computer systems.

The following topics are covered:


2.1 Sun Blade T6340 Server Module Diagnostics Overview

There are a variety of diagnostic tools, commands, and indicators you can use to monitor and troubleshoot a Sun Blade T6340 server module.

The LEDs, ILOM, Solaris OS PSH, and many of the log files and console messages are integrated. For example, when the Solaris software detects a fault, it will display the fault, log it, pass information to ILOM where the fault is logged, and depending on the fault, one or more LEDs might be illuminated.

The diagnostic flowchart in FIGURE 2-1 and the actions in TABLE 2-1 describe an approach for using the server module diagnostics to identify a faulty field-replaceable unit (FRU). The diagnostics you use, and the order in which you use them, depend on the nature of the problem you are troubleshooting, so you might perform some actions and not others.

Use this flowchart to understand what diagnostics are available to troubleshoot faulty hardware, and use TABLE 2-1 to find more information about each diagnostic in this chapter.

FIGURE 2-1 Diagnostic Flowchart


Figure shows the diagnostic flowchart.

 


TABLE 2-1 Diagnostic Flowchart Actions

Action No.

Diagnostic Action

Resulting Action

For more information, see these sections

1.

Check the OK LED.

The OK LED is located on the front of the Sun Blade T6340 server module.

If the LED is not lit, check that the blade is properly connected and the chassis has power.

Section 2.3, Interpreting System LEDs

2.

Type the ILOM show faulty command to check for faults.

The show faulty command displays the following types of faults:

  • Environmental faults
  • Solaris Predictive Self-Healing (PSH) detected faults
  • POST detected faults

Faulty FRUs are identified in fault messages using the FRU name. For a list of FRU names, see TABLE 1-3.

Section 2.5.1, Displaying System Faults

3.

Check the Solaris log files for fault information.

The Solaris message buffer and log files record system events and provide information about faults.

  • If system messages indicate a faulty device, replace the FRU.
  • To obtain more diagnostic information, go to Action 4.

Section 2.8, Collecting Information From Solaris OS Files and Commands

4.

Run the SunVTS software.

SunVTS can exercise and diagnose FRUs. To run SunVTS, the server module must be running the Solaris OS.

  • If SunVTS reports a faulty device, replace the FRU.
  • If SunVTS does not report a faulty device, go to Action 5.

Section 2.10, Exercising the System With SunVTS

5.

Run POST.

POST performs basic tests of the server module components and reports faulty FRUs.

  • If POST indicates a faulty FRU, replace the FRU.
  • If POST does not indicate a faulty FRU, go to Action 9.

Section 2.6, Running POST

6.

Determine if the fault is an environmental fault.

If the show faulty command displays a temperature or voltage fault, then the fault is an environmental fault. Environmental faults can be caused by faulty FRUs (chassis power supply, fan, or blower) or by environmental conditions such as high ambient temperature or blocked airflow.

Section 2.5.1, Displaying System Faults

 

7.

Determine if the fault was detected by PSH.

If the fault message displays the following text, the fault was detected by the Solaris Predictive Self-Healing software:
Host detected fault

If the fault is a PSH detected fault, identify the faulty FRU from the fault message and replace the faulty FRU.

After the FRU is replaced, perform the procedure to clear PSH detected faults.

Section 2.7, Using the Solaris Predictive Self-Healing Feature

 

Section 4.2, Common Procedures for Parts Replacement

 

Section 2.7.2, Clearing PSH Detected Faults

 

Section 2.7.3, Clearing the PSH Fault From the ILOM Logs

8.

Determine if the fault was detected by POST.

POST performs basic tests of the server module components and reports faulty FRUs. When POST detects a faulty FRU, it logs the fault and, if possible, takes the FRU offline. Faults detected by POST display the following text in the fault message:

FRU-name deemed faulty and disabled

In this case, replace the FRU and run the procedure to clear POST detected faults.

Section 2.6, Running POST

 

Section 4.2, Common Procedures for Parts Replacement

 

Section 2.6.4, Clearing POST Detected Faults

9.

Contact Sun for support.

The majority of hardware faults are detected by the server module diagnostics. In rare cases it is possible that a problem requires additional troubleshooting. If you are unable to determine the cause of the problem, contact Sun for support.

Sun Support information:
http://www.sun.com/support

Section 1.3, Finding the Serial Number
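Actions 6 through 8 in TABLE 2-1 sort a fault by the text of its message. The following Python sketch shows one way to express that triage. It is an illustration only: the function name classify_fault and the sample messages are hypothetical, and the matching relies solely on the phrases quoted in TABLE 2-1.

```python
def classify_fault(message):
    """Sort one fault message into the categories used in TABLE 2-1."""
    text = message.lower()
    if "host detected fault" in text:
        return "psh"            # Action 7: detected by Solaris PSH
    if "deemed faulty and disabled" in text:
        return "post"           # Action 8: detected by POST
    if "temperature" in text or "voltage" in text:
        return "environmental"  # Action 6: environmental fault
    return "unknown"


# Sections named in TABLE 2-1 for each category.
NEXT_SECTION = {
    "psh": "Section 2.7, Using the Solaris Predictive Self-Healing Feature",
    "post": "Section 2.6.4, Clearing POST Detected Faults",
    "environmental": "Section 2.5.1, Displaying System Faults",
}

if __name__ == "__main__":
    # Hypothetical message text, not copied from a real system.
    sample = "/SYS/MB/CMP0/BR0/CH0/D0 deemed faulty and disabled"
    kind = classify_fault(sample)
    print(kind, "->", NEXT_SECTION.get(kind, "continue with TABLE 2-1"))
```

A message that matches none of the phrases is returned as "unknown", which corresponds to continuing with the remaining actions in TABLE 2-1.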



2.2 Memory Configuration and Fault Handling

This section describes how the memory is configured and how the server module deals with memory faults.

2.2.1 Memory Configuration

The Sun Blade T6340 server module has 32 connectors (slots) that hold fully-buffered DIMMs (FB-DIMMs) in the following FB-DIMM capacities:

The Sun Blade T6340 server module performs best if all 32 connectors are populated with 32 identical DIMMs. This configuration also enables the system to continue operating even when a DIMM fails, or if an entire channel fails.



Note - All installed FB-DIMMs will be seen by the system as having the capacity of the smallest installed FB-DIMM.


For example, suppose that you have installed 32 8-Gbyte FB-DIMMs for a total of 256 Gbytes of memory. If you replace one of those 8-Gbyte FB-DIMMs with a functioning 1-Gbyte FB-DIMM, the system treats all installed FB-DIMMs as 1-Gbyte FB-DIMMs and therefore sees only 32 Gbytes of installed memory.
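The arithmetic behind this rule can be sketched in a few lines of Python. The function name effective_memory_gbytes is hypothetical, and the sketch assumes only what the note above states: every installed FB-DIMM counts as having the capacity of the smallest one.

```python
def effective_memory_gbytes(dimm_sizes_gbytes):
    """Memory seen by the system: DIMM count times the smallest capacity."""
    if not dimm_sizes_gbytes:
        return 0
    return len(dimm_sizes_gbytes) * min(dimm_sizes_gbytes)


full = [8] * 32          # 32 identical 8-Gbyte FB-DIMMs
mixed = [8] * 31 + [1]   # one 8-Gbyte FB-DIMM replaced by a 1-Gbyte FB-DIMM

print(effective_memory_gbytes(full))   # 256
print(effective_memory_gbytes(mixed))  # 32
```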

2.2.1.1 FB-DIMM Installation Rules



Caution - The following FB-DIMM rules must be followed. The server module might not operate correctly if the FB-DIMM rules are not followed. Always use FB-DIMMs that have been qualified by Sun.


Use these FB-DIMM configuration rules to help you plan the memory configuration of your server:

See Section 4.3.1, Removing the DIMMs for DIMM installation instructions.

FIGURE 2-2 DIMM Installation Rules


Figure shows motherboard, the DIMM locate button, and the DIMM ejector levers.

You can also use TABLE 2-2 to identify the DIMMs you want to remove.


TABLE 2-2 FB-DIMM Configuration and Installation

CPU #   Branch Name   Channel Name   FRU Name in ILOM Messages   Motherboard FB-DIMM Connector   FB-DIMM Installation Order[1]
-----   -----------   ------------   -------------------------   -----------------------------   -----------------------------
CMP 0   Branch 0      Channel 0      /SYS/MB/CMP0/BR0/CH0/D0     J0501                           8
                                     /SYS/MB/CMP0/BR0/CH0/D1     J0601                           16
                                     /SYS/MB/CMP0/BR0/CH0/D2     J0701                           24
                                     /SYS/MB/CMP0/BR0/CH0/D3     J0801                           24
                      Channel 1      /SYS/MB/CMP0/BR0/CH1/D0     J0901                           8
                                     /SYS/MB/CMP0/BR0/CH1/D1     J1001                           16
                                     /SYS/MB/CMP0/BR0/CH1/D2     J1101                           24
                                     /SYS/MB/CMP0/BR0/CH1/D3     J1201                           24
        Branch 1      Channel 0      /SYS/MB/CMP0/BR1/CH0/D0     J1301                           8
                                     /SYS/MB/CMP0/BR1/CH0/D1     J1401                           16
                                     /SYS/MB/CMP0/BR1/CH0/D2     J1501                           24
                                     /SYS/MB/CMP0/BR1/CH0/D3     J1601                           24
                      Channel 1      /SYS/MB/CMP0/BR1/CH1/D0     J1701                           8
                                     /SYS/MB/CMP0/BR1/CH1/D1     J1801                           16
                                     /SYS/MB/CMP0/BR1/CH1/D2     J1901                           24
                                     /SYS/MB/CMP0/BR1/CH1/D3     J2001                           24
CMP 1   Branch 0      Channel 0      /SYS/MB/CMP1/BR0/CH0/D0     J2401                           8
                                     /SYS/MB/CMP1/BR0/CH0/D1     J2501                           16
                                     /SYS/MB/CMP1/BR0/CH0/D2     J2601                           32
                                     /SYS/MB/CMP1/BR0/CH0/D3     J2701                           32
                      Channel 1      /SYS/MB/CMP1/BR0/CH1/D0     J2801                           8
                                     /SYS/MB/CMP1/BR0/CH1/D1     J2901                           16
                                     /SYS/MB/CMP1/BR0/CH1/D2     J3001                           32
                                     /SYS/MB/CMP1/BR0/CH1/D3     J3101                           32
        Branch 1      Channel 0      /SYS/MB/CMP1/BR1/CH0/D0     J3201                           8
                                     /SYS/MB/CMP1/BR1/CH0/D1     J3301                           16
                                     /SYS/MB/CMP1/BR1/CH0/D2     J3401                           32
                                     /SYS/MB/CMP1/BR1/CH0/D3     J3501                           32
                      Channel 1      /SYS/MB/CMP1/BR1/CH1/D0     J3601                           8
                                     /SYS/MB/CMP1/BR1/CH1/D1     J3701                           16
                                     /SYS/MB/CMP1/BR1/CH1/D2     J3801                           32
                                     /SYS/MB/CMP1/BR1/CH1/D3     J3901                           32
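The FRU names in TABLE 2-2 follow a regular pattern that encodes the CPU, branch, channel, and DIMM position. The following Python sketch decodes a FRU name into those fields, which can help when matching a fault message to a row of the table. The helper name parse_dimm_fru is hypothetical, and the pattern is inferred only from the names listed in TABLE 2-2.

```python
import re

# Pattern inferred from the FRU names in TABLE 2-2.
_DIMM_PATH = re.compile(r"^/SYS/MB/CMP(\d)/BR(\d)/CH(\d)/D(\d)$")


def parse_dimm_fru(fru_name):
    """Split an FB-DIMM FRU name into CPU, branch, channel, and DIMM numbers."""
    match = _DIMM_PATH.match(fru_name.strip())
    if match is None:
        raise ValueError(f"not an FB-DIMM FRU name: {fru_name!r}")
    cmp_no, branch, channel, dimm = (int(group) for group in match.groups())
    return {"cmp": cmp_no, "branch": branch, "channel": channel, "dimm": dimm}


print(parse_dimm_fru("/SYS/MB/CMP1/BR0/CH1/D2"))
```

The sketch deliberately does not map FRU names to connector numbers; use TABLE 2-2 for that lookup.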


2.2.1.2 Memory Fault Handling

The Sun Blade T6340 server module uses advanced ECC technology, also called chipkill, that corrects up to 4 bits in error on nibble boundaries, as long as the bits are all in the same DRAM. If a DRAM fails, the DIMM continues to function.



Note - The chipkill function is only supported on DIMMs that use “x4” DRAMs.


The following server module features manage memory faults independently.

If a memory fault is detected, POST displays the fault with the FRU name of the faulty DIMMs, logs the fault, and disables the faulty DIMMs by placing them in the Automatic System Recovery (ASR) blacklist. For a given memory fault, POST disables half of the physical memory in the system. When this occurs, you must replace the faulty DIMMs based on the fault message and enable the disabled DIMMs with the ILOM command set /SYS/component component_state=enabled.

2.2.1.3 Troubleshooting Memory Faults

If you suspect that the server module has a memory problem, follow the flowchart (FIGURE 2-1). Type the ILOM show faulty command. The command lists memory faults and the specific DIMMs that are associated with each fault. Once you have identified which DIMMs to replace, see Chapter 4 for DIMM removal and replacement instructions. You must perform the instructions in that chapter to clear the faults and enable the replaced DIMMs.


2.3 Interpreting System LEDs

The Sun Blade T6340 server module has LEDs on the front panel and the hard drives. The behavior of LEDs on your server module conforms to the American National Standards Institute (ANSI) Status Indicator Standard (SIS). These standard LED behaviors are described in TABLE 2-3.

2.3.1 Front Panel LEDs and Buttons

The front panel LEDs and buttons are located in the center of the server module. Their functions are described in TABLE 2-4 and TABLE 2-5.


TABLE 2-3 LED Behavior and Meaning

LED Behavior      Meaning
------------      -------
Off               The condition represented by the color is not true.
Steady on         The condition represented by the color is true.
Standby blink     The system is functioning at a minimal level and ready to resume full function.
Slow blink        Transitory activity or new activity represented by the color that is taking place.
Fast blink        Attention is required.
Feedback flash    Activity is taking place commensurate with the flash rate (such as disk drive activity).


The front panel LEDs on the Sun Blade T6340 are shown in FIGURE 2-3:

 

FIGURE 2-3 Front Panel and Hard Drive LEDs


Illustration of Front Panel and Hard Drive LEDs


Figure Legend

1    White Locator LED                       7    Universal Connector Port (UCP)
2    Blue Ready to Remove LED                8    Green Drive OK LED
3    Amber Service Action Required LED       9    Amber Drive Service Action Required LED
4    Green OK LED                            10   Blue Drive Ready to Remove LED
5    Power Button                            11   Chassis power connector
6    Reset Button (for service use only)     12   Chassis data connector


 


TABLE 2-4 LED Behaviors With Assigned Meanings

Color

Behavior

Definition

Description, Actions, and ILOM Commands

White

Off

Steady state

 

 

Fast blink

4 Hz repeating sequence, equal intervals On and Off.

This indicator helps you to locate a particular enclosure, board, or subsystem (for example, the Locator LED). The LED is activated using one of the following methods:

  • Type the ILOM command:
    set /SYS/LOCATE value=Fast_Blink
  • Press the button to toggle the indicator on or off, or
    type the ILOM command:
    set /SYS/LOCATE value=Off

This LED provides the following indications:

  • Off - Normal operating state.
  • Fast blink - The server module received a signal as a result of one of the preceding methods and indicates that the server module is active.

Blue

Off

Steady state

Steady state - If LED is off, it is not safe to remove the server module from the chassis. You must use software to take the component offline or shut down the server. To turn off the blue LED, type:
set /SYS return_to_service_action=true

 

Steady on

Steady state

If the blue LED is on, a service action can be safely performed on the component.

To remove a server module (and illuminate the blue LED), type:

set /SYS prepare_to_remove_action=true

To remove a hard drive, use the Solaris cfgadm command.

Amber

Off

Steady state

 

 

Steady on

Steady state

This indicator signals the existence of a fault condition. Service is required (for example, the Service Required LED). The ILOM show faulty command provides details about any faults that cause this indicator to be lit. To turn off an amber LED, either fix the fault condition or mark the fault condition fixed.

Green

Off

Steady state

Off - The system is unavailable. Either it has no power or ILOM is not running.

 

Standby blink

Repeating sequence consisting of a brief (0.1 sec.) on flash followed by a long off period (2.9 sec.)

The system is running at a minimum level and is ready to be quickly revived to full function (for example, the System Activity LED).

 

Steady on

Steady state

Status normal; system or component functioning with no service actions required.

 

Slow blink

 

A transitory (temporary) event is taking place for which direct proportional feedback is not needed or not feasible.

ILOM is enabled but the server module is not fully powered on. Indicates that the service processor is running while the system is running at a minimum level in standby mode and ready to be returned to its normal operating state.


2.3.2 Power and Reset Buttons


TABLE 2-5 Front Panel Buttons

Button          Color    Description
------          -----    -----------
Power button    gray     Turns the host system on and off. Use a non-conductive stylus to completely press this button.
(reset)         gray     This button causes a reset of the Service Processor.


For information about Ethernet LEDs, see the service manual for your modular system chassis or Ethernet device at:
http://docs.sun.com/app/docs/prod/blade.6000mod


2.4 Using ILOM for Diagnosis and Repair Verification

The Oracle Integrated Lights Out Manager (ILOM) is firmware that runs on the service processor in the Sun Blade T6340 server module. ILOM enables you to remotely manage and administer your server module.



Note - ILOM also contains an ALOM-CMT compatibility shell. For more information about ALOM-CMT compatibility see the Sun Integrated Lights Out Manager 2.0 Supplement for Sun Blade T6340 Server Modules, 820-3904. Appendix G of this service manual also provides some information about the ALOM CMT CLI.


ILOM enables you to run remote diagnostics, such as power-on self-test (POST), that would otherwise require physical proximity to the server module serial port. You can also configure ILOM to send email alerts of hardware failures, hardware warnings, and other events related to the server module or to ILOM.

The ILOM circuitry runs independently of the server module, using the server module standby power. Therefore, ILOM firmware and software continue to function when the server module operating system goes offline or when the server module is powered off.

Faults detected by ILOM, POST, and the Solaris Predictive Self-Healing (PSH) technology are forwarded to ILOM for fault handling (FIGURE 2-4).

In the event of a system fault, ILOM ensures that the Service Action Required LED is lit, FRU ID PROMs are updated, the fault is logged, and alerts are displayed. Faulty FRUs are identified in fault messages using the FRU name. For a list of FRU names, see TABLE 1-3.

FIGURE 2-4 ILOM Fault Management


Figure shows environmentals, POST, Solaris PSH routed through ILOM fault manager to produce results in LEDs, FRUID PROMs, Logs, and alerts.

In ILOM you can view the ILOM logs to see alerts. FIGURE 2-5 is a sample of the ILOM web interface. Using the CLI you can type the show /SP/logs/event/list/ command.

FIGURE 2-5 Sample Event Log in ILOM Web Interface


Figure shows a sample ILOM event log.

ILOM can detect when a fault is no longer present and clears the fault in several ways:

Many environmental faults can automatically recover. For example, a temperature that is exceeding a threshold might return to normal limits when you connect a fan. The recovery of environmental faults is automatically detected. Recovery events are reported using one of two forms:

There are three thresholds for an environmental fault:

Environmental faults can be repaired through hot removal of the faulty FRU. The FRU removal is automatically detected by the environmental monitoring and all faults associated with the removed FRU are cleared. The message for that case, and the alert sent for all FRU removals is:

fru at location has been removed.


2.5 Using the ILOM Web Interface For Diagnostics

These instructions use the ILOM web interface. To use the command-line interface (CLI), see Appendix G of this manual or the ILOM documentation collection.

1. Connect to the ILOM web interface by typing the IP address for the Sun Blade T6340 server module service processor in a web browser.

If you do not know the IP address for the server module, you can obtain the service processor IP address from the following:

2. Type the user name and password to access the diagnostics menus in the ILOM web interface. The default user name is root, and the default password is changeme.

FIGURE 2-6 ILOM Login Screen


ILOM Login Screen example

2.5.1 Displaying System Faults

ILOM displays the following faults with the web interface and CLI:

Use the web interface or type the show faulty command for the following reasons:

2.5.1.1 Viewing Fault Status Using the ILOM Web Interface

In the ILOM web interface, you can view the system components currently in a fault state using the Fault Management page.

FIGURE 2-7 Fault Management Page Example


Fault Management Page Example

The Fault Management page lists faulted components by ID, FRU, and TimeStamp. You can access additional information about the faulted component by clicking the faulted component ID. For example, if you clicked the faulted component ID, 0 SYS/MB/, a dialog window similar to the following one appears, displaying additional details about the faulted component.

FIGURE 2-8 Faulted Component ID Window


Figure shows the Fault Properties Dialog screen.

Alternatively, in the ILOM web interface, you can identify the fault status of a component on the Component Management page.

FIGURE 2-9 Component Management Page - Fault Status


Figure shows the Component Management Page and Fault Status screen.

2.5.1.2 Viewing Fault Status Using the ILOM CLI

In the ILOM CLI, you can view the fault status of components by using the show faulty command. For example:

-> show faulty

2.5.2 Displaying the Environmental Status with the ILOM CLI

The ILOM show command displays a snapshot of the server module environmental status. This command displays system temperatures, hard drive status, power supply and fan status, front panel LED status, voltage, and current sensors. The output uses a format similar to the Solaris OS command prtdiag (1M).

At the -> prompt, type the show command.

The output differs according to your system model and configuration.


-> show /SYS/MB/V_+12V
 
 /SYS/MB/V_+12V
    Targets:
 
    Properties:
        type = Voltage
        class = Threshold Sensor
        value = 12.411 Volts
        upper_nonrecov_threshold = 13.23 Volts
        upper_critical_threshold = 13.10 Volts
        upper_noncritical_threshold = 12.85 Volts
        lower_noncritical_threshold = 11.15 Volts
        lower_critical_threshold = 10.90 Volts
        lower_nonrecov_threshold = 10.77 Volts
 
    Commands:
        cd
        show



Note - Some environmental information might not be available when the server module is in standby mode.
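The six thresholds in the sample output bracket the normal operating range of the sensor. The following Python sketch classifies a reading against them. It is a sketch under stated assumptions: the threshold values are copied from the sample output above, the function name classify_reading is hypothetical, and treating a reading that equals a threshold as having crossed it is an assumption rather than documented ILOM behavior.

```python
# Values copied from the sample "show /SYS/MB/V_+12V" output, in volts.
SAMPLE_SENSOR = {
    "value": 12.411,
    "upper_nonrecov_threshold": 13.23,
    "upper_critical_threshold": 13.10,
    "upper_noncritical_threshold": 12.85,
    "lower_noncritical_threshold": 11.15,
    "lower_critical_threshold": 10.90,
    "lower_nonrecov_threshold": 10.77,
}


def classify_reading(sensor):
    """Return the most severe threshold band that the reading has reached."""
    value = sensor["value"]
    # Check the most severe band first so that it takes precedence.
    for severity in ("nonrecov", "critical", "noncritical"):
        upper = sensor[f"upper_{severity}_threshold"]
        lower = sensor[f"lower_{severity}_threshold"]
        if value >= upper or value <= lower:
            return severity
    return "ok"


print(classify_reading(SAMPLE_SENSOR))                     # ok
print(classify_reading(dict(SAMPLE_SENSOR, value=12.90)))  # noncritical
print(classify_reading(dict(SAMPLE_SENSOR, value=10.80)))  # critical
```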


2.5.3 Displaying the Environmental Status and Sensor Readings with the ILOM Web Interface

1. Open a web browser and type the IP address of the server module service processor in the browser.

2. Select the top System Monitoring tab and the lower Sensor Readings tab (FIGURE 2-10).

3. Click on the sensor reading that you want to check (FIGURE 2-10).

FIGURE 2-10 Obtaining Sensor Readings and Environmental Status With the ILOM Web Interface


Figure shows the System Monitoring and Sensor Readings tabs selected in the window.

FIGURE 2-11 Sensor Reading Window for an FB-DIMM in Channel 1


Figure shows the sensor reading window.

2.5.4 Displaying FRU Information

ILOM can display static FRU information such as the FRU manufacturer, serial number and some FRU status information (FIGURE 2-12).



Note - To view dynamic FRU information you must type the ALOM CMT showfru command. The dynamic FRU information provides more details about FRUs.


2.5.4.1 Using the ILOM Web Interface to Display FRU Information

1. Select the System Information and Components tabs.

2. Click on the component to view the FRU information (FIGURE 2-12).

FIGURE 2-12 Static FRU Information in the ILOM Web Interface


Figure shows the static FRU information displayed in an ILOM window.

2.5.4.2 Using the CLI to Display FRU Information

The show /SYS/MB command displays static information about the FRUs in the server module. Use this command to see information about an individual FRU.

At the -> prompt, type the show command.

In the following example, the show command displays information about the motherboard (MB).


-> show /SYS/MB
 
/SYS/MB
    Targets:
        SEEPROM
        SCC_NVRAM
        PCIE0
        PCIE1
        PCI-SWITCH0
        PCI-SWITCH1
        REM
        NET0
        CMP0
        CMP1
        V_VDDIO
        V_+12V
        V_+3V3
        V_+3V3_STBY
        V_+5V
 
    Properties:
        type = Motherboard
        chassis_name = SUN BLADE 6000 MODULAR SYSTEM
        chassis_part_number = "541-1983-0
        chassis_serial_number = "1005LCB-0804YM04XE
        chassis_manufacturer = SUN MICROSYSTEMS
        product_name = Sun Blade T6340 Server Module
        product_part_number = 541-3299-02
        product_serial_number = 1005LCB-08268N0008
        product_manufacturer = SUN MICROSYSTEMS
        fru_name = T6340_MB
        fru_description = 8C,1.2GHZ VF,T6340,DIRECT-A
        fru_manufacturer = NO JEDEC CODE FOR THIS VENDOR
        fru_version = 02_01
        fru_part_number = 5407762
        fru_serial_number = 8J0010
        fault_state = OK
        clear_fault_action = (none)
 
    Commands:
        cd
        show
->
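When show output is captured to a file or over a remote session, it can be convenient to split it into its Targets, Properties, and Commands sections. The following Python sketch does that for text shaped like the example above. The function name parse_show_output is hypothetical, and the parsing rules are inferred from this one sample, so other ILOM targets might need adjustments.

```python
def parse_show_output(text):
    """Split ILOM show output into targets, properties, and commands."""
    sections = {"Targets": [], "Properties": {}, "Commands": []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("->"):
            continue  # skip blank lines and the prompt
        if line.endswith(":") and line[:-1] in sections:
            current = line[:-1]  # section header such as "Properties:"
        elif current == "Properties":
            key, sep, value = line.partition("=")
            if sep:
                sections["Properties"][key.strip()] = value.strip()
        elif current is not None:
            sections[current].append(line)
    return sections


SAMPLE = """
-> show /SYS/MB

/SYS/MB
    Targets:
        SEEPROM
        CMP0

    Properties:
        type = Motherboard
        fru_name = T6340_MB
        fault_state = OK

    Commands:
        cd
        show
->
"""

parsed = parse_show_output(SAMPLE)
print(parsed["Properties"]["fru_name"], parsed["Properties"]["fault_state"])
```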

This example shows a portion of the more detailed dynamic FRU information provided by the ALOM CMT showfru command.


sc> showfru
/SYS/SP (container)
   SEGMENT: ST
      /Status_CurrentR
      /Status_CurrentR/UNIX_Timestamp32: Thu Feb 17 07:25:57 2000
      /Status_CurrentR/status:           0x00 (OK)
   SEGMENT: TH ...
... ... ...
   SEGMENT: FD
      /Customer_DataR
      /Customer_DataR/UNIX_Timestamp32: Wed Feb 16 08:41:44 GMT 2000
      /Customer_DataR/Cust_Data: QT
      /InstallationR (1 iterations)
      /InstallationR[0]
      /InstallationR[0]/UNIX_Timestamp32: Thu Feb 17 07:26:09 GMT 2000
      /InstallationR[0]/Fru_Path: /SYS/MB/REM
      /InstallationR[0]/Parent_Part_Number: 5017821
      /InstallationR[0]/Parent_Serial_Number: 5C00FV
      /InstallationR[0]/Parent_Dash_Level: 04
      /InstallationR[0]/System_Id: 1005LCB-0709YM00FV
      /InstallationR[0]/System_Tz: 0
      /InstallationR[0]/Geo_North: 0
      /InstallationR[0]/Geo_East: 0
      /InstallationR[0]/Geo_Alt: 0
      /InstallationR[0]/Geo_Location: GMT
 ... ... ...
/SYS/MB/CMP0/BR0/CH0/D0 (container)
        /SPD/Timestamp: Mon Feb 12 12:00:00 2007
        /SPD/Description: DDR2 SDRAM FB-DIMM, 4 GByte
        /SPD/Manufacture Location: ff
        /SPD/AMB Vendor: IDT
        /SPD/Vendor: Micron Technology
        /SPD/Vendor Part No:   36HTF51272F667E1D4
        /SPD/Vendor Serial No: d2174043
        /SPD/Num_Banks: 8
        /SPD/Num_Ranks: 2
        /SPD/Num_Rows: 14
        /SPD/Num_Cols: 11
        /SPD/Sdram_Width: 4
        /SunSPD/Sun_Serial_Number:   002C010707D2174043
        /SunSPD/SPD_Format_Version:  20
        /SunSPD/Sun_Part_Dash_Rev:   000-0000-00 Rev 00
        /SunSPD/Certified_Platforms: 0x00000001 (OK)
        /SunSPD/Sun_Key_Code:        0x0000
        /SunSPD/Sun_Certification:   NO
        /SunSPD/timestamp:           Thu Feb 17 07:26:20 2000
        /SunSPD/MACADDR:             00:14:4F:98:84:7A
        /SunSPD/status               0x00 (OK)
        /SunSPD/Initiator            N/A
        /SunSPD/Message:             No message
        /SunSPD/powerupdate:         Thu Feb 17 07:01:16 2000
        /SunSPD/Poweron_minutes:     1487
/SYS/MB/CMP0/BR1/CH0/D0 (container)
 ... ... ...
sc>


2.6 Running POST

Use POST to test and verify server module hardware. Power-on self-test (POST) is a group of PROM-based tests that run when the server module is powered on or reset. POST checks the basic integrity of the critical hardware components in the server module (CPU, memory, and I/O buses).

If POST detects a faulty component, the component is disabled automatically, preventing faulty hardware from potentially harming any software. If the system is capable of running without the disabled component, the system will boot when POST is complete. For example, if one of the processor cores is deemed faulty by POST, that core will be disabled, and the system will boot and run using the remaining cores.

You can use POST as an initial diagnostic tool for the system hardware. In this case, configure POST to run in diagnostic service mode for maximum test coverage and verbose output.



Note - Devices can be manually enabled or disabled using ASR commands (see Section 2.9, Managing Components With Automatic System Recovery Commands).


2.6.1 Controlling How POST Runs

The server module can be configured for normal, extensive, or no POST execution. You can also control the level of tests that run, the amount of POST output that is displayed, and which reset events trigger POST by using ILOM variables.

TABLE 2-6 lists the ILOM variables used to configure POST and FIGURE 2-13 shows how the variables work together.


TABLE 2-6 Parameters Used For POST Configuration

Parameter               Values            Description
---------               ------            -----------
/SYS keyswitch_state    normal            The system can power on and run POST (based on the other parameter settings). For details see FIGURE 2-13. This parameter overrides all other commands.
                        diag              The system runs POST based on predetermined settings.
                        stby              The system cannot power on.
                        locked            The system can power on and run POST, but no flash updates can be made.
diag_mode               off               POST does not run.
                        normal            Runs POST according to diag_level value.
                        service           Runs POST with preset values for diag_level and diag_verbosity.
diag_level              min               If diag_mode = normal, runs minimum set of tests.
                        max               If diag_mode = normal, runs all the minimum tests plus extensive CPU and memory tests.
diag_trigger            none              Does not run POST on reset or power on.
                        user-reset        Runs POST upon user-initiated resets.
                        power-on-reset    Only runs POST for the first power on. The default state is 'power-on-reset error-reset'.
                        error-reset       Runs POST if fatal errors are detected.
                        all-resets        Runs POST after any reset.
diag_verbosity          none              No POST output is displayed.
                        min               POST output displays functional tests with a banner and pinwheel.
                        normal            POST output displays all test and informational messages.
                        max               POST displays all test, informational, and some debugging messages.


FIGURE 2-13 Flowchart of ILOM Variables for POST Configuration


Figure shows POST flow chart.

TABLE 2-7 shows typical combinations of ILOM variables and associated POST modes.


TABLE 2-7 POST Modes and Parameter Settings

Parameter             Normal Diagnostic Mode        No POST      Diagnostic      Keyswitch Diagnostic
                      (default settings)            Execution    Service Mode    Preset Values
---------             ----------------------        ---------    ------------    --------------------
diag_mode             normal                        off          service         normal
keyswitch_state[2]    normal                        normal       normal          diag
diag_level            min                           n/a          max             max
diag_trigger          power-on-reset error-reset    none         all-resets      all-resets
diag_verbosity        normal                        n/a          max             max

Description of POST execution:

  • Normal Diagnostic Mode (default settings) - This is the default POST configuration. This configuration tests the system thoroughly, and suppresses some of the detailed POST output.
  • No POST Execution - POST does not run, resulting in quick system initialization, but this is not a suggested configuration.
  • Diagnostic Service Mode - POST runs the full spectrum of tests with the maximum output displayed.
  • Keyswitch Diagnostic Preset Values - POST runs the full spectrum of tests with the maximum output displayed.


2.6.2 Changing POST Parameters

You can use the web interface or the CLI to change the POST parameters.

2.6.2.1 Using the Web Interface to Change POST Parameters

1. From the ILOM web interface, select the Remote Control tab (FIGURE 2-14).

2. Select the Diagnostics tab.

3. Select the POST settings that you require.

TABLE 2-7 describes how the POST settings will execute.

4. Click the Save button.



Note - If you do not have a console window open, you should open one. POST will only display output to a console window, not the web interface.


FIGURE 2-14 Setting POST Parameters With the ILOM Web Interface


Figure shows the Remote Control and Diagnostics tabs selected in an ILOM window.

5. Select the Remote Power Control tab.

6. Select a power control setting and click Save (FIGURE 2-15).

FIGURE 2-15 Changing Power Settings With the ILOM Web Interface


Figure shows the Remote Control and Remote Server Control tabs selected in an ILOM window.

When you power cycle the server module, POST runs and displays output to the service processor console window:


{0} ok Chassis | critical: Host has been powered off
Chassis | major: Host has been powered on
2007-11-07 18:22:19.511 0:0:0>
2007-11-07 18:22:19.560 0:0:0>Sun Blade T6320 Server Module POST 4.27.4 2007/10/02 19:09 
       /export/delivery/delivery/4.27/4.27.4/post4.27.x/Niagara/glendale/integrated  (root)  
2007-11-07 18:22:19.836 0:0:0>Copyright 2007 Sun Microsystems, Inc. All rights reserved
2007-11-07 18:22:20.001 0:0:0>VBSC cmp 0 arg is: 00ffffff.ffff00ff
2007-11-07 18:22:20.108 0:0:0>POST enabling threads: 00ffffff.ffff00ff
2007-11-07 18:22:20.223 0:0:0>VBSC mode is: 00000000.00000001
2007-11-07 18:22:20.321 0:0:0>VBSC level is: 00000000.00000001
2007-11-07 18:22:20.421 0:0:0>VBSC selecting Normal mode, MAX Testing.
2007-11-07 18:22:20.533 0:0:0>VBSC setting verbosity level 3
2007-11-07 18:22:20.629 0:0:0>  Niagara2, Version 2.1
2007-11-07 18:22:20.714 0:0:0>  Serial Number: 0f880060.768660a8
2007-11-07 18:22:20.843 0:0:0>Basic Memory Tests.....

7. Read the POST output to determine if you need to perform service actions.

See Section 2.6.3, Interpreting POST Messages.

2.6.2.2 Using the CLI to Change POST Parameters

1. Verify the current POST parameters with the show command. Type:


-> show /HOST/diag
 
/HOST/diag
    Targets:
 
    Properties:
        level = min
        mode = normal
        trigger = power-on-reset error-reset
        verbosity = normal
 
    Commands:
        cd
        set
        show
-> 

2. Type the set command to change the POST parameters.

TABLE 2-7 describes how the POST settings will execute. This example shows how to set the verbosity to max.


-> set /HOST/diag verbosity=max
Set 'verbosity' to 'max'
-> 
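To apply one of the POST modes from TABLE 2-7, you type one set command per property. The following Python sketch generates those command strings so that they can be reviewed before you type them at the -> prompt. The preset names, the dictionary, and the function diag_commands are hypothetical. The property names come from the show /HOST/diag output above, and quoting a value that contains a space is an assumption about the CLI, not something this chapter confirms.

```python
# Presets transcribed from TABLE 2-7; "n/a" entries are omitted.
POST_PRESETS = {
    "normal": {
        "mode": "normal",
        "level": "min",
        "trigger": "power-on-reset error-reset",
        "verbosity": "normal",
    },
    "no_post": {"mode": "off", "trigger": "none"},
    "service": {
        "mode": "service",
        "level": "max",
        "trigger": "all-resets",
        "verbosity": "max",
    },
}


def diag_commands(preset_name):
    """Return the ILOM set commands for one preset, in dictionary order."""
    commands = []
    for prop, value in POST_PRESETS[preset_name].items():
        # Assumption: a value that contains a space must be double-quoted.
        if " " in value:
            value = f'"{value}"'
        commands.append(f"set /HOST/diag {prop}={value}")
    return commands


for command in diag_commands("service"):
    print(command)
```

The sketch only builds text. It does not connect to a service processor, and it leaves keyswitch_state unchanged.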

3. Power cycle the server module to run POST.

There are several ways to initiate a reset. The following example uses the ILOM reset command.


-> reset /SYS
Are you sure you want to reset /SYS (y/n)? y
Performing hard reset on /SYS
-> 
 

4. Read the POST output to determine if you need to perform service actions. See Section 2.6.3, Interpreting POST Messages.

2.6.3 Interpreting POST Messages

When POST finishes running without detecting any faults, the system boots.

If POST detects a faulty device, the fault is displayed and the fault information is passed to ILOM for fault handling. Faulty FRUs are identified in fault messages using the FRU name. For a list of FRU names, see TABLE 1-3.

1. Interpret the POST messages:

POST error messages use the following syntax:

c:s > ERROR: TEST = failing-test
c:s > H/W under test = FRU
c:s > Repair Instructions: Replace items in order listed by H/W under test above
c:s > MSG = test-error-message
c:s > END_ERROR

In this syntax, c = the core number, s = the strand number.

Warning and informational messages use the following syntax:

INFO or WARNING: message

The following example shows a POST error message report for a missing PCI device:


0:0:0>ERROR: TEST = PIU PCI id test
0:0:0>H/W under test = MB/PCI-SWITCH
0:0:0>Repair Instructions: Replace items in order listed by 'H/W under test' above.
0:0:0>MSG = PCI ID test device missing Cont. 
					DEVICE NAME: MB/PCI-SWITCH
0:0:0>END_ERROR

2. Type the show faulty command to obtain additional fault information.

The fault is captured by ILOM, where the fault is logged. The Service Action Required LED is lit, and the faulty component is disabled.

For example:


ok #.
->
-> show faulty
 
Target              | Property               | Value
--------------------+------------------------+----------------------------
/SP/faultmgmt/0     | fru                    | /SYS/MB/CMP0/BR0/CH0/D0
/SP/faultmgmt/0     | timestamp              | Sep 12 05:02:52
/SP/faultmgmt/0/    | timestamp              | Sep 12 05:02:52
 faults/0           |                        |
/SP/faultmgmt/0/    | sp_detected_fault      | /SYS/MB/CMP0/BR0/CH0/D0
 faults/0           |                        | Disabled by user
 
    Commands:
        cd
        show

In this example, /SYS/MB/CMP0/BR0/CH0/D0 is disabled by a user. The system can boot using memory that was not disabled until the faulty component is replaced.



Note - You can use ASR commands to display and control disabled components. See Section 2.9, Managing Components With Automatic System Recovery Commands.
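If you save the console output of a POST run, the ERROR ... END_ERROR blocks described in Step 1 can be extracted with a short script. The following Python sketch assumes the message syntax shown in this section, with an optional timestamp in front of the c:s prefix. The function name and the dictionary keys (source, test, fru, repair, msg) are hypothetical names chosen for this example.

```python
import re

# Optional timestamp, then the c:s (or c:s:n) prefix, then '>' and the message body.
LINE_RE = re.compile(
    r"^\s*(?:\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\s+)?([0-9:]+)\s*>\s*(.*)$"
)

# Labels come from the POST error syntax; the keys are hypothetical names.
FIELDS = (
    ("H/W under test =", "fru"),
    ("Repair Instructions:", "repair"),
    ("MSG =", "msg"),
)


def parse_post_errors(text):
    """Collect each ERROR ... END_ERROR block into a dictionary."""
    errors, current, last_key = [], None, None
    for raw in text.splitlines():
        match = LINE_RE.match(raw)
        if not match:
            # Unprefixed lines continue the previous field (for example, DEVICE NAME).
            if current is not None and last_key and raw.strip():
                current[last_key] += " " + raw.strip()
            continue
        source, body = match.group(1), match.group(2).strip()
        if body.startswith("ERROR: TEST ="):
            current = {"source": source, "test": body.split("=", 1)[1].strip()}
            last_key = "test"
        elif current is None:
            continue
        elif body.startswith("END_ERROR"):
            errors.append(current)
            current, last_key = None, None
        else:
            for label, key in FIELDS:
                if body.startswith(label):
                    current[key] = body[len(label):].strip()
                    last_key = key
                    break
    return errors


# Sample text copied from the missing PCI device example in this section.
SAMPLE = """\
0:0:0>ERROR: TEST = PIU PCI id test
0:0:0>H/W under test = MB/PCI-SWITCH
0:0:0>Repair Instructions: Replace items in order listed by 'H/W under test' above.
0:0:0>MSG = PCI ID test device missing Cont.
                    DEVICE NAME: MB/PCI-SWITCH
0:0:0>END_ERROR
"""

errors = parse_post_errors(SAMPLE)
for error in errors:
    print(error["source"], error["fru"], error["msg"])
```

The fru value is the FRU name to look up in TABLE 1-3. The sketch ignores INFO and WARNING lines because they do not follow the block syntax.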


2.6.4 Clearing POST Detected Faults

In most cases, when POST detects a faulty component, POST logs the fault and automatically takes the failed component out of operation by placing the component in the ASR blacklist.

See Section 2.9, Managing Components With Automatic System Recovery Commands.

After the faulty FRU is replaced, the fault is normally automatically cleared. In some cases it might be necessary to manually clear the fault by removing the component from the ASR blacklist.

2.6.4.1 Clearing Faults With the Web Interface

This procedure describes how to enable components after a POST fault has been generated. The POST fault log is not actually cleared.

1. Select the System Information and Components tabs (FIGURE 2-16).

2. Select the radio button for the component that you must clear.

3. In the Actions menu, select: Enable Component.

FIGURE 2-16 Enabling Components With the ILOM Web Interface


Figure shows the component management window.



Note - The Clear Faults command in the Actions menu clears only PSH-generated faults. It does not enable a component.


2.6.4.2 Clearing Faults With the ILOM CLI

1. At the ILOM prompt, type the show faulty command to identify POST detected faults.

POST detected faults are distinguished from other faults by the text:
deemed faulty and disabled, and no UUID number is reported.

For example:

-> show faulty

2. Type the set component_state=enabled command to clear the fault and remove the component from the ASR blacklist.

Type the cd command with the FRU name that was reported in the fault in the previous step.

This example shows how to change directory to thread P32 on the CPU and enable it.


-> cd /SYS/MB/CMP0/P32
/SYS/MB/CMP0/P32
 
-> show
 
 /SYS/MB/CMP0/P32
    Targets:
 
    Properties:
        type = CPU thread
        component_state = Disabled
 
    Commands:
        cd
        show
 
-> set component_state=enabled
Set 'component_state' to 'enabled'

The fault is cleared and should not show up when you type the show faulty command. Additionally, the Service Action Required LED is no longer illuminated.

3. Reboot the server module.

You must reboot the server module for the component_state change to take effect.

4. At the ILOM prompt, type the show faulty command to verify that no faults are reported.


-> show faulty
Last POST run: THU MAR 09 16:52:44 2006
POST status: Passed all devices
 
No failures found in System

2.6.4.3 Clearing Faults Manually with ILOM

The ILOM set /SYS/component clear_fault_action=true command, where component is the name of the affected FRU, allows you to manually clear certain types of faults without replacing a FRU. It also allows you to clear a fault if ILOM was unable to automatically detect the FRU replacement.

2.6.4.4 Clearing Hard Drive Faults

ILOM can detect hard drive replacement. However, to configure and unconfigure a hard drive, you must type the Solaris cfgadm command. See Section 3.1, Hot-Plugging a Hard Drive. ILOM does not handle hard drive faults. Use the Solaris message files to view hard drive faults. See Section 2.8, Collecting Information From Solaris OS Files and Commands.


2.7 Using the Solaris Predictive Self-Healing Feature

The Solaris Predictive Self-Healing (PSH) technology enables the Sun Blade T6340 server module to diagnose problems while the Solaris OS is running. Many problems can be resolved before they negatively affect operations.

The Solaris OS uses the fault manager daemon, fmd(1M), which starts at boot time and runs in the background to monitor the system. If a component generates an error, the daemon handles the error by correlating the error with data from previous errors and other related information to diagnose the problem. Once diagnosed, the fault manager daemon assigns the problem a Universally Unique Identifier (UUID) that distinguishes the problem across any set of systems. When possible, the fault manager daemon initiates steps to self-heal the system and take the component offline. The daemon also logs the fault to the syslogd daemon and provides a fault notification with a message ID (MSGID). You can use the message ID to get additional information about the problem from Sun’s knowledge article database.

The Predictive Self-Healing technology covers the following Sun Blade T6340 server module components:

The PSH console message provides the following information:

If the Solaris PSH facility has detected a faulty component, type the fmdump command to identify the fault. Faulty FRUs are identified in fault messages using the FRU name. For a list of FRU names, see TABLE 1-3.



Note - Additional Predictive Self-Healing information is available at: http://www.sun.com/msg


2.7.1 Identifying Faults With the fmadm faulty and fmdump Commands

2.7.1.1 Using the fmadm faulty Command

1. Use the fmadm faulty command to identify a faulty component.


# fmadm faulty
STATE RESOURCE / UUID
faulted cpu:///cpuid=8/serial=FAC006AE4515C47
	8856153f-6f9b-47c6-909a-b05180f53c07

The output shows the UUID of the related fault and provides information for clearing the fault.

2. Use the output of this command to clear the fault as shown in Section 2.7.2, Clearing PSH Detected Faults.

If fmadm faulty does not identify a faulty component or if you need more detailed information, type the fmdump command.
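The resource name and UUID reported by fmadm faulty are the values you later pass to fmadm repair, so it can be convenient to extract them with a script. The following Python sketch assumes the three-line output layout shown in Step 1. Other Solaris releases print a different layout, and the function name parse_fmadm_faulty is hypothetical.

```python
import re

UUID_RE = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def parse_fmadm_faulty(output):
    """Return one dictionary per fault with its state, resource (FMRI), and UUID."""
    faults = []
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line.upper().startswith("STATE"):
            continue  # Skip blank lines and the column header.
        if UUID_RE.match(line) and faults:
            # The UUID is printed on its own line below the resource.
            faults[-1]["uuid"] = line
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            faults.append({"state": parts[0], "resource": parts[1], "uuid": None})
    return faults


# Sample text copied from the fmadm faulty example in Step 1.
SAMPLE = """\
STATE RESOURCE / UUID
faulted cpu:///cpuid=8/serial=FAC006AE4515C47
        8856153f-6f9b-47c6-909a-b05180f53c07
"""

faults = parse_fmadm_faulty(SAMPLE)
for fault in faults:
    print(fault["state"], fault["resource"], fault["uuid"])
```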

2.7.1.2 Using the fmdump Command

The fmdump command displays the list of faults detected by the Solaris PSH facility. Use this command for the following reasons:

If you already have a fault message ID, go to Step 2 to obtain more information about the fault from the Sun Predictive Self-Healing Knowledge Article web site.



Note - Faults detected by the Solaris PSH facility are also reported through ILOM alerts. In addition to the PSH fmdump command, the ILOM show faulty command also provides information about faults and displays fault UUIDs. See Section 2.5.1, Displaying System Faults.


1. Check the event log by typing the fmdump command with -v for verbose output.

For example:


# fmdump -v
TIME				UUID					SUNW-MSG-ID
Apr 24 06:54:08.2005 1ce22523-1c80-6062-e61d-f3b39290ae2c SUN4V-8000-6H
100% fault.cpu.ultraSPARCT2l2cachedata
	FRU:hc:///component=MB
	rsrc: cpu:///cpuid=0/serial=22D1D6604A

In this example, a fault is displayed, indicating the following details:

2. Use the Sun message ID to obtain more information about this type of fault.

a. In a browser, go to the Predictive Self-Healing Knowledge Article web site: http://www.sun.com/msg

b. Type the message ID in the SUNW-MSG-ID field, and press Lookup.

In this example, the message ID SUN4U-8000-6H returns the following information for corrective action:


CPU errors exceeded acceptable levels
 
Type
    Fault 
Severity
    Major 
Description
    The number of errors associated with this CPU has exceeded acceptable levels. 
Automated Response
    The fault manager will attempt to remove the affected CPU from service. 
Impact
    System performance may be affected. 
 
Suggested Action for System Administrator
    Schedule a repair procedure to replace the affected CPU,
the identity of which can be determined using 
fmdump -v -u <EVENT_ID>. 
 
Details
    The Message ID:   SUN4U-8000-6H indicates diagnosis has
determined that a CPU is faulty. The Solaris fault manager arranged
an automated attempt to disable this CPU. The recommended action
for the system administrator is to contact Sun support so a Sun
service technician can replace the affected component. 

c. Follow the suggested actions to repair the fault.
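To look up several events, you can extract the message ID, UUID, and FRU from saved fmdump -v output. The following Python sketch assumes the verbose layout shown in Step 1, where the event line ends with the message ID and the detail lines follow it. The function name and dictionary keys are hypothetical.

```python
import re

# Message IDs look like SUN4V-8000-6H: a platform code, four digits, two characters.
MSG_ID_RE = re.compile(r"^[A-Z0-9]+-\d{4}-[0-9A-Z]{2}$")


def parse_fmdump_verbose(output):
    """Return one dictionary per event found in fmdump -v output."""
    events = []
    for raw in output.splitlines():
        line = raw.strip()
        tokens = line.split()
        if len(tokens) >= 3 and MSG_ID_RE.match(tokens[-1]):
            # Event line: time fields, then the UUID, then the message ID.
            events.append({
                "time": " ".join(tokens[:-2]),
                "uuid": tokens[-2],
                "msg_id": tokens[-1],
            })
        elif not events:
            continue  # Header or other text before the first event.
        elif re.match(r"^\d+%\s", line):
            certainty, fault_class = line.split(None, 1)
            events[-1]["certainty"] = certainty
            events[-1]["class"] = fault_class
        elif line.startswith("FRU:"):
            events[-1]["fru"] = line[len("FRU:"):].strip()
        elif line.startswith("rsrc:"):
            events[-1]["resource"] = line[len("rsrc:"):].strip()
    return events


# Sample text based on the fmdump -v example in Step 1.
SAMPLE = """\
TIME                 UUID                                 SUNW-MSG-ID
Apr 24 06:54:08.2005 1ce22523-1c80-6062-e61d-f3b39290ae2c SUN4V-8000-6H
100% fault.cpu.ultraSPARCT2l2cachedata
        FRU:hc:///component=MB
        rsrc: cpu:///cpuid=0/serial=22D1D6604A
"""

events = parse_fmdump_verbose(SAMPLE)
for event in events:
    print(event["msg_id"], event["uuid"], event["fru"])
```

The msg_id value is what you type in the SUNW-MSG-ID field of the knowledge article web site, and the uuid value can be passed to fmdump -v -u.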

2.7.2 Clearing PSH Detected Faults

When the Solaris PSH facility detects faults, the faults are logged and displayed on the console. After the fault condition is corrected, for example by replacing a faulty FRU, you might have to clear the fault.

1. After replacing a faulty FRU, boot the system.

2. Type fmadm faulty:


# fmadm faulty
STATE RESOURCE / UUID
faulted cpu:///cpuid=8/serial=FAC006AE4515C47
	8856153f-6f9b-47c6-909a-b05180f53c07

3. Clear the fault from all persistent fault records.

In some cases, even though the fault is cleared, some persistent fault information remains and results in erroneous fault messages at boot time. To ensure that these messages are not displayed, type the following command:

fmadm repair UUID

For example:


# fmadm repair cpu:///cpuid=8/serial=FAC006AE4515C47
fmadm: recorded repair to cpu:///cpuid=8/serial=FAC006AE4515C47
# fmadm faulty
STATE RESOURCE / UUID



Note - You can also use the FRU fault UUID instead of the Fault Management Resource Identifier (FMRI).


Typing fmadm faulty after the repair command verifies that there are no more faults.

2.7.3 Clearing the PSH Fault From the ILOM Logs

When the Solaris PSH facility detects faults, the faults are also logged by the ILOM software.



Note - If you clear the faults using Solaris PSH, you do not have to clear the faults in ILOM. If you clear the faults in ILOM, you do not have to clear them with Solaris PSH.




Note - If you are diagnosing or replacing faulty DIMMs, do not follow this procedure. Instead, perform the procedure in Section 4.3.2, Replacing the DIMMs.


1. After replacing a faulty FRU, type the show faulty command at the ILOM -> prompt to identify PSH detected faults.

PSH detected faults are distinguished from other faults by the text:
Host detected fault.

For example:

-> show faulty

2. Use the ILOM clear_fault_action property to clear the fault on the component identified in the show faulty output:


-> set /SYS/component clear_fault_action=true
Clearing fault from component...
Fault cleared.


2.8 Collecting Information From Solaris OS Files and Commands

With the Solaris OS running on the Sun Blade T6340 server module, you have all the Solaris OS files and commands available for collecting information and for troubleshooting.

In the event that POST, ILOM, or the Solaris PSH features did not indicate the source of a fault, check the message buffer and log files for fault notifications. Hard drive faults are usually captured by the Solaris message files.

Type the dmesg command to view the most recent system messages.

View the /var/adm/messages file to see the system message log.

2.8.1 Checking the Message Buffer

1. Log in as superuser.

2. Type the dmesg command.


# dmesg

The dmesg command displays the most recent messages generated by the system.

2.8.2 Viewing the System Message Log Files

The error logging daemon, syslogd, automatically records various system warnings, errors, and faults in message files. These messages can alert you to system problems such as a device that is about to fail.

The /var/adm directory contains several message files. The most recent messages are in the /var/adm/messages file. After a period of time (usually every ten days), a new messages file is automatically created. The original contents of the messages file are rotated to a file named messages.1. Over a period of time, the messages are further rotated to messages.2 and messages.3, and then deleted.

1. Log in as superuser.

2. Type the following command.


# more /var/adm/messages

3. If you want to view all logged messages, type:


# more /var/adm/messages*
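Because the shell expands messages* in name order, the newest file (messages) is displayed first. To read the rotated files in chronological order instead, a script can sort them by rotation number. The following Python sketch assumes the messages, messages.1, messages.2 naming described in this section. The function name is hypothetical, and the demonstration uses a temporary directory rather than /var/adm.

```python
import glob
import os
import re
import tempfile


def message_files_oldest_first(directory="/var/adm"):
    """Return the messages and messages.N paths ordered from oldest to newest."""
    ranked = []
    for path in glob.glob(os.path.join(directory, "messages*")):
        match = re.fullmatch(r"messages(?:\.(\d+))?", os.path.basename(path))
        if match:
            # The current file has no suffix; rank it as the newest.
            index = int(match.group(1)) if match.group(1) else -1
            ranked.append((index, path))
    # Higher rotation numbers are older, so sort in descending order.
    return [path for index, path in sorted(ranked, reverse=True)]


# Demonstration with empty files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("messages", "messages.1", "messages.2", "messages.old"):
        open(os.path.join(tmp, name), "w").close()
    ordered = [os.path.basename(p) for p in message_files_oldest_first(tmp)]

print(ordered)
```

Files that do not match the rotation naming, such as messages.old in the demonstration, are skipped.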


2.9 Managing Components With Automatic System Recovery Commands

The Automatic System Recovery (ASR) feature enables the server module to automatically unconfigure failed components to remove them from operation until they can be replaced. In the Sun Blade T6340 server module, the following components are managed by the ASR feature:

The database that contains the list of disabled components is called the ASR blacklist (asr-db).

In most cases, POST automatically disables a component when it is faulty. After the cause of the fault is repaired (FRU replacement, loose connector reseated, and so on), you must remove the component from the ASR blacklist.

The ASR commands (TABLE 2-8) enable you to view and manually add or remove components from the ASR blacklist. These commands are run from the ILOM -> prompt. For information about ALOM CMT commands, see the Sun Integrated Lights Out Manager 2.0 Supplement for Sun Blade T6340 Server Modules, 820-3904.


TABLE 2-8 ASR Commands

ILOM Web Interface

ILOM Command

ALOM Command

Description

Select the following tabs: System Information, Components, Actions,
then select the action.

show /SYS/component
component_state

showcomponent [3]

Displays system components and their current state.

set /SYS/component
component_state=enabled

enablecomponent asrkey

Removes a component from the asr-db blacklist, where asrkey is the component to enable.

set /SYS/component
component_state=disabled

disablecomponent asrkey

Adds a component to the asr-db blacklist, where asrkey is the component to disable.

No equivalent in ILOM

clearasrdb

Removes all entries from the asr-db blacklist.




Note - The components (asrkeys) vary from system to system, depending on how many cores and how much memory are present. Type the showcomponent command to see the asrkeys on a given system.




Note - A reset or power cycle is required after you disable or enable a component. If you change the state of a component while the server module is powered on, the change has no effect until the next reset or power cycle.


2.9.1 Displaying System Components With the show /SYS Command

To see examples of ILOM web interface and CLI commands that show component status, see Section 2.5.2, Displaying the Environmental Status with the ILOM CLI.

The show command displays the system components (asrkeys) and reports their status.

1. At the -> prompt, type the show command.

The following example shows a system with no disabled components:


-> show -level all -o table component_state
 
Target              | Property               | Value
--------------------+------------------------+---------------------------------
/SYS/MB/PCIE0       | component_state        | Enabled
/SYS/MB/PCIE1       | component_state        | Enabled
/SYS/MB/PCI-        | component_state        | Enabled
 SWITCH0            |                        |
/SYS/MB/PCI-        | component_state        | Enabled
 SWITCH1            |                        |
/SYS/MB/REM         | component_state        | Enabled
/SYS/MB/NET0        | component_state        | (none)
/SYS/MB/NET1        | component_state        | (none)
/SYS/MB/PCIE-IO     | component_state        | Enabled
/SYS/MB/PCIE-IO/    | component_state        | Enabled
 USB                |                        |
/SYS/MB/PCIE-IO/    | component_state        | Enabled
 GRFX               |                        |
/SYS/MB/CMP0/MCU0   | component_state        | Enabled
/SYS/MB/CMP0/MCU1   | component_state        | Enabled
 
Commands:
        cd
        show
-> 
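On systems with many components, the table output is long, and long target names wrap onto a second line. The following Python sketch shows one way to rejoin wrapped rows and list the disabled components from saved output. It assumes the three-column layout shown in the example above. The function name is hypothetical, and the disabled /SYS/MB/CMP0/P32 row in the sample is illustrative only.

```python
def parse_ilom_table(output):
    """Parse 'Target | Property | Value' rows, rejoining wrapped target names."""
    rows = []
    for raw in output.splitlines():
        cells = raw.split("|")
        if len(cells) != 3:
            continue  # Skips the separator line, prompts, and the Commands list.
        target, prop, value = cells
        if target.strip() == "Target":
            continue  # Column header.
        if target.startswith(" ") and not prop.strip() and rows:
            # Continuation row: append the wrapped text to the previous row.
            rows[-1]["target"] += target.strip()
            rows[-1]["value"] = (rows[-1]["value"] + " " + value.strip()).strip()
            continue
        rows.append({
            "target": target.strip(),
            "property": prop.strip(),
            "value": value.strip(),
        })
    return rows


# Sample rows in the layout shown above; the last row is illustrative.
SAMPLE = """\
Target              | Property               | Value
--------------------+------------------------+---------------------------------
/SYS/MB/PCIE0       | component_state        | Enabled
/SYS/MB/PCI-        | component_state        | Enabled
 SWITCH0            |                        |
/SYS/MB/NET0        | component_state        | (none)
/SYS/MB/CMP0/P32    | component_state        | Disabled
"""

rows = parse_ilom_table(SAMPLE)
disabled = [
    row["target"]
    for row in rows
    if row["property"] == "component_state" and row["value"] == "Disabled"
]
print(disabled)
```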


2.10 Exercising the System With SunVTS

Sometimes a system exhibits a problem that cannot be isolated definitively to a particular hardware or software component. In such cases, it might be useful to run a diagnostic tool that stresses the system by continuously running a comprehensive battery of tests. Sun provides the SunVTS software for this purpose.

2.10.1 Checking SunVTS Software Installation

This procedure assumes that the Solaris OS is running on the Sun Blade T6340 server module, and that you have access to the Solaris command line.

1. Check for the presence of SunVTS packages using the pkginfo command.


# pkginfo | grep -i vts
system 				SUNWvts 				SunVTS Framework
system 				SUNWvtsmn 				SunVTS Man Pages
system 				SUNWvtsr 				SunVTS Framework (root)
system 				SUNWvtss 				SunVTS Server and BUI
system 				SUNWvtsts 				SunVTS Core Installation Tests
#

TABLE 2-9 lists some SunVTS packages.


TABLE 2-9 Sample of Installed SunVTS Packages

Package    | Description
-----------+---------------------------------------
SUNWvts    | SunVTS framework
SUNWvtsr   | SunVTS Framework (root)
SUNWvtss   | SunVTS middle server and BI components
SUNWvtsts  | SunVTS for tests
SUNWvtsmn  | SunVTS man pages
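To confirm that every package in TABLE 2-9 is present, you can compare saved pkginfo output against the package list. The following Python sketch assumes that each pkginfo line contains a category, a package name, and a description, as in the example in Step 1. The function name is hypothetical, and the sample deliberately omits one package.

```python
# Package names taken from TABLE 2-9.
EXPECTED = ("SUNWvts", "SUNWvtsr", "SUNWvtss", "SUNWvtsts", "SUNWvtsmn")


def missing_sunvts_packages(pkginfo_output, expected=EXPECTED):
    """Return the expected package names that do not appear in pkginfo output."""
    installed = set()
    for line in pkginfo_output.splitlines():
        fields = line.split()
        # Each pkginfo line is: category, package name, description.
        if len(fields) >= 2:
            installed.add(fields[1])
    return [name for name in expected if name not in installed]


# Sample output with the man page package left out for illustration.
SAMPLE = """\
system      SUNWvts      SunVTS Framework
system      SUNWvtsr     SunVTS Framework (root)
system      SUNWvtss     SunVTS Server and BUI
system      SUNWvtsts    SunVTS Core Installation Tests
"""

missing = missing_sunvts_packages(SAMPLE)
print(missing)
```

An empty list means that all of the listed packages were found.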


If SunVTS is not installed, you can obtain the installation packages from the following resources:

The SunVTS 7.0 software, and subsequent compatible versions, are supported on the Sun Blade T6340 server module.

SunVTS installation instructions are described in the SunVTS 7.0 User’s Guide, 820-0012.

2.10.2 Exercising the System Using SunVTS Software

Before you begin, the Solaris OS must be running. You should verify that SunVTS validation test software is installed on your system. See Section 2.10.1, Checking SunVTS Software Installation.

The SunVTS installation process requires that you specify one of two security schemes to use when running SunVTS. The security scheme you choose must be properly configured in the Solaris OS for you to run SunVTS.

SunVTS software features both character-based and graphics-based interfaces.

For more information about the character-based SunVTS TTY interface, and specifically for instructions on accessing it by TIP or telnet commands, refer to the SunVTS 7.0 User’s Guide.

Finally, this procedure describes how to run SunVTS tests in general. Individual tests might presume the presence of specific hardware, or might require specific drivers, cables, or loopback connectors. For information about test options and prerequisites, refer to the following documentation:

1. Log in as superuser to a system with a graphics display.

The display system should be one with a frame buffer and monitor capable of displaying bitmap graphics such as those produced by the SunVTS BI.

2. Enable the remote display.

On the display system, type:


# /usr/openwin/bin/xhost + test-system

where test-system is the name of the server you plan to test.

3. Remotely log in to the server as superuser.

Type a command such as rlogin or telnet.

4. Start SunVTS software.


# /usr/sunvts/bin/startsunvts

As SunVTS starts, it prompts you to choose the CLI, BI, or TTY interface. A representative SunVTS BI is shown in FIGURE 2-17.

FIGURE 2-17 SunVTS BI


This screen capture shows a small portion of the test selection area in the SunVTS graphical interface.

 

5. (Optional) Select the test category you want to run.

Certain tests are enabled by default, and you can choose to accept these.

Alternatively, you can enable or disable test categories by clicking the checkbox next to the test name or test category name. Tests are enabled when checked, and disabled when not checked.

TABLE 2-10 lists tests that are especially useful to run on this server.


TABLE 2-10 Useful SunVTS Tests to Run on This Server

Category    | SunVTS Tests                                | FRUs Exercised by Tests
------------+---------------------------------------------+---------------------------------------
CPU         | mptest                                      | CPU and motherboard
Graphics    | pfbtest, graphicstest (indirectly: systest) | DIMMs, CPU motherboard
Processor   | cmttest, cputest, fputest, iutest,          | DIMMs, CPU motherboard
            |  l1dcachetest, dtlbtest, and l2sramtest     |
            |  (indirectly: mptest and systest)           |
Disk        | disktest                                    | Disks, cables, disk backplane
Environment | hsclbtest, cryptotest                       | Crypto engine (CPU), SP <--> host
            |                                             |  communication channels (motherboard)
Network     | nettest, netlbtest, xnetlbtest              | Network interface, network cable,
            |                                             |  CPU motherboard
Memory      | pmemtest, vmemtest, ramtest                 | DIMMs, motherboard
I/O ports   | usbtest, iobustest                          | Motherboard, service processor
            |                                             |  (host to service processor interface)


6. (Optional) Customize individual tests.

You can customize test categories by right-clicking on the name of the test.

7. Start testing.

Click the Start button that is located at the top left of the SunVTS window. Status and error messages appear in the test messages area located across the bottom of the window. You can stop testing at any time by clicking the Stop button.

During testing, SunVTS software logs all status and error messages. To view these messages, click the Log button or select Log Files from the Reports menu. This action opens a log window from which you can choose to view the following logs:


2.11 Resetting the Password to the Factory Default

The procedure for resetting the ILOM root password to the factory default (changeme) requires installation of a jumper on the service processor. This procedure should be performed by a technician, a service professional, or a system administrator who services and repairs computer systems. This person should meet the criteria described in the preface of the Sun Blade T6340 Server Module Service Manual.

2.11.1 To Reset the Root Password to the Factory Default

1. Remove the server module from the modular system chassis.

Prepare for removal using ILOM or ALOM CMT commands and ensure that the blue OK to Remove LED is lit, indicating that it is safe to remove the blade.

2. Open the server module and install a standard jumper at location J0601, pins 11 and 12.

3. Close the server module, install it in the modular system chassis, and boot the server module.

Refer to the Sun Blade T6340 Server Module Installation and Administration Guide for instructions.

The ILOM root password is now reset to the factory default (changeme).

4. Change the root password.

Refer to the Sun Blade T6340 Server Module Installation and Administration Guide for instructions.

5. Remove the server module from the modular system chassis and remove the jumper.

As in Step 1, prepare for removal using ILOM or ALOM CMT commands and ensure that the blue OK to Remove LED is lit, indicating that it is safe to remove the blade.

6. Close the server module, install it in the modular system chassis, and boot the server module.

Refer to the Sun Blade T6340 Server Module Installation and Administration Guide for instructions.

 


[1] Upgrade path: DIMMs should be added with each group populated in the order shown.
[2] The keyswitch_state parameter, when set to diag, overrides all the other POST variables.
[3] The showcomponent command might not report all blacklisted DIMMs.