Sun Netra T5440 Server

Updated: September 2015
 
 

1.5.2 Reasons to Run POST

You can use POST for basic hardware verification and diagnosis, and for troubleshooting as described in the following sections.

1.5.2.1 Verifying Hardware Functionality

POST tests critical hardware components to verify functionality before the system boots and accesses software. If POST detects an error, the faulty component is disabled automatically, preventing faulty hardware from potentially harming software.

1.5.2.2 Diagnosing the System Hardware

You can use POST as an initial diagnostic tool for the system hardware. In this case, configure POST to run in maximum mode (diag_mode=service, setkeyswitch=diag, diag_level=max) for thorough test coverage and verbose output.

Running POST in Maximum Mode

This procedure describes how to run POST with maximum testing, for example when you are troubleshooting a server or verifying a hardware upgrade or repair.

  1. Switch from the system console prompt to the sc> prompt by issuing the #. escape sequence.
    ok #.
    sc>
    
  2. Set the virtual keyswitch to diag so that POST will run in service mode.
    sc> setkeyswitch diag
    
  3. Reset the system so that POST runs.

    There are several ways to initiate a reset. Example 1-1, Initiating POST Using the powercycle Command, shows the powercycle command. For other methods, refer to the Sun Netra T5440 Server Administration Guide.

  4. Switch to the system console to view the POST output:
    sc> console
    

    Example 1-2, Initiating POST Using the powercycle Command, depicts abridged POST output.

  5. Perform further investigation if needed.
    • If POST detects no faults, the system boots.

    • If POST detects a faulty device, the fault is displayed and the fault information is passed to ALOM CMT CLI for fault handling. Faulty FRUs are identified in fault messages using the FRU name.

    1. Interpret the POST messages:

      POST error messages use the following syntax:

      c:s > ERROR: TEST = failing-test
      c:s > H/W under test = FRU
      c:s > Repair Instructions: Replace items in order listed by H/W under test above
      c:s > MSG = test-error-message
      c:s > END_ERROR

      In this syntax, c = the core number, s = the strand number.

      Warning and informational messages use the following syntax:

      INFO or WARNING: message

      In Example 1-3, POST Error Message, POST reports a memory error at FB-DIMM location /SYS/MB/CMP0/BR1/CH0/D0. The error was detected by POST running on core 7, strand 2.

    2. Run the showfaults command to obtain additional fault information.

      The fault is captured by ALOM CMT CLI, where the fault is logged, the Service Required LED is lit, and the faulty component is disabled.

      Refer to showfaults Output.

      In this example, /SYS/MB/CMP0/BR1/CH0/D0 is disabled. Until the faulty component is replaced, the system can boot using the memory that was not disabled.


      Note - You can use ASR commands to display and control disabled components. See Managing Components With Automatic System Recovery Commands.
Example 1-1  Initiating POST Using the powercycle Command
sc> powercycle
Are you sure you want to powercycle the system (y/n)? y
Powering host off at Fri Jul 27 08:11:52 2007
Waiting for host to Power Off; hit any key to abort.
Audit | minor: admin : Set : object = /SYS/power_state : value = soft : success
Chassis | critical: Host has been powered off
Powering host on at Fri Jul 27 08:13:08 2007
Audit | minor: admin : Set : object = /SYS/power_state : value = on : success
Chassis | major: Host has been powered on
Example 1-2  Initiating POST Using the powercycle Command
sc> console
Enter #. to return to ALOM.
2007-07-03 10:25:12.081 0:0:0>@(#)Sun Netra[TM] T5440 POST 4.x.build_119 2007/06/06 09:48
/export/delivery/delivery/4.x/4.x.build_119/post4.x/Niagara/t5440/integrated  (root)
2007-07-03 10:25:12.386 0:0:0>Copyright 2007 Sun Microsystems, Inc. All rights reserved
2007-07-03 10:25:12.550 0:0:0>VBSC cmp0 arg is: 00ff00ff.ffffffff
2007-07-03 10:25:12.653 0:0:0>POST enabling threads: 00ff00ff.ffffffff
2007-07-03 10:25:12.766 0:0:0>VBSC mode is: 00000000.00000001
2007-07-03 10:25:12.867 0:0:0>VBSC level is: 00000000.00000001
2007-07-03 10:25:12.966 0:0:0>VBSC selecting POST MAX Testing.
2007-07-03 10:25:13.066 0:0:0>VBSC setting verbosity level 3
2007-07-03 10:25:13.161 0:0:0>	Niagara2, Version 2.1
2007-07-03 10:25:13.247 0:0:0>	Serial Number: 0fac006b.0e654482
2007-07-03 10:25:13.353 0:0:0>Basic Memory Tests.....
2007-07-03 10:25:13.456 0:0:0>Begin: Branch Sanity Check
2007-07-03 10:25:13.569 0:0:0>End  : Branch Sanity Check
2007-07-03 10:25:13.668 0:0:0>Begin: DRAM Memory BIST
2007-07-03 10:25:13.793 0:0:0>................................................................................................
2007-07-03 10:25:38.399 0:0:0>End  : DRAM Memory BIST
2007-07-03 10:25:39.547 0:0:0>Sys 166 MHz, CPU 1166 MHz, Mem 332 MHz
2007-07-03 10:25:39.658 0:0:0>L2 Bank EFuse = 00000000.000000ff
2007-07-03 10:25:39.760 0:0:0>L2 Bank status = 00000000.00000f0f
2007-07-03 10:25:39.864 0:0:0>Core available Efuse = ffff00ff.ffffffff
2007-07-03 10:25:39.982 0:0:0>Test Memory.....
2007-07-03 10:25:40.070 0:0:0>Begin: Probe and Setup Memory
2007-07-03 10:25:40.181 0:0:0>INFO:	  4096MB at Memory Branch 0
...
2007-07-03 10:29:21.683 0:0:0>INFO:
2007-07-03 10:29:21.686 0:0:0>	POST Passed all devices.
2007-07-03 10:29:21.692 0:0:0>POST:	Return to VBSC.
Example 1-3  POST Error Message
7:2>
7:2>ERROR: TEST = Data Bitwalk
7:2>H/W under test = /SYS/MB/CMP0/BR1/CH0/D0
7:2>Repair Instructions: Replace items in order listed by 'H/W under test' above.
7:2>MSG = Pin 149 failed on /SYS/MB/CMP0/BR1/CH0/D0 (J2001)
7:2>END_ERROR
 
7:2>Decode of Dram Error Log Reg Channel 2 bits 60000000.0000108c
7:2> 1 MEC 62 R/W1C Multiple corrected errors, one or more CE not logged
7:2> 1 DAC 61 R/W1C Set to 1 if the error was a DRAM access CE
7:2> 108c SYND 15:0 RW ECC syndrome.
7:2>
7:2> Dram Error AFAR channel 2 = 00000000.00000000
7:2> L2 AFAR channel 2 = 00000000.00000000
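
When POST output has been captured to a file, the error syntax described in Step 5 can be picked apart mechanically. The following Python sketch is an illustration only and is not part of any Sun tool: the function name parse_post_errors and the dictionary keys are hypothetical, and it assumes that an unprefixed line inside an error block is a wrapped continuation of the preceding field.

```python
import re

# Each line of an error block starts with "c:s>" (core:strand). The syntax
# description shows an optional space before ">", so allow for it.
PREFIX = re.compile(r"^(\d+):(\d+)\s*>\s?(.*)$")

# Field labels from the documented error syntax, mapped to hypothetical keys.
FIELDS = (
    ("H/W under test =", "fru"),
    ("Repair Instructions:", "repair"),
    ("MSG =", "msg"),
)


def parse_post_errors(text):
    """Return one dict per ERROR ... END_ERROR block found in POST output."""
    errors, current, last_key = [], None, None
    for raw in text.splitlines():
        match = PREFIX.match(raw)
        if match is None:
            # Assumption: an unprefixed line continues the previous field.
            if current is not None and last_key and raw.strip():
                current[last_key] += " " + raw.strip()
            continue
        core, strand = int(match.group(1)), int(match.group(2))
        body = match.group(3).strip()
        if body.startswith("ERROR: TEST ="):
            current = {"core": core, "strand": strand,
                       "test": body.split("=", 1)[1].strip()}
            last_key = "test"
        elif current is None:
            continue  # prefixed line outside any error block
        elif body == "END_ERROR":
            errors.append(current)
            current, last_key = None, None
        else:
            for label, key in FIELDS:
                if body.startswith(label):
                    current[key] = body[len(label):].strip()
                    last_key = key
                    break
    return errors


SAMPLE = """7:2>
7:2>ERROR: TEST = Data Bitwalk
7:2>H/W under test = /SYS/MB/CMP0/BR1/CH0/D0
7:2>Repair Instructions: Replace items in order listed by 'H/W
under test' above.
7:2>MSG = Pin 149 failed on /SYS/MB/CMP0/BR1/CH0/D0 (J2001)
7:2>END_ERROR
"""

errors = parse_post_errors(SAMPLE)
for err in errors:
    print(err["core"], err["strand"], err["test"], err["fru"])
```

The sketch reports only what the error block itself states. Use the showfaults command, as described in Step 5, for the fault record kept by the service processor.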
Example 1-4  showfaults Output
ok #.
sc> showfaults
Last POST Run: Wed Jun 27 21:29:02 2007
 
Post Status: Passed all devices
ID FRU                     Fault
0 /SYS/MB/CMP0/BR1/CH0/D0 SP detected fault: /SYS/MB/CMP0/BR1/CH0/D0 Forced fail (POST)

Clearing POST Detected Faults

In most cases, when POST detects a faulty component, POST logs the fault and automatically takes the failed component out of operation by placing it in the ASR blacklist (see Managing Components With Automatic System Recovery Commands).

The replacement of a faulty FRU is usually detected when the service processor is reset or power cycled, and the fault is then cleared from the system automatically. This procedure describes how to identify POST-detected faults and, if necessary, clear them manually.

  1. After replacing a faulty FRU, at the ALOM CMT CLI prompt use the showfaults command to identify POST detected faults.

    POST-detected faults are distinguished from other kinds of faults by the text Forced fail and by the absence of a UUID.

    See POST Detected Fault.

    If no fault is reported, you do not need to do anything else. Do not perform the subsequent steps.

  2. Use the enablecomponent command to clear the fault and remove the component from the ASR blacklist.

    Use the FRU name that was reported in the fault in Step 1. See Using the enablecomponent Command.

    The fault is cleared and should not show up when you run the showfaults command. Additionally, the Service Required LED is no longer on.

  3. Power cycle the server.

    You must reboot the server for the enablecomponent command to take effect.

  4. At the ALOM CMT CLI prompt, use the showfaults command to verify that no faults are reported.

    See Verifying Cleared Faults Using the showfaults Command.
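
If showfaults output has been saved as text, Steps 1 and 2 can be sketched as a small Python helper that picks out the POST-detected faults and builds the matching enablecomponent command lines. This is an illustration under stated assumptions, not a Sun-provided tool: the function names are hypothetical, a fault row is assumed to have the form "ID FRU fault-text" with a FRU name that starts with /SYS and contains no spaces (as in the examples in this section), and only the Forced fail text is checked, not the absence of a UUID. The sketch prints commands for review and does not send anything to the service processor.

```python
import re

# Assumption: a fault row is "<ID> <FRU name> <fault text>", and FRU names
# start with /SYS and contain no spaces, as in the examples in this section.
FAULT_ROW = re.compile(r"^\s*(\d+)\s+(/SYS\S*)\s+(.*)$")


def post_detected_frus(showfaults_text):
    """Return FRU names from rows whose fault text contains 'Forced fail'."""
    frus = []
    for line in showfaults_text.splitlines():
        row = FAULT_ROW.match(line)
        if row and "Forced fail" in row.group(3):
            frus.append(row.group(2))
    return frus


def enable_commands(showfaults_text):
    """Build one enablecomponent command line per POST-detected fault."""
    return ["enablecomponent " + fru
            for fru in post_detected_frus(showfaults_text)]


FAULTED = """Last POST Run: Wed Jun 27 21:29:02 2007

Post Status: Passed all devices
ID FRU                     Fault
0 /SYS/MB/CMP0/BR1/CH0/D0 SP detected fault: /SYS/MB/CMP0/BR1/CH0/D0 Forced fail (POST)
"""

CLEARED = """Last POST run: THU MAR 09 16:52:44 2006
POST status: Passed all devices

No failures found in System
"""

for command in enable_commands(FAULTED):
    print("sc>", command)
```

An empty result, as for the CLEARED text above, corresponds to the case in Step 1 where no fault is reported and no further action is needed.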

Example 1-5  POST Detected Fault
sc> showfaults
Last POST Run: Wed Jun 27 21:29:02 2007
 
Post Status: Passed all devices
ID FRU                     Fault
0 /SYS/MB/CMP0/BR1/CH0/D0 SP detected fault: /SYS/MB/CMP0/BR1/CH0/D0 Forced fail (POST)
Example 1-6  Using the enablecomponent Command
sc> enablecomponent /SYS/MB/CMP0/BR1/CH0/D0 
Example 1-7  Verifying Cleared Faults Using the showfaults Command
sc> showfaults
Last POST run: THU MAR 09 16:52:44 2006
POST status: Passed all devices
 
No failures found in System