Oracle® Server X5-4 Product Notes

Updated: August 2021
 
 

SMI Half-Width Failover Error During Server Boot (20494095)

This issue was fixed in system software release 1.0.4.

On rare occasions during server power on or reset, the system might light the server front-panel CPU Service Action Required indicator and generate an error for the processor and memory subsystems. Single and isolated incidents of this error can be safely ignored. More information is available by logging in to the Oracle ILOM web interface or CLI. Clear the errors using the CLI fault management shell.

Oracle ILOM Web Interface

To investigate the error, log in to the Oracle ILOM web interface. A Service Required state for the Processor and Memory subsystems appears in the Status section of the Oracle ILOM Summary screen. For more information, click the Open Problems link, where each problem is described as follows:

A Scalable Memory Interconnect (SMI) half-width failover has been detected.

Note -  Additional information is provided within the problem definition, including identification of the specific processor (P) and memory riser (MR) card.

To repair the fault, see the workaround procedure below.

Oracle ILOM CLI

To investigate and repair the errors using the Oracle ILOM CLI, see the workaround procedure below.

Workaround

The processor and MR card errors can be repaired using the CLI fault management shell, as described below. If the error persists or memory performance degrades, contact Oracle Service.

  1. In a terminal window, type the following command to start an SSH session with the server’s service processor (SP):

    ssh root@sp-ip-address

    where sp-ip-address is the IP address of the SP.

  2. When the CLI prompt appears (->), navigate to the fault management directory by typing the following command:

    cd /SP/faultmgmt

  3. To view components that are in a fault state, type the following command:

    show

    The components are listed under Targets, as shown in the following example:

    /SP/faultmgmt  
    Targets:
    shell  
    0 (/SYS/MB/P0)  
    1 (/SYS/MB/P0/MR1)

  4. Make note of the processor and MR card numbering:

    For example, the following shows the faulted processor as P0 and the faulted MR card as MR1:

    0 (/SYS/MB/P0) 
    1 (/SYS/MB/P0/MR1)

  5. To start the faultmgmt shell, type the following command:

    start shell

    The system responds:

    Are you sure you want to start /SP/faultmgmt/shell (y/n)?

    To confirm, type: y

    The faultmgmt prompt appears:

    faultmgmtsp>

  6. To repair the processor, type the following command:

    fmadm repair /SYS/MB/P#

    where P# is the number of the processor.

  7. To repair the MR card, type the following command:

    fmadm repair /SYS/MB/P#/MR#

    where P# is the number of the processor and MR# is the number of the MR card.

  8. To exit the faultmgmt shell, type:

    exit

  9. Reboot the server and monitor for repeat occurrences of this issue.
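
When several components are faulted, it can help to derive the repair commands from the Targets listing rather than retyping each path. The following Python sketch is illustrative only and is not part of Oracle ILOM: it assumes the text printed by show in step 3 has been saved or pasted into a string, and that faulted targets appear in the form shown in the example above. The names TARGET_RE, repair_commands, and EXAMPLE_OUTPUT are hypothetical. The script does not connect to the SP; it only prints the fmadm repair lines to type at the faultmgmt prompt.

```python
import re
from typing import List

# Assumption: faulted targets are listed as "<index> (<path>)", for example
# "0 (/SYS/MB/P0)" or "1 (/SYS/MB/P0/MR1)", as in the step 3 example.
TARGET_RE = re.compile(r"^\s*\d+\s+\((/SYS/MB/P\d+(?:/MR\d+)?)\)\s*$")


def repair_commands(show_output: str) -> List[str]:
    """Return one 'fmadm repair <path>' line per faulted target found."""
    commands = []
    for line in show_output.splitlines():
        match = TARGET_RE.match(line)
        if match:
            # group(1) is the component path inside the parentheses.
            commands.append(f"fmadm repair {match.group(1)}")
    return commands


# Example listing copied from step 3 of the procedure.
EXAMPLE_OUTPUT = """\
/SP/faultmgmt
Targets:
shell
0 (/SYS/MB/P0)
1 (/SYS/MB/P0/MR1)
"""

if __name__ == "__main__":
    for command in repair_commands(EXAMPLE_OUTPUT):
        print(command)
```

Lines that do not match the expected pattern, such as the shell target, are skipped, so compare the printed commands against the Targets listing before typing them.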