Strategy for Diagnostics

This section describes strategies for diagnosing server problems and contains the following topics:

To Diagnose Server Problems
Service Processor
Standalone Package-Based Diagnostics
Offline Operating System-Based Diagnostics
Online Operating System-Based Diagnostics

To Diagnose Server Problems

To be effective, troubleshooting and diagnosis must be systematic and progressive. Follow these steps when diagnosing server problems:

Use the firmware diagnostics to validate the Oracle Integrated Lights Out Manager (Oracle ILOM) service processor (SP) hardware.
Given a stable SP, expand the scope and coverage using the standalone diagnostics.
Use operating system-based diagnostics for full server-level exercises.
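The progressive order of these steps can be sketched as a small runner that stops at the first failing layer. This is only an illustration: the function run_progressive and the three placeholder checks are hypothetical names, not part of Oracle ILOM or any Oracle tool, and a real check would invoke the actual diagnostics.

```python
from typing import Callable, List, Optional, Tuple

# Each stage pairs a label with a check that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_progressive(stages: List[Stage]) -> Optional[str]:
    """Run stages in order and stop at the first failure.

    Returns the label of the failing stage, or None if every stage passed.
    """
    for label, check in stages:
        if not check():
            return label  # do not widen the scope past a failing layer
    return None


# Hypothetical placeholder checks; real ones would invoke the actual tools.
def sp_firmware_ok() -> bool:
    return True


def standalone_ok() -> bool:
    return False  # simulate a fault found by the standalone diagnostics


def os_level_ok() -> bool:
    return True


stages = [
    ("SP firmware diagnostics", sp_firmware_ok),
    ("standalone diagnostics", standalone_ok),
    ("OS-based diagnostics", os_level_ok),
]
first_failure = run_progressive(stages)
```

Because the runner returns as soon as a stage fails, the OS-based stage is never attempted on top of a layer that is already known to be faulty.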
For more information about each element of this approach, see the topics that follow.

Service Processor

The Oracle Integrated Lights Out Manager (Oracle ILOM) SP runs Linux. The first code executed by the SP is a small boot loader known as U-Boot. The U-Boot code performs functions similar to those of the BIOS power-on self-test (POST): it initializes devices, with minimal testing, and then boots the Linux kernel.

Standalone Package-Based Diagnostics

Diagnostics that run before the operating system (OS) is booted can assume complete control of the resources of a subsystem or of the entire system. These diagnostics support the most thorough testing of components, because they control all of the resources under test. However, writing code that manages all resources under test while providing fine-grained control can be quite complex (effectively a lightweight OS tailored to testing). To avoid developing such complex infrastructure, pre-OS diagnostics might instead provide thorough, targeted testing of components in isolation.

Standalone diagnostics are typically run in manufacturing environments or at a customer site during a new server installation. In these environments, the diagnostics can run without the risk of corrupting or destroying customer data. Because the servers are not in use by customers, standalone diagnostics assume that there are no restrictions on resource utilization (for example, they can force CPU or I/O boundary conditions, or both, to achieve effective testing).

Offline Operating System-Based Diagnostics

When diagnostics are written on top of an operating system, the diagnostics can rely on the resources of the OS (for example, process scheduling) to allow simultaneous testing of multiple components. However, some direct control of the components might be lost. That is, to ensure reliable server behavior, the OS will, as necessary, enforce encapsulation of hardware resources and prevent the diagnostics from accessing them directly.

Further, because the OS inherently manages server resources, exercises built on the OS can test multiple subsystems simultaneously.
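As a rough illustration of relying on OS scheduling to exercise more than one component at a time, the following sketch runs two toy exercisers concurrently on a thread pool. The exercisers exercise_cpu and exercise_memory and the helper run_concurrently are hypothetical stand-ins written for this example, not real diagnostic tools, and they touch hardware only indirectly through the OS and the language runtime.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def exercise_cpu(rounds: int = 1000) -> bool:
    """Compute the same hash chain twice and compare the results."""
    def chain() -> bytes:
        data = b"diag"
        for _ in range(rounds):
            data = hashlib.sha256(data).digest()
        return data
    return chain() == chain()


def exercise_memory(size: int = 1 << 16) -> bool:
    """Write alternating bit patterns to a buffer and read them back."""
    buf = bytearray(size)
    for pattern in (0x55, 0xAA):
        buf[:] = bytes([pattern]) * size
        if buf.count(pattern) != size:
            return False
    return True


def run_concurrently(exercisers: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Submit every exerciser at once; the OS schedules the worker threads."""
    with ThreadPoolExecutor(max_workers=len(exercisers)) as pool:
        futures = {name: pool.submit(fn) for name, fn in exercisers.items()}
        return {name: future.result() for name, future in futures.items()}


results = run_concurrently({"cpu": exercise_cpu, "memory": exercise_memory})
```

The sketch never decides which exerciser runs when; that is left to the scheduler, which is the convenience (and the loss of direct control) described above.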

Online Operating System-Based Diagnostics

Online OS diagnostics are similar to offline OS diagnostics in terms of resource support. However, online diagnostics are run at customer sites, so they cannot alter data repositories and must not overutilize server resources (for example, these diagnostics must not consume too many CPU cycles or too much network bandwidth).
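One way to picture the CPU constraint is a duty-cycle throttle: after each unit of test work, the diagnostic sleeps long enough that its busy time stays under a chosen fraction of elapsed time. The sketch below assumes a hypothetical helper named throttled_run and an arbitrary 25 percent cap; neither comes from any Oracle tool.

```python
import time
from typing import Callable


def throttled_run(work_unit: Callable[[], object], units: int,
                  max_duty: float = 0.25) -> float:
    """Run work_unit `units` times, sleeping after each so that the share
    of elapsed time spent working stays at or below max_duty.

    Returns the measured busy fraction.
    """
    if not 0.0 < max_duty <= 1.0:
        raise ValueError("max_duty must be in the range (0, 1]")
    busy = 0.0
    start = time.perf_counter()
    for _ in range(units):
        t0 = time.perf_counter()
        work_unit()
        spent = time.perf_counter() - t0
        busy += spent
        # Choose the pause so that spent / (spent + pause) == max_duty.
        time.sleep(spent * (1.0 - max_duty) / max_duty)
    total = time.perf_counter() - start
    return busy / total if total > 0 else 0.0


# Example: a small arithmetic loop stands in for one unit of test work.
measured = throttled_run(lambda: sum(range(20000)), units=5, max_duty=0.25)
```

A lower max_duty leaves more headroom for customer workloads at the cost of a longer test run, which is the trade-off that limits the effectiveness of online diagnostics.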

Note - Oracle does not expect that customers will run online OS diagnostics since those diagnostics drain compute resources and have limited effectiveness due to their inability to lock resources. The Fault Management Architecture eliminates the need for online diagnostics.

	Oracle® x86 Servers Diagnostics Guide For Servers Supporting Oracle ILOM 3.0.x