Solaris 10 What's New

Predictive Self-Healing

This feature is new in the Solaris Express 6/04 release. The Solaris Express 10/04 release and the Solaris 10 3/05 release provided important enhancements.

Sun Microsystems has developed a new architecture for building and deploying systems and services capable of Predictive Self-Healing. Self-healing technology enables Sun systems and services to maximize availability when software and hardware faults occur. In addition, the self-healing technology facilitates a simpler and more effective end-to-end experience for system administrators and service providers, thereby reducing costs. The first major set of new features to result from this initiative is available in the Solaris 10 OS. The Solaris 10 software includes components that facilitate self-healing for CPU, memory, I/O bus nexus components, and system services.

Specific details about the components of this new architecture are covered in the following descriptions for the Solaris Service Manager and the Solaris Fault Manager.

Solaris Service Manager

Introduced in the Solaris Express 10/04 release and enhanced in the Solaris 10 3/05 release, the Solaris Service Manager provides an infrastructure that augments the traditional UNIX startup scripts, init run levels, and configuration files.

For the functions that this infrastructure provides, see Chapter 9, “Managing Services (Overview),” in the System Administration Guide: Basic Administration. The smf(5) man page also gives an overview of the infrastructure.

Solaris Fault Manager

Predictive Self-Healing systems include a simplified administration model. Traditional error messages are replaced by telemetry events that are consumed by software components. These components automatically diagnose the underlying fault or defect and initiate self-healing activities, such as administrator messaging, isolation or deactivation of faulty components, and guided repair. A new software component, the Fault Manager, fmd(1M), manages the telemetry, log files, and components. The new fmadm(1M), fmdump(1M), and fmstat(1M) tools are also available in the Solaris 10 OS for interacting with the Fault Manager and the new log files.
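As a rough illustration of how an administrator might script these tools, the following Python sketch runs each tool in a simple, commonly used form and collects its output. It assumes a Solaris 10 host and sufficient privileges. The names FM_COMMANDS and run_fm_tool are hypothetical helpers introduced here, not part of the Solaris software. On a host where a tool is not installed, the helper returns None instead of failing.

```python
import shutil
import subprocess

# Hypothetical mapping from a short label to the command line to run.
# Only the tool names come from the text above; the choice of
# invocations is an assumption made for illustration.
FM_COMMANDS = {
    "faulty": ["fmadm", "faulty"],  # resources the Fault Manager considers faulty
    "faults": ["fmdump"],           # summary of the fault log
    "stats": ["fmstat"],            # statistics for Fault Manager modules
}


def run_fm_tool(label):
    """Run one Fault Manager tool and return its standard output.

    Returns None when the tool is not found on PATH, so the sketch
    can be tried safely on a non-Solaris host.
    """
    argv = FM_COMMANDS[label]
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    for label in FM_COMMANDS:
        output = run_fm_tool(label)
        print(f"== {label} ==")
        print(output if output is not None else "(tool not found on this host)")
```

See the fmadm(1M), fmdump(1M), and fmstat(1M) man pages for the options that each tool actually supports on a given release.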

When appropriate, the Fault Manager sends a message to the syslogd(1M) service to notify an administrator that a problem has been detected. The message directs administrators to a knowledge article on Sun's new message Web site, http://www.sun.com/msg/, which explains more about the problem impact and appropriate responses and repair actions.
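To illustrate how a logged message might be tied back to its knowledge article, the sketch below extracts a message identifier from a log line and appends it to the message Web site URL. The SUNW-MSG-ID field name, the sample identifier, and the URL layout (site URL followed by the identifier) are assumptions for illustration and are not stated in this document. The helper article_url is hypothetical.

```python
import re

# Message Web site named in the text above.
MSG_SITE = "http://www.sun.com/msg/"

# Assumed field layout in the logged message: "SUNW-MSG-ID: <identifier>, ..."
_MSG_ID = re.compile(r"SUNW-MSG-ID:\s*([A-Za-z0-9-]+)")


def article_url(log_line):
    """Return the assumed knowledge-article URL for a log line, or None
    if the line carries no message identifier."""
    match = _MSG_ID.search(log_line)
    if match is None:
        return None
    return MSG_SITE + match.group(1)


if __name__ == "__main__":
    # Made-up identifier, used only to exercise the helper.
    sample = "SUNW-MSG-ID: EXAMPLE-0000-XX, TYPE: Fault, VER: 1, SEVERITY: Major"
    print(article_url(sample))
```

A line without an identifier yields None, so a caller can skip ordinary syslog entries.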

The Solaris Express 6/04 release introduced self-healing components for automated diagnosis and recovery for UltraSPARC-III and UltraSPARC-IV CPU and memory systems. This release also provided enhanced resilience and telemetry for PCI-based I/O.