Predictive Self-Healing provides a simplified administration model. Traditional error messages are replaced by telemetry events that are consumed by software components, which automatically diagnose the underlying fault or defect and initiate self-healing activities. Examples of self-healing activities include administrator messaging, isolation or deactivation of faulty components, and guided repair. A new daemon, the Fault Manager, fmd(1M), manages the telemetry, the log files, and these software components. The Solaris 10 OS also provides the new fmadm(1M), fmdump(1M), and fmstat(1M) tools for interacting with the Fault Manager and its new log files.
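The flow described above (telemetry in, diagnosis, then self-healing responses) can be modeled in a few lines. This is a minimal conceptual sketch, not the Fault Manager's implementation or module interface: the `TelemetryEvent` and `Diagnosis` types, the event and fault class strings in `RULES`, and the response names such as `retire_page` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical model of the Predictive Self-Healing flow: telemetry events
# are consumed by a diagnosis step, which names a fault and selects
# self-healing responses. None of these names are the real fmd interfaces.


@dataclass(frozen=True)
class TelemetryEvent:
    event_class: str  # hypothetical error class reported by hardware/software
    resource: str     # the component that reported the error


@dataclass(frozen=True)
class Diagnosis:
    fault: str        # hypothetical name of the diagnosed fault
    resource: str     # the component judged to be faulty
    actions: tuple    # self-healing responses to initiate


# Illustrative rule table: event class -> (diagnosed fault, responses).
RULES = {
    "ereport.mem.ce": ("fault.memory.page", ("notify_admin", "retire_page")),
    "ereport.cpu.ue": ("fault.cpu.failed", ("notify_admin", "offline_cpu")),
}


def diagnose(event):
    """Map one telemetry event to a Diagnosis, or None if no rule matches."""
    rule = RULES.get(event.event_class)
    if rule is None:
        return None
    fault, actions = rule
    return Diagnosis(fault=fault, resource=event.resource, actions=actions)


def self_heal(events):
    """Consume a stream of telemetry events and collect the diagnoses."""
    results = []
    for event in events:
        diagnosis = diagnose(event)
        if diagnosis is not None:
            results.append(diagnosis)
    return results


if __name__ == "__main__":
    sample = [
        TelemetryEvent("ereport.mem.ce", "dimm0"),
        TelemetryEvent("ereport.io.unknown", "pci0"),  # no rule: ignored
    ]
    for d in self_heal(sample):
        print(d.fault, d.resource, d.actions)
```

The point of the separation is that the administrator no longer interprets raw error messages; a diagnosis component turns a stream of events into a named fault plus a set of responses, which is the role the text assigns to the components that fmd manages.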
When appropriate, the Fault Manager sends a message to the syslogd(1M) service to notify an administrator that a problem has been detected. The message directs the administrator to a knowledge article on Sun's new message Web site, http://www.sun.com/msg/, which explains the impact of the problem and the appropriate responses and repair actions.
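As an illustration of how an administrator's script might follow such a message to its knowledge article, here is a small sketch. It assumes the logged message carries a message identifier in a field labeled `SUNW-MSG-ID:` and that the article lives at the base URL above followed by that identifier; both are assumptions about the message format, and the sample identifier `SUN4U-8000-00` is hypothetical.

```python
import re

# Assumption: the syslog message contains "SUNW-MSG-ID: <ID>", where <ID>
# is a hyphen-separated identifier such as the hypothetical SUN4U-8000-00.
MSG_ID_PATTERN = re.compile(r"SUNW-MSG-ID:\s*([A-Z0-9]+(?:-[A-Z0-9]+)+)")

# Base URL of the message Web site, as given in the text.
BASE_URL = "http://www.sun.com/msg/"


def knowledge_article_url(syslog_line):
    """Return the knowledge-article URL for a logged message, or None."""
    match = MSG_ID_PATTERN.search(syslog_line)
    if match is None:
        return None
    return BASE_URL + match.group(1)


if __name__ == "__main__":
    # Hypothetical log line; real message text may differ.
    line = "fmd: SUNW-MSG-ID: SUN4U-8000-00, TYPE: Fault, SEVERITY: Major"
    print(knowledge_article_url(line))
```

Returning `None` for lines without an identifier lets the function be applied to every line of a log file, picking out only the Fault Manager notifications.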
The Solaris Express 6/04 release introduced self-healing components that provide automated diagnosis and recovery for the CPU and memory subsystems of UltraSPARC-III and UltraSPARC-IV systems. This release also provided enhanced resilience and telemetry for PCI-based I/O.