System Health Check Overview

The server monitors itself with a self-diagnostic utility called syscheck. The syscheck utility tests the server hardware and platform software: each test verifies the health of the server and the platform software, and syscheck also verifies that the required application software is present.

If the syscheck utility detects a problem, it generates an alarm code. An alarm code is a 16-character hexadecimal string. Alarm codes are ranked by severity as critical, major, or minor. Alarm Categories lists the platform alarms and their alarm codes.
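The alarm code format described above can be checked mechanically. The following Python sketch is illustrative only: the Severity enum, the Alarm class, and the function names are hypothetical and are not part of syscheck. Because this section does not say how severity is encoded within the code itself, the sketch stores the severity alongside the code instead of deriving it.

```python
import re
from dataclasses import dataclass
from enum import IntEnum
from typing import List

# Hypothetical ranking: a lower value means a more severe alarm.
class Severity(IntEnum):
    CRITICAL = 1
    MAJOR = 2
    MINOR = 3

# An alarm code is exactly 16 hexadecimal characters.
_ALARM_CODE_RE = re.compile(r"[0-9A-Fa-f]{16}")

def is_valid_alarm_code(code: str) -> bool:
    """Return True if `code` is exactly 16 hexadecimal characters."""
    return _ALARM_CODE_RE.fullmatch(code) is not None

@dataclass(frozen=True)
class Alarm:
    code: str
    severity: Severity  # supplied separately; not decoded from `code`

    def __post_init__(self) -> None:
        if not is_valid_alarm_code(self.code):
            raise ValueError(f"invalid alarm code: {self.code!r}")

def most_severe_first(alarms: List[Alarm]) -> List[Alarm]:
    """Order alarms critical, then major, then minor (stable within a severity)."""
    return sorted(alarms, key=lambda a: a.severity)
```

Sorting a list of alarms with most_severe_first places critical alarms ahead of major and minor ones, which mirrors the severity ranking given above.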

The syscheck output can be in either of the following forms (see Health Check Outputs for output examples):

The syscheck utility can be run in the following ways:

Functions Checked by syscheck

Table 1 summarizes the functions checked by syscheck.

Table 1 System Health Check Operation
System Check: Function
Disk Access: Verify that disk read and write functions remain operable. This test attempts to write test data to the file system. If the test shows that the disk is not usable, an alarm is reported to indicate that the file system cannot be written to.
Smart: Verify that the smartd service has not reported any problems.
File System: Verify that the file systems have enough space available to operate. The check determines which file systems are currently mounted and checks each one accordingly. A file system failure is reported if certain thresholds are exceeded, if the file system size is incorrect, or if the partition cannot be found. Threshold alarms are reported in a similar manner.
Memory: Verify that 8 GB of RAM is installed.
Network: Verify that all ports are functioning by pinging each network connection (provisioning, sync, and DSM networks), and check the configuration of the default route.
Process: Verify that the following critical processes are running. If a program is running fewer than the minimum required number of processes, an alarm is reported. An alarm is also reported if more than the recommended number of processes are running.
  • sshd (Secure Shell daemon)
  • ntpd (NTP daemon)
  • syscheck (System Health Check daemon)
Hardware Configuration: Verify that the processor is running at an appropriate speed and matches the processor required for the server. Alarms are reported when a processor is not available as expected.
Cooling Fans: Verify that no fan alarm is present. A fan alarm is issued if any fan runs outside its expected RPM range.
Voltages: Measure all monitored voltages on the server main board and verify that each is within its expected operating range.
Temperature: Measure the following temperatures and verify that each is within its specified range.
  • Inlet and outlet temperatures
  • Processor internal temperature
  • MCH internal temperature
MPS Platform: Provide an alarm if internal diagnostics detect any other error, such as a failure of the server syscheck scripts.
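The Disk Access and File System checks in Table 1 can be approximated as follows. This is a minimal Python sketch under stated assumptions, not the syscheck implementation: the function names and the 80 percent (minor) and 90 percent (major) usage thresholds are hypothetical, because the actual thresholds are not given in this section.

```python
import os
import shutil
import tempfile
from typing import Optional

def disk_is_writable(directory: str) -> bool:
    """Write a small test file in `directory`, read it back, and delete it.

    Returns False if any step fails, which corresponds to the
    "file system cannot be written to" condition described for Disk Access.
    """
    payload = b"disk-access-test"
    try:
        # The temporary file is removed automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory) as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # push the data to the device, not just the page cache
            f.seek(0)
            return f.read() == payload
    except OSError:
        return False

def usage_percent(path: str) -> float:
    """Percentage of the file system containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

# Hypothetical thresholds; the real syscheck values are not stated here.
MINOR_PCT = 80.0
MAJOR_PCT = 90.0

def classify_usage(pct: float, minor_pct: float = MINOR_PCT,
                   major_pct: float = MAJOR_PCT) -> Optional[str]:
    """Map a usage percentage to an alarm severity, or None if below both thresholds."""
    if pct >= major_pct:
        return "major"
    if pct >= minor_pct:
        return "minor"
    return None
```

A caller might run classify_usage(usage_percent(mount_point)) for each mounted file system. Keeping the threshold logic separate from the measurement makes the thresholds easy to change and to test without filling a real disk.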
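The Process check and the range checks for cooling fans, voltages, and temperatures in Table 1 each reduce to comparing a measured value against limits. The Python sketch below assumes a Linux host, where each process's command name is readable from /proc/<pid>/comm; the function names, sensor names, and limit values are hypothetical and are not taken from syscheck.

```python
import os
from typing import Dict, List, Optional, Tuple

def count_processes(name: str, proc_root: str = "/proc") -> int:
    """Count processes whose command name equals `name` (Linux /proc only).

    The kernel truncates /proc/<pid>/comm to 15 characters, which is
    long enough for sshd, ntpd, and syscheck.
    """
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    count += 1
        except OSError:
            continue  # the process exited, or its entry is not readable
    return count

def process_count_alarm(name: str, running: int, minimum: int, maximum: int) -> Optional[str]:
    """Return an alarm message if `running` is outside [minimum, maximum], else None."""
    if running < minimum:
        return f"{name}: {running} running, minimum is {minimum}"
    if running > maximum:
        return f"{name}: {running} running, maximum is {maximum}"
    return None

def readings_out_of_range(readings: Dict[str, float],
                          limits: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return, sorted, the sensors whose reading lies outside its inclusive (low, high) limits.

    Sensors without an entry in `limits` are ignored.
    """
    return sorted(
        name for name, value in readings.items()
        if name in limits and not (limits[name][0] <= value <= limits[name][1])
    )
```

For example, process_count_alarm("ntpd", count_processes("ntpd"), 1, 1) would report an alarm if ntpd is not running exactly once; those bounds are illustrative. The same readings_out_of_range helper can serve fan RPM, voltage, and temperature sensors, since each check differs only in its limits.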