About Detecting Underperforming Disks

ASR automatically identifies and removes a poorly performing disk from the active configuration. Recovery Appliance then runs a set of performance tests. When CELLSRV detects poor disk performance, the cell disk status changes to normal - confinedOnline, and the physical disk status changes to warning - confinedOnline. Table 13-2 describes the conditions that trigger disk confinement:

Table 13-2 Alerts Indicating Poor Disk Performance

Alert Code              Cause
----------------------  ------------------------------------------------------
CD_PERF_HANG            Disk stopped responding
CD_PERF_SLOW_ABS        High service time threshold (slow disk)
CD_PERF_SLOW_RLTV       High relative service time threshold (slow disk)
CD_PERF_SLOW_LAT_WT     High latency on writes
CD_PERF_SLOW_LAT_RD     High latency on reads
CD_PERF_SLOW_LAT_RW     High latency on reads and writes
CD_PERF_SLOW_LAT_ERR    Frequent very high absolute latency on individual I/Os
CD_PERF_IOERR           I/O errors

If the problem is temporary and the disk passes the tests, then the disk is brought back into the configuration. If the disk does not pass the tests, then it is marked as poor performance, and ASR submits a service request to replace the disk. If possible, Oracle ASM takes the grid disks offline for testing. Otherwise, the cell disk status stays at normal - confinedOnline until the disks can be taken offline safely. See "Removing an Underperforming Physical Disk".
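The outcomes described above can be summarized as a small decision function. This is only an illustrative model of the documented behavior, not Recovery Appliance code. The function name confinement_outcome and its two boolean parameters are hypothetical, and apart from the normal - confinedOnline status quoted above, the returned strings are descriptive labels rather than actual status values.

```python
def confinement_outcome(can_take_offline: bool, passed_tests: bool) -> str:
    """Model the documented outcomes for a confined disk (illustrative only)."""
    if not can_take_offline:
        # The grid disks cannot be taken offline safely yet, so the cell
        # disk keeps its confined status.
        return "normal - confinedOnline"
    if passed_tests:
        # The problem was temporary: the disk returns to the configuration.
        return "returned to configuration"
    # The tests failed: the disk is marked as poor performance and ASR
    # submits a service request to replace it.
    return "poor performance"


if __name__ == "__main__":
    for offline, passed in [(False, False), (True, True), (True, False)]:
        print(offline, passed, "->", confinement_outcome(offline, passed))
```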

The disk status change is recorded in the server alert history:

MESSAGE ID date_time info "Hard disk entered confinement status. The LUN
 n_m changed status to warning - confinedOnline. CellDisk changed status to normal
 - confinedOnline. Status: WARNING - CONFINEDONLINE  Manufacturer: name  Model
 Number: model  Size: size  Serial Number: serial_number  Firmware: fw_release 
 Slot Number: m  Cell Disk: cell_disk_name  Grid Disk: grid disk 1, grid disk 2
     .
     .
     .
Reason for confinement: threshold for service time exceeded"
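When alert history text has been saved to a file or captured by a monitoring script, the status changes and the confinement reason can be pulled out with regular expressions. The following is a minimal sketch that assumes the message follows the wording of the sample above. The function name parse_confinement_alert and the dictionary keys are hypothetical, and the sample string is an abbreviated copy of the message shown here with the placeholder fields omitted.

```python
import re

# Abbreviated copy of the sample message above, including its line wrapping.
SAMPLE = '''"Hard disk entered confinement status. The LUN
 n_m changed status to warning - confinedOnline. CellDisk changed status to normal
 - confinedOnline. Status: WARNING - CONFINEDONLINE
Reason for confinement: threshold for service time exceeded"'''


def parse_confinement_alert(message: str) -> dict:
    """Extract the LUN status, cell disk status, and reason, if present."""
    # Collapse line wraps and repeated spaces so phrases match across lines.
    text = " ".join(message.split())
    patterns = {
        "lun_status": r"LUN \S+ changed status to (.+?)\.",
        "celldisk_status": r"CellDisk changed status to (.+?)\.",
        "reason": r'Reason for confinement: (.+?)"?$',
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        result[key] = match.group(1) if match else None
    return result


if __name__ == "__main__":
    print(parse_confinement_alert(SAMPLE))
```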

These messages are entered in the storage cell alert log:

CDHS: Mark cd health state change cell_disk_name  with newState HEALTH_BAD_ONLINE
pending HEALTH_BAD_ONLINE ongoing INVALID cur HEALTH_GOOD
Celldisk entering CONFINE ACTIVE state with cause CD_PERF_SLOW_ABS activeForced: 0
inactiveForced: 0 trigger HistoryFail: 0, forceTestOutcome: 0 testFail: 0
global conf related state: numHDsConf: 1 numFDsConf: 0 numHDsHung: 0 numFDsHung: 0
     .
     .
     .
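The cause code in these alert log entries corresponds to an entry in Table 13-2. The sketch below shows one way to extract the code and the confinement counters from captured log text and look up the cause. It assumes the log lines follow the format of the sample above. The names ALERT_CAUSES, confinement_cause, and confinement_counters are hypothetical helpers, not part of any Oracle tool.

```python
import re

# Alert codes and causes, copied from Table 13-2.
ALERT_CAUSES = {
    "CD_PERF_HANG": "Disk stopped responding",
    "CD_PERF_SLOW_ABS": "High service time threshold (slow disk)",
    "CD_PERF_SLOW_RLTV": "High relative service time threshold (slow disk)",
    "CD_PERF_SLOW_LAT_WT": "High latency on writes",
    "CD_PERF_SLOW_LAT_RD": "High latency on reads",
    "CD_PERF_SLOW_LAT_RW": "High latency on reads and writes",
    "CD_PERF_SLOW_LAT_ERR": "Frequent very high absolute latency on individual I/Os",
    "CD_PERF_IOERR": "I/O errors",
}

# Two lines taken from the sample alert log entries above.
LOG_TEXT = (
    "Celldisk entering CONFINE ACTIVE state with cause CD_PERF_SLOW_ABS activeForced: 0\n"
    "global conf related state: numHDsConf: 1 numFDsConf: 0 numHDsHung: 0 numFDsHung: 0\n"
)


def confinement_cause(log_text: str):
    """Return (alert_code, cause) for the first cause found, else None."""
    match = re.search(r"with cause (CD_PERF_\w+)", log_text)
    if match is None:
        return None
    code = match.group(1)
    return code, ALERT_CAUSES.get(code, "unknown alert code")


def confinement_counters(log_text: str) -> dict:
    """Return the num* counters (for example numHDsConf) as integers."""
    return {name: int(value) for name, value in re.findall(r"(num\w+): (\d+)", log_text)}


if __name__ == "__main__":
    print(confinement_cause(LOG_TEXT))
    print(confinement_counters(LOG_TEXT))
```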