Messages and warnings are not automatically signs of problems. The Fibre Channel protocol and the host drivers are designed to be robust. Occasionally, warnings or messages are written to the console that do not indicate failures but tend to alarm users.
Most peripherals perform internal retries, often without generating any output. Disk drive firmware has fairly complex retry algorithms that retry failed operations and report an actual failure only when the retry count is exhausted. Sun's driver philosophy is to generate these messages and warnings so that real problems are easier to diagnose. The bottom line is that messages and warnings are not always cause for alarm. The following are some common messages and warnings, with some insight into what lies behind them.
Messages are informational only and do not imply a failure condition. Messages are sent to the console without any preface (such as WARNING or FATAL ERROR).
Nov 12 14:46:53 kapila unix: ID[SUNWssa.socal.link.5010] socal1: port 1: Fibre Channel is OFFLINE
(Other messages or warnings)
Nov 12 14:48:53 kapila unix: ID[SUNWssa.socal.link.5010] socal1: port 1: Fibre Channel is ONLINE
From time to time, a Fibre Channel loop may be re-initialized, and service to the loop is momentarily suspended during the initialization.
Common causes of OFFLINE/ONLINE (loop re-initialization)
Soft or hard addition or removal of a device on the loop
Power cycle of a device on the loop
Forced loop-init by driver recovery algorithms
Disk array reset following a download
Temporary loss of sync on the loop
All outstanding commands on the affected loop are automatically retried as soon as the loop initialization is complete, and normal operation resumes.
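As an illustration of how an OFFLINE/ONLINE pair can be read from a log, the following minimal sketch pairs the two messages per port and reports how long the loop was unavailable. The regular expression is an assumption derived only from the two sample lines above, not a documented message format. Syslog lines carry no year, so `year` is a hypothetical parameter with an arbitrary default.

```python
import re
from datetime import datetime

# Assumed pattern, inferred from the sample OFFLINE/ONLINE lines only.
LINK_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ unix: "
    r"ID\[\S+\] (?P<dev>\w+): port (?P<port>\d+): "
    r"Fibre Channel is (?P<state>OFFLINE|ONLINE)"
)

def loop_outages(lines, year=2000):
    """Yield (device, port, offline_time, online_time) for each matched pair."""
    pending = {}  # (device, port) -> time of the last unmatched OFFLINE
    for line in lines:
        m = LINK_RE.match(line)
        if not m:
            continue  # any other message or warning is ignored
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        key = (m["dev"], int(m["port"]))
        if m["state"] == "OFFLINE":
            pending[key] = when
        elif key in pending:
            yield key[0], key[1], pending.pop(key), when
```

Applied to the two sample lines, this yields one outage for socal1 port 1 lasting two minutes. An ONLINE message without a preceding OFFLINE is silently skipped.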
Warnings indicate a non-fatal error. Typically, retry logic takes care of the problem. Warning messages are prefaced at the console with the keyword WARNING.
14:43:01 kapila unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,socal@2,0/sf@1,0/ssd@0,0 (ssd10):
Nov 12 14:43:01 kapila unix: SCSI transport failed: reason 'timeout': retrying command
The command is retried and normal operation continues. Sometimes the timeout is accompanied by a loop reset (see OFFLINE/ONLINE sequences). These events are normal and are no cause for alarm unless they occur more than five times in 24 hours. No data is lost or corrupted, and the commands complete on a subsequent retry.
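The five-per-24-hours guideline can be checked mechanically. The sketch below assumes the timestamps of the 'timeout' warnings have already been extracted from the log as `datetime` values. The function name and its parameters are hypothetical, and treating the 24 hours as a sliding window rather than a calendar day is an interpretation, not something this text specifies.

```python
from datetime import datetime, timedelta

def exceeds_timeout_rate(timestamps, limit=5, window=timedelta(hours=24)):
    """Return True if more than `limit` events fall within any one `window`."""
    events = sorted(timestamps)
    start = 0  # index of the oldest event still inside the window
    for end, when in enumerate(events):
        # Drop events that are a full window or more older than this one.
        while when - events[start] >= window:
            start += 1
        if end - start + 1 > limit:
            return True
    return False
```

For example, six timeouts spaced one hour apart exceed the guideline, while six timeouts spaced six hours apart do not.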
Fibre Channel loops are specified to have a bit error rate (BER) of less than 10^-12. The actual BER is better than 10^-13 and may be as low as 10^-15. However, an occasional bit error can still corrupt a frame. Because corrupted frames are discarded, the result is a command that fails to complete and is eventually timed out by the ssd driver. A warning indicating a command timeout is then written to the console.
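To put the BER figures in perspective, the following sketch converts a bit error rate into an expected time between bit errors. It reads the figures above as powers of ten (10^-12, 10^-15). The line rate of 1.0625 Gbit/s and the continuously busy link are assumptions made for illustration and are not stated in this text. A lightly loaded loop would see proportionally fewer errors.

```python
# Assumption: 1.0625 Gbit/s line rate with the link carrying bits continuously.
LINE_RATE_BPS = 1.0625e9

def seconds_between_errors(ber, bits_per_second=LINE_RATE_BPS):
    """Expected seconds between bit errors at the given bit error rate."""
    return 1.0 / (ber * bits_per_second)

if __name__ == "__main__":
    for ber in (1e-12, 1e-13, 1e-15):
        secs = seconds_between_errors(ber)
        print(f"BER {ber:.0e}: one bit error every {secs:,.0f} s "
              f"({secs / 86400:.2f} days)")
```

Under these assumptions, a BER of 10^-12 corresponds to roughly one bit error every 16 minutes, and a BER of 10^-15 to roughly one every 11 days.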
Nov 12 14:45:09 kapila unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,socal@2,0/sf@0,0/ssd@1,0 (ssd33):
Nov 12 14:45:09 kapila unix: SCSI transport failed: reason 'tran_err': retrying command
Some warnings indicating transport errors can be expected when the link is temporarily unavailable during a loop re-initialization. For example, several may accompany an OFFLINE/ONLINE sequence. The affected commands are retried after the loop is re-initialized.
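One way to judge whether a 'tran_err' warning accompanies a loop re-initialization is to compare its timestamp with the OFFLINE/ONLINE times. The sketch below is illustrative only. The function name is hypothetical, and the two-minute `slack` margin is an arbitrary assumption, since this text does not define how close in time "accompanying" means.

```python
from datetime import datetime, timedelta

def near_reinit(event_time, outages, slack=timedelta(minutes=2)):
    """True if event_time lies within `slack` of any (offline, online) interval."""
    return any(
        offline - slack <= event_time <= online + slack
        for offline, online in outages
    )
```

With the sample times in this section, the 'tran_err' warning at 14:45:09 falls within two minutes of the 14:46:53 OFFLINE message, whereas the 'timeout' warning at 14:43:01 does not.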