Sun Enterprise Network Array Installation Supplement

Messages and Warnings

Messages and warnings are not automatically signs of problems. The Fibre Channel protocol and the host drivers are designed to be robust. Occasionally, warnings or messages are written to the console that do not indicate failures but tend to alarm users.

Most peripherals perform internal retries, often without generating any output. Disk drive firmware has fairly complex retry algorithms that retry failed operations and report an actual failure only when the retry count is exhausted. Sun's driver philosophy, by contrast, is to generate these messages and warnings so that real problems are easier to diagnose. The bottom line is that messages and warnings are not always cause for alarm. The following are some common messages and warnings, with some insight into what lies behind them.

Messages

Messages are informational only and do not imply a failure condition. Messages are sent to the console without any preface (such as WARNING or FATAL ERROR).

OFFLINE/ONLINE Message Sequences

 Nov 12 14:46:53 kapila unix: ID[SUNWssa.socal.link.5010] socal1: port 1: Fibre Channel is OFFLINE
 (Other messages or warnings)
 Nov 12 14:48:53 kapila unix: ID[SUNWssa.socal.link.5010] socal1: port 1: Fibre Channel is ONLINE

The Fibre Channel loops may be re-initialized from time to time, which momentarily suspends service to the loop while the initialization runs.

Common causes of OFFLINE/ONLINE (loop re-initialization)

Warnings

Warnings indicate a non-fatal error. Typically, retry logic takes care of the problem. Warning messages are prefaced at the console with the keyword WARNING.

timeout Warning

 Nov 12 14:43:01 kapila unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,socal@2,0/sf@1,0/ssd@0,0 (ssd10):
 Nov 12 14:43:01 kapila unix: SCSI transport failed: reason 'timeout': retrying command

This command is retried and normal operation continues. Sometimes the timeout is accompanied by a loop reset (see OFFLINE/ONLINE Message Sequences). These events are normal and are no cause for alarm unless they occur more than five times in 24 hours. No data is lost or corrupted, and the commands complete on a subsequent retry.
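The five-per-24-hours guideline above can be checked against a saved copy of the console log. The following is a minimal sketch, not a Sun-supplied tool: the function names are hypothetical, the year must be passed in because the log timestamps do not carry one, and English month abbreviations are assumed.

```python
import re
from datetime import datetime, timedelta

# Matches the second line of the timeout warning shown above, for example:
# "Nov 12 14:43:01 kapila unix: SCSI transport failed: reason 'timeout': retrying command"
TIMEOUT_RE = re.compile(
    r"^\s*(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+unix:\s+"
    r"SCSI transport failed: reason 'timeout'"
)

def timeout_times(lines, year):
    """Return the sorted timestamps of all timeout warnings in the log lines."""
    times = []
    for line in lines:
        m = TIMEOUT_RE.match(line)
        if m:
            # The log stamp has no year, so the caller supplies one (assumption).
            times.append(
                datetime.strptime(f"{year} {m.group('stamp')}", "%Y %b %d %H:%M:%S")
            )
    return sorted(times)

def max_in_window(times, window=timedelta(hours=24)):
    """Largest number of events that fall inside any single sliding window."""
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Advance the window start until it is less than one window behind t.
        while t - times[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best

def exceeds_threshold(lines, year, limit=5):
    """True if more than `limit` timeout warnings occur within any 24 hours."""
    return max_in_window(timeout_times(lines, year)) > limit
```

For example, `exceeds_threshold(open("messages").readlines(), 1997)` would scan a file named `messages` (a placeholder path and year). A sliding window is used rather than calendar days so that a burst spanning midnight is still counted together.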

Fibre Channel loops are specified to have a bit error rate (BER) of less than 10^-12. The actual BER is better than 10^-13 and may be as clean as 10^-15. However, an occasional bit error can still corrupt a frame. Because corrupted frames are discarded, the affected command fails to complete and is eventually timed out by the ssd driver. A warning indicating a command timeout is then written to the console.
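To put the bit error rates above in perspective, the mean time between bit errors can be estimated from the BER and the line rate. The sketch below assumes a line rate of 1.0625 Gbit/s and a loop that transmits continuously; neither figure is stated in this section, so treat both as assumptions and adjust them for the actual configuration.

```python
# Assumed line rate in bits per second (not given in the text above).
LINE_RATE_BITS_PER_SEC = 1.0625e9

def seconds_between_errors(ber, bits_per_sec=LINE_RATE_BITS_PER_SEC):
    """Mean time between bit errors on a link that transmits continuously."""
    return 1.0 / (ber * bits_per_sec)

# The three BER figures quoted above: specified limit, typical, and best case.
for ber in (1e-12, 1e-13, 1e-15):
    secs = seconds_between_errors(ber)
    print(f"BER {ber:.0e}: one bit error every {secs / 3600:.1f} hours on average")
```

Under these assumptions the interval grows in direct proportion as the BER improves, so a loop at the cleaner end of the range sees bit errors far less often than one at the specified limit. A loop that is partly idle moves fewer bits and sees correspondingly fewer errors.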

tran_err Warning

 Nov 12 14:45:09 kapila unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,socal@2,0/sf@0,0/ssd@1,0 (ssd33):
 Nov 12 14:45:09 kapila unix: SCSI transport failed: reason 'tran_err': retrying command

Some warnings indicating transport errors can be expected when the link is temporarily unavailable during a loop re-initialization. For example, several of these warnings may accompany an OFFLINE/ONLINE sequence. The affected commands are retried after the loop is re-initialized.
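One way to confirm that tran_err warnings belong to a loop re-initialization is to pair each OFFLINE message with the following ONLINE message and see which warnings fall inside or near that interval. The following is a rough sketch under stated assumptions: the function names and the 30-second margin are hypothetical, the year is supplied by the caller, and all ports are treated as a single loop for simplicity.

```python
import re
from datetime import datetime, timedelta

STAMP = r"(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})"
LINK_RE = re.compile(STAMP + r".*Fibre Channel is (?P<state>OFFLINE|ONLINE)")
TRAN_RE = re.compile(STAMP + r".*SCSI transport failed: reason 'tran_err'")

def parse_stamp(stamp, year):
    # The log stamp has no year, so the caller supplies one (assumption).
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")

def reinit_windows(lines, year):
    """Pair each OFFLINE message with the next ONLINE message."""
    windows = []
    offline_at = None
    for line in lines:
        m = LINK_RE.search(line)
        if not m:
            continue
        t = parse_stamp(m.group("stamp"), year)
        if m.group("state") == "OFFLINE":
            if offline_at is None:
                offline_at = t  # keep the first OFFLINE of a run
        elif offline_at is not None:
            windows.append((offline_at, t))
            offline_at = None
    return windows

def unexplained_tran_errs(lines, year, margin=timedelta(seconds=30)):
    """Timestamps of tran_err warnings outside every re-initialization window."""
    windows = reinit_windows(lines, year)
    leftover = []
    for line in lines:
        m = TRAN_RE.search(line)
        if not m:
            continue
        t = parse_stamp(m.group("stamp"), year)
        if not any(start - margin <= t <= end + margin for start, end in windows):
            leftover.append(t)
    return leftover
```

Warnings returned by `unexplained_tran_errs` are the ones worth a closer look, since they are not accounted for by any OFFLINE/ONLINE sequence in the same log.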