sd - SCSI disk and ATAPI/SCSI CD-ROM device driver
sd@target,lun:partition
To open a device without checking if the vtoc is valid, use the O_NDELAY flag. When the device is opened using O_NDELAY, the first read or write to the device that happens after the open results in the label being read if the label is not currently valid. Once read, the label remains valid until the last close of the device. Except for reading the label, O_NDELAY has no impact on the driver.
The sd SCSI and SCSI/ATAPI driver supports embedded SCSI-2 and CCS-compatible SCSI disk and CD-ROM drives, ATAPI 2.6 (SFF-8020i)-compliant CD-ROM drives, SFF-8090-compliant SCSI/ATAPI DVD-ROM drives, IOMEGA SCSI/ATAPI ZIP drives, SCSI JAZ drives, and USB mass storage devices (refer to scsa2usb(4D)).
To determine the disk drive type, use the SCSI/ATAPI inquiry command and read the volume label stored on block 0 of the drive. (The volume label describes the disk geometry and partitioning and must be present for the disk to be mounted by the system.) A volume label is not required for removable, re-writable or read-only media.
The sd driver supports embedded SCSI-2 and CCS-compatible SCSI disk and CD-ROM drives, ATAPI 2.6 (SFF-8020i)-compliant CD-ROM drives, SFF-8090-compliant SCSI/ATAPI DVD-ROM drives, IOMEGA SCSI/ATAPI ZIP drives, and SCSI JAZ drives.
Legacy x86 BIOS booting requires a master boot record (MBR) and fdisk table in the first physical sector of the bootable media. If the x86 hard disk contains a Solaris disk label, it is located in the second 512-byte sector of the fdisk partition.
Block files access the disk using the normal buffering mechanism and are read from and written to without regard to physical disk records. A raw interface enables direct transmission between the disk and the user's read or write buffer. A single read or write call usually results in a single I/O operation; raw I/O is therefore more efficient when many bytes are transmitted. Block file names are found in /dev/dsk; raw file names are found in /dev/rdsk.
I/O requests to the raw device must be aligned on a 512-byte (DEV_BSIZE) boundary and all I/O request lengths must be in multiples of 512 bytes. Requests that do not meet these requirements will trigger an EINVAL error. There are no alignment or length restrictions on I/O requests to the block device.
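The raw-device rule above can be checked before a request is issued. The following Python sketch is illustrative only: it assumes "aligned" refers to the starting byte offset of the request, it hard-codes DEV_BSIZE to the 512 bytes stated above, and check_raw_request and round_up are hypothetical helper names, not part of any Solaris interface.

```python
DEV_BSIZE = 512  # raw-device block size stated in the text above


def check_raw_request(offset, length):
    """Return True if a raw-device request meets the stated rule.

    Hypothetical helper: both the starting offset and the length must be
    multiples of DEV_BSIZE; a request that fails this check is the kind
    the text says is rejected with EINVAL.
    """
    return offset % DEV_BSIZE == 0 and length % DEV_BSIZE == 0


def round_up(length):
    """Round a length up to the next multiple of DEV_BSIZE."""
    return (length + DEV_BSIZE - 1) // DEV_BSIZE * DEV_BSIZE
```

A caller that needs to read 1000 bytes from the raw device would, under these assumptions, request round_up(1000) bytes at an offset that is itself a multiple of 512 and discard the excess.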
A CD-ROM disk is single-sided and contains approximately 640 megabytes of data or 74 minutes of audio. When the CD-ROM is opened, the eject button is disabled to prevent manual removal of the disk until the last close() is called. No volume label is required for a CD-ROM. The disk geometry and partitioning information are constant and never change. If the CD-ROM contains data recorded in a Solaris-supported file system format, it can be mounted using the appropriate Solaris file system support.
DVD-ROM media can be single or double-sided and can be recorded using a single or double layer structure. Double-layer media provides parallel or opposite track paths. A DVD-ROM can hold between 4.5 Gbytes and 17 Gbytes of data, depending on the layer structure used for recording and on whether the DVD-ROM is single or double-sided.
When the DVD-ROM is opened, the eject button is disabled to prevent the manual removal of a disk until the last close() is called. No volume label is required for a DVD-ROM. If the DVD-ROM contains data recorded in a Solaris-supported file system format, it can be mounted using the appropriate Solaris file system support.
ZIP/JAZ media provide varied data capacity points; a single JAZ drive can store up to 2 GBytes of data, while a ZIP-250 can store up to 250 MBytes of data. ZIP/JAZ media can be read from or written to using the appropriate drive.
When a ZIP/JAZ drive is opened, the eject button is disabled to prevent the manual removal of a disk until the last close() is called. No volume label is required for a ZIP/JAZ drive. If the ZIP/JAZ drive contains data recorded in a Solaris-supported file system format, it can be mounted using the appropriate Solaris file system support.
Each device maintains I/O statistics for the device and for partitions allocated for that device. For each device/partition, the driver accumulates reads, writes, bytes read, and bytes written. The driver also takes high-resolution time stamps at queue entry and exit points to enable monitoring of residence time and cumulative residence-length product for each queue.
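As a rough illustration of how a cumulative residence-length product can be used, the Python sketch below derives an average queue length and an average residence time from the change in that product over a sampling interval. The function and parameter names are hypothetical, and the sketch assumes the product is accumulated as queue length multiplied by elapsed nanoseconds; it does not read any real driver statistics.

```python
def queue_averages(lentime_delta_ns, elapsed_ns, ops_completed):
    """Derive averages from a residence-length product (hypothetical inputs).

    lentime_delta_ns: change in the cumulative (queue length x time) product
    elapsed_ns:       length of the sampling interval in nanoseconds
    ops_completed:    number of requests that left the queue in the interval
    """
    # Average number of requests resident in the queue during the interval.
    avg_queue_length = lentime_delta_ns / elapsed_ns
    # Average time each completed request spent in the queue.
    avg_residence_ns = lentime_delta_ns / ops_completed if ops_completed else 0.0
    return avg_queue_length, avg_residence_ns
```

For instance, a product that grows by 2,000,000,000 over a one-second interval with 100 completions corresponds to an average of two queued requests and 20 ms per request.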
Not all device drivers make per-partition IO statistics available for reporting. sd and ssd(4D) per-partition statistics are enabled by default but may be disabled in their configuration files.
Based on the implementation of SCSI FMA phase III, the sd driver can send FMA telemetry (ereports) when it detects an error condition. The ereports detail what is happening at the kernel driver level.
Ereports (error reports) are generated upon the detection of an abnormal condition, recorded in persistent storage (for example a file system) in binary format, and used as input to automated diagnosis engines.
An ereport is described by its event class (hierarchy path) and a payload of name-value pairs that can be used for diagnosis and logging.
Six new ereports are introduced by SCSI FMA:
Media error
Device error
SCSI command status error
Unexpected data error
SCSI command recovered from a failure
SCSI command transport error
These ereports carry many payload members. For analyzing problems, the ENA and driver-assessment members are especially useful.
ENA (error numeric association) is used in SCSI FMA as a link for a sequence of related ereports. For example, a command retried several times that finally succeeds would result in a sequence of posted ereports that are associated by the same ENA value.
The driver-assessment value is used to indicate the action the driver is going to take. Usually this value is helpful for the administrator to analyze what happened to a specific SCSI command at the kernel level. The table in the “driver-assessment values” section below lists the available values of driver-assessment.
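To make the role of the ENA concrete, the following Python sketch groups already-parsed ereports by their ena value and lists the driver-assessment of each report in the sequence. It is illustrative only: the dictionaries stand in for parsed ereports, the ENA values 0x1 and 0x2 are made up, and group_by_ena is a hypothetical helper rather than part of any FMA tool.

```python
from collections import defaultdict


def group_by_ena(ereports):
    """Group ereport dicts by ENA and collect their driver-assessment values.

    Each ereport is assumed to be a dict with "ena" and
    "driver-assessment" keys (a simplification of the real payload).
    """
    groups = defaultdict(list)
    for report in ereports:
        groups[report["ena"]].append(report["driver-assessment"])
    return dict(groups)


# Made-up sample: one command retried twice before succeeding,
# plus an unrelated command that failed.
sample = [
    {"ena": "0x1", "driver-assessment": "retry"},
    {"ena": "0x2", "driver-assessment": "fail"},
    {"ena": "0x1", "driver-assessment": "retry"},
    {"ena": "0x1", "driver-assessment": "recovered"},
]
history = group_by_ena(sample)
```

Reading the grouped values in order shows what the driver did with each command: here the command tagged 0x1 was retried twice and then recovered.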
Other useful payloads for analyzing SCSI FMA reports are listed in the following table.
|
The following utilities are useful for inspecting details of ereports:
fmdump(8)
A fault management log viewer. The FMA framework maintains two categories of logs: one for faults and another for ereports. Using fmdump you can see the details of a specific pattern of ereports and also the fault list produced by the diagnosis engine.
fmadm(8)
A tool for fault management configuration. It provides many functions, the most frequently used being viewing faulty system components and resolving a fault.
Both these utilities need to be run with specific authorizations or rights profiles - see their manual pages for details. The following table lists some example usage of these utilities:
|
The following table lists the available values of driver-assessment:
|
See the Example section of this manual page for examples of ereports.
Refer to dkio(4I) and cdio(4I).
Permission denied
The partition was opened exclusively by another thread
The argument features a bad address
Invalid argument
The device does not support the requested ioctl function
During opening, the device did not exist. During close, the drive unlock failed
The device is read-only
Resource temporarily unavailable
A signal was caught during the execution of the ioctl() function
Insufficient memory
Insufficient access permission
An I/O error occurred. Refer to notes for details on copy-protected DVD-ROM media.
The sd driver can be configured by defining properties in the sd.conf file. The sd driver supports the following properties:
The default value is 1, which allows resetting to occur. Set this value to 0 (zero) to prevent the sd driver from calling scsi_reset(9F) with a second argument of RESET_TARGET when in error-recovery mode. This scsi_reset(9F) call may prompt the HBA driver to send a SCSI Bus Device Reset message. The scsi_reset(9F) call with a second argument of RESET_TARGET may result from an explicit request via the USCSICMD ioctl. Some high-availability multi-initiator systems may wish to prohibit the Bus Device Reset message; to do this, set the allow-error-recovery-reset property to 0.
The default value is 1, which causes partition I/O statistics to be maintained. Set this value to zero to prevent the driver from recording partition statistics. This slightly reduces the CPU overhead for I/O, minimizes the amount of sar(1) data collected, and makes these statistics unavailable for reporting by iostat(8) even if the -p/-P option is specified. Regardless of this setting, disk I/O statistics are always maintained.
Controls the binding of the driver to non self-identifying SCSI target optical devices. (See scsi(5)). The default value is 1, which causes sd to bind to DTYPE_OPTICAL devices (as noted in scsi(5)). Setting this value to 0 prevents automatic binding. The default behavior for the SPARC-based sd driver prior to Solaris 9 was not to bind to optical devices.
Boolean. When set to false, indicates that the disk does not support the power condition field in the START STOP UNIT command.
The supplied value is passed as the qfull-retries capability value of the HBA driver. See scsi_ifsetcap(9F) for details.
The supplied value is passed as the qfull-retry-interval capability value of the HBA driver. See scsi_ifsetcap(9F) for details.
In addition to the above properties, some device-specific tunables can be configured in sd.conf using the sd-config-list global property. The value of this property is a list of duplets. The formal syntax is:
sd-config-list = <duplet> [, <duplet> ]* ;

where

<duplet> := "<vid+pid>" , "<tunable-list>"
<tunable-list> := <tunable> [, <tunable> ]* ;
<tunable> = <name> : <value>

The <vid+pid> is the string that is returned by the target device on a SCSI inquiry command.
The <tunable-list> contains one or more tunables to apply to all target devices with the specified <vid+pid>.
Each <tunable> is a <name> : <value> pair. Supported tunable names are:

delay-busy: when busy, the delay in nanoseconds before a retry.
retries-timeout: the number of retries to perform on an I/O timeout.
disable-caching: to disable the cache, set this boolean property to true.
Configures how misaligned I/O is handled for a given device (a 4K native disk). It can be set to one of the following values:
0 : Do RMW (READ MODIFY WRITE) with warning message.
1 : Do RMW without warning message.
2 : Do NOT do RMW and return error.
The following warning message is displayed on the console:
Write requests are not aligned to the physical sector size (4096 bytes). Although they are handled through Read-Modify-Write operations, it may result in performance degradation.
An I/O error statistic is kept for Non-Aligned Writes. You can use the following commands to track these occurrences:
iostat -E
kstat -c device_error -s "Non-Aligned Writes"
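The arithmetic behind a read-modify-write can be sketched as follows. This Python fragment is illustrative only: it takes the 4096-byte physical sector size from the warning message above, and rmw_span and needs_rmw are hypothetical names that model the calculation, not the driver's actual code.

```python
PHYS_BSIZE = 4096  # physical sector size quoted in the warning message


def needs_rmw(offset, length, phys_bsize=PHYS_BSIZE):
    """Return True if a write is not aligned to whole physical sectors."""
    return offset % phys_bsize != 0 or length % phys_bsize != 0


def rmw_span(offset, length, phys_bsize=PHYS_BSIZE):
    """Return the (start, end) byte range of whole physical sectors
    that a write of `length` bytes at byte `offset` touches.

    A read-modify-write would read this range, overlay the new data,
    and write the whole range back.
    """
    start = offset // phys_bsize * phys_bsize
    end = (offset + length + phys_bsize - 1) // phys_bsize * phys_bsize
    return start, end
```

Under these assumptions, a 512-byte write at byte offset 512 touches the single physical sector covering bytes 0 through 4095, so eight times as much data is read and rewritten as the caller supplied. This is the source of the performance degradation the warning describes.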
Turns RMW support on or off in the sd driver for 512e disks in emulation mode. Use 0 to turn it off and 1 to turn it on. The default is off. An emulation-mode drive is a disk whose physical block size differs from its logical block size. Enabling this support can improve the throughput of some SSDs whose firmware has poor RMW performance.
For optical drives compliant with MMC-3 and supporting the GET EVENT STATUS NOTIFICATION command, this command is used for periodic media state polling, usually initiated by the DKIOCSTATE dkio(4I) ioctl. To disable the use of this command, set this boolean property to false. In that case, either the TEST UNIT READY or zero-length WRITE(10) command is used instead.
SCSI disk drivers use this value as the physical block size of disks that do not report a valid physical block size. The value must be a power of two. If not specified, DEV_BSIZE (512 bytes) is implied.
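The power-of-two requirement can be verified with a single bit operation before the value is written to sd.conf. The helper name below is hypothetical, and the sketch checks only the power-of-two rule stated above, not any other limit the driver may impose.

```python
def is_power_of_two(value):
    """Return True if value is a positive power of two.

    A power of two has exactly one bit set, so clearing the lowest set
    bit with value & (value - 1) leaves zero.
    """
    return value > 0 and (value & (value - 1)) == 0
```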
Controls the flag that enables logging of FMA events. The default value is 0, which causes sd to disable the ability to print any log for FMA events. Setting this value to 1 enables sd to print FMA events to /var/adm/messages and to the console.
This tunable is intended for special LUNs that support deduplication internally but cannot report this feature through the SCSI protocol. Setting this tunable enables the sd driver to report block unmap support to the file system. When the file system requests an unmap operation, the affected blocks are overwritten with zeros. The allowed value is a non-negative integer that specifies the maximum length, in logical blocks, of the chunk overwritten at one time. A value of zero (recommended) causes auto-adjustment. When the tunable is not set, only unmap operations reported through the SCSI protocol are used.
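The duplet structure of sd-config-list described above can be modelled with a small parser. The Python sketch below is a minimal illustration, not the driver's parsing logic: parse_sd_config_list is a hypothetical name, and it assumes the quoted strings have already been extracted from sd.conf into a flat list that alternates <vid+pid> and <tunable-list>.

```python
def parse_sd_config_list(strings):
    """Turn a flat list of sd-config-list strings into a nested dict.

    strings alternates "<vid+pid>" and "<tunable-list>" entries; each
    tunable list is a comma-separated series of name:value pairs.
    Values are kept as strings because their types differ per tunable.
    """
    if len(strings) % 2 != 0:
        raise ValueError("sd-config-list must contain whole duplets")
    config = {}
    for vid_pid, tunable_list in zip(strings[0::2], strings[1::2]):
        tunables = {}
        for item in tunable_list.split(","):
            name, sep, value = item.partition(":")
            if not sep:
                raise ValueError("tunable %r is not a name:value pair" % item)
            tunables[name.strip()] = value.strip()
        config[vid_pid] = tunables
    return config


parsed = parse_sd_config_list([
    "SUN T4", "delay-busy:6000000000, retries-timeout:6",
    "SUN StorEdge_3510", "retries-timeout:3",
])
```

The sample input uses the same strings as the sd-config-list example in this manual page; each <vid+pid> maps to its own set of tunables.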
The following is an example of a global sd-config-list property:
sd-config-list = "SUN T4", "delay-busy:6000000000, retries-timeout:6",
                 "SUN StorEdge_3510", "retries-timeout:3";

Example 2 Example of an ereport where the driver-assessment value is fail.
Apr 04 2010 01:30:23.464768275 ereport.io.scsi.cmd.disk.dev.uderr
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.dev.uderr
        ena = 0xde0cd54f84201c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /pci@0,0/pci8086,25f8@4/pci108e,286@0/disk@5,0
                devid = id1,sd@TSun_____STK_RAID_INT____EA4B6F24
        (end detector)
        driver-assessment = fail
        op-code = 0x1a
        cdb = 0x1a 0x0 0x8 0x0 0x18 0x0
        pkt-reason = 0x0
        pkt-state = 0x1f
        pkt-stats = 0x0
        stat-code = 0x0
        un-decode-info = sd_get_write_cache_enabled: Mode Sense caching page code mismatch 0
        un-decode-value
        __ttl = 0x1
        __tod = 0x4bb7cf8f 0x1bb3cd13

Example 3 Example of an ereport where the driver-assessment value is fatal.
Jan 20 2011 18:50:16.276742278 ereport.io.scsi.cmd.disk.dev.rqs.merr
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.dev.rqs.merr
        ena = 0xf83e2f0e78101c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /pci@0,0/pci8086,340e@7/pci1000,3080@0/iport@f0/disk@w5000c50010384d1d,0
                devid = id1,sd@n5000c50010384d1f
        (end detector)
        driver-assessment = fatal
        op-code = 0x28
        cdb = 0x28 0x0 0x9 0xcb 0x6f 0x0 0x0 0x0 0x80 0x0
        pkt-reason = 0x0
        pkt-state = 0x3f
        pkt-stats = 0x0
        stat-code = 0x2
        key = 0x3
        asc = 0x11
        ascq = 0x0
        sense-data = 0xf0 0x0 0x3 0x9 0xcb 0x6f 0x77 0xa 0x0 0x0 0x0 0x0 0x11 0x0 0x81 0x80 0x0 0x9d 0xdd 0xba
        lba = 0x9cb6f00
        __ttl = 0x1
        __tod = 0x4d3883e8 0x107ec086

Example 4 Example of an ereport where the driver-assessment value is recovered.
Oct 08 2010 10:51:12.889604904 ereport.io.scsi.cmd.disk.recovered
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.recovered
        ena = 0x92500a9c0ca01801
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /pci@0,0/pci8086,3410@9/pci1077,138@0/fp@0,0/disk@w2100001378ac026e,0
                devid = id1,sd@n2034001378ac026e
        (end detector)
        driver-assessment = recovered
        op-code = 0x8a
        cdb = 0x8a 0x0 0x0 0x0 0x0 0x3 0x1a 0x49 0x7e 0xa9 0x0 0x0 0x1 0x0 0x0 0x0
        pkt-reason = 0x0
        pkt-state = 0x1f
        pkt-stats = 0x0
        __ttl = 0x1
        __tod = 0x4caedb80 0x35064b28

Example 5 Example of an ereport where the driver-assessment value is retry.
Jan 09 2012 10:04:31.334477741 ereport.io.scsi.cmd.disk.dev.rqs.derr
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.dev.rqs.derr
        ena = 0xc3ca9ccb73e00c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /pci@0,0/pci8086,3410@9/pci15d9,400@0/iport@80/disk@w5000c50033f5bfb9,0
                devid = id1,sd@n5000c50033f5bfbb
        (end detector)
        devid = id1,sd@n5000c50033f5bfbb
        driver-assessment = retry
        op-code = 0x28
        cdb = 0x28 0x0 0x11 0x5d 0x75 0xf9 0x0 0x1 0x0 0x0
        pkt-reason = 0x0
        pkt-state = 0x37
        pkt-stats = 0x0
        stat-code = 0x2
        key = 0x6
        asc = 0x29
        ascq = 0x2
        sense-data = 0x70 0x0 0x6 0x0 0x0 0x0 0x0 0xa 0x0 0x0 0x0 0x0 0x29 0x2 0x2 0x0 0x0 0x0 0xdd 0xba
        __ttl = 0x1
        __tod = 0x4f0b2c2f 0x13efb9ad
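Some ereport payload members can be decoded by hand. The Python sketch below uses the values from the ereport with driver-assessment fatal shown above: it extracts the logical block address and transfer length from the cdb member, assuming the standard READ(10) layout (opcode 0x28, big-endian LBA in bytes 2 to 5, block count in bytes 7 and 8), and splits __tod into what appear to be seconds and nanoseconds. The helper names are hypothetical, and the interpretation of __tod is an assumption based on how its second value matches the fractional part of the time stamp on the first line of that ereport.

```python
def parse_hex_fields(text):
    """Convert a payload string such as "0x28 0x0 0x9" to a list of ints."""
    return [int(token, 16) for token in text.split()]


def decode_read10(cdb):
    """Return (lba, block_count) from a 10-byte READ(10) CDB."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(bytes(cdb[2:6]), "big")      # bytes 2-5
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")   # bytes 7-8
    return lba, blocks


# Values copied from the ereport whose driver-assessment is fatal.
cdb = parse_hex_fields("0x28 0x0 0x9 0xcb 0x6f 0x0 0x0 0x0 0x80 0x0")
lba, blocks = decode_read10(cdb)

# Assumed to be (seconds, nanoseconds) since the epoch.
seconds, nanoseconds = parse_hex_fields("0x4d3883e8 0x107ec086")
```

The decoded LBA equals the lba member of the same ereport (0x9cb6f00), and the nanoseconds value equals the .276742278 fraction in its time stamp, which suggests the two readings are consistent.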
Driver configuration file
Block files
Raw files
Where:
controller n
SCSI target id n (0-6)
SCSI LUN n (0-7 normally; some HBAs support LUNs to 15 or 32. See the specific manual page for details).
partition n (0-7)
raw files
Where:
Where n=0 the node corresponds to the entire disk.
sar(1), close(2), ioctl(2), lseek(2), read(2), write(2), scsa2usb(4D), ssd(4D), hsfs(4FS), pcfs(4FS), udfs(4FS), cdio(4I), dkio(4I), driver.conf(5), scsi(5), filesystem(7), cfgadm_scsi(8), fdisk(8), fmadm(8), fmdump(8), format(8), iostat(8), scsi_ifsetcap(9F), scsi_reset(9F), scsi_pkt(9S)
ANSI Small Computer System Interface-2 (SCSI-2)
ATA Packet Interface for CD-ROMs, SFF-8020i
Mt.Fuji Commands for CD and DVD, SFF8090v3
Error for Command:<command name> Error Level: Fatal
Requested Block: <n> Error Block: <m>
Vendor:'<vendorname>' Serial Number:'<serial number>'
Sense Key:<sense key name>
The command indicated by <command name> failed. The Requested Block is the block where the transfer started and the Error Block is the block that caused the error. Sense Key, ASC, and ASCQ information is returned by the target in response to a request sense command.
The drive is not ready because no caddy has been inserted.
A REQUEST SENSE command completed with a check condition. The original command will be retried a number of times.
There is a discrepancy between the label and what the drive returned on the READ CAPACITY command.
The request sense data was less than expected.
The REQUEST SENSE command did not transfer any data.
The drive was reserved by another initiator.
The host adapter has failed to transport a command to the target for the reason stated. The driver will either retry the command or, ultimately, give up.
The REQUEST SENSE data included an invalid sense.
<n> The drive is not ready.
A failure to switch back to read mode 1.
The disk label is corrupted.
The disk label is corrupted.
The disk label is corrupted.
The drive returned busy during a number of retries.
The drive is powered down or has died.
A retry on a Unit Attention condition failed.
The geometry of the drive could not be established.
There was a residue after the command completed normally.
A bp with consistent memory could not be allocated.
A bp with consistent memory could not be allocated.
A bp with consistent memory could not be allocated.
A bp with consistent memory could not be allocated.
Free memory pool exhausted.
Free memory pool exhausted.
Free memory pool exhausted.
The disk label is corrupted.
A packet could not be allocated during dumping.
Drive went offline; probably powered down.
Driver attempted to retry a command and experienced a transport error.
Driver attempted to retry a command and experienced a transport error.
Illegal request size.
Driver attempted to submit a request sense command and failed.
Host adapter driver was unable to accept a command.
Failure to read disk label.
Drive went offline; probably powered down.
DVD-ROM media containing DVD-Video data may adhere to the requirements of a content scrambling system or copy protection scheme. Reading a copy-protected sector causes an I/O error. Users are advised to use appropriate playback software to view video content on DVD-ROM media containing DVD-Video data.
The sd driver can handle 4096 LUNs on x86 and 32,768 LUNs on SPARC. In order to increase this limit to 32,768 LUNs on x86 and 262,144 LUNs on SPARC, add the following line to /etc/system:
set devt_version = 2
Once this has been done on a system, the devt_version (1 by default) should not be changed back to 1.