Writing Device Drivers

Appendix D DDI Interfaces for Cluster-Aware Drivers

The device node types supported by the Solaris operating environment can be divided roughly into two categories: physical and pseudo devices. This categorization is important when the device nodes are created and used by Sun™ Cluster.

The concept of device classes and the necessary interface modifications and additions are introduced in this release of the Solaris operating environment so that device driver writers can adopt the new interfaces for use with future versions of Sun Cluster. The device classes will not have an impact on base Solaris operation because they are ignored by the base kernel without Sun Cluster software installed.

Device Classification

Sun Cluster introduces a new device classification scheme. These new classifications are based on the extended behavior of the devices in a Sun Cluster environment.

Enumerated devices: ENUMERATED_DEV
Node specific devices: NODESPECIFIC_DEV
Global devices: GLOBAL_DEV
Node bound devices: NODEBOUND_DEV

The ddi_create_minor_node(9F) routine has been enhanced so that a device driver can report the device classification of the minor nodes it creates. The device node class is specified using the flag parameter. If the device class is not indicated, the default class is NODESPECIFIC_DEV for pseudo devices and ENUMERATED_DEV for physical devices. These device classes do not affect the creation of the device node in a non-clustered environment, but they are required for device drivers intended for use in a clustered environment. The device categories are described in the following sections.
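The default rule stated above can be modeled in a few lines of user-space C. This is only a sketch for illustration: the enum and the function ex_effective_class() are hypothetical names, not part of the DDI, and the enum values are stand-ins rather than the real flag values from the Solaris headers.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the device class flags. The real values
 * are defined by the Solaris DDI headers and are not reproduced here.
 */
enum ex_dev_class {
    EX_CLASS_NONE = 0,      /* driver did not indicate a class */
    EX_GLOBAL_DEV,
    EX_NODEBOUND_DEV,
    EX_NODESPECIFIC_DEV,
    EX_ENUMERATED_DEV
};

/*
 * Return the class that applies to a minor node. When the driver did
 * not indicate a class, fall back to the documented defaults:
 * node specific for pseudo devices, enumerated for physical devices.
 */
enum ex_dev_class
ex_effective_class(enum ex_dev_class requested, bool is_pseudo)
{
    if (requested != EX_CLASS_NONE)
        return requested;
    return is_pseudo ? EX_NODESPECIFIC_DEV : EX_ENUMERATED_DEV;
}
```

A driver written for a clustered environment would not rely on this fallback and would instead name the class explicitly in the flag argument.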

Enumerated Devices

Enumerated devices are physical devices with a one-to-one correspondence between a particular device node and a host where that device node is present. Examples of this category include various disk and tape devices, such as /dev/dsk/c0t0d0s0 and /dev/rmt/0l. Nearly all physical devices belong to this category. This is the default category for all non-pseudo devices.

Node Specific Devices

Node specific devices include devices that report particular information about the host where the device node is opened. An example of such a device is the /dev/kmem device. Opening this device gives access to host-specific information on the local host. Administrative pseudo device nodes used in configuring or gathering information about a particular device driver also fit this category. The Sun Cluster software ensures the creation of two user device nodes for each instance of a kernel device node in the cluster, so that the intended device node can be accessed both locally and remotely.

Global Devices

Global devices are node-invariant pseudo devices such as /dev/ip. In principle, the open instance of a device, such as ip or tcp, does not depend on the host in the cluster where the open occurs. A single copy of each device is in the kernel. All device I/O requests for this device class are performed locally and the device node can be accessed by a remote host within the cluster. This is the default behavior for all pseudo devices in the system.

Node Bound Devices

A node bound device is a pseudo device that maintains a cluster-wide state. This device should, in principle, be opened on one node only. Devices such as /dev/ticotsord belong to this class (see the ticotsord(7D) man page). Highly available (HA) devices with automatic fail-over also belong to this class. Only one pseudo node is present, and all opens are directed to the same node. The exception is HA devices, where the hosting node might change in a way that is transparent to the device user.

Minor Number Space Management

A dev_t consists of a major number and a minor number. The major number space is managed by base Solaris and the minor number space is managed by the device driver. With Sun Cluster, the minor number behaves differently within the user space and the kernel space.

Sun Cluster preserves the assumption that two equal dev_t values point to the same device regardless of the host where the process is executed. This model satisfies the expectations of programs that depend on this feature to establish the equivalence of two devices. Sun Cluster introduces a dual view of minor numbers and the necessary interfaces to implement this dual view. In-kernel dev_t values combine the major number of the driver with the minor number that the driver has created using ddi_create_minor_node(9F). External minor numbers (viewed from the user space) are managed and assigned unique cluster-wide numbers by the device configuration manager in Sun Cluster.

This dual numbering scheme has one unfortunate side effect, namely that a particular minor number created in the kernel can result in creation of a different minor number in the user space. This discrepancy might be unexpected by user space programs that expect to be able to ascertain some device characteristics from the minor number pattern.

An example of the discrepancy is the use of minor number bit patterns in specifying the particular slice of a disk or the density of a tape device. This class of problems is primarily alleviated by the use of globally unique instance numbers. By encoding the instance number of a device in the minor, the driver can guarantee the creation of cluster-wide unique dev_t values; this avoids minor numbers that do not have the same value between the kernel and the user space.
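As an illustration of encoding the instance number in the minor, the following sketch packs an instance number and a slice number into one minor value. The bit layout (three low bits for the slice), the type ex_minor_t, and all ex_ names are assumptions chosen for the example, not a layout required by the DDI or used by any particular driver.

```c
#include <stdint.h>

/*
 * Hypothetical layout: the low 3 bits select the slice and the
 * remaining bits hold the driver instance number.
 */
#define EX_SLICE_BITS   3
#define EX_SLICE_MASK   ((1u << EX_SLICE_BITS) - 1)

typedef uint32_t ex_minor_t;    /* stand-in for minor_t */

/* Build a minor number from an instance number and a slice number. */
static inline ex_minor_t
ex_make_minor(uint32_t instance, uint32_t slice)
{
    return (instance << EX_SLICE_BITS) | (slice & EX_SLICE_MASK);
}

/* Recover the instance number from a minor number. */
static inline uint32_t
ex_minor_instance(ex_minor_t minor)
{
    return minor >> EX_SLICE_BITS;
}

/* Recover the slice number from a minor number. */
static inline uint32_t
ex_minor_slice(ex_minor_t minor)
{
    return minor & EX_SLICE_MASK;
}
```

Because the instance number occupies its own bit field, two nodes created for different instances can never share a minor value, provided the instance numbers themselves are unique across the cluster.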

All dev_t values that are passed in through the standard Solaris entry points, such as open(9E), close(9E), and ioctl(9E), encode the kernel minor number. The getminor(9F) interface can be used to extract this minor number. However, if a dev_t value is passed as part of the ioctl data from the user space, that dev_t value encodes the external (user space) minor number. A new DDI interface, ddi_getiminor(9F), has been introduced so that the driver can map an external minor number to the internal one.
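The difference between the two views can be pictured with a small user-space model. This sketch does not show how Sun Cluster performs the translation, which the text does not describe. The lookup table, its contents, and the names ex_minor_map and ex_get_internal_minor() are all hypothetical, and the real ddi_getiminor(9F) takes a dev_t rather than a bare minor number.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ex_minor_t;    /* stand-in for minor_t */

/* One row of a hypothetical external-to-internal mapping table. */
struct ex_minor_map {
    ex_minor_t external;    /* cluster-wide number seen from user space */
    ex_minor_t internal;    /* number the driver created in the kernel */
};

/* Made-up example data: two minor nodes created by one driver. */
static const struct ex_minor_map ex_table[] = {
    { 1001, 0 },
    { 1002, 1 },
};

/*
 * Translate an external minor number to the internal one.
 * Returns 0 and stores the result on success, or -1 if the
 * external number is unknown.
 */
int
ex_get_internal_minor(ex_minor_t external, ex_minor_t *internal)
{
    size_t i;

    for (i = 0; i < sizeof (ex_table) / sizeof (ex_table[0]); i++) {
        if (ex_table[i].external == external) {
            *internal = ex_table[i].internal;
            return 0;
        }
    }
    return -1;
}
```

In a real driver, the corresponding decision is which interface to call: getminor(9F) for a dev_t received through an entry point, and ddi_getiminor(9F) for a dev_t that arrived inside ioctl data.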

Device Interfaces

The following interface sets up a driver and prepares it for use:

int ddi_create_minor_node(dev_info_t *dip, char *name, 
            int spec_type, int minor_num, char *node_type, int flag);

ddi_create_minor_node(9F) advertises a minor device node, which will eventually appear in the /devices directory and refer to the device specified by dip. If the device is a clone device, then flag is set to CLONE_DEV. If it is not a clone device, then flag is set to 0. For device drivers intended for use in a clustered environment, flag must specify the device node class of GLOBAL_DEV, NODEBOUND_DEV, NODESPECIFIC_DEV, or ENUMERATED_DEV.

The following new interface is used to translate between user-visible device numbers and in-kernel device numbers:

minor_t ddi_getiminor(dev_t dev);

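ddi_getiminor(9F) extracts the internal minor number from a device number that was passed in from the user space, for example as part of ioctl data. Device numbers received through the standard entry points should still be interpreted with getminor(9F).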