Solaris 7 Software Developer Supplement

DDI Interfaces for Cluster-Aware Drivers

This feature was new in the Solaris 7 3/99 software release.

The device node types supported by the Solaris operating environment can be divided into two categories: physical and pseudo devices. This categorization is important when the device nodes are created and used by Sun Cluster.

The concept of device classes and the necessary interface modifications and additions are introduced in the 3/99 release of the Solaris operating environment so that device driver writers can adopt the new interfaces for use with future versions of Sun Cluster. The device classes have no effect on Solaris operation when Sun Cluster software is not installed, because the base kernel ignores them.

For more information regarding device drivers, see Writing Device Drivers.

Device Classification

Sun Cluster introduces four new device classes. These new classifications are based on the extended behavior of the devices in a Sun Cluster environment.

Device Class             Flag
Enumerated Devices       ENUMERATED_DEV
Node Specific Devices    NODESPECIFIC_DEV
Global Devices           GLOBAL_DEV
Node Bound Devices       NODEBOUND_DEV

The ddi_create_minor_node(9F) routine has been enhanced to add the capability of reporting the additional device classification of the device minor nodes created by the device driver. The device categories are described in the following sections.

Enumerated Devices

Enumerated Devices are physical devices with a one-to-one correspondence between a particular device node and a host where that device node is present. Examples of this category include various disk and tape devices, such as /dev/dsk/c0t0d0s0 and /dev/rmt/0l. Nearly all physical devices belong to this category. This is the default category for all non-pseudo devices.

Node Specific Devices

Node Specific Devices include devices that report particular information about the host where the device node is opened. An example of such a device is the /dev/kmem device. Opening this device gives access to host-specific information on the local host. Administrative pseudo device nodes used in configuring or gathering information about a particular device driver also fit this category. The Sun Cluster software ensures the creation of two user device nodes for each instance of a kernel device node in the cluster, so that the intended device node can be accessed both locally and remotely.

Global Devices

Global Devices are node-invariant pseudo devices such as /dev/ip. In principle, an open instance of a device such as ip or tcp does not depend on the host in the cluster where the open occurs. A single copy of each device is in the kernel. All device I/O requests for this device class are performed locally, and the device node can be accessed by a remote host within the cluster. This is the default behavior for all pseudo devices in the system.

Node Bound Devices

A node bound device is a pseudo device that maintains a cluster-wide state. This device should, in principle, be opened on one node only. Devices such as /dev/ticotsord belong to this class. Highly available (HA) devices with automatic fail-over also belong to this class. Only one pseudo node is present, and all opens are directed to the same node, with the exception of HA devices, where the hosting node might change transparently to the device user.

Minor Number Space Management

A dev_t consists of a major number and a minor number. The major number space is managed by Solaris, and the minor number space is managed by the device driver. With Sun Cluster, the minor number behaves differently in user space and in kernel space.

Cluster Wide dev_t

For historical reasons each device node, in addition to its path, is identified by an integral type dev_t. The dev_t is a part of the system interface expected by programmers and system administrators. stat(2) system calls and backup utilities deal directly with dev_ts. dev_t is also a programming interface for device driver writers.

Sun Cluster preserves the assumption that two equal dev_ts point to the same device regardless of the host where the process is executed. This model satisfies the expectations of programs that depend on this feature to establish the equivalence of two devices. Sun Cluster introduces a dual view of minor numbers and the necessary interfaces to implement this dual view. In-kernel dev_ts combine the major number of the driver with the minor number that the driver has created using ddi_create_minor_node(9F). External minor numbers (viewed from user space) are managed and assigned unique cluster-wide numbers by the device configuration manager in Sun Cluster.

This dual numbering scheme has one unfortunate side effect, namely that a particular minor number created in the kernel can result in creation of a different minor number in the user space. This discrepancy might be unexpected by user space programs that expect to be able to ascertain some device characteristics from the minor number pattern.

An example of the discrepancy is the use of minor number bit patterns in specifying the particular slice of a disk or the density of a tape device. This class of problems is primarily alleviated by the use of globally unique instance numbers. By encoding the instance number of a device in the minor number, the driver can guarantee the creation of cluster-wide unique dev_t values, so that the minor number has the same value in the kernel and in user space.
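The encoding described above might be sketched as follows. This is only an illustration: the 3-bit slice field, the xx_ names, and the xx_minor_t type are assumptions made for the example and are not taken from any particular driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout (an assumption for this sketch): the low 3 bits of
 * the minor number select the slice, the remaining bits hold the instance.
 */
#define	XX_SLICE_BITS	3u
#define	XX_SLICE_MASK	((1u << XX_SLICE_BITS) - 1u)

typedef uint32_t xx_minor_t;	/* stand-in for the kernel's minor_t */

/* Build a minor number from an instance number and a slice number. */
static xx_minor_t
xx_make_minor(uint32_t instance, uint32_t slice)
{
	assert(slice <= XX_SLICE_MASK);
	return ((instance << XX_SLICE_BITS) | slice);
}

/* Recover the instance number from a minor number. */
static uint32_t
xx_minor_to_instance(xx_minor_t minor)
{
	return (minor >> XX_SLICE_BITS);
}

/* Recover the slice number from a minor number. */
static uint32_t
xx_minor_to_slice(xx_minor_t minor)
{
	return (minor & XX_SLICE_MASK);
}
```

Because the instance number occupies its own field, two devices with different instance numbers can never produce the same minor number under this layout, whatever slice is selected.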

All dev_t values that are passed in through the standard Solaris entry points, such as open, close, and ioctl, encode the kernel minor number. The getminor(9F) interface can be used to extract this minor number. However, if a dev_t value is passed from user space as part of the ioctl data, it encodes the external (user-space) minor number. A new DDI interface, ddi_getiminor(9F), has been introduced so that the driver can map such an external minor number to the internal one.

Device Interfaces

The following interface sets up a driver and prepares it for use:

int ddi_create_minor_node(dev_info_t *dip, char *name, 
			int spec_type, int minor_num, char *node_type, int flag);

ddi_create_minor_node(9F) advertises a minor device node, which will eventually appear in the /devices directory and refer to the device specified by dip. If the device is a clone device, then flag is set to CLONE_DEV. If it is not a clone device, then flag is set to 0. For device drivers intended for use in a clustered environment, flag must specify the device node class of GLOBAL_DEV, NODEBOUND_DEV, NODESPECIFIC_DEV, or ENUMERATED_DEV.
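As a rough illustration of the flag argument, the following sketch shows how an attach(9E) routine for a hypothetical pseudo driver named xx might create a single character minor node classified as GLOBAL_DEV. The driver name, the xx_attach function, and the choice of GLOBAL_DEV are assumptions made for this example. The #else branch holds userland stand-ins with placeholder values so that the sketch can be compiled outside a kernel build; a real driver takes these definitions from the DDI headers.

```c
#ifdef _KERNEL
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>
#else
/*
 * Userland stand-ins (assumptions for this sketch only). The values are
 * placeholders and are not the definitions from the DDI headers.
 */
#include <sys/stat.h>
#ifndef S_IFCHR
#define	S_IFCHR		0020000
#endif
typedef struct dev_info { int instance; } dev_info_t;
typedef enum { DDI_ATTACH, DDI_RESUME } ddi_attach_cmd_t;
#define	DDI_SUCCESS	0
#define	DDI_FAILURE	(-1)
#define	DDI_PSEUDO	"ddi_pseudo"
#define	GLOBAL_DEV	0x02		/* placeholder value */

static int xx_last_flag = -1;		/* flag most recently seen by the stub */

static int
ddi_get_instance(dev_info_t *dip)
{
	return (dip->instance);
}

static int
ddi_create_minor_node(dev_info_t *dip, char *name, int spec_type,
    int minor_num, char *node_type, int flag)
{
	(void) dip; (void) name; (void) spec_type;
	(void) minor_num; (void) node_type;
	xx_last_flag = flag;		/* record the class for inspection */
	return (DDI_SUCCESS);
}
#endif

/* Hypothetical attach routine: one character node per instance. */
static int
xx_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
	int instance;

	if (cmd != DDI_ATTACH)
		return (DDI_FAILURE);

	instance = ddi_get_instance(dip);

	/* The last argument reports the cluster device class of the node. */
	if (ddi_create_minor_node(dip, "xx", S_IFCHR, instance,
	    DDI_PSEUDO, GLOBAL_DEV) != DDI_SUCCESS)
		return (DDI_FAILURE);

	return (DDI_SUCCESS);
}
```

The instance number is used directly as the minor number here, which follows the recommendation to encode globally unique instance numbers in the minor number.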

The following new interface is used to translate between user-visible device numbers and in-kernel device numbers:

minor_t ddi_getiminor(dev_t dev);

ddi_getiminor(9F) extracts the internal (kernel) minor number from an external, user-visible device number.
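The distinction between the two kinds of dev_t might be handled along the following lines. This is a sketch under stated assumptions: xx_same_minor is a hypothetical helper, and the #else branch provides userland stand-ins for getminor and ddi_getiminor whose fixed offset of 100 between external and internal minor numbers is invented purely so that the example can be exercised outside the kernel. In a cluster, the real mapping is maintained by Sun Cluster.

```c
#ifdef _KERNEL
#include <sys/types.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>
#else
/*
 * Userland stand-ins (assumptions for this sketch only). The fake mapping
 * "external minor = internal minor + 100" is not how Sun Cluster assigns
 * external minor numbers; it only makes the two views differ.
 */
#include <sys/types.h>			/* dev_t */
typedef unsigned int minor_t;
#define	XX_FAKE_EXT_OFFSET	100u

static minor_t
getminor(dev_t dev)
{
	return ((minor_t)dev);
}

static minor_t
ddi_getiminor(dev_t dev)
{
	return ((minor_t)dev - XX_FAKE_EXT_OFFSET);
}
#endif

/*
 * Hypothetical helper: returns nonzero if a dev_t copied in from ioctl
 * data names the same minor node as the dev_t that the entry point
 * received.
 */
static int
xx_same_minor(dev_t entry_dev, dev_t user_dev)
{
	/* Entry-point dev_t values already carry the kernel minor number. */
	minor_t entry_minor = getminor(entry_dev);

	/* A dev_t from user data carries the external minor; translate it. */
	minor_t user_minor = ddi_getiminor(user_dev);

	return (entry_minor == user_minor);
}
```

Calling getminor on the user-supplied value instead would compare an external minor number against an internal one, which is the mismatch that the new interface is meant to avoid.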