Solaris 8 Software Developer Supplement

Chapter 4 SPARC: Driver Hardening Test Harness

The driver hardening test harness is new in the Solaris 8 SPARC™ Platform Edition 4/01 release. For information about how to create a Solaris device driver, see Writing Device Drivers.

The driver hardening test harness is a Solaris device driver development tool. The test harness injects a wide range of simulated hardware faults when the driver under development accesses its hardware. This chapter describes how to configure the test harness, create error-injection specifications (referred to as errdefs), and execute the tests on your device driver.


Note –

For the most current man pages, use the man command. The Solaris 8 Update release man pages include new feature information that is not in the Solaris 8 Reference Manual Collection.


Test Harness Description

Hardened device drivers are resilient to potential hardware faults. You must test the resilience of device drivers as part of the driver development process. This type of testing requires that the driver handle a wide range of typical hardware faults in a controlled and repeatable way. The driver hardening test harness enables driver developers to simulate such hardware faults in software.

The test harness intercepts calls from the driver to various DDI routines, then corrupts the result of the calls as if the hardware had caused the corruption. In addition, the harness allows for corruption of accesses to specific registers as well as definition of more random types of corruption.


Note –

The driver must perform all I/O accesses by using DDI routines to comply with the Solaris DDI/DKI.


The test harness can generate test scripts automatically by tracing all register accesses as well as direct memory access (DMA) and interrupt usage during the running of a specified workload. A script is generated that reruns that workload while injecting a set of faults into each access.

The driver tester must create additional test cases to force the driver down more obscure failure paths. The tester should also remove duplicate test cases from the generated scripts.

The test harness is implemented as a device driver called bofi, which stands for bus_ops fault injection, and two user-level utilities, th_define(1M) and th_manage(1M).

The test harness does the following:

Fault Injection

The driver hardening test harness intercepts and, when requested, corrupts each access a driver makes to its hardware. This section provides information you should understand to create faults to test the resilience of your driver.

Data Access Functions

Solaris devices are managed inside a tree-like structure called the device (devinfo) tree. Each node of the devinfo tree stores information that relates to a particular instance of a device in the system. Each leaf node corresponds to a device driver, while all other nodes are called nexus nodes. Typically, a nexus represents a bus. A bus node isolates leaf drivers from bus dependencies, which enables architecturally independent drivers to be produced.

Many of the DDI functions, particularly the data access functions (DAFs), result in upcalls to the bus nexus drivers. When a leaf driver accesses its hardware, it passes a handle to an access routine. The bus nexus understands how to manipulate the handle and fulfill the request. A DDI-compliant driver accesses hardware only through these DDI access routines. The test harness intercepts these upcalls before they reach the specified bus nexus. If the data access matches the criteria that are specified by the driver tester, the access is corrupted. If the data access does not match the criteria, it is passed to the bus nexus to handle in the usual way.

A driver obtains an access handle by using the ddi_regs_map_setup(dip, rset, ma, offset, size, handle) function. The arguments specify which "offboard" memory is to be mapped. The driver must use the returned handle when it references the mapped I/O addresses, because handles isolate drivers from the details of bus hierarchies. Therefore, do not directly use the returned mapped address, ma. Direct use of the mapped address defeats the current and future uses of the DAF mechanism.

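For programmed I/O, the suite of DAFs is:

• ddi_getX() and ddi_rep_getX(), which read from the device

• ddi_putX() and ddi_rep_putX(), which write to the device

X is the bus transfer size in bits: 8, 16, 32, or 64. repcnt is the number of transfers that a ddi_rep_ routine performs.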

DMA has a similar, yet richer, set of DAFs.

Setting Up the Test Harness

The driver hardening test harness is part of the Solaris Developer Cluster and the Entire Distribution Cluster. If you have not installed either of these Solaris clusters, you must manually install the test harness packages appropriate for your platform.

Installing the Test Harness

To install the test harness packages (SUNWftduu, SUNWftdur, and SUNWftdux), use pkgadd(1M).

As superuser, go to the directory in which the packages are located and type:


# pkgadd -d . SUNWftduu SUNWftdur SUNWftdux
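Before running pkgadd, it can help to check which of the three packages are already present. The following is a minimal sketch, assuming that pkginfo -q exits nonzero for a package that is not installed. The function name missing_harness_pkgs and the PKGINFO override variable are illustrative and are not part of the harness.

```shell
#!/bin/sh
# Sketch only: report which test harness packages are not yet installed.
# PKGINFO may be overridden (for example, in tests); it defaults to pkginfo.
PKGINFO=${PKGINFO:-pkginfo}

missing_harness_pkgs() {
    for pkg in SUNWftduu SUNWftdur SUNWftdux; do
        # pkginfo -q prints nothing; only its exit status is used here.
        "$PKGINFO" -q "$pkg" 2>/dev/null || echo "$pkg"
    done
}
```

Calling missing_harness_pkgs prints one package name per line for each package that still needs to be added. No output means that all three packages are installed.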

Configuring the Test Harness

After the test harness is installed, edit the /kernel/drv/bofi.conf file to configure the harness to interact with your driver. See the following section for descriptions of the test harness properties.

When the harness configuration is complete, reboot the system to load the harness driver.

Test Harness Properties

The test harness behavior is controlled by boot-time properties that are set in the /kernel/drv/bofi.conf configuration file.

When the harness is first installed, enable the harness to intercept the DDI accesses to your driver by setting these properties:

bofi-nexus

Bus nexus type, such as the PCI bus

bofi-to-test

Name of the driver under test

For example, to test a PCI bus network driver called xyznetdrv, set the following property values:


bofi-nexus="pci"
bofi-to-test="xyznetdrv"

Other properties control how the harness checks the driver's use of the Solaris DDI data access mechanisms: PIO for reading from and writing to peripherals, and DMA for transferring data to and from peripherals.

bofi-range-check

When this property is set, the test harness checks the consistency of the arguments that are passed to PIO DAFs.

bofi-ddi-check

When this property is set, the test harness verifies that the mapped address that is returned by ddi_regs_map_setup() is not used outside of the context of the DAFs.

bofi-sync-check

When this property is set, the test harness verifies correct usage of DMA functions and ensures that the driver makes compliant use of ddi_dma_sync().
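Because the properties take effect only at boot, it is worth confirming the contents of the configuration file before you reboot. The following sketch lists the bofi- properties in a file and checks that the two properties needed for interception are present. The function names are illustrative. The parsing assumes one property per line, with optional quotation marks and an optional trailing semicolon.

```shell
#!/bin/sh
# Sketch only: print the bofi-* properties in a configuration file
# as name=value lines. Comment lines and other entries are skipped.
bofi_props() {
    sed -n -e 's/^[[:space:]]*//' \
        -e '/^bofi-/{' \
        -e 's/[[:space:]]*=[[:space:]]*/=/' \
        -e 's/[";]//g' \
        -e 'p' \
        -e '}' "$1"
}

# Succeed only if both bofi-nexus and bofi-to-test are set.
bofi_conf_ready() {
    props=$(bofi_props "$1") || return 1
    for prop in bofi-nexus bofi-to-test; do
        printf '%s\n' "$props" | grep "^$prop=" >/dev/null || {
            echo "missing property: $prop" >&2
            return 1
        }
    done
}
```

For example, bofi_conf_ready /kernel/drv/bofi.conf returns a nonzero status and names the missing property if either bofi-nexus or bofi-to-test has not been set.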

Testing the Driver

This section describes how to create and inject faults by using the th_define(1M) and th_manage(1M) commands.

Creating Faults

The th_define(1M) utility provides an interface to the bofi device driver for defining errdefs. An errdef is a specification for how to corrupt a device driver's accesses to its hardware. The th_define command-line arguments determine the precise nature of the fault to be injected. If the supplied arguments define a consistent errdef, the th_define process stores the errdef with the bofi driver. The process then suspends itself until the criteria given by the errdef are satisfied. In practice, the suspension ends when the access counts go to zero (0).

Injecting Faults

The test harness operates at the level of data accesses. The characteristics of a data access include the:

The test harness intercepts data accesses and injects appropriate faults into the driver. An errdef, specified by the th_define(1M) command, encodes the following information:

Use the -a acc_chk option to simulate framework faults in an errdef.

Fault-Injection Process

The process of injecting a fault involves two phases:

  1. Create errdefs by using the th_define command.

    Create errdefs by passing test definitions to the bofi driver, which stores the definitions so they can be accessed by using th_manage(1M).

  2. Create a workload, then use th_manage to activate and manage the errdef.

    The th_manage(1M) command is a user interface to the various ioctls that are recognized by the bofi harness driver. th_manage operates at the level of driver names and instances and includes these commands: get_handles to list access handles, start to activate errdefs, and stop to deactivate errdefs.

    Activating an errdef causes qualifying data accesses to be faulted. The th_manage utility also supports these commands: broadcast to provide the current state of the errdef and clear_errors to clear the errdef.

    See th_define(1M) and th_manage(1M) for more information.
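The second phase can be sketched as a small wrapper. The sketch assumes that an errdef has already been stored by a th_define process that is still suspended, and that th_manage takes the driver name, the instance number, and then the command; confirm the argument order in th_manage(1M). The function names and the DRY_RUN variable are illustrative. Only the th_manage commands that are named in this chapter are accepted.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1, commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# manage_errdefs NAME INSTANCE COMMAND
# Assumed th_manage argument order: name, instance, command.
manage_errdefs() {
    case $3 in
        get_handles|start|stop|broadcast|clear_errors)
            run th_manage "$1" "$2" "$3" ;;
        *)
            echo "unsupported th_manage command: $3" >&2
            return 2 ;;
    esac
}

# run_with_errdefs NAME INSTANCE WORKLOAD [ARGS...]
# Activate the stored errdefs, run the workload, report state, deactivate.
run_with_errdefs() {
    drv=$1; inst=$2
    shift 2
    manage_errdefs "$drv" "$inst" start || return
    run "$@"
    rc=$?
    manage_errdefs "$drv" "$inst" broadcast
    manage_errdefs "$drv" "$inst" stop
    return "$rc"
}
```

Setting DRY_RUN=1 before calling run_with_errdefs shows the command sequence without touching the harness, which is useful when you are preparing a test plan on a system where bofi is not loaded.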

Test Harness Warnings

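You can configure the test harness to handle warning messages in the following ways:

• Write warning messages to the console

• Write warning messages to the console, then panic the system

Use the second method to help pinpoint the root cause of a problem.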

When the bofi-range-check property value is set to warn, the harness prints the following messages (or panics if set to panic) when it detects a range violation of a DDI function by your driver:


ddi_getX() out of range addr %x not in %x
ddi_putX() out of range addr %x not in %x
ddi_rep_getX() out of range addr %x not in %x
ddi_rep_putX() out of range addr %x not in %x

X is 8, 16, 32, or 64.

When the harness has been requested to insert over 1000 extra interrupts, the following message is printed if the driver does not detect interrupt jabber:


undetected interrupt jabber - %s %d
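When a test run is long, counting these messages is easier than reading them. The following sketch counts the harness warnings in text that is read from standard input. It assumes that the console messages are also captured in a log file that you can read. The function name is illustrative.

```shell
#!/bin/sh
# Sketch only: count harness warning messages read from standard input.
count_harness_warnings() {
    awk '
        # Range violations reported for ddi_getX, ddi_putX and the rep variants.
        /ddi_(rep_)?(get|put)(8|16|32|64)\(\) out of range addr/ { range++ }
        # Interrupt jabber that the driver failed to detect.
        /undetected interrupt jabber/ { jabber++ }
        END { printf "range=%d jabber=%d\n", range, jabber }
    '
}
```

For example, count_harness_warnings < /var/adm/messages prints a one-line summary, assuming that your system logs console messages to that file.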

Using Scripts to Automate the Test Process

You can create fault-injection test scripts by using the logging access type of the th_define utility:


# th_define -n name -i instance -a log [-e fixup_script]

th_define takes the instance offline and brings it back online. Then th_define runs the workload that is described by the fixup script and logs I/O accesses that are made by the driver instance.

The fixup script is called twice with the set of optional arguments: once just before the instance is taken offline, and again after the instance has been brought online. The following variables are passed into the environment of the called executable:

DRIVER_PATH

Device path of the instance

DRIVER_INSTANCE

Instance number of the driver

DRIVER_UNCONFIGURE

Set to 1 when the instance is about to be taken offline

DRIVER_CONFIGURE

Set to 1 when the instance has just been brought online

Typically, the fixup script ensures that the device under test is in a suitable state to be taken offline (unconfigured) or in a suitable state for error injection (for example, configured, error free, and servicing a workload). A minimal script for a network driver could be:


#!/bin/ksh
driver=xyznetdrv
ifnum=$driver$DRIVER_INSTANCE
 
if [[ $DRIVER_CONFIGURE = 1 ]]; then
   ifconfig $ifnum plumb	
   ifconfig $ifnum ...	
   ifworkload start $ifnum
elif [[ $DRIVER_UNCONFIGURE = 1 ]]; then	
   ifworkload stop $ifnum	
   ifconfig $ifnum down	
   ifconfig $ifnum unplumb
fi
exit $?

Note –

ifworkload should initiate the workload as a background task. The fault injection occurs after the fixup script configures the driver under test and brings it online (DRIVER_CONFIGURE is set to 1).


If the -e fixup_script option is present, it must be the last option on the command line. If that option is not present, a default script is used. The default script repeatedly attempts to bring the device under test offline and online, so the workload consists of the driver's attach and detach paths.
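One way to guard against misplacing the -e option is to assemble the logging command with a helper that always appends -e last. The following sketch only prints the command line; it does not run th_define. The function name log_errdef_cmd is illustrative, and the helper uses only the options shown in this section.

```shell
#!/bin/sh
# Sketch only: print a th_define logging command line.
# Usage: log_errdef_cmd NAME INSTANCE [FIXUP_SCRIPT [ARGS...]]
log_errdef_cmd() {
    if [ $# -lt 2 ]; then
        echo "usage: log_errdef_cmd name instance [fixup_script [args...]]" >&2
        return 2
    fi
    drv=$1; inst=$2
    shift 2
    cmd="th_define -n $drv -i $inst -a log"
    # -e must be the last option, so it is appended after everything else.
    if [ $# -gt 0 ]; then
        cmd="$cmd -e $*"
    fi
    echo "$cmd"
}
```

With no fixup script, the printed command omits -e, and th_define falls back to the default script described above.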

The resulting log is converted into a set of executable scripts that are suitable for running unassisted fault-injection tests. These scripts are created in a subdirectory of the current directory with the name driver.test.id. The scripts inject faults, one at a time, into the driver while running the workload that is described by the fixup script.

The driver tester has substantial control over the errdefs that are produced by the test automation process. See th_define(1M).

If the tester chooses a suitable range of workloads for the test scripts, the harness gives good coverage of the hardening aspects of the driver. However, to achieve full coverage, the tester might need to create additional test cases manually. Add these cases to the test scripts. To ensure that testing completes in a timely manner, the tester might need to manually delete duplicate test cases.

Automated Test Process

The process for automated testing follows.

  1. Identify the aspects of the driver to be tested.

    Test all aspects of the driver that interact with the hardware:

    • Attach and detach

    • Plumb and unplumb under a stack

    • Normal data transfer

    • Documented debug modes

    A separate workload script (fixup_script) must be generated for each mode of use.

  2. For each mode of use, prepare an executable program (fixup_script) that configures and unconfigures the device, and creates and terminates a workload.

  3. Run th_define with the errdefs, together with an access type of -a log.

  4. Wait for the logs to fill.

    The logs contain a dump of the bofi driver's internal buffers. This data is included at the front of the script.

    Because it can take from a few seconds to several minutes to create the logs, use the th_manage broadcast command to check the progress.

  5. Change to the created test directory and run the master test script.

    The master script runs each generated test script in sequence. Separate test scripts are generated per register set.

  6. Store the results for analysis.

    Successful test results, such as success (corruption reported) and success (corruption undetected), show that the driver under test is behaving properly.

    A few "test not triggered" failures in the output are acceptable. However, several such failures indicate that the test is not working properly. These failures can appear when the driver does not access the same registers as it did when the test scripts were generated.

  7. Run the test on multiple instances of the driver concurrently to test the multithreading of error paths.

    For example, each th_define command creates a separate directory that contains test scripts and a master script:


    # th_define -n xyznetdrv -i 0 -a log -e script
    # th_define -n xyznetdrv -i 1 -a log -e script
    

    Once created, run the master scripts in parallel.


    Note –

    The generated scripts produce only simulated fault injections that are based on what was logged while the logging errdef was active. When you define a workload, ensure that the required results are logged. Also analyze the resulting logs and fault-injection specifications. Verify that the resulting test scripts provide the hardware access coverage that you require.
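Steps 5 through 7 can be sketched as two helpers: one runs a master script in several test directories in parallel, and the other tallies the outcome strings described in step 6. The function names, the results.log file name, and the master script name passed as the first argument are assumptions for illustration. Use the name of the master script that th_define actually generates in each driver.test.id directory.

```shell
#!/bin/sh
# Sketch only.
# run_masters SCRIPT DIR...
# Run DIR/SCRIPT in every test directory at the same time and save
# each run's output to DIR/results.log (a hypothetical file name).
run_masters() {
    master=$1
    shift
    for dir in "$@"; do
        ( cd "$dir" && "./$master" > results.log 2>&1 ) &
    done
    wait    # return only when every master script has finished
}

# tally_results FILE...
# Count the outcome strings that the test scripts report.
tally_results() {
    awk '
        /success \(corruption reported\)/   { reported++ }
        /success \(corruption undetected\)/ { undetected++ }
        /test not triggered/                { untriggered++ }
        END {
            printf "reported=%d undetected=%d not_triggered=%d\n",
                reported, undetected, untriggered
        }
    ' "$@"
}
```

A high not_triggered count relative to the two success counts suggests that the workload at test time no longer matches the workload that was logged.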