Writing Device Drivers in Oracle® Solaris 11.4

Updated: November 2020
 
 

Tuning Drivers

The Oracle Solaris OS provides kernel statistics structures so that you can implement counters for your driver. The DTrace facility enables you to analyze performance in real time. This section presents the following topics on device performance:

  • Kernel Statistics – The Oracle Solaris OS provides a set of data structures and functions for capturing performance statistics in the kernel. Kernel statistics (called kstats) enable your driver to export continuous statistics while the system is running. The kstat data is handled programmatically by using the kstat functions.

  • DTrace for Dynamic Instrumentation – DTrace enables you to add instrumentation to your driver dynamically so that you can perform tasks like analyzing the system and measuring performance. DTrace takes advantage of predefined kstat structures.

Kernel Statistics

The kstat facility enables performance tuning. It provides a set of functions and data structures that device drivers and other kernel modules use to export module-specific kernel statistics.

A kstat is a data structure for recording quantifiable aspects of a device's usage. A kstat is stored as a null-terminated linked list. Each kstat has a common header section and a type-specific data section. The header section is defined by the kstat2_t structure.

Optional Kernel Statistics

Some kstats are not practical or desirable to collect permanently. Some are useful only for short-term diagnosis of a problem, and collecting others can impose a significant performance penalty. The kstats framework lets a driver register optional kstats, which a user of the system can enable and disable instead of having them enabled all the time.


Note -  Managing the optional kstats requires the PRIV_KSTAT2_MANAGE privilege and the solaris.smf.manage.kstats authorization.
Registering Optional Kstats

To make a kstat optional, you must first register the kstat by using the kstat2_register_optional() function in your module code. This function registers the specified kstat with the kernel kstats framework. After the kstat is registered, you can manage the kstat by using the kstat2adm command-line utility.

Example 146  Registering Optional Kstats

This example shows how to register an optional kstat.

#include <sys/modctl.h>
#include <sys/kstat2.h>

/* Identifier and description of the mydrv:latency_kstats kstat */
static const char *mydrv_optional_kstats_name = "mydrv:latency_kstats";
static const char *mydrv_optional_kstats_desc =
    "Latency kstats for mydrv operations";

/* Callback invoked to enable, disable, or query the optional kstats */
static kstat2_opt_estate_t mydrv_opt_kstats_cb(zoneid_t,
    kstat2_dyn_cb_type_t, void *);

/* Driver-private data that is passed to the callback */
static struct mydrv_priv_s {
    int instance;
    kmutex_t mydrv_opt_kstats_lck;
} mydrv_priv;

/* Identifier returned by registration; needed to unregister later */
static kstat2_opt_id_t mydrv_opt_ksid;

int
_init(void)
{
    /* modlinkage is the driver's struct modlinkage, defined elsewhere */
    int rc = mod_install(&modlinkage);

    if (rc != 0)
        return (rc);

    /* Register the mydrv:latency_kstats kstat */
    mydrv_opt_ksid = kstat2_register_optional(
        mydrv_optional_kstats_name, mydrv_optional_kstats_desc,
        mydrv_opt_kstats_cb, (void *)&mydrv_priv);

    if (mydrv_opt_ksid == -1) {
        /*
         * Registration failed. A message will have been
         * reported by the kstats framework.
         */
    }
    return (rc);
}

Registration of optional kstats associates a user-visible identifier with a callback function in the driver. The callback is triggered by a user running the kstat2adm command or another utility which calls the kstat2_optional_set_state() API. The callback function is called to enable (create) or disable (delete) the optional kstats or return them to their default state. The default state is usually disabled. The callback can also be called to query the current enabled state of the optional kstats.

Example 147  Optional Kstats Callback Handling

This example shows how optional kstats might be implemented in a driver.

#include <sys/kstat2.h>

static kstat2_t *mydrv_opt_kstat1 = NULL;
static kstat2_t *mydrv_opt_kstat2 = NULL;

/*
 * When called to enable kstats, creates them in the zone
 * passed in as an argument.
 */
static kstat2_opt_estate_t
mydrv_opt_kstats_cb(zoneid_t zid, kstat2_dyn_cb_type_t op, void *arg) {
    struct mydrv_priv_s *priv = (struct mydrv_priv_s *)arg;
    kstat2_opt_estate_t rc = KSTAT2_OPT_ESTATE_ERROR;

    /*
     * Enter a mutex to prevent multiple calls to this fn
     * messing with the kstats at the same time.
     */
    mutex_enter(&priv->mydrv_opt_kstats_lck);

    switch (op) {
    case KSTAT2_OPT_CB_ENABLE:
        if (mydrv_opt_kstat1 == NULL) {
            mydrv_opt_kstat1 = kstat2_create(zid, ...);
            if (mydrv_opt_kstat1 != NULL) {
                mydrv_opt_kstat1->ks2_priv = priv;
                kstat2_install(mydrv_opt_kstat1);
                rc = KSTAT2_OPT_ESTATE_ENABLED;
            } else {
                /* Report the failure */
                rc = KSTAT2_OPT_ESTATE_ERROR;
            }
        } else {
            /*
             * Optional kstats already exist, so the
             * return state should indicate they are
             * enabled.
             */
            rc = KSTAT2_OPT_ESTATE_ENABLED;
        }
        break;

    case KSTAT2_OPT_CB_DISABLE:
        if (mydrv_opt_kstat1 != NULL) {
            kstat2_delete(mydrv_opt_kstat1);
            mydrv_opt_kstat1 = NULL;
        }
        rc = KSTAT2_OPT_ESTATE_DISABLED;
        break;

    case KSTAT2_OPT_CB_QUERY:
        if (mydrv_opt_kstat1 == NULL) {
            /*
             * The default state of these optional kstats
             * is disabled, so tag the query result
             * accordingly.
             */
            rc = KSTAT2_OPT_ESTATE_DISABLED |
                KSTAT2_OPT_ESTATE_DEFAULT;
        } else {
            rc = KSTAT2_OPT_ESTATE_ENABLED;
        }
        break;

    default:
        rc = KSTAT2_OPT_ESTATE_ERROR;
        break;
    }

    mutex_exit(&priv->mydrv_opt_kstats_lck);
    return (rc);
}

For enable and disable requests, the callback returns the actual state of the optional kstats in the specified zone. If the driver is unable to enable or disable the kstats, the callback function returns KSTAT2_OPT_ESTATE_ERROR. The driver can choose to report the nature of the error through the FMA framework.

When the callback request type is KSTAT2_OPT_CB_QUERY and the returned state is the optional kstat's default state, the return value is tagged with KSTAT2_OPT_ESTATE_DEFAULT.

Enabling and Disabling the Optional Kstats

From a user program, you can enable or disable an optional kstat by calling the kstat2_optional_set_state() function with the KSTAT2_OPT_ESTATE_ENABLED or KSTAT2_OPT_ESTATE_DISABLED state.

Example 148  Enabling Optional Kstats

This example shows how to enable an optional kstat.

kstat2_handle_t handle;
kstat2_status_t st;
st = kstat2_open(&handle, NULL);
...
st = kstat2_optional_set_state(handle, id, KSTAT2_OPT_ESTATE_ENABLED);
if (st == KSTAT2_S_OK) {
    printf("Optional kstats for %s enabled\n", id);
} else if (st == KSTAT2_S_NO_PERM) {
    fprintf(stderr, "You do not have permission to change "
        "optional kstat id %s\n", id);
} else if (st == KSTAT2_S_INVAL_ARG) {
    fprintf(stderr, "%s: no such optional kstat id\n", id);
} else {
    fprintf(stderr, "Failed to set enabled state for "
        "optional kstats \"%s\": %s\n", id, strerror(errno));
}
Example 149  Disabling Optional Kstats

This example shows how to disable an optional kstat.

kstat2_handle_t handle;
kstat2_status_t st;
st = kstat2_open(&handle, NULL);
...
st = kstat2_optional_set_state(handle, id, KSTAT2_OPT_ESTATE_DISABLED);
if (st == KSTAT2_S_OK) {
    printf("Optional kstats for %s disabled\n", id);
} else if (st == KSTAT2_S_NO_PERM) {
    fprintf(stderr, "You do not have permission to change "
        "optional kstat id %s\n", id);
} else if (st == KSTAT2_S_INVAL_ARG) {
    fprintf(stderr, "%s: no such optional kstat id\n", id);
} else {
    fprintf(stderr, "Failed to set disabled state for "
        "optional kstats \"%s\": %s\n", id, strerror(errno));
}
Unregistering the Optional Kstats

Before you unload a device driver, the optional kstats created by the driver must be unregistered and deleted. To unregister the optional kstats, call the kstat2_unregister_optional() function with the ID returned from the earlier call to the kstat2_register_optional() function. Typically, both the unregistering and the deletion of any active kstats are done in the _fini(9E) function of the driver before unload.

For more information, see the kstat2_unregister_optional(9F) and kstat2_register_optional(9F) man pages.

Managing Optional Kstats Using the kstat2adm Utility

The kstat2adm command-line utility enables you to manage the optional kstats. You can list, enable, or disable the registered optional kstats. The state of the optional kstats persists across reboots.

Example 150  Listing the Optional Kstats
# kstat2adm
IDENTIFIER              DEFAULT   STATE     PSTATE   DESCRIPTION
nfs:v2_op_latency       disabled  disabled  -        NFSv2 server op latency
nfs:v3_op_latency       disabled  disabled  disabled NFSv3 server op latency
nfs:v41_op_latency      disabled  disabled  -        NFSv4.1 server op latency
nfs:v4_op_latency       disabled  enabled   enabled  NFSv4 server op latency
Example 151  Enabling Optional Kstats

This example shows how to enable the nfs:v4_op_latency and nfs:v41_op_latency kstats.

# kstat2adm enable nfs:v4_op_latency nfs:v41_op_latency
nfs:v4_op_latency kstats enabled
nfs:v41_op_latency kstats enabled
Example 152  Disabling Optional Kstats

This example shows how to disable the nfs:v4_op_latency and nfs:v41_op_latency kstats.

# kstat2adm disable nfs:v4_op_latency nfs:v41_op_latency
nfs:v4_op_latency kstats disabled
nfs:v41_op_latency kstats disabled

For more information, see the kstat2adm(8) man page.

Optional Kstats in Zones

When optional kstats are registered, the registration is visible in all zones. Therefore, the kstat2adm command always lists all registered optional kstats, regardless of whether the caller is in the global zone or a non-global zone. However, you can enable or disable optional kstats for a specific zone. The kernel kstats framework passes the current zone ID to the driver's registered callback, which allows the driver to decide whether to enable the kstats based on the zone ID. If the driver's kstat is zone-agnostic, the callback can ignore the zone ID parameter.

If a driver has enabled the optional kstats in a non-global zone and that zone is shut down, the kstats framework automatically deletes those kstats.
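The per-zone behavior described above can be modeled in ordinary user-space C. The following sketch is not kernel code and does not call the kstat2 API: zone_op_t, MAX_ZONES, opt_state_cb(), and zone_shutdown() are hypothetical names, and one flag per zone stands in for the kstats that a real driver would create and delete. It only illustrates keeping the enabled state keyed by the zone ID that is passed to the callback.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ZONES 8    /* hypothetical limit for this model */

/* Hypothetical stand-ins for the callback request types. */
typedef enum { OP_ENABLE, OP_DISABLE, OP_QUERY } zone_op_t;

/* One flag per zone stands in for "the kstats exist in this zone". */
static bool zone_enabled[MAX_ZONES];

/*
 * Model of a zone-aware callback. Returns 1 if the optional kstats
 * are enabled in the zone after the request, 0 if they are disabled,
 * and -1 for an invalid zone ID or request.
 */
static int
opt_state_cb(int zid, zone_op_t op)
{
    if (zid < 0 || zid >= MAX_ZONES)
        return (-1);

    switch (op) {
    case OP_ENABLE:
        zone_enabled[zid] = true;
        break;
    case OP_DISABLE:
        zone_enabled[zid] = false;
        break;
    case OP_QUERY:
        break;
    default:
        return (-1);
    }
    return (zone_enabled[zid] ? 1 : 0);
}

/* Model of zone shutdown: the kstats in that zone no longer exist. */
static void
zone_shutdown(int zid)
{
    if (zid >= 0 && zid < MAX_ZONES)
        zone_enabled[zid] = false;
}
```

In this model, enabling the kstats in one zone leaves every other zone at its default disabled state, and a query after zone_shutdown() reports the zone as disabled again.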

Optional Kstats Persistence

If the state of an optional kstat is set by the kstat2adm command or the kstat2_optional_set_state() function, the state persists across reboots and across driver unload and reload. SMF handles the persistence through the kstat2adm service. To clear any persistent state for optional kstats, use the kstat2adm command.

Example 153  Clear Persistent State of Optional Kstats
# kstat2adm default nfs:v4_op_latency
nfs:v4_op_latency kstats reset to default

Kernel Statistics Structure Members

The members of a kstat structure are:

kstat_t ks1

Older kstat(9S) data.

ks2_path

Unique path to kstat.

ks2_update

Update function, which is used to update the kstat's data and set the size.

ks2_id

kstat ID.

ks2_type

kstat data type.

ks2_flags

kstat flags.

kstat_data

kstat type-specific data.

ks2_ndata

Number of data records.

ks2_data_size

Size of kstat data section.

ks2_private

For private use by drivers.

ks2_lock

Holds a lock on kstat.

Kernel Statistics Structures

The structures for the different kinds of kstats are:

kstat2(9S)

Each kernel statistic (kstat) that is exported by device drivers consists of a header section and a data section. The kstat2 structure is the header portion of the statistic.

kstat2_intr(9S)

Structure for interrupt kstats. The types of interrupts are:

  • Hard interrupt – Sourced from the hardware device itself.

  • Soft interrupt – Induced by the system through the use of some system interrupt source.

  • Watchdog interrupt – Induced by a periodic timer call.

  • Spurious interrupt – An interrupt entry point was entered but there was no interrupt to service.

  • Multiple service – An interrupt was detected and serviced just prior to returning from any of the other types.

Drivers generally report only claimed hard interrupts and soft interrupts from their handlers, but measurement of the spurious class of interrupts is useful for auto-vectored devices to locate any interrupt latency problems in a particular system configuration. Devices that have more than one interrupt of the same type should use multiple structures.

kstat2_io(9S)

Structure for I/O kstats.

kstat2_named(9S)

Structure for named kstats. A named kstat is an array of name-value pairs. These pairs are kept in the kstat2_named structure.
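As an illustration of the name-value idea, the following user-space sketch keeps one counter for each interrupt type listed above in an array of name-value pairs. It is a model only: nv_pair_t, intr_counts, nv_lookup(), and intr_count() are hypothetical names, and nv_pair_t is not the layout of the kstat2_named structure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name-value record; not the kstat2_named layout. */
typedef struct {
    const char *name;
    uint64_t value;
} nv_pair_t;

/* One counter per interrupt type. */
static nv_pair_t intr_counts[] = {
    { "hard", 0 },
    { "soft", 0 },
    { "watchdog", 0 },
    { "spurious", 0 },
    { "multiple_service", 0 },
};

#define N_INTR (sizeof (intr_counts) / sizeof (intr_counts[0]))

/* Find a pair by name. Returns NULL if the name is not present. */
static nv_pair_t *
nv_lookup(const char *name)
{
    for (size_t i = 0; i < N_INTR; i++) {
        if (strcmp(intr_counts[i].name, name) == 0)
            return (&intr_counts[i]);
    }
    return (NULL);
}

/* Count one interrupt. Returns 0 on success, -1 for an unknown name. */
static int
intr_count(const char *name)
{
    nv_pair_t *nv = nv_lookup(name);

    if (nv == NULL)
        return (-1);
    nv->value++;
    return (0);
}
```

A handler in this model would call intr_count("hard") for each claimed interrupt and intr_count("spurious") when it is entered with nothing to service.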

Kernel Statistics Functions

The functions for using kstats are:

kstat2_create(9F)

Allocate and initialize a kstat2(9S) structure.

kstat2_delete(9F)

Remove a kstat from the system.

kstat2_create_with_template(9F)

Create a kstat using a template.

kstat2_create_histogram(9F)

Create a histogram kstat.

kstat2_install(9F)

Mark a fully initialized kstat as ready to read.

kstat2_nv_init(9F)

Initialize a named kstat.

kstat2_nv_setstr(9F)

Initialize a named kstat with a string value.

kstat2_nv_setstrs(9F)

Initialize a named kstat with an array of strings.

kstat2_nv_setints(9F)

Initialize a named kstat with an array of integers.

kstat_queue(9F)

Many I/O subsystems have at least two basic queues of transactions to manage. One queue is for transactions that have been accepted for processing but for which processing has yet to begin. The other queue is for transactions that are actively being processed but are not yet done. For this reason, two cumulative time statistics are kept: wait time and run time. Wait time accumulates before service begins. Run time accumulates during service. The kstat_queue() family of functions manages these times based on the transitions between the driver wait queue and run queue.
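The wait-time and run-time accounting can be sketched in user-space C as follows. This is a simplified model under stated assumptions, not the kstat_queue() implementation: queue_stats_t and the qs_*() functions are hypothetical names, timestamps are supplied by the caller, and each queue is charged only the elapsed time during which it was non-empty.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of two-queue accounting. */
typedef struct {
    uint64_t wait_time;    /* cumulative time with a non-empty wait queue */
    uint64_t run_time;     /* cumulative time with a non-empty run queue */
    uint64_t last_update;  /* time of the previous transition */
    unsigned wait_cnt;     /* transactions waiting for service */
    unsigned run_cnt;      /* transactions in service */
} queue_stats_t;

/* Charge the interval since the last transition to each non-empty queue. */
static void
qs_account(queue_stats_t *qs, uint64_t now)
{
    uint64_t delta = now - qs->last_update;

    if (qs->wait_cnt > 0)
        qs->wait_time += delta;
    if (qs->run_cnt > 0)
        qs->run_time += delta;
    qs->last_update = now;
}

/* A transaction is accepted and placed on the wait queue. */
static void
qs_waitq_enter(queue_stats_t *qs, uint64_t now)
{
    qs_account(qs, now);
    qs->wait_cnt++;
}

/* A transaction moves from the wait queue to the run queue. */
static void
qs_waitq_to_runq(queue_stats_t *qs, uint64_t now)
{
    assert(qs->wait_cnt > 0);
    qs_account(qs, now);
    qs->wait_cnt--;
    qs->run_cnt++;
}

/* A transaction completes and leaves the run queue. */
static void
qs_runq_exit(queue_stats_t *qs, uint64_t now)
{
    assert(qs->run_cnt > 0);
    qs_account(qs, now);
    qs->run_cnt--;
}
```

For a single transaction that is accepted at time 10, starts service at time 25, and completes at time 60, this model accumulates 15 units of wait time and 35 units of run time.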

Kernel Statistics for Oracle Solaris Ethernet Drivers

The kstat interface described in the following table is an effective way to obtain Ethernet physical layer statistics from the driver. Ethernet drivers should export these statistics to guide users in better diagnosis and repair of Ethernet physical layer problems. With the exception of link_up, all statistics have a default value of 0 when not present. The value of the link_up statistic should be assumed to be 1.
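A consumer of these statistics might apply the default rule as in the following minimal sketch. The function name mii_stat_or_default() and the convention of passing NULL for a statistic that the driver does not export are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/*
 * Return the exported value of a statistic, or its default when the
 * driver does not export it (value == NULL). The default is 1 for
 * link_up and 0 for every other statistic.
 */
static unsigned
mii_stat_or_default(const char *name, const unsigned *value)
{
    if (value != NULL)
        return (*value);
    return (strcmp(name, "link_up") == 0 ? 1 : 0);
}
```

With this rule, a driver that exports no link_up statistic is treated as having its link up, while a missing capability statistic such as cap_1000fdx is treated as not supported.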

The following example displays the link_* statistics for instance 0 of the ce driver. In this case, mii is used to filter the statistics.

kstat ce:0:mii:link_*
Table 26  Ethernet MII/GMII Physical Layer Interface Kernel Statistics
Kstat Variable
Type
Description
xcvr_addr
KSTAT_DATA_UINT32
Provides the MII address of the transceiver that is currently in use.
  • (0) - (31) are for the MII address of the physical layer device in use for a given Ethernet device.

  • (-1) is used where there is no externally accessible MII interface, and therefore the MII address is undefined or irrelevant.

xcvr_id
KSTAT_DATA_UINT32
Provides the specific vendor ID or device ID of the transceiver that is currently in use.
xcvr_inuse
KSTAT_DATA_UINT32
Indicates the type of transceiver that is currently in use. The IEEE aPhyType enumerates the following set:
  • (0) other undefined

  • (1) an MII interface is present, but no transceiver is connected

  • (2) 10 Mbits/s Clause 7 10 Mbits/s Manchester

  • (3) 100BASE-T4 Clause 23 100 Mbits/s 8B/6T

  • (4) 100BASE-X Clause 24 100 Mbits/s 4B/5B

  • (5) 100BASE-T2 Clause 32 100 Mbits/s PAM5X5

  • (6) 1000BASE-X Clause 36 1000 Mbits/s 8B/10B

  • (7) 1000BASE-T Clause 40 1000 Mbits/s 4D-PAM5

This set is smaller than the set specified by ifMauType, which is defined to include all of the above plus their half duplex/full duplex options. Because the cap_* statistics can provide this information, the missing definitions can be derived from the combination of xcvr_inuse and cap_* to provide all the combinations of ifMauType.
cap_1000fdx
KSTAT_DATA_CHAR
Indicates the device is 1 Gbits/s full duplex capable.
cap_1000hdx
KSTAT_DATA_CHAR
Indicates the device is 1 Gbits/s half duplex capable.
cap_100fdx
KSTAT_DATA_CHAR
Indicates the device is 100 Mbits/s full duplex capable.
cap_100hdx
KSTAT_DATA_CHAR
Indicates the device is 100 Mbits/s half duplex capable.
cap_10fdx
KSTAT_DATA_CHAR
Indicates the device is 10 Mbits/s full duplex capable.
cap_10hdx
KSTAT_DATA_CHAR
Indicates the device is 10 Mbits/s half duplex capable.
cap_asmpause
KSTAT_DATA_CHAR
Indicates the device is capable of asymmetric pause Ethernet flow control.
cap_pause
KSTAT_DATA_CHAR
Indicates the device is capable of symmetric pause Ethernet flow control when cap_pause is set to 1 and cap_asmpause is set to 0. When cap_asmpause is set to 1, cap_pause has the following meaning:
  • cap_pause = 0 Transmit pauses based on receive congestion.

  • cap_pause = 1 Receive pauses and slow down transmit to avoid congestion.

cap_rem_fault
KSTAT_DATA_CHAR
Indicates the device is capable of remote fault indication.
cap_autoneg
KSTAT_DATA_CHAR
Indicates the device is capable of auto-negotiation.
adv_cap_1000fdx
KSTAT_DATA_CHAR
Indicates the device is advertising 1 Gbits/s full duplex capability.
adv_cap_1000hdx
KSTAT_DATA_CHAR
Indicates the device is advertising 1 Gbits/s half duplex capability.
adv_cap_100fdx
KSTAT_DATA_CHAR
Indicates the device is advertising 100 Mbits/s full duplex capability.
adv_cap_100hdx
KSTAT_DATA_CHAR
Indicates the device is advertising 100 Mbits/s half duplex capability.
adv_cap_10fdx
KSTAT_DATA_CHAR
Indicates the device is advertising 10 Mbits/s full duplex capability.
adv_cap_10hdx
KSTAT_DATA_CHAR
Indicates the device is advertising 10 Mbits/s half duplex capability.
adv_cap_asmpause
KSTAT_DATA_CHAR
Indicates the device is advertising the capability of asymmetric pause Ethernet flow control.
adv_cap_pause
KSTAT_DATA_CHAR
Indicates the device is advertising the capability of symmetric pause Ethernet flow control when adv_cap_pause is set to 1 and adv_cap_asmpause is set to 0. When adv_cap_asmpause is set to 1, adv_cap_pause has the following meaning:
  • adv_cap_pause = 0 Transmit pauses based on receive congestion.

  • adv_cap_pause = 1 Receive pauses and slow down transmit to avoid congestion.

adv_rem_fault
KSTAT_DATA_CHAR
Indicates the device is experiencing a fault that it is going to forward to the link partner.
adv_cap_autoneg
KSTAT_DATA_CHAR
Indicates the device is advertising the capability of auto-negotiation.
lp_cap_1000fdx
KSTAT_DATA_CHAR
Indicates the link partner device is 1 Gbits/s full duplex capable.
lp_cap_1000hdx
KSTAT_DATA_CHAR
Indicates the link partner device is 1 Gbits/s half duplex capable.
lp_cap_100fdx
KSTAT_DATA_CHAR
Indicates the link partner device is 100 Mbits/s full duplex capable.
lp_cap_100hdx
KSTAT_DATA_CHAR
Indicates the link partner device is 100 Mbits/s half duplex capable.
lp_cap_10fdx
KSTAT_DATA_CHAR
Indicates the link partner device is 10 Mbits/s full duplex capable.
lp_cap_10hdx
KSTAT_DATA_CHAR
Indicates the link partner device is 10 Mbits/s half duplex capable.
lp_cap_asmpause
KSTAT_DATA_CHAR
Indicates the link partner device is capable of asymmetric pause Ethernet flow control.
lp_cap_pause
KSTAT_DATA_CHAR
Indicates the link partner device is capable of symmetric pause Ethernet flow control when lp_cap_pause is set to 1 and lp_cap_asmpause is set to 0. When lp_cap_asmpause is set to 1, lp_cap_pause has the following meaning:
  • lp_cap_pause = 0 Link partner will transmit pauses based on receive congestion.

  • lp_cap_pause = 1 Link partner will receive pauses and slow down transmit to avoid congestion.

lp_rem_fault
KSTAT_DATA_CHAR
Indicates the link partner is experiencing a fault with the link.
lp_cap_autoneg
KSTAT_DATA_CHAR
Indicates the link partner device is capable of auto-negotiation.
link_asmpause
KSTAT_DATA_CHAR
Indicates the link is operating with asymmetric pause Ethernet flow control.
link_pause
KSTAT_DATA_CHAR
Indicates the resolution of the pause capability. Indicates the link is operating with symmetric pause Ethernet flow control when link_pause is set to 1 and link_asmpause is set to 0. When link_asmpause is set to 1, link_pause has the following meaning, relative to a local view of the link:
  • link_pause = 0 This station will transmit pauses based on receive congestion.

  • link_pause = 1 This station will receive pauses and slow down transmit to avoid congestion.

link_duplex
KSTAT_DATA_CHAR
Indicates the link duplex.
  • link_duplex = 0 Link is down and duplex is unknown.

  • link_duplex = 1 Link is up and in half duplex mode.

  • link_duplex = 2 Link is up and in full duplex mode.

link_up
KSTAT_DATA_CHAR
Indicates whether the link is up or down.
  • link_up = 0 Link is down.

  • link_up = 1 Link is up.
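The link_duplex, link_asmpause, and link_pause encodings in the table can be decoded as in the following sketch. The function names are hypothetical. The table does not describe the case where both pause statistics are 0; treating it as no flow control is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Describe link_duplex as defined in the table. */
static const char *
duplex_str(int link_duplex)
{
    switch (link_duplex) {
    case 0:
        return ("down, duplex unknown");
    case 1:
        return ("up, half duplex");
    case 2:
        return ("up, full duplex");
    default:
        return ("invalid");
    }
}

/* Describe the pause resolution from link_asmpause and link_pause. */
static const char *
pause_str(int link_asmpause, int link_pause)
{
    if (link_asmpause == 0) {
        /* Assumption: both values 0 means no flow control. */
        return (link_pause ? "symmetric" : "none");
    }
    /* Asymmetric: link_pause selects the local station's role. */
    return (link_pause ? "receive pauses" : "transmit pauses");
}
```

For example, link_asmpause = 1 with link_pause = 0 decodes as a station that transmits pauses based on receive congestion.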

DTrace for Dynamic Instrumentation

DTrace is a comprehensive dynamic tracing facility for examining the behavior of both user programs and the operating system itself. With DTrace, you can collect data at strategic locations in your environment, referred to as probes. DTrace enables you to record such data as stack traces, time stamps, the arguments to a function, or simply counts of how often the probe fires. Because DTrace enables you to insert probes dynamically, you do not need to recompile your code. For more information about DTrace, see the Oracle Solaris 11.4 DTrace (Dynamic Tracing) Guide.