Writing Device Drivers for Oracle® Solaris 11.2


Updated: September 2014
 
 

GLDv3 Capabilities

GLDv3 implements a capability mechanism that allows the framework to query and enable capabilities supported by the GLDv3 driver. Use the mc_getcapab(9E) entry point to report capabilities. If a capability is supported by the driver, pass information about that capability, such as capability-specific entry points or flags, through mc_getcapab(). Pass a pointer to the mc_getcapab() entry point in the mac_callbacks structure. See GLDv3 MAC Registration Data Structures for more information about the mac_callbacks structure.

boolean_t mc_getcapab(void *driver_handle, mac_capab_t cap, void *cap_data);

The cap argument specifies the type of capability being queried. The value of cap can be MAC_CAPAB_HCKSUM (hardware checksum offload), MAC_CAPAB_LSO (large segment offload), or MAC_CAPAB_RINGS (transmit and receive rings). Use the cap_data argument to return the capability data to the framework.

If the driver supports the cap capability, the mc_getcapab() entry point must return B_TRUE. If the driver does not support the cap capability, mc_getcapab() must return B_FALSE.

Example 19-5  The mc_getcapab() Entry Point
static boolean_t
xx_m_getcapab(void *arg, mac_capab_t cap, void *cap_data)
{
        switch (cap) {
        case MAC_CAPAB_HCKSUM: {
                uint32_t *txflags = cap_data;
                *txflags = HCKSUM_INET_FULL_V4 | HCKSUM_IPHDRCKSUM;
                break;
        }
        case MAC_CAPAB_LSO: {
                /* ... */
                break;
        }
        case MAC_CAPAB_RINGS: {
                /* ... */
                break;
        }
        default:
                return (B_FALSE);
        }
        return (B_TRUE);
}

The following sections describe the supported capabilities and the corresponding capability data to return.

MAC Rings Capability

Rings and Ring Groups Layer–2 Classification

Both transmit and receive hardware rings are DMA channels and can be exposed by device drivers. Rings are associated with ring groups. Receive ring groups are associated with one or more MAC addresses, and all network traffic matching any of the MAC addresses associated with a receive group must be delivered by the NIC through one of the rings of that group. The steering of traffic to the receive ring groups is enabled in hardware through layer-2 classification.

The mapping of receive rings to ring groups can be either dynamic or static. With dynamic ring groups, rings can be moved between the groups as requested by the framework, thereby dynamically shrinking or growing the size of the groups. However, with static ring groups, the rings are statically assigned to the groups and this assignment cannot change.

If a receive group contains more than one ring, the NIC must spread traffic across these rings using a hashing mechanism such as RSS (Receive Side Scaling), allowing multiple connections to be assigned to different rings.

Exactly one of the receive groups must be designated as the default group (usually the first group at index 0). The following properties are associated with this receive group:

  • Should have at least one ring.

  • Is assigned to the primary MAC client of the NIC. The primary MAC client is assigned the primary MAC address of the NIC, and is typically IP.

  • Must be used to receive all multicast and broadcast traffic received from the network.

  • If the NIC is placed in promiscuous mode, it must be used to receive all traffic which does not match the MAC addresses assigned to non-default receive groups.

The following points are noteworthy with regards to the hardware implementation of receive rings and receive ring groups:

  • If multiple receive rings are implemented but layer-2 classification is not supported, the hardware should expose to the framework a single receive ring group containing all the rings.

  • If layer-2 hardware classification is implemented but RSS is not supported, the hardware should register multiple receive groups, with one ring per group.

  • If both layer-2 hardware classification and RSS are implemented, the hardware should register multiple receive groups with one or more rings per group.

  • If neither layer-2 hardware classification nor RSS is implemented, the hardware should either not advertise a ring capability, or advertise a ring capability with a single pseudo ring and ring group, which can be used to dynamically poll the adapter for traffic.

Registering Rings and Groups Process Overview

Registering rings with the framework is a process that consists of several calls from the framework to the driver. The following steps describe the registration process:

  1. The framework queries the MAC_CAPAB_RINGS capability by calling the driver's mc_getcapab() entry point. One call is made for the transmit rings and one call for the receive rings. See MAC_CAPAB_RINGS Capability for more information.

  2. The framework uses the mr_rget(9E) and mr_gget(9E) entry points, obtained in the previous step, to retrieve information about a specific ring or ring group. See the mr_rget(9E) and mr_gget(9E) man pages for more information.

  3. When the framework wants to use a ring, it starts the ring group with the mgi_start(9E) entry point, and then starts the ring using the mri_start(9E) entry point as advertised in the previous step.

    Traffic can now flow through the rings until they are stopped through the mgi_stop(9E) and mri_stop(9E) entry points.

MAC_CAPAB_RINGS Capability

To obtain information about support for hardware transmit and receive rings, the framework sends MAC_CAPAB_RINGS in the cap argument and expects the information back in cap_data, which points to a mac_capab_rings(9S) structure.

The framework allocates the mac_capab_rings(9S) structure and sets the mr_type member to MAC_RING_TYPE_RX for receive rings, or MAC_RING_TYPE_TX for transmit rings. The remaining members of the mac_capab_rings structure are then filled in by the driver.

The following fields are defined in the mac_capab_rings structure:

mr_version

Must be set to MAC_RINGS_VERSION_1.

mr_rnum

Number of rings.

mr_gnum

Number of groups.

mr_group_type

The following values are defined:

  • MAC_GROUP_TYPE_DYNAMIC – The group is dynamic.

  • MAC_GROUP_TYPE_STATIC – The group is static.

See Rings and Ring Groups Layer–2 Classification for more information.

mr_gget()

Driver entry point to get more information about ring groups. See mr_gget() Entry Point for more information.

mr_rget()

Driver entry point to get more information about rings. See mr_rget() Entry Point for more information.

mr_gaddring()

Driver entry point to add a ring to a group. See mr_gaddring(9E).

mr_gremring()

Driver entry point to remove a ring from a group. See mr_gremring(9E).
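
For example, a driver might fill in the mac_capab_rings structure from the MAC_CAPAB_RINGS case of its mc_getcapab() entry point as in the following sketch. The xx_t soft state, the ring and group counts, and the xx_fill_group() and xx_fill_ring() entry points are hypothetical, and mac_capab_rings_t is assumed to be the typedef for the mac_capab_rings(9S) structure.

static void
xx_m_getcapab_rings(xx_t *xxp, mac_capab_rings_t *cap_rings)
{
        /* mr_type is set by the framework before mc_getcapab() is called. */
        if (cap_rings->mr_type == MAC_RING_TYPE_RX) {
                cap_rings->mr_rnum = xxp->num_rx_rings;
                cap_rings->mr_gnum = xxp->num_rx_groups;
        } else {
                /* MAC_RING_TYPE_TX */
                cap_rings->mr_rnum = xxp->num_tx_rings;
                cap_rings->mr_gnum = xxp->num_tx_groups;
        }

        cap_rings->mr_version = MAC_RINGS_VERSION_1;
        cap_rings->mr_group_type = MAC_GROUP_TYPE_STATIC;

        /* Entry points the framework calls to retrieve group and ring information. */
        cap_rings->mr_gget = xx_fill_group;
        cap_rings->mr_rget = xx_fill_ring;

        /*
         * With static groups, rings cannot be moved between groups, so
         * mr_gaddring and mr_gremring are left unset in this sketch.
         */
}

The driver would call a helper such as this from the MAC_CAPAB_RINGS case of mc_getcapab() and then return B_TRUE.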

mr_gget() Entry Point

The mr_gget(9E) entry point is invoked by the framework for each valid group index, up to the number of groups indicated by the mr_gnum member. See mr_gget(9E) for more information. After the call to mr_gget(), the group information is returned in the mac_group_info structure by the driver. The structure itself is pre-allocated by the framework and is filled in by the driver.

The following fields are defined in the mac_group_info structure:

mgi_driver

An opaque driver group handle which is used by the framework in future calls to group entry points.

mgi_count

Number of rings in the group.

mgi_flags

Group flags. MAC_GROUP_DEFAULT identifies the group as the default group. See Rings and Ring Groups Layer–2 Classification for more information.

mgi_start

Group start entry point.

mgi_stop

Group stop entry point.

mgi_addmac

Add unicast MAC address entry point.

mgi_remmac

Remove unicast MAC address entry point.

mgi_addvlan

Entry point to add hardware VLAN filtering, tagging, and stripping of VLAN tags.

mgi_remvlan

Entry point to remove hardware VLAN filtering, tagging, and stripping of VLAN tags.

mgi_setmtu

Entry point to set the RX group MTU.

mgi_getsriov_info

Entry point to retrieve SR-IOV information for the group. See Ring Groups and SR-IOV for more information.

See mac_group_info(9S) and mac_group_info(9E) for detailed information.


Note - The mgi_addmac(9E) and mgi_remmac(9E) entry points are used only for receive groups. The mc_unicst(9E) entry point must be set to NULL whenever a device driver supports the rings capability.

Note - The mgi_addvlan() entry point performs the following actions:
  • It defines the VLAN IDs that the NIC must allow for transmission and reception. Any tagged packet whose VLAN ID is not in the configured list is dropped.

  • If the MAC_GROUP_VLAN_TRANSPARENT_ENABLE flag is set, it also enables hardware VLAN tagging and stripping for that particular VLAN ID.

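The following sketch shows what an mr_gget() implementation might look like for receive groups. The exact argument list is documented on the mr_gget(9E) man page; this sketch assumes a form in which the framework passes the driver handle, the ring type, the group index, a pointer to the mac_group_info structure to fill in, and an opaque group handle. The xx_t and xx_group_t types and the xx_group_* entry points are hypothetical.

static void
xx_fill_group(void *arg, mac_ring_type_t rtype, const int index,
    mac_group_info_t *infop, mac_group_handle_t gh)
{
        xx_t *xxp = arg;
        xx_group_t *group;

        if (rtype != MAC_RING_TYPE_RX)
                return;         /* this sketch exposes receive groups only */

        group = &xxp->rx_groups[index];
        group->group_handle = gh;       /* saved for later calls into the framework */

        /* Opaque handle passed back to the group entry points below. */
        infop->mgi_driver = (mac_group_driver_t)group;
        infop->mgi_count = group->num_rings;

        /* The group at index 0 is designated as the default group. */
        if (index == 0)
                infop->mgi_flags = MAC_GROUP_DEFAULT;

        infop->mgi_start = xx_group_start;
        infop->mgi_stop = xx_group_stop;
        infop->mgi_addmac = xx_group_add_mac;   /* receive groups only */
        infop->mgi_remmac = xx_group_rem_mac;
}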

mr_rget() Entry Point

The mr_rget(9E) entry point is invoked by the framework for each valid group and ring index, up to the number of groups indicated by mr_gnum and the number of rings indicated by mr_rnum, as advertised through the MAC_CAPAB_RINGS capability. See mr_rget(9E) for detailed information.

After the call to mr_rget() is completed, the ring information is returned in the mac_ring_info structure by the driver. The structure is pre-allocated by the framework and is filled in by the driver.

The following fields are defined in the mac_ring_info structure:

mri_driver

An opaque driver ring handle which is used by the framework in future calls to ring entry points.

mri_start

Ring start entry point.

mri_stop

Ring stop entry point.

mri_stat

Ring statistics entry point. See GLDv3 Network Statistics for more information.

mri_tx

Ring transmit entry point. See Transmit Data Path for more information.

mri_poll

Ring poll entry point. See Receive Data Path for more information.

mri_intr_ddi_handle

The DDI interrupt handle associated with the interrupt for this ring.

mri_intr_enable(9E)

Enable interrupts on RX rings. See Receive Data Path for more information.

mri_intr_disable(9E)

Disable interrupts on RX rings. See Receive Data Path for more information.

See mac_group_info(9S) and mac_ring_info(9S) man pages for detailed information.


Note - mri_tx() must be set only for transmit rings, and mri_poll() must be set only for receive rings.

Note - If a driver implements rings capability, then the mc_tx() entry point in the mac_callbacks structure must be set to NULL.
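
The following sketch shows what an mr_rget() implementation might look like. The exact argument list is documented on the mr_rget(9E) man page; this sketch assumes a form in which the framework passes the driver handle, the ring type, the group and ring indexes, a pointer to the mac_ring_info structure to fill in, and an opaque ring handle. The xx_t, xx_rx_ring_t, and xx_tx_ring_t types and the xx_*_ring_* entry points are hypothetical, and the interrupt-related members (mri_intr_ddi_handle, mri_intr_enable, and mri_intr_disable) are omitted for brevity.

static void
xx_fill_ring(void *arg, mac_ring_type_t rtype, const int group_index,
    const int ring_index, mac_ring_info_t *infop, mac_ring_handle_t rh)
{
        xx_t *xxp = arg;

        if (rtype == MAC_RING_TYPE_RX) {
                xx_rx_ring_t *rx_ring = &xxp->rx_rings[ring_index];

                rx_ring->ring_handle = rh;      /* saved for later calls into the framework */

                infop->mri_driver = (mac_ring_driver_t)rx_ring;
                infop->mri_start = xx_rx_ring_start;
                infop->mri_stop = xx_rx_ring_stop;
                infop->mri_stat = xx_rx_ring_stat;
                infop->mri_poll = xx_rx_ring_poll;      /* receive rings only */
        } else {
                xx_tx_ring_t *tx_ring = &xxp->tx_rings[ring_index];

                tx_ring->ring_handle = rh;

                infop->mri_driver = (mac_ring_driver_t)tx_ring;
                infop->mri_start = xx_tx_ring_start;
                infop->mri_stop = xx_tx_ring_stop;
                infop->mri_stat = xx_tx_ring_stat;
                infop->mri_tx = xx_ring_tx;             /* transmit rings only */
        }
}
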
Ring Groups and SR-IOV

Device drivers that are SR-IOV capable use the MAC_CAPAB_RINGS capability to inform the framework of this support by implementing the mgi_getsriov_info(9E) group entry point. The PF driver is responsible for implementing this entry point.

After the call to mgi_getsriov_info(9E), the SR-IOV information is returned in the mac_sriov_info structure by the driver. The structure is pre-allocated by the framework and is filled-in by the driver.

The PF (Physical Function) driver instance registers as many transmit and receive ring groups as the number of VFs (Virtual Functions). These ring groups advertised by the PF driver are special and are used to manage the VFs. No data flows through these ring groups. They are used to configure unicast MAC addresses, set the MTU, add and remove VLAN filters, and enable hardware VLAN tagging and stripping for VFs.


Note - The VF driver programs the MAC multicast group that the driver wants to join. The PF driver does not control the programming of these addresses.

The msi_vf_index structure member, set by the PF driver, captures the VF index that corresponds to a ring group. This is the same index used by the device driver when the driver calls the pci_plist_getvf(9F) function.

See Chapter 21, SR-IOV Drivers for detailed information about SR-IOV drivers.
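
The following sketch shows the general shape of an mgi_getsriov_info() implementation in a PF driver. The prototype shown here is an assumption; see the mgi_getsriov_info(9E) man page for the exact form. The xx_group_t type and its vf_index member are hypothetical driver bookkeeping.

static int
xx_group_get_sriov_info(mac_group_driver_t gdriver, mac_sriov_info_t *infop)
{
        /* gdriver is the opaque handle set in mgi_driver by mr_gget(). */
        xx_group_t *group = (xx_group_t *)gdriver;

        /*
         * Report the VF index that corresponds to this ring group.  The
         * same index is used when the driver calls pci_plist_getvf(9F).
         */
        infop->msi_vf_index = group->vf_index;

        return (0);
}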

Hardware Checksum Offload

To get data about support for hardware checksum offload, the framework sends MAC_CAPAB_HCKSUM in the cap argument. See Hardware Checksum Offload Capability Information.

To retrieve the per-packet hardware checksumming metadata when hardware checksumming is enabled, use mac_hcksum_get(9F). See The mac_hcksum_get() Function Flags.

To set checksum offload metadata, use mac_hcksum_set(9F). See The mac_hcksum_set() Function Flags.

See Hardware Checksumming: Hardware and Hardware Checksumming: MAC Layer for more information.

Hardware Checksum Offload Capability Information

To pass information about the MAC_CAPAB_HCKSUM capability to the framework, the driver must set a combination of the following flags in cap_data, which points to a uint32_t. These flags indicate the level of hardware checksum offload that the driver is capable of performing for outbound packets.

HCKSUM_INET_PARTIAL

Partial 1's complement checksum ability

HCKSUM_INET_FULL_V4

Full 1's complement checksum ability for IPv4 packets

HCKSUM_INET_FULL_V6

Full 1's complement checksum ability for IPv6 packets

HCKSUM_IPHDRCKSUM

IPv4 Header checksum offload capability

The mac_hcksum_get() Function Flags

The flags argument of mac_hcksum_get() is a combination of the following values:

HCK_FULLCKSUM

Compute the full checksum for this packet.

HCK_FULLCKSUM_OK

The full checksum was verified in hardware and is correct.

HCK_PARTIALCKSUM

Compute the partial 1's complement checksum based on other parameters passed to mac_hcksum_get(). HCK_PARTIALCKSUM is mutually exclusive with HCK_FULLCKSUM.

HCK_IPV4_HDRCKSUM

Compute the IP header checksum.

HCK_IPV4_HDRCKSUM_OK

The IP header checksum was verified in hardware and is correct.
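
For example, a transmit path might retrieve the per-packet checksum request as in the following sketch. The xx_tx_desc_t descriptor fields are hypothetical placeholders for device-specific programming.

static void
xx_tx_setup_cksum(mblk_t *mp, xx_tx_desc_t *desc)
{
        uint32_t start, stuff, end, value, flags;

        mac_hcksum_get(mp, &start, &stuff, &end, &value, &flags);

        if (flags & HCK_IPV4_HDRCKSUM) {
                /* The stack requests IPv4 header checksum offload. */
                desc->ip_cksum = 1;
        }

        if (flags & HCK_FULLCKSUM) {
                /* The hardware computes the full L4 checksum for this packet. */
                desc->l4_cksum_full = 1;
        } else if (flags & HCK_PARTIALCKSUM) {
                /*
                 * Partial checksum: start, stuff, and end describe the range
                 * to sum and where to store the result.  See mac_hcksum_get(9F)
                 * for the exact reference points of these offsets.
                 */
                desc->l4_cksum_partial = 1;
                desc->cksum_start = start;
                desc->cksum_stuff = stuff;
                desc->cksum_end = end;
        }
}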

The mac_hcksum_set() Function Flags

The flags argument of mac_hcksum_set() is a combination of the following values:

HCK_FULLCKSUM

The full checksum was computed and passed through the value argument.

HCK_FULLCKSUM_OK

The full checksum was verified in hardware and is correct.

HCK_PARTIALCKSUM

The partial checksum was computed and passed through the value argument. HCK_PARTIALCKSUM is mutually exclusive with HCK_FULLCKSUM.

HCK_IPV4_HDRCKSUM

The IP header checksum was computed and passed through the value argument.

HCK_IPV4_HDRCKSUM_OK

The IP header checksum was verified in hardware and is correct.
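
For example, a receive path might mark a packet whose checksums were verified by the hardware as in the following sketch. The status bits of the hypothetical xx_rx_desc_t are placeholders for device-specific receive descriptor fields.

static void
xx_rx_set_cksum(mblk_t *mp, xx_rx_desc_t *desc)
{
        uint32_t flags = 0;

        if (desc->ip_cksum_ok)
                flags |= HCK_IPV4_HDRCKSUM_OK;  /* IP header checksum verified */
        if (desc->l4_cksum_ok)
                flags |= HCK_FULLCKSUM_OK;      /* full checksum verified */

        /* The offset and value arguments are not needed for the _OK flags. */
        if (flags != 0)
                mac_hcksum_set(mp, 0, 0, 0, 0, flags);
}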

Large Segment (or Send) Offload

To query support for large segment (or send) offload, the framework sends MAC_CAPAB_LSO in the cap argument and expects the information back in cap_data, which points to a mac_capab_lso(9S) structure. The framework allocates the mac_capab_lso structure and passes a pointer to this structure in cap_data. The mac_capab_lso structure consists of an lso_basic_tcp_ipv4(9S) structure and an lso_flags member. If the driver instance supports LSO for TCP on IPv4, set the LSO_TX_BASIC_TCP_IPV4 flag in lso_flags and set the lso_max member of the lso_basic_tcp_ipv4 structure to the maximum payload size supported by the driver instance.
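
For example, the MAC_CAPAB_LSO case of mc_getcapab() in Example 19-5 might be completed as shown in the following sketch. XX_LSO_MAXLEN is a hypothetical device-specific limit, and mac_capab_lso_t is assumed to be the typedef for the mac_capab_lso(9S) structure.

        case MAC_CAPAB_LSO: {
                mac_capab_lso_t *cap_lso = cap_data;

                /* Advertise LSO for TCP over IPv4 up to the device limit. */
                cap_lso->lso_flags = LSO_TX_BASIC_TCP_IPV4;
                cap_lso->lso_basic_tcp_ipv4.lso_max = XX_LSO_MAXLEN;
                break;
        }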

Use mac_lso_get(9F) to obtain per-packet LSO metadata. If LSO is enabled for this packet, the HW_LSO flag is set in the mac_lso_get() flags argument. The maximum segment size (MSS) to be used during segmentation of the large segment is returned through the location pointed to by the mss argument. See Large Segment Offload for more information.
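
For example, a transmit path might detect an LSO packet and obtain its MSS as in the following sketch. The xx_tx_desc_t fields are hypothetical placeholders for device-specific descriptor programming.

static void
xx_tx_setup_lso(mblk_t *mp, xx_tx_desc_t *desc)
{
        uint32_t mss, flags;

        mac_lso_get(mp, &mss, &flags);

        if (flags & HW_LSO) {
                /* Segment this large packet in hardware using the given MSS. */
                desc->lso_enable = 1;
                desc->lso_mss = mss;
        }
}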