
Oracle® Solaris 11.4 DTrace (Dynamic Tracing) Guide


Updated: November 2018
 
 

Network and Network Service Protocol Providers

This section lists all the network and network services protocol providers.


Note - IP addresses in this guide conform to RFC 5737 (https://tools.ietf.org/html/rfc5737), IPv4 Address Blocks Reserved for Documentation, and RFC 3849 (https://tools.ietf.org/html/rfc3849), IPv6 Address Prefix Reserved for Documentation. The IPv4 addresses used in this documentation come from the blocks 192.0.2.0/24, 198.51.100.0/24, and 203.0.113.0/24. IPv6 addresses use the prefix 2001:DB8::/32.

To show a subnet, a block is divided into multiple subnets by borrowing enough bits from the host portion to create the required subnets. For example, the 192.0.2.0/24 block might be divided into subnets such as 192.0.2.32/27 and 192.0.2.64/27.


icmp Provider

The icmp provider provides probes for tracing the Internet Control Message Protocol (ICMP).

ICMP Probes

The icmp probes are:

send

Probe that fires whenever ICMP sends a message.

receive

Probe that fires whenever ICMP receives a message.

The send and receive probes cover ICMP messages on IP interfaces, for both IPv4 and IPv6 traffic.

ICMP Probe Arguments

The argument types for the icmp probes are listed in the table below. The arguments are described in the following section.

Table 70  icmp Probe Arguments

Probe     args[0]       args[1]      args[2]      args[3]   args[4]
send      pktinfo_t *   csinfo_t *   ipinfo_t *   NULL      icmpinfo_t *
receive   pktinfo_t *   csinfo_t *   ipinfo_t *   NULL      icmpinfo_t *
ICMP pktinfo_t Structure

The pktinfo structure is where packet ID info can be made available for deeper analysis, if packet IDs become supported by the kernel. The pkt_addr member is a pointer to the mblk holding the packet, with b_rptr pointing at the start of the relevant protocol specified by pkt_pcap to support packet capture.

typedef struct pktinfo {
  mblk_t *pkt_addr;
  int pkt_pcap;
} pktinfo_t;
ICMP csinfo_t Structure

The csinfo_t structure is where connection state info is made available. It contains a unique (system-wide) connection ID, and the process ID and zone ID associated with the connection. For ICMP, the connection information is populated for ICMP errors received, which are sent to a specific connection. Other ICMP data received such as ICMP echo requests/replies is not directed to a specific connection, so the connection ID is 0. For outbound ICMP data, the pid and zoneid are specified but the connection ID (cid) is 0.

typedef struct csinfo {
        uintptr_t cs_addr;
        uint64_t cs_cid;
        pid_t cs_pid;
        zoneid_t cs_zoneid;
 } csinfo_t;

The following table specifies the csinfo_t members:

Table 71  List of ICMP csinfo_t Members
Member
Description
cs_addr
Address of translated ip_xmit_attr_t *.
cs_cid
Connection ID. A unique per-connection identifier, which identifies the connection during its lifetime.
cs_pid
Process ID associated with the connection.
cs_zoneid
Zone ID associated with the connection.
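
Given these semantics, a quick way to see how much of the ICMP traffic on a system is connection-directed is to key an aggregation on whether cs_cid is non-zero. The following one-liner is a sketch based on the member descriptions above:

```d
# dtrace -n 'icmp:::receive { @[args[1]->cs_cid != 0 ? "connection-directed" : "unassociated"] = count(); }'
```

Connection-directed counts should correspond to ICMP errors delivered to a specific connection; echo requests and replies fall into the unassociated bucket.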
ICMP ipinfo_t Structure

The ipinfo_t structure contains common IP info for both IPv4 and IPv6.

typedef struct ipinfo {
        uint8_t ip_ver;                 /* IP version (4, 6) */
        uint16_t ip_plength;            /* payload length */
        string ip_saddr;                /* source address */
        string ip_daddr;                /* destination address */
} ipinfo_t;

The following table specifies the ipinfo_t members:

Table 72  List of ICMP ipinfo_t Members
Member
Description
ip_ver
IP version number. Currently either 4 or 6.
ip_plength
Payload length in bytes. This is the length of the packet at the time of tracing, excluding the IP header.
ip_saddr
Source IP address, as a string. For IPv4, this is a dotted decimal quad; for IPv6, it follows RFC 1884 convention 2, with lowercase hexadecimal digits.
ip_daddr
Destination IP address, as a string. For IPv4, this is a dotted decimal quad; for IPv6, it follows RFC 1884 convention 2, with lowercase hexadecimal digits.
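
Because ip_ver distinguishes IPv4 from IPv6, the ipinfo_t argument can be used to split ICMP traffic by IP version. A minimal sketch:

```d
# dtrace -n 'icmp:::send,icmp:::receive { @[probename, args[2]->ip_ver] = count(); }'
```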
ICMP icmpinfo_t Structure

The icmpinfo_t structure is a DTrace translated version of the information contained in the various forms of ICMP and ICMP6 headers.

typedef struct icmpinfo {
        uint8_t icmp_version;
        uint8_t icmp_type;
        uint8_t icmp_code;
        uint16_t icmp_checksum;
        uint32_t icmp_address_mask;     
        uint16_t icmp_echo_id;
        uint16_t icmp_echo_seq;
        uint32_t icmp_param_problem_ptr;
        uint32_t icmp_pmtu_update;
        uint8_t icmp_radv_num_addrs;
        uint16_t icmp_radv_lifetime;
        uint32_t *icmp_radv_addrs;
        string icmp_redirect_gateway;
        uint32_t icmp_timestamp_otime;
        uint32_t icmp_timestamp_rtime;
        uint32_t icmp_timestamp_ttime;
        string icmp6_mld_addr;
        uint8_t icmp6_mld_v2_num_mars;
        uintptr_t icmp6_mld_v2_mars;
        string icmp6_nd_target;
        string icmp6_nd_redirect_destination;
        uint32_t icmp6_nd_radv_reachable;
        uint32_t icmp6_nd_radv_retransmit;
        uint8_t icmp6_rr_segnum;
        uint8_t icmp6_rr_flags;
        uint16_t icmp6_rr_maxdelay;
        /* Original data that triggered ICMP error - NULL/0 if not ICMP error */
        ipha_t *icmp_error_ip_hdr;    /* Orig. IP hdr for ICMP error */
        ip6_t *icmp_error_ip6_hdr;    /* Orig. IPv6 hdr for ICMP error */
        uint16_t icmp_error_sport;
        uint16_t icmp_error_dport;
        struct icmp *icmp_hdr;
        icmp6_t *icmp6_hdr;
} icmpinfo_t;
Table 73  List of ICMP icmpinfo_t Members
Member
Description
icmp_type
ICMP/ICMPv6 message type.
icmp_code
ICMP/ICMPv6 message code.
icmp_checksum
Checksum of ICMP header and payload.
icmp_hdr
Pointer to raw ICMP header at the time of tracing.
icmp6_hdr
Pointer to raw ICMPv6 header at the time of tracing.
icmp_address_mask
ICMP address mask reply.
icmp_echo_id
ICMP echo request/response ID.
icmp_echo_seq
ICMP echo request/response sequence number.
icmp_param_problem_ptr
Offset of parameter in original datagram that caused the ICMP/ICMPv6 parameter problem.
icmp_pmtu_update
Path MTU update for ICMP "destination unreachable/needs fragmentation" and ICMPv6 "packet too big".
icmp_radv_num_addrs
Number of ICMP router advertisements to follow.
icmp_radv_lifetime
Lifetime of router advertisements.
icmp_radv_addrs
Pointer to router advertisements.
icmp_redirect_gateway
Gateway for ICMP redirect.
icmp_timestamp_otime
Originating time set in ICMP timestamp request - number of seconds since 0:00 UT.
icmp_timestamp_rtime
Receive time of timestamp request set in ICMP timestamp response.
icmp_timestamp_ttime
Transmit time of timestamp reply sent in response to ICMP timestamp request.
icmp6_mld_addr
ICMPv6 Multicast Listener Discovery address.
icmp6_mld_v2_num_mars
Number of ICMPv6 Multicast Address Records.
icmp6_mld_v2_mars
Pointer to first ICMPv6 Multicast Address Record.
icmp6_nd_target
ICMPv6 neighbor discovery target.
icmp6_nd_redirect_destination
ICMPv6 neighbor discovery redirect destination.
icmp6_nd_radv_reachable
ICMPv6 neighbor discovery router advertisement reachable.
icmp6_nd_radv_retransmit
ICMPv6 neighbor discovery router advertisement retransmit timer.
icmp6_rr_segnum
ICMPv6 router renumbering segment number.
icmp6_rr_flags
ICMPv6 router renumbering flags.
icmp6_rr_maxdelay
ICMPv6 router renumbering maximum delay.
icmp_error_ip_hdr
Original IP hdr for ICMP/ICMPv6 error.
icmp_error_ip6_hdr
Original IPv6 hdr for ICMP/ICMPv6 error.
icmp_error_sport
Original layer 4 source port for ICMP/ICMPv6 error.
icmp_error_dport
Original layer 4 destination port for ICMP/ICMPv6 error.
icmp_hdr
Pointer to original ICMP header, NULL for ICMP6.
icmp6_hdr
Pointer to original ICMPv6 header, NULL for ICMP.

For more information about the various ICMP/ICMPv6 message header formats, see RFCs 792, 1256, and 2463.
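
As an illustration of the echo-related members, the following sketch prints the ID and sequence number of incoming IPv4 echo replies. The numeric type value 0 is an assumption based on RFC 792 rather than a constant defined by this provider:

```d
# dtrace -n 'icmp:::receive /args[4]->icmp_type == 0/ { printf("echo reply id=%d seq=%d from %s", args[4]->icmp_echo_id, args[4]->icmp_echo_seq, args[2]->ip_saddr); }'
```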

Using the ICMP Provider

The following examples show simple usage of the icmp provider.

ICMP Errors Sent by Remote Host/Port

This DTrace one-liner monitors ICMP errors sent, by aggregating the originating host and target port:

# dtrace -n 'icmp:::send / args[4]->icmp_error_dport != 0 / { @[args[2]->ip_daddr, args[4]->icmp_error_dport] = count(); }'
dtrace: description 'icmp:::send ' matched 9 probes
^C

  203.0.113.5                                           517               21

#

The output shows that 21 ICMP errors were sent to the remote host 203.0.113.5 in response to packets addressed to local port 517. These UDP packets are generated by running the talk program when the associated inetd service is not enabled. The ICMP errors are "destination unreachable/port unreachable" errors, and this one-liner can be used to catch UDP service connection refusals.

ICMP Packets by Process

This DTrace one-liner counts ICMP sent/received packets by process:

# dtrace -n 'icmp:::send,icmp:::receive { @[args[1]->cs_pid] = count(); }'
dtrace: description 'icmp:::send,icmp:::receive ' matched 20 probes
^C

   100961                1
   100965                1
   100968                1

These represent three ICMP messages received in response to ping -U requests.

Counting Events by ICMP Type

This DTrace one-liner counts events by ICMP message type:

# dtrace -n 'icmp:::send,icmp:::receive { @i[icmp_type_string[args[4]->icmp_type]] = count(); }'
dtrace: description 'icmp:::send,icmp:::receive ' matched 25 probes
^C

 Echo reply                                                        7
 Echo request                                                      7

ICMP Stability

The icmp provider uses DTrace's stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see DTrace Stability Mechanisms.

Table 74  Stability Mechanism for the icmp Provider

Element     Name stability   Data stability   Dependency class
Provider    Evolving         Evolving         ISA
Module      Private          Private          Unknown
Function    Private          Private          Unknown
Name        Evolving         Evolving         ISA
Arguments   Evolving         Evolving         ISA

igmp Provider

The igmp provider provides probes for tracing the Internet Group Management Protocol (IGMP).

igmp Probes

The igmp probes are:

send

Probe that fires whenever IGMP sends a message.

receive

Probe that fires whenever IGMP receives a message.

igmp Probe Arguments

The argument types for the igmp probes are listed in the table below. The arguments are described in the following section.

Table 75  List of igmp Probe Arguments

Probe     args[0]       args[1]      args[2]      args[3]   args[4]
send      pktinfo_t *   csinfo_t *   ipinfo_t *   NULL      igmpinfo_t *
receive   pktinfo_t *   csinfo_t *   ipinfo_t *   NULL      igmpinfo_t *
IGMP pktinfo_t Structure

The pktinfo structure is where packet ID info can be made available for deeper analysis, if packet IDs become supported by the kernel. The pkt_addr member is a pointer to the mblk holding the packet, with b_rptr pointing at the start of the relevant protocol specified by pkt_pcap to support packet capture.

typedef struct pktinfo {
  mblk_t *pkt_addr;
  int pkt_pcap;
} pktinfo_t;
IGMP csinfo_t Structure

The csinfo_t structure is where connection state information is made available. For IGMP, the connection ID and process ID fields are unused since IGMP events are not tied to particular processes or connections.

typedef struct csinfo {
        uintptr_t cs_addr;
        uint64_t cs_cid;
        pid_t cs_pid;
        zoneid_t cs_zoneid;
 } csinfo_t;
Table 76  List of IGMP csinfo_t Members
Member
Description
cs_addr
Address of translated ill_t *.
cs_cid
Connection ID. Unused for IGMP.
cs_pid
Process ID. Unused for IGMP.
cs_zoneid
Zone ID associated with the connection.
IGMP ipinfo_t Structure

The ipinfo_t structure contains common IP info for both IPv4 and IPv6.

typedef struct ipinfo {
        uint8_t ip_ver;                 /* IP version (4, 6) */
        uint16_t ip_plength;            /* payload length */
        string ip_saddr;                /* source address */
        string ip_daddr;                /* destination address */
} ipinfo_t;
Table 77  List of IGMP ipinfo_t Members
Member
Description
ip_ver
IP version number. Currently either 4 or 6. For IGMP, the version number is 4 since it is an IPv4-only protocol.
ip_plength
Payload length in bytes. This is the length of the packet at the time of tracing, excluding the IP header.
ip_saddr
Source IP address, as a string. For IPv4 this is a dotted decimal quad.
ip_daddr
Destination IP address, as a string. For IPv4 this is a dotted decimal quad.
IGMP igmpinfo_t Structure

The igmpinfo_t structure is a DTrace translated version of the information contained in the various forms of IGMP header.

typedef struct igmpinfo {
        uint8_t igmp_type;
        uint8_t igmp_code;
        uint16_t igmp_checksum;
        string igmp_group_addr;
        uint8_t igmp_query_version;
        uint8_t igmp_v3_query_max_response_time;
        uint16_t igmp_v3_query_num_sources;
        uintptr_t igmp_v3_query_sources;
        uint16_t igmp_v3_report_num_records;
        struct grphdr *igmp_v3_report_records;
        struct igmp *igmp_hdr;
} igmpinfo_t;
Table 78  List of IGMP igmpinfo_t Members
Member
Description
igmp_type
IGMP message type.
igmp_code
IGMP message code.
igmp_checksum
Checksum of IGMP header and payload.
igmp_group_addr
String representation of IGMP multicast group address. Not valid for IGMPv3 membership reports.
igmp_query_version
For an IGMP membership query, this field is set to 1, 2, or 3. Otherwise 0.
igmp_v3_query_max_response_time
Valid for an IGMPv3 membership query.
igmp_v3_query_num_sources
Number of IPv4 addresses specifying sources for group-and-source queries.
igmp_v3_query_sources
Array of IPv4 addresses specifying sources for group-and-source queries.
igmp_v3_report_num_records
Number of IGMPv3 group records specifying sources for group-and-source queries.
igmp_v3_report_records
Array of IGMPv3 group records specifying group and sources for group-and-source reports.
igmp_hdr
Pointer to raw IGMP header at time of tracing.

For more information about the various IGMP message header formats, see RFCs 1112, 2236, and 3376.
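
As a sketch of how the igmpinfo_t members can be used, the following one-liner counts IGMP messages by direction and multicast group address (keeping in mind that igmp_group_addr is not valid for IGMPv3 membership reports):

```d
# dtrace -n 'igmp:::send,igmp:::receive { @[probename, args[4]->igmp_group_addr] = count(); }'
```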

Monitoring IGMP Traffic by Zone

The following example shows how to monitor IGMP traffic by zone.

This DTrace one-liner counts IGMP messages sent and received, by message type and zone:

# dtrace -n 'igmp:::send,igmp:::receive { @[igmp_type_string[args[4]->igmp_type], probename, args[1]->cs_zoneid] = count(); }'
dtrace: description 'igmp:::send,igmp:::receive ' matched 3 probes

^C

  membership query             receive                0                1
  v1 membership report         receive                0                1
  v1 membership report         send                   0                1

In this example, the global zone (zone ID 0) received one membership query and one v1 membership report, and sent one v1 membership report.

IGMP Stability

The igmp provider uses DTrace's stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see DTrace Stability Mechanisms.

Table 79  Stability Mechanism for the igmp Provider

Element     Name stability   Data stability   Dependency class
Provider    Evolving         Evolving         ISA
Module      Private          Private          Unknown
Function    Private          Private          Unknown
Name        Evolving         Evolving         ISA
Arguments   Evolving         Evolving         ISA

ip Provider

The ip provider provides probes for tracing both IPv4 and IPv6 protocols.

ip Probes

The ip probes are described in the following table.

Table 80  List of ip Probes
Probe
Description
send
Fires when the kernel network stack sends an IP packet.
receive
Fires when the kernel network stack receives an IP packet.
drop-in
Fires when the kernel network stack drops an incoming IP packet.
drop-out
Fires when the kernel network stack drops an outgoing IP packet.
address-add
Fires when an IP address is added to the system.
address-delete
Fires when an IP address is removed from the system.
address-state-change
Fires when an IP interface's state (flags) changes.
route-add
Fires when an IP route is added.
route-change
Fires when an IP route is changed.
route-delete
Fires when an IP route is removed.
route-losing
Fires when an IP route is failing or the kernel suspects partitioning.
route-miss
Fires when a route lookup fails for an address.
route-redirect
Fires when a redirect indicates that a different route should be used.

These probes trace packets on physical interfaces as well as packets on loopback interfaces that are processed by ip. An IP packet must have a full IP header to be visible to these probes.
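
The drop-in and drop-out probes take the same arguments as send and receive, so packet drops can be summarized by direction and interface. A minimal sketch:

```d
# dtrace -n 'ip:::drop-in,ip:::drop-out { @[probename, args[3]->if_name] = count(); }'
```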


Note -  Loopback TCP packets on Oracle Solaris may be processed by TCP fusion, a performance feature that bypasses the IP layer. These fused packets are not visible to the ip:::send and ip:::receive probes. After the TCP handshake, typically all loopback TCP packets are fused.

ip Probe Arguments

The argument types for the ip probes are listed in the table below. The arguments are described in the following section.

Table 81  ip Probe Arguments

Probe                  args[0]       args[1]      args[2]      args[3]      args[4]          args[5]
send                   pktinfo_t *   csinfo_t *   ipinfo_t *   ifinfo_t *   ipv4info_t *     ipv6info_t *
receive                pktinfo_t *   csinfo_t *   ipinfo_t *   ifinfo_t *   ipv4info_t *     ipv6info_t *
drop-in                pktinfo_t *   csinfo_t *   ipinfo_t *   ifinfo_t *   ipv4info_t *     ipv6info_t *
drop-out               pktinfo_t *   csinfo_t *   ipinfo_t *   ifinfo_t *   ipv4info_t *     ipv6info_t *
address-add            NULL          csinfo_t *   NULL         NULL         ipaddrinfo_t *   -
address-delete         NULL          csinfo_t *   NULL         NULL         ipaddrinfo_t *   -
address-state-change   NULL          csinfo_t *   NULL         NULL         ipaddrinfo_t *   -
route-add              NULL          csinfo_t *   NULL         NULL         routeinfo_t *    -
route-change           NULL          csinfo_t *   NULL         NULL         routeinfo_t *    -
route-delete           NULL          csinfo_t *   NULL         NULL         routeinfo_t *    -
route-losing           NULL          csinfo_t *   NULL         NULL         routeinfo_t *    -
route-miss             NULL          csinfo_t *   NULL         NULL         routeinfo_t *    -
route-redirect         NULL          csinfo_t *   NULL         NULL         routeinfo_t *    -
pktinfo_t Structure

The pktinfo structure is where packet ID info can be made available for deeper analysis, if packet IDs become supported by the kernel. The pkt_addr member is a pointer to the mblk holding the packet, with b_rptr pointing at the start of the relevant protocol specified by pkt_pcap to support packet capture.

typedef struct pktinfo {
  mblk_t *pkt_addr;
  int pkt_pcap;
} pktinfo_t;
csinfo_t Structure

The csinfo_t structure is where connection state information can be made available if connection IDs become supported by the kernel in the future.

The cs_addr member is currently always NULL.

typedef struct csinfo {
        uintptr_t cs_addr;              /* currently always NULL */
} csinfo_t;
ipinfo_t Structure

The ipinfo_t structure contains common IP information for both IPv4 and IPv6.

typedef struct ipinfo {
        uint8_t ip_ver;                 /* IP version (4, 6) */
        uint16_t ip_plength;            /* payload length */
        string ip_saddr;                /* source address */
        string ip_daddr;                /* destination address */
} ipinfo_t;
Table 82  ipinfo_t Members
Member
Description
ip_ver
IP version number. Currently either 4 or 6.
ip_plength
Payload length in bytes. This is the length of the packet at the time of tracing, excluding the IP header.
ip_saddr
Source IP address, as a string. For IPv4, this is a dotted decimal quad; for IPv6, it follows RFC 1884 convention 2, with lowercase hexadecimal digits.
ip_daddr
Destination IP address, as a string. For IPv4, this is a dotted decimal quad; for IPv6, it follows RFC 1884 convention 2, with lowercase hexadecimal digits.
ifinfo_t Structure

The ifinfo_t structure contains network interface information.

typedef struct ifinfo {
        string if_name;                /* interface name */
        int8_t if_local;               /* is delivered locally */
        netstackid_t if_ipstack;       /* ipstack ID */
        uintptr_t if_addr;             /* pointer to raw ill_t */
} ifinfo_t;
Table 83  ifinfo_t Members
Member
Description
if_name
Interface name as a string. For example, "eri0", "lo0", "ip.tun0", "<unknown>".
if_local
Is-local status. 1: is a local interface, 0: is not a local interface, -1: is unknown.
if_ipstack
ipstack ID, for associating ip stack instances, or NULL.
if_addr
Pointer to raw kernel structure for advanced debugging only.

The ifinfo_t details are provided for debugging convenience at the ip layer, when that information is available. For some types of traffic, some or all of that information may not be available at the ip layer, in which case the members are "<null>", -1, NULL, and NULL respectively.
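
For example, the following sketch counts received packets by interface name and is-local status, using the if_local encoding described above (1, 0, or -1):

```d
# dtrace -n 'ip:::receive { @[args[3]->if_name, args[3]->if_local] = count(); }'
```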

ipv4info_t Structure

The ipv4info_t structure is a DTrace translated version of the IPv4 header.

typedef struct ipv4info {
        uint8_t ipv4_ver;               /* IP version (4) */
        uint8_t ipv4_ihl;               /* header length, bytes */
        uint8_t ipv4_tos;               /* type of service field */
        uint16_t ipv4_length;           /* length (header + payload) */
        uint16_t ipv4_ident;            /* identification */
        uint8_t ipv4_flags;             /* IP flags */
        uint16_t ipv4_offset;           /* fragment offset */
        uint8_t ipv4_ttl;               /* time to live */
        uint8_t ipv4_protocol;          /* next level protocol */
        string ipv4_protostr;           /* next level protocol, as a string */
        uint16_t ipv4_checksum;         /* header checksum */
        ipaddr_t ipv4_src;              /* source address */
        ipaddr_t ipv4_dst;              /* destination address */
        string ipv4_saddr;              /* source address, string */
        string ipv4_daddr;              /* destination address, string */
        ipha_t *ipv4_hdr;               /* pointer to raw header */
} ipv4info_t;
Table 84  ipv4info_t Members
Member
Description
ipv4_ver
IP version 4.
ipv4_ihl
IPv4 header length, in bytes.
ipv4_tos
Contents of IPv4 type of service field.
ipv4_length
IPv4 packet length (header + payload) at time of tracing, in bytes.
ipv4_ident
IPv4 identification field.
ipv4_flags
IPv4 flags. See the ipv4_flags table below for bitwise values.
ipv4_offset
IPv4 fragment offset, in bytes.
ipv4_ttl
IPv4 time to live.
ipv4_protocol
IPv4 encapsulated protocol number. See /usr/include/netinet/in.h for the protocol list (IPPROTO_*).
ipv4_protostr
IPv4 encapsulated protocol, as a string. For example, "TCP".
ipv4_checksum
IPv4 header checksum, if available at time of tracing.
ipv4_src
IPv4 source address, as an ipaddr_t.
ipv4_dst
IPv4 destination address, as an ipaddr_t.
ipv4_saddr
IPv4 source address, as a dotted decimal quad string.
ipv4_daddr
IPv4 destination address, as a dotted decimal quad string.
ipv4_hdr
Pointer to raw IPv4 header at the time of tracing.

See RFC 791 for a detailed explanation for these IPv4 header fields. If the packet is IPv6, these members are either "<null>", 0, or NULL depending on type.

Table 85  ipv4_flags Values
Value
Description
IPH_DF
Do not fragment
IPH_MF
More fragments
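
As an illustration, this sketch counts sent IPv4 packets that have the do-not-fragment bit set, by destination. It assumes the IPH_DF definition is visible to D scripts through the provider's translator; if it is not, substitute the corresponding bit value:

```d
# dtrace -n 'ip:::send /args[2]->ip_ver == 4 && (args[4]->ipv4_flags & IPH_DF)/ { @[args[2]->ip_daddr] = count(); }'
```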
ipaddrinfo_t Structure

The ipaddrinfo_t structure is a translated version of the ifa_msghdr_t from the routing socket message generated for the IP address event.

typedef struct ipaddrinfo {
        string ipaddr_ifname;
        int ipaddr_ifflags;
        uint16_t ipaddr_ifindex;
        int ipaddr_metric;
        int ipaddr_addresses;
        string ipaddr_address;
        string ipaddr_netmask;
        string ipaddr_broadcast;
        uintptr_t ipaddr_addr;
} ipaddrinfo_t;
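
Based on the member names above, address events can be traced as follows. This is a sketch; the exact string formats of ipaddr_address and ipaddr_netmask are whatever the provider delivers:

```d
# dtrace -qn 'ip:::address-add,ip:::address-delete { printf("%s: %s netmask %s on %s\n", probename, args[4]->ipaddr_address, args[4]->ipaddr_netmask, args[4]->ipaddr_ifname); }'
```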
routeinfo_t Structure

The routeinfo_t structure represents a translated version of the rt_msghdr associated with the routing socket event.

typedef struct routeinfo {
        uint8_t route_version;
        uint8_t route_type;
        uint16_t route_ifindex;
        int route_flags;
        int route_seq;
        pid_t route_pid;
        int route_metrics;
        /* Metric values */
        uint32_t route_mtu_metric;
        uint32_t route_hopcount_metric;
        uint32_t route_expire_metric;
        uint32_t route_recvpipe_metric;
        uint32_t route_sendpipe_metric;
        uint32_t route_ssthresh_metric;
        uint32_t route_rtt_metric;
        uint32_t route_rtt_variance_metric;
        uint32_t route_packets_sent;
        /* Addresses etc in message */
        int route_addresses;
        string route_destination_address;
        string route_gateway_address;
        string route_netmask_address;
        string route_source_address;
        string route_redirect_author_address;
        rt_msghdr_t *route_addr;
} routeinfo_t;
ipv6info_t Structure

The ipv6info_t structure is a DTrace translated version of the IPv6 header.

typedef struct ipv6info {
        uint8_t ipv6_ver;               /* IP version (6) */
        uint8_t ipv6_tclass;            /* traffic class */
        uint32_t ipv6_flow;             /* flow label */
        uint16_t ipv6_plen;             /* payload length */
        uint8_t ipv6_nexthdr;           /* next header protocol */
        string ipv6_nextstr;            /* next header protocol, as a string*/
        uint8_t ipv6_hlim;              /* hop limit */
        in6_addr_t *ipv6_src;           /* source address */
        in6_addr_t *ipv6_dst;           /* destination address */
        string ipv6_saddr;              /* source address, string */
        string ipv6_daddr;              /* destination address, string */
        ip6_t *ipv6_hdr;                /* pointer to raw header */
} ipv6info_t;
Table 86  ipv6info_t Members
Member
Description
ipv6_ver
IP version 6.
ipv6_tclass
IPv6 traffic class.
ipv6_plen
IPv6 payload length at time of tracing, in bytes.
ipv6_nexthdr
IPv6 next header protocol number. See /usr/include/netinet/in.h for the protocol list (IPPROTO_*).
ipv6_nextstr
IPv6 next header protocol, as a string. For example, "TCP".
ipv6_hlim
IPv6 hop limit.
ipv6_src
IPv6 source address, as an in6_addr_t.
ipv6_dst
IPv6 destination address, as an in6_addr_t.
ipv6_saddr
IPv6 source address, as an RFC 1884 convention 2 string with lower case hexadecimal digits.
ipv6_daddr
IPv6 destination address, as an RFC 1884 convention 2 string with lower case hexadecimal digits.
ipv6_hdr
Pointer to raw IPv6 header at the time of tracing.

See RFC 2460 for a detailed explanation for these IPv6 header fields. If the packet is IPv4, these members are either "<null>", 0, or NULL depending on type.
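
As a small illustration of the IPv6-specific members, the following sketch plots the distribution of hop limits seen in received IPv6 packets, by source address:

```d
# dtrace -n 'ip:::receive /args[2]->ip_ver == 6/ { @[args[2]->ip_saddr] = quantize(args[5]->ipv6_hlim); }'
```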

Using the ip Provider

Some simple examples of ip provider usage follow.

Counting Received Packets by Host Address

This DTrace one-liner counts received packets by host address:

# dtrace -n 'ip:::receive { @[args[2]->ip_saddr] = count(); }'
dtrace: description 'ip:::receive ' matched 4 probes
^C

  192.0.2.5/27                                                       1
  192.0.2.20/27                                                      4
  fe80::214:4fff:fe3b:76c8                                           9
  127.0.0.1                                                         14
  192.0.2.25/27                                                     28

The output above shows that 28 IP packets were received from 192.0.2.25/27, 14 IP packets from 127.0.0.1, and so on.

Sent Size Distribution

This DTrace one-liner prints distribution plots of sent payload size by destination:

# dtrace -n 'ip:::send { @[args[2]->ip_daddr] = quantize(args[2]->ip_plength); }'
dtrace: description 'ip:::send ' matched 11 probes
^C

  192.0.2.35/27                                      
           value  ------------- Distribution ------------- count    
               8 |                                         0        
              16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@          7        
              32 |@@@@                                     1        
              64 |@@@@                                     1        
             128 |                                         0        

  192.0.2.25/27                                     
           value  ------------- Distribution ------------- count    
               8 |                                         0        
              16 |@@@@@                                    5        
              32 |@@@                                      3        
              64 |@@@@@@@@@@@@@@@@@@@@@@@@@@               24       
             128 |@                                        1        
             256 |@                                        1        
             512 |@@                                       2        
            1024 |@                                        1        
            2048 |                                         0       
Using ipio.d

The following DTrace script traces IP packets and prints various details:

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10hz

dtrace:::BEGIN
{
        printf(" %3s %10s %15s    %15s %8s %6s\n", "CPU", "DELTA(us)",
            "SOURCE", "DEST", "INT", "BYTES");
        last = timestamp;
}

ip:::send
{
        this->elapsed = (timestamp - last) / 1000;
        printf(" %3d %10d %15s -> %15s %8s %6d\n", cpu, this->elapsed,
            args[2]->ip_saddr, args[2]->ip_daddr, args[3]->if_name,
            args[2]->ip_plength);
        last = timestamp;
}

ip:::receive
{
        this->elapsed = (timestamp - last) / 1000;
        printf(" %3d %10d %15s <- %15s %8s %6d\n", cpu, this->elapsed,
            args[2]->ip_daddr, args[2]->ip_saddr, args[3]->if_name,
            args[2]->ip_plength);
        last = timestamp;
}

This example output shows tracing packets as they pass in and out of tunnels:

# ./ipio.d
 CPU  DELTA(us)          SOURCE               DEST      INT  BYTES
   1     598913    203.0.113.1 ->   192.0.2.55/27  ip.tun0     68
   1         73   192.0.2.3/27 ->     192.0.2.1/27     nge0    140
   1      18325   192.0.2.3/27 <-     192.0.2.1/27     nge0    140
   1         69    203.0.113.1 <-   192.0.2.55/27  ip.tun0     68
   0     102921    203.0.113.1 ->   192.0.2.55/27  ip.tun0     20
   0         79   192.0.2.3/27 ->     192.0.2.1/27     nge0     92

The fields printed are:

Field
Description
CPU
CPU ID that event occurred on
DELTA(us)
Elapsed time since previous event
SOURCE
Source IP address
DEST
Destination IP address
INT
Interface
BYTES
Payload bytes

Note -  The output may be shuffled slightly on multi-CPU servers due to DTrace per-CPU buffering; keep an eye on changes in the CPU column, or add a timestamp column and post sort.
ipproto.d for IP Traffic Summary

This DTrace script provides a neat summary for both send and receive IP traffic, including the next level protocol:

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

ip:::send,
ip:::receive
{
        this->protostr = args[2]->ip_ver == 4 ?
            args[4]->ipv4_protostr : args[5]->ipv6_nextstr;
        @num[args[2]->ip_saddr, args[2]->ip_daddr, this->protostr] = count();
}

dtrace:::END
{
        printf("   %-28s %-28s %6s %8s\n", "SADDR", "DADDR", "PROTO", "COUNT");
        printa("   %-28s %-28s %6s %@8d\n", @num);
}

This script was run on a system with both IPv4 and IPv6 interfaces for several seconds:

# ./ipproto.d 
Tracing... Hit Ctrl-C to end.
^C
   SADDR                      DADDR                       PROTO    COUNT
   192.0.2.3/27               192.0.2.40/27               UDP      1
   192.0.2.3/27               192.0.2.38/27               UDP      1
   192.0.2.3/27               192.0.2.130/27              UDP      1
   192.0.2.3/27               192.0.2.5/27                UDP      1
   192.0.2.3/27               192.0.2.35/27               ICMP     1
   192.0.2.20/27              192.0.2.70/27               UDP      1
   192.0.2.5/27               192.0.2.3/27                UDP      1
   192.0.2.35/27              192.0.2.3/27                ICMP     1
   fe80::214:4fff:fe3b:76c8   ff02::1                     ICMPV6   1
   fe80::2e0:81ff:fe5e:8308   fe80::214:4fff:fe3b:76c8    ICMPV6   1
   fe80::2e0:81ff:fe5e:8308   ff02::1:2                   UDP      1
   192.0.2.10/27              192.0.2.31/27               UDP      2
   192.0.2.12/27              192.0.2.31/27               UDP      3
   192.0.2.14/27              192.0.2.3/27                TCP      428
   192.0.2.16/27              192.0.2.14/27               TCP      789

The following table describes the fields that are printed.

Field
Description
SADDR
Source IP address
DADDR
Destination IP address
PROTO
IP next level protocol
COUNT
Number of packets

The example output above provides a quick summary of network activity with host address details; you can see that 192.0.2.14/27 and 192.0.2.16/27 are exchanging many packets over TCP.

Diagnosing Route Flaps

You can diagnose route flaps by aggregating counts of route additions and deletions by destination IP address.

# dtrace -qn 'ip:::route-add,ip:::route-delete { @events[args[4]->route_destination_address] = count(); }'
^C

203.0.113.5                              2

You can also print additional information from the routeinfo_t structure, such as the process ID, to identify the responsible process.
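For example, a minimal sketch that also prints the DTrace built-in pid and execname variables to attribute each route change to the current process (the attribution is approximate, because route changes may be processed by kernel threads):

# dtrace -qn 'ip:::route-add,ip:::route-delete {
    printf("%-12s %6d %-16s %s\n", probename, pid, execname,
        args[4]->route_destination_address);
}'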

ip Stability

The ip provider uses the DTrace stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see DTrace Stability Mechanisms.

Table 87  Stability Mechanism for the ip Provider
Element
Name Stability
Data Stability
Dependency Class
Provider
Evolving
Evolving
ISA
Module
Private
Private
Unknown
Function
Private
Private
Unknown
Name
Evolving
Evolving
ISA
Arguments
Evolving
Evolving
ISA

iscsi Provider

The iscsi provider provides probes for tracing iSCSI target activity.

This is a kernel provider built into the COMSTAR iSCSI target port provider. The COMSTAR iSCSI target and the user-land iSCSI target daemon, /usr/sbin/iscsitgtd, are mutually exclusive: only one can be enabled at a time. The COMSTAR iSCSI target DTrace provider supplies all the probes of the user-land iSCSI provider built into the iSCSI target daemon (iscsitgtd), so any DTrace script written for the user-land provider works with the COMSTAR iSCSI target port provider without modification. Because this provider instruments iSCSI target activity, DTrace commands and scripts must be run on the iSCSI target server to observe these probes.

iscsi Probes

SCSI Event
Probes
SCSI command/response
iscsi:::scsi-command, iscsi:::scsi-response
Data out/in/request (rtt)
iscsi:::data-send, iscsi:::data-receive, iscsi:::data-request
Login and logout command/response
iscsi:::login-command, iscsi:::login-response, iscsi:::logout-command, iscsi:::logout-response
NOP out/in (pings)
iscsi:::nop-receive, iscsi:::nop-send
Text and task command/response
iscsi:::task-command, iscsi:::task-response, iscsi:::text-command, iscsi:::text-response
Asynchronous message from target
iscsi:::async-send
Buffer dispatch and completion (not available with the USDT provider)
iscsi:::xfer-start, iscsi:::xfer-done
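To confirm which of these probes are available on a given iSCSI target server, you can list them with the -l option of dtrace:

# dtrace -ln 'iscsi*:::'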

iscsi Probe Arguments

Table 88  Probe Arguments for the iscsi Provider
Probes
Variable
Type
Description
*
args[0]
conninfo_t *
Connection information
*
args[1]
iscsiinfo_t *
Common iSCSI properties
scsi-command
args[2]
scsicmd_t *
SCSI command block (cdb)
xfer-start, xfer-done
args[2]
xferinfo_t *
Buffer information

COMSTAR iSCSI Argument Types

All COMSTAR iSCSI target probes have the first and second argument in common:

  • args[0] conninfo_t * connection information

  • conninfo_t

typedef struct conninfo {
        string ci_local;        /* local host IP address */
        string ci_remote;       /* remote host IP address */
        string ci_protocol;     /* protocol ("ipv4", "ipv6") */
} conninfo_t;
The conninfo_t structure is also used by the NFS v4 and
Fibre Channel providers, and is intended for use by all
application protocol providers as the first argument,
to provide basic information about the connection.
  • args[1] iscsiinfo_t * common iSCSI properties

  • iscsiinfo_t

typedef struct iscsiinfo {
	string ii_target;	/* target iqn */
	string ii_initiator;	/* initiator iqn */
        string ii_isid;		/* initiator session identifier */
	string ii_tsih;		/* target session identifying handle */
	string ii_transport;	/* transport type ("iser-ib", "sockets") */

	uint64_t ii_lun;	/* target logical unit number */

	uint32_t ii_itt;	/* initiator task tag */
	uint32_t ii_ttt;	/* target transfer tag */

	uint32_t ii_cmdsn;	/* command sequence number */
	uint32_t ii_statsn;	/* status sequence number */
	uint32_t ii_datasn;	/* data sequence number */

	uint32_t ii_datalen;	/* length of data payload */
	uint32_t ii_flags;	/* probe-specific flags */
} iscsiinfo_t;
The iscsiinfo_t structure provides identifying
information about the target and the initiator, along with
PDU-level information such as the logical unit number, data length, and sequence numbers.

The third argument is used only by the SCSI command probe and the data transfer probes:

  • args[2] scsicmd_t * SCSI command block (cdb)

  • scsicmd_t

typedef struct scsicmd {
        uint64_t ic_len;        /* CDB length */
        uint8_t *ic_cdb;        /* CDB data */
} scsicmd_t;
The scsicmd_t structure is used by the SCSI command probe.
It contains information about the SCSI command
block (CDB) and is intended for use by all application
protocols that deal with SCSI data.

Although the transport layer is transparent to the user, the COMSTAR iSCSI target also supports iSCSI over Remote DMA (RDMA), also known as iSER. Since the data transfer phases are mapped to RDMA operations in iSER, the data-send, data-receive, and data-request probes cannot be used with iSER. Instead the xfer-start and xfer-done probes can be used to trace the data transfer irrespective of the transport used. The data-receive, data-request, and data-send probes can be used when a user wants to track the SCSI Data-IN and Data-OUT PDUs specifically.

  • args[2] xferinfo_t * data transfer information

  • xferinfo_t

typedef struct xferinfo {
	uintptr_t xfer_laddr;	/* local buffer address */
	uint32_t xfer_loffset;	/* offset within the local buffer */
	uint32_t xfer_lkey;	/* access control to local memory */
	uintptr_t xfer_raddr;	/* remote virtual address */
	uint32_t xfer_roffset;	/* offset from the remote address */
	uint32_t xfer_rkey;	/* access control to remote virtual address */
	uint32_t xfer_len;	/* transfer length */
	uint32_t xfer_type;	/* Read or Write */
} xferinfo_t;
The xferinfo_t structure is used by the xfer-start
and xfer-done probes and contains information about the
data transfer. When the transport type is iSER,
the remote buffer information is given by the xfer_raddr,
xfer_rkey, and xfer_roffset fields. These fields are set to 0 when the transport type is sockets.

Using the iscsi Provider

iscsi One-Line Probes

Frequency of iSCSI command types:

# dtrace -n 'iscsi*::: { @[probename] = count(); }'

Frequency of iSCSI client IP addresses:

# dtrace -n 'iscsi*::: { @[args[0]->ci_remote] = count(); }'

Payload bytes by iSCSI command type:

# dtrace -n 'iscsi*::: { @[probename] = sum(args[1]->ii_datalen); }'

Payload byte distribution by iSCSI command type:

# dtrace -n 'iscsi*::: { @[probename] = quantize(args[1]->ii_datalen); }'
iscsiwho.d Script

This is a simple script to produce a report of the remote IP addresses and a count of iSCSI events. This is intended to provide a quick summary of iSCSI activity when run on the iSCSI target server:

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

iscsi*:::
{
        @events[args[0]->ci_remote, probename] = count();
}

dtrace:::END
{
        printf("   %-26s %14s %8s\n", "REMOTE IP", "iSCSI EVENT", "COUNT");
        printa("   %-26s %14s %@8d\n", @events);
}

This output shows the host and the number of iSCSI operations:

# ./iscsiwho.d
Tracing... Hit Ctrl-C to end.
^C
   REMOTE IP                     iSCSI EVENT    COUNT
   192.0.2.14/27                 nop-receive        1
   192.0.2.14/27                    nop-send        1
   192.0.2.14/27               scsi-response      126
   192.0.2.14/27                scsi-command      208
   192.0.2.14/27                data-request     1207
   192.0.2.14/27                data-receive     1207

The following table describes the output fields of the iscsiwho.d script.

Field
Description
REMOTE IP
IP address of the client
iSCSI EVENT
iSCSI event type
COUNT
Number of events traced

The example output shows normal traffic. In this simple script, event names are not translated beyond their iSCSI provider probe names, so they require some thought to interpret because they are from the perspective of the iSCSI target server.

iscsixfer.d Script

Although the transport layer is transparent to the user, the COMSTAR iSCSI target also supports iSCSI over RDMA, also known as iSER. An iSER initiator should be able to read and write data from an iSER target at high data rates with relatively low CPU utilization compared to iSCSI using TCP/IP. To see which transport layer is in use, display the ii_transport field from the iscsiinfo_t structure.
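For example, the following one-liner (a sketch using the documented ii_transport field) counts data transfers by transport type:

# dtrace -n 'iscsi:::xfer-start { @[args[1]->ii_transport] = count(); }'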

Since the data transfer phases are mapped to RDMA operations in iSER, the data-send, data-receive, and data-request probes cannot be used with iSER. Instead, the following simple script prints an aggregation of all the data transferred between two points using the xfer-start probe. This script can be used both for iSCSI over TCP/IP and for iSCSI over Remote DMA.

The data-receive, data-request, and data-send probes can be used when you want to track the SCSI Data-IN and Data-OUT PDUs specifically. For example, if PDUs are received out of order, you might want to trace the ii_ttt, ii_datasn, and ii_statsn fields. To trace I/O activity only, the xfer-start and xfer-done probes suffice.
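A sketch of tracing those sequence-number fields for each Data-OUT PDU received might look like the following, using only fields documented in iscsiinfo_t:

# dtrace -qn 'iscsi:::data-receive {
    printf("itt=%u ttt=%u datasn=%u statsn=%u len=%u\n",
        args[1]->ii_itt, args[1]->ii_ttt, args[1]->ii_datasn,
        args[1]->ii_statsn, args[1]->ii_datalen);
}'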

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
      printf("Tracing... Hit Ctrl-C to end.\n");
}

iscsi:::xfer-start
{
      /* xfer_type is a uint32_t; 0 is assumed here to mean a read */
      @[args[0]->ci_remote, args[2]->xfer_type == 0 ? "read" : "write"] =
          sum(args[2]->xfer_len);
}

dtrace:::END
{
      printf("   %-26s %10s %8s\n", "REMOTE IP", "READ/WRITE", "BYTES");
      printa("   %-26s %10s %@8d\n", @);
}

This output shows the transfer of bytes:

# ./iscsixfer.d
Tracing... Hit Ctrl-C to end.
^C

   REMOTE IP                  READ/WRITE    BYTES
   192.0.2.14                      write      464
   192.0.2.14                       read     1024

The output fields are described in the following table.

Field
Description
REMOTE IP
IP address of the client
READ/WRITE
Read or write
BYTES
Number of bytes transferred

You can use the following script to see the data read or write as it happens.

#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
 printf(" %-26s %8s %10s\n", "REMOTE IP", "BYTES", "READ/WRITE");

}

iscsi:::xfer-start
{
  /* xfer_type is a uint32_t; 0 is assumed here to mean a read */
  printf(" %-26s %8d %10s\n", args[0]->ci_remote,
      args[2]->xfer_len, args[2]->xfer_type == 0 ? "read" : "write");
}

The following table provides the interpretation for some of these events.

iSCSI event
Interpretation
scsi-command
A SCSI command was issued, such as a read or a write. Use other scripts for a breakdown on the SCSI command type.
data-send
Data was sent from the iSCSI target server to the client; the client is performing a read.
data-receive
Data was received on the iSCSI target server from the client. The client is performing a write.

nfsv3 Server Provider

The nfsv3 provider provides probes for tracing NFS version 3 server activity.

nfsv3 Probe Arguments

All NFS operation probes have the first argument in common:

	args[0]		conninfo_t *		socket connection information

The conninfo_t structure is already used by the iSCSI target provider (iscsi) and the NFS v4 provider (nfsv4), and is intended for use by all providers of higher-level protocols, such as iscsi, nfs, http, and ftp.

	typedef struct conninfo {
		string ci_local;        /* local host address */
		string ci_remote;       /* remote host address */
		string ci_protocol;     /* protocol (ipv4, ipv6, and so on) */
	} conninfo_t;
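As a quick illustration of the conninfo_t argument, the following one-liner counts NFS version 3 operations by remote client address and protocol:

# dtrace -n 'nfsv3:::op-*-start { @[args[0]->ci_remote, args[0]->ci_protocol] = count(); }'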

Operation probes have their second argument in common:

	args[1]		nfsv3opinfo_t *		NFS v3 operation properties

	typedef struct nfsv3opinfo {
		string noi_curpath;	/* current file handle path (if any) */
		cred_t *noi_cred;	/* credentials */
		uint64_t noi_xid;	/* transaction ID */
	} nfsv3opinfo_t;

NFSv3 Probes

The following table lists the probes along with the specific argument for each whose type is defined by the NFS version 3 specification.

Probe
args[2]
nfsv3:::op-access-start
ACCESS3args *
nfsv3:::op-access-done
ACCESS3res *
nfsv3:::op-commit-start
COMMIT3args *
nfsv3:::op-commit-done
COMMIT3res *
nfsv3:::op-create-start
CREATE3args *
nfsv3:::op-create-done
CREATE3res *
nfsv3:::op-fsinfo-start
FSINFO3args *
nfsv3:::op-fsinfo-done
FSINFO3res *
nfsv3:::op-fsstat-start
FSSTAT3args *
nfsv3:::op-fsstat-done
FSSTAT3res *
nfsv3:::op-getattr-start
GETATTR3args *
nfsv3:::op-getattr-done
GETATTR3res *
nfsv3:::op-lookup-start
LOOKUP3args *
nfsv3:::op-lookup-done
LOOKUP3res *
nfsv3:::op-link-start
LINK3args *
nfsv3:::op-link-done
LINK3res *
nfsv3:::op-mkdir-start
MKDIR3args *
nfsv3:::op-mkdir-done
MKDIR3res *
nfsv3:::op-mknod-start
MKNOD3args *
nfsv3:::op-mknod-done
MKNOD3res *
nfsv3:::op-null-start
-
nfsv3:::op-null-done
-
nfsv3:::op-pathconf-start
PATHCONF3args *
nfsv3:::op-pathconf-done
PATHCONF3res *
nfsv3:::op-read-start
READ3args *
nfsv3:::op-read-done
READ3res *
nfsv3:::op-readdir-start
READDIR3args *
nfsv3:::op-readdir-done
READDIR3res *
nfsv3:::op-readdirplus-start
READDIRPLUS3args *
nfsv3:::op-readdirplus-done
READDIRPLUS3res *
nfsv3:::op-readlink-start
READLINK3args *
nfsv3:::op-readlink-done
READLINK3res *
nfsv3:::op-remove-start
REMOVE3args *
nfsv3:::op-remove-done
REMOVE3res *
nfsv3:::op-rename-start
RENAME3args *
nfsv3:::op-rename-done
RENAME3res *
nfsv3:::op-rmdir-start
RMDIR3args *
nfsv3:::op-rmdir-done
RMDIR3res *
nfsv3:::op-setattr-start
SETATTR3args *
nfsv3:::op-setattr-done
SETATTR3res *
nfsv3:::op-symlink-start
SYMLINK3args *
nfsv3:::op-symlink-done
SYMLINK3res *
nfsv3:::op-write-start
WRITE3args *
nfsv3:::op-write-done
WRITE3res *

Note -  The op-null-* probes have an undefined args[2].

Using the nfsv3 Provider

Some examples of nfsv3 provider usage are as follows.

Tracing NFSv3 Read and Write Requests Using nfsv3rwsnoop.d

This DTrace script traces NFS version 3 read and write requests, showing details of each operation.

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10hz

dtrace:::BEGIN
{
        printf("%-16s %-18s %2s %-8s %6s %s\n", "TIME(us)",
            "CLIENT", "OP", "OFFSET", "BYTES", "PATHNAME");
}

nfsv3:::op-read-start
{
        printf("%-16d %-18s %2s %-8d %6d %s\n", timestamp / 1000,
            args[0]->ci_remote, "R", args[2]->offset / 1024, args[2]->count,
            args[1]->noi_curpath);
}

nfsv3:::op-write-start
{
        printf("%-16d %-18s %2s %-8d %6d %s\n", timestamp / 1000,
            args[0]->ci_remote, "W", args[2]->offset / 1024,
            args[2]->data.data_len, args[1]->noi_curpath);
}

The following output shows a read of /export/stuff/bin/ghex2, then a read of /export/stuff/bin/gksu, and finally a write of /export/stuff/words12:

# ./nfsv3rwsnoop.d
TIME(us)         CLIENT             OP OFFSET    BYTES PATHNAME 
4299383207       192.0.2.75       R 0          4096 /export/stuff/bin/ghex2
4299391813       192.0.2.75       R 4         28672 /export/stuff/bin/ghex2
4299395700       192.0.2.75       R 32        32768 /export/stuff/bin/ghex2
4299396038       192.0.2.75       R 96        32768 /export/stuff/bin/ghex2
4299396462       192.0.2.75       R 128        8192 /export/stuff/bin/ghex2
4299396550       192.0.2.75       R 64        32768 /export/stuff/bin/ghex2
4320233417       192.0.2.75       R 0          4096 /export/stuff/bin/gksu
4320240902       192.0.2.75       R 4         28672 /export/stuff/bin/gksu
4320242434       192.0.2.75       R 32        32768 /export/stuff/bin/gksu
4320242730       192.0.2.75       R 64        24576 /export/stuff/bin/gksu
4333460565       192.0.2.75       W 0         32768 /export/stuff/words12
4333461477       192.0.2.75       W 32        32768 /export/stuff/words12
4333463264       192.0.2.75       W 64        32768 /export/stuff/words12
4333463567       192.0.2.75       W 96        32768 /export/stuff/words12
4333463893       192.0.2.75       W 128       32768 /export/stuff/words12
4333464202       192.0.2.75       W 160       32768 /export/stuff/words12
4333464451       192.0.2.75       W 192       10055 /export/stuff/words12

The fields printed are:

Field
Description
TIME(us)
Time of event in microseconds
CLIENT
Remote client IP address
OP
R == read, W == write
OFFSET
File offset of I/O, in Kbytes
BYTES
Bytes of I/O
PATHNAME
Path name of file, if known

Note -  The output may be shuffled slightly on multi-CPU servers due to DTrace per-CPU buffering; post sort the TIME column if needed.
nfsv3ops.d Counts NFSv3 Client Operations

This DTrace script counts NFS version 3 operations by client, printing a summary every 5 seconds:

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        trace("Tracing... Interval 5 secs.\n");
}

nfsv3:::op-*
{
        @ops[args[0]->ci_remote, probename] = count();
}

profile:::tick-5sec,
dtrace:::END
{
        printf("\n   %-32s %-28s %8s\n", "Client", "Operation", "Count");
        printa("   %-32s %-28s %@8d\n", @ops);
        trunc(@ops);
}

The following output shows which client is sending which NFS version 3 operations:

# ./nfsv3ops.d
Tracing... Interval 5 secs.

   Client                           Operation                       Count
   192.0.2.75                    op-commit-done                      1
   192.0.2.75                    op-commit-start                     1
   192.0.2.75                    op-create-done                      1
   192.0.2.75                    op-create-start                     1
   192.0.2.75                    op-access-done                      6
   192.0.2.75                    op-access-start                     6
   192.0.2.75                    op-read-done                        6
   192.0.2.75                    op-read-start                       6
   192.0.2.75                    op-write-done                       7
   192.0.2.75                    op-write-start                      7
   192.0.2.75                    op-lookup-done                      8
   192.0.2.75                    op-lookup-start                     8
   192.0.2.75                    op-getattr-done                    10
   192.0.2.75                    op-getattr-start                   10
  
   Client                           Operation                       Count
   
   Client                           Operation                       Count
   192.0.2.75                    op-getattr-done                     1
   192.0.2.75                    op-getattr-start                    1

The following table lists the fields that are printed by the nfsv3ops.d script.

Field
Description
Client
Remote client IP address
Operation
NFS version 3 operation, described using the nfsv3 provider probename
Count
Operations during this interval
nfsv3fileio.d Reports Reads and Writes

This DTrace script prints a summary of file read and write bytes:

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        trace("Tracing... Hit Ctrl-C to end.\n");
}

nfsv3:::op-read-done
{
        @readbytes[args[1]->noi_curpath] = sum(args[2]->res_u.ok.data.data_len);
}


nfsv3:::op-write-done
{
        @writebytes[args[1]->noi_curpath] = sum(args[2]->res_u.ok.count);
}

dtrace:::END
{
        printf("\n%12s %12s  %s\n", "Rbytes", "Wbytes", "Pathname");
        printa("%@12d %@12d  %s\n", @readbytes, @writebytes);
}

This output shows a few files were read, and one was written:

# ./nfsv3fileio.d
Tracing... Hit Ctrl-C to end.
^C

      Rbytes       Wbytes  Pathname
           0       206663  /export/stuff/words10
        8624            0  /export/stuff/bin/echo-client-2
       13228            0  /export/stuff/bin/echo
      496292            0  /export/stuff/bin/ecpg

The following table describes the fields that are printed by the nfsv3fileio.d script.

Field
Description
Rbytes
Bytes read for this path name
Wbytes
Bytes written to this path name
Pathname
Path name of NFS file
nfsv3rwtime.d Reports Read and Write Elapsed Times

This DTrace script prints a summary of NFS version 3 read and write elapsed times, along with other details.

#!/usr/sbin/dtrace -s

#pragma D option quiet

inline int TOP_FILES = 10;

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

nfsv3:::op-read-start,
nfsv3:::op-write-start
{
        start[args[1]->noi_xid] = timestamp;
}

nfsv3:::op-read-done,
nfsv3:::op-write-done
/start[args[1]->noi_xid] != 0/
{
        this->elapsed = timestamp - start[args[1]->noi_xid];
        @rw[probename == "op-read-done" ? "read" : "write"] =
            quantize(this->elapsed / 1000);
        @host[args[0]->ci_remote] = sum(this->elapsed);
        @file[args[1]->noi_curpath] = sum(this->elapsed);
        start[args[1]->noi_xid] = 0;
}

dtrace:::END
{
        printf("NFSv3 read/write distributions (us):\n");
        printa(@rw);

        printf("\nNFSv3 read/write by host (total us):\n");
        normalize(@host, 1000);
        printa(@host);

        printf("\nNFSv3 read/write top %d files (total us):\n", TOP_FILES);
        normalize(@file, 1000);
        trunc(@file, TOP_FILES);
        printa(@file);
}

The output below shows a clear peak in the read time distribution in the 64 to 127 microsecond range, and a second smaller peak between 4 and 16 milliseconds:

# ./nfsv3rwtime.d
Tracing... Hit Ctrl-C to end.
^C
NFSv3 read/write distributions (us):

  write                                             
           value  ------------- Distribution ------------- count    
              16 |                                         0        
              32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1        
              64 |                                         0        

  read                                              
           value  ------------- Distribution ------------- count    
               8 |                                         0        
              16 |@                                        1        
              32 |@                                        1        
              64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           36       
             128 |@                                        1        
             256 |@                                        1        
             512 |                                         0        
            1024 |                                         0        
            2048 |@                                        1        
            4096 |@@@                                      3        
            8192 |@@@                                      4        
           16384 |                                         0        


NFSv3 read/write by host (total us):

  192.0.2.75                                                 14089

NFSv3 read/write top 10 files (total us):

  /export/stuff/motd                                               63
  /export/stuff/bin/daps                                         5876
  /export/stuff/bin/date                                         8150

Other details are printed, such as total read/write latency by host, and the top 10 files by latency.

The next example also shows a pair of peaks, the first around a fraction of a millisecond, the second at around 4 milliseconds:

# ./nfsv3rwtime.d
Tracing... Hit Ctrl-C to end.
^C
NFSv3 read/write distributions (us):

  read                                              
           value  ------------- Distribution ------------- count    
               8 |                                         0        
              16 |@                                        4        
              32 |@                                        5        
              64 |@@@@@@                                   22       
             128 |@@@@                                     13       
             256 |@@@@@@@@@                                30       
             512 |@@                                       7        
            1024 |@                                        3        
            2048 |@@@                                      12       
            4096 |@@@@@@@                                  26       
            8192 |@@@@                                     15       
           16384 |@                                        2        
           32768 |                                         0        


NFSv3 read/write by host (total us):

  192.0.2.75                                                414458

NFSv3 read/write top 10 files (total us):

  /export/stuff/bin/cal                                         11225
  /export/stuff/bin/cjpeg                                       11947
  /export/stuff/bin/charmap                                     12347
  /export/stuff/bin/cdda2wav.bin                                13449
  /export/stuff/bin/chkey                                       13963
  /export/stuff/bin/cputrack                                    14533
  /export/stuff/bin/catman                                      15535
  /export/stuff/bin/csslint-0.6                                 18302
  /export/stuff/bin/col                                         19926
  /export/stuff/bin/cdrecord.bin                                40622

The first peak is likely to be NFS operations hitting the memory cache, and the second those that missed and went to disk. Further use of DTrace can confirm this theory.

The fields from the distribution plot are:

Field
Description
value
Minimum elapsed time for this event in microseconds
count
Number of events in this time range
nfsv3io.d Reports Host I/O

This is a simple DTrace script to provide basic I/O details by host every 5 seconds:

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        interval = 5;
        printf("Tracing... Interval %d secs.\n", interval);
        tick = interval;
}

nfsv3:::op-*
{
        @ops[args[0]->ci_remote] = count();
}

nfsv3:::op-read-done
{
        @reads[args[0]->ci_remote] = count();
        @readbytes[args[0]->ci_remote] = sum(args[2]->res_u.ok.data.data_len);
}


nfsv3:::op-write-done
{
        @writes[args[0]->ci_remote] = count();
        @writebytes[args[0]->ci_remote] = sum(args[2]->res_u.ok.count);
}

profile:::tick-1sec
/--tick == 0/
{
        normalize(@ops, interval);
        normalize(@reads, interval);
        normalize(@writes, interval);
        normalize(@writebytes, 1024 * interval);
        normalize(@readbytes, 1024 * interval);
        printf("\n   %-32s %6s %6s %6s %6s %8s\n", "Client", "r/s", "w/s",
            "kr/s", "kw/s", "ops/s");
        printa("   %-32s %@6d %@6d %@6d %@6d %@8d\n", @reads, @writes,
            @readbytes, @writebytes, @ops);
        trunc(@ops);
        trunc(@reads);
        trunc(@writes);
        trunc(@readbytes);
        trunc(@writebytes);
        tick = interval;
}

This output shows 192.0.2.75 calling NFS version 3 reads and writes:

# ./nfsv3io.d
Tracing... Interval 5 secs.

   Client                            r/s    w/s   kr/s   kw/s    ops/s
   192.0.2.75                        27      1    686     40      100

   Client                            r/s    w/s   kr/s   kw/s    ops/s
   192.0.2.75                         0      0      0      0        8

   Client                            r/s    w/s   kr/s   kw/s    ops/s
   0.0.0.0                            0      0      0      0        0
   192.0.2.75                         2      0     28      0       18
^C

Other details can be calculated from the output, such as the average read size: 686 (kr/s) / 27 (r/s) = 25.4 KB per read. These could also be added to the script and printed as columns.
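For example, the avg() aggregating function could report the average read size per client directly; this sketch reuses the argument fields from the script above:

nfsv3:::op-read-done
{
        @avgread[args[0]->ci_remote] = avg(args[2]->res_u.ok.data.data_len);
}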

The fields printed are:

Field
Description
Client
Remote client IP address
r/s
Reads per second
w/s
Writes per second
kr/s
Kilobytes read per second
kw/s
Kilobytes written per second
ops/s
Total NFSv3 operations per second (including the reads and writes)

nfsv4 Provider

The nfsv4 provider provides probes for tracing NFS version 4 server activity.

nfsv4 Probe Arguments

All NFS operation probes have the first argument in common:

        args[0]		conninfo_t *		socket connection information

The conninfo_t structure is already used by the iSCSI target provider (iscsi), and is intended for use by all providers of higher-level protocols, such as iscsi, nfs, http, and ftp.

	typedef struct conninfo {
		string ci_local;        /* local host address */
		string ci_remote;       /* remote host address */
		string ci_protocol;     /* protocol (ipv4, ipv6, and so on) */
	} conninfo_t;

Operation probes have their second argument in common:

	args[1]		nfsv4opinfo_t *		NFS v4 operation properties

	typedef struct nfsv4opinfo {
		string noi_curpath;	/* current file handle path (if any) */
		cred_t *noi_cred;	/* credentials */
		uint64_t noi_xid;	/* transaction ID */
	} nfsv4opinfo_t;

Callback operation probes have their second argument in common:

	args[1]		nfsv4cbinfo_t *		NFS v4 callback properties

	typedef struct nfsv4cbinfo {
		string nci_curpath;	/* file handle path (if any) */
	} nfsv4cbinfo_t;
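For example, a minimal sketch that prints the file path for each delegation recall callback:

# dtrace -qn 'nfsv4:::cb-recall-start { printf("recall: %s\n", args[1]->nci_curpath); }'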

NFSv4 Top-Level Probes

The following table lists the top level operation probes along with the specific argument for each whose type is defined by the NFS version 4 specification.

Probe
args[2]
nfsv4:::compound-op-start
COMPOUND4args *
nfsv4:::compound-op-done
COMPOUND4res *

The following table lists the operation probes along with the specific argument for each whose type is defined by the NFS version 4 specification.

Probe
args[2]
nfsv4:::op-access-start
ACCESS4args *
nfsv4:::op-access-done
ACCESS4res *
nfsv4:::op-close-start
CLOSE4args *
nfsv4:::op-close-done
CLOSE4res *
nfsv4:::op-commit-start
COMMIT4args *
nfsv4:::op-commit-done
COMMIT4res *
nfsv4:::op-create-start
CREATE4args *
nfsv4:::op-create-done
CREATE4res *
nfsv4:::op-delegpurge-start
DELEGPURGE4args *
nfsv4:::op-delegpurge-done
DELEGPURGE4res *
nfsv4:::op-delegreturn-start
DELEGRETURN4args *
nfsv4:::op-delegreturn-done
DELEGRETURN4res *
nfsv4:::op-getattr-start
GETATTR4args *
nfsv4:::op-getattr-done
GETATTR4res *
nfsv4:::op-getfh-start
GETFH4args *
nfsv4:::op-getfh-done
GETFH4res *
nfsv4:::op-link-start
LINK4args *
nfsv4:::op-link-done
LINK4res *
nfsv4:::op-lock-start
LOCK4args *
nfsv4:::op-lock-done
LOCK4res *
nfsv4:::op-lockt-start
LOCKT4args *
nfsv4:::op-lockt-done
LOCKT4res *
nfsv4:::op-locku-start
LOCKU4args *
nfsv4:::op-locku-done
LOCKU4res *
nfsv4:::op-lookup-start
LOOKUP4args *
nfsv4:::op-lookup-done
LOOKUP4res *
nfsv4:::op-lookupp-start
LOOKUPP4args *
nfsv4:::op-lookupp-done
LOOKUPP4res *
nfsv4:::op-nverify-start
NVERIFY4args *
nfsv4:::op-nverify-done
NVERIFY4res *
nfsv4:::op-open-start
OPEN4args *
nfsv4:::op-open-done
OPEN4res *
nfsv4:::op-open-confirm-start
OPEN_CONFIRM4args *
nfsv4:::op-open-confirm-done
OPEN_CONFIRM4res *
nfsv4:::op-open-downgrade-start
OPEN_DOWNGRADE4args *
nfsv4:::op-open-downgrade-done
OPEN_DOWNGRADE4res *
nfsv4:::op-openattr-start
OPENATTR4args *
nfsv4:::op-openattr-done
OPENATTR4res *
nfsv4:::op-putfh-start
PUTFH4args *
nfsv4:::op-putfh-done
PUTFH4res *
nfsv4:::op-putpubfh-start
PUTPUBFH4args *
nfsv4:::op-putpubfh-done
PUTPUBFH4res *
nfsv4:::op-putrootfh-start
PUTROOTFH4args *
nfsv4:::op-putrootfh-done
PUTROOTFH4res *
nfsv4:::op-read-start
READ4args *
nfsv4:::op-read-done
READ4res *
nfsv4:::op-readdir-start
READDIR4args *
nfsv4:::op-readdir-done
READDIR4res *
nfsv4:::op-readlink-start
READLINK4args *
nfsv4:::op-readlink-done
READLINK4res *
nfsv4:::op-release-lockowner-start
RELEASE_LOCKOWNER4args *
nfsv4:::op-release-lockowner-done
RELEASE_LOCKOWNER4res *
nfsv4:::op-remove-start
REMOVE4args *
nfsv4:::op-remove-done
REMOVE4res *
nfsv4:::op-rename-start
RENAME4args *
nfsv4:::op-rename-done
RENAME4res *
nfsv4:::op-renew-start
RENEW4args *
nfsv4:::op-renew-done
RENEW4res *
nfsv4:::op-restorefh-start
<none>
nfsv4:::op-restorefh-done
<none>
nfsv4:::op-savefh-start
SAVEFH4args *
nfsv4:::op-savefh-done
SAVEFH4res *
nfsv4:::op-secinfo-start
SECINFO4args *
nfsv4:::op-secinfo-done
SECINFO4res *
nfsv4:::op-setattr-start
SETATTR4args *
nfsv4:::op-setattr-done
SETATTR4res *
nfsv4:::op-setclientid-start
SETCLIENTID4args *
nfsv4:::op-setclientid-done
SETCLIENTID4res *
nfsv4:::op-setclientid-confirm-start
SETCLIENTID_CONFIRM4args *
nfsv4:::op-setclientid-confirm-done
SETCLIENTID_CONFIRM4res *
nfsv4:::op-verify-start
VERIFY4args *
nfsv4:::op-verify-done
VERIFY4res *
nfsv4:::op-write-start
WRITE4args *
nfsv4:::op-write-done
WRITE4res *

Callback compound probes have an undefined second argument; this slot is reserved for future use.

The following table lists the top-level callback probes, along with the specific argument for each whose type is defined by the NFS version 4 specification.

Probe
args[2]
nfsv4:::compound-cb-start
CB_COMPOUND4args *
nfsv4:::compound-cb-done
CB_COMPOUND4res *

The following table lists the callback probes, along with the specific argument for each probe whose type is defined by the NFS version 4 specification.

Probe
args[2]
nfsv4:::cb-getattr-start
CB_GETATTR4args *
nfsv4:::cb-getattr-done
CB_GETATTR4res *
nfsv4:::cb-recall-start
CB_RECALL4args *
nfsv4:::cb-recall-done
CB_RECALL4res *

Note -  Because the Oracle Solaris NFS v4 implementation does not yet use the getattr callback, this probe is not implemented; it is noted here in anticipation of a future implementation.

Using the nfsv4 Provider

Examples of nfsv4 provider usage are as follows.

Tracing NFSv4 Read and Write Requests Using nfsv4rwsnoop.d

This DTrace script traces NFS version 4 reads and writes:

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10hz

dtrace:::BEGIN
{
        printf("%-16s %-18s %2s %-8s %6s %s\n", "TIME(us)",
            "CLIENT", "OP", "OFFSET", "BYTES", "PATHNAME");
}

nfsv4:::op-read-start
{
        printf("%-16d %-18s %2s %-8d %6d %s\n", timestamp / 1000,
            args[0]->ci_remote, "R", args[2]->offset / 1024, args[2]->count,
            args[1]->noi_curpath);
}

nfsv4:::op-write-start
{
        printf("%-16d %-18s %2s %-8d %6d %s\n", timestamp / 1000,
            args[0]->ci_remote, "W", args[2]->offset / 1024, args[2]->data_len,
            args[1]->noi_curpath);
}

This output shows that a few files were read and one was written:

# ./nfsv4rwsnoop.d 
TIME(us)         CLIENT             OP OFFSET    BYTES PATHNAME
156889725960     192.0.2.14       R 0          4096 /export/share/bin/nawk
156889735515     192.0.2.14       R 4         28672 /export/share/bin/nawk
156889736298     192.0.2.14       R 32        32768 /export/share/bin/nawk
156889736544     192.0.2.14       R 96        32768 /export/share/bin/nawk
156889736902     192.0.2.14       R 64        32768 /export/share/bin/nawk
156916061653     192.0.2.14       R 0          4096 /export/share/bin/ssh
156916069375     192.0.2.14       R 4         28672 /export/share/bin/ssh
156916070098     192.0.2.14       R 32        32768 /export/share/bin/ssh
156916070435     192.0.2.14       R 96        32768 /export/share/bin/ssh
156916070758     192.0.2.14       R 64        32768 /export/share/bin/ssh
156916071036     192.0.2.14       R 128       32768 /export/share/bin/ssh
156916071352     192.0.2.14       R 160       32768 /export/share/bin/ssh
156916071582     192.0.2.14       R 192       32768 /export/share/bin/ssh
156916071696     192.0.2.14       R 72         4096 /export/share/bin/ssh
156916080508     192.0.2.14       R 224        4096 /export/share/bin/ssh
156916080844     192.0.2.14       R 228       28672 /export/share/bin/ssh
156916081566     192.0.2.14       R 256       32768 /export/share/bin/ssh
156916081833     192.0.2.14       R 288       32768 /export/share/bin/ssh
156916082237     192.0.2.14       R 320       20480 /export/share/bin/ssh
156933373074     192.0.2.14       W 0         32768 /export/share/words
156933373351     192.0.2.14       W 32        32768 /export/share/words
156933373855     192.0.2.14       W 64        32768 /export/share/words
156933374185     192.0.2.14       W 96        32768 /export/share/words
156933375442     192.0.2.14       W 128       32768 /export/share/words
156933375864     192.0.2.14       W 160       32768 /export/share/words
156933376105     192.0.2.14       W 192       10055 /export/share/words

The fields printed are:

Field
Description
TIME(us)
Time of event in microseconds
CLIENT
Remote client IP address
OP
R == read, W == write
OFFSET
File offset of I/O, in Kbytes
BYTES
Bytes of I/O
PATHNAME
Path name of file, if known
nfsv4ops.d Reports Client Operations

This DTrace script counts NFS version 4 operations by client, printing a summary every 5 seconds:

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        trace("Tracing... Interval 5 secs.\n");
}

nfsv4:::op-*
{
        @ops[args[0]->ci_remote, probename] = count();
}

profile:::tick-5sec,
dtrace:::END
{
        printf("\n   %-32s %-28s %8s\n", "Client", "Operation", "Count");
        printa("   %-32s %-28s %@8d\n", @ops);
        trunc(@ops);
}

The following output shows which client is sending which NFS version 4 operations:

# ./nfsv4ops.d 
Tracing... Interval 5 secs.

   Client                           Operation                       Count
   192.0.2.14                    op-getattr-done                     1
   192.0.2.14                    op-getattr-start                    1
   192.0.2.14                    op-putfh-done                       1
   192.0.2.14                    op-putfh-start                      1

   Client                           Operation                       Count
   192.0.2.14                    op-access-done                      1
   192.0.2.14                    op-access-start                     1
   192.0.2.14                    op-close-done                       1
   192.0.2.14                    op-close-start                      1
   192.0.2.14                    op-getfh-done                       1
   192.0.2.14                    op-getfh-start                      1
   192.0.2.14                    op-open-done                        1
   192.0.2.14                    op-open-start                       1
   192.0.2.14                    op-getattr-done                     3
   192.0.2.14                    op-getattr-start                    3
   192.0.2.14                    op-read-done                        9
   192.0.2.14                    op-read-start                       9
   192.0.2.14                    op-putfh-done                      12
   192.0.2.14                    op-putfh-start                     12
^C

   Client                           Operation                       Count

The fields printed are:

Field
Description
Client
Remote client IP address
Operation
NFSv4 operation, described using the nfsv4 provider probename
Count
Operations during this interval
nfsv4fileio.d Reports Reads and Writes

This DTrace script prints a summary of file read and write bytes:

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        trace("Tracing... Hit Ctrl-C to end.\n");
}

nfsv4:::op-read-done
{
        @readbytes[args[1]->noi_curpath] = sum(args[2]->data_len);
}


nfsv4:::op-write-done
{
        @writebytes[args[1]->noi_curpath] = sum(args[2]->count);
}

dtrace:::END
{
        printf("\n%12s %12s  %s\n", "Rbytes", "Wbytes", "Pathname");
        printa("%@12d %@12d  %s\n", @readbytes, @writebytes);
}

This output shows that a few files were read and one was written:

# ./nfsv4fileio.d
Tracing... Hit Ctrl-C to end.
^C

      Rbytes       Wbytes  Pathname
           0       206663  /export/share/words1
       24528            0  /export/share/bin/xargs
       44864            0  /export/share/bin/ed
      232476            0  /export/share/bin/vi

The fields printed are:

Field
Description
Rbytes
Bytes read for this path name
Wbytes
Bytes written to this path name
Pathname
Path name of NFS file
nfsv4rwtime.d Reports Read and Write Elapsed Times

This DTrace script prints a summary of NFS version 4 read and write elapsed times, along with other details:

#!/usr/sbin/dtrace -s

#pragma D option quiet

inline int TOP_FILES = 10;

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

nfsv4:::op-read-start,
nfsv4:::op-write-start
{
        start[args[1]->noi_xid] = timestamp;
}

nfsv4:::op-read-done,
nfsv4:::op-write-done
{
        this->elapsed = timestamp - start[args[1]->noi_xid];
        @rw[probename == "op-read-done" ? "read" : "write"] =
            quantize(this->elapsed / 1000);
        @host[args[0]->ci_remote] = sum(this->elapsed);
        @file[args[1]->noi_curpath] = sum(this->elapsed);
        start[args[1]->noi_xid] = 0;
}

dtrace:::END
{
        printf("NFSv4 read/write distributions (us):\n");
        printa(@rw);

        printf("\nNFSv4 read/write by host (total us):\n");
        normalize(@host, 1000);
        printa(@host);

        printf("\nNFSv4 read/write top %d files (total us):\n", TOP_FILES);
        normalize(@file, 1000);
        trunc(@file, TOP_FILES);
        printa(@file);
}

The output below shows a peak in the read time distribution in the 64 to 127 microsecond range, and a second peak between 2 and 8 milliseconds:

# ./nfsv4rwtime.d 
Tracing... Hit Ctrl-C to end.
^C
NFSv4 read/write distributions (us):

  write                                             
           value  ------------- Distribution ------------- count    
              32 |                                         0        
              64 |@@@@@@                                   1        
             128 |@@@@@@@@@@@                              2        
             256 |@@@@@@@@@@@@@@@@@                        3        
             512 |@@@@@@                                   1        
            1024 |                                         0        

  read                                              
           value  ------------- Distribution ------------- count    
              16 |                                         0        
              32 |@@@@                                     6        
              64 |@@@@@@@@@@@@                             17       
             128 |@                                        1        
             256 |@@                                       3        
             512 |@                                        1        
            1024 |@@                                       3        
            2048 |@@@@@@@@                                 12       
            4096 |@@@@@@@@@@                               15       
            8192 |@                                        1        
           16384 |                                         0        


NFSv4 read/write by host (total us):

  192.0.2.14                                                148215

NFSv4 read/write top 10 files (total us):

  /export/share/bin/man                                          5020
  /export/share/bin/makeuuid                                     5132
  /export/share/bin/mc68030                                      5836
  /export/share/bin/m4                                           6446
  /export/share/bin/msgfmt                                       6669
  /export/share/bin/mkmsgs                                       6674
  /export/share/bin/mailstats                                    6935
  /export/share/bin/mkdir                                        7009
  /export/share/bin/mac                                          7693
  /export/share/bin/make                                        27903

Other details are printed, such as total read/write latency by host, and the top 10 files by latency.

The first peak in the read distribution is likely to be NFS operations hitting the memory cache, and the second those that missed and read from disk. The writes were all fast because they are likely to be written to the memory cache and returned asynchronously. Further use of DTrace can confirm these theories.

The fields from the distribution plot are:

Field
Description
value
Minimum elapsed time for this event in microseconds
count
Number of events in this time range
nfsv4io.d Reports Host I/O

This is a simple DTrace script to provide basic I/O details by host every 5 seconds:

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        interval = 5;
        printf("Tracing... Interval %d secs.\n", interval);
        tick = interval;
}

nfsv4:::op-*
{
        @ops[args[0]->ci_remote] = count();
}

nfsv4:::op-read-done
{
        @reads[args[0]->ci_remote] = count();
        @readbytes[args[0]->ci_remote] = sum(args[2]->data_len);
}


nfsv4:::op-write-done
{
        @writes[args[0]->ci_remote] = count();
        @writebytes[args[0]->ci_remote] = sum(args[2]->count);
}

profile:::tick-1sec
/tick-- == 0/
{
        normalize(@ops, interval);
        normalize(@reads, interval);
        normalize(@writes, interval);
        normalize(@writebytes, 1024 * interval);
        normalize(@readbytes, 1024 * interval);
        printf("\n   %-32s %6s %6s %6s %6s %8s\n", "Client", "r/s", "w/s",
            "kr/s", "kw/s", "ops/s");
        printa("   %-32s %@6d %@6d %@6d %@6d %@8d\n", @reads, @writes,
            @readbytes, @writebytes, @ops);
        trunc(@ops);
        trunc(@reads);
        trunc(@writes);
        trunc(@readbytes);
        trunc(@writebytes);
        tick = interval;
}

This output shows 192.0.2.14 calling NFSv4 reads and writes:

# ./nfsv4io.d 
Tracing... Interval 5 secs.

   Client                            r/s    w/s   kr/s   kw/s    ops/s
   192.0.2.14                        17      1    331     40      290

   Client                            r/s    w/s   kr/s   kw/s    ops/s
   192.0.2.14                         9      0    197      0      152

   Client                            r/s    w/s   kr/s   kw/s    ops/s
   192.0.2.14                        16      0    269      0      363

   Client                            r/s    w/s   kr/s   kw/s    ops/s
^C

Other details can be calculated from the output, such as the average read and write size. For example, 331 (kr/s) / 17 (r/s) = 19.5 Kbytes average read size. These could also be added to the script and printed as columns.

The fields printed are:

Field
Description
Client
Remote client IP address
r/s
Reads per second
w/s
Writes per second
kr/s
Kbytes read per second
kw/s
Kbytes written per second
ops/s
Total NFSv4 operations per second (including the reads and writes)
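The average read size discussed above can also be measured directly with the avg() aggregating function rather than calculated by hand. The following one-liner is a minimal sketch that averages the data_len returned by each op-read-done operation, per client:

# dtrace -qn 'nfsv4:::op-read-done {
      @avgbytes[args[0]->ci_remote] = avg(args[2]->data_len); }'

The aggregation reports average bytes per read; dividing by 1024 in the action would report Kbytes instead.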

nlmv4 Provider

For file locking, the NFSv3 protocol relies on Network Lock Manager version 4 (NLMv4). The nlmv4 provider exposes a set of probes that track NLMv4 operations on the server, making it possible to trace NFSv3 locking operations.

nlmv4 Probe Arguments

All NLMv4 operation probes have the first argument in common:

args[0]        conninfo_t *        socket connection information

The conninfo_t structure is already used by the iSCSI target provider and the NFS providers (NFSv4, NFSv3, and NFSv2), and is intended for use by all providers related to a higher-level protocol, such as iscsi, nfs, http, and ftp.

typedef struct conninfo {
        string ci_local;            /* NULL (local host address) */
        string ci_remote;           /* remote host address */
        string ci_protocol;         /* protocol (ipv4, ipv6, etc) */
    } conninfo_t;

Operation probes have their second argument in common:

args[1]    nlmv4opinfo_t *      NLMv4 operation properties

typedef struct nlmv4opinfo {
        string noi_curpath;         /* current file handle path (if any) */
        cred_t *noi_cred;           /* NULL (credentials) */
        uint64_t noi_xid;           /* transaction ID */
    } nlmv4opinfo_t;
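For example, these common arguments can be used to snoop lock requests by client and path. This one-liner is a minimal sketch using only the members shown above and the op-lock-start probe listed in the next section:

# dtrace -qn 'nlmv4:::op-lock-start {
      printf("%s locks %s\n", args[0]->ci_remote,
          args[1]->noi_curpath); }'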

nlmv4 Probes

All the probes work for both synchronous and asynchronous NLMv4 procedures. Information about the remote host is not available in the op-null-* probes.

Probe
args[2]
nlmv4:::op-cancel-start
nlm4_probe_cancargs_t
nlmv4:::op-cancel-done
nlm4_probe_res_t
nlmv4:::op-free-all-start
nlm4_probe_notify_t
nlmv4:::op-free-all-done
none
nlmv4:::op-lock-start
nlm4_probe_lockargs_t
nlmv4:::op-lock-done
nlm4_probe_res_t
nlmv4:::op-nm-lock-start
nlm4_probe_lockargs_t
nlmv4:::op-nm-lock-done
nlm4_probe_res_t
nlmv4:::op-null-start
none
nlmv4:::op-null-done
none
nlmv4:::op-share-start
nlm4_probe_shareargs_t
nlmv4:::op-share-done
nlm4_probe_shareres_t
nlmv4:::op-test-start
nlm4_probe_testargs_t
nlmv4:::op-test-done
nlm4_probe_testres_t
nlmv4:::op-unlock-start
nlm4_probe_unlockargs_t
nlmv4:::op-unlock-done
nlm4_probe_res_t
nlmv4:::op-unshare-start
nlm4_probe_shareargs_t
nlmv4:::op-unshare-done
nlm4_probe_shareres_t

The following table lists the probes that track the NLMv4 GRANTED callback procedure. The server notifies the client that a formerly blocked request has now been granted.

Probes
args[2]
nlmv4:::op-granted-start
nlm4_probe_testargs_t
nlmv4:::op-granted-done
nlm4_probe_res_t
nlmv4:::op-granted-res-start
nlm4_probe_testargs_t
nlmv4:::op-granted-res-done
nlm4_probe_res_t
nlmv4:::notify-granted-start
nlm4_probe_testargs_t
nlmv4:::notify-granted-done
nlm4_probe_notify_granted_res_t

When there is a conflicting NFSv4 delegation, the following probe fires:

nlmv4:::conflicting-delegation  nlm4_probe_conflicting_delegation_t

nlm4_probe_cancargs_t Structure

The nlm4_probe_cancargs_t structure contains a pointer to the raw header.

typedef struct nlm4_probe_cancargs {
	nlm4_cancargs *ca; /* pointer to raw header */
} nlm4_probe_cancargs_t;
nlm4_probe_res_t Structure

The nlm4_probe_res_t structure contains a pointer to the raw header.

typedef struct nlm4_probe_res {
	nlm4_res *r; /* pointer to raw header */
} nlm4_probe_res_t;
nlm4_probe_conflicting_delegation_t Structure

The nlm4_probe_conflicting_delegation_t structure contains a pointer to the raw header.

typedef struct nlm4_probe_conflicting_delegation {
	nlm4_res *r; /* pointer to raw header */
	int delegation_recalled;
} nlm4_probe_conflicting_delegation_t;

The delegation_recalled flag is true when the delegation is successfully recalled.

nlm4_probe_notify_granted_res_t Structure

The nlm4_probe_notify_granted_res_t structure contains a pointer to the raw header.

typedef struct nlm4_probe_notify_granted_res {
	nlm4_res *r; /* pointer to raw header */
	int last_errno;
} nlm4_probe_notify_granted_res_t;

The last_errno member shows the error code returned by the RPC call of the granted procedure.

nlm4_probe_notify_t Structure

The nlm4_probe_notify_t structure contains a pointer to the raw header.

typedef struct nlm4_probe_notify {
	nlm4_notify *n; /* pointer to raw header */
} nlm4_probe_notify_t;
nlm4_probe_lockargs_t Structure

The nlm4_probe_lockargs_t structure contains a pointer to the raw header.

typedef struct nlm4_probe_lockargs {
	nlm4_lockargs *la; /* pointer to raw header */
} nlm4_probe_lockargs_t;
nlm4_probe_shareargs_t Structure

The nlm4_probe_shareargs_t structure contains a pointer to the raw header.

typedef struct nlm4_probe_shareargs {
	nlm4_shareargs *sa; /* pointer to raw header */
} nlm4_probe_shareargs_t;
nlm4_probe_shareres_t Structure

The nlm4_probe_shareres_t structure contains a pointer to the raw header.

typedef struct nlm4_probe_shareres {
	nlm4_shareres *sr; /* pointer to raw header */
} nlm4_probe_shareres_t;
nlm4_probe_testargs_t Structure

The nlm4_probe_testargs_t structure contains a pointer to the raw header.

typedef struct nlm4_probe_testargs {
	nlm4_testargs *ta; /* pointer to raw header */
} nlm4_probe_testargs_t;
nlm4_probe_testres_t Structure

The nlm4_probe_testres_t structure contains a pointer to the raw header.

typedef struct nlm4_probe_testres {
	nlm4_testres *tr; /* pointer to raw header */
} nlm4_probe_testres_t;
nlm4_probe_unlockargs_t Structure

The nlm4_probe_unlockargs_t structure contains a pointer to the raw header.

typedef struct nlm4_probe_unlockargs {
	nlm4_unlockargs *ua; /* pointer to raw header */
} nlm4_probe_unlockargs_t;

nlmv4 Stability

The nlmv4 provider uses the DTrace stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see DTrace Stability Mechanisms.

Table 89  Stability Mechanism for the nlmv4 Provider
Element
Name Stability
Data Stability
Dependency Class
Provider
Evolving
Evolving
ISA
Module
Private
Private
Unknown
Function
Private
Private
Unknown
Name
Evolving
Evolving
ISA
Arguments
Evolving
Evolving
ISA

scsi Provider

The scsi provider provides probes for tracing the SCSI T10 command protocol on an Oracle Solaris host.

SCSI Probes

The scsi probes are described in the following table.

Probes
Description
cmd-request
SCSI command request.
cmd-request-mapin
SCSI command request. Also, executes bp_mapin on the data buffer. For more information, see the bp_mapin(9F) man page.
cmd-response
SCSI command response.
tmf-request
Task Management Function (TMF) request.
tmf-response
Task Management Function (TMF) response.

scsi Probe Arguments

The argument types for the scsi probes are listed in the following table.

Table 90  Probe Arguments for the scsi Provider
Probe
args[0]
args[1]
args[2]
args[3]
args[4]
cmd-request
scsi_addr_t
scsi_cdb_t
scsi_data_t
scsi_id_t
-
cmd-request-mapin
scsi_addr_t
scsi_cdb_t
scsi_data_t
scsi_id_t
-
cmd-response
scsi_addr_t
scsi_cdb_t
scsi_data_t
scsi_id_t
scsi_rsp_t
tmf-request
scsi_addr_t
scsi_tmf_code_t
scsi_id_t
-
-
tmf-response
scsi_addr_t
scsi_tmf_code_t
scsi_id_t
int
-

The description of the data types used by the scsi probes are as follows:

Data Type
Description
scsi_addr_t
Address information
scsi_cdb_t
Command Descriptor Block (CDB)
scsi_data_t
DATA IN/OUT buffer
scsi_id_t
Task ID
scsi_rsp_t
Response / Status, Sense Data
scsi_tmf_code_t
Task Management Function (TMF)
int
TMF Result
scsi_addr_t Structure

The scsi_addr_t structure provides the address information of the I_T_L (initiator_target_LUN) nexus.

typedef struct scsi_addr {
                string          addr_ctrl;
                uint16_t        addr_ctrl_inst;
                string          addr_dev;
                string          addr_path;
                string          addr_devid;
} scsi_addr_t;
Table 91  scsi_addr_t Members
Member
Description
addr_ctrl
Controller name, such as fp, scsi, or scsi_vhci.
addr_ctrl_inst
Controller instance.
addr_dev
Device address.
For example:
  • g600a0b80005adf90000006a24ea580cc (pHCI enumerated with vHCI)

  • w500110a0008aa98a,0 (not enumerated with vHCI)

addr_path
This is present only for pHCI commands when pHCI is enumerated with vHCI. The addr_path includes target and logical unit number information.
For example: w202500a0b85adf90,1
addr_devid
Oracle Solaris unique device identifier.
For example: id1,sd@n600a0b80005adf90000006a24ea580cc
scsi_cdb_t Structure

The scsi_cdb_t structure contains Command Descriptor Block (CDB) information.

typedef struct scsi_cdb {
                uint64_t cdb_len;        /* CDB length */
                uint8_t  *cdb_data;      /* CDB data */
} scsi_cdb_t;
scsi_data_t Structure

The scsi_data_t structure is a DATA IN/OUT buffer.

typedef struct scsi_data {
                size_t   data_size;     /* DATA IN/OUT buffer size */
                uint8_t  *data_ptr;     /* DATA IN/OUT buffer */
                int      data_mapped;   /* data_ptr is kernel mapped, boolean */
                buf_t    *data_buf;     /* pointer to buf(9S), internal */
} scsi_data_t;

The data_ptr member points to a DATA IN/OUT payload associated with some SCSI commands such as INQUIRY, READ, WRITE, and REPORT LUNS. You can read the data from the data_ptr only when data_mapped is set to 1.

While using the scsi:::cmd-request probe, the data_ptr is not set under the following conditions:

  • When buf(9S) is sent with the B_PAGEIO flag or the B_MVECTOR flag in b_flags.

  • If the data buffer is not mapped to a kernel virtual address space.

You can overcome this condition by using the scsi:::cmd-request-mapin probe. This probe behaves like cmd-request and also executes the bp_mapin() function, which sets the data_mapped flag to 1. You can then access the data buffer from the kernel memory by using data_ptr.


Caution

Caution  -  The cmd-request-mapin probe might have significant performance impact.


The cmd-request-mapin probe also ensures that the same data buffer is accessible during the cmd-response probe.

For more information, see the buf(9S) and bp_mapin(9F) man pages.

scsi_id_t Structure

The scsi_id_t structure provides command ID information.

typedef struct scsi_id {
        uint64_t  id_cmd;
        uint64_t  id_timestamp;
} scsi_id_t;

The id_cmd member represents an identifier of the command. It can be used to match a SCSI request with a SCSI response. The value of the id_cmd member can be reused after the command execution is complete. Therefore, the id_cmd member must be used together with the id_timestamp member, or you must sort the script output according to another time stamp.

The id_timestamp member is a time stamp of the SCSI command request, in nanoseconds. It can be used with the DTrace built-in timestamp variable for various time-related calculations.
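The following fragment is a sketch of this matching technique: it pairs each SCSI request with its response by using id_cmd and id_timestamp together as a compound key, and aggregates the measured latency per device:

scsi:::cmd-request
{
        start[args[3]->id_cmd, args[3]->id_timestamp] = timestamp;
}

scsi:::cmd-response
/start[args[3]->id_cmd, args[3]->id_timestamp]/
{
        @lat[args[0]->addr_dev] = quantize(timestamp -
            start[args[3]->id_cmd, args[3]->id_timestamp]);
        start[args[3]->id_cmd, args[3]->id_timestamp] = 0;
}

Note that the rsp_latency member of scsi_rsp_t provides the same measurement directly.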

scsi_rsp_t Structure

The scsi_rsp_t structure provides the SCSI response and status information.

typedef struct scsi_rsp {
                uint8_t  rsp_tran_ret;    /* scsi_transport(9F) return value */
                uint8_t  rsp_reason;      /* value of pkt_reason */
                uint8_t  rsp_status;      /* SCSI Status */
                uint8_t  rsp_sense_key;   /* sense key */
                uint8_t  rsp_sense_asc;   /* ASC */
                uint8_t  rsp_sense_ascq;  /* ASCQ */
                size_t  rsp_resid;       /* number of bytes not transferred */
                uint64_t rsp_latency;     /* latency of the SCSI command in nanoseconds */
} scsi_rsp_t;

The scsi:::cmd-response fires if one of the following conditions is true:

  • When you receive a response from the target device.

  • When the SCSI command fails to reach the target because of transport failure.

Values in the scsi_rsp_t structure provide details on the failure or success of the SCSI command.

The rsp_tran_ret member provides the return value of scsi_transport(). The return value denotes whether the SCSI command was accepted by the SCSI transport. Any value other than TRAN_ACCEPT indicates that the SCSI command was not accepted and that the other members of the scsi_rsp structure are not set.

The rsp_reason member denotes the SCSI command completion reason. The value CMD_CMPLT is set for a normal completion, when the command has reached the target; the SCSI status is then available in the rsp_status member. The rsp_status member is not set for any other value of rsp_reason. For the values of TRAN_ACCEPT and CMD_CMPLT, see the scsi_pkt.h file. For more information, see the scsi_pkt(9S) man page.
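As a sketch, the completion statuses of commands that reached the target can be aggregated as follows. The predicate assumes that CMD_CMPLT is defined as 0 in scsi_pkt.h; verify the value against your headers:

# dtrace -qn 'scsi:::cmd-response
    /args[4]->rsp_reason == 0/ {    /* assumes 0 == CMD_CMPLT */
      @status[args[0]->addr_dev, args[4]->rsp_status] = count(); }'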

scsi_tmf_code_t Structure

The scsi_tmf_code_t enumeration specifies the Task Management Function (TMF).

typedef enum scsi_tmf_code {SCSI_TMF_UNKNOWN, SCSI_TMF_ABORT_TASK,
            SCSI_TMF_ABORT_TASK_SET, SCSI_TMF_CLEAR_ACA,
            SCSI_TMF_CLEAR_TASK_SET, SCSI_TMF_I_T_NEXUS_RESET,
            SCSI_TMF_LOGICAL_UNIT_RESET, SCSI_TMF_TARGET_RESET,
            SCSI_TMF_WAKEUP, SCSI_TMF_QUERY_TASK} scsi_tmf_code_t;

The Task ID is used only with the ABORT_TASK and QUERY_TASK TMFs to specify the I_T_L_Q nexus together with the scsi_addr_t information.

A TMF result of 1 indicates that the function executed successfully; a result of 0 indicates failure.
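For example, failed TMFs and the devices they were sent to can be traced using this result value. This is a minimal sketch:

# dtrace -qn 'scsi:::tmf-response
    /args[3] == 0/ {
      printf("TMF failed on %s\n", args[0]->addr_dev); }'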


Note -  Only HBA drivers that implement the SCSI_HBA_ADDR_COMPLEX addressing method and use scsi_hba_pkt_comp(9F) are supported by the scsi provider. All SCSAv3 HBA drivers, as well as scsi_vhci, fcp, and iscsi, comply with this requirement.

With non-complying HBA drivers, the cmd-response probe might not fire and some addressing information might not be available.


Note -  The scsi_data_t argument is provided only when the HBA driver implements tran_setup_pkt(9E). For example, SCSAv3 HBA drivers and fcp comply with this requirement.

Using the scsi Provider

Some simple examples of scsi provider usage follow.

Tracing SCSI Commands

This DTrace command traces all the SCSI command requests:

# dtrace -qn 'scsi:::cmd-request {
      printf("%s#%d %s: Opcode: %02x\n",
          args[0]->addr_ctrl, args[0]->addr_ctrl_inst,
          args[0]->addr_dev, args[1]->cdb_data[0]);}'

In this example, traced commands can appear twice in the output. First, as commands issued to scsi_vhci (vHCI) and second, as commands issued to a physical HBA driver (pHCI).

Tracing Target Resets

This DTrace command traces Target Resets and paths to where the resets are sent:

# dtrace -n 'scsi:::tmf-request
    /(args[1] == SCSI_TMF_TARGET_RESET) &&
     (args[0]->addr_path != "NULL")/ {
          printf("Target Reset sent to %s", args[0]->addr_path);}'
Displaying TPGS Bits Received

The following script displays TPGS bits received in standard INQUIRY data.

 #!/usr/sbin/dtrace -s
 /*
  * Response to INQUIRY with EVPD bit not set, i.e. DATA IN
  * contains standard INQUIRY data.
  * Tracing only on vHCI enumerated paths (addr_path != "NULL")
  * with DATA IN buffer mapped to a kernel virtual memory space.
  */
 scsi:::cmd-response
 /(args[1]->cdb_data[0] == 0x12) &&
  !(args[1]->cdb_data[1] & 0x01) &&
  (args[0]->addr_path != "NULL") &&
  (args[2]->data_mapped)/
 {
        printf("TPGS bits 0x%x received from %s\n",
            (args[2]->data_ptr[5] & 0x30) >> 4,
            args[0]->addr_path);
 }
Tracing Reservation Keys

The following script can be used to trace reservation keys sent in the PERSISTENT RESERVE OUT command to any path associated with a particular LU.

#!/usr/sbin/dtrace -qs
/*
 * Explanation of predicate expressions:
 *  * 0x5f is an opcode of PERSISTENT RESERVE OUT
 *  * tracing on pHCI level only, i.e. addr_path must be set
 *  * DATA OUT buffer must be mapped to a kernel virtual memory
 *  * Sanity check for the data buffer size
 *  * Specification of the logical unit GUID
 */
scsi:::cmd-request
/(args[1]->cdb_data[0] == 0x5f) &&
 (args[0]->addr_path != "NULL") &&
 args[2]->data_mapped &&
 (args[2]->data_size >= 16) &&
 (args[0]->addr_dev == "g600144f0ecf10e000000562915af0001")/
{
      printf("%s#%d:%s\n", args[0]->addr_ctrl,
          args[0]->addr_ctrl_inst, args[0]->addr_path);

      printf("Service Action: %02x\n",
          args[1]->cdb_data[1] & 0x1f);

      printf("Reservation Key: ");
          tracemem(&args[2]->data_ptr[0], 8);

      printf("SA Reservation Key: ");
          tracemem(&args[2]->data_ptr[8], 8);

          printf("\n\n");
}

scsi Stability

The scsi provider uses the DTrace stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see DTrace Stability Mechanisms.

Table 92  Stability Mechanism for the scsi Provider
Element
Name Stability
Data Stability
Dependency Class
Provider
Evolving
Evolving
ISA
Module
Private
Private
Unknown
Function
Private
Private
Unknown
Name
Evolving
Evolving
ISA
Arguments
Evolving
Evolving
ISA

sctp Provider

The sctp provider provides probes for tracing the Stream Control Transmission Protocol (SCTP).

SCTP Probes

The sctp probes are described in the following table:

Probes
Description
state-change
Probe that fires when an SCTP session changes its SCTP state. The previous state is noted in the sctplsinfo_t * probe argument. The sctpinfo_t * and ipinfo_t * arguments are NULL.
send
Probe that fires whenever SCTP sends a segment (either control or data).
receive
Probe that fires whenever SCTP receives a segment (either control or data).

The send and receive probes trace packets on physical interfaces and also packets on loopback interfaces that are processed by the sctp provider.

SCTP Probe Arguments

The argument types for the sctp probes are listed in the following table. The arguments are described in the following section. All probes except state-change have five arguments; the state-change probe has six.

Table 93  sctp Probe Arguments
Probe
args[0]
args[1]
args[2]
args[3]
args[4]
args[5]
state-change
NULL
csinfo_t *
NULL
sctpsinfo_t *
NULL
sctplsinfo_t *
send
pktinfo_t *
csinfo_t *
ipinfo_t *
sctpsinfo_t *
sctpinfo_t *
receive
pktinfo_t *
csinfo_t *
ipinfo_t *
sctpsinfo_t *
sctpinfo_t *
pktinfo_t Structure

The pktinfo structure is where packet ID info can be made available for deeper analysis, if packet IDs become supported by the kernel. The pkt_addr member is a pointer to the mblk holding the packet, with b_rptr pointing at the start of the relevant protocol specified by pkt_pcap to support packet capture.

typedef struct pktinfo {
  mblk_t *pkt_addr;
  int pkt_pcap;
} pktinfo_t;
csinfo_t Structure

The csinfo_t structure is where connection state info is made available. It contains a unique (system-wide) connection ID, and the process ID and zone ID associated with the connection.

typedef struct csinfo {
        uintptr_t cs_addr;
        uint64_t cs_cid;
        pid_t cs_pid;
        zoneid_t cs_zoneid;
 } csinfo_t;
Table 94  csinfo_t Members
Member
Description
cs_addr
Address of translated ip_xmit_attr_t *
cs_cid
Connection ID. A unique per-connection identifier which identifies the connection during its lifetime.
cs_pid
Process ID associated with the connection.
cs_zoneid
Zone ID associated with the connection.
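As an illustrative sketch (not part of the original guide), the csinfo_t members can be used to attribute SCTP traffic to processes. This hypothetical one-liner counts outbound SCTP segments by the process ID associated with the connection:

```
# dtrace -n 'sctp:::send { @[args[1]->cs_pid] = count(); }'
```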
ipinfo_t Structure

The ipinfo_t structure contains common IP information for both IPv4 and IPv6.

typedef struct ipinfo {
        uint8_t ip_ver;                 /* IP version (4, 6) */
        uint16_t ip_plength;            /* payload length */
        string ip_saddr;                /* source address */
        string ip_daddr;                /* destination address */
} ipinfo_t;
Table 95  ipinfo_t Members
Member
Description
ip_ver
IP version number. Currently either 4 or 6.
ip_plength
Payload length in bytes. This is the length of the packet at the time of tracing, excluding the IP header.
ip_saddr
Source IP address, as a string. For IPv4 this is a dotted decimal quad, IPv6 follows RFC 1884 convention 2 with lower case hexadecimal digits.
ip_daddr
Destination IP address, as a string. For IPv4 this is a dotted decimal quad, IPv6 follows RFC 1884 convention 2 with lower case hexadecimal digits.
sctpsinfo_t Structure

The sctpsinfo_t structure contains stable SCTP details taken from the sctp_t.

typedef struct sctpsinfo {
        uintptr_t sctps_addr;           /* pointer to sctp_t */
        int sctps_num_raddrs;           /* number of remote addresses */
        uintptr_t sctps_raddrs;         /* pointer to sctp_faddrs */
        int sctps_num_laddrs;           /* number of local addresses */
        uintptr_t *sctps_laddrs;        /* pointer to sctp_saddrs */
        uint16_t sctps_lport;           /* local port */
        uint16_t sctps_rport;           /* remote port */
        string sctps_laddr;             /* local address, as a string */
        string sctps_raddr;             /* remote address, as a string */
        int32_t sctps_state;            /* SCTP state */
} sctpsinfo_t;

It might seem redundant to supply the local and remote ports and addresses here as well as in the sctpinfo_t, but the sctp:::state-change probes do not have associated sctpinfo_t data. To map a state change to a specific connection, the ports and addresses must therefore be available in the sctpsinfo_t.

The following table describes the members of the sctpsinfo_t structure:

Table 96  sctpsinfo_t Members
Member
Description
sctps_addr
Address of translated sctp_t *.
sctps_num_raddrs
Number of remote addresses.
sctps_raddrs
Pointer to first sctp_faddr_t *.
sctps_num_laddrs
Number of local addresses.
sctps_laddrs
Pointer to the first local address (sctp_saddrs).
sctps_lport
Local port associated with the SCTP connection.
sctps_rport
Remote port associated with the SCTP connection.
sctps_laddr
Local address associated with the SCTP connection, as a string.
sctps_raddr
Remote address associated with the SCTP connection, as a string.
sctps_state
SCTP state. Inline definitions are provided for the various SCTP states such as SCTP_STATE_CLOSED and SCTP_STATE_IDLE. Use inline sctp_state_string[] to convert the state to a string.
sctplsinfo_t Structure

The sctplsinfo_t structure contains the previous sctp state during a state change.

typedef struct sctplsinfo {
        int32_t sctps_state;              /* SCTP state */
} sctplsinfo_t;

The sctps_state member is the previous SCTP state. Inline definitions are provided for the various SCTP states such as SCTP_STATE_CLOSED and SCTP_STATE_IDLE. Use inline sctp_state_string[] to convert state to a string.
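For example, combining the previous state from the sctplsinfo_t (args[5]) with the current state from the sctpsinfo_t (args[3]) summarizes state transitions. This is a sketch assuming the inline sctp_state_string[] described above:

```
# dtrace -n 'sctp:::state-change
{
        @[sctp_state_string[args[5]->sctps_state],
            sctp_state_string[args[3]->sctps_state]] = count();
}'
```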

sctpinfo_t Structure

The sctpinfo_t structure is a DTrace translated version of the SCTP header.

typedef struct sctpinfo {
        uint16_t sctp_sport;              /* source port */
        uint16_t sctp_dport;              /* destination port */
        uint32_t sctp_verify;             /* verification tag */
        uint32_t sctp_checksum;           /* headers + data checksum */
        sctp_hdr_t *sctp_hdr;             /* raw SCTP header */
} sctpinfo_t;

The following table describes the members of the sctpinfo_t structure:

Table 97  sctpinfo_t Members
Name of Member
Description
sctp_sport
SCTP source port.
sctp_dport
SCTP destination port.
sctp_verify
SCTP verification tag.
sctp_checksum
Checksum of SCTP header and payload.
sctp_chunk_hdr
Pointer to SCTP chunk header.
sctp_hdr
Pointer to raw SCTP header at time of tracing.

Using the sctp Provider

The following sections show simple examples of sctp provider usage.

sctpstate.d Tracing State Changes

The following DTrace script demonstrates the capability to trace SCTP state changes:

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10hz

int last[int];

dtrace:::BEGIN
{
        printf(" %3s %12s  %-20s    %-20s\n", "CPU", "DELTA(us)", "OLD", "NEW");
}

sctp:::state-change
/ last[args[1]->cs_cid] /
{
        this->elapsed = (timestamp - last[args[1]->cs_cid]) / 1000;
        printf(" %3d %12d  %-20s -> %-20s\n", cpu, this->elapsed,
            sctp_state_string[args[5]->sctps_state], sctp_state_string[args[3]->sctps_state]);
        last[args[1]->cs_cid] = timestamp;
}

sctp:::state-change
/ last[args[1]->cs_cid] == 0 /
{
        printf(" %3d %12s  %-20s -> %-20s\n", cpu, "-",
            sctp_state_string[args[5]->sctps_state],
            sctp_state_string[args[3]->sctps_state]);
        last[args[1]->cs_cid] = timestamp;
}

The fields printed are as follows:

Field
Description
CPU
CPU ID of the event
DELTA
Time since previous event for that connection, microseconds
OLD
Old SCTP state
NEW
New SCTP state
sctpio.d Traces SCTP Packets

The following DTrace script traces SCTP packets and prints various details:

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10hz

dtrace:::BEGIN
{
        printf(" %3s %15s:%-5s      %15s:%-5s\n", "CPU",
            "LADDR", "LPORT", "RADDR", "RPORT");
}

sctp:::send
{
        printf(" %3d %16s:%-5d -> %16s:%-5d\n", cpu,
            args[2]->ip_saddr, args[4]->sctp_sport,
            args[2]->ip_daddr, args[4]->sctp_dport);
}

sctp:::receive
{
        printf(" %3d %16s:%-5d <- %16s:%-5d\n", cpu,
            args[2]->ip_daddr, args[4]->sctp_dport,
            args[2]->ip_saddr, args[4]->sctp_sport);
}

The fields printed are as follows:

Field
Description
CPU
CPU ID that the event occurred on
LADDR
Local IP address
LPORT
Local SCTP port
RADDR
Remote IP address
RPORT
Remote SCTP port

sctp Stability

The sctp provider uses DTrace's stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see DTrace Stability Mechanisms.

Table 98  Stability Mechanism for the sctp Provider
Element
Name Stability
Data Stability
Dependency Class
Provider
Evolving
Evolving
ISA
Module
Private
Private
Unknown
Function
Private
Private
Unknown
Name
Evolving
Evolving
ISA
Arguments
Evolving
Evolving
ISA

srp Provider

The srp provider provides probes for tracing SRP target port provider activity.

This is a kernel provider built into the COMSTAR srp target port provider.

srp Probes

srp Probes Overview
Event
Probes
Service up/down
srp:::service-up, srp:::service-down
Remote Port login/logout
srp:::login-command, srp:::login-response, srp:::logout-command
SRP command/response
srp:::task-command, srp:::task-response
SCSI command/response
srp:::scsi-command, srp:::scsi-response
Data transfer
srp:::xfer-start, srp:::xfer-done

For the following probes, string fields that are not known contain the string <unknown>. Integer fields that are not known contain 0.

Service Up/Down Event Probes

srp:::service-up and srp:::service-down trace SRP target online and offline events. Remote port information ci_remote is unavailable for both probes.

Probes
Variable
Type
Description
srp:::service-up
srp:::service-down
args[0]
conninfo_t *
Connection information
srp:::service-up
srp:::service-down
args[1]
srp_portinfo_t *
Local and remote port information
Remote Port Login/Logout Event Probes
Probes
Variable
Type
Description
srp:::login-command
srp:::login-response
srp:::logout-command
args[0]
conninfo_t *
Connection information
srp:::login-command
srp:::login-response
srp:::logout-command
args[1]
srp_portinfo_t *
Local and remote port information
srp:::login-command
srp:::login-response
args[2]
srp_logininfo_t *
Login command/response information
SRP Command Event Probes
Probes
Variable
Type
Description
srp:::task-command
srp:::task-response
args[0]
conninfo_t *
Connection information
srp:::task-command
srp:::task-response
args[1]
srp_portinfo_t *
Local and remote port information
srp:::task-command
srp:::task-response
args[2]
srp_taskinfo_t *
srp task information
SCSI Command Event Probes
Probes
Variable
Type
Description
srp:::scsi-command
srp:::scsi-response
args[0]
conninfo_t *
Connection information
srp:::scsi-command
srp:::scsi-response
args[1]
srp_portinfo_t *
Local and remote port information
srp:::scsi-command
args[2]
scsicmd_t *
SCSI command block (cdb)
srp:::scsi-response
args[2]
srp_taskinfo_t *
srp task information
srp:::scsi-command
args[3]
srp_taskinfo_t *
srp task information
Data Transfer Probes
Probes
Variable
Type
Description
srp:::xfer-start
srp:::xfer-done
args[0]
conninfo_t *
Connection information
srp:::xfer-start
srp:::xfer-done
args[1]
fc_port_info_t *
Local port information
srp:::xfer-start
srp:::xfer-done
args[2]
xferinfo_t *
RDMA transfer information
srp:::xfer-start
srp:::xfer-done
args[3]
srp_taskinfo_t *
srp task information

SRP Argument Types

scsicmd_t, conninfo_t, and xferinfo_t are common types that are used by other providers.

scsicmd_t Structure
typedef struct scsicmd {
        uint64_t ic_len;        /* CDB length */
        uint8_t  *ic_cdb;       /* CDB data */
} scsicmd_t;
conninfo_t Structure
typedef struct conninfo {
        string ci_local;        /* GID of the local HCA */
        string ci_remote;       /* GID of the remote HCA */
        string ci_protocol;     /* protocol ("ib") */
} conninfo_t;
srp_portinfo_t Structure
typedef struct srp_portinfo {
        /* initiator */
        string  pi_initiator;   /* Initiator: eui.xxxxxxxxxxxxxxx */
        string  pi_i_sid;       /* Initiator session id */

        /* target */
        string  pi_target;      /* Target: eui.xxxxxxxxxxxxxxx */
        string  pi_t_sid;       /* Target session id */

        uintptr_t pi_chan_id;   /* Channel identifier */
} srp_portinfo_t;
srp_logininfo_t Structure
typedef struct srp_logininfo {
        uint64_t li_task_tag;      /* SRP task tag */
        uint32_t li_max_it_iu_len; /* Maximum IU length that the initiator
                                      can send to the target */
        uint32_t li_max_ti_iu_len; /* Maximum IU length that the target
                                      can send to the initiator */
        uint32_t li_request_limit; /* Maximum number of SRP requests that
                                      the initiator can send on a channel */
        uint32_t reason_code;      /* Reason code */
} srp_logininfo_t;
srp_taskinfo_t Structure
typedef struct srp_taskinfo {
        uint64_t ti_task_tag;   /* SRP task tag */
        uint64_t ti_lun;        /* Target logical unit number */
        uint8_t  ti_function;   /* Task management function */
        uint32_t ti_req_limit_delta; /* Increment of channel's request limit */
        uint8_t  ti_flag;            /* bit 2:DOOVER 3:DOUNDER 4:DIOVER 5:DIUNDER */
        uint32_t ti_do_resid_cnt;    /* Data-out residual count */
        uint32_t ti_di_resid_cnt;    /* Data-in residual count */
        uint8_t  ti_status;     /* Status of this task */
} srp_taskinfo_t;
xferinfo_t Structure
typedef struct xferinfo {
        uintptr_t xfer_laddr;        /* Local buffer address */
        uint32_t  xfer_loffset;      /* Relative offset from local buffer */
        uint32_t  xfer_lkey;         /* Access control to local memory */
        uintptr_t xfer_raddr;        /* Remote virtual address */
        uint32_t  xfer_roffset;      /* Offset from the remote address */
        uint32_t  xfer_rkey;         /* Access control to remote address */
        uint32_t  xfer_len;          /* Transfer length */
        uint8_t   xfer_type;         /* 0: read; 1: write; */
} xferinfo_t;
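As a hypothetical usage sketch, the xferinfo_t members can be aggregated to summarize RDMA transfer volume by direction, interpreting xfer_type according to the structure comment above (0: read; 1: write):

```
# dtrace -n 'srp:::xfer-start
{
        @bytes[args[2]->xfer_type == 0 ? "RDMA read" : "RDMA write"] =
            sum(args[2]->xfer_len);
}'
```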

Using the srp Provider

service.d Reports Events

The following script produces a report of SRP target online and offline events.

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n\n");
        printf("%-14s   %-35s %-20s\n", "SRP EVENT",
            "LOCAL PORT", "EUI NAME");
};

srp:::service-up
{
        printf("%-14s   %-35s %-20s\n", probename,
            args[0]->ci_local, args[1]->pi_target);
}

srp:::service-down
{
        printf("%-14s   %-35s %-20s\n", probename,
            args[0]->ci_local, args[1]->pi_target);
}

This output shows SRP target online and offline events:

# dtrace -s ~/src/service.d
Tracing... Hit Ctrl-C to end.
^C
SRP EVENT        LOCAL PORT                          EUI NAME
service-down     fe80000000000000:0003ba0001004d31   eui.0003BA0001004D30
service-down     fe80000000000000:0003ba0001004d32   eui.0003BA0001004D30
service-up       fe80000000000000:0003ba0001004d31   eui.0003BA0001004D30
service-up       fe80000000000000:0003ba0001004d32   eui.0003BA0001004D30
#

The following table describes the fields in the output.

Field
Description
SRP EVENT
SRP event type
LOCAL PORT
GID of the local port
EUI NAME
EUI name of the local port
srpwho.d Reports SRP Events on a Remote HCA Port

This simple script produces a report of the remote HCA port and a count of SRP events. It is intended to provide a quick summary of SRP activity when run on the SRP target server:

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

srp:::login-command,
srp:::login-response,
srp:::task-command,
srp:::task-response,
srp:::scsi-command,
srp:::scsi-response,
srp:::xfer-start,
srp:::xfer-done
{
        @events[args[0]->ci_remote, probename] = count();
}

dtrace:::END
{
        printf("   %-33s %14s %8s\n", "REMOTE GID", "SRP EVENT", "COUNT");
        printa("   %-33s %14s %@8d\n", @events);
}

This output shows the remote HCA ports and the number of SRP operations:

# dtrace -s ./srpwho.d
Tracing... Hit Ctrl-C to end.
^C
   REMOTE GID                             SRP EVENT    COUNT
   fe80000000000000:0003ba000100386d  login-command        1
   fe80000000000000:0003ba000100386d login-response        1
   fe80000000000000:0003ba0001003851  login-command        2
   fe80000000000000:0003ba0001003851 login-response        2
   fe80000000000000:0003ba0001003852  login-command        2
   fe80000000000000:0003ba0001003852 login-response        2
   fe80000000000000:0003ba0001004d32      xfer-done        9
   fe80000000000000:0003ba0001004d32     xfer-start        9
   fe80000000000000:0003ba0001004d31      xfer-done       18
   fe80000000000000:0003ba0001004d31     xfer-start       18
   fe80000000000000:0003ba0001004d32   scsi-command       22
   fe80000000000000:0003ba0001004d32  scsi-response       22
   fe80000000000000:0003ba0001004d32   task-command       22
   fe80000000000000:0003ba0001004d32  task-response       22
   fe80000000000000:0003ba0001004d31   scsi-command       42
   fe80000000000000:0003ba0001004d31  scsi-response       42
   fe80000000000000:0003ba0001004d31   task-command       42
   fe80000000000000:0003ba0001004d31  task-response       42

The following table describes the command output fields.

Field
Description
REMOTE GID
GID of the client HCA port
SRP EVENT
SRP event type
COUNT
Number of events traced
srpsnoop.d Snoops Local Events on a Server

This simple script snoops SRP events when run on an SRP target server.

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10

dtrace:::BEGIN
{
        printf("%17s %3s  %-40s %-14s %6s %10s  %6s\n", "TIMESTAMP",
            "CPU", "REMOTE GID", "EVENT", "BYTES", "TAG", "SCSIOP");

        /*
         * SCSI opcode to string translation hash. This is from
         * /usr/include/sys/scsi/generic/commands.h. If you would
         * rather see all opcodes in hex, comment this out.
         */
        scsiop[0x08] = "read";
        scsiop[0x0a] = "write";
        scsiop[0x0b] = "seek";
        scsiop[0x28] = "read(10)";
        scsiop[0x2a] = "write(10)";
        scsiop[0x2b] = "seek(10)";
}

srp:::login-*
{
        printf("%17d %3d  %-40s %-14s %17d  -\n", timestamp, cpu,
            args[0]->ci_remote, probename, args[2]->li_task_tag);
}

srp:::task-command,
srp:::task-response,
srp:::scsi-response
{
        printf("%17d %3d  %-40s %-14s %6d %10d  -\n", timestamp, cpu,
            args[0]->ci_remote, probename, 0, args[2]->ti_task_tag);
}

srp:::scsi-command
/scsiop[args[2]->ic_cdb[0]] != NULL/
{
        printf("%17d %3d  %-40s %-14s %6d %10d  %s\n", timestamp, cpu,
            args[0]->ci_remote, probename, 0, args[3]->ti_task_tag,
            scsiop[args[2]->ic_cdb[0]]);
}

srp:::scsi-command
/scsiop[args[2]->ic_cdb[0]] == NULL/
{
        printf("%17d %3d  %-40s %-14s %6d %10d  0x%x\n", timestamp, cpu,
            args[0]->ci_remote, probename, 0, args[3]->ti_task_tag,
            args[2]->ic_cdb[0]);
}

srp:::xfer-start,
srp:::xfer-done
{
        printf("%17d %3d  %-40s %-14s %6d %10d   %s\n", timestamp, cpu,
            args[0]->ci_remote, probename, args[2]->xfer_len,
            args[3]->ti_task_tag, args[2]->xfer_type > 0 ? "READ" : "WRITE");
}

This output shows a snoop of dd commands executed by the initiator.

# dtrace -s ./srpsnoop.d
TIMESTAMP       CPU  REMOTE GID                          EVENT           BYTES   TAG  SCSIOP
22644410404019   3  fe80000000000000:0003ba0001004d31   task-command        0   26  -
22644410493068   3  fe80000000000000:0003ba0001004d31   scsi-command        0   26  read(10)
22644410511422   3  fe80000000000000:0003ba0001004d31   task-command        0   30  -
22644410541494   3  fe80000000000000:0003ba0001004d31   scsi-command        0   30  read(10)
22644410621049   0  fe80000000000000:0003ba0001004d31   xfer-start       2048   26   READ
22644410720486   1  fe80000000000000:0003ba0001004d31   xfer-start      49152   30   READ
22644410681390   3  fe80000000000000:0003ba0001004d31   xfer-done        2048   26   READ
22644410694719   3  fe80000000000000:0003ba0001004d31   scsi-response       0   26  -
22644410703358   3  fe80000000000000:0003ba0001004d31   task-response       0   26  -
22644410895424   3  fe80000000000000:0003ba0001004d31   xfer-done       49152   30   READ
22644410901576   3  fe80000000000000:0003ba0001004d31   scsi-response       0   30  -
22644410905717   3  fe80000000000000:0003ba0001004d31   task-response       0   30  -
22727363721107   3  fe80000000000000:0003ba0001004d31   task-command        0   59  -
22727363919179   0  fe80000000000000:0003ba0001004d31   xfer-start      10240   59   WRITE
22727364095164   0  fe80000000000000:0003ba0001004d31   scsi-response       0   59  -
22727364105406   0  fe80000000000000:0003ba0001004d31   task-response       0   59  -
22727363812953   3  fe80000000000000:0003ba0001004d31   scsi-command        0   59  write(10)
22727363986185   3  fe80000000000000:0003ba0001004d31   xfer-done       10240   59   WRITE

The following table describes the output fields.

Field
Description
CPU
CPU event occurred on
REMOTE GID
GID of the client HCA port
EVENT
SRP event type
BYTES
Data bytes
TAG
Initiator task tag
SCSIOP
SCSI opcode as a description, as hex, or '-'

tcp Provider

The tcp provider provides probes for tracing the TCP protocol.

tcp Probes

The tcp probes are described in the following table.

Table 99  tcp Probes
Probe
Description
state-change
Fires when a TCP session changes its TCP state. The previous state is noted in the tcplsinfo_t * probe argument. The tcpinfo_t * and ipinfo_t * arguments are NULL.
send
Fires when TCP sends a segment (either control or data).
receive
Fires when TCP receives a segment (either control or data).
connect-request
Fires when a TCP active open is initiated by sending an initial SYN segment. The tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers associated with the initial SYN segment sent.
connect-established
Fires when an active-OPEN connection is established, which occurs in either of the following cases:
  • A TCP active OPEN succeeds: the initial SYN has been sent and a valid SYN,ACK segment has been received in response. TCP enters the ESTABLISHED state, and the tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers associated with the SYN,ACK segment received.

  • A simultaneous active OPEN succeeds and a final ACK is received from the peer TCP. TCP enters the ESTABLISHED state, and the tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers of the final ACK received.

In both cases an active-OPEN connection is established at this point, in contrast with tcp:::accept-established, which fires on passive connection establishment. The TCP segment presented via the tcpinfo_t * is the segment that triggers the transition to ESTABLISHED: the received SYN,ACK in the first case and the final ACK segment in the second.
connect-refused
Fires when a TCP active OPEN connection attempt is refused by the peer - a RST segment is received in acknowledgment of the initial SYN. The tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers associated with the RST,ACK segment received.
accept-established
Fires when a passive open succeeds - an initial active OPEN initiation SYN has been received, TCP has responded with a SYN,ACK, and a final ACK has been received. TCP has entered the ESTABLISHED state. The tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers associated with the final ACK segment received.
accept-refused
Fires when an incoming SYN arrives for a destination port with no listening connection; the connection initiation request is rejected by sending a RST segment that acknowledges the SYN. The tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers associated with the RST segment sent.

The send and receive probes trace packets on physical interfaces and also packets on loopback interfaces that are processed by tcp. On Oracle Solaris, loopback TCP connections can bypass the TCP layer when transferring data packets; this performance feature is called TCP fusion. These packets are also traced by the tcp provider.
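As a sketch (not part of the original guide), the tcps_local member of tcpsinfo_t can be used to separate locally delivered segments, including loopback traffic, from segments sent on physical interfaces:

```
# dtrace -n 'tcp:::send { @[args[3]->tcps_local ? "local" : "physical"] = count(); }'
```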

Argument Types for the tcp Provider

The argument types for the tcp probes are listed in the following table. The arguments are described in the following section. All probes except state-change have five arguments; state-change has six arguments.

Probe
args[0]
args[1]
args[2]
args[3]
args[4]
args[5]
state-change
null
csinfo_t *
null
tcpsinfo_t *
null
tcplsinfo_t *
send
pktinfo_t *
csinfo_t *
ipinfo_t *
tcpsinfo_t *
tcpinfo_t *
receive
pktinfo_t *
csinfo_t *
ipinfo_t *
tcpsinfo_t *
tcpinfo_t *
connect-request
pktinfo_t *
csinfo_t *
ipinfo_t *
tcpsinfo_t *
tcpinfo_t *
connect-established
pktinfo_t *
csinfo_t *
ipinfo_t *
tcpsinfo_t *
tcpinfo_t *
connect-refused
pktinfo_t *
csinfo_t *
ipinfo_t *
tcpsinfo_t *
tcpinfo_t *
accept-established
pktinfo_t *
csinfo_t *
ipinfo_t *
tcpsinfo_t *
tcpinfo_t *
accept-refused
pktinfo_t *
csinfo_t *
ipinfo_t *
tcpsinfo_t *
tcpinfo_t *
pktinfo_t Structure

The pktinfo structure is where packet ID info can be made available for deeper analysis, if packet IDs become supported by the kernel. The pkt_addr member is a pointer to the mblk holding the packet, with b_rptr pointing at the start of the relevant protocol specified by pkt_pcap to support packet capture.

typedef struct pktinfo {
  mblk_t *pkt_addr;
  int pkt_pcap;
} pktinfo_t;
csinfo_t Structure

The csinfo_t structure is where connection state info is made available. It contains a unique (system-wide) connection ID, and the process ID and zone ID associated with the connection.

typedef struct csinfo {
        uintptr_t cs_addr;
        uint64_t cs_cid;
        pid_t cs_pid;
        zoneid_t cs_zoneid;
} csinfo_t;
Element
Description
cs_addr
Address of translated ip_xmit_attr_t *.
cs_cid
Connection ID. A unique per-connection identifier which identifies the connection during its lifetime.
cs_pid
Process ID associated with the connection.
cs_zoneid
Zone ID associated with the connection.
ipinfo_t Structure

The ipinfo_t structure contains common IP info for both IPv4 and IPv6.

typedef struct ipinfo {
        uint8_t ip_ver;                 /* IP version (4, 6) */
        uint16_t ip_plength;            /* payload length */
        string ip_saddr;                /* source address */
        string ip_daddr;                /* destination address */
} ipinfo_t;

These values are read at the time the probe fired in TCP, and so ip_plength is the expected IP payload length. However, the IP layer may add headers such as AH and ESP, which will increase the actual payload length. To examine this, also trace packets using the ip provider.

Table 100  ipinfo_t Members
Member
Description
ip_ver
IP version number. Currently either 4 or 6.
ip_plength
Payload length in bytes. This is the length of the packet at the time of tracing, excluding the IP header.
ip_saddr
Source IP address, as a string. For IPv4 this is a dotted decimal quad, IPv6 follows RFC 1884 convention 2 with lower case hexadecimal digits.
ip_daddr
Destination IP address, as a string. For IPv4 this is a dotted decimal quad, IPv6 follows RFC 1884 convention 2 with lower case hexadecimal digits.
tcpsinfo_t Structure

The tcpsinfo_t structure contains tcp state information.

typedef struct tcpsinfo {
        uintptr_t tcps_addr;            /* pointer to tcp_t */
        int tcps_local;                 /* is delivered locally, boolean */
        int tcps_active;                /* active open (from here), boolean */
        uint16_t tcps_lport;            /* local port */
        uint16_t tcps_rport;            /* remote port */
        string tcps_laddr;              /* local address, as a string */
        string tcps_raddr;              /* remote address, as a string */
        int32_t tcps_state;             /* TCP state */
        uint32_t tcps_iss;              /* initial sequence # sent */
        uint32_t tcps_suna;             /* sequence # sent but unacked */
        uint32_t tcps_snxt;             /* next sequence # to send */
        uint32_t tcps_rack;             /* sequence # acked */
        uint32_t tcps_rnxt;             /* next sequence # expected */
        uint32_t tcps_swnd;             /* send window size */
        uint32_t tcps_snd_ws;           /* send window scaling */
        uint32_t tcps_rwnd;             /* receive window size */
        uint32_t tcps_rcv_ws;           /* receive window scaling */
        uint32_t tcps_cwnd;             /* congestion window */
        uint32_t tcps_cwnd_ssthresh;    /* threshold for congestion avoidance */
        uint32_t tcps_sack_fack;        /* SACK sequence # acked */
        uint32_t tcps_sack_snxt;        /* next SACK seq # for retransmission */
        uint32_t tcps_rto;              /* round-trip timeout, msec */
        uint32_t tcps_mss;              /* max segment size */
        int tcps_retransmit;            /* retransmit send event, boolean */
} tcpsinfo_t;

It may seem redundant to supply the local and remote ports and addresses here as well as in the tcpinfo_t below, but the tcp:::state-change probes do not have associated tcpinfo_t data, so in order to map the state change to a specific port, you require this data here.

Table 101  tcpsinfo_t Members
Member
Description
tcps_addr
Address of translated tcp_t *.
tcps_local
Local, boolean. 0: is not delivered locally and uses a physical network interface, 1: is delivered locally including loopback interfaces, such as lo0.
tcps_active
Active open, boolean. 0: TCP connection was created from a remote host, 1: TCP connection was created from this host.
tcps_lport
Local port associated with the TCP connection.
tcps_rport
Remote port associated with the TCP connection.
tcps_laddr
Local address associated with the TCP connection, as a string.
tcps_raddr
Remote address associated with the TCP connection, as a string.
tcps_state
The following states are available for a tcps_state member:
  • TCP_STATE_CLOSED

  • TCP_STATE_IDLE

  • TCP_STATE_BOUND

  • TCP_STATE_LISTEN

  • TCP_STATE_SYN_SENT

  • TCP_STATE_SYN_RECEIVED

  • TCP_STATE_ESTABLISHED

  • TCP_STATE_CLOSE_WAIT

  • TCP_STATE_FIN_WAIT_1

  • TCP_STATE_CLOSING

  • TCP_STATE_LAST_ACK

  • TCP_STATE_FIN_WAIT_2

  • TCP_STATE_TIME_WAIT

Use inline tcp_state_string[] to convert state to a string.
tcps_iss
Initial sequence number sent.
tcps_suna
Lowest sequence number for which you have sent data but not received acknowledgement.
tcps_snxt
Next sequence number to send. tcps_snxt - tcps_suna gives the number of bytes pending acknowledgement for the TCP connection.
tcps_rack
Highest sequence number for which you have received and sent acknowledgement.
tcps_rnxt
Next sequence number expected on receive side. tcps_rnxt - tcps_rack gives the number of bytes you have received but not yet acknowledged for the TCP connection.
tcps_swnd
TCP send window size.
tcps_snd_ws
TCP send window scale. tcps_swnd << tcps_snd_ws gives the scaled window size if window scaling options are in use.
tcps_rwnd
TCP receive window size.
tcps_rcv_ws
TCP receive window scale. tcps_rwnd << tcps_rcv_ws gives the scaled window size if window scaling options are in use.
tcps_cwnd
TCP congestion window size.
tcps_cwnd_ssthresh
TCP congestion window threshold. When the congestion window is greater than ssthresh, congestion avoidance begins.
tcps_sack_fack
Highest SACK-acked sequence number.
tcps_sack_snxt
Next sequence number to be retransmitted using SACK.
tcps_rto
Round-trip timeout. If you do not receive acknowledgement of data sent tcps_rto msec ago, retransmit is required.
tcps_mss
Maximum segment size.
tcps_retransmit
Send is a retransmit, boolean. 1 for tcp:::send events that are retransmissions, 0 for tcp events that are not send events, and for send events that are not retransmissions.
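For instance, the relationship tcps_snxt - tcps_suna described above can be sampled directly. This sketch records the maximum number of bytes pending acknowledgement per remote address:

```
# dtrace -n 'tcp:::send
{
        @inflight[args[3]->tcps_raddr] =
            max(args[3]->tcps_snxt - args[3]->tcps_suna);
}'
```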
tcplsinfo_t Structure

The tcplsinfo_t structure contains the previous tcp state during a state change.

typedef struct tcplsinfo {
        int32_t tcps_state;              /* TCP state */
} tcplsinfo_t;

The tcps_state member of tcplsinfo_t contains the previous TCP state. Inline definitions are provided for the various TCP states: TCP_STATE_CLOSED, TCP_STATE_SYN_SENT, and so on. Use inline tcp_state_string[] to convert state to a string.

tcpinfo_t Structure

The tcpinfo_t structure is a DTrace translated version of the TCP header.

typedef struct tcpinfo {
        uint16_t tcp_sport;             /* source port */
        uint16_t tcp_dport;             /* destination port */
        uint32_t tcp_seq;               /* sequence number */
        uint32_t tcp_ack;               /* acknowledgment number */
        uint8_t tcp_offset;             /* data offset, in bytes */
        uint8_t tcp_flags;              /* flags */
        uint16_t tcp_window;            /* window size */
        uint16_t tcp_checksum;          /* checksum */
        uint16_t tcp_urgent;            /* urgent data pointer */
        tcph_t *tcp_hdr;                /* raw TCP header */
} tcpinfo_t;
Table 102  tcpinfo_t Members
Member
Description
tcp_sport
TCP source port.
tcp_dport
TCP destination port.
tcp_seq
TCP sequence number.
tcp_ack
TCP acknowledgment number.
tcp_offset
Payload data offset, in bytes not 32-bit words.
tcp_flags
TCP flags. See the tcp_flags table below for available macros.
tcp_window
TCP window size, bytes.
tcp_checksum
Checksum of TCP header and payload.
tcp_urgent
TCP urgent data pointer, bytes.
tcp_hdr
Pointer to raw TCP header at time of tracing.
Table 103  tcp_flags Values
Value
Description
TH_FIN
No more data from sender (finish).
TH_SYN
Synchronize sequence numbers (connect).
TH_RST
Reset the connection.
TH_PUSH
TCP push function.
TH_ACK
Acknowledgment field is set.
TH_URG
Urgent pointer field is set.
TH_ECE
Explicit congestion notification echo. For more information, see RFC 3168.
TH_CWR
Congestion window reduction.

See RFC 793 for a detailed explanation of the standard TCP header fields and flags.
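
For example, the TH_SYN macro can be tested against tcp_flags to count inbound connection requests by source address (a sketch):

# dtrace -n 'tcp:::receive /args[4]->tcp_flags & TH_SYN/ { @[args[2]->ip_saddr] = count(); }'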

Using the tcp Provider

Some simple examples of tcp provider usage follow.

Connections by Host Address

This DTrace one-liner counts inbound TCP connections by source IP address:

# dtrace -n 'tcp:::accept-established { @[args[3]->tcps_raddr] = count(); }'
dtrace: description 'tcp:::accept-established ' matched 1 probe
^C

  127.0.0.1                                                         1
  192.0.2.35/27                                                      1
  fe80::214:4fff:fe8d:59aa                                          1
  192.0.2.14/27                                                     3

The output above shows there were 3 TCP connections from 192.0.2.14, and a single TCP connection from the IPv6 host fe80::214:4fff:fe8d:59aa.

Connections by TCP Port

This DTrace one-liner counts inbound TCP connections by local TCP port:

# dtrace -n 'tcp:::accept-established { @[args[3]->tcps_lport] = count(); }'
dtrace: description 'tcp:::accept-established ' matched 1 probe
^C

 40648                1
    22                3

The output above shows there were 3 TCP connections for port 22 (ssh), and a single TCP connection for port 40648 (an RPC port).

Who is Connecting to What

Combining the previous two examples produces a useful one-liner to quickly identify who is connecting to what:

# dtrace -n 'tcp:::accept-established \
   { @[args[3]->tcps_raddr, args[3]->tcps_lport] = count(); }' 
dtrace: description 'tcp:::accept-established ' matched 1 probe
^C

  192.0.2.35/27                                       40648                1
  fe80::214:4fff:fe8d:59aa                              22                1
  192.0.2.14/27                                         22                3

The output above shows there were three TCP connections from 192.0.2.14/27 to port 22 (ssh).

Who is not Connecting to What

It may be useful when troubleshooting connection issues to see who is failing to connect to their requested ports. This is equivalent to seeing where incoming SYNs arrive when no listener is present, as per RFC 793:

# dtrace -n 'tcp:::accept-refused \
{ @[args[2]->ip_daddr, args[4]->tcp_sport] = count(); }' 
dtrace: description 'tcp:::accept-refused ' matched 1 probe
^C

  192.0.2.14/27                                         23                2

Here you traced two failed attempts by host 192.0.2.14 to connect to port 23 (telnet).

Packets by Host Address

This DTrace one-liner counts TCP received packets by host address:

# dtrace -n 'tcp:::receive { @[args[2]->ip_saddr] = count(); }'
dtrace: description 'tcp:::receive ' matched 5 probes
^C

  127.0.0.1                                                         7
  fe80::214:4fff:fe8d:59aa                                         14
  192.0.2.65/27                                                     43
  192.0.2.14/27                                                    44
  192.0.2.35/27                                                   3722

The output above shows that 7 TCP packets were received from 127.0.0.1, and 14 TCP packets from the IPv6 host fe80::214:4fff:fe8d:59aa.

Packets by Local Port

This DTrace one-liner counts TCP received packets by the local TCP port:

# dtrace -n 'tcp:::receive { @[args[4]->tcp_dport] = count(); }'
dtrace: description 'tcp:::receive ' matched 5 probes
^C

 42303                3
 42634                3
  2049               27
 40648               36
    22              162

The output above shows that 162 packets were received for port 22 (ssh), 36 packets for port 40648 (an RPC port), 27 packets for port 2049 (NFS), and a few packets for high-numbered client ports.

Sent Size Distribution

This DTrace one-liner prints distribution plots of IP payload size by destination, for TCP sends:

# dtrace -n 'tcp:::send { @[args[2]->ip_daddr] = quantize(args[2]->ip_plength); }'
dtrace: description 'tcp:::send ' matched 3 probes
^C

  192.0.2.14/27                                     
           value  ------------- Distribution ------------- count    
              32 |                                         0        
              64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    14       
             128 |@@@                                      1        
             256 |                                         0        

  192.0.2.30/27                                      
           value  ------------- Distribution ------------- count    
              16 |                                         0        
              32 |@@@@@@@@@@@@@@@@@@@@                     7        
              64 |@@@@@@@@@                                3        
             128 |@@@                                      1        
             256 |@@@@@@                                   2        
             512 |@@@                                      1        
            1024 |                                         0        
tcpstate.d Reports TCP State Changes

This DTrace script demonstrates the capability to trace TCP state changes:

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10

int last[int];

dtrace:::BEGIN
{
        printf(" %3s %12s  %-20s    %-20s\n", "CPU", "DELTA(us)", "OLD", "NEW");
}

tcp:::state-change
/ last[args[1]->cs_cid] /
{
        this->elapsed = (timestamp - last[args[1]->cs_cid]) / 1000;
        printf(" %3d %12d  %-20s -> %-20s\n", cpu, this->elapsed,
            tcp_state_string[args[5]->tcps_state],
            tcp_state_string[args[3]->tcps_state]);
        last[args[1]->cs_cid] = timestamp;
}

tcp:::state-change
/ last[args[1]->cs_cid] == 0 /
{
        printf(" %3d %12s  %-20s -> %-20s\n", cpu, "-",
            tcp_state_string[args[5]->tcps_state],
            tcp_state_string[args[3]->tcps_state]);
        last[args[1]->cs_cid] = timestamp;
}

Run this script on a system for a couple of minutes:

# ./tcpstate.d 

 CPU    DELTA(us)  OLD                     NEW                 
   0            -  state-listen         -> state-syn-received  
   0          613  state-syn-received   -> state-established   
   0            -  state-idle           -> state-bound         
   0           63  state-bound          -> state-syn-sent      
   0          685  state-syn-sent       -> state-bound         
   0           22  state-bound          -> state-idle          
   0          114  state-idle           -> state-closed  

In the above example output, an inbound connection is traced; it takes 613 us to go from syn-received to established. An outbound connection attempt is also made to a closed port: it takes 63 us to go from bound to syn-sent, and 685 us to go from syn-sent back to bound.

The following table describes the output fields.

Field
Description
CPU
CPU id for the event
DELTA(us)
time since previous event for that connection, microseconds
OLD
old TCP state
NEW
new TCP state
tcpio.d Reports TCP Packet Details

The following DTrace script traces TCP packets and prints various details:

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10hz

dtrace:::BEGIN
{
        printf(" %3s %15s:%-5s      %15s:%-5s %6s  %s\n", "CPU",
            "LADDR", "LPORT", "RADDR", "RPORT", "BYTES", "FLAGS");
}

tcp:::send
{
        this->length = args[2]->ip_plength - args[4]->tcp_offset;
        printf(" %3d %16s:%-5d -> %16s:%-5d %6d  (", cpu,
            args[2]->ip_saddr, args[4]->tcp_sport,
            args[2]->ip_daddr, args[4]->tcp_dport, this->length);
}

tcp:::receive
{
        this->length = args[2]->ip_plength - args[4]->tcp_offset;
        printf(" %3d %16s:%-5d <- %16s:%-5d %6d  (", cpu,
            args[2]->ip_daddr, args[4]->tcp_dport,
            args[2]->ip_saddr, args[4]->tcp_sport, this->length);
}

tcp:::send,
tcp:::receive
{
        printf("%s", args[4]->tcp_flags & TH_FIN ? "FIN|" : "");
        printf("%s", args[4]->tcp_flags & TH_SYN ? "SYN|" : "");
        printf("%s", args[4]->tcp_flags & TH_RST ? "RST|" : "");
        printf("%s", args[4]->tcp_flags & TH_PUSH ? "PUSH|" : "");
        printf("%s", args[4]->tcp_flags & TH_ACK ? "ACK|" : "");
        printf("%s", args[4]->tcp_flags & TH_URG ? "URG|" : "");
        printf("%s", args[4]->tcp_flags & TH_ECE ? "ECE|" : "");
        printf("%s", args[4]->tcp_flags & TH_CWR ? "CWR|" : "");
        printf("%s", args[4]->tcp_flags == 0 ? "null " : "");
        printf("\b)\n");
}

This example output has captured a TCP handshake:

# ./tcpio.d
 CPU            LADDR:LPORT               RADDR:RPORT  BYTES  FLAGS
   1     192.0.2.8/27:22    ->    192.0.2.40/27:60337    464  (PUSH|ACK)
   1     192.0.2.8/27:22    ->    192.0.2.40/27:60337     48  (PUSH|ACK)
   2     192.0.2.8/27:22    ->    192.0.2.40/27:60337     20  (PUSH|ACK)
   3     192.0.2.8/27:22    <-    192.0.2.40/27:60337      0  (SYN)
   3     192.0.2.8/27:22    ->    192.0.2.40/27:60337      0  (SYN|ACK)
   3     192.0.2.8/27:22    <-    192.0.2.40/27:60337      0  (ACK)
   3     192.0.2.8/27:22    <-    192.0.2.40/27:60337      0  (ACK)
   3     192.0.2.8/27:22    <-    192.0.2.40/27:60337     20  (PUSH|ACK)
   3     192.0.2.8/27:22    ->    192.0.2.40/27:60337      0  (ACK)
   3     192.0.2.8/27:22    <-    192.0.2.40/27:60337      0  (ACK)
   3     192.0.2.8/27:22    <-    192.0.2.40/27:60337    376  (PUSH|ACK)
   3     192.0.2.8/27:22    ->    192.0.2.40/27:60337      0  (ACK)
   3     192.0.2.8/27:22    <-    192.0.2.40/27:60337     24  (PUSH|ACK)
   2     192.0.2.8/27:22    ->    192.0.2.40/27:60337    736  (PUSH|ACK)
   3     192.0.2.8/27:22    <-    192.0.2.40/27:60337      0  (ACK)

The following table describes the output fields.

Field
Description
CPU
CPU id that event occurred on
LADDR
Local IP address
LPORT
Local TCP port
RADDR
Remote IP address
RPORT
Remote TCP port
BYTES
TCP payload bytes
FLAGS
TCP flags

Note -  The output may be shuffled slightly on multi-CPU servers due to DTrace per-CPU buffering, and events such as the TCP handshake can be printed out of order. Keep an eye on changes in the CPU column, or add a timestamp column to this script and post-sort the output.
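
One way to enable post-sorting, as the note suggests, is to print a leading nanosecond timestamp on each line (a sketch; field widths and format are arbitrary, and the receive clause would be modified the same way):

tcp:::send
{
        this->length = args[2]->ip_plength - args[4]->tcp_offset;
        printf("%d %3d %16s:%-5d -> %16s:%-5d %6d\n", timestamp, cpu,
            args[2]->ip_saddr, args[4]->tcp_sport,
            args[2]->ip_daddr, args[4]->tcp_dport, this->length);
}

The output can then be sorted numerically on the first column, for example with sort -n.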

tcp Stability

The tcp provider uses the stability mechanism of DTrace to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see DTrace Stability Mechanisms.

Table 104  Stability Mechanism for the tcp Provider
Element
Name Stability
Data Stability
Dependency Class
Provider
Evolving
Evolving
ISA
Module
Private
Private
Unknown
Function
Private
Private
Unknown
Name
Evolving
Evolving
ISA
Arguments
Evolving
Evolving
ISA

udp Provider

The udp provider provides probes for tracing the UDP protocol.

udp Probes

The udp probes are described in the table below.

Table 105  udp Probes Overview
Probe
Description
send
Fires whenever UDP sends a datagram.
receive
Fires whenever UDP receives a datagram.

The send and receive probes trace datagrams on physical interfaces and also packets on loopback interfaces that are processed by udp.

udp Probe Arguments

The argument types for the udp probes are listed in the table below. The arguments are described in the following section.

Table 106  udp Probe Arguments
Probe
args[0]
args[1]
args[2]
args[3]
args[4]
send
pktinfo_t *
csinfo_t *
ipinfo_t *
udpsinfo_t *
udpinfo_t *
receive
pktinfo_t *
csinfo_t *
ipinfo_t *
udpsinfo_t *
udpinfo_t *
pktinfo_t Structure

The pktinfo structure is where packet ID info can be made available for deeper analysis, if packet IDs become supported by the kernel. The pkt_addr member is a pointer to the mblk holding the packet, with b_rptr pointing at the start of the relevant protocol specified by pkt_pcap to support packet capture.

typedef struct pktinfo {
  mblk_t *pkt_addr;
  int pkt_pcap;
} pktinfo_t;
csinfo_t Structure

The csinfo_t structure is where connection state info is made available. It contains a unique (system-wide) connection ID, and the process ID and zone ID associated with the connection.

typedef struct csinfo {
        uintptr_t cs_addr;
	uint64_t cs_cid;
	pid_t cs_pid;
	zoneid_t cs_zoneid;
 } csinfo_t;
Table 107  csinfo_t Members
Member
Description
cs_addr
Address of translated ip_xmit_attr_t *.
cs_cid
Connection ID. A unique system-wide identifier that persists for the lifetime of the connection.
cs_pid
Process ID associated with the connection.
cs_zoneid
Zone ID associated with the connection.
ipinfo_t Structure

The ipinfo_t structure contains common IP info for both IPv4 and IPv6.

typedef struct ipinfo {
        uint8_t ip_ver;                 /* IP version (4, 6) */
        uint16_t ip_plength;            /* payload length */
        string ip_saddr;                /* source address */
        string ip_daddr;                /* destination address */
} ipinfo_t;

These values are read at the time the probe fired in UDP, and so ip_plength is the expected IP payload length. However, the IP layer may add headers such as AH and ESP, which will increase the actual payload length. To examine this, also trace packets using the ip provider.
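
To compare the payload length seen here with what is actually sent on the wire, the ip provider can trace the same traffic (a sketch; the ip:::send probe takes an ipinfo_t as args[2], mirroring the layout shown here):

# dtrace -n 'ip:::send { @[args[2]->ip_daddr] = quantize(args[2]->ip_plength); }'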

Table 108  ipinfo_t Members
Member
Description
ip_ver
IP version number. Currently either 4 or 6.
ip_plength
Payload length in bytes. This is the length of the packet at the time of tracing, excluding the IP header.
ip_saddr
Source IP address, as a string. For IPv4 this is a dotted decimal quad, IPv6 follows RFC 1884 convention 2 with lower case hexadecimal digits.
ip_daddr
Destination IP address, as a string. For IPv4 this is a dotted decimal quad, IPv6 follows RFC 1884 convention 2 with lower case hexadecimal digits.
udpsinfo_t Structure

The udpsinfo_t structure contains udp state info.

typedef struct udpsinfo {
        uintptr_t udps_addr;
        uint16_t udps_lport;            /* local port */
        uint16_t udps_rport;            /* remote port */
        string udps_laddr;              /* local address, as a string */
        string udps_raddr;              /* remote address, as a string */
} udpsinfo_t;
Table 109  udpsinfo_t Members
Member
Description
udps_addr
Address of translated udp_t *.
udps_lport
Local port associated with the UDP connection.
udps_rport
Remote port associated with the UDP connection.
udps_laddr
Local address associated with the UDP connection, as a string.
udps_raddr
Remote address associated with the UDP connection, as a string.
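
For example, the udpsinfo_t state fields can key an aggregation of sent datagrams by remote endpoint (a sketch):

# dtrace -n 'udp:::send { @[args[3]->udps_raddr, args[3]->udps_rport] = count(); }'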
udpinfo_t Structure

The udpinfo_t structure is a DTrace translated version of the UDP header.

typedef struct udpinfo {
        uint16_t udp_sport;             /* source port */
        uint16_t udp_dport;             /* destination port */
        uint16_t udp_length;            /* total length */
        uint16_t udp_checksum;          /* headers + data checksum */
        udpha_t *udp_hdr;               /* raw UDP header */
} udpinfo_t;
Table 110  udpinfo_t Members
Member
Description
udp_sport
UDP source port.
udp_dport
UDP destination port.
udp_length
Payload length in bytes.
udp_checksum
Checksum of UDP header and payload.
udp_hdr
Pointer to raw UDP header at time of tracing.

See RFC 768 for a detailed explanation of the standard UDP header fields.
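
As an example of using the translated header directly, this one-liner plots the distribution of the UDP length field by destination port (a sketch):

# dtrace -n 'udp:::send { @[args[4]->udp_dport] = quantize(args[4]->udp_length); }'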

Using the udp Provider

Some simple examples of udp provider usage follow.

Count of Packets by Host Address

This DTrace one-liner counts UDP received packets by host address:

# dtrace -n 'udp:::receive { @[args[2]->ip_saddr] = count(); }'
dtrace: description 'udp:::receive ' matched 5 probes
^C

  127.0.0.1                                                         7
  fe80::214:4fff:fe8d:59aa                                         14
  192.0.2.35/27                                                    43
  192.0.2.14/27                                                    44
  192.0.2.45/27                                                  3722

The output above shows that 7 UDP packets were received from 127.0.0.1, and 14 UDP packets from the IPv6 host fe80::214:4fff:fe8d:59aa.

Count of Packets by Local Port

This DTrace one-liner counts UDP received packets by the local UDP port:

# dtrace -n 'udp:::receive { @[args[4]->udp_dport] = count(); }'
dtrace: description 'udp:::receive ' matched 1 probe
^C

 33294                1
 33822                1
 38961                1
 44433                1
 46258                1
 46317                1
 47511                1
 50581                1
 54685                1
 56491                1
 59056                1
 62171                1
 62769                1
 64231                1

The output above shows that 1 packet was received for port 33294, 1 packet was received for port 33822, and so on.

IP Payload Sent Size Distribution

This DTrace one-liner prints distribution plots of IP payload size by destination, for UDP sends:

# dtrace -n 'udp:::send { @[args[2]->ip_daddr] = quantize(args[2]->ip_plength); }'
dtrace: description 'udp:::send ' matched 6 probes
^C

  198.51.100.5                                     
           value  ------------- Distribution ------------- count    
              16 |                                         0        
              32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14       
              64 |                                         0        

udp Stability

The udp provider uses the stability mechanism of DTrace to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see DTrace Stability Mechanisms.

Table 111  Stability Mechanism for the udp Provider
Element
Name Stability
Data Stability
Dependency Class
Provider
Evolving
Evolving
ISA
Module
Private
Private
Unknown
Function
Private
Private
Unknown
Name
Evolving
Evolving
ISA
Arguments
Evolving
Evolving
ISA