CHAPTER 5

Configuring the nxge Device Driver Parameters

The nxge device driver controls the Sun x8 Express Dual 10 Gigabit Ethernet interfaces. You can manually set the nxge driver parameters to customize each device in your system.

This chapter lists the available device driver parameters and describes how you can set these parameters.


nxge Hardware and Software Overview

The Sun Dual 10 GbE XFP PCI Express Card provides two 10-Gigabit full-duplex networking interfaces. The device driver automatically sets the link speed to 10000 Mbit/sec and conforms to the IEEE 802.3 Ethernet standard. Each interface has 8 receive DMA channels and 12 transmit DMA channels to allow packets to be processed in parallel.

The card extends CPU and OS parallelism to networking through its support for hardware-based flow classification and multiple DMA channels. By using CPU thread affinity to bind a given flow to a specific CPU thread, it enables a one-to-one correlation of Rx and Tx packets across the same TCP connection. This can help avoid cross-calls and context switching, which delivers greater performance while reducing the CPU resources needed to support I/O processing.

The adapter uses Sun's own MAC controller to map the 10-Gigabit XAUI interface onto the PCI Express form factor. It supports 10 Gb/sec bandwidth using eight transmit and eight receive lanes.


Setting nxge Driver Parameters on an Oracle Solaris Platform

You can set the nxge device driver parameters in two ways:

If you use the ndd utility, the parameters are valid only until you reboot the system. This method is good for testing parameter settings.

To set parameters so they remain in effect after you reboot the system, create a /platform/sun4u/kernel/drv/nxge.conf file and add parameter values to this file when you need to set a particular parameter for a device in the system.


Setting Parameters Using the ndd Utility

Use the ndd utility to configure parameters that are valid until you reboot the system.

The following sections describe how you can use the nxge driver and the ndd utility to modify (with the -set option) or display (without the -set option) the parameters for each nxge device.

Noninteractive and Interactive Modes

You can use the ndd utility in two modes:

In noninteractive mode, you invoke the utility to execute a specific command. Once the command is executed, you exit the utility.

In interactive mode, you can use the utility to get or set more than one parameter value.

Refer to the ndd(1M) man page for more information.


To Specify Device Instances for the ndd Utility

Before you use the ndd utility to get or set a parameter for an nxge device, you must specify the device instance for the utility.

1. Check the /etc/path_to_inst file to identify the instance associated with a particular device.


# grep nxge /etc/path_to_inst
"/pci@7c0/pci@0/pci@9/network@0" 0 "nxge"
"/pci@7c0/pci@0/pci@9/network@0,1" 1 "nxge" 

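The lookup above can be wrapped in a small helper. The following is a minimal sketch, assuming a POSIX shell and awk; the function name nxge_instances is hypothetical and is not part of the driver package. It prints each nxge instance number followed by its device path.

```shell
# Print "instance device-path" for each nxge entry in a path_to_inst-style file.
# Usage: nxge_instances [file]   (the file defaults to /etc/path_to_inst)
nxge_instances() {
    awk '$3 == "\"nxge\"" {
        gsub(/"/, "", $1)      # strip the quotes around the device path
        print $2, $1
    }' "${1:-/etc/path_to_inst}"
}
```

The instance number in the first column is the X in /dev/nxgeX.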

To Specify Parameter Values Using the ndd Utility

This section describes how to modify and display parameter values.

1. To modify a parameter value, use the -set option.

If you invoke the ndd utility with the -set option, the utility passes the value, which you must specify, down to the named driver instance (/dev/nxgeX) and assigns the value to the parameter:


# ndd -set /dev/nxgeX parameter value

Where X is the driver instance, for example /dev/nxge0, /dev/nxge1.

2. To display the value of a parameter, specify the parameter name and omit the value.

When you omit the -set option, the utility queries the named driver instance, retrieves the value associated with the specified parameter, and prints it:


# ndd /dev/nxgeX parameter
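The two commands in this procedure can be combined to confirm that a change took effect. The following is a minimal sketch, not a supported tool: the function name nxge_set_verify is hypothetical, and the comparison assumes that ndd prints the value in the same form in which it was given.

```shell
# Set an nxge parameter, then read it back to confirm that it was applied.
# Usage: nxge_set_verify INSTANCE PARAMETER VALUE
nxge_set_verify() {
    dev="/dev/nxge$1"
    ndd -set "$dev" "$2" "$3" || return 1
    current=$(ndd "$dev" "$2") || return 1
    if [ "$current" = "$3" ]; then
        echo "$dev $2 = $current"
    else
        # Assumption: a differing read-back means the value was not applied.
        echo "$dev $2 is $current, expected $3" >&2
        return 1
    fi
}
```

For example, nxge_set_verify 0 accept_jumbo 1 sets accept_jumbo on /dev/nxge0 and reports the value read back.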


To Use the ndd Utility in Interactive Mode

1. To modify a parameter value in interactive mode, specify ndd /dev/nxgeX:


# ndd /dev/nxge0
name to get/set? (Enter the parameter name or ? to view all parameters)

After you enter the parameter name, the ndd utility prompts you for the parameter value.

2. To list all the parameters supported by the nxge driver, type ?.


# ndd /dev/nxge1
name to get/set ? ?
?                             (read only)
function_number               (read only)
adv_autoneg_cap               (read and write)
adv_10gfdx_cap                (read and write)
adv_1000fdx_cap               (read and write)
adv_100fdx_cap                (read and write)
adv_10fdx_cap                 (read and write)
adv_pause_cap                 (read and write)
accept_jumbo                  (read and write)
rxdma_intr_time               (read and write)
rxdma_intr_pkts               (read and write)
class_opt_ipv4_tcp            (read and write)
class_opt_ipv4_udp            (read and write)
class_opt_ipv4_ah             (read and write)
class_opt_ipv4_sctp           (read and write)
class_opt_ipv6_tcp            (read and write)
class_opt_ipv6_udp            (read and write)
class_opt_ipv6_ah             (read and write)
class_opt_ipv6_sctp           (read and write)
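The same listing is available noninteractively by passing an escaped question mark (ndd /dev/nxge1 \?). The following minimal sketch filters such a listing down to the parameters that can be changed; the function name nxge_writable_params is hypothetical.

```shell
# Read an ndd parameter listing on standard input and print only the
# names of the parameters marked "(read and write)".
nxge_writable_params() {
    awk '/\(read and write\)/ { print $1 }'
}
```

A typical invocation would be: ndd /dev/nxge1 \? | nxge_writable_params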
 


Setting Parameters Using the nxge.conf File

Specify the driver parameter properties for each device by creating an nxge.conf file in the /kernel/drv directory. Use an nxge.conf file when you need to set a particular parameter for a device in the system.

The man pages for prtconf(1M) and driver.conf(4) include additional details. The next procedure shows an example of setting parameters in a nxge.conf file.

To access any man page, type the man command plus the name of the man page.

For example, to access man pages for prtconf(1M), type:


% man prtconf


To Set Driver Parameters Using an nxge.conf File

1. Obtain the hardware path names for the nxge devices in the device tree.

a. Check the /etc/driver_aliases file to identify the name associated with a particular device:


# grep nxge /etc/driver_aliases
nxge "pciex108e,abcd"

b. Locate the path names and the associated instance numbers in the /etc/path_to_inst file.


# grep nxge /etc/path_to_inst
"/pci@780/pci@0/pci@8/network@0" 0 "nxge"
"/pci@780/pci@0/pci@8/network@0,1" 1 "nxge"

To identify a PCI-E device unambiguously in the nxge.conf file, use the name, parent name, and the unit-address for the device. Refer to the pci(4) man page for more information about the PCI-E device specification.

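In this example, the name is pciex108e,abcd, the parent is /pci@780/pci@0/pci@8, and the unit addresses are 0 and 0,1.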

2. Set the parameters for the nxge devices in the /platform/sun4u/kernel/drv/nxge.conf file.

a. The following parameters can be set using the /platform/sun4u/kernel/drv/nxge.conf file.


#
#---------------Link Configuration ----------------------
#       The link parameters depend on the type of the card
#       and the port.
#       10 gigabit related parameters ( i.e adv_10gfdx_cap)
#       apply only to 10gigabit ports.
#       Half duplex is not supported on any NIU card.
#
#       adv-autoneg-cap
#               Advertise auto-negotiation capability.
#               default is 1
# adv-autoneg-cap = 1;
#
#       adv_10gfdx_cap
#               Advertise 10gbps Full duplex  capability.
#               default is 1
# adv_10gfdx_cap = 1;
#
#       adv_1000fdx_cap
#               Advertise 1gbps Full duplex  capability.
#               default is 1
# adv_1000fdx_cap = 1;
#
#       adv_100fdx_cap
#               Advertise 100mbps Full duplex  capability.
#               default is 1
# adv_100fdx_cap = 1;
#
#       adv_10fdx_cap
#               Advertise 10mbps Full duplex  capability.
#               default is 1
# adv_10fdx_cap = 1;
#
#       adv_asmpause_cap
#               Advertise Asymmetric pause capability.
#               default is 0
# adv_asmpause_cap = 0;
#
#       adv_pause_cap
#               Advertise pause capability.
#               default is 1
# adv_pause_cap = 1;
#
#
#------- Jumbo frame support ---------------------------------
# To enable jumbo support for all nxge interfaces,
# accept_jumbo = 1;
#
# To disable jumbo support for all nxge interfaces,
# accept_jumbo = 0;
#
# Default is 0.  See the example at the end of this file for
# enabling or disabling jumbo for a particular nxge interface.
#
#
#------- Receive DMA Configuration ----------------------------
#
#  rxdma-intr-time
#       Interrupts after this number of NIU hardware ticks have
#       elapsed since the last packet was received.
#       A value of zero means no time blanking (Default = 8).
#
# rxdma-intr-pkts
#       Interrupt after this number of packets have arrived since
#       the last packet was serviced. A value of zero indicates
#       no packet blanking (Default = 20).
#
# Default Interrupt Blanking parameters.
#
# rxdma-intr-time = 8;
# rxdma-intr-pkts = 20;
#
#
#------- Classification and Load Distribution Configuration ------
#
# class-opt-****-***
#       These variables define how each IP class is configured.
#       Configuration options range from whether TCAM lookup
#       is enabled to flow hash generation.
#       These parameters also control how the flow template is
#       constructed and how packets are distributed within RDC
#       groups.
#
#       supported classes:
#       class-opt-ipv4-tcp class-opt-ipv4-udp class-opt-ipv4-sctp
#       class-opt-ipv4-ah class-opt-ipv6-tcp class-opt-ipv6-udp
#       class-opt-ipv6-sctp class-opt-ipv6-ah
#
#       Configuration bits (The following bits will be decoded
#       by the driver as hex format).
#
#       0010:           use MAC Port (for flow key)
#       0020:           use L2DA (for flow key)
#       0040:           use VLAN (for flow key)
#       0080:           use proto (for flow key)
#       0100:           use IP src addr (for flow key)
#       0200:           use IP dest addr (for flow key)
#       0400:           use Src Port (for flow key)
#       0800:           use Dest Port (for flow key)
#
# class-opt-ipv4-tcp = fe0;
#

b. The following parameters operate on a per port basis and can be set using the /platform/sun4u/kernel/drv/nxge.conf file.


#
# ------- How to set parameters for a particular interface --------
# The example below shows how to locate the device path and set a
# parameter for a particular nxge interface. (Using jumbo support as
# an example)
#
# Use the following command to find out the device paths for nxge,
#       more /etc/path_to_inst | grep nxge
#
# For example, if you see,
#       "/pci@7c0/pci@0/pci@8/network@0" 0 "nxge"
#       "/pci@7c0/pci@0/pci@8/network@0,1" 1 "nxge"
#       "/pci@7c0/pci@0/pci@8/network@0,2" 2 "nxge"
#       "/pci@7c0/pci@0/pci@8/network@0,3" 3 "nxge"
#
# then you can enable jumbo for ports 0 and 1 and disable jumbo for ports 2
# and 3 as follows,
#
# name = "pciex108e,abcd" parent = "/pci@7c0/pci@0/pci@8/" unit-address = "0"
# accept_jumbo = 1;
# name = "pciex108e,abcd" parent = "/pci@7c0/pci@0/pci@8/" unit-address = "0,1"
# accept_jumbo = 1;
# name = "pciex108e,abcd" parent = "/pci@7c0/pci@0/pci@8/" unit-address = "0,2"
# accept_jumbo = 0;
# name = "pciex108e,abcd" parent = "/pci@7c0/pci@0/pci@8/" unit-address = "0,3"
# accept_jumbo = 0;

c. In the following example, the ports of all Sun Dual 10 GbE XFP PCI Express Cards are set to load balance Rx traffic based on the IP source address. The default value is F80, which indicates Rx load balancing based on the IP 5-tuple. Notice the semicolon at the end of the last parameter.


class-opt-ipv4-tcp = 100;
class-opt-ipv4-udp = 100;

d. The following example shows ports on two different cards being set. Only one node needs to be specified.


name = "pciex108e,abcd" parent = "/pci@780/pci@0/pci@8/" unit-address = "0" class-opt-ipv4-tcp = 0x100;
 
name = "pciex108e,abcd" parent = "/pci@7c0/pci@0/pci@9/" unit-address = "0" class-opt-ipv4-tcp = 0x40;

3. Save the nxge.conf file.
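The per-port entries used in this procedure follow directly from the /etc/path_to_inst output: the parent is the device path up to the last component, and the unit-address is the text after the @ in that component. The following is a minimal sketch of that derivation, assuming a POSIX shell and awk. The function name nxge_conf_entries is hypothetical, and the trailing slash on the parent mirrors the examples in this chapter.

```shell
# Turn path_to_inst-style nxge lines into per-port nxge.conf entries.
# Usage: nxge_conf_entries NAME 'PROPERTY = VALUE' [file]
#   NAME is the alias reported by /etc/driver_aliases.
nxge_conf_entries() {
    awk -v name="$1" -v prop="$2" '
        $3 == "\"nxge\"" {
            path = $1
            gsub(/"/, "", path)
            n = split(path, part, "/")     # part[n] is, for example, network@0,1
            split(part[n], node, "@")      # node[2] is the unit-address
            parent = substr(path, 1, length(path) - length(part[n]))
            printf "name = \"%s\" parent = \"%s\" unit-address = \"%s\" %s;\n", name, parent, node[2], prop
        }' "${3:-/etc/path_to_inst}"
}
```

For example, nxge_conf_entries "pciex108e,abcd" "accept_jumbo = 1" prints one entry per nxge port. Review the output before you add it to the nxge.conf file.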


Tuning for Maximum Performance on an Oracle Solaris Platform

Tuning for maximum performance in an Oracle Solaris platform depends on whether you are using an UltraSPARC CPU based platform or an AMD CPU based platform.


To Improve Performance on an UltraSPARC CPU Based Sun Platform

1. Improve performance by adding the following to the /etc/system file:


set ddi_msix_alloc_limit=4

Increasing the MSI limit improves the Rx performance. The default value is 2, but changing it to 4 improves performance (8 can be used for UltraSPARC T1 based systems).

2. Reboot the system:


# reboot -r

3. Add the following to a startup script, or use ndd before plumbing the interface:


# ndd -set /dev/ip ip_soft_rings_cnt 8

Utilizing more soft-rings provided by the Oracle Solaris TCP/IP stack significantly improves bulk throughput for Rx. The default number of soft-rings is 2, but changing it to 8 improves performance. (You can increase the number to 16 in UltraSparc-T1 based systems).
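Because /etc/system is read at every boot, it is worth adding a setting only once and keeping a backup copy. The following is a minimal sketch of such an edit, assuming a POSIX grep (the -q, -x, and -F options). The function name add_system_setting and the .bak suffix are hypothetical.

```shell
# Append a "set ..." line to an /etc/system-style file unless the exact
# line is already present. A backup copy is written to FILE.bak first.
# Usage: add_system_setting 'set ddi_msix_alloc_limit=4' [file]
add_system_setting() {
    file=${2:-/etc/system}
    if grep -qxF -- "$1" "$file"; then
        echo "already present: $1"
    else
        cp -p "$file" "$file.bak" &&
            printf '%s\n' "$1" >> "$file" &&
            echo "added: $1"
    fi
}
```

The change takes effect only after the system is rebooted.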


To Improve Performance on an AMD CPU Based Sun Platform

1. Enable soft-rings and change to a higher value than the default of 2 by adding the following to the /etc/system file:


set ip:ip_squeue_fanout=1
set ip_squeue_soft_ring=1

Bulk throughput for Rx can be significantly improved by utilizing more soft-rings provided by the Oracle Solaris TCP/IP stack. Soft-rings are disabled by default in previous and current releases of the Oracle Solaris-x86 Operating System.

2. Reboot the system:


# reboot -r

3. Set the MSI to 1 on AMD platforms by adding the following to the /etc/system file:


set ddi_msix_alloc_limit=1

4. Reboot the system:


# reboot -r


To Obtain Higher Throughput Using the Generic Tunables for the Oracle Solaris TCP/IP Stack

To obtain higher throughput, add the following to a startup script:


ndd -set /dev/tcp tcp_conn_req_max_q 8192
ndd -set /dev/tcp tcp_conn_req_max_q0 8192
ndd -set /dev/tcp tcp_max_buf 4194304
ndd -set /dev/tcp tcp_cwnd_max 2097152
ndd -set /dev/tcp tcp_recv_hiwat 400000
ndd -set /dev/tcp tcp_xmit_hiwat 400000
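Before you change these tunables, it can help to record their current values so that they can be restored later. The following is a minimal sketch that prints the ndd commands needed to restore the present settings. The function name tcp_tunables_backup is hypothetical.

```shell
# For each named TCP tunable, print the ndd command that would restore
# its current value.
# Usage: tcp_tunables_backup tcp_max_buf tcp_cwnd_max > restore_tcp.sh
tcp_tunables_backup() {
    for name in "$@"; do
        value=$(ndd /dev/tcp "$name") || return 1
        echo "ndd -set /dev/tcp $name $value"
    done
}
```

Running the saved output as a script returns the tunables to the recorded values without a reboot.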


Setting Parameters on a Linux Platform


To Set Parameters Using the ethtool Utility

1. Determine which parameters are available using the ethtool utility:


# ethtool -help eth4
ethtool version 1.8
Usage:
       ethtool DEVNAME
       ethtool -a DEVNAME
       ethtool -A DEVNAME \
               [ autoneg on|off ] \
               [ rx on|off ] \
               [ tx on|off ]
       ethtool -c DEVNAME
       ethtool -C DEVNAME \
               [adaptive-rx on|off] \
               [adaptive-tx on|off] \
               [rx-usecs N] \
               [rx-frames N] \
               [rx-usecs-irq N] \
               [rx-frames-irq N] \
               [tx-usecs N] \
               [tx-frames N] \
               [tx-usecs-irq N] \
               [tx-frames-irq N] \
               [stats-block-usecs N] \
               [pkt-rate-low N] \
               [rx-usecs-low N] \
               [rx-frames-low N] \
               [tx-usecs-low N] \
               [tx-frames-low N] \
               [pkt-rate-high N] \
               [rx-usecs-high N] \
               [rx-frames-high N] \
               [tx-usecs-high N] \
               [tx-frames-high N] \
               [sample-interval N]
       ethtool -g DEVNAME
       ethtool -G DEVNAME \
               [ rx N ] \
               [ rx-mini N ] \
               [ rx-jumbo N ] \
               [ tx N ]
       ethtool -i DEVNAME
       ethtool -d DEVNAME
       ethtool -e DEVNAME \
               [ raw on|off ] \
               [ offset N ] \
               [ length N ]
       ethtool -E DEVNAME \
               [ magic N ] \
               [ offset N ] \
               [ value N ]
       ethtool -k DEVNAME
       ethtool -K DEVNAME \
               [ rx on|off ] \
               [ tx on|off ] \
               [ sg on|off ] \
               [ tso on|off ]
       ethtool -r DEVNAME
       ethtool -p DEVNAME [ %d ]
       ethtool -t DEVNAME [online|(offline)]
       ethtool -s DEVNAME \
               [ speed 10|100|1000 ] \
               [ duplex half|full ]    \
               [ port tp|aui|bnc|mii|fibre ] \
               [ autoneg on|off ] \
               [ phyad %d ] \
               [ xcvr internal|external ] \
               [ wol p|u|m|b|a|g|s|d... ] \
               [ sopass %x:%x:%x:%x:%x:%x ] \
               [ msglvl %d ]
       ethtool -S DEVNAME

Following are some common parameters that can be changed:


# ethtool -c eth8
Coalesce parameters for eth8:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
 
rx-usecs: 8
rx-frames: 512
rx-usecs-irq: 0
rx-frames-irq: 512
 
tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0
 
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
 
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

rx-usecs and rx-frames control the RX interrupt rate per RX DMA channel. An RX interrupt is generated after rx-frames packets have been received, or after the rx-usecs time interval if fewer than rx-frames packets have been received within the interval. For low-latency applications, set rx-usecs to a smaller value. For bulk traffic, use larger values of rx-usecs and control the rate with rx-frames.

rx-frames-irq controls the maximum number of RX packets processed with a single RX interrupt.

2. To change RX interrupt coalesce parameters, use the ethtool -C command:


# ethtool -C eth4 rx-usecs 20
# ethtool -c eth4
Coalesce parameters for eth4:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
 
rx-usecs: 20
rx-frames: 512
rx-usecs-irq: 0
rx-frames-irq: 512
 
tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0
 
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
 
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

3. To get the status of L4 hardware checksumming, use the ethtool -k command:


# ethtool -k eth4
Offload parameters for eth4:
Cannot get device tcp segmentation offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: off
tcp segmentation offload: off
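When you compare coalesce settings across several interfaces, the full ethtool -c listing is more than you need. The following is a minimal sketch that reduces the listing to the three RX values discussed in this procedure. The function name rx_coalesce_summary is hypothetical, and the parsing assumes the "name: value" layout shown in the listings in this procedure.

```shell
# Read "ethtool -c" output on standard input and print the RX interrupt
# coalescing values on one line.
rx_coalesce_summary() {
    awk -F': *' '
        $1 == "rx-usecs" || $1 == "rx-frames" || $1 == "rx-frames-irq" {
            printf "%s%s=%s", sep, $1, $2
            sep = " "
        }
        END { print "" }'
}
```

A typical invocation would be: ethtool -c eth4 | rx_coalesce_summary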


To Set Parameters Using the Bundled configtool Utility

1. To get a list of tunable parameters, use the nxge_config if_name get command:


# /usr/local/bin/nxge_config eth4 get
The tunable parameters exported by this device are:
 
class_opt_ipv4_tcp                              Read-Write
class_opt_ipv4_udp                              Read-Write
class_opt_ipv4_ah                               Read-Write
class_opt_ipv4_sctp                             Read-Write
class_opt_ipv6_tcp                              Read-Write
class_opt_ipv6_udp                              Read-Write
class_opt_ipv6_ah                               Read-Write
class_opt_ipv6_sctp                             Read-Write

These classification variables define how each IP class is configured. These parameters also control how the flow template is constructed and how packets are distributed within RDC groups.


Configuration bits:
 
       0x0010:         use MAC Port (for flow key)
       0x0020:         use L2DA (for flow key)
       0x0040:         use VLAN (for flow key)
       0x0080:         use proto (for flow key)
       0x0100:         use IP src addr (for flow key)
       0x0200:         use IP dest addr (for flow key)
       0x0400:         use Src Port (for flow key)
       0x0800:         use Dest Port (for flow key)



Note - The classification variables are modified on an adapter basis, that is, if any of these variables is modified for one port, the change carries over to all other ports of that adapter.


2. To get a particular variable, use the nxge_config if_name get param_name command:


# /usr/local/bin/nxge_config eth4 get class_opt_ipv4_udp
  class_opt_ipv4_udp                0xfe3

3. To set a particular variable, use the nxge_config if_name set param_name value command:


# /usr/local/bin/nxge_config eth4 set class_opt_ipv4_tcp 0xfe0
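The classification values used in this procedure are bitwise ORs of the configuration bits listed in step 1. The following is a minimal sketch that converts between field names and values, assuming a POSIX shell. The function names and the short field labels (mac-port, l2da, and so on) are hypothetical stand-ins for the table entries.

```shell
# Configuration bits from the table in step 1, as "value:label" pairs.
CLASS_OPT_BITS="0x0010:mac-port 0x0020:l2da 0x0040:vlan 0x0080:proto
0x0100:ip-src 0x0200:ip-dst 0x0400:src-port 0x0800:dst-port"

# Combine field labels into a class_opt value.
# Usage: class_opt_encode proto ip-src ip-dst src-port dst-port
class_opt_encode() {
    value=0
    for field in "$@"; do
        bit=
        for entry in $CLASS_OPT_BITS; do
            if [ "${entry#*:}" = "$field" ]; then
                bit=${entry%%:*}
            fi
        done
        if [ -z "$bit" ]; then
            echo "unknown field: $field" >&2
            return 1
        fi
        value=$(( value | $bit ))
    done
    printf '0x%x\n' "$value"
}

# List the documented fields that are set in a class_opt value, and
# report any remaining bits that the table does not describe.
# Usage: class_opt_decode 0xfe0
class_opt_decode() {
    value=$(( $1 ))
    known=0
    for entry in $CLASS_OPT_BITS; do
        bit=${entry%%:*}
        known=$(( known | $bit ))
        if [ $(( value & $bit )) -ne 0 ]; then
            echo "${entry#*:}"
        fi
    done
    rest=$(( value & ~known ))
    if [ "$rest" -ne 0 ]; then
        printf 'undocumented bits: 0x%x\n' "$rest"
    fi
}
```

For example, class_opt_encode proto ip-src ip-dst src-port dst-port yields the IP 5-tuple value, which can then be passed to the nxge_config set command. Decoding the 0xfe3 value shown in step 2 also reports two low-order bits that the table does not describe.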


Tuning for Maximum Performance on a Linux Platform

The following tuning improves the performance of the Sun x8 Express Dual 10 Gigabit Ethernet device driver on a system running the Linux operating system.

1. Create the conf file (for example, /etc/sysctl_nxge.conf) that will be read by the sysctl utility.


### IPV4 specific settings
# turns TCP timestamp support off, default 1, reduces CPU use
net.ipv4.tcp_timestamps = 0
# turns SACK support off, default on; on systems with a VERY fast
# bus -> memory interface this is the big gainer
net.ipv4.tcp_sack = 0
# sets min/default/max TCP read buffer, default 4096 87380 174760
net.ipv4.tcp_rmem = 10000000 10000000 10000000
# sets min/default/max TCP write buffer, default 4096 16384 131072
net.ipv4.tcp_wmem = 10000000 10000000 10000000
# sets min/pressure/max TCP buffer space, default 31744 32256 32768
net.ipv4.tcp_mem = 10000000 10000000 10000000
 
### CORE settings (mostly for socket and UDP effect)
# maximum receive socket buffer size, default 131071
net.core.rmem_max = 524287
# maximum send socket buffer size, default 131071
net.core.wmem_max = 524287
# default receive socket buffer size, default 65535
net.core.rmem_default = 524287
# default send socket buffer size, default 65535
net.core.wmem_default = 524287
# maximum amount of option memory buffers, default 10240
net.core.optmem_max = 524287
# number of unprocessed input packets before kernel starts dropping
# them, default 300
net.core.netdev_max_backlog = 300000

2. Load the settings with the sysctl utility.


# sysctl -p /etc/sysctl_nxge.conf
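Before you load the file, it can be useful to review exactly which settings it will apply. The following is a minimal sketch that strips comments and blank lines and prints one key=value pair per line. The function name sysctl_settings is hypothetical.

```shell
# Print the key=value pairs found in a sysctl configuration file,
# skipping comment lines, blank lines, and lines without an equal sign.
# Usage: sysctl_settings /etc/sysctl_nxge.conf
sysctl_settings() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/ { next }
        {
            i = index($0, "=")
            if (i == 0) next
            key = substr($0, 1, i - 1)
            val = substr($0, i + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim surrounding whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            print key "=" val
        }' "$1"
}
```

Each printed key can be compared with the running value reported by sysctl for that key.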

 

 
