C H A P T E R  8

Receive Packet Classification

This chapter describes the basic functions of Receive Packet Classification and the Sun Netra DPS software interface. Topics include:


Receive Packet Classification Introduction

The Sun multithreaded 10GbE with network interface unit (NIU) networking hardware includes a Receive Packet Classifier that performs L2/L3/L4 header parsing, matching, and searching functions. Sun Netra DPS provides the software interface to use this hardware mechanism.

Classification is needed for the following reasons:

This classification spreads traffic flows across multiple CPUs so that each CPU hardware strand shares the load of 10-Gbps processing. By spreading the load across at least eight pipelines, packets are processed at 10 Gbps without overloading any single processing unit.

This classification blocks, re-routes, or applies special processing to certain traffic types in the incoming traffic stream.

This classification sustains forwarding of 10 Gbps of incoming traffic with relatively small packet sizes from the 10-Gbps Ethernet ingress port to the 10-Gbps egress port. Traffic must be spread across multiple DMA channels for processing.


Supported Networking Interfaces

The following network interfaces support classification:


Sun Multithreaded 10GbE and NIU Receive Packet Classifier

Sun multithreaded PCIe 10GbE, PCIe 4GbE, and 10GbE NIU support two ways to spread input packets:

Determines the target DMA channel based on an L2 RDC group and then a hash algorithm applied to the values of defined L2/L3/L4 header fields.

Determines the target DMA channel based on the values of L2/L3/L4 header fields, with the help of hardware lookup tables and a TCAM preprogrammed with matching rules.


Receive DMA Channel Selection

In Sun Multithreaded 10-Gb Ethernet and NIU hardware, there are a total of 16 Receive DMA channels (RDCs). These Receive DMA channels are organized into Receive DMA Channel Groups (RDC groups), and each RDC group can have up to 16 RDC entries. During receive, an RDC group (identified by the RDC group number) is selected for use. For packets that pass classification successfully, with no L2 CRC error or IP checksum error, the Receive DMA group number and the offset from the hardware classifier are used to select the DMA channel. For packets with checksum errors, the offset is changed to zero to select the default RDC within the group. A hardware RDC table holds the content of each RDC group. Each table consists of the following entries:


TABLE 8-1 RDC Table

Table Entry     RDC Number
0               RDCn
1               RDCn
2               RDCn
3               RDCn
4               RDCn
5               RDCn
6               RDCn
7               RDCn
8               RDCn
9               RDCn
10              RDCn
11              RDCn
12              RDCn
13              RDCn
14              RDCn
15              RDCn


Where n is any number between 0 and 15.

In the default configuration, each Ethernet port is associated with a default RDC table, and all classification results are based on the contents of this RDC table. The RDC used for receive is determined by the RDC table entry indexed by the offset value generated by the classifier.
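
This selection can be modeled in C as a simple indexed lookup. The following is a minimal sketch for illustration only; the names rdc_table_t and select_rdc are hypothetical, and the actual selection is performed entirely in hardware:


#include <stdint.h>

/* Minimal model of receive DMA channel selection. */
#define RDC_TABLE_ENTRIES 16

typedef struct rdc_table_s {
        uint8_t rdc[RDC_TABLE_ENTRIES]; /* each entry holds an RDC number, 0-15 */
} rdc_table_t;

static uint8_t
select_rdc(const rdc_table_t *tbl, unsigned int offset, int checksum_error)
{
        /* On an L2 CRC or IP checksum error, the offset is forced to
         * zero, selecting the default RDC within the group. */
        if (checksum_error)
                offset = 0;
        return (tbl->rdc[offset & (RDC_TABLE_ENTRIES - 1)]);
}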

The following tables show the contents of the default RDC table for each reference configuration:


TABLE 8-2 Default RDC Table Content for NIU 1-Port x 10-Gb Configuration

RDC Table #0 at port0

Table Entry     RDC Number
0               0
1               1
2               2
3               3
4               4
5               5
6               6
7               7
8               0
9               1
10              2
11              3
12              4
13              5
14              6
15              7


In this configuration, RDC table 0 is bound to port0 as the default RDC table. All classification results end up in one of the entries of this table, so the target RDC used to carry traffic is in the range RDC#0 to RDC#7.


TABLE 8-3 Default RDC Table Content for NIU 2-Port x 10-Gb Configuration

RDC Table #0 at port0           RDC Table #8 at port1

Table Entry     RDC Number      Table Entry     RDC Number
0               0               0               0
1               1               1               1
2               2               2               2
3               3               3               3
4               4               4               4
5               5               5               5
6               6               6               6
7               7               7               7
8               0               8               0
9               1               9               1
10              2               10              2
11              3               11              3
12              4               12              4
13              5               13              5
14              6               14              6
15              7               15              7


In this configuration, RDC table 0 is bound to port0 as the default RDC table for port0, and RDC table 8 is bound to port1 as the default RDC table for port1. All classification results end up in one of the entries of these two tables, so the target RDC used to carry traffic is in the range RDC#0 to RDC#7.


TABLE 8-4 4-Port x 1-G Default Configuration

                RDC Table#0     RDC Table#1     RDC Table#2     RDC Table#3
                at port0        at port1        at port2        at port3
Table Entry     RDC Number      RDC Number      RDC Number      RDC Number
0               0               1               2               3
1               0               1               2               3
2               0               1               2               3
3               0               1               2               3
4               0               1               2               3
5               0               1               2               3
6               0               1               2               3
7               0               1               2               3
8               0               1               2               3
9               0               1               2               3
10              0               1               2               3
11              0               1               2               3
12              0               1               2               3
13              0               1               2               3
14              0               1               2               3
15              0               1               2               3


In this configuration, RDC table 0 is bound to port0 as the default RDC table for port0, RDC table 1 is bound to port1 as the default RDC table for port1, and so on for all four ports. All classification results end up in one of the entries of these four tables. Only one RDC is used for each port.

The following I/O control functions can be used to override the default RDC configuration:

The following I/O control functions show the current RDC group contents and configuration:


Hashing Based on Layer 2, Layer 3, and Layer 4 Header Classification

Hashing uses a hash lookup table indexed by a hash key. The hash key is created by applying a hash algorithm to a flow key, and the flow key is generated by extracting certain fields from the Layer 2, Layer 3, and Layer 4 (L2/L3/L4) packet headers.

The flow key selections consist of the following individual header fields:

Hash Algorithm

The hashing algorithm is polynomial hashing with CRC-32C. The algorithm produces a 32-bit hash value. The last four bits of the value are used to index into a hardware hash table to look up a DMA channel. In a Sun Netra DPS environment, one RDC table is used, and the DMA channel number corresponds one-to-one to the RDC table entry number; the value of the last four bits therefore equals the DMA channel number. The generator polynomial is:

x^32 + x^28 + x^27 + x^26 + x^25 + x^23 + x^22 + x^20 + x^19 + x^18 + x^14 + x^13 + x^11 + x^10 + x^9 + x^8 + x^6 + 1

Hash Key

The hash key is generated from a seed value. The following driver parameter can be used to modify the hash key:

nxge_fflp_h1

It is set to 0xffffffff by default.
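
The hardware computation can be approximated in software with a bitwise CRC-32C, as in the following sketch. This is an illustration only, assuming an MSB-first bit ordering and a byte-packed flow key; the actual hardware flow-key layout and bit ordering may differ, and crc32c_hash and hash_to_dma_channel are hypothetical names. The seed corresponds to nxge_fflp_h1:


#include <stdint.h>
#include <stddef.h>

#define CRC32C_POLY 0x1EDC6F41u /* x^32 + x^28 + ... + x^6 + 1, MSB-first form */

static uint32_t
crc32c_hash(const uint8_t *flow_key, size_t len, uint32_t seed)
{
        uint32_t crc = seed;    /* nxge_fflp_h1, 0xffffffff by default */
        size_t i;
        int bit;

        for (i = 0; i < len; i++) {
                crc ^= (uint32_t)flow_key[i] << 24;
                for (bit = 0; bit < 8; bit++)
                        crc = (crc & 0x80000000u) ?
                            (crc << 1) ^ CRC32C_POLY : (crc << 1);
        }
        return (crc);
}

/* The last four bits of the hash index the RDC table entry, which in
 * Sun Netra DPS maps one-to-one to the DMA channel number. */
static unsigned int
hash_to_dma_channel(uint32_t hash)
{
        return (hash & 0xF);
}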

Application

Use hashing for general load-spreading and load-balancing applications. The traffic load of each DMA channel depends on the values in the header fields used for the hash. Because the target DMA channel is determined by a polynomial, the correlation between the header values and the target DMA channel cannot be easily determined. How evenly the load is spread across the DMA channels also depends on the values and ranges of the header fields. Hashing is considered a general-purpose load-spreading scheme.

Hash Policy

Hashing is enabled by default. The hash policy is determined by setting the FLOW_POLICY to one of the values shown in TABLE 8-5:


TABLE 8-5 Hash Policy Values

Value           Meaning
HASH_IP_ADDR    Hash on IP destination and source addresses
HASH_IP_DA      Hash on IP destination address
HASH_IP_SA      Hash on IP source address
HASH_VLAN_ID    Hash on VLAN ID
HASH_PORTNUM    Hash on physical MAC port number
HASH_L2DA       Hash on L2 destination address
HASH_PROTO      Hash on protocol number
HASH_SRC_PORT   Hash on L4 source port number
HASH_DST_PORT   Hash on L4 destination port number
HASH_ALL        Hash on all of the above fields
TCAM_CLASSIFY   Perform TCAM lookup


The default FLOW_POLICY is HASH_ALL, meaning that the hardware hash algorithm is applied to all of the above header fields. To disable hashing, set FLOW_POLICY to 0 or to TCAM_CLASSIFY. When set to 0, no traffic spreading is performed and all traffic ends up in a default DMA channel. When set to TCAM_CLASSIFY, traffic spreading is determined by predefined flow specifications.


Flow Match Based on Layer 2, Layer 3, and Layer 4 Header Classification

Layer 2 (L2) Classification

The layer 2 parser (part of the classification hardware) parses the following information from an Ethernet frame:

1. If the frame is a VLAN packet, the VLAN ID.

2. The Ethernet format, including whether there is an LLC/SNAP field.

Upon receiving this information, the classifier selects an RDC table to be used for further classification. L2 classification can be based on the following criteria:

For VLAN frames, the VLAN ID is used to index into a VLAN table to determine the RDC table number used for further classification. The VLAN table consists of 4K entries; each entry specifies a VLAN ID and its corresponding target RDC table number (see the sketch after this list).

The target RDC table can also be determined from MAC address information, which includes the MAC address type (for example, unicast, multicast, self address, address filter, or flow control) and the address itself.
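
For illustration, the VLAN-based selection can be modeled as an indexed lookup, as in the sketch below. The names vlan_entry_t and vlan_to_rdc_table are hypothetical; the real 4K-entry table lives in the classification hardware:


#include <stdint.h>

#define VLAN_TABLE_SIZE 4096

typedef struct vlan_entry_s {
        uint8_t rdc_table;      /* target RDC table number */
        uint8_t priority;       /* arbitrates against the MAC address table */
} vlan_entry_t;

static vlan_entry_t vlan_table[VLAN_TABLE_SIZE];

static uint8_t
vlan_to_rdc_table(uint16_t vlan_id)
{
        return (vlan_table[vlan_id & (VLAN_TABLE_SIZE - 1)].rdc_table);
}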

The following I/O Control functions are used for L2 classification setup:

Because both the VLAN table and the MAC address table can select the RDC table, arbitration between the two is done by setting the priority field in each of these two tables.



Note - In Sun multithreaded 10-Gb Ethernet technology, L2 classification can be seen as a coarse classification mechanism whose output is the RDC table number. Finer classification (such as L3/L4 classification) must then be performed to obtain the number of the target RDC that carries the receive traffic.


Layer 3 and Layer 4 (L3/L4) Classification

L3/L4 header classification relies on the TCAM hardware to determine how traffic flows are distributed. There are multiple TCAM hardware entries (256 in Sun multithreaded 10GbE, 128 in NIU) for specifying flow specifications. CAM lookup key generation uses the concept of packet classes to assemble a key. With the CAM key, a packet goes through a single CAM lookup table for an associative search. L3/L4 header classification starts when the header parser identifies the incoming L2/L3 packet type.

The following packet classes are supported in Sun Netra DPS:

Applications

Use flow tables and the TCAM to direct particular types of traffic flows (with different traffic classes) into particular DMA channels. Flow tables and the TCAM are ideal for use in load balancing applications.

Classification Programming Interface

The interface to the flow matching scheme is the ETH_IOC_SET_CLASSIFY "IO Control" command of the Sun Netra DPS Ethernet interface. The following shows the calling convention of the interface:

eth_ioc(ihdlnet[port], ETH_IOC_SET_CLASSIFY, (void *)&clsfy_ioc);

ihdlnet[] is an array of device driver handles indexed by the Ethernet port number [port]. ETH_IOC_SET_CLASSIFY is the set classifier command.

The clsfy_ioc structure is defined as follows:

typedef struct classify_ioc_s {
        uint_t opcode;
        uint_t action;
        flow_spec_t flow_spec;
} classify_ioc_t;

opcode

opcode specifies what to do about a new traffic flow. TABLE 8-6 shows possible opcode values:


TABLE 8-6 opcode Values

Value                       Meaning
IOC_ADD_CLASSIFY            Add a flow.
IOC_INVALIDATE_CLASSIFY     Invalidate a flow.
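
The example at the end of this chapter shows IOC_ADD_CLASSIFY in full. As a sketch of the reverse operation, invalidating a previously added flow might look as follows; this assumes that only the opcode, flow type, and TCAM index need to be set, which is not spelled out in this chapter:


classify_ioc_t clsfy_ioc;

/* Invalidate the flow previously added at TCAM index 3 (assumed usage). */
clsfy_ioc.opcode = IOC_INVALIDATE_CLASSIFY;
clsfy_ioc.flow_spec.fs_type = FSPEC_UDPIP4;
clsfy_ioc.flow_spec.index = 3;
(void) eth_ioc(ihdlnet[port], ETH_IOC_SET_CLASSIFY, (void *)&clsfy_ioc);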


action

action specifies what action to take when there is a match. TABLE 8-7 shows possible action values:


TABLE 8-7 action Values

Value               Meaning
IOC_FLOW_ACCEPT     Accept the packet on a match.
IOC_FLOW_DISCARD    Discard the packet on a match.


flow_spec

flow_spec is the flow specification that specifies the characteristics of an IPv4 or IPv6 flow. The following shows the flow_spec structures:


typedef struct flow_spec_ipv4_s {
        uint8_t protocol;
        uint8_t tos;
        union {
                port_t tcp;
                port_t udp;
                spi_port_t spi;
        } port;
        uint32_t src;
        uint32_t dst;
} flow_spec_ipv4_t;
 
typedef struct flow_spec_ipv6_s {
        uint8_t protocol;
        uint8_t tos;
        union {
                port_t tcp;
                port_t udp;
                spi_port_t spi;
        } port;
        struct in6_addr src;
        struct in6_addr dst;
} flow_spec_ipv6_t;

fs_type

TABLE 8-8 shows the possible values of the traffic flow spec types (fs_type):


TABLE 8-8 fs_type Possible Values

Value           Meaning
FSPEC_TCPIP4    TCP over IPv4
FSPEC_UDPIP4    UDP over IPv4
FSPEC_AHIP4     IPSEC/AH over IPv4
FSPEC_ESPIP4    IPSEC/ESP over IPv4
FSPEC_SCTPIP4   SCTP over IPv4
FSPEC_TCPIP6    TCP over IPv6
FSPEC_UDPIP6    UDP over IPv6
FSPEC_AHIP6     IPSEC/AH over IPv6
FSPEC_ESPIP6    IPSEC/ESP over IPv6
FSPEC_SCTPIP6   SCTP over IPv6


index

This is the index into the TCAM entries (for L3/L4 TCAM classification) or into the MAC or VLAN table (for L2 MAC/VLAN classification).



Note - The software application must keep track of the index number.


channel

This is the target DMA channel, in the range 0 to 15.

ue or um

ue is the 5-tuple (IPv4) or 4-tuple (IPv6) structure for L3/L4 TCAM classification. For L2 classification, it is the L2 header structure. um is the bit-mask corresponding to ue. Set a mask bit to 1 for don't care (do not compare); set a mask bit to 0 to compare.
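
For example, assuming the field layout of flow_spec_ipv4_t shown earlier and reusing the clsfy_ioc variable from the calling convention above, the following fragment compares the protocol field exactly while ignoring the source port:


/* Match UDP exactly; accept any source port. A 1 bit in the mask
 * means don't care, a 0 bit means compare. */
clsfy_ioc.flow_spec.ue.ip4.protocol = IPPROTO_UDP;
clsfy_ioc.flow_spec.um.ip4.protocol = 0x00;        /* compare all 8 bits */
clsfy_ioc.flow_spec.ue.ip4.port.udp.src = 0;
clsfy_ioc.flow_spec.um.ip4.port.udp.src = 0xFFFF;  /* ignore source port */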

hd

This is the entire 64-bit header.

flow_spec_ip4_tab_t

The following is the IPv4 flow specification structure:


typedef struct flow_spec_ip4_tab_s {
        int             index;
        uint8_t          protocol;
        uint8_t          tos;
        uint8_t          tos_mask;
        uint16_t         src_port;
        uint16_t         src_port_mask;
        uint16_t         dst_port;
        uint16_t         dst_port_mask;
        char            *src_addr;
        char            *src_addr_mask;
        char            *dst_addr;
        char            *dst_addr_mask;
        int             action;
        uint8_t          dma_chan;
} flow_spec_ip4_tab_t;

flow_spec_ipv6_t

The following is the IPv6 flow specification structure:


typedef struct flow_spec_ipv6_s {
        uint8_t protocol;
        union {
                    port_t tcp;
                    port_t udp;
                    spi_port_t spi;
        } port;
        uint8_t src[16];
        uint8_t dst[16];
} flow_spec_ipv6_t;

flow_spec_l2_t

This is the L2 header structure as shown below:


typedef struct flow_spec_l2_s {
        uint8_t dst[6];         /* MAC address */
        uint8_t src[6];         /* MAC address */
        uint16_t type;          /* Ether type */
        uint16_t vlantag;       /* VLANID|CFI|PRI */
} flow_spec_l2_t;
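
As an illustration, a flow_spec_l2_t match entry for a single unicast destination MAC address might be initialized as follows; the address and type values are hypothetical, and the corresponding um mask would select which bits are actually compared:


/* Hypothetical L2 flow entry: one unicast destination MAC, IPv4. */
flow_spec_l2_t l2_ue = {
        { 0x00, 0x14, 0x4F, 0x01, 0x02, 0x03 }, /* dst MAC to match */
        { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, /* src MAC: any (masked) */
        0x0800,                                 /* Ether type: IPv4 */
        0x0000                                  /* VLAN tag: untagged */
};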


Examples


To Use Hash Flow

Set FLOW_POLICY to a desired policy. For example:


% gmake .... FLOW_POLICY=HASH_ALL

This command tells the Sun multithreaded 10GbE with NIU hardware to hash on all L2/L3/L4 header fields.


To Use TCAM Classification

This example shows how a flow table can be established in the application.

1. Set up an array of flow table entries.

For example, use entries with the following structure:


typedef struct flow_spec_ip4_tab_s {
        int             index;
        uint8_t          protocol;
        uint8_t          tos;
        uint8_t          tos_mask;
        uint16_t         src_port;
        uint16_t         src_port_mask;
        uint16_t         dst_port;
        uint16_t         dst_port_mask;
        char            *src_addr;
        char            *src_addr_mask;
        char            *dst_addr;
        char            *dst_addr_mask;
        int             action;
        uint8_t          dma_chan;
} flow_spec_ip4_tab_t;

2. Populate the flow table, as shown in the example below.


flow_spec_ip4_tab_t ip4_flow_tab[] = {
        {0, IPPROTO_UDP, 0, 0xFF, 0, 0xFFFF, 0, 0xFFFF,
                                "192.30.50.0", "255.255.255.255",
                                "192.31.50.1", "255.255.255.0",
                                FLOW_ACCEPT, 0},
        {1, IPPROTO_UDP, 0, 0xFF, 0, 0xFFFF, 0, 0xFFFF,
                                "192.30.50.0", "255.255.255.255",
                                "192.31.50.2", "255.255.255.0",
                                FLOW_ACCEPT, 1},
        {2, IPPROTO_UDP, 0, 0xFF, 0, 0xFFFF, 0, 0xFFFF,
                                "192.30.50.0", "255.255.255.255",
                                "192.31.50.3", "255.255.255.0",
                                FLOW_ACCEPT, 2},
        {3, IPPROTO_UDP, 0, 0xFF, 0, 0xFFFF, 0, 0xFFFF,
                                "192.30.50.0", "255.255.255.255",
                                "192.31.50.4", "255.255.255.0",
                                FLOW_ACCEPT, 3},
        {4, IPPROTO_UDP, 0, 0xFF, 0, 0xFFFF, 0, 0xFFFF,
                                "192.30.50.0", "255.255.255.255",
                                "192.31.50.5", "255.255.255.0",
                                FLOW_ACCEPT, 4},
        {5, IPPROTO_UDP, 0, 0xFF, 0, 0xFFFF, 0, 0xFFFF,
                                "192.30.50.0", "255.255.255.255",
                                "192.31.50.6", "255.255.255.0",
                                FLOW_ACCEPT, 5},
        {6, IPPROTO_UDP, 0, 0xFF, 0, 0xFFFF, 0, 0xFFFF,
                                "192.30.50.0", "255.255.255.255",
                                "192.31.50.7", "255.255.255.0",
                                FLOW_ACCEPT, 6},
        {7, IPPROTO_UDP, 0, 0xFF, 0, 0xFFFF, 0, 0xFFFF,
                                "192.30.50.0", "255.255.255.255",
                                "192.31.50.8", "255.255.255.0",
                                FLOW_ACCEPT, 7},
        {-1, 0, 0, 0, 0, 0, 0, 0, "", "", "", "", 0, 0}
};

3. Write a parsing function to parse the entries in the table, as shown in the example below.


void
classify_parse_entries(uint_t flow_cfg, uint8_t port,
                        uint8_t chan, flow_spec_ip4_tab_t *fe)
{
        classify_ioc_t clsfy_ioc;
        int status;
        int i;
 
        for (i = 0; fe[i].index != -1; i++) {
                if (fe[i].dma_chan != chan)
                        continue;
                clsfy_ioc.opcode = IOC_ADD_CLASSIFY;
                clsfy_ioc.flow_spec.fs_type = FSPEC_UDPIP4;
                clsfy_ioc.flow_spec.index = fe[i].index;
                clsfy_ioc.flow_spec.channel = fe[i].dma_chan;
                clsfy_ioc.flow_spec.ue.ip4.protocol = fe[i].protocol;
                clsfy_ioc.flow_spec.ue.ip4.tos = fe[i].tos;
                clsfy_ioc.flow_spec.ue.ip4.port.udp.src = fe[i].src_port;
                clsfy_ioc.flow_spec.ue.ip4.port.udp.dst = fe[i].dst_port;
                status = inet_pton(AF_INET, (char *)fe[i].src_addr,
                                (char *)&clsfy_ioc.flow_spec.ue.ip4.src);
                if (status != 1)
                        goto fail;
                status = inet_pton(AF_INET, (char *)fe[i].dst_addr,
                                (char *)&clsfy_ioc.flow_spec.ue.ip4.dst);
                if (status != 1)
                        goto fail;
                clsfy_ioc.flow_spec.um.ip4.tos = ~fe[i].tos_mask;
                clsfy_ioc.flow_spec.um.ip4.port.udp.src = ~fe[i].src_port_mask;
                clsfy_ioc.flow_spec.um.ip4.port.udp.dst = ~fe[i].dst_port_mask;
                status = inet_pton(AF_INET, (char *)fe[i].src_addr_mask,
                                (char *)&clsfy_ioc.flow_spec.um.ip4.src);
                if (status != 1)
                        goto fail;
                clsfy_ioc.flow_spec.um.ip4.src =
                                        ~clsfy_ioc.flow_spec.um.ip4.src;
                status = inet_pton(AF_INET, (char *)fe[i].dst_addr_mask,
                                (char *)&clsfy_ioc.flow_spec.um.ip4.dst);
                if (status != 1)
                        goto fail;
                clsfy_ioc.flow_spec.um.ip4.dst =
                                        ~clsfy_ioc.flow_spec.um.ip4.dst;
                if (fe[i].action == FLOW_ACCEPT)
                        clsfy_ioc.action = IOC_FLOW_ACCEPT;
                else
                        clsfy_ioc.action = IOC_FLOW_DISCARD;
 
                /* Program the TCAM HW */
                (void) eth_ioc(ihdlnet[port], ETH_IOC_SET_CLASSIFY,
                                                        (void *)&clsfy_ioc);
        }
        return;

fail:
        /* inet_pton() could not parse an address string. */
        return;
}
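
The parsing function can then be invoked once per port and DMA channel. For example (the argument values are placeholders):


/* Program the rules destined for DMA channel chan on port port. */
classify_parse_entries(0, port, chan, ip4_flow_tab);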

4. During the build, enable TCAM classification and disable hashing. To do this, type:


gmake .... FLOW_POLICY=TCAM_CLASSIFY

This command configures the Sun multithreaded 10-Gb Ethernet with NIU hardware to perform TCAM classification with the matching rules set up in Step 1 through Step 3.