C H A P T E R  8

Receive Packet Classification

This chapter describes the basic functions of Receive Packet Classification and the Sun Netra DPS software interface. Topics include:


Receive Packet Classification Introduction

The Sun multithreaded 10GbE with network interface unit (NIU) networking hardware includes a Receive Packet Classifier that performs L2/L3/L4 header parsing, matching, and searching functions. Sun Netra DPS provides the software interface to use this hardware mechanism.

Classification is needed for the following reasons:

This classification spreads traffic flows across multiple CPUs so that each CPU hardware strand shares the load of 10-Gbps processing. By spreading the load across at least eight pipelines, packets are processed at 10 Gbps without overloading any single processing unit.

This classification blocks, re-routes, or applies special processing to certain traffic types in the incoming traffic stream.

This classification sustains forwarding of 10 Gbps of incoming traffic with relatively small packet sizes from the 10-Gbps Ethernet ingress port to the 10-Gbps egress port. Traffic must be spread across multiple DMA channels for processing.


Supported Networking Interfaces

The following network interfaces support classification:


Sun Multithreaded 10GbE and NIU Receive Packet Classifier

Sun multithreaded PCIe 10GbE, PCIe 4GbE, and 10GbE NIU support two ways to spread input packets:

Determines the target DMA channel based on an L2 RDC group and then a hash algorithm applied to the values of defined L2/L3/L4 header fields.

Determines the target DMA channel based on the values of L2/L3/L4 header fields, with the help of hardware lookup tables and a TCAM preprogrammed with matching rules.


Receive DMA Channel Selection

In Sun Multithreaded 10-Gb Ethernet and NIU hardware, there are a total of 16 Receive DMA channels (RDCs). These Receive DMA channels are organized into Receive DMA Channel Groups (RDC groups), and each RDC group can have up to 16 RDC entries. During receive, an RDC group (identified by the RDC group number) is selected for use. For packets that pass classification successfully, with no L2 CRC error or IP checksum error, the Receive DMA group number and the offset from the hardware classifier are used to select the DMA channel. For packets with checksum errors, the offset is changed to zero to select the default RDC within the group. A hardware RDC table holds the content of each RDC group. Each table consists of the following entries:


TABLE 8-1 RDC Table

Table Entry     RDC Number
0               RDCn
1               RDCn
2               RDCn
3               RDCn
4               RDCn
5               RDCn
6               RDCn
7               RDCn
8               RDCn
9               RDCn
10              RDCn
11              RDCn
12              RDCn
13              RDCn
14              RDCn
15              RDCn


Where n is any number between 0 and 15.

In the default configuration, each Ethernet port is associated with a default RDC table, and all classification results are based on the contents of this RDC table. The RDC used for receive is determined by the RDC table entry indexed by the offset value generated by the classifier.
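
This selection can be modeled in C as a simple indexed lookup. The following is a minimal sketch for illustration only; the names rdc_table_t and select_rdc are hypothetical, and the actual selection is performed entirely in hardware:


#include <stdint.h>

/* Minimal model of receive DMA channel selection. */
#define RDC_TABLE_ENTRIES 16

typedef struct rdc_table_s {
        uint8_t rdc[RDC_TABLE_ENTRIES]; /* each entry holds an RDC number, 0-15 */
} rdc_table_t;

static uint8_t
select_rdc(const rdc_table_t *tbl, unsigned int offset, int checksum_error)
{
        /* On an L2 CRC or IP checksum error, the offset is forced to
         * zero, selecting the default RDC within the group. */
        if (checksum_error)
                offset = 0;
        return (tbl->rdc[offset & (RDC_TABLE_ENTRIES - 1)]);
}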

The following tables show the contents of the default RDC table for each reference configuration:


TABLE 8-2 Default RDC Table Content for NIU 1-Port x 10-Gb Configuration

RDC Table #0 at port0

Table Entry     RDC Number
0               0
1               1
2               2
3               3
4               4
5               5
6               6
7               7
8               0
9               1
10              2
11              3
12              4
13              5
14              6
15              7


In this configuration, RDC table 0 is bound to port0 as the default RDC table. All classification results end up in one of the entries of this table, so the target RDC used to carry traffic is in the range RDC#0 to RDC#7.


TABLE 8-3 Default RDC Table Content for NIU 2-Port x 10-Gb Configuration

RDC Table #0 at port0           RDC Table #8 at port1

Table Entry     RDC Number      Table Entry     RDC Number
0               0               0               0
1               1               1               1
2               2               2               2
3               3               3               3
4               4               4               4
5               5               5               5
6               6               6               6
7               7               7               7
8               0               8               0
9               1               9               1
10              2               10              2
11              3               11              3
12              4               12              4
13              5               13              5
14              6               14              6
15              7               15              7


In this configuration, RDC table 0 is bound to port0 as the default RDC table for port0, and RDC table 8 is bound to port1 as the default RDC table for port1. All classification results end up in one of the entries of these two tables, so the target RDC used to carry traffic is in the range RDC#0 to RDC#7.


TABLE 8-4 4-Port x 1-G Default Configuration

                RDC Table#0     RDC Table#1     RDC Table#2     RDC Table#3
                at port0        at port1        at port2        at port3
Table Entry     RDC Number      RDC Number      RDC Number      RDC Number
0               0               1               2               3
1               0               1               2               3
2               0               1               2               3
3               0               1               2               3
4               0               1               2               3
5               0               1               2               3
6               0               1               2               3
7               0               1               2               3
8               0               1               2               3
9               0               1               2               3
10              0               1               2               3
11              0               1               2               3
12              0               1               2               3
13              0               1               2               3
14              0               1               2               3
15              0               1               2               3


In this configuration, RDC table 0 is bound to port0 as the default RDC table for port0, RDC table 1 is bound to port1 as the default RDC table for port1, and so on for all four ports. All classification results end up in one of the entries of these four tables. Only one RDC is used for each port.

The following I/O control functions can be used to override the default RDC configuration:

The following I/O control functions show the current RDC group contents and configuration:


Hashing Based on Layer 2, Layer 3, and Layer 4 Header Classification

Hashing uses a hash lookup table indexed by a hash key. The hash key is created by applying a hash algorithm to a flow key, and the flow key is generated by extracting certain fields from the Layer 2, Layer 3, and Layer 4 (L2/L3/L4) packet headers.

The flow key selections consist of the following individual header fields:

Hash Algorithm

The hashing algorithm is polynomial hashing with CRC-32C. The algorithm produces a 32-bit hash value. The last four bits of the value are used to index into a hardware hash table to look up a DMA channel. In a Sun Netra DPS environment, one RDC table is used, and the DMA channel number corresponds one-to-one to the RDC table entry number; the value of the last four bits therefore equals the DMA channel number. The generator polynomial is:

x^32 + x^28 + x^27 + x^26 + x^25 + x^23 + x^22 + x^20 + x^19 + x^18 + x^14 + x^13 + x^11 + x^10 + x^9 + x^8 + x^6 + 1

Hash Key

The hash key is generated from a seed value. The following driver parameter can be used to modify the hash key:

nxge_fflp_h1

It is set to 0xffffffff by default.
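
The hardware computation can be approximated in software with a bitwise CRC-32C, as in the following sketch. This is an illustration only, assuming an MSB-first bit ordering and a byte-packed flow key; the actual hardware flow-key layout and bit ordering may differ, and crc32c_hash and hash_to_dma_channel are hypothetical names. The seed corresponds to nxge_fflp_h1:


#include <stdint.h>
#include <stddef.h>

#define CRC32C_POLY 0x1EDC6F41u /* x^32 + x^28 + ... + x^6 + 1, MSB-first form */

static uint32_t
crc32c_hash(const uint8_t *flow_key, size_t len, uint32_t seed)
{
        uint32_t crc = seed;    /* nxge_fflp_h1, 0xffffffff by default */
        size_t i;
        int bit;

        for (i = 0; i < len; i++) {
                crc ^= (uint32_t)flow_key[i] << 24;
                for (bit = 0; bit < 8; bit++)
                        crc = (crc & 0x80000000u) ?
                            (crc << 1) ^ CRC32C_POLY : (crc << 1);
        }
        return (crc);
}

/* The last four bits of the hash index the RDC table entry, which in
 * Sun Netra DPS maps one-to-one to the DMA channel number. */
static unsigned int
hash_to_dma_channel(uint32_t hash)
{
        return (hash & 0xF);
}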

Application

Use hashing for general load-spreading and load-balancing applications. The traffic load of each DMA channel depends on the values in the header fields used for the hash. Because the target DMA channel is determined by a polynomial, the correlation between the header values and the target DMA channel cannot be easily determined. How evenly the load is spread across the DMA channels also depends on the values and ranges of the header fields. Hashing is considered a general-purpose load-spreading scheme.

Hash Policy

Hashing is enabled by default. The hash policy is determined by setting the FLOW_POLICY to one of the values shown in TABLE 8-5:


TABLE 8-5 Hash Policy Values

Value           Meaning
HASH_IP_ADDR    Hash on IP destination and source addresses
HASH_IP_DA      Hash on IP destination address
HASH_IP_SA      Hash on IP source address
HASH_VLAN_ID    Hash on VLAN ID
HASH_PORTNUM    Hash on physical MAC port number
HASH_L2DA       Hash on L2 destination address
HASH_PROTO      Hash on protocol number
HASH_SRC_PORT   Hash on L4 source port number
HASH_DST_PORT   Hash on L4 destination port number
HASH_ALL        Hash on all of the above fields
TCAM_CLASSIFY   Perform TCAM lookup


The default FLOW_POLICY is HASH_ALL, meaning that the hardware hash algorithm is applied to all of the above header fields. To disable hashing, set FLOW_POLICY to 0 or to TCAM_CLASSIFY. When set to 0, no traffic spreading is performed and all traffic ends up in a default DMA channel. When set to TCAM_CLASSIFY, traffic spreading is determined by predefined flow specifications.


Flow Match Based on Layer 2, Layer 3, and Layer 4 Header Classification

Layer 2 (L2) Classification

The layer 2 parser (part of the classification hardware) parses the following information from an Ethernet frame:

1. If the frame is a VLAN packet, the VLAN ID.

2. The Ethernet format, including whether there is an LLC/SNAP field.

Upon receiving this information, the classifier selects an RDC table to be used for further classification. L2 classification can be based on the following criteria:

For VLAN frames, the VLAN ID is used to index into a VLAN table to determine the RDC table number used for further classification. The VLAN table consists of 4K entries; each entry specifies a VLAN ID and its corresponding target RDC table number (see the sketch after this list).

The target RDC table can also be determined from MAC address information, which includes the MAC address type (for example, unicast, multicast, self address, address filter, or flow control) and the address itself.
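
For illustration, the VLAN-based selection can be modeled as an indexed lookup, as in the sketch below. The names vlan_entry_t and vlan_to_rdc_table are hypothetical; the real 4K-entry table lives in the classification hardware:


#include <stdint.h>

#define VLAN_TABLE_SIZE 4096

typedef struct vlan_entry_s {
        uint8_t rdc_table;      /* target RDC table number */
        uint8_t priority;       /* arbitrates against the MAC address table */
} vlan_entry_t;

static vlan_entry_t vlan_table[VLAN_TABLE_SIZE];

static uint8_t
vlan_to_rdc_table(uint16_t vlan_id)
{
        return (vlan_table[vlan_id & (VLAN_TABLE_SIZE - 1)].rdc_table);
}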

The following I/O Control functions are used for L2 classification setup:

Because both the VLAN table and the MAC address table can select the RDC table, arbitration between the two is done by setting the priority field in each of these two tables.



Note - In Sun multithreaded 10-Gb Ethernet technology, L2 classification can be seen as a coarse classification mechanism whose output is the RDC table number. Finer classification (such as L3/L4 classification) must then be performed to obtain the number of the target RDC that carries the receive traffic.


Layer 3 and Layer 4 (L3/L4) Classification

L3/L4 header classification relies on the TCAM hardware to determine how traffic flows are distributed. There are multiple TCAM hardware entries (256 in Sun multithreaded 10GbE, 128 in NIU) for specifying flow specifications. CAM lookup key generation uses the concept of packet classes to assemble a key. With the CAM key, a packet goes through a single CAM lookup table for an associative search. L3/L4 header classification starts when the header parser identifies the incoming L2/L3 packet type.

The following packet classes are supported in Sun Netra DPS:

Applications

Use flow tables and the TCAM to direct particular types of traffic flows (with different traffic classes) into particular DMA channels. Flow tables and the TCAM are ideal for use in load balancing applications.

Classification Programming Interface

The interface to the flow matching scheme is the ETH_IOC_SET_CLASSIFY "IO Control" command of the Sun Netra DPS Ethernet interface. The following shows the calling convention of the interface:

eth_ioc(ihdlnet[port], ETH_IOC_SET_CLASSIFY, (void *)&clsfy_ioc);

ihdlnet[] is an array of device driver handles indexed by the Ethernet port number [port]. ETH_IOC_SET_CLASSIFY is the set classifier command.

The clsfy_ioc structure is defined as follows:

typedef struct classify_ioc_s {
        uint_t opcode;
        uint_t action;
        flow_spec_t flow_spec;
} classify_ioc_t;

opcode

opcode specifies what to do about a new traffic flow. TABLE 8-6 shows possible opcode values:


TABLE 8-6 opcode Values

Value                       Meaning
IOC_ADD_CLASSIFY            Add a flow.
IOC_INVALIDATE_CLASSIFY     Invalidate a flow.
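
The example at the end of this chapter shows IOC_ADD_CLASSIFY in full. As a sketch of the reverse operation, invalidating a previously added flow might look as follows; this assumes that only the opcode, flow type, and TCAM index need to be set, which is not spelled out in this chapter:


classify_ioc_t clsfy_ioc;

/* Invalidate the flow previously added at TCAM index 3 (assumed usage). */
clsfy_ioc.opcode = IOC_INVALIDATE_CLASSIFY;
clsfy_ioc.flow_spec.fs_type = FSPEC_UDPIP4;
clsfy_ioc.flow_spec.index = 3;
(void) eth_ioc(ihdlnet[port], ETH_IOC_SET_CLASSIFY, (void *)&clsfy_ioc);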


action

action specifies what action to take when there is a match. TABLE 8-7 shows possible action values:


TABLE 8-7 action Values

Value               Meaning
IOC_FLOW_ACCEPT     Accept the packet on a match.
IOC_FLOW_DISCARD    Discard the packet on a match.


flow_spec

flow_spec is the flow specification that specifies the characteristics of an IPv4 or IPv6 flow. The following shows the flow_spec structures:


typedef struct flow_spec_ipv4_s {
        uint8_t protocol;
        uint8_t tos;
        union {
                port_t tcp;
                port_t udp;
                spi_port_t spi;
        } port;
        uint32_t src;
        uint32_t dst;
} flow_spec_ipv4_t;
 
typedef struct flow_spec_ipv6_s {
        uint8_t protocol;
        uint8_t tos;
        union {
                port_t tcp;
                port_t udp;
                spi_port_t spi;
        } port;
        struct in6_addr src;
        struct in6_addr dst;
} flow_spec_ipv6_t;

fs_type

TABLE 8-8 shows the possible values of the traffic flow spec types (fs_type):


TABLE 8-8 fs_type Possible Values

Value           Meaning
FSPEC_TCPIP4    TCP over IPv4
FSPEC_UDPIP4    UDP over IPv4
FSPEC_AHIP4     IPSEC/AH over IPv4
FSPEC_ESPIP4    IPSEC/ESP over IPv4
FSPEC_SCTPIP4   SCTP over IPv4
FSPEC_TCPIP6    TCP over IPv6
FSPEC_UDPIP6    UDP over IPv6
FSPEC_AHIP6     IPSEC/AH over IPv6
FSPEC_ESPIP6    IPSEC/ESP over IPv6
FSPEC_SCTPIP6   SCTP over IPv6


index

This is the index into the TCAM entries (for L3/L4 TCAM classification) or into the MAC or VLAN table (for L2 MAC/VLAN classification).



Note - The software application must keep track of the index number.


channel

This is the target DMA channel, in the range 0 to 15.

ue or um

ue is the 5-tuple (IPv4) or 4-tuple (IPv6) structure for L3/L4 TCAM classification. For L2 classification, it is the L2 header structure. um is the bit-mask corresponding to ue. Set a mask bit to 1 for don't care (do not compare); set a mask bit to 0 to compare.
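
For example, assuming the field layout of flow_spec_ipv4_t shown earlier and reusing the clsfy_ioc variable from the calling convention above, the following fragment compares the protocol field exactly while ignoring the source port:


/* Match UDP exactly; accept any source port. A 1 bit in the mask
 * means don't care, a 0 bit means compare. */
clsfy_ioc.flow_spec.ue.ip4.protocol = IPPROTO_UDP;
clsfy_ioc.flow_spec.um.ip4.protocol = 0x00;        /* compare all 8 bits */
clsfy_ioc.flow_spec.ue.ip4.port.udp.src = 0;
clsfy_ioc.flow_spec.um.ip4.port.udp.src = 0xFFFF;  /* ignore source port */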

hd

This is the entire 64-bit header.

flow_spec_ip4_tab_t

The following is the IPv4 flow specification structure:


typedef struct flow_spec_ip4_tab_s {
        int             index;
        uint8_t          protocol;
        uint8_t          tos;
        uint8_t          tos_mask;
        uint16_t         src_port;
        uint16_t         src_port_mask;
        uint16_t         dst_port;
        uint16_t         dst_port_mask;
        char            *src_addr;
        char            *src_addr_mask;
        char            *dst_addr;
        char            *dst_addr_mask;
        int             action;
        uint8_t          dma_chan;
} flow_spec_ip4_tab_t;

flow_spec_ipv6_t

The following is the IPv6 flow specification structure:


typedef struct flow_spec_ipv6_s {
        uint8_t protocol;
        union {
                    port_t tcp;
                    port_t udp;
                    spi_port_t spi;
        } port;
        uint8_t src[16];
        uint8_t dst[16];
} flow_spec_ipv6_t;

flow_spec_l2_t

This is the L2 header structure as shown below:


typedef struct flow_spec_l2_s {
        uint8_t dst[6];         /* MAC address */
        uint8_t src[6];         /* MAC address */
        uint16_t type;          /* Ether type */
        uint16_t vlantag;       /* VLANID|CFI|PRI */
} flow_spec_l2_t;
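
As an illustration, a flow_spec_l2_t match entry for a single unicast destination MAC address might be initialized as follows; the address and type values are hypothetical, and the corresponding um mask would select which bits are actually compared:


/* Hypothetical L2 flow entry: one unicast destination MAC, IPv4. */
flow_spec_l2_t l2_ue = {
        { 0x00, 0x14, 0x4F, 0x01, 0x02, 0x03 }, /* dst MAC to match */
        { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, /* src MAC: any (masked) */
        0x0800,                                 /* Ether type: IPv4 */
        0x0000                                  /* VLAN tag: untagged */
};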


Examples


To Use Hash Flow

Set FLOW_POLICY to a desired policy. For example:


% gmake .... FLOW_POLICY=HASH_ALL

This command tells the Sun multithreaded 10GbE with NIU hardware to hash on all L2/L3/L4 header fields.


To Use TCAM Classification

This example shows how a flow table can be established in the application.

1. Set up an array of flow table entries.

For example, use entries with the following structure:


typedef struct flow_spec_ip4_tab_s {
        int             index;
        uint8_t          protocol;
        uint8_t          tos;
        uint8_t          tos_mask;
        uint16_t         src_port;
        uint16_t         src_port_mask;
        uint16_t         dst_port;
        uint16_t         dst_port_mask;
        char            *src_addr;
        char            *src_addr_mask;
        char            *dst_addr;
        char            *dst_addr_mask;
        int             action;
        uint8_t          dma_chan;
} flow_spec_ip4_tab_t;

2. Populate the flow table, as shown in the example below.


flow_spec_ip4_tab_t ip4_flow_tab[] = {
        {0, IPPROTO_UDP, 0, 0xFF, 0, 0xFFFF, 0, 0xFFFF,
                                "192.30.50.0", "255.255.255.255",
                                "192.31.50.1", "255.255.255.0",
                                FLOW_ACCEPT, 0},
        {1, IPPROTO_UDP, 0, 0xFF, 0, 0xFFFF, 0, 0xFFFF,
                                "192.30.50.0", "255.255.255.255",
                                "192.31.50.2", "255.255.255.0",
                                FLOW_ACCEPT, 1},
        {2, IPPROTO_UDP, 0, 0xFF, 0, 0xFFFF, 0, 0xFFFF,
                                "192.30.50.0", "255.255.255.255",
                                "192.31.50.3", "255.255.255.0",
                                FLOW_ACCEPT, 2},
        {3, IPPROTO_UDP, 0, 0xFF, 0, 0xFFFF, 0, 0xFFFF,
                                "192.30.50.0", "255.255.255.255",
                                "192.31.50.4", "255.255.255.0",
                                FLOW_ACCEPT, 3},
        {4, IPPROTO_UDP, 0, 0xFF, 0, 0xFFFF, 0, 0xFFFF,
                                "192.30.50.0", "255.255.255.255",
                                "192.31.50.5", "255.255.255.0",
                                FLOW_ACCEPT, 4},
        {5, IPPROTO_UDP, 0, 0xFF, 0, 0xFFFF, 0, 0xFFFF,
                                "192.30.50.0", "255.255.255.255",
                                "192.31.50.6", "255.255.255.0",
                                FLOW_ACCEPT, 5},
        {6, IPPROTO_UDP, 0, 0xFF, 0, 0xFFFF, 0, 0xFFFF,
                                "192.30.50.0", "255.255.255.255",
                                "192.31.50.7", "255.255.255.0",
                                FLOW_ACCEPT, 6},
        {7, IPPROTO_UDP, 0, 0xFF, 0, 0xFFFF, 0, 0xFFFF,
                                "192.30.50.0", "255.255.255.255",
                                "192.31.50.8", "255.255.255.0",
                                FLOW_ACCEPT, 7},
        {-1, 0, 0, 0, 0, 0, 0, 0, "", "", "", "", 0, 0}
};

3. Write a parsing function to parse the entries in the table, as shown in the example below.


void
classify_parse_entries(uint_t flow_cfg, uint8_t port,
                        uint8_t chan, flow_spec_ip4_tab_t *fe)
{
        classify_ioc_t clsfy_ioc;
        int status;
        int i;
 
        for (i = 0; fe[i].index != -1; i++) {
                if (fe[i].dma_chan != chan)
                        continue;
                clsfy_ioc.opcode = IOC_ADD_CLASSIFY;
                clsfy_ioc.flow_spec.fs_type = FSPEC_UDPIP4;
                clsfy_ioc.flow_spec.index = fe[i].index;
                clsfy_ioc.flow_spec.channel = fe[i].dma_chan;
                clsfy_ioc.flow_spec.ue.ip4.protocol = fe[i].protocol;
                clsfy_ioc.flow_spec.ue.ip4.tos = fe[i].tos;
                clsfy_ioc.flow_spec.ue.ip4.port.udp.src = fe[i].src_port;
                clsfy_ioc.flow_spec.ue.ip4.port.udp.dst = fe[i].dst_port;
                status = inet_pton(AF_INET, (char *)fe[i].src_addr,
                                (char *)&clsfy_ioc.flow_spec.ue.ip4.src);
                if (status != 1)
                        goto fail;
                status = inet_pton(AF_INET, (char *)fe[i].dst_addr,
                                (char *)&clsfy_ioc.flow_spec.ue.ip4.dst);
                if (status != 1)
                        goto fail;
                clsfy_ioc.flow_spec.um.ip4.tos = ~fe[i].tos_mask;
                clsfy_ioc.flow_spec.um.ip4.port.udp.src = ~fe[i].src_port_mask;
                clsfy_ioc.flow_spec.um.ip4.port.udp.dst = ~fe[i].dst_port_mask;
                status = inet_pton(AF_INET, (char *)fe[i].src_addr_mask,
                                (char *)&clsfy_ioc.flow_spec.um.ip4.src);
                if (status != 1)
                        goto fail;
                clsfy_ioc.flow_spec.um.ip4.src =
                                        ~clsfy_ioc.flow_spec.um.ip4.src;
                status = inet_pton(AF_INET, (char *)fe[i].dst_addr_mask,
                                (char *)&clsfy_ioc.flow_spec.um.ip4.dst);
                if (status != 1)
                        goto fail;
                clsfy_ioc.flow_spec.um.ip4.dst =
                                        ~clsfy_ioc.flow_spec.um.ip4.dst;
                if (fe[i].action == FLOW_ACCEPT)
                        clsfy_ioc.action = IOC_FLOW_ACCEPT;
                else
                        clsfy_ioc.action = IOC_FLOW_DISCARD;
 
                /* Program the TCAM HW */
                (void) eth_ioc(ihdlnet[port], ETH_IOC_SET_CLASSIFY,
                                                        (void *)&clsfy_ioc);
        }
        return;

fail:
        /* inet_pton() could not parse an address string. */
        return;
}
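
The parsing function can then be invoked once per port and DMA channel. For example (the argument values are placeholders):


/* Program the rules destined for DMA channel chan on port port. */
classify_parse_entries(0, port, chan, ip4_flow_tab);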

4. During the build, enable TCAM classification and disable hashing. To do this, type:


gmake .... FLOW_POLICY=TCAM_CLASSIFY

This command configures the Sun multithreaded 10-Gb Ethernet with NIU hardware to perform TCAM classification with the matching rules set up in Step 1 through Step 3.