17 Performing a Datagram Test for Network Performance

Coherence includes a Datagram Test utility that can be used to test and tune network performance between two or more machines. The Datagram Test operates in one of three modes: as a packet publisher, as a packet listener, or as both. When run, a publisher transmits UDP packets to the listener, which measures the throughput, success rate, and other statistics.

To achieve maximum performance, tune your environment based on the results of these tests. See Chapter 20, "Performance Tuning," for more information.

17.1 Running the Datagram Test Utility

The Datagram test supports a large number of configuration options, though only a few are required for basic operation. To run the Datagram Test utility use the following syntax from the command line:

java com.tangosol.net.DatagramTest <command value ...> <addr:port ...>

Table 17-1 describes the available command line options for the Datagram Test utility.

Table 17-1 Command Line Options for the Datagram Test Utility

Command	Required/Optional	Applicability	Description	Default
-local	Optional	Both	The local address to bind to, specified as `addr:port`	localhost:9999
-packetSize	Optional	Both	The size of packet to work with, specified in bytes.	1468
-processBytes	Optional	Both	The number of bytes (in multiples of 4) of each packet to process.	4
-rxBufferSize	Optional	Listener	The size of the receive buffer, specified in packets.	1428
-txBufferSize	Optional	Publisher	The size of the transmit buffer, specified in packets.	16
-txRate	Optional	Publisher	The rate at which to transmit data, specified in megabytes per second.	unlimited
-txIterations	Optional	Publisher	Specifies the number of packets to publish before exiting.	unlimited
-txDurationMs	Optional	Publisher	Specifies how long to publish before exiting.	unlimited
-reportInterval	Optional	Both	The interval at which to output a report, specified in packets.	100000
-tickInterval	Optional	Both	The interval at which to output tick marks.	1000
-log	Optional	Listener	The name of a file to save a tabular report of measured performance.	none
-logInterval	Optional	Listener	The interval at which to output a measurement to the log.	100000
-polite	Optional	Publisher	Switch indicating if the publisher should wait for the listener to be contacted before publishing.	off
arguments	Optional	Publisher	Space separated list of addresses to publish to, specified as `addr:port`.	none

17.1.1 Sample Commands for a Listener and a Publisher

The following command line is for a listener:

java -server com.tangosol.net.DatagramTest -local box1:9999 -packetSize 1468

The following command line is for a publisher:

java -server com.tangosol.net.DatagramTest -local box2:9999 -packetSize 1468 box1:9999

For ease of use, datagram-test.sh and datagram-test.cmd scripts are provided in the Coherence bin directory, and can be used to execute this test.

17.2 Datagram Test Example

Suppose that you want to test network performance between two servers: Server A with IP address 195.0.0.1 and Server B with IP address 195.0.0.2. One server acts as a packet publisher and the other as a packet listener. The publisher transmits packets as fast as possible, and the listener measures and reports performance statistics. First, start the listener on Server A.

Example 17-1 Command to Start a Listener

datagram-test.sh

After pressing ENTER, you should see the Datagram Test utility showing you that it is ready to receive packets.

Example 17-2 Output from Starting a Listener

starting listener: at /195.0.0.1:9999
packet size: 1468 bytes
buffer size: 1428 packets
  report on: 100000 packets, 139 MBs
    process: 4 bytes/packet
        log: null
     log on: 139 MBs

As you can see, by default the test tries to allocate a network receive buffer large enough to hold 1428 packets, or about 2 MB. If it is unable to allocate this buffer, it reports an error and exits. You can either decrease the requested buffer size using the -rxBufferSize parameter or increase your operating system network buffer settings. For best performance it is recommended that you increase the operating system buffers. See the following forum post for details on tuning your operating system for Coherence.
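The "1428 packets, or about 2 MB" figure can be checked with simple arithmetic. The following is a minimal sketch, not part of the utility: the function names rx_buffer_bytes and rx_buffer_packets are hypothetical helpers introduced here for illustration, and the 1 MiB limit in the last line is only an example value.

```shell
# Bytes needed for a receive buffer of N packets of a given size.
rx_buffer_bytes() {
    # $1 = buffer size in packets, $2 = packet size in bytes
    echo $(( $1 * $2 ))
}

# Largest whole number of packets that fits in a byte budget.
rx_buffer_packets() {
    # $1 = available buffer in bytes, $2 = packet size in bytes
    echo $(( $1 / $2 ))
}

rx_buffer_bytes 1428 1468        # default request: 2096304 bytes
rx_buffer_packets 2097152 1468   # packets that fit in 2 MiB: 1428
rx_buffer_packets 1048576 1468   # packets that fit in 1 MiB: 714
```

If the operating system could only grant a buffer of about 1 MiB, a value near 714 for -rxBufferSize would be a reasonable starting point under these assumptions.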

When the listener process is running you may start the publisher on Server B, directing it to publish to Server A.

Example 17-3 Command to Start a Publisher

datagram-test.sh servera

After pressing ENTER, you should see the new Datagram Test instance on Server B start both a listener and a publisher. Note that in this configuration the Server B listener is not used. The output illustrated in Example 17-4 should appear in the Server B command window.

Example 17-4 Datagram Test—Starting a Listener and a Publisher on a Server

starting listener: at /195.0.0.2:9999
packet size: 1468 bytes
buffer size: 1428 packets
  report on: 100000 packets, 139 MBs
    process: 4 bytes/packet
        log: null
     log on: 139 MBs

starting publisher: at /195.0.0.2:9999 sending to servera/195.0.0.1:9999
packet size: 1468 bytes
buffer size: 16 packets
  report on: 100000 packets, 139 MBs
    process: 4 bytes/packet
      peers: 1
       rate: no limit

no packet burst limit
oooooooooOoooooooooOoooooooooOoooooooooOoooooooooOoooooooooOoooooooooOoooooooooO

The series of o and O tick marks appears as data is (O)utput on the network. Each o represents 1000 packets, with an O indicator at every 10,000 packets.

On Server A you should see a corresponding set of i and I tick marks, representing network (I)nput. This indicates that the two test instances are communicating.
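As a quick sanity check, a run of tick marks can be converted back into a packet count. This is a minimal sketch that assumes every tick character, lowercase or uppercase, stands for one tick interval (1000 packets with the default -tickInterval). The function name ticks_to_packets is hypothetical.

```shell
# Convert a string of tick marks into an approximate packet count.
ticks_to_packets() {
    # $1 = tick string copied from the console, $2 = tick interval in packets
    echo $(( ${#1} * $2 ))
}

ticks_to_packets "oooooooooOooo" 1000   # 13 ticks: 13000 packets
```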

17.3 Reporting

Periodically, each side of the test (publisher and listener) will report performance statistics.

17.3.1 Publisher Statistics

The publisher simply reports the rate at which it is publishing data on the network. A typical report is as follows:

Example 17-5 Sample Publisher Report

Tx summary 1 peers:
   life: 97 MB/sec, 69642 packets/sec
    now: 98 MB/sec, 69735 packets/sec

The report includes both the current transmit rate (since last report) and the lifetime transmit rate.

17.3.2 Listener Statistics

Table 17-2 describes the statistics that can be reported by the listener.

Table 17-2 Listener Statistics

Element	Description
Elapsed	The time interval that the report covers.
Packet size	The received packet size.
Throughput	The rate at which packets are being received.
Received	The number of packets received.
Missing	The number of packets which were detected as lost.
Success rate	The fraction of received packets out of the total packets sent (1.0 means no packets were lost).
Out of order	The number of packets which arrived out of order.
Average offset	An indicator of how out of order packets are.

As with the publisher, both current and lifetime statistics are reported. Example 17-6 displays a typical report:

Example 17-6 Sample Lifetime Statistics

Lifetime:
Rx from publisher: /195.0.0.2:9999
             elapsed: 8770ms
         packet size: 1468
          throughput: 96 MB/sec
                      68415 packets/sec
            received: 600000 of 611400
             missing: 11400
        success rate: 0.9813543
        out of order: 2
          avg offset: 1


Now:
Rx from publisher: /195.0.0.2:9999
             elapsed: 1431ms
         packet size: 1468
          throughput: 98 MB/sec
                      69881 packets/sec
            received: 100000 of 100000
             missing: 0
        success rate: 1.0
        out of order: 0
          avg offset: 0

The primary items of interest are the throughput and the success rate. The goal is to find the highest throughput while maintaining a success rate as close to 1.0 as possible. On a 100 Mb network you should be able to achieve rates of around 10 MB/sec. On a 1 Gb network you should be able to achieve rates of around 100 MB/sec. Achieving these rates will likely require some tuning (see below).
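The figures in a listener report can be cross-checked against one another. The sketch below recomputes throughput and success rate from the numbers in Example 17-6. It assumes that "MB" in the report means 2^20 bytes and that throughput is rounded to the nearest integer; both are inferences from the sample output, not documented behavior. The function names are hypothetical.

```shell
# Throughput in MB/sec (MB taken as 2^20 bytes), rounded to the nearest integer.
throughput_mb() {
    # $1 = packets per second, $2 = packet size in bytes
    echo $(( ($1 * $2 + 524288) / 1048576 ))
}

# Success rate = received / (received + missing), to 7 decimal places.
success_rate() {
    # $1 = packets received, $2 = packets missing
    awk -v r="$1" -v m="$2" 'BEGIN { printf "%.7f\n", r / (r + m) }'
}

throughput_mb 68415 1468     # lifetime figures: 96
throughput_mb 69881 1468     # "now" figures: 98
success_rate 600000 11400    # lifetime: 0.9813543
success_rate 100000 0        # now: 1.0000000
```

The recomputed values agree with the sample report, which suggests that the missing count and the throughput are both derived from the received packet count and packet size.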

17.3.2.1 Throttling

The publishing side of the test may be throttled to a specific data rate, expressed in megabytes per second, by including the -txRate M parameter, where M represents the maximum MB/sec the test should put on the network.
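When throttling, it can help to know roughly how many packets per second a given rate corresponds to, so that the publisher report can be compared against the requested limit. This is a minimal sketch that assumes 1 MB equals 2^20 bytes (inferred from the sample reports, not stated by the utility). The function name tx_packets_per_sec is hypothetical.

```shell
# Approximate packets per second for a given -txRate value.
tx_packets_per_sec() {
    # $1 = -txRate value in MB/sec, $2 = packet size in bytes
    echo $(( $1 * 1048576 / $2 ))
}

tx_packets_per_sec 50 1468     # 35714
tx_packets_per_sec 100 1468    # 71428
```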

17.3.2.2 Bidirectional Testing

You may also run the test in a bidirectional mode where both servers act as publishers and listeners. To do this, restart the test instances, supplying the instance on Server A with Server B's address by running the command in Example 17-7 on Server A.

Example 17-7 Running Datagram Test in Bi-Directional Mode

datagram-test.sh -polite serverb

Then run the same command as before on Server B. The -polite parameter instructs this test instance not to start publishing until it starts to receive data.

17.3.2.3 Distributed Testing

You may also use more than two machines in testing. For instance, you can set up two publishers to target a single listener. This style of testing is far more realistic than simple one-to-one testing, and may identify bottlenecks in your network that you were not otherwise aware of.

Assuming you intend to construct a cluster consisting of four machines, you can run the datagram test among all of them as follows:

On Servera:

datagram-test.sh -txRate 100 -polite serverb serverc serverd

On Serverb:

datagram-test.sh -txRate 100 -polite servera serverc serverd

On Serverc:

datagram-test.sh -txRate 100 -polite servera serverb serverd

On Serverd:

datagram-test.sh -txRate 100 servera serverb serverc

This test sequence causes each node to send a total of 100 MB per second to the other nodes (that is, about 33 MB per second to each node). On a fully switched 1GbE network this should be achievable without packet loss.
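The per-node figure follows from dividing each publisher's rate among its peers. The sketch below works through that arithmetic and converts the total into an approximate line rate, using the decimal approximation of 8 megabits per megabyte. The function names are hypothetical.

```shell
# Outbound rate to each peer when a node's -txRate is split evenly.
per_peer_rate() {
    # $1 = -txRate in MB/sec, $2 = number of peers
    echo $(( $1 / $2 ))
}

# Approximate line rate in megabits/sec for a given MB/sec figure.
mb_to_mbit() {
    # $1 = rate in MB/sec
    echo $(( $1 * 8 ))
}

per_peer_rate 100 3    # 33 MB/sec to each of the three peers
mb_to_mbit 100         # about 800 Mb/sec per node, under the 1000 Mb/sec link
```

Because every node also receives about 33 MB/sec from each of three peers, the inbound load per node is roughly the same as the outbound load.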

To simplify execution of the test, all nodes can be started with an identical target list. Each node then transmits to itself as well, but this loopback data can easily be factored out. It is important to start all but the last node using the -polite switch, as this causes those nodes to delay testing until the final node is started.
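The identical-target-list approach lends itself to generating the commands rather than typing them. The following is a minimal sketch that only prints the command for each host; it does not run anything. The function name print_commands is hypothetical, and the host names are the ones used in this example.

```shell
# Print the datagram-test.sh command to run on each host.
# Every host gets the full target list; all but the last get -polite.
print_commands() {
    # $1 = -txRate value, remaining arguments = host names in start order
    rate=$1
    shift
    hosts="$*"
    last=""
    for h in "$@"; do last=$h; done
    for h in "$@"; do
        if [ "$h" = "$last" ]; then
            echo "$h: datagram-test.sh -txRate $rate $hosts"
        else
            echo "$h: datagram-test.sh -txRate $rate -polite $hosts"
        fi
    done
}

print_commands 100 servera serverb serverc serverd
```

Each printed line is prefixed with the host on which the command is meant to be started, in order, so that the node without -polite is started last.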