Benchmarking Cloud Deployable DSR

3 Benchmarking Cloud Deployable DSR

This chapter is divided into the following sections:

Infrastructure Environment
This section provides details of the infrastructures used for the benchmark testing, including the hardware and software. It also describes key settings and attributes, and some recommendations on configuration.
Benchmark section for each DSR server type
Each DSR server type is treated independently for benchmarking. Each section describes the traffic setup, and the observed results. It also provides metrics and guidelines for assessing performance on any infrastructure.

Data Usage

This data is intended to provide guidance. Recommendations may need to be adapted to the conditions in a given operator’s network. Each of the following sections includes metrics that provide feedback on the running performance of the application.

When planning to deploy a DSR into any cloud environment, a few steps are recommended:

Understand the initial deployment scenario for the DSR.
- Which features are planned?
- How much of what type of traffic?
  This may change once deployed, and the DSR can be grown or shrunk to meet the changing needs.
Use the DSR Cloud Dimensioning tool to get an estimate of the types of DSR virtual servers needed and an initial estimate of the quantity of the virtual machines and resources. An Oracle Sales Consultant can run this tool based on DSR requirements:
- The tool allows for a very detailed model to be built of your DSR requirements, including:
  - Required MPS by Diameter Application ID (S6a, Sd, Gx, Rx, and so on).
  - Required DSR applications such as Full Address Based Resolution (FABR), Policy DRA (PDRA), and any required sizing information such as the number of subscribers supported for each application.
  - Any required DSR features such as Topology Hiding, Message Copy, IPSEC, or Mediation that can affect performance.
  - Network-level redundancy requirements, such as mated pair DSR deployments, where one DSR needs to support the full traffic when the other DSR is unavailable.
  - Infrastructure information, such as OpenStack or KVM, and Server parameters.
- The tool then generates a recommended number of VMs for each of the required VM types.
Note:
These recommendations are just guidelines, since the actual performance of the DSR can vary significantly based on the details of the infrastructure.
Based on the initial deployment scenario, determine if additional benchmarking is warranted:
- For labs and trials, there is no need to benchmark performance and capacity if the goal of the lab is to test DSR functionality.
- If the server hardware is different from the hardware used in this document then the performance differences can likely be estimated using industry standard metrics. This is done by comparing single-threaded processor performance of the CPUs used in this document with respect to the CPUs used in the customer’s infrastructure. This approach is most accurate for small differences in hardware (for instance, different clock speeds for the same generation of Intel processors) and least accurate across processor generations where other architectural differences such as networking interfaces could also affect the comparison.
- It is the operator’s decision to determine if additional benchmarking in the operator’s infrastructure is desired. Here are a few things to consider when deciding:
  - Benchmark infrastructure is similar to the operator’s infrastructure, and the operator is satisfied with the benchmark data provided by Oracle.
  - Initial turn-up of the DSR is handling a relatively small amount of traffic and the operator prefers to measure and adjust once deployed.
  - Operator is satisfied with the high-availability and geo-diversity of the DSR, and is willing to risk initial overload conditions and adjust once the DSR is in production.
If required, perform benchmark testing on the target cloud infrastructure. Perform benchmark only on those types of DSR servers required for the deployment.
For example, if full address resolution is not planned, do not waste time benchmarking the SDS, SDS SOAM, or DPs.
- When the benchmark testing is complete, observe the data for each server type, and compare it with the baseline used for the estimate from the DSR Cloud Dimensioning tool.
  - If the performance estimate for a given DSR function is X and the observed performance is Y, then adjust the performance for that DSR function to Y.
  - Re-calculate the resources needed for deployment based on the updated values.
Deploy the DSR.
Monitor the DSR performance and capacity as described later in the document. As the network changes, additional resources may be required. If needed, increase the DSR resources as described later in this document.
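
The re-calculation step in the procedure above (replace the estimated performance X with the observed performance Y, then re-size) can be sketched as follows. This is a minimal illustration only, assuming that capacity scales linearly with the number of VMs of a given type. The function name and all figures are hypothetical and do not come from the DSR Cloud Dimensioning tool, which models many more factors.

```python
import math


def vms_required(required_mps: float, per_vm_mps: float) -> int:
    """Return the number of VMs needed to carry required_mps.

    Assumption: capacity scales linearly with the number of VMs.
    Redundancy requirements (for example, mated-pair deployments)
    are not modeled here.
    """
    if required_mps < 0 or per_vm_mps <= 0:
        raise ValueError("required_mps must be >= 0 and per_vm_mps must be > 0")
    return math.ceil(required_mps / per_vm_mps)


# Hypothetical figures for illustration only.
estimated_per_vm_mps = 18_000  # X: estimate used for initial dimensioning
observed_per_vm_mps = 15_000   # Y: value observed on the target infrastructure
required_mps = 100_000         # traffic the deployment must carry

print("VMs using estimate X:", vms_required(required_mps, estimated_per_vm_mps))
print("VMs using observed Y:", vms_required(required_mps, observed_per_vm_mps))
```

The same calculation would be repeated for each benchmarked DSR server type before deployment.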

3.1 Infrastructure Environment

This section describes the infrastructure that was used for benchmarking. In general, the defaults or recommendations for hypervisor settings are available from the infrastructure vendors.

Whenever possible, the DSR recommendations align with vendor defaults and recommendations. Benchmarking was performed with the settings described in this section. Operators may choose different values, in which case better or worse performance compared to the benchmarks might be observed. When recommendations other than vendor defaults or recommendations are made, additional explanations are included in the applicable section.

There is a sub-section included for each infrastructure environment used in benchmarking.

3.1.1 General Rules for All Infrastructures

3.1.1.1 Hyper-Threading and CPU Over-Subscription

All of the tests were conducted with Hyper-Threading enabled, and with a 1:1 subscription ratio for vCPUs in the hypervisor. The hardware used for the testing was dual-processor servers with 32 physical cores per processor (Oracle X9-2). Thus, each server had:

(2 CPUs) x (32 cores per CPU) x (2 threads per core) = 128 vCPUs

It is not recommended to use over-subscribed vCPUs (for instance 4:1) in the hypervisor. Not only is the performance lower, but it makes the performance more dependent on the other loads running on each physical server.

Turning off Hyper-Threading is also not recommended. There is a small increase in performance of a given VM without Hyper-Threading for a given number of vCPUs. But since the number of vCPUs for each processor drops in half without Hyper-Threading, the overall throughput for each server also drops almost by half.

The vCPU sizing for each VM is provided in the DSR VM Configurations section.

Note:

The recommended configuration is: Hyper-Threading is enabled with 1:1 CPU subscription ratio.

CPU Technology

The servers used for the benchmarking were Oracle X9-2 servers with Intel Xeon Platinum 8358 CPUs. Servers with different processors give different results. In general, there are the following issues when mapping the benchmarking data in this document to other CPUs:

The per-thread performance of a CPU is the main attribute that determines VM performance. The number of threads is fixed in the VM sizing as shown in DSR VM Configurations section. A good metric for comparing the per-thread performance of different CPUs is the integer performance measured by the SPECint2006 (CINT2006) defined by SPEC.ORG.
The mapping of SPECint2006 ratios to DSR VM performance ratios isn’t exact, but it’s a good measure to determine whether a different CPU is likely to run the VMs faster or slower than the benchmark results in this document.

Conversely, CPU clock speeds are a relatively poor indicator of relative CPU performance. Within a given Intel CPU generation (v2, v3, v4, and so on) there are other factors that affect per-thread performance, such as potential turbo speeds of the CPU in comparison with the cooling solution in a given server.

Comparing between Intel CPU generations, there is a generation over generation improvement of CPU throughput in comparison with the clock speed. This means that even a newer generation chip with a slower clock speed may run a DSR VM faster.
The processors must have enough cores that a given VM can fit entirely into a NUMA node. Splitting a VM across NUMA nodes greatly reduces the performance of that VM. The largest VM size (refer to the DSR VM Configurations section) is 18 vCPUs. Thus, the smallest processor that should be used is a 9-core processor. Using processors with more cores typically makes it easier to pack VMs more efficiently into NUMA nodes but should not affect individual VM CPU-related performance otherwise.
One caveat about CPUs with very high core counts is that the user must be aware of potential bottlenecks caused by many VMs contending for shared resources such as network interfaces and ephemeral storage on the server. These tests were run on relatively large CPUs (32 physical cores for each chip), and no such bottlenecks were encountered while running strictly DSR VMs. In clouds with VMs from other applications potentially running on the same physical server as DSR VMs, or in future processor generations with much higher core counts, this potential contention for shared server resources has to be watched closely.
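
The per-thread comparison described above can be turned into a rough first estimate. The sketch below assumes, purely for illustration, that per-VM throughput scales linearly with the per-thread integer score. As noted above, the mapping is not exact, so the result only indicates whether a different CPU is likely to be faster or slower. The function name and the scores in the example are hypothetical placeholders, not published SPEC results.

```python
def estimate_vm_mps(benchmark_mps: float,
                    benchmark_thread_score: float,
                    target_thread_score: float) -> float:
    """Scale a benchmarked per-VM MPS figure to a different CPU.

    Assumption: throughput scales linearly with the per-thread integer
    score. This is a rough guide only and is least reliable across
    processor generations.
    """
    if benchmark_thread_score <= 0 or target_thread_score <= 0:
        raise ValueError("scores must be positive")
    return benchmark_mps * target_thread_score / benchmark_thread_score


# Hypothetical per-thread scores; substitute values for the actual CPUs.
estimate = estimate_vm_mps(benchmark_mps=30_000,
                           benchmark_thread_score=50.0,
                           target_thread_score=40.0)
print(f"Estimated per-VM MPS on target CPU: {estimate:.0f}")
```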

Note:

The selected VM sizes should fit within a single NUMA node, for instance 9 physical cores for the VMs that require 18 vCPUs. Check the performance of the target CPU type against the benchmarked CPU using per-thread integer performance metrics.
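
The NUMA fit rule in the note above can be expressed as a simple check. This is a minimal sketch, assuming Hyper-Threading is enabled (two threads per core) and a 1:1 vCPU subscription ratio. The function name is hypothetical, and the check looks at a single VM in isolation without accounting for other VMs already placed on the node.

```python
def fits_in_numa_node(vm_vcpus: int,
                      cores_per_numa_node: int,
                      threads_per_core: int = 2) -> bool:
    """Return True if a VM of vm_vcpus fits entirely in one NUMA node.

    Assumes a 1:1 vCPU subscription ratio, so each vCPU maps to one
    hardware thread.
    """
    return vm_vcpus <= cores_per_numa_node * threads_per_core


# The largest VM size in this document is 18 vCPUs.
for cores in (8, 9, 32):
    print(cores, "cores per NUMA node:", fits_in_numa_node(18, cores))
```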

3.1.1.2 VM Packing Rate

The DSR doesn’t require or use CPU pinning. Thus, the packing of the DSR VMs onto the physical servers is under the control of OpenStack using the affinity or anti-affinity rules given in DSR VM Configurations. Typically, the VMs do not fit exactly into the number of vCPUs available in each NUMA node, leaving some unallocated vCPUs. The ratio of the allocated vCPUs to the total available vCPUs is the VM Packing Ratio. For instance, if 102 out of 128 vCPUs on a given server were allocated by OpenStack, that server would have a packing ratio of ~80%. The achieved packing in a deployment depends on a lot of factors, including the mix of large VMs (DA-MPs, SBRs) with the smaller VMs, and whether the DSR is sharing the servers with other applications that have a lot of large or small VMs.

When planning the number of physical servers required for a DSR, a target packing ratio of 80% is a good planning number. A packing ratio of 100% is hard to achieve and may affect the performance numbers shown in the benchmarks. Some amount of server capacity is necessary to run the Host OS for the VMs, which performs functions such as interrupt handling. A packing ratio of 95% or lower is desirable.

Note:

When planning for physical server capacity a packing ratio of 80% is a good guideline. Packing ratios of greater than 95% might affect the benchmark numbers since there aren’t sufficient server resources to handle the overhead of Host OSs.
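
The packing arithmetic above can be sketched as follows. The example reuses the figures from this section (2 CPUs, 32 cores per CPU, 2 threads per core, and 102 of 128 vCPUs allocated). The total vCPU demand of 500 and the function names are hypothetical. The server count is a simple planning estimate that ignores NUMA boundaries and affinity rules, so the real placement may need more servers.

```python
import math


def vcpus_per_server(sockets: int, cores_per_socket: int,
                     threads_per_core: int = 2) -> int:
    """vCPUs available on one server at a 1:1 subscription ratio."""
    return sockets * cores_per_socket * threads_per_core


def packing_ratio(allocated_vcpus: int, total_vcpus: int) -> float:
    """Fraction of a server's vCPUs that are allocated to VMs."""
    return allocated_vcpus / total_vcpus


def servers_required(total_vm_vcpus: int, server_vcpus: int,
                     target_packing: float = 0.80) -> int:
    """Planning estimate of servers needed at a target packing ratio."""
    usable = math.floor(server_vcpus * target_packing)
    return math.ceil(total_vm_vcpus / usable)


total = vcpus_per_server(sockets=2, cores_per_socket=32)
print("vCPUs per server:", total)
print("Packing ratio: {:.0%}".format(packing_ratio(102, total)))
# 500 is a hypothetical sum of vCPUs across all planned DSR VMs.
print("Servers needed:", servers_required(500, total))
```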

3.1.1.3 Infrastructure Tuning

The following parameters should be set in the infrastructure to improve DSR VM performance. The instructions for setting them for a given infrastructure are included in the DSR Cloud Installation Guide.

Txqueuelen: Tuned on the compute hosts. The default value of 500 is too small. The recommendation is to set this parameter to 120000, which increases the network throughput of a VM.
Ring buffer increase on the physical Ethernet interfaces: The default is too small. The recommendation is to set both receive and transmit values to 4096.
Multiqueue: Multiqueue should be enabled on any IPFE VMs to improve performance.

Note:

Refer to instructions in the DSR Cloud Installation Guide.
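
As a quick audit of the Txqueuelen recommendation, the sketch below lists the network interfaces on a Linux compute host whose transmit queue length is below 120000. It reads the standard sysfs attribute `/sys/class/net/<interface>/tx_queue_len`. The function name is hypothetical, the script only reports values and does not change them, and which interfaces actually need tuning is determined by the DSR Cloud Installation Guide, not by this script. Ring buffer sizes are not covered here.

```python
from pathlib import Path
from typing import Dict

RECOMMENDED_TXQUEUELEN = 120000  # value recommended in this section


def interfaces_below_recommendation(
        sysfs_net: str = "/sys/class/net",
        minimum: int = RECOMMENDED_TXQUEUELEN) -> Dict[str, int]:
    """Map interface name to tx_queue_len for interfaces below minimum."""
    root = Path(sysfs_net)
    below: Dict[str, int] = {}
    if not root.is_dir():
        return below  # not a Linux host, or sysfs is not mounted
    for entry in sorted(root.iterdir()):
        attr = entry / "tx_queue_len"
        if attr.is_file():
            value = int(attr.read_text().strip())
            if value < minimum:
                below[entry.name] = value
    return below


if __name__ == "__main__":
    for name, value in interfaces_below_recommendation().items():
        print(f"{name}: tx_queue_len={value} (below {RECOMMENDED_TXQUEUELEN})")
```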

3.1.2 KVM (QEMU)/Oracle X9-2 – Infrastructure Environment

There are a number of settings that affect the performance of the hosted virtual machines. A number of tests were performed to maximize the performance of the underlying virtual machines for the DSR application.

Host Hardware

Oracle Server X9-2
- CPU Model: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
- 2 CPUs
- 32 physical cores per CPU
- RAM: 768 GB
- HDD: 3.8 TB of NVMe storage (with Software RAID-1 configured)
- NIC:
  - Oracle Quad Port 10G Base-T Adapter

Hypervisor

QEMU-KVM Version: QEMU 1.5.3, libvirt 4.5.0, API QEMU 4.5.0

3.1.2.1 Device Drivers

VirtIO is a virtualization standard for network and disk device drivers where just the guest’s device driver knows it is running in a virtual environment and cooperates with the hypervisor. This enables guests to get high performance network and disk operations and gives most of the performance benefits of para-virtualization.

Vhost-net provides improved network performance over Virtio-net by totally bypassing QEMU as a fast path for interrupts. Vhost-net runs as a kernel thread and handles interrupts with less overhead, providing near-native performance. The advantages of using the vhost-net approach are reduced copy operations, lower latency, and lower CPU usage.

Note:

The VirtIO driver was used for Test Bed setting.

3.1.2.2 BIOS Power Settings

Typical BIOS power settings (hardware vendor dependent, see relevant infrastructure hardware vendor documentation for details) provide three options for power settings:

Power Supply Maximum: The maximum power the available PSUs can draw.
Allocated Power: The power is allocated for installed and hot pluggable components.
Peak Permitted: The maximum power the system is permitted to consume.

Note:

Set to Allocated Power or equivalent for your Hardware vendor.

Disk Image Formats

The preferred disk image file formats available when deploying a KVM virtual machine:

QCOW2: Disk format supported by the QEMU emulator that can expand dynamically and supports Copy-on-write.

QCOW2 provides a number of benefits, such as:

Smaller file size, even on file systems which don’t support holes (that is, sparse files)
Copy-on-write support, where the image only represents changes made to an underlying disk image
Snapshot support, where the image can contain multiple snapshots of the image’s history

Test Bed Setting: QCOW2

3.1.2.3 Guest Caching Modes

The operating system maintains a page cache to improve the storage I/O performance. With the page cache, write operations to the storage system are considered completed after the data has been copied to the page cache. Read operations can be satisfied from the page cache if the data requested is in the cache. The page cache is copied to permanent storage using fsync. Direct I/O requests bypass the page cache. In the KVM environment, both the host and guest operating systems can maintain their own page caches, resulting in two copies of data in memory.

The following caching modes are supported for KVM guests:

Writethrough: I/O from the guest is cached on the host but written through to the physical medium. This mode is slower and prone to scaling problems. Best used for a small number of guests with lower I/O requirements. Suggested for guests that do not support a writeback cache (such as, Red Hat Enterprise Linux 5.5 and earlier), where migration is not needed.
Writeback (Selected): With caching set to writeback mode, both the host page cache and the disk write cache are enabled for the guest. Due to this, the I/O performance for applications running in the guest is good, but the data is not protected in a power failure. As a result, this caching mode is recommended only for temporary data where potential data loss is not a concern.
None: With caching mode set to none, the host page cache is disabled, but the disk write cache is enabled for the guest. In this mode, the write performance in the guest is optimal because write operations bypass the host page cache and go directly to the disk write cache. If the disk write cache is battery-backed, or if the applications or storage stack in the guest transfer data properly (either through fsync operations or file system barriers), then data integrity can be ensured. However, because the host page cache is disabled, the read performance in the guest would not be as good as in the modes where the host page cache is enabled, such as writethrough mode.
Unsafe: The host may cache all disk I/O, and sync requests from guest are ignored.

Caching mode None is recommended for remote NFS storage, because direct I/O operations (O_DIRECT) perform better than synchronous I/O operations (with O_SYNC). Caching mode None effectively turns all guest I/O operations into direct I/O operations on the host, which is the NFS client in this environment. Moreover, it is the only option to support migration.

Note:

For Test Bed Setting, set Caching Mode to Writeback.
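
In libvirt, the caching mode of a guest disk is carried by the `cache` attribute of the disk `<driver>` element in the domain XML. The sketch below parses a domain definition and reports the cache mode per disk, which is one way to confirm that guests use the intended mode. The sample XML, the guest name, and the image path are made up for illustration, and the function name is hypothetical. A disk with no `cache` attribute is reported as "default", meaning the hypervisor default applies.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Illustrative domain definition; names and paths are hypothetical.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>example-guest</name>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='writeback'/>
      <source file='/var/lib/libvirt/images/example-guest.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""


def disk_cache_modes(domain_xml: str) -> Dict[str, str]:
    """Map each disk target device to its configured cache mode."""
    root = ET.fromstring(domain_xml)
    modes: Dict[str, str] = {}
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        driver = disk.find("driver")
        dev = target.get("dev", "unknown") if target is not None else "unknown"
        mode = driver.get("cache", "default") if driver is not None else "default"
        modes[dev] = mode
    return modes


print(disk_cache_modes(SAMPLE_DOMAIN_XML))
```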

3.1.2.4 Memory Tuning Parameters

Swappiness

The swappiness parameter controls the tendency of the kernel to move processes out of physical memory and onto the swap disk. Since disks are much slower than RAM, this can lead to slower response times for system and applications if processes are too aggressively moved out of memory.

vm.swappiness = 0: The kernel swaps only to avoid an out of memory condition.
vm.swappiness = 1: Kernel version 3.5 and over, as well as kernel version 2.6.32-303 and over; Minimum amount of swapping without disabling it entirely.
vm.swappiness = 10: This value is recommended to improve performance when sufficient memory exists in a system.
vm.swappiness = 60: Default
vm.swappiness = 100: The kernel swaps aggressively

Note:

For Test Bed Setting, set vm.swappiness to 10.

Kernel Same Page Merging

Kernel Same-page Merging (KSM), used by the KVM hypervisor, allows KVM guests to share identical memory pages. These shared pages are usually common libraries or other identical, high-use data. KSM allows for greater guest density of identical or similar guest operating systems by avoiding memory duplication. KSM enables the kernel to examine two or more already running programs and compare their memory. If any memory regions or pages are identical, KSM reduces multiple identical memory pages to a single page. This page is then marked copy-on-write. If the contents of the page are modified by a guest virtual machine, a new page is created for that guest.

This is useful for virtualization with KVM. When a guest virtual machine is started, it only inherits the memory from the host qemu-kvm process. Once the guest is running, the contents of the guest operating system image can be shared when guests are running the same operating system or applications. KSM allows KVM to request that these identical guest memory regions be shared.

KSM provides enhanced memory speed and utilization. With KSM, common process data is stored in cache or in main memory. This reduces cache misses for the KVM guests, which can improve performance for some applications and operating systems. Secondly, sharing memory reduces the overall memory usage of guests, which allows for higher densities and greater utilization of resources.

The following two services control KSM:

KSM Service: When the KSM service is started, KSM shares up to half of the host system's main memory. Start the KSM service to enable KSM to share more memory.
KSM Tuning Service: The ksmtuned service loops and adjusts KSM. The ksmtuned service is notified by libvirt when a guest virtual machine is created or destroyed.

Note:

For Test Bed Setting, set the KSM service to active and ensure the ksmtuned service is running on the KVM hosts.

Zone Reclaim Mode

When an operating system allocates memory to a NUMA node, but the NUMA node is full, the operating system reclaims memory for the local NUMA node rather than immediately allocating the memory to a remote NUMA node. The performance benefit of allocating memory to the local node outweighs the performance drawback of reclaiming the memory. However, in some situations reclaiming memory decreases performance to the extent that the opposite is true. In other words, in these situations, allocating memory to a remote NUMA node generates better performance than reclaiming memory for the local node.

A guest operating system causes zone reclaim in the following situations:

When you configure the guest operating system to use huge pages.
When you use KSM to share memory pages between guest operating systems.

Configuring huge pages and running KSM are both best practices for KVM environments. Therefore, to optimize performance in KVM environments, it is recommended to disable zone reclaim.

Note:

For Test Bed Setting, disable zone reclaim.

Transparent Huge Pages

Transparent huge pages (THP) automatically optimize system settings for performance. By allowing all free memory to be used as cache, performance is increased.

Note:

For Test Bed Setting, enable THP.
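
The memory tuning settings in this section can be audited on a Linux KVM host by reading the corresponding kernel interfaces: `/proc/sys/vm/swappiness`, `/proc/sys/vm/zone_reclaim_mode`, `/sys/kernel/mm/ksm/run`, and `/sys/kernel/mm/transparent_hugepage/enabled`. The sketch below compares them with the test bed settings. It makes two assumptions that should be checked against the installation guide: that "KSM active" corresponds to a value of 1 in `ksm/run`, and that "THP enabled" corresponds to the `always` mode. The function names are hypothetical, and the script does not check the ksmtuned service.

```python
from pathlib import Path
from typing import Dict, Optional


def read_value(path: Path) -> Optional[str]:
    """Return the stripped file content, or None if the file is absent."""
    return path.read_text().strip() if path.is_file() else None


def thp_mode(raw: str) -> str:
    """Extract the active mode from text such as '[always] madvise never'."""
    for token in raw.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return raw


def check_memory_settings(root: str = "/") -> Dict[str, bool]:
    """Compare host settings with the test bed settings in this section."""
    base = Path(root)
    swappiness = read_value(base / "proc/sys/vm/swappiness")
    zone_reclaim = read_value(base / "proc/sys/vm/zone_reclaim_mode")
    ksm_run = read_value(base / "sys/kernel/mm/ksm/run")
    thp = read_value(base / "sys/kernel/mm/transparent_hugepage/enabled")
    return {
        "vm.swappiness is 10": swappiness == "10",
        "zone reclaim disabled": zone_reclaim == "0",
        "KSM running": ksm_run == "1",  # assumption: 1 means active
        # Assumption: "enabled" is taken to mean the 'always' mode.
        "THP enabled": thp is not None and thp_mode(thp) == "always",
    }


if __name__ == "__main__":
    for name, ok in check_memory_settings().items():
        print(f"{name}: {'OK' if ok else 'CHECK'}")
```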

3.2 Benchmark Testing

The way the testing was performed and the benchmark test set-up is the same for each benchmark infrastructure. Each section describes the common set-up and procedures used to benchmark, and then the specific results for the benchmarks are provided for each benchmark infrastructure.

3.2.1 DA-MP Relay Benchmark

This benchmarking case illustrates conditions for an overload of a DSR DA-MP.

3.2.1.1 Topology

The following figure illustrates the logical topology used for this testing. Diameter traffic is generated by an MME simulator and sent to an HSS simulator.

Figure 3-1 DA-MP Relay Testing Topology

The dsr.cpu utilization can be increased further through configuration changes: set the DOC/CL1/CL2 discards to 0 and enable multi-queuing on all hosts. Note that with this configuration, all discards for incoming and outgoing messages occur in a single step, at CL3.

3.2.1.2 Message Flow

The following figure illustrates the Message sequence for this benchmark case.

Figure 3-2 DA-MP Relay Message Sequence

Table 3-1 Relay Performance Benchmarking

Scenario	Call Flow Model	DSR MPS Achieved	DA-MP Flavor	DA-MP Profile	Avg Msg Size	CPU Peak	RAM Utilization Peak
Relay	100% Relay	288K	12 vCPU (Regular)	30K_MPS	2.0 K	23%	31%
Relay (with DOC/CL1/CL2 discards set to 0 and multi-queuing enabled on all hosts)	100% Relay	576K	12 vCPU (Regular)	40K_MPS_FABR	2.0 K	36%	32%
Relay	100% Relay	560K	18 vCPU (Large)	35K_MPS	2.0 K	32%	28%

3.2.1.3 Indicative Alarms or Events

During benchmark testing, the following alarms or events were observed when the DA-MP reached congestion.

Table 3-2 DA-MP Relay Alarms or Events

Number	Severity	Server	Name	Description
22008	Info	DA-MP	Orphan Answer Response Received	An answer response was received for which no pending request transaction existed resulting in the Answer message being discarded.
22201	Minor	DA-MP	MpRxAllRate	DA-MP ingress message rate threshold crossed.
22225	Minor	DA-MP	MpRxDiamAllLen	DA-MP diameter average ingress message length threshold crossed.

3.2.2 RBAR Benchmark

Range Based Address Resolution (RBAR) is a DSR-enhanced routing application that allows the routing of Diameter end-to-end transactions based on Diameter Application ID, Command Code, Routing Entity Type, and Routing Entity Addresses (range and individual) as a Diameter Proxy Agent.

A Routing Entity can be:

A User Identity:
- International Mobile Subscriber Identity (IMSI)
- Mobile Subscriber Integrated Services Digital Network (Number) (MSISDN)
- IP Multimedia Private Identity (IMPI)
- IP Multimedia Public Identity (IMPU)
An IP Address associated with the User Equipment:
- IPv4 (based upon the full 32-bit value in the range of 0x00000000 to 0xFFFFFFFF)
- IPv6-prefix (1 to 128 bits)
A general-purpose data type: UNSIGNED16 (16-bit unsigned value)

Routing resolves to a Destination that can be configured with any combination of a Realm and Fully Qualified Domain Name (FQDN): Realm-only, FQDN-only, or Realm and FQDN.

When a message successfully resolves to a destination, RBAR replaces the destination information (Destination-Host and/or Destination-Realm) in the ingress message with the corresponding values assigned to the resolved destination and forwards the message to the (integrated) Diameter Relay Agent for egress routing into the network.

3.2.2.1 Topology

The following figure illustrates the logical topology used for this testing. Diameter traffic is generated by an MME simulator and sent to an HSS simulator.

Figure 3-3 RBAR Testing Topology

3.2.2.2 Message Flow

The following figure illustrates the Message sequence for this benchmark case.

Figure 3-4 DA-MP RBAR Message Sequence

Table 3-3 RBAR Performance Benchmarking

Scenario	Call Flow Model	DSR MPS Achieved	DA-MP Flavor	DA-MP Profile	Avg Msg Size	CPU Peak	RAM Utilization Peak
RBAR	100% RBAR	256K	12 vCPU (Regular)	30K_MPS	2.0 K	30%	30%

3.2.2.3 Indicative Alarms or Events

During benchmark testing, the following alarms or events were observed when the DA-MP reached congestion.

Table 3-4 DA-MP RBAR Alarms or Events

Number	Severity	Server	Name	Description
22008	Info	DA-MP	Orphan Answer Response Received	An answer response was received for which no pending request transaction existed resulting in the Answer message being discarded.
22225	Minor	DA-MP	MpRxDiamAllLen	DA-MP diameter average ingress message length threshold crossed.

3.2.3 Full Address Based Resolution (FABR - SDS) Capacity

The FABR application adds a Database Processor (DP) server to perform database lookups with a user defined key (IMSI, MSISDN, or Account ID and MSISDN or IMSI). If the key is contained in the database, the DP returns the realm and FQDN associated with that key. The returned realm and FQDN can be used by the DSR Routing layer to route the connection to the desired endpoint. Since there is additional work done on the DA-MP to query the DP, running the FABR application has an impact on the DA-MP performance. This section contains the performance of the DA-MP while running FABR as well as benchmark measurements on the DP itself.

3.2.3.1 Topology

Figure 3-5 SDS DP Testing Topology

SDS DB Details

The SDS database was first populated with subscribers. This population simulates real-world scenarios likely encountered in a production environment and ensures the database is of substantial size to be queried against.

SDS DB Size: 780 Million Routing Entities (260 Million Subscribers having 2 IMSI, 1 MSISDN)
AVP Decoded: User-Name for IMSI

3.2.3.2 Message Flow

Figure 3-6 SDS DP Message Sequence

Table 3-5 SDS DP Performance Benchmarking

Scenario	Call Flow Model	DP MPS Achieved	DA-MP MPS Achieved	DA-MP Flavor	DA-MP Profile	Avg Msg Size	CPU Peak	RAM Utilization Peak
FABR	100% FABR	80K	160K	12 vCPU (Regular)	30K_MPS	2.0 K	26%	44%

3.2.3.3 Indicative Alarms or Events

Table 3-6 SDS DP Alarms or Events

Number	Severity	Server	Name	Description
19814	Info	DA-MP	Communication Agent Peer has not responded to heartbeat	Communication Agent Peer has not responded to heartbeat.
19825	Major, Critical	DA-MP	Communication Agent Transaction Failure Rate	The number of failed transactions during the sampling period has exceeded configured thresholds.
19832	Info	DA-MP	Communication Agent Reliable Transaction Failed	Communication Agent Reliable Transaction Failed.
22008	Info	DA-MP	Orphan Answer Response Received	An answer response was received for which no pending request transaction existed resulting in the Answer message being discarded.
22225	Minor	DA-MP	MpRxDiamAllLen	DA-MP diameter average ingress message length threshold crossed.
22606	Info	DA-MP	Database or DB connection error	FABR application received service notification indicating Database (DP) or DB connection (COM Agent) Errors (DP timeout, errors or COM Agent internal errors) for the sent database query.
31000	Critical	DA-MP	S/W Fault	Program impaired by s/w fault

3.2.4 Full Address Based Resolution (FABR-UDR) Capacity

FABR is a DSR application that provides an enhanced DSR routing capability to enable network operators to resolve the designated Diameter server (IMS HSS, LTE HSS, PCRF, OCS, OFCS, and AAA) addresses based on individual user identity addresses in the incoming Diameter request messages. It offers enhanced functionalities with User Data Repository (UDR), which is used to store subscriber data. FABR routes the message as a Diameter Proxy Agent based on request message parameter content.

FABR uses the services of the Diameter Plug-In for sending and receiving Diameter messages from or to the network. It uses Communication Agent to interact with the off-board data repository (UDR) for address resolution. This section contains the performance of the DA-MP while running FABR.

3.2.4.1 Topology

Figure 3-7 FABR with UDR Testing Topology

UDR DB Details

The UDR database was first populated with subscribers. This population simulates real-world scenarios likely encountered in a production environment and ensures the database is of substantial size to be queried against.

UDR DB Size: Tested with 40 Million records
AVP Decoded: User-Name for IMSI

The following UDR profile was used for benchmarking.

Table 3-7 UDR Profile

vCPU	RAM (GB)	HDD (GB)
18	70	400

3.2.4.2 Message Flow

Figure 3-8 FABR with UDR Message Sequence

Table 3-8 FABR with UDR Performance Benchmarking

Scenario	Call Flow Model	DSR MPS Achieved	DA-MP Flavor	DA-MP Profile	Avg Msg Size	CPU Peak	RAM Utilization Peak
FABR + Relay	70% FABR + 30% Relay	288K (18K/MP)	12 vCPU (Regular)	30K_MPS	2.0 K	34%	46%

3.2.4.3 Indicative Alarms or Events

Table 3-9 FABR with UDR Alarms or Events

Number	Severity	Server	Name	Description
22008	Info	DA-MP	Orphan Answer Response Received	An Answer response was received for which no pending request transaction existed resulting in the Answer message being discarded.
22225	Minor	DA-MP	MpRxDiamAllLen	DA-MP diameter average ingress message length threshold crossed.

3.2.5 vSTP MP

The vSTP-MP server type is a virtualized STP that supports M2PA, M3UA, and TDM. It can be deployed either with other DSR functionality as a combined DSR or vSTP, or as a standalone virtualized STP without any DSR functionality.

3.2.5.1 vSTP MP Benchmarking

The following table describes the feature-wise vSTP MP benchmarking.

Table 3-10 Feature-wise vSTP MP Benchmarking

Scenario	Call Flow Model	vSTP MPS Achieved	SS7-MP Flavor	CPU Peak (%)
SFAPP + MNP + GTT	2K (MNP + SFAPP) + 6K GTT	18K/MP	8 vCPU	22
SFAPP + MNP + GFLEX + GTT	2K (MNP + SFAPP) + 1K GFLEX + 4K GTT	18K/MP	8 vCPU	19
TIF + GTT	5K MNP+ 10K GTT	20K/MP	8 vCPU	32
vMNP + GTT	5K MNP+ 10K GTT	20K/MP	8 vCPU	29
GFLEX + GTT	5K MNP+ 10K GTT	20K/MP	8 vCPU	46
INPQ + GTT	5K MNP+ 10K GTT	20K/MP	8 vCPU	9
GTT + MTP Routing with MTP screening (M2PA & M3UA)	16K MPS	16K/MP	8 vCPU	45
GTT + MTP Routing	20K MPS	20K/MP	8 vCPU	51
vEIR	5K	5K/MP	8 vCPU	21
Elynx (E1/T1 Card) – GTT Relay	10K TDM + 10K GTT	20K/MP	8 vCPU	29
ENUM	5K	5K/MP	8 vCPU	11
DNS	10K	10K/MP	8 vCPU	2
vSTP – Home SMS	MO-FSM AllowList + BlockList Traffic (10K + 10K)	10K/SS7 (2 MPs) 20K /Proxy MP	8 vCPU	29

Note:

For ENUM, a new vENUM-MP is introduced. vENUM sends messages to UDR over the ComAgent interface.
Default timer values are supported when vSTP is configured to operate at 10K MPS for each MP.
When vSTP is configured to operate at 20K MPS, the t1Timer to t5Timer values have to be updated. For information about the updated timer values, see MMI API Specification.

3.2.6 Policy DRA (PDRA) Benchmarking

The Policy DRA (PDRA) application adds two additional database components, the SBR (session) (SBR-s) and the SBR (binding) (SBR-b). The DA-MP performance was also measured since the PDRA application puts a different load on the DA-MP than either running Relay or FABR traffic. There are two sizing metrics when determining how many SBR-s or SBR-b server groups (that is, horizontal scaling units) are required. The first is the MPS traffic rate seen at the DA-MPs. This is the metric that is benchmarked in this document. The second is the number of bindings (SBR-b) or sessions (SBR-s) that can be supported. This session or binding capacity is set primarily by the memory sizing of the VM and is fixed at a maximum of 16 million per SBR from the DSR 8.3 release. The number of bindings and sessions required for a given network is customer dependent, but a good starting place for engineering is to assume:

The number of bindings is equal to the number of subscribers supported by the PCRFs.
The number of sessions is equal to the number of subscribers times the average number of IPCAN sessions required for each subscriber. For instance, a subscriber might have one IPCAN session for LTE, and one for VoLTE.

Note:
The number of sessions is equal to or greater than the number of bindings.
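
The starting assumptions above can be turned into a rough capacity estimate. The following is a minimal sketch, not an Oracle tool: the function names, the 20 million subscriber figure, and the average of two IPCAN sessions per subscriber are hypothetical illustrations. Only the 16 million per-SBR ceiling comes from the text. It covers the capacity metric only and ignores the MPS metric.

```python
import math

# Per-SBR session or binding ceiling stated in the text (DSR 8.3 onward).
MAX_RECORDS_PER_SBR = 16_000_000

def estimate_sbr_records(subscribers, avg_ipcan_sessions_per_subscriber):
    """Return (bindings, sessions) under the starting assumptions:
    one binding per subscriber, sessions = subscribers x average IPCAN sessions."""
    bindings = subscribers
    sessions = math.ceil(subscribers * avg_ipcan_sessions_per_subscriber)
    return bindings, sessions

def sbrs_for_capacity(records, max_per_sbr=MAX_RECORDS_PER_SBR):
    """Minimum number of SBRs needed to hold `records` entries (capacity only)."""
    return max(1, math.ceil(records / max_per_sbr))

# Hypothetical network: 20M subscribers, two IPCAN sessions each (LTE + VoLTE).
bindings, sessions = estimate_sbr_records(20_000_000, 2.0)
print(bindings, sessions)            # 20000000 40000000
print(sbrs_for_capacity(bindings))   # 2 SBR-b
print(sbrs_for_capacity(sessions))   # 3 SBR-s
```

The result is a lower bound based on memory capacity alone; the MPS rate at the DA-MPs may require more SBRs than this estimate.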

3.2.6.1 Topology

Figure 3-9 SBR Testing Topology

3.2.6.2 Message Flow

Figure 3-10 PDRA Message Sequence

The following table shows the call model used for the testing. The message distribution is Oracle’s baseline for benchmarking and may differ significantly from customer distributions, based on factors such as the penetration of LTE support in comparison with VoLTE support. The Traffic Details columns show the configured PDRA options. For more details on these options, see Oracle Communications Diameter Signaling Router Policy and Charging Application User Guide.

Table 3-11 PDRA Test Call Model

Messages			Traffic Details
Message	Count	Distribution	Message	Distribution
CCR-I, CCA-I	1	7.14%	Gx with MSISDN Alternative Key, Gx Topology Hiding	100%
CCR-U, CCA-U	3	21.42%	Gx Topology Hiding	100%
CCR-T, CCA-T	1	7.14%	Gx Topology Hiding	100%
Gx RAR, RAA	3	21.42%	Gx Topology Hiding	100%
AAR, AAA Initial	2	14.29%	Rx Topology Hiding	100%
STR, STA	2	14.29%	Rx Topology Hiding	100%
Rx RAR, RAA	2	14.29%	Rx Topology Hiding	100%
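
The Distribution column follows from the Count column: each share is the row's count divided by the total count of 14. The sketch below recomputes the shares from the counts in Table 3-11, so a customer call model can be substituted for comparison. The dictionary and function names are illustrative assumptions.

```python
# Message counts taken from the Count column of Table 3-11.
CALL_MODEL = {
    "CCR-I, CCA-I": 1,
    "CCR-U, CCA-U": 3,
    "CCR-T, CCA-T": 1,
    "Gx RAR, RAA": 3,
    "AAR, AAA Initial": 2,
    "STR, STA": 2,
    "Rx RAR, RAA": 2,
}

def distribution(call_model):
    """Return each message type's share of the call model, in percent."""
    total = sum(call_model.values())
    return {name: 100.0 * count / total for name, count in call_model.items()}

for name, share in distribution(CALL_MODEL).items():
    print(f"{name:18s} {share:6.2f}%")
```

Rounded to two decimals, 3/14 prints as 21.43%, whereas the table lists 21.42%; the table value appears to be truncated rather than rounded.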

Table 3-12 PDRA Performance Benchmarking

Scenario	Call Flow Model	SBR MPS Achieved	DA-MP Flavor	DA-MP Profile	Avg Msg Size	DA-MP CPU Peak	RAM Utilization Peak
Single Server group (1 SBR(s), 1 SBR(b))	100% PDRA	50K	12 vCPU (Regular)	30K_MPS	600	52%	29%
Single Server group (4 SBR(s), 4 SBR(b))	100% PDRA	200K	12 vCPU (Regular)	30K_MPS	648	56%	34%
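
The two rows above are consistent with roughly 50K MPS for each SBR(s)/SBR(b) pair (1 pair at 50K, 4 pairs at 200K). The following sketch uses that figure to estimate the number of pairs for a target rate. It assumes the scaling stays linear beyond the two measured points, and the `headroom` parameter and function name are hypothetical additions, not part of the benchmark.

```python
import math

# Table 3-12: 1 SBR(s) + 1 SBR(b) sustained 50K MPS in the benchmark.
MPS_PER_SBR_PAIR = 50_000

def sbr_pairs_for_mps(target_mps, headroom=0.0, mps_per_pair=MPS_PER_SBR_PAIR):
    """SBR(s)/SBR(b) pairs needed for target_mps.

    `headroom` is the fraction (0 <= headroom < 1) of each pair's benchmarked
    rate to keep in reserve; it is an illustrative planning knob.
    """
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    usable = mps_per_pair * (1.0 - headroom)
    return max(1, math.ceil(target_mps / usable))

print(sbr_pairs_for_mps(200_000))        # 4, matching the second table row
print(sbr_pairs_for_mps(130_000, 0.2))   # 4, with 20% of each pair held back
```

The final SBR count is the larger of this MPS-based figure and the figure required by the binding or session capacity.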

3.2.6.3 Indicative Alarms or Events

Table 3-13 PDRA Alarms or Events

Number	Severity	Server	Name	Description
19814	Info	DA-MP	Communication Agent Peer has not responded to heartbeat	Communication Agent Peer has not responded to heartbeat.
19825	Major, Critical	DA-MP	Communication Agent Transaction Failure Rate	The number of failed transactions during the sampling period has exceeded configured thresholds.
19832	Info	DA-MP	Communication Agent Reliable Transaction Failed	Communication Agent Reliable Transaction Failed
22008	Info	DA-MP	Orphan Answer Response Received	An answer response was received for which no pending request transaction existed resulting in the Answer message being discarded.
22328	Minor	DA-MP	IcRate	Connection ingress message rate threshold crossed
22704	Info	DA-MP	Communication Agent Error	Policy and Charging server to SBR server communication failure
22705	Info	DA-MP	SBR Error Response Received	Policy and Charging server received response from SBR server indicating SBR errors
22714	Info	SBR	SBR RAR Initiation Error	SBR encountered an error while processing PCA initiated RAR requests
22716	Info	SBR	SBR Audit Statistics Report	SBR Audit Statistics Report
22718	Info	DA-MP	Binding Not Found for Binding Dependent Session Initiate Request	Binding record is not found for the configured binding keys in the binding dependent session-initiation request message
22741	Info	DA-MP	Failed to route PCA generated RAR	RAA with Unable To Deliver (3002) error is received at PCA for the locally generated RAR
31232	Critical, Minor	DA-MP	HA Late Heartbeat Warning	High availability server has not received a message on specified path within the configured interval
31236	Major	DA-MP	HA Link Down	High availability TCP link is down

3.2.7 Diameter Security Application (DSA) Benchmarking

The Diameter Security Application (DSA) applies countermeasures to ingress messages received from external foreign networks and to egress messages sent to external foreign networks. Different countermeasure profiles can be created for different IPX providers or roaming partners by enabling or disabling countermeasures individually for the Diameter peers of each IPX provider or roaming partner. The DSA application is enabled on the DA-MP and uses vUDR to store context information.

3.2.7.1 Topology

Figure 3-11 DSA Testing Topology

The following table lists the stateful and stateless countermeasure application configuration and the modes of operation used in the benchmarking tests.

Table 3-14 Stateful and Stateless Counter Measures

Application Configuration Data
Table Name	Count of Configured Entries
AppCmdCst_Config	2
AppIdWL_Config	1
AVPInstChk_Config	48
Foreign_WL_Peers_Cfg_Sets	14
MCC_MNC_List	11
MsgRateMon_Config	1
Realm_List	6
Security_Countermeasure_Config	19
SpecAVPScr_Config	1
System_Config_Options	1
TimeDistChk_Config	2000
TTL_Config	5
VplmnORCst_Config	1
TimeDistChk_Country_Config	2
TimeDistChk_Exception_List	0
TimeDistChk_Continent_Config	15
VplmnORCst_Config	1
RealmIMSICst_Config	210
Exception_Rule_Config	0
All Exception Types Table IMSI_Exception_Config MCC_MNC_Exception_Config Origin_Host_Exception_Config Realm_Exception_Config VPLMN_ID_Exception_Config	0

General Options Settings
Options	Values
Opcodes Accounting	Disabled
Max. UDR Queries per Message	5
Max. Size of Application State	4800
Logging of Vulnerable Messages	Enabled

Application Threads
Options	Values
Request	6
Answer	4
SbrEvent	4
AsyncEvent	2

Note:

The following error is received during a performance run if the call rate is more than 1.7K for each DSA MP:

UDR Internal Error: Create record failed. Error Code = SendError

This is caused by the ComAgent connection timing out because the time to live (TTL) has expired:

Communication Agent Reliable Transaction Failed}  .. GN_INFO/INF Failure reason = Time to live limit exceeded

To avoid this, run the following commands from the active DSR NOAM before running performance traffic:

iset -fvalue=400 ComAgtConfigParams where "name='IntraNe Maximum Timeout Value'"
iset -fvalue=3 ComAgtConfigParams where "name='Maximum Number Of Retries'"

Table 3-15 DSA Performance Benchmarking

Counter Measure (CM)	Call Flow Model	DSR MPS Achieved (Per DA-MP)	DA-MP Flavor	UDR Flavor	DA-MP Profile	Avg Msg Size	CPU Peak	RAM Utilization Peak
Previous_Location_Check	15% Vulnerable and 85% Non Vulnerable traffic	9.2K	12 vCPU (Regular)	18vCPU	30K_MPS	2.0 K	29%	63%
Time_Distance_Check	15% Vulnerable and 85% Non Vulnerable traffic	9.8K	12 vCPU (Regular)	18vCPU	30K_MPS	2.0 K	34%	63%
Source_Host_Validation_Hss	15% Vulnerable and 85% Non Vulnerable traffic	9.6K	12 vCPU (Regular)	18vCPU	30K_MPS	2.0 K	34%	62%
Source_Host_Validation_Mme	15% Vulnerable and 85% Non Vulnerable traffic	10K	12 vCPU (Regular)	18vCPU	30K_MPS	2.0 K	33%	63%
Message_Monitoring_Rate	15% Vulnerable and 85% Non Vulnerable traffic	10K	12 vCPU (Regular)	18vCPU	30K_MPS	2.0 K	30%	63%
Session_Integrity_Validation_Chk	15% Vulnerable and 85% Non Vulnerable traffic	10K	12 vCPU (Regular)	18vCPU	30K_MPS	2.0 K	39%	63%
All Stateful	15% Vulnerable and 85% Non Vulnerable traffic	7.02K	12 vCPU (Regular)	18vCPU	30K_MPS	2.0 K	27%	51%
All Stateful + All stateless	15% Vulnerable and 85% Non Vulnerable traffic	5.25K	12 vCPU (Regular)	18vCPU	30K_MPS	2.0 K	23%	54%
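
The per-DA-MP rates in Table 3-15 can be used for a first-pass estimate of how many DA-MPs a DSA deployment needs. The sketch below is an illustration only: it assumes throughput scales linearly across DA-MPs, does not model vUDR capacity, and uses a hypothetical function name and target rate.

```python
import math

# MPS achieved per DA-MP, copied from Table 3-15.
DSA_MPS_PER_DA_MP = {
    "Previous_Location_Check": 9_200,
    "Time_Distance_Check": 9_800,
    "Source_Host_Validation_Hss": 9_600,
    "Source_Host_Validation_Mme": 10_000,
    "Message_Monitoring_Rate": 10_000,
    "Session_Integrity_Validation_Chk": 10_000,
    "All Stateful": 7_020,
    "All Stateful + All stateless": 5_250,
}

def da_mps_needed(target_mps, countermeasure):
    """DA-MPs needed to carry target_mps with the given countermeasure set enabled."""
    per_mp = DSA_MPS_PER_DA_MP[countermeasure]
    return math.ceil(target_mps / per_mp)

# Hypothetical target of 40K MPS with every countermeasure enabled.
print(da_mps_needed(40_000, "All Stateful + All stateless"))  # 8
print(da_mps_needed(40_000, "Previous_Location_Check"))       # 5
```

Enabling all stateful and stateless countermeasures together roughly halves the per-DA-MP rate compared with a single countermeasure, so the chosen profile has a direct effect on the DA-MP count.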