Diameter Policy Server High Availability

External Policy Server HA Cluster

When using TCP transport, the Oracle Communications Session Border Controller (OCSBC) can provide external policy server redundancy through a combination of multiple servers being returned in one FQDN query and maintaining state of these servers.

Note:

Server configurations using SCTP transport (for the Rx interface) use SCTP multihoming configurations on the OCSBC and/or as provided by the server to establish redundancy.

When multiple IP addresses are returned in a response to a DNS query for Diameter-based external policy servers, the OCSBC assembles the IP addresses into an HA cluster which provides redundancy for Diameter-based applications. This feature is enabled by configuring the external policy server's address parameter with an FQDN.

The number of servers maintained in the HA cluster is configured with the max-connections parameter. For example, if max-connections is set to 3, the OCSBC maintains 1 active external policy server and 2 backup servers.

Standby Server Prioritization

When using TCP transport, the Oracle Communications Session Border Controller looks at the priority and weight fields in the DNS response to create the preferred order of the primary, secondary, tertiary, and quaternary Policy Servers. These four addresses are known as the top-level PSs in the cluster.

The Oracle Communications Session Border Controller uses priorities and weights according to RFC 2782. Weights only apply when multiple servers share the same priority value. If the priorities are different, the Oracle Communications Session Border Controller does not use weights. If the priorities are the same, the weights of the two (or more) contested servers determine which one to use.
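The ordering described above can be illustrated with a short Python sketch. It follows the RFC 2782 procedure (lower priority value first, weighted random selection among records that share a priority) and is not the OCSBC's actual implementation. The names SrvRecord and order_srv_records, and the limit of four top-level servers passed as a parameter, are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    target: str
    priority: int  # RFC 2782: a lower value is contacted first
    weight: int    # only relevant among records with equal priority


def order_srv_records(records, rng=None, limit=4):
    """Return up to `limit` records in RFC 2782 preference order.

    Hypothetical helper: the device's exact tie-breaking is not
    documented here, so this uses the RFC's weighted random selection.
    """
    rng = rng or random.Random()
    ordered = []
    for priority in sorted({r.priority for r in records}):
        # RFC 2782 places zero-weight records first within a priority level.
        group = sorted(
            (r for r in records if r.priority == priority),
            key=lambda r: r.weight != 0,
        )
        while group:
            total = sum(r.weight for r in group)
            threshold = rng.randint(0, total)
            running = 0
            for index, record in enumerate(group):
                running += record.weight
                if running >= threshold:
                    ordered.append(group.pop(index))
                    break
    return ordered[:limit]
```

When every record has a different priority the result is fully deterministic; weights influence the outcome only inside a group of equal-priority records.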

Server States

An HA cluster can contain up to 4 Policy Servers, where the TCP/Diameter connection is established and monitored. Diameter session traffic is only sent to the active policy server in the cluster. The policy servers exist in the following states:

  • active - TCP and Diameter connections are established. The Oracle Communications Session Border Controller uses this server for policy decisions. The policy server with the highest priority/weight begins in this state.
  • standby - TCP and Diameter connections are established. The server is in standby mode.
  • inactive - The Diameter connection was not successfully established. The Oracle Communications Session Border Controller tries to reconnect to inactive servers.

HA Cluster Refresh (TCP Transport)

The Oracle Communications Session Border Controller sends follow-up SRV queries to the DNS server to refresh the list of available policy servers in the cluster in the following instances:

  • DNS cache expires after the TTL is exceeded
  • a new policy server with FQDN for an address is configured, saved, and activated on the Oracle Communications Session Border Controller
  • after an SPU switchover, the newly active SPU performs a DNS query

When the Oracle Communications Session Border Controller re-queries the DNS server for Diameter external policy servers, the cluster is refreshed with the new/changed servers. Policy server priority is also reconfigured based on newly returned priorities and weights. Upon a cluster refresh, the Oracle Communications Session Border Controller:

  • closes connections with standby policy servers that are no longer in the cluster
  • creates connections with policy servers which are new in the set

If the currently active policy server remains a member of the cluster after a refresh, it remains active even if its priority has changed. If the previously active policy server is not a member of the cluster after a DNS refresh, the Oracle Communications Session Border Controller gracefully closes the connection to this server. The Oracle Communications Session Border Controller then installs the highest priority server as the active policy server.
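As a rough illustration of the refresh rules above, the following Python sketch computes which connections to close, which to open, and which server is active after a refresh. The function name refresh_cluster and its arguments are hypothetical. The sketch also assumes the refreshed list is already sorted by preference, and it ignores connection state (for example, whether the highest priority server is reachable).

```python
def refresh_cluster(active, connected, refreshed):
    """Apply a DNS refresh to a cluster (illustrative sketch only).

    active:    address of the currently active policy server
    connected: addresses the device currently holds connections to
    refreshed: addresses from the new DNS answer, most preferred first
    Returns (new_active, to_close, to_open).
    """
    refreshed_set = set(refreshed)
    # Servers that dropped out of the DNS answer are disconnected.
    to_close = [s for s in connected if s not in refreshed_set]
    # Servers that are new in the answer get a fresh connection.
    to_open = [s for s in refreshed if s not in connected]
    if active in refreshed_set:
        # The active server keeps its role even if its priority changed.
        new_active = active
    else:
        new_active = refreshed[0] if refreshed else None
    return new_active, to_close, to_open
```

For example, if 10.0.0.2 is active and the refreshed answer is ["10.0.0.3", "10.0.0.2"], the active server stays 10.0.0.2 even though 10.0.0.3 is now preferred.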

DNS Failure

If the Oracle Communications Session Border Controller fails to get a response from the DNS server or does not receive at least one IP address in the SRV RR, it continues to send SRV queries periodically, starting with 5 seconds and doubling the interval for every sequential failure, until it receives a valid response. While waiting for a successful DNS response, the Oracle Communications Session Border Controller uses the existing Diameter servers in the DNS cache.
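The retry schedule described above (5 seconds, doubling after each consecutive failure) can be sketched in Python as follows. The generator name srv_retry_intervals is hypothetical. No upper bound on the interval is stated in this section, so the sketch does not apply one; an actual implementation may cap the interval.

```python
from itertools import islice


def srv_retry_intervals(initial_seconds=5):
    """Yield the wait before each successive SRV retry, doubling each time."""
    interval = initial_seconds
    while True:
        yield interval
        interval *= 2


# First five retry intervals, in seconds.
first_five = list(islice(srv_retry_intervals(), 5))
print(first_five)  # [5, 10, 20, 40, 80]
```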

Policy Server Failover

When configured for TCP transport, the Oracle Communications Session Border Controller fails over from the active external policy server to the highest priority standby server when:

  • the TCP connection is closed due to a RST or FIN
  • the Diameter connection (CER/CEA exchange) could not be established
  • a configured number of consecutive Diameter message timeouts occurs. This number is configured in the max-timeouts parameter. The max-timeouts parameter applies to all Diameter messages except for the following:
      • Diameter keepalive messages. You can configure keepalive message timeouts separately from all other Diameter messages by setting the watchdog-ka-timer parameter in the external policy server configuration element. The active policy server then fails over based on a timeout value for DWR messages.
      • Diameter STR messages. You can configure STR message timeouts separately from all other Diameter messages by setting the str-retry=<timeout number> option in the external policy server configuration element. The active policy server then fails over based on a separate timeout value for STR messages.
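The consecutive-timeout trigger and the choice of the next server can be sketched in Python as shown below. The names ConsecutiveTimeoutTracker and select_failover_target are hypothetical and do not correspond to anything in the product. The sketch assumes a max-timeouts value of at least 1, because the behavior for 0 is not described in this section, and it assumes that any received answer resets the count, which is what "consecutive" implies.

```python
class ConsecutiveTimeoutTracker:
    """Count consecutive request timeouts against one policy server."""

    def __init__(self, max_timeouts):
        if max_timeouts < 1:
            # Assumption: the meaning of 0 is not covered by this sketch.
            raise ValueError("sketch assumes max_timeouts >= 1")
        self.max_timeouts = max_timeouts
        self.consecutive = 0

    def on_response(self):
        # Any answer breaks the run of consecutive timeouts.
        self.consecutive = 0

    def on_timeout(self):
        """Record a timeout; return True when failover should be triggered."""
        self.consecutive += 1
        return self.consecutive >= self.max_timeouts


def select_failover_target(cluster):
    """Pick the first standby server from a preference-ordered cluster.

    cluster: list of (address, state) tuples, most preferred first,
    where state is "active", "standby", or "inactive".
    """
    for address, state in cluster:
        if state == "standby":
            return address
    return None
```

With max_timeouts set to 3, two timeouts followed by a response and then two more timeouts do not trigger a failover; only three timeouts in a row do.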

If the Oracle Communications Session Border Controller sends an AAR/STR message to the active policy server and then switches to a different policy server, any new Diameter messages related to that session are sent to the original policy server as long as it is not inactive. If that server becomes inactive, messages are sent to the new policy server. The new policy server, however, does not recognize the session ID and discards the request.

External Policy Server High Availability Configuration

To configure an ext-policy-server that uses TCP transport for high availability clustering:

  1. In Superuser mode, type configure terminal and press Enter.
    ORACLE# configure terminal
  2. Type media-manager and press Enter to access the media-related configurations.
    ORACLE(configure)# media-manager
  3. Type ext-policy-server and press Enter. The system prompt changes to let you know that you can begin configuring individual parameters.
    ORACLE(media-manager)# ext-policy-server
    ORACLE(ext-policy-server)#
  4. address—Enter the IP address or FQDN of the external policy server. To use external policy server redundancy with TCP transport, you must enter the address as an FQDN.
  5. port—If you configure this parameter, it will override the port returned in a DNS reply.
  6. max-connections—Set the number of servers to maintain in an external policy server cluster. Valid values range from 1 - 20.
  7. srv-selection-strategy—Leave this parameter at its default setting, Failover.
  8. max-timeouts—Set the number of request timeouts before the Oracle Communications Session Border Controller sets this external policy server to inactive. You can separately set the number of DWR timeouts that trigger a server to be inactive as an option. Valid values range from 0 - 200.
  9. Save and activate your configuration.

Redundancy for Rx Servers over SCTP

The Oracle Communications Session Border Controller (OCSBC) uses multihomed IPs advertised in the SCTP handshake and the IP configured in the remote-multi-addr-list parameter to establish Rx server redundancy. You configure multi-homing within the ext-policy-server configuration and the applicable timing, on a global basis, in the system-config.

See Configuring the Rx Interface for SCTP for instructions on configuring multi-homing for an external policy server. See Configuring SCTP Support for SIP for global SCTP configuration including timing settings.

You must have a thorough understanding of the network and the routing path to the server prior to configuration to achieve connectivity and redundancy. You configure the ext-policy-server's primary IP address and multi-homing addresses based on this understanding. The OCSBC must be able to establish paths between itself and the ext-policy-server's address, as well as between itself and the multi-homing addresses, via separate network-interfaces and unique routes within the same realm.

Note:

Multi-homing configuration fields accept IPv4 and IPv6 IP addresses, but not FQDNs.

If you configure an FQDN for an ext-policy-server address, you must also configure a DNS server on the realm's first network-interface. When triggered to connect to these policy servers, the OCSBC sends the DNS queries on that network-interface.

The OCSBC connects to servers using the following steps:

  1. The OCSBC attempts to establish a connection between itself and the ext-policy-server's configured address through the first network-interface listed in the realm configuration. If you have configured an FQDN as an address, the OCSBC performs a DNS lookup and uses the first address provided as the server address.
  2. If the first connection attempt fails and you have not configured the remote-multi-addr-list parameter with at least one address, the connection to the server fails.
  3. If you have configured multi-homing addresses and the first connection attempt fails, the OCSBC attempts to establish a connection with the IP you configured in the remote-multi-addr-list parameter. This IP can be reachable through any of the configured network-interfaces present in the realm.
  4. If the OCSBC cannot determine a route, it tries to use a route between the local primary IP and the default gateway.
  5. During the INIT and INIT ACK exchange, the remote agent may advertise additional multi-homing addresses. If it receives these additional addresses, the OCSBC can use them as fail-over addresses should the existing connection subsequently fail.
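The address selection in the steps above can be summarized with a small Python sketch. It models only which remote addresses are candidates and in what order, not routing or interface selection. The function names sctp_connect_order and failover_addresses are hypothetical, and the ordering of configured addresses ahead of advertised ones in the fail-over list is an assumption made for illustration.

```python
def sctp_connect_order(primary, remote_multi_addr_list):
    """Remote addresses to try for the initial connection, in order.

    primary: the configured server address (or the first address
    returned by DNS when an FQDN is configured).
    """
    if not remote_multi_addr_list:
        # With no multi-homing addresses configured, a failed first
        # attempt means the connection to the server fails.
        return [primary]
    return [primary] + [a for a in remote_multi_addr_list if a != primary]


def failover_addresses(connected, configured, advertised):
    """Fail-over candidates once a connection to `connected` exists.

    Combines configured multi-homing addresses with any addresses the
    remote agent advertised in the handshake, without duplicates.
    """
    seen = {connected}
    result = []
    for addr in list(configured) + list(advertised):
        if addr not in seen:
            seen.add(addr)
            result.append(addr)
    return result
```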