Rebalancing

Rebalancing, as opposed to balancing, takes some number of existing endpoints from functioning OCSBCs and redistributes them among the current cluster members. Rebalancing can be scheduled automatically when a new OCSBC joins an existing cluster, or invoked immediately from the Acme Packet Command Line Interface (ACLI). When an OCSBC exits a cluster, whatever the reason, all of its endpoints are invalidated on the Oracle Communications Subscriber-Aware Load Balancer (SLB), and those endpoints are in effect balanced afresh when they revisit the SLB.

A new OCSBC joins an existing cluster by initiating the establishment of an IP-in-IP tunnel between itself and the SLB. During an initial handshake the OCSBC designates which SLB service port or ports it is prepared to support. If there are existing OCSBCs supporting these designated service ports, the SLB instructs some or all of these OCSBCs to divest themselves of a specified number of endpoints. The SLB calculates the number of divested endpoints based upon the overall occupancy of that service relative to the SLB’s contribution to that occupancy. Existing cluster members not advertising support for service ports designated by the new cluster member are excluded from the rebalance queue.
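The eligibility rule in the paragraph above can be sketched as a simple filter. This is a minimal illustration, not the SLB's implementation: the ClusterMember type, the build_rebalance_queue function, and the port labels are hypothetical names chosen for the example, and the calculation of how many endpoints each member must divest is deliberately left out because the text does not give the formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterMember:
    # Hypothetical model of an OCSBC as seen by the SLB.
    name: str
    service_ports: frozenset = frozenset()


def build_rebalance_queue(members, new_member):
    """Return existing members that advertise at least one service port
    also designated by the new member; all others are excluded."""
    return [
        m for m in members
        if m.name != new_member.name
        and m.service_ports & new_member.service_ports
    ]


existing = [
    ClusterMember("sbc-a", frozenset({"sip-5060"})),
    ClusterMember("sbc-b", frozenset({"sip-5060", "sip-5061"})),
    ClusterMember("sbc-c", frozenset({"sip-5070"})),
]
joiner = ClusterMember("sbc-d", frozenset({"sip-5060"}))

queue = build_rebalance_queue(existing, joiner)
print([m.name for m in queue])  # ['sbc-a', 'sbc-b']
```

Here sbc-c is left out of the queue because it advertises no service port in common with the joining member.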

The SLB sequences through eligible cluster members one at a time, using the proprietary Cluster Control Protocol (CCP) to request nomination and removal of eligible endpoints. The OCSBC replies with a CCP response that lists candidate endpoints. The SLB removes the existing forwarding rules associated with those endpoints, and repeats the CCP request/response process until the cluster member has divested itself of the specified number of endpoints.

When the divested endpoints re-engage with the SLB (upon their next scheduled registration refresh, for example), the SLB lacks a forwarding rule that maps them to a specific OCSBC. Consequently, the message is passed up to the software processes running on the SLB’s host, which choose a new OCSBC destination for that endpoint; presumably, this is the new cluster member, because it has the most available capacity.
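The behavior of a missing forwarding rule can be sketched as a table lookup with a fallback. This is an illustrative model only: the Slb class, its method names, and the free_capacity metric are assumptions made for the example, and the "pick the member with the most free capacity" rule stands in for a selection policy that the text describes only as presumable.

```python
class Slb:
    def __init__(self, free_capacity):
        # OCSBC name -> free endpoint slots (hypothetical capacity metric).
        self.free_capacity = dict(free_capacity)
        # Endpoint -> OCSBC name; stands in for the SLB forwarding rules.
        self.forwarding_rules = {}

    def remove_rule(self, endpoint):
        """Drop the forwarding rule for a divested endpoint."""
        old = self.forwarding_rules.pop(endpoint, None)
        if old is not None:
            self.free_capacity[old] += 1

    def route(self, endpoint):
        """Return the OCSBC for an endpoint message."""
        target = self.forwarding_rules.get(endpoint)
        if target is None:
            # No rule: the host software picks a destination. Assumed policy:
            # the member with the most available capacity.
            target = max(self.free_capacity, key=self.free_capacity.get)
            self.forwarding_rules[endpoint] = target
            self.free_capacity[target] -= 1
        return target


slb = Slb({"sbc-a": 10, "sbc-new": 500})
slb.forwarding_rules["alice"] = "sbc-a"

before = slb.route("alice")   # rule exists, so the message goes to sbc-a
slb.remove_rule("alice")      # endpoint divested during rebalancing
after = slb.route("alice")    # no rule, so a new destination is chosen
print(before, after)          # sbc-a sbc-new
```

Nothing happens to the endpoint at the moment its rule is removed; the reassignment occurs only when its next message arrives.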

The cluster member, when asked to nominate endpoints for rebalancing, uses several criteria to choose the most attractive candidates. As part of its standard SIP processing, the cluster member is aware of the expiry times for all of the entries in its SIP registration cache. Therefore, the cluster member can predict with a high degree of accuracy when any given endpoint will next signal back into the cluster. Because the forwarding rules on the SLB are triggered by endpoint messages, the cluster member considers an endpoint whose registration entry is due to expire shortly an attractive candidate for rebalance. Note, however, that in many cases it is not prudent to nominate endpoints whose SIP registration cache entries are due to expire immediately, as this can cause a race condition between the CCP response and the SIP REGISTER message from the endpoint to the SIP registration function. To avoid this race, cluster members are able to skip ahead to candidates whose expiry is not immediate.

Further, each cluster member categorizes the endpoints stored in its cache based upon a priority value that is determined via the SLB’s distribution policy (see Distribution Policy Configuration for more details). It nominates endpoints from its lowest priority buckets first.

Finally, the SLB does not rebalance an active SIP endpoint, that is, an endpoint engaged in a phone conversation.
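The nomination criteria above (expiry time, the skip-ahead for imminent expiries, priority buckets, and the exclusion of active endpoints) can be combined in a short sketch. All names here are hypothetical, and several details are assumptions rather than documented behavior: the 5-second min_expiry threshold, the convention that a smaller priority number means a lower-priority bucket, and the choice to sort by priority first and by expiry second.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    # Hypothetical view of one SIP registration cache entry.
    endpoint: str
    expires_in: float      # seconds until the registration expires
    priority: int          # assumed: smaller value = lower-priority bucket
    in_call: bool = False  # True while the endpoint is in an active call


def nominate(cache, count, min_expiry=5.0):
    """Pick up to `count` rebalance candidates from the cache."""
    eligible = [
        e for e in cache
        if not e.in_call               # never nominate active endpoints
        and e.expires_in >= min_expiry # skip entries about to expire (race risk)
    ]
    # Lowest-priority bucket first; within a bucket, soonest expiry first.
    eligible.sort(key=lambda e: (e.priority, e.expires_in))
    return [e.endpoint for e in eligible[:count]]


cache = [
    CacheEntry("alice", 2, 1),                # expires too soon: skipped
    CacheEntry("bob", 30, 1),
    CacheEntry("carol", 20, 2),
    CacheEntry("dave", 15, 1, in_call=True),  # in a call: skipped
    CacheEntry("erin", 600, 1),
]

nominated = nominate(cache, 2)
print(nominated)  # ['bob', 'erin']
```

In this ordering the priority bucket dominates, so erin is nominated ahead of carol even though carol's registration expires sooner. A real cluster member may weigh the two criteria differently.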

After removing endpoints from the first cluster member, the SLB moves to the next cluster member in the rebalance queue and uses the same CCP request/response exchange to remove additional endpoints. The same procedure repeats for additional cluster members until the SLB attains the target number of divested endpoints.
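The overall sequence, one cluster member at a time with repeated request/response rounds, can be sketched as a nested loop. This is a toy model under stated assumptions: the Member class, handle_request, the per-member quotas, and batch_size are all hypothetical, and the method call merely stands in for a CCP exchange whose actual message format is proprietary and not described here.

```python
class Member:
    """Hypothetical stand-in for an OCSBC in the rebalance queue."""

    def __init__(self, name, endpoints):
        self.name = name
        self.endpoints = list(endpoints)

    def handle_request(self, max_candidates):
        """Stand-in for a CCP response: nominate up to max_candidates."""
        nominated = self.endpoints[:max_candidates]
        del self.endpoints[:max_candidates]
        return nominated


def rebalance(queue, quotas, forwarding_rules, batch_size=2):
    """Visit members in order, repeating request/response rounds until each
    member has divested its quota or has nothing left to nominate."""
    divested = []
    for member in queue:
        remaining = quotas[member.name]
        while remaining > 0:
            candidates = member.handle_request(min(batch_size, remaining))
            if not candidates:
                break  # member has no more eligible endpoints
            for endpoint in candidates:
                forwarding_rules.pop(endpoint, None)  # remove the SLB rule
            divested.extend(candidates)
            remaining -= len(candidates)
    return divested


a = Member("sbc-a", ["a1", "a2", "a3", "a4"])
b = Member("sbc-b", ["b1", "b2"])
rules = {ep: m.name for m in (a, b) for ep in m.endpoints}

divested = rebalance([a, b], {"sbc-a": 3, "sbc-b": 1}, rules)
print(divested)       # ['a1', 'a2', 'a3', 'b1']
print(sorted(rules))  # ['a4', 'b2']
```

With a batch size of 2, sbc-a needs two rounds to give up three endpoints, after which the loop moves on to sbc-b.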