PCA Data Auditing

P-DRA Binding/Session Database

In most cases, Binding and Session database records are successfully removed as a result of signaling to terminate Diameter sessions. There are, however, instances in which a database record that should have been removed is left behind. The following cases can result in stale Binding or Session records:
  • No Diameter session termination message is received when the UE no longer wants the session.
  • IP signaling network issues prevent communication between MPs that would have resulted in one or more records being deleted.
  • SBR congestion can cause the discarding of stack events that would otherwise have removed a Binding or Session record.

To limit the effects of stale Binding and Session records, all SBRs that own an active part of the database continually audit each table to detect and remove stale records. The audit is constrained by both minimum and maximum audit rates. The actual rate varies based on how busy the SBR server is. Audit has no impact on the engineered rate of signaling.

Binding table audits are confined to confirming with the Session SBR that the session still exists. If the session exists, the record is considered valid and the audit makes no changes. If the session does not exist, however, the record is considered to be an orphan and is removed by the audit.
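The binding audit rule above reduces to a single existence check per record. The following is a minimal sketch, assuming a hypothetical BindingRecord type and a hypothetical session_exists callback that stands in for the query a Binding SBR sends to the Session SBR; neither name comes from the product.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class BindingRecord:
    # Hypothetical record shape: only the fields this check needs.
    binding_key: str
    session_ref: str  # reference to the session held by a Session SBR


def audit_binding_records(
    records: Iterable[BindingRecord],
    # Hypothetical stand-in for the Session SBR existence query.
    session_exists: Callable[[str], bool],
) -> Tuple[List[BindingRecord], List[BindingRecord]]:
    """Split records into (kept, orphans) for one audit pass."""
    kept: List[BindingRecord] = []
    orphans: List[BindingRecord] = []
    for record in records:
        if session_exists(record.session_ref):
            kept.append(record)      # session still exists: record left unchanged
        else:
            orphans.append(record)   # no session: orphan, removed by the audit
    return kept, orphans
```

Passing the existence check in as a function keeps the sketch independent of how the Session SBR is actually reached; in a test it can simply be a set-membership lambda.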

Session table audits are based entirely on the valid session lifetime. When a session is created, it is given a lifetime during which the session is considered valid regardless of any signaling activity. Each time an RAA is processed for a session, its lifetime is renewed. The lifetime defaults to 7 days, but can be configured in one of two ways:
  • The default duration can be configured using the NOAM Policy and Charging > Configuration > General Options GUI page.
  • A session lifetime can be configured per Access Point Name using the NOAM Policy and Charging > Configuration > Access Point Names GUI page.
If the session-initiating message (CCR-I) contains a Called-Station-Id AVP (an Access Point Name) and that Access Point Name is configured on the Access Point Names GUI page, the session uses the lifetime associated with that Access Point Name. If the session-initiating message contains no Called-Station-Id AVP, or contains one whose Access Point Name is not configured on the Access Point Names GUI page, the default session lifetime from the General Options GUI page is used.
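The lifetime selection rule can be sketched as a lookup with a fallback. This is an illustration only: session_lifetime and its parameters are hypothetical names, apn_lifetimes stands in for the Access Point Names configuration, and default_lifetime stands in for the General Options value (7 days unless reconfigured).

```python
from datetime import timedelta
from typing import Mapping, Optional

# Documented default session lifetime.
DEFAULT_SESSION_LIFETIME = timedelta(days=7)


def session_lifetime(
    called_station_id: Optional[str],          # APN from the CCR-I, if present
    apn_lifetimes: Mapping[str, timedelta],    # hypothetical view of configured APNs
    default_lifetime: timedelta = DEFAULT_SESSION_LIFETIME,  # General Options value
) -> timedelta:
    """Pick the lifetime for a new session per the APN override rule."""
    if called_station_id is not None and called_station_id in apn_lifetimes:
        return apn_lifetimes[called_station_id]  # APN is configured: use its value
    return default_lifetime  # no APN, or APN not configured: use the default
```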

If the audit discovers a session record for which the current time minus the last touched time (the time the session was created or, for P-DRA only, the time the last RAA was processed, whichever is more recent) exceeds the applicable session lifetime, the record is considered stale. For P-DRA, stale records are scheduled for Policy and Charging-initiated RAR messages, which query the policy client that created the session to ask whether the session is still valid.
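The staleness test compares the age of the last touch against the lifetime. The sketch below is illustrative only: the function names are hypothetical, last_raa is passed as None when no RAA applies (for example, outside P-DRA), and the sketch only flags a record; it does not send the RAR that P-DRA schedules for stale records.

```python
from datetime import datetime, timedelta
from typing import Optional


def last_touched(created: datetime, last_raa: Optional[datetime]) -> datetime:
    """Creation time or last processed RAA, whichever is more recent."""
    if last_raa is None:
        return created
    return max(created, last_raa)


def is_stale(
    now: datetime,
    created: datetime,
    last_raa: Optional[datetime],  # None when no RAA applies (hypothetical convention)
    lifetime: timedelta,
) -> bool:
    """True when the time since the last touch strictly exceeds the lifetime."""
    return now - last_touched(created, last_raa) > lifetime
```

Note the strict comparison: a session touched exactly one lifetime ago is not yet stale, because the text says the elapsed time must exceed the lifetime.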

Generally, SBR servers are engineered to run at 80% of maximum capacity. The audit is pre-configured to run within the remaining 20% of capacity. Audit yields to signaling: it can use that upper 20% only when signaling does not need it.

The maximum audit rate is configurable (default 12,000 records per second) so that it can be tuned to the customer's traffic levels. For example, if the SBR servers use only 50% of their capacity for signaling, a higher rate could be made available to audit.

If the SBR signaling load plus the audit load causes an SBR server to exceed 100% capacity, that SBR server reports congestion, which automatically suspends auditing. Audit remains suspended until no SBR server is reporting congestion. Any SBR on which audit is suspended raises minor Alarm 22715 to report the suspension. The alarm is cleared only when congestion abates.

An SBR server determines that it is in congestion by examining the rate of incoming stack events.
  • Local congestion refers to congestion at the SBR server that is walking through Binding or Session table records.
  • Remote congestion refers to congestion at one of the Session SBR servers that a Binding SBR server is querying for the existence of session data (using sessionRef).

A Binding SBR server will suspend audit processing if the server on which it is running is congested (local congestion), or if any of the Session SBR servers to which it is connected through ComAgent connections have reported congestion (remote congestion). Audit processing will remain suspended until both local congestion and all instances of remote congestion have abated.

A Session SBR server will suspend audit processing if the server on which it is running is congested (local congestion). The Session SBR does not need to consider remote congestion because it does not rely on binding data to perform its auditing function. Recall that session records are removed by audit if they are determined to be stale and the policy client that created the session indicates that the session is no longer needed (or if the session integrity feature has exhausted all attempts to communicate with the policy client that created the session). Session auditing remains suspended until the local congestion abates.
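The suspension rules for the two SBR roles can be summarized in one predicate. This is a minimal sketch, assuming a hypothetical SbrAuditState class in which remote_congested maps each connected Session SBR (by a hypothetical name) to its last reported congestion state; it models the decision only, not how congestion is detected or reported over ComAgent.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SbrAuditState:
    role: str  # "binding" or "session" (hypothetical encoding)
    local_congested: bool = False
    # Last reported congestion state of each connected Session SBR.
    remote_congested: Dict[str, bool] = field(default_factory=dict)

    def audit_suspended(self) -> bool:
        """Apply the local/remote congestion rules described above."""
        if self.local_congested:
            return True  # both roles suspend on local congestion
        if self.role == "binding":
            # A Binding SBR also suspends while any Session SBR reports congestion.
            return any(self.remote_congested.values())
        return False  # a Session SBR ignores remote congestion

    def alarm_22715_active(self) -> bool:
        """Minor Alarm 22715 is raised while audit is suspended."""
        return self.audit_suspended()
```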

When an SBR server starts up (that is, when the SBR process starts), or when an SBR's audit resumes after being suspended, the audit rate ramps up using an exponential slow-start algorithm. The audit rate starts at 1500 records per second and is doubled every 10 seconds until the configured maximum audit rate is reached.
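The slow-start ramp can be expressed as a function of elapsed time. This is a sketch under one stated assumption: when a doubling would overshoot the configured maximum, the rate is capped at the maximum. The function name and parameter names are hypothetical.

```python
def slow_start_audit_rate(
    elapsed_s: float,
    max_rate: int = 12_000,          # configured maximum audit rate (default)
    start_rate: int = 1_500,         # initial rate, records per second
    doubling_period_s: float = 10.0, # rate doubles every 10 seconds
) -> int:
    """Audit rate (records/s) after elapsed_s seconds of slow-start."""
    rate = start_rate
    periods = int(elapsed_s // doubling_period_s)  # completed doubling periods
    for _ in range(periods):
        if rate >= max_rate:
            break  # already at the ceiling; stop doubling
        rate *= 2
    return min(rate, max_rate)  # assumption: cap at the configured maximum
```

With the defaults, the rate is 1,500, 3,000, 6,000 and 12,000 records per second at 0, 10, 20 and 30 seconds, so the ceiling is reached after 30 seconds. Raising the maximum lengthens the ramp by one 10-second step per doubling.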

In addition to the overall rate of record auditing described above, the frequency at which a given table audit can be started is also controlled. This is necessary to avoid needless frequent auditing of the same records when tables are small and can be audited quickly. A given table on an SBR server will be audited no more frequently than once every 10 minutes.
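The per-table frequency limit can be modeled as a gate consulted before a table pass begins. This sketch assumes the 10-minute interval is measured between the starts of successive passes of the same table; the text does not say whether it is measured from the start or the end of a pass. TableAuditGate and its injectable clock are hypothetical.

```python
import time
from typing import Callable, Dict

MIN_TABLE_AUDIT_INTERVAL_S = 10 * 60  # no more than once every 10 minutes


class TableAuditGate:
    """Allows a table audit to start only if enough time has passed."""

    def __init__(
        self,
        clock: Callable[[], float] = time.monotonic,
        min_interval_s: float = MIN_TABLE_AUDIT_INTERVAL_S,
    ) -> None:
        self._clock = clock
        self._min_interval_s = min_interval_s
        self._last_start: Dict[str, float] = {}  # table name -> last start time

    def try_start(self, table: str) -> bool:
        """Record and permit a start, or refuse if the table ran too recently."""
        now = self._clock()
        last = self._last_start.get(table)
        if last is not None and now - last < self._min_interval_s:
            return False  # audited too recently; skip this table for now
        self._last_start[table] = now
        return True
```

Injecting the clock keeps the gate testable without waiting ten real minutes.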

To provide visibility into what the audit is doing, the audit generates Event 22716 "SBR Audit Statistics Report" with audit statistics at the end of each pass of a table. The format of the report varies depending on which table the statistics are being reported for.

PCA Configuration Database

Several Policy and Charging configuration database tables (PCRFs, Policy Clients, OCSs, and CTFs) are configured at the SOAM but contain data that is required network-wide. The site-wide portions of the data are stored at the SOAM servers. The network-wide portions of the data are stored globally at the NOAM. Because of this split between SOAM and NOAM, a PCA Configuration Database Audit executes in the background to verify that all the related configuration tables for this data are in sync between the SOAMs and the NOAM.

The PCA Configuration Database Audit runs in the background on the SOAM every 30 seconds and audits all the related configuration tables between SOAM and NOAM for PCRFs, Policy Clients, OCSs, and CTFs. If the audit detects any discrepancies among these tables, it automatically attempts to resolve them and then validates that the tables are back in sync.

The configuration database can get out of sync due to a database transaction failure or due to operator actions. If an operator performs a database restore at the NOAM using a database backup that does not have all the network-wide data corresponding to the current SOAM configuration, then the database will not be in sync between SOAM and NOAM. Similarly, if an operator performs a database restore at an SOAM using a database backup that does not have the configuration records corresponding to network-wide data stored at the NOAM, then the database again will not be in sync. The audit is designed to execute without operator intervention and correct these scenarios where configuration data is not in sync between SOAM and NOAM.

If the audit fails to correct the database tables, the audit will assert Alarm 22737 (Configuration Database Not Synced). The audit continues to execute periodically every 30 seconds to attempt to correct the database tables. If the audit successfully corrects and validates the tables during an audit pass, it will clear Alarm 22737.
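One audit pass with its alarm handling can be sketched as detect, resolve, re-validate, then raise or clear Alarm 22737. This is a simplified illustration, not the product's algorithm: it compares only record keys (a real audit would also compare record contents), and load_soam, load_noam, and resolve are hypothetical callbacks. How discrepancies are actually resolved is not described in the text, so resolve is left to the caller; a scheduler would invoke the pass every 30 seconds.

```python
from typing import Callable, Set

ALARM_CONFIG_DB_NOT_SYNCED = 22737  # Configuration Database Not Synced


def find_discrepancies(soam_keys: Set[str], noam_keys: Set[str]) -> Set[str]:
    """Keys present on one side but not the other (simplified: keys only)."""
    return soam_keys ^ noam_keys


def run_audit_pass(
    load_soam: Callable[[], Set[str]],     # hypothetical SOAM table reader
    load_noam: Callable[[], Set[str]],     # hypothetical NOAM table reader
    resolve: Callable[[Set[str]], None],   # hypothetical repair step
    active_alarms: Set[int],
) -> bool:
    """Run one pass; return True if the tables end the pass in sync."""
    diffs = find_discrepancies(load_soam(), load_noam())
    if diffs:
        resolve(diffs)
        diffs = find_discrepancies(load_soam(), load_noam())  # validate the repair
    if diffs:
        active_alarms.add(ALARM_CONFIG_DB_NOT_SYNCED)      # correction failed
        return False
    active_alarms.discard(ALARM_CONFIG_DB_NOT_SYNCED)      # in sync: clear alarm
    return True
```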

Note: All statements about database tables in this section only apply to configuration tables related to PCRFs, Policy Clients, OCSs and CTFs because the PCA Configuration Database Audit executes only on the database tables where it is necessary for the data to be split across SOAM and NOAM.

OC-DRA Session Database

The Session Database Audit is enhanced to detect and remove stale binding independent session (i.e., Gy/Ro session) data stored in the Session SBR. Session state maintained in the Session SBR for Gy/Ro session-based credit-control is considered stale when no CCR/CCA-U or RAR/RAA has been exchanged for the session for a length of time greater than or equal to the Stale Session Timeout value (in hours) configured in the Network OAM GUI. If the binding independent session is associated with an APN configured in the Network OAM GUI Main Menu > Policy and Charging > Configuration > Access Point Names, the Stale Session Timeout value associated with that APN is used. Otherwise, the default Stale Session Timeout value configured in the Network OAM GUI Main Menu > Policy and Charging > Configuration > General Options is used.
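The Gy/Ro staleness rule differs from the P-DRA rule in two ways: the timeout is in hours, and the comparison is "greater than or equal to" rather than "exceeds". The sketch below is illustrative; the function names are hypothetical, and it assumes last_exchange holds the time of the most recent CCR/CCA-U or RAR/RAA exchange (initialized when the session is created, which the text does not state explicitly). No default timeout value is given in the text, so the caller must supply it.

```python
from datetime import datetime, timedelta
from typing import Mapping, Optional


def stale_session_timeout(
    apn: Optional[str],
    apn_timeouts_h: Mapping[str, int],  # hypothetical view of per-APN timeouts (hours)
    default_timeout_h: int,             # General Options default (hours)
) -> timedelta:
    """Per-APN timeout if the APN is configured, else the default."""
    if apn is not None and apn in apn_timeouts_h:
        return timedelta(hours=apn_timeouts_h[apn])
    return timedelta(hours=default_timeout_h)


def is_stale_gy_ro(
    now: datetime,
    last_exchange: datetime,  # last CCR/CCA-U or RAR/RAA exchange (assumption)
    apn: Optional[str],
    apn_timeouts_h: Mapping[str, int],
    default_timeout_h: int,
) -> bool:
    """Stale once the idle time reaches (>=) the applicable timeout."""
    timeout = stale_session_timeout(apn, apn_timeouts_h, default_timeout_h)
    return now - last_exchange >= timeout
```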

Stale Gy/Ro sessions can occur for various reasons: