22224 - Average Hold Time Limit Exceeded

Alarm Group:

DIAM

Description:

The average transaction hold time has exceeded its configured limits.

This alarm is generated when KPI #10098 (TmAvgRspTime) exceeds the DSR-wide engineering attributes associated with average hold time, defined in the DA-MP profile assigned to the DA-MP server. KPI #10098 is defined as the average time (in milliseconds) from when the routing layer (DRL) receives a request message from a downstream peer to when an answer response is sent to that downstream peer. The source measurement of KPI #10098 is the TmResponseTimeDownstreamMp (10093) measurement.
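As a rough illustration of the KPI definition above, the following Python sketch averages per-transaction hold times. The function name, the tuple layout, and the sample values are hypothetical, and the sketch does not model how the DSR samples or windows its measurements.

```python
from statistics import fmean


def average_hold_time_ms(transactions):
    """Return the mean hold time in milliseconds.

    Each transaction is a (t_request_ms, t_answer_ms) pair (hypothetical layout):
    t_request_ms is when the routing layer received the request from the
    downstream peer, and t_answer_ms is when the answer was sent back to it.
    """
    durations = [t_answer_ms - t_request_ms for t_request_ms, t_answer_ms in transactions]
    if not durations:
        return 0.0  # no completed transactions in the sample
    return fmean(durations)


# Three illustrative transactions with hold times of 40, 60, and 80 ms.
sample = [(0, 40), (10, 70), (20, 100)]
print(average_hold_time_ms(sample))
```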

This alarm indicates that the average response time (TmAvgRspTime) for messages forwarded by the Relay Agent is larger than the limit defined for the deployment by the DA-MP profile assignment. One of these problems could exist:

The IP network may be experiencing problems that are adding propagation delays to the forwarded request message and the answer response.
- Verify that IP network connectivity exists between the MP server and the adjacent nodes.
- View the event history logs for additional events or alarms from this MP server.
One or more upstream nodes may be experiencing traffic overload.
One or more MPs may be experiencing traffic overload.
- View the KPI Routing Recv Msgs/Sec.
- View the CPU utilization of MPs by navigating to Main Menu > Status & Manage > Server.

Severity:

Minor, Major, Critical

Instance:

N/A

HA Score:

Normal

Auto Clear Seconds:

0 (zero)

OID:

eagleXgDiameterAvgHoldTimeLimitExceededNotify

Recovery:

The average transaction hold time is exceeding its configured limits, resulting in an abnormally large number of outstanding transactions that may be leading to excessive use of resources like memory.
- Reduce the average hold time by examining the configured Pending Answer Timer values and reducing any values that are unnecessarily large.
- Identify the causes for the large average delay between the DSR sending requests to the upstream peers and receiving answers for the requests.
- Confirm whether the peer node(s) or the DSR are in overload by viewing the KPIs, measurements, and CPU usage, and take corrective action.
- Identify the main contributor to the increased value of (T2-T1), that is, the time between the routing layer (DRL) receiving the request and the DRL sending the answer to the downstream peer.
The alarm thresholds are configurable on Diameter Common > MPs > Profiles:
- Average hold time minor alarm onset threshold
- Average hold time minor alarm abatement threshold
- Average hold time major alarm onset threshold
- Average hold time major alarm abatement threshold
- Average hold time critical alarm onset threshold
- Average hold time critical alarm abatement threshold
The severity of the alarm (Minor, Major, or Critical) is determined by the onset and abatement thresholds of each severity level. When the average hold time initially exceeds an alarm onset threshold, a minor, major, or critical alarm is triggered. When the average hold time subsequently exceeds a higher onset threshold, or drops below an abatement threshold but is still above the minor alarm abatement threshold, the alarm severity changes based on the highest onset threshold crossed by the current average hold time.
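The onset/abatement behavior described above can be sketched in Python as a small state transition function. This is only an illustration under stated assumptions: the threshold values are hypothetical (real values come from the DA-MP profile), the names are invented for the example, the alarm is assumed to clear once the average drops below the minor abatement threshold, and the alarm is assumed to stay at Minor when the average sits between the minor abatement and minor onset thresholds after a higher severity abates.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    CLEAR = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


@dataclass(frozen=True)
class Threshold:
    onset_ms: float
    abatement_ms: float


# Hypothetical values for illustration only; not taken from any DA-MP profile.
EXAMPLE_THRESHOLDS = {
    Severity.MINOR: Threshold(onset_ms=100, abatement_ms=80),
    Severity.MAJOR: Threshold(onset_ms=200, abatement_ms=180),
    Severity.CRITICAL: Threshold(onset_ms=300, abatement_ms=280),
}


def highest_onset_crossed(avg_ms, thresholds):
    """Return the highest severity whose onset threshold avg_ms exceeds."""
    crossed = [sev for sev, t in thresholds.items() if avg_ms > t.onset_ms]
    return max(crossed, default=Severity.CLEAR)


def next_severity(current, avg_ms, thresholds=EXAMPLE_THRESHOLDS):
    """Compute the alarm severity after observing a new average hold time."""
    onset_level = highest_onset_crossed(avg_ms, thresholds)
    if onset_level > current:
        return onset_level  # a higher onset threshold was exceeded
    if current == Severity.CLEAR:
        return Severity.CLEAR  # no alarm raised and no onset crossed
    if avg_ms < thresholds[Severity.MINOR].abatement_ms:
        return Severity.CLEAR  # assumption: alarm clears below minor abatement
    if avg_ms < thresholds[current].abatement_ms:
        # Abated from the current level: fall back to the highest onset still
        # crossed, but no lower than Minor (assumption, see lead-in).
        return max(onset_level, Severity.MINOR)
    return current  # still at or above the current level's abatement threshold


# Walk a series of illustrative average hold times through the state machine.
severity = Severity.CLEAR
for avg in (90, 250, 190, 150, 70):
    severity = next_severity(severity, avg)
    print(avg, severity.name)
```

Keeping each abatement threshold below its onset threshold is what prevents the alarm from flapping when the average hovers near a single limit.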
If the problem persists, it is recommended to contact My Oracle Support.