7 Policy Alerts

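This section provides information on policy alerts and their configuration. It covers alerts for the Policy Control Function (PCF) and the Cloud Native Policy and Charging Rule Function (CNPCRF), along with the alert configuration for each.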

Policy Control Function Alerts

This section includes information about alerts for PCF.

Table 7-1 Common Alerts

Alert Name	Description	Severity
PCF_SERVICES_DOWN	Alert if any PCF service is down for 5 minutes in the namespace given in the AlertRules file	Critical
IngressErrorRateAbove10PercentPerPod	Alert if the ingress error rate on any pod is above 10%	Critical

Table 7-2 SM Service Alerts

Alert Name	Description	Severity
SMTrafficRateAboveThreshold	Alert if Ingress traffic on SM service reaches 90% of max MPS within 2 minutes	Major
SMIngressErrorRateAbove10Percent	Alert if Ingress transaction error rate exceeds 10% of all SM transactions in last 24 hours	Critical
SMEgressErrorRateAbove1Percent	Alert if Egress transaction error rate exceeds 1% of all SM transactions in last 24 hours	Minor

Table 7-3 Diameter Connector Alerts

Alert Name	Description	Severity
DiamTrafficRateAboveThreshold	Alert if Diameter Connector traffic reaches 90% of max MPS	Major
DiamIngressErrorRateAbove10Percent	Alert if error rate exceeds 10% of all Diameter transactions in last 24 hours	Critical
DiamEgressErrorRateAbove1Percent	Alert if Egress transaction error rate exceeds 1% of all Diameter transactions	Minor

Table 7-4 User Service - UDR Alerts

Alert Name	Description	Severity
PcfUdrIngressTrafficRateAboveThreshold	Alert if Ingress traffic from UDR reaches 90% of max MPS	Major
PcfUdrEgressErrorRateAbove10Percent	Alert if error rate exceeds 10% of all UDR transactions	Critical

Table 7-5 User Service - CHF Alerts

Alert Name	Description	Severity
PcfChfIngressTrafficRateAboveThreshold	Alert if Ingress traffic from CHF reaches 90% of max MPS	Major
PcfChfEgressErrorRateAbove10Percent	Alert if error rate exceeds 10% of all CHF transactions	Critical

Table 7-6 PolicyDS Service Alerts

Alert Name	Description	Severity
PolicyDsIngressTrafficRateAboveThreshold	Alert if Ingress traffic reaches 90% of max MPS	Major
PolicyDsIngressErrorRateAbove10Percent	Alert if Ingress error rate exceeds 10% of all PolicyDS transactions	Critical
PolicyDsEgressErrorRateAbove1Percent	Alert if Egress error rate exceeds 1% of all PolicyDS transactions	Minor

Table 7-7 Binding Service Alerts

Alert Name	Description	Severity
BindingServiceIngressTrafficRateAboveThreshold	Alert if Ingress traffic reaches 90% of max MPS	Major
BindingServiceIngressErrorRateAbove10Percent	Alert if Ingress error rate exceeds 10% of all Binding Service transactions	Critical
BindingServiceEgressErrorRateAbove1Percent	Alert if Egress error rate exceeds 1% of all Binding Service transactions	Minor

PCF Alert Configuration

This section describes the measurement-based alert rules configuration for PCF. The Alert Manager uses the Prometheus measurement values reported by the microservices in the conditions under the alert rules to trigger alerts.

Note:

The alertmanager and prometheus tools should run in the Oracle CNE namespace, for example, occne-infra.
The alert file is packaged with the PCF Custom Templates. Download the PCF Templates.zip file from OHC and unzip it to get the PcfAlertRules.yaml file.
Edit the values of the following parameters in the PcfAlertRules.yaml file before following the procedure for configuring the alerts:
- [ 90% of Max MPS].
  For example, if the value of Max MPS is 10000, set [ 90% of Max MPS] to 9000 in the yaml file as follows:
```
sum(rate(ocpm_ingress_request_total{servicename_3gpp="npcf-smpolicycontrol"}[2m])) >=9000
```
- kubernetes_namespace.
  For example, if PCF is deployed at more than one site, set kubernetes_namespace in the yaml file as follows:
```
expr: up{kubernetes_namespace=~"pcf|ocpcf"} == 0
```
  If PCF is deployed at only one site, set kubernetes_namespace in the yaml file as follows:
```
expr: up{kubernetes_namespace="pcf"} == 0
```
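The 90% threshold can be derived instead of being worked out by hand. The following is a minimal shell sketch, not part of the packaged templates: MAX_MPS, THRESHOLD, and EXPR are hypothetical variable names, and the metric name and label are taken from the expression shown in the note above.

```shell
# MAX_MPS is a hypothetical variable; set it to the maximum MPS of your deployment.
MAX_MPS=10000

# 90% of Max MPS, computed with integer arithmetic.
THRESHOLD=$((MAX_MPS * 90 / 100))

# Build the expression to place in PcfAlertRules.yaml.
EXPR="sum(rate(ocpm_ingress_request_total{servicename_3gpp=\"npcf-smpolicycontrol\"}[2m])) >= ${THRESHOLD}"
echo "$EXPR"
```

With MAX_MPS set to 10000, the printed expression ends in `>= 9000`, which matches the value used in the example.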

To configure PCF alerts in Prometheus:

Find the config map used to configure alerts in the prometheus server by executing the following command:
```
kubectl get configmap -n <Namespace>
```
where <Namespace> is the prometheus server namespace used in the helm install command.
For example, assuming the prometheus server is in the occne-infra namespace, execute the following command to find the config map:
```
kubectl get configmaps -n occne-infra  | grep prometheus-server
```
Output: occne-prometheus-server 4 46d
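The config map name can be captured in a variable so that it can be reused in the commands that follow. This is a sketch that assumes the output format shown above, with the name in the first column. SAMPLE_OUTPUT stands in for the live kubectl output, and CONFIGMAP_NAME is a hypothetical variable name.

```shell
# Stand-in for: kubectl get configmaps -n occne-infra | grep prometheus-server
SAMPLE_OUTPUT="occne-prometheus-server   4   46d"

# The config map name is the first whitespace-separated column.
CONFIGMAP_NAME=$(printf '%s\n' "$SAMPLE_OUTPUT" | awk '{print $1}')
echo "$CONFIGMAP_NAME"
```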
Back up the current config map of the prometheus server by executing the following command:
```
kubectl get configmaps <Name> -o yaml -n <Namespace> > /tmp/t_mapConfig.yaml
```
where <Name> is the prometheus config map name used in the helm install command.
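Because every later step edits the backup file, it can help to confirm that the backup is not empty before continuing. The following is a sketch only: backup_configmap is a hypothetical helper name, and its arguments are the config map name, the namespace, and the target file.

```shell
# backup_configmap NAME NAMESPACE FILE (hypothetical helper)
# Writes the config map as YAML to FILE and fails if FILE ends up empty.
backup_configmap() {
  kubectl get configmaps "$1" -o yaml -n "$2" > "$3" || return 1
  # -s is true only when the file exists and is non-empty.
  [ -s "$3" ]
}
```

For example, `backup_configmap occne-prometheus-server occne-infra /tmp/t_mapConfig.yaml` returns a non-zero status if kubectl fails or writes nothing.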
Check if alertspcf is present in the t_mapConfig.yaml file by executing the following command:
```
cat /tmp/t_mapConfig.yaml  | grep alertspcf
```
If alertspcf is present, delete the alertspcf entry from the t_mapConfig.yaml file, by executing the following command:
```
sed -i '/etc\/config\/alertspcf/d' /tmp/t_mapConfig.yaml
```
Note:
This command should be executed only once.
If alertspcf is not present, add the alertspcf entry in the t_mapConfig.yaml file by executing the following command:
```
sed -i '/rule_files:/a\    \- /etc/config/alertspcf'  /tmp/t_mapConfig.yaml
```
Note:
This command should be executed only once.
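The effect of the two sed commands can be tried on a scratch file before touching the real backup. This sketch assumes GNU sed (for `-i` without a suffix and the one-line `a\` form). The sample file content, including the /etc/config/rules entry, is made up for illustration.

```shell
# Work on a scratch file so the real backup is untouched.
TMP_FILE=$(mktemp)
cat > "$TMP_FILE" <<'EOF'
    rule_files:
    - /etc/config/rules
    - /etc/config/alertspcf
EOF

# Drop any existing alertspcf entry.
sed -i '/etc\/config\/alertspcf/d' "$TMP_FILE"
# Append the entry directly after the rule_files: line.
sed -i '/rule_files:/a\    \- /etc/config/alertspcf' "$TMP_FILE"

# Count the alertspcf entries that remain in the file.
ENTRY_COUNT=$(grep -c '/etc/config/alertspcf' "$TMP_FILE")
echo "alertspcf entries: $ENTRY_COUNT"
rm -f "$TMP_FILE"
```

Running the delete before the add leaves a single entry whether or not one was already present. Running the add a second time without the delete would create a duplicate, which is why each command should be executed only once.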
Reload the config map with the modified file by executing the following command:
```
kubectl replace configmap <Name> -f /tmp/t_mapConfig.yaml
```
Add the PcfAlertRules.yaml file to the prometheus config map by executing the following command:
```
kubectl patch configmap <Name> -n <Namespace> --type merge --patch \
"$(cat <PATH>/PcfAlertRules.yaml)"
```
where, <PATH> is the location of the PcfAlertRules.yaml file.
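If the path is wrong, `$(cat ...)` expands to an empty string and the patch is sent with no content. A guard against that might look like the following sketch, where patch_alert_rules is a hypothetical helper that takes the config map name, the namespace, and the rules file.

```shell
# patch_alert_rules NAME NAMESPACE RULES_FILE (hypothetical helper)
# Refuses to patch when the rules file is missing or empty.
patch_alert_rules() {
  if [ ! -s "$3" ]; then
    echo "alert rules file missing or empty: $3" >&2
    return 1
  fi
  kubectl patch configmap "$1" -n "$2" --type merge --patch "$(cat "$3")"
}
```

It runs the same kubectl patch command as above, only after checking that the file exists and is non-empty.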
Restart the prometheus-server pod.
Verify the alerts in the prometheus GUI. [Screenshot: PCF alerts displayed in the prometheus GUI]

Cloud Native Policy and Charging Rule Function Alerts

This section includes information about alerts for CNPCRF.

Alarm Name	Alarm Description	Severity	App/Metrics
PRE_UNREACHABLE	PRE is unreachable	CRITICAL	Metrics
PDS_DOWN	PDS is down	CRITICAL	Metrics
PDS_UP	PDS is up	INFO	Metrics
DB_UNREACHABLE	Connectivity to DB lost	CRITICAL	Metrics
DB_REACHABLE	Connectivity to DB available	INFO	Metrics
SH_UNREACHABLE	Remote Sh connection is unreachable	CRITICAL	App
SY_UNREACHABLE	Remote Sy connection is unreachable	CRITICAL	App
SOAP_CONNECTOR_DOWN	SOAP Connector is down	CRITICAL	Metrics
SOAP_CONNECTOR_UP	SOAP Connector is up	INFO	Metrics
CONFIG_SERVER_DOWN	Config server is down	CRITICAL	Metrics
CONFIG_SERVER_UP	Config server is up	INFO	Metrics
DIAM_GATEWAY_DOWN	Diameter Gateway is down	CRITICAL	Metrics
DIAM_GATEWAY_UP	Diameter Gateway is up	INFO	Metrics
LDAP_GATEWAY_DOWN	LDAP Gateway is down	CRITICAL	Metrics
LDAP_GATEWAY_UP	LDAP Gateway is up	INFO	Metrics
LDAP_DATASOURCE_UNREACHABLE	LDAP Datasource is unreachable	CRITICAL	App
CM_SERVICE_DOWN	CM Service is down	CRITICAL	Metrics
CM_SERVICE_UP	CM Service is up	INFO	Metrics
CCA_SEND_FAIL_COUNT_EXCEEDS_THRESHOLD	Rate of CCA Send Failure has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
CCAI_SEND_FAIL_COUNT_EXCEEDS_THRESHOLD	Rate of CCA-I Send Failure has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
CCAT_SEND_FAIL_COUNT_EXCEEDS_THRESHOLD	Rate of CCA-T Send Failure has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
CCAU_SEND_FAIL_COUNT_EXCEEDS_THRESHOLD	Rate of CCA-U Send Failure has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
ASA_SEND_FAIL_COUNT_EXCEEDS_THRESHOLD	Rate of ASA Send Failure has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
RAA_SEND_FAIL_COUNT_EXCEEDS_THRESHOLD	Rate of RAA Send Failure has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
STA_SEND_FAIL_COUNT_EXCEEDS_THRESHOLD	Rate of STA Send Failure has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
CCA_RECV_FAIL_COUNT_EXCEEDS_THRESHOLD	Rate of CCA Receive Failure has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
CCAI_RECV_FAIL_COUNT_EXCEEDS_THRESHOLD	Rate of CCA-I Receive Failure has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
CCAT_RECV_FAIL_COUNT_EXCEEDS_THRESHOLD	Rate of CCA-T Receive Failure has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
CCAU_RECV_FAIL_COUNT_EXCEEDS_THRESHOLD	Rate of CCA-U Receive Failure has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
ASA_RECV_FAIL_COUNT_EXCEEDS_THRESHOLD	Rate of ASA Receive Failure has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
RAA_RECV_FAIL_COUNT_EXCEEDS_THRESHOLD	Rate of RAA Receive Failure has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
STA_RECV_FAIL_COUNT_EXCEEDS_THRESHOLD	Rate of STA Receive Failure has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
CCR_TIMEOUT_COUNT_EXCEEDS_THRESHOLD	Rate of CCR Timeout count has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
CCRI_TIMEOUT_COUNT_EXCEEDS_THRESHOLD	Rate of CCR-I Timeout count has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
CCRT_TIMEOUT_COUNT_EXCEEDS_THRESHOLD	Rate of CCR-T Timeout count has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
CCRU_TIMEOUT_COUNT_EXCEEDS_THRESHOLD	Rate of CCR-U Timeout count has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
ASR_TIMEOUT_COUNT_EXCEEDS_THRESHOLD	Rate of ASR Timeout count has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
RAR_TIMEOUT_COUNT_EXCEEDS_THRESHOLD	Rate of RAR Timeout count has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics
STR_TIMEOUT_COUNT_EXCEEDS_THRESHOLD	Rate of STR Timeout count has exceeded threshold limit (1000 times) in 1 min	CRITICAL	Metrics

PCRF Alert Configuration

This section describes the measurement-based alert rules configuration for CNPCRF. The Alert Manager uses the Prometheus measurement values reported by the microservices in the conditions under the alert rules to trigger alerts.

To configure cnPCRF alerts in Prometheus:

Note:

The alert manager and prometheus tools should run in the default namespace.
The PCRF Templates.zip file can be downloaded from OHC. Unzip the package after downloading to get cnpcrfalertrule.yaml and mib files.

Find the config map used to configure alerts in the prometheus server by executing the following command:
```
kubectl get configmap -n <Namespace>
```
where <Namespace> is the namespace used in the helm install command.
Back up the current config map of the prometheus server by executing the following command:
```
kubectl get configmaps <Name> -o yaml -n <Namespace> > /tmp/t_mapConfig.yaml
```
where <Name> is the release name used in the helm install command.
Delete the alertscnpcrf entry under rule_files, if present, from the t_mapConfig.yaml file by executing the following command:
```
sed -i '/etc\/config\/alertscnpcrf/d' /tmp/t_mapConfig.yaml
```
Note:
This command should be executed only once.
Add the alertscnpcrf entry under rule_files in the t_mapConfig.yaml file by executing the following command:
```
sed -i '/rule_files:/a\    \- /etc/config/alertscnpcrf'  /tmp/t_mapConfig.yaml
```
Note:
This command should be executed only once.
Reload the modified config map by executing the following command:
```
kubectl replace configmap <Name> -f /tmp/t_mapConfig.yaml
```
Note:
This step is not required for AlertRules.

Add cnpcrfAlertrules to the config map by executing the following command:
```
kubectl patch configmap <Name>-server -n <Namespace> --type merge --patch \
"$(cat ~/cnpcrfAlertrules.yaml)"
```