6 Configuring Alerts
To configure Provisioning Gateway alerts on the Prometheus server:
Note:
In the procedure below, _NAME_ is the Helm chart release name and _Namespace_ is the Prometheus namespace.
- Execute the following command to back up the current Prometheus config map.
kubectl get configmaps occne-prometheus-server -o yaml -n occne-infra > /tmp/tempConfig.yaml
- Add the Provisioning Gateway alert file name to the Prometheus config map. The first command removes any existing entry so that the second does not create a duplicate:
sed -i '/etc\/config\/alertsprovgw/d' /tmp/tempConfig.yaml
sed -i '/rule_files:/a\ \- /etc/config/alertsprovgw' /tmp/tempConfig.yaml
- Execute the following command to update the config map with the name of the provgw alert file.
kubectl replace configmap occne-prometheus-server -f /tmp/tempConfig.yaml
- Execute the following command to add the provgw alert rules to the config map under the name of the provgw alert file.
kubectl patch configmap occne-prometheus-server -n occne-infra --type merge --patch "$(cat ~/ProvgwAlertrules.yaml)"
Note:
The Prometheus server picks up the updated config map and reloads it automatically after some time (about 20 seconds).
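The rule_files edit in the second step can be rehearsed on a scratch file before touching the real backup. The sketch below assumes GNU sed (for -i and the one-line a\ form); the scratch file contents and the add_provgw_rule_file helper are illustrative only and are not part of the product.

```shell
# Rehearse the rule_files edit on a scratch file (GNU sed assumed).
scratch="$(mktemp)"   # hypothetical stand-in for /tmp/tempConfig.yaml
cat > "$scratch" <<'EOF'
    rule_files:
    - /etc/config/rules
    - /etc/config/alerts
EOF

# Hypothetical helper wrapping the two sed commands from the procedure.
add_provgw_rule_file() {
  # Drop any existing entry, then append a fresh one after "rule_files:".
  sed -i '/etc\/config\/alertsprovgw/d' "$1"
  sed -i '/rule_files:/a\ \- /etc/config/alertsprovgw' "$1"
}

add_provgw_rule_file "$scratch"
add_provgw_rule_file "$scratch"   # a second run must not add a duplicate

entries="$(grep -c '/etc/config/alertsprovgw' "$scratch")"
echo "alertsprovgw entries: $entries"
cat "$scratch"
```

Running the helper twice should still leave a single /etc/config/alertsprovgw entry, because the first sed removes any earlier entry before the second one appends it. Check the indentation of the appended line in the printed file against the neighboring entries before replacing the config map.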
Provisioning Gateway Alert Config Details
This section lists the alert configuration contained in the ProvgwAlertrules.yaml file.
Note:
The default namespace of Provisioning Gateway is provgw. Update it according to your deployment.
apiVersion: v1
data:
  alertsprovgw: |
    groups:
    - name: ProvgwAlerts
      rules:
      - alert: ProvgwTrafficRateAboveMinorThreshold
        annotations:
          description: 'Ingress traffic Rate is above minor threshold i.e. 800 requests per second (current value is: {{ $value }})'
          summary: 'Traffic Rate is above 80 Percent of Max requests per second(1000)'
        expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="provgw"}[20m])) >= 800 < 900
        labels:
          severity: Minor
      - alert: ProvgwTrafficRateAboveMajorThreshold
        annotations:
          description: 'Ingress traffic Rate is above major threshold i.e. 900 requests per second (current value is: {{ $value }})'
          summary: 'Traffic Rate is above 90 Percent of Max requests per second(1000)'
        expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="provgw"}[20m])) >= 900 < 950
        labels:
          severity: Major
      - alert: ProvgwTrafficRateAboveCriticalThreshold
        annotations:
          description: 'Ingress traffic Rate is above critical threshold i.e. 950 requests per second (current value is: {{ $value }})'
          summary: 'Traffic Rate is above 95 Percent of Max requests per second(1000)'
        expr: sum(rate(oc_ingressgateway_http_requests_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="provgw"}[20m])) >= 950
        labels:
          severity: Critical
      - alert: ProvgwTransactionErrorRateAbove0.1Percent
        annotations:
          description: 'Transaction Error rate is above 0.1 Percent of Total Transactions (current value is {{ $value }})'
          summary: 'Transaction Error Rate detected above 0.1 Percent of Total Transactions'
        expr: (sum(rate(oc_ingressgateway_http_responses_total{Status!~"2.*",app_kubernetes_io_name="ingressgateway",kubernetes_namespace="provgw"}[20m]) or (up * 0 ) ) )/sum(rate(oc_ingressgateway_http_responses_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="provgw"}[20m])) * 100 >= 0.1 < 1
        labels:
          severity: Warning
      - alert: ProvgwTransactionErrorRateAbove1Percent
        annotations:
          description: 'Transaction Error rate is above 1 Percent of Total Transactions (current value is {{ $value }})'
          summary: 'Transaction Error Rate detected above 1 Percent of Total Transactions'
        expr: (sum(rate(oc_ingressgateway_http_responses_total{Status!~"2.*",app_kubernetes_io_name="ingressgateway",kubernetes_namespace="provgw"}[20m]) or (up * 0 ) ) )/sum(rate(oc_ingressgateway_http_responses_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="provgw"}[20m])) * 100 >= 1 < 10
        labels:
          severity: Warning
      - alert: ProvgwTransactionErrorRateAbove10Percent
        annotations:
          description: 'Transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})'
          summary: 'Transaction Error Rate detected above 10 Percent of Total Transactions'
        expr: (sum(rate(oc_ingressgateway_http_responses_total{Status!~"2.*",app_kubernetes_io_name="ingressgateway",kubernetes_namespace="provgw"}[20m]) or (up * 0 ) ) )/sum(rate(oc_ingressgateway_http_responses_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="provgw"}[20m])) * 100 >= 10 < 25
        labels:
          severity: Minor
      - alert: ProvgwTransactionErrorRateAbove25Percent
        annotations:
          description: 'Transaction Error Rate detected above 25 Percent of Total Transactions (current value is {{ $value }})'
          summary: 'Transaction Error Rate detected above 25 Percent of Total Transactions'
        expr: (sum(rate(oc_ingressgateway_http_responses_total{Status!~"2.*",app_kubernetes_io_name="ingressgateway",kubernetes_namespace="provgw"}[20m]) or (up * 0 ) ) )/sum(rate(oc_ingressgateway_http_responses_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="provgw"}[20m])) * 100 >= 25 < 50
        labels:
          severity: Major
      - alert: ProvgwTransactionErrorRateAbove50Percent
        annotations:
          description: 'Transaction Error Rate detected above 50 Percent of Total Transactions (current value is {{ $value }})'
          summary: 'Transaction Error Rate detected above 50 Percent of Total Transactions'
        expr: (sum(rate(oc_ingressgateway_http_responses_total{Status!~"2.*",app_kubernetes_io_name="ingressgateway",kubernetes_namespace="provgw"}[20m]) or (up * 0 ) ) )/sum(rate(oc_ingressgateway_http_responses_total{app_kubernetes_io_name="ingressgateway",kubernetes_namespace="provgw"}[20m])) * 100 >= 50
        labels:
          severity: Critical
      - alert: ProvgwTransientErrorAbove1Percent
        annotations:
          description: 'Total number of response if subscriber not found is about 1% of ingress traffic'
          summary: 'Total number of response if subscriber not found is about 1% of ingress traffic'
        expr: (sum(rate(udr_rest_transient_error{kubernetes_namespace="provgw"}[10m]))/sum(rate(oc_ingressgateway_http_requests_total{kubernetes_namespace="provgw"}[10m])))*100 >= 1 < 10
        labels:
          severity: Warning
      - alert: ProvgwTransientErrorAbove10Percent
        annotations:
          description: 'Total number of response if subscriber not found is about 10% of ingress traffic'
          summary: 'Total number of response if subscriber not found is about 10% of ingress traffic'
        expr: (sum(rate(udr_rest_transient_error{kubernetes_namespace="provgw"}[10m]))/sum(rate(oc_ingressgateway_http_requests_total{kubernetes_namespace="provgw"}[10m])))*100 >= 10 < 25
        labels:
          severity: Minor
      - alert: ProvgwTransientErrorAbove25Percent
        annotations:
          description: 'Total number of response if subscriber not found is about 25% of ingress traffic'
          summary: 'Total number of response if subscriber not found is about 25% of ingress traffic'
        expr: (sum(rate(udr_rest_transient_error{kubernetes_namespace="provgw"}[10m]))/sum(rate(oc_ingressgateway_http_requests_total{kubernetes_namespace="provgw"}[10m])))*100 >= 25 < 50
        labels:
          severity: Major
      - alert: ProvgwTransientErrorAbove50Percent
        annotations:
          description: 'Total number of response if subscriber not found is about 50% of ingress traffic'
          summary: 'Total number of response if subscriber not found is about 50% of ingress traffic'
        expr: (sum(rate(udr_rest_transient_error{kubernetes_namespace="provgw"}[10m]))/sum(rate(oc_ingressgateway_http_requests_total{kubernetes_namespace="provgw"}[10m])))*100 >= 50
        labels:
          severity: Critical
      - alert: ProvgwSegmentDownAbove1Percent
        annotations:
          description: 'Total number of response if subscriber not found is about 50% of ingress traffic'
          summary: 'Total number of response if subscriber not found is about 50% of ingress traffic'
        expr: (sum(rate(udr_rest_service_unavailable{kubernetes_namespace="provgw"}[10m]))/sum(rate(oc_ingressgateway_http_requests_total{kubernetes_namespace="provgw"}[10m])))*100 >= 1 < 10
        labels:
          severity: Warning
      - alert: ProvgwSegmentDownAbove10Percent
        annotations:
          description: 'Total number of response if subscriber not found is about 50% of ingress traffic'
          summary: 'Total number of response if subscriber not found is about 50% of ingress traffic'
        expr: (sum(rate(udr_rest_service_unavailable{kubernetes_namespace="provgw"}[10m]))/sum(rate(oc_ingressgateway_http_requests_total{kubernetes_namespace="provgw"}[10m])))*100 >= 10 < 25
        labels:
          severity: Minor
      - alert: ProvgwSegmentDownAbove25Percent
        annotations:
          description: 'Total number of response if subscriber not found is about 50% of ingress traffic'
          summary: 'Total number of response if subscriber not found is about 50% of ingress traffic'
        expr: (sum(rate(udr_rest_service_unavailable{kubernetes_namespace="provgw"}[10m]))/sum(rate(oc_ingressgateway_http_requests_total{kubernetes_namespace="provgw"}[10m])))*100 >= 25 < 50
        labels:
          severity: Major
      - alert: ProvgwSegmentDownAbove50Percent
        annotations:
          description: 'Total number of response if subscriber not found is about 50% of ingress traffic'
          summary: 'Total number of response if subscriber not found is about 50% of ingress traffic'
        expr: (sum(rate(udr_rest_service_unavailable{kubernetes_namespace="provgw"}[10m]))/sum(rate(oc_ingressgateway_http_requests_total{kubernetes_namespace="provgw"}[10m])))*100 >= 50
        labels:
          severity: Critical
      - alert: ProvgwAuditMismatchAbove1Percent
        annotations:
          description: 'Total number of response if subscriber not found is about 50% of ingress traffic'
          summary: 'Total number of response if subscriber not found is about 50% of ingress traffic'
        expr: (sum(rate(provgw_audit_responsemismatch{kubernetes_namespace="provgw"}[10m]))/sum(rate(provgw_audit_total{kubernetes_namespace="provgw"}[10m])))*100 >= 1 < 10
        labels:
          severity: Warning
      - alert: ProvgwAuditMismatchAbove10Percent
        annotations:
          description: 'Total number of response if subscriber not found is about 50% of ingress traffic'
          summary: 'Total number of response if subscriber not found is about 50% of ingress traffic'
        expr: (sum(rate(provgw_audit_responsemismatch{kubernetes_namespace="provgw"}[10m]))/sum(rate(provgw_audit_total{kubernetes_namespace="provgw"}[10m])))*100 >= 10 < 25
        labels:
          severity: Minor
      - alert: ProvgwAuditMismatchAbove25Percent
        annotations:
          description: 'Total number of response if subscriber not found is about 50% of ingress traffic'
          summary: 'Total number of response if subscriber not found is about 50% of ingress traffic'
        expr: (sum(rate(provgw_audit_responsemismatch{kubernetes_namespace="provgw"}[10m]))/sum(rate(provgw_audit_total{kubernetes_namespace="provgw"}[10m])))*100 >= 25 < 50
        labels:
          severity: Major
      - alert: ProvgwAuditMismatchAbove50Percent
        annotations:
          description: 'Total number of response if subscriber not found is about 50% of ingress traffic'
          summary: 'Total number of response if subscriber not found is about 50% of ingress traffic'
        expr: (sum(rate(provgw_audit_responsemismatch{kubernetes_namespace="provgw"}[10m]))/sum(rate(provgw_audit_total{kubernetes_namespace="provgw"}[10m])))*100 >= 50
        labels:
          severity: Critical
      - alert: ProvgwAuditTransientErrorAbove1Percent
        annotations:
          description: 'Total number of response if subscriber not found is about 50% of ingress traffic'
          summary: 'Total number of response if subscriber not found is about 50% of ingress traffic'
        expr: (sum(rate(provgw_audit_transient_error{kubernetes_namespace="provgw"}[10m]))/sum(rate(provgw_audit_total{kubernetes_namespace="provgw"}[10m])))*100 >= 1 < 10
        labels:
          severity: Warning
      - alert: ProvgwAuditTransientErrorAbove10Percent
        annotations:
          description: 'Total number of response if subscriber not found is about 50% of ingress traffic'
          summary: 'Total number of response if subscriber not found is about 50% of ingress traffic'
        expr: (sum(rate(provgw_audit_transient_error{kubernetes_namespace="provgw"}[10m]))/sum(rate(provgw_audit_total{kubernetes_namespace="provgw"}[10m])))*100 >= 10 < 25
        labels:
          severity: Minor
      - alert: ProvgwAuditTransientErrorAbove25Percent
        annotations:
          description: 'Total number of response if subscriber not found is about 50% of ingress traffic'
          summary: 'Total number of response if subscriber not found is about 50% of ingress traffic'
        expr: (sum(rate(provgw_audit_transient_error{kubernetes_namespace="provgw"}[10m]))/sum(rate(provgw_audit_total{kubernetes_namespace="provgw"}[10m])))*100 >= 25 < 50
        labels:
          severity: Major
      - alert: ProvgwAuditTransientErrorAbove50Percent
        annotations:
          description: 'Total number of response if subscriber not found is about 50% of ingress traffic'
          summary: 'Total number of response if subscriber not found is about 50% of ingress traffic'
        expr: (sum(rate(provgw_audit_transient_error{kubernetes_namespace="provgw"}[10m]))/sum(rate(provgw_audit_total{kubernetes_namespace="provgw"}[10m])))*100 >= 50
        labels:
          severity: Critical
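
In PromQL, a comparison without the bool modifier acts as a filter, so a chained expression such as >= 800 < 900 returns a value only while the rate lies in that band. As a result, the three traffic-rate alerts do not fire together for the same rate. The following shell sketch mirrors that banding for the 800/900/950 thresholds shown above; the traffic_severity function is hypothetical and only illustrates the arithmetic, it does not query Prometheus.

```shell
# Map a requests-per-second value to the traffic-rate alert band.
# Thresholds (800, 900, 950) come from the rules above; the function
# name is an assumption made for this illustration.
traffic_severity() {
  awk -v rate="$1" 'BEGIN {
    if      (rate >= 950) print "Critical"   # expr: >= 950
    else if (rate >= 900) print "Major"      # expr: >= 900 < 950
    else if (rate >= 800) print "Minor"      # expr: >= 800 < 900
    else                  print "none"       # below every threshold
  }'
}

for r in 750 800 899.5 900 950 1200; do
  printf '%s req/s -> %s\n' "$r" "$(traffic_severity "$r")"
done
```

The same lower-bound-inclusive, upper-bound-exclusive pattern applies to the percentage bands (1, 10, 25, 50) used by the error-rate alerts.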