NSSF Alert Configuration
Follow the steps below for NSSF Alert configuration in Prometheus:
Note:
- By default, the namespace for OCNSSF is ocnssf. Update it to match your deployment.
- The OCNSSF-config-1.4.0.0.0.zip file can be downloaded from OHC. After downloading, unzip the OCNSSF-config-1.4.0.0.0.zip package to get the NssfAlertrules-1.4.0.yaml file.
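The namespace update mentioned in the note can be scripted. The following is a minimal sketch, assuming GNU sed and assuming the rules file selects the namespace only through the kubernetes_namespace="ocnssf" label seen in the sample rules. The function name set_alert_namespace and the namespace my-nssf are hypothetical and not part of the OCNSSF package.

```shell
# Hypothetical helper: rewrite the default namespace label in an alert rules file.
# Usage: set_alert_namespace <rules-file> <deployment-namespace>
set_alert_namespace() {
    rules_file="$1"
    namespace="$2"
    # Replace only the label matcher, so metric and alert names that
    # contain "ocnssf" (for example ocnssf_nsselection_...) stay untouched.
    sed -i "s/kubernetes_namespace=\"ocnssf\"/kubernetes_namespace=\"${namespace}\"/g" "$rules_file"
}

# Example call (file name taken from the note above, namespace is illustrative):
# set_alert_namespace NssfAlertrules-1.4.0.yaml my-nssf
```

Matching the full label expression rather than the bare word ocnssf avoids renaming the metrics themselves.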
Procedure
- Take a backup of the current Prometheus configuration map:
kubectl get configmaps _NAME_-server -o yaml -n _Namespace_ > /tmp/tempConfig.yaml
- Check for the OCNSSF alert file name in the Prometheus configuration map and add it:
sed -i '/etc\/config\/alertsnssf/d' /tmp/tempConfig.yaml
sed -i '/rule_files:/a\ \- /etc/config/alertsnssf' /tmp/tempConfig.yaml
- Update the configuration map with the OCNSSF alert file name:
kubectl replace configmap _NAME_-server -f /tmp/tempConfig.yaml
- Add the OCNSSF alert rules to the configuration map under the OCNSSF alert file name:
kubectl patch configmap _NAME_-server -n _Namespace_ --type merge --patch "$(cat ~/NssfAlertrules.yaml)"
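The sed step in the procedure first deletes any existing reference to the OCNSSF alert file and then appends it after the rule_files: key, so the step can be repeated without creating duplicate entries. The sketch below illustrates this on a small stand-in file, assuming GNU sed. The file contents and the function name add_alert_file are illustrative only and are not taken from a real Prometheus configuration map.

```shell
# Build a small stand-in for the relevant part of /tmp/tempConfig.yaml
# (the two existing entries are illustrative).
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
    rule_files:
    - /etc/config/rules
    - /etc/config/alerts
EOF

# Hypothetical wrapper around the two sed commands from the procedure.
add_alert_file() {
    # Drop any existing reference first so repeated runs do not add duplicates.
    sed -i '/etc\/config\/alertsnssf/d' "$1"
    # Append the OCNSSF alert file entry directly after the rule_files: key.
    sed -i '/rule_files:/a\ \- /etc/config/alertsnssf' "$1"
}

add_alert_file "$cfg"
add_alert_file "$cfg"   # running the step again leaves a single entry

# Count the references to the OCNSSF alert file; exactly one is expected.
grep -c '/etc/config/alertsnssf' "$cfg"
```

Check the indentation of the appended line in the real /tmp/tempConfig.yaml before running kubectl replace, since the entry must line up with the other items under rule_files:.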
Note:
The Prometheus server automatically reloads the updated configuration map after approximately 20 seconds. Refresh the Prometheus GUI to confirm that the OCNSSF alerts have been loaded.
OCNSSF Alert Config Details
Note:
By default, the namespace is set to ocnssf. Update it according to your deployment.
Sample
apiVersion: v1
data:
  alertsnssf: |
    groups:
    - name: OcnssfAlerts
      rules:
      - alert: OcnssfTrafficRateAboveMinorThreshold
        annotations:
          description: 'Ingress traffic Rate is above minor threshold i.e. 80 requests per second (current value is: {{ $value }})'
          summary: 'Traffic Rate is above 80 Percent of Max requests per second(1000)'
        expr: sum(rate(oc_ingressgateway_http_requests_total{kubernetes_namespace="ocnssf"}[2m])) >= 80 < 90
        labels:
          severity: Minor
      - alert: OcnssfTrafficRateAboveMajorThreshold
        annotations:
          description: 'Ingress traffic Rate is above major threshold i.e. 90 requests per second (current value is: {{ $value }})'
          summary: 'Traffic Rate is above 90 Percent of Max requests per second(1000)'
        expr: sum(rate(oc_ingressgateway_http_requests_total{kubernetes_namespace="ocnssf"}[2m])) >= 90 < 95
        labels:
          severity: Major
      - alert: OcnssfTrafficRateAboveCriticalThreshold
        annotations:
          description: 'Ingress traffic Rate is above critical threshold i.e. 95 requests per second (current value is: {{ $value }})'
          summary: 'Traffic Rate is above 95 Percent of Max requests per second(1000)'
        expr: sum(rate(oc_ingressgateway_http_requests_total{kubernetes_namespace="ocnssf"}[2m])) >= 95
        labels:
          severity: Critical
      - alert: OcnssfTransactionErrorRateAbove1Percent
        annotations:
          description: 'Transaction Error rate is above 1 Percent of Total Transactions (current value is {{ $value }})'
          summary: 'Transaction Error Rate detected above 1 Percent of Total Transactions'
        expr: (sum(rate(oc_ingressgateway_http_responses_total{Status!~"2.*",kubernetes_namespace="ocnssf"}[2m]) or (up * 0 ) ) )/sum(rate(oc_ingressgateway_http_responses_total{kubernetes_namespace="ocnssf"}[2m])) * 100 >= 1 < 10
        labels:
          severity: Warning
      - alert: OcnssfTransactionErrorRateAbove10Percent
        annotations:
          description: 'Transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})'
          summary: 'Transaction Error Rate detected above 10 Percent of Total Transactions'
        expr: (sum(rate(oc_ingressgateway_http_responses_total{Status!~"2.*",kubernetes_namespace="ocnssf"}[2m]) or (up * 0 ) ) )/sum(rate(oc_ingressgateway_http_responses_total{kubernetes_namespace="ocnssf"}[2m])) * 100 >= 10 < 25
        labels:
          severity: Minor
      - alert: OcnssfTransactionErrorRateAbove25Percent
        annotations:
          description: 'Transaction Error Rate detected above 25 Percent of Total Transactions (current value is {{ $value }})'
          summary: 'Transaction Error Rate detected above 25 Percent of Total Transactions'
        expr: (sum(rate(oc_ingressgateway_http_responses_total{Status!~"2.*",kubernetes_namespace="ocnssf"}[2m]) or (up * 0 ) ) )/sum(rate(oc_ingressgateway_http_responses_total{kubernetes_namespace="ocnssf"}[2m])) * 100 >= 25 < 50
        labels:
          severity: Major
      - alert: OcnssfTransactionErrorRateAbove50Percent
        annotations:
          description: 'Transaction Error Rate detected above 50 Percent of Total Transactions (current value is {{ $value }})'
          summary: 'Transaction Error Rate detected above 50 Percent of Total Transactions'
        expr: (sum(rate(oc_ingressgateway_http_responses_total{Status!~"2.*",kubernetes_namespace="ocnssf"}[2m]) or (up * 0 ) ) )/sum(rate(oc_ingressgateway_http_responses_total{kubernetes_namespace="ocnssf"}[2m])) * 100 >= 50
        labels:
          severity: Critical
      - alert: ocnssfPolicyNotFoundWarn
        annotations:
          description: 'Policy Not Found Rate is above warning threshold i.e. 700 mps (current value is: {{ $value }})'
          summary: 'Policy Not Found Rate is above 70 Percent'
        expr: sum(rate(ocnssf_nsselection_policy_not_found_total[2m])) >= 700100 < 850150
        labels:
          severity: Warning
      - alert: ocnssfPolicyNotFoundMaj
        annotations:
          description: 'Policy Not Found Rate is above major threshold i.e. 850 mps (current value is: {{ $value }})'
          summary: 'Policy Not Found Rate is above 85 Percent'
        expr: sum(rate(ocnssf_nsselection_policy_not_found_total[2m])) >= 850150 < 950200
        labels:
          severity: Major
      - alert: ocnssfPolicyNotFoundCrit
        annotations:
          description: 'Policy Not Found Rate is above critical threshold i.e. 950 mps (current value is: {{ $value }})'
          summary: 'Policy Not Found Rate is above 95 Percent'
        expr: sum(rate(ocnssf_nsselection_policy_not_found_total[2m])) >= 950200
        labels:
          severity: Critical
      - alert: ocnssfNrfDiscFailedWarn
        annotations:
          description: 'Rate of failed NRF discovery attempts is above warning threshold i.e. 500 mps (current value is {{ $value }})'
          summary: 'Failed NRF discovery Rate attempts is above 10 Percent'
        expr: sum(rate(ocnssf_nsselection_nrf_disc_failure_total[2m])) >= 100 < 300
        labels:
          severity: Warning
      - alert: ocnssfNrfDiscFailedMaj
        annotations:
          description: 'Rate of failed NRF discovery attempts is above major threshold i.e. 700 mps (current value is {{ $value }})'
          summary: 'Failed NRF discovery Rate attempts is above 30 Percent'
        expr: sum(rate(ocnssf_nsselection_nrf_disc_failure_total[2m])) >= 300 < 500
        labels:
          severity: Major
      - alert: ocnssfNrfDiscFailedCrit
        annotations:
          description: 'Rate of failed NRF discovery attempts is above critical threshold i.e. 900 mps (current value is {{ $value }})'
          summary: 'Failed NRF discovery Rate attempts is above 50 Percent'
        expr: sum(rate(ocnssf_nsselection_nrf_disc_failure_total[2m])) >= 500
        labels:
          severity: Critical
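The transaction error rate alerts in the sample divide the rate of non-2xx responses by the rate of all responses, multiply by 100, and then use chained comparisons such as >= 10 < 25 so that each severity covers its own band. The or (up * 0) term appears intended to make the numerator 0 rather than empty when no non-2xx responses have been recorded. The sketch below reproduces the band arithmetic outside Prometheus. The function name error_rate_severity and the input values are hypothetical, and the handling of zero traffic is an assumption made for this sketch.

```shell
# Hypothetical helper mirroring the OcnssfTransactionErrorRate* bands.
# Usage: error_rate_severity <non-2xx responses per second> <total responses per second>
error_rate_severity() {
    awk -v errors="$1" -v total="$2" 'BEGIN {
        # Assumption for this sketch: with no traffic, no band applies.
        if (total <= 0) { print "none"; exit }
        pct = errors / total * 100
        if      (pct >= 50) print "Critical"   # >= 50
        else if (pct >= 25) print "Major"      # >= 25 < 50
        else if (pct >= 10) print "Minor"      # >= 10 < 25
        else if (pct >= 1)  print "Warning"    # >= 1 < 10
        else                print "none"
    }'
}

error_rate_severity 3 200     # 1.5 percent  -> Warning
error_rate_severity 30 200    # 15 percent   -> Minor
error_rate_severity 120 200   # 60 percent   -> Critical
```

Because each band has an upper bound, only one of the four error rate alerts matches a given value.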