NSSF Alert Configuration

Follow the steps below for NSSF Alert configuration in Prometheus:

Note:

  1. The default namespace for OCNSSF is ocnssf. Update it to match your deployment.
  2. Download the OCNSSF-config-1.4.0.0.0.zip file from OHC and unzip it to obtain the NssfAlertrules-1.4.0.yaml file.

Procedure

  1. Back up the current Prometheus configuration map:
    kubectl get configmaps _NAME_-server -o yaml -n _Namespace_ > /tmp/tempConfig.yaml
  2. Add the OCNSSF alert file name to the Prometheus configuration map. The first command removes any existing entry so that the second does not create a duplicate:
    sed -i '/etc\/config\/alertsnssf/d' /tmp/tempConfig.yaml
    sed -i '/rule_files:/a\ \- /etc/config/alertsnssf' /tmp/tempConfig.yaml
  3. Update the configuration map with the edited file, which now includes the OCNSSF alert file name:
    kubectl replace configmap _NAME_-server -f /tmp/tempConfig.yaml
  4. Add the OCNSSF alert rules to the configuration map under the OCNSSF alert file name:
    kubectl patch configmap _NAME_-server -n _Namespace_ --type merge --patch "$(cat ~/NssfAlertrules.yaml)"
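
Step 2 can be tried on a scratch file before editing the real backup. The sketch below is only an illustration: it assumes GNU sed (for -i and the one-line a\ form), and the file content is a made-up, heavily trimmed stand-in for the Prometheus configuration, not a real ConfigMap. It runs the two sed commands twice to show that the delete-then-append pair leaves a single entry.

```shell
# Scratch stand-in for /tmp/tempConfig.yaml (hypothetical, trimmed content).
cfg="$(mktemp)"
printf '%s\n' 'rule_files:' '- /etc/config/rules' '- /etc/config/alerts' > "$cfg"

# The same two commands as step 2, applied twice to mimic re-running the procedure.
for run in 1 2; do
  sed -i '/etc\/config\/alertsnssf/d' "$cfg"
  sed -i '/rule_files:/a\ \- /etc/config/alertsnssf' "$cfg"
done

# Count the lines that reference the OCNSSF alert file.
entries="$(grep -c '/etc/config/alertsnssf' "$cfg")"
echo "alertsnssf entries: $entries"
rm -f "$cfg"
```

In a real ConfigMap the rule_files list is nested inside a YAML block, so it is worth checking that the appended line is indented like its neighbours before running step 3.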

Note:

The Prometheus server automatically reloads the updated configuration map after approximately 20 seconds. Refresh the Prometheus GUI to confirm that the OCNSSF alerts have been loaded.
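
The loaded rules can also be checked from a shell instead of the GUI. This is a minimal sketch, assuming the Prometheus HTTP API is reachable and that its /api/v1/rules response is compact JSON. PROM_URL and the has_rule helper are illustrative names that do not come from this guide, and the sample response is hand-written and trimmed.

```shell
# Succeeds if a rule with the given name appears in a /api/v1/rules response read from stdin.
has_rule() {
  grep -q "\"name\":\"$1\""
}

# Offline check against a trimmed, hand-written sample of such a response.
sample='{"status":"success","data":{"groups":[{"name":"OcnssfAlerts","rules":[{"name":"OcnssfTrafficRateAboveMinorThreshold","type":"alerting"}]}]}}'
if printf '%s' "$sample" | has_rule OcnssfTrafficRateAboveMinorThreshold; then
  found=yes
else
  found=no
fi
echo "rule loaded: $found"

# Against a live server (PROM_URL is a placeholder that you must set yourself):
#   curl -s "$PROM_URL/api/v1/rules" | has_rule OcnssfTrafficRateAboveMinorThreshold && echo loaded
```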

OCNSSF Alert Config Details

Note:

By default, the namespace is set to ocnssf. Update it as required for your deployment.
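
One way to apply the namespace change is to rewrite the kubernetes_namespace label in the rules file before patching the configuration map. This is a minimal sketch: my-nssf is a made-up namespace, and the scratch file stands in for the unzipped rules file with a single expression copied from the sample.

```shell
NAMESPACE="my-nssf"   # hypothetical target namespace; replace with your own

# Scratch file holding one expression from the sample.
rules="$(mktemp)"
cat > "$rules" <<'EOF'
expr: sum(rate(oc_ingressgateway_http_requests_total{kubernetes_namespace="ocnssf"}[2m])) >= 95
EOF

# Rewrite only the label value, leaving metric names such as ocnssf_* untouched.
updated="$(sed "s/kubernetes_namespace=\"ocnssf\"/kubernetes_namespace=\"$NAMESPACE\"/g" "$rules")"
echo "$updated"
rm -f "$rules"
```

Matching the whole label, not the bare word ocnssf, keeps the substitution from altering metric or alert names that contain the same string.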

Sample

apiVersion: v1
data:
  alertsnssf: |
    groups:
    - name: OcnssfAlerts
      rules:
      - alert: OcnssfTrafficRateAboveMinorThreshold
        annotations:
          description: 'Ingress traffic Rate is above minor threshold i.e. 80 requests per second (current value is: {{ $value }})'
          summary: 'Traffic Rate is above 80 Percent of Max requests per second(1000)'
        expr: sum(rate(oc_ingressgateway_http_requests_total{kubernetes_namespace="ocnssf"}[2m])) >= 80 < 90
        labels:
          severity: Minor
      - alert: OcnssfTrafficRateAboveMajorThreshold
        annotations:
          description: 'Ingress traffic Rate is above major threshold i.e. 90 requests per second (current value is: {{ $value }})'
          summary: 'Traffic Rate is above 90 Percent of Max requests per second(1000)'
        expr: sum(rate(oc_ingressgateway_http_requests_total{kubernetes_namespace="ocnssf"}[2m])) >= 90 < 95
        labels:
          severity: Major
      - alert: OcnssfTrafficRateAboveCriticalThreshold
        annotations:
          description: 'Ingress traffic Rate is above critical threshold i.e. 95 requests per second (current value is: {{ $value }})'
          summary: 'Traffic Rate is above 95 Percent of Max requests per second(1000)'
        expr: sum(rate(oc_ingressgateway_http_requests_total{kubernetes_namespace="ocnssf"}[2m])) >= 95
        labels:
          severity: Critical
      - alert: OcnssfTransactionErrorRateAbove1Percent
        annotations:
          description: 'Transaction Error rate is above 1 Percent of Total Transactions (current value is {{ $value }})'
          summary: 'Transaction Error Rate detected above 1 Percent of Total Transactions'
        expr: (sum(rate(oc_ingressgateway_http_responses_total{Status!~"2.*",kubernetes_namespace="ocnssf"}[2m]) or (up * 0 ) ) )/sum(rate(oc_ingressgateway_http_responses_total{kubernetes_namespace="ocnssf"}[2m])) * 100 >= 1 < 10
        labels:
          severity: Warning
      - alert: OcnssfTransactionErrorRateAbove10Percent
        annotations:
          description: 'Transaction Error rate is above 10 Percent of Total Transactions (current value is {{ $value }})'
          summary: 'Transaction Error Rate detected above 10 Percent of Total Transactions'
        expr: (sum(rate(oc_ingressgateway_http_responses_total{Status!~"2.*",kubernetes_namespace="ocnssf"}[2m]) or (up * 0 ) ) )/sum(rate(oc_ingressgateway_http_responses_total{kubernetes_namespace="ocnssf"}[2m])) * 100 >= 10 < 25
        labels:
          severity: Minor
      - alert: OcnssfTransactionErrorRateAbove25Percent
        annotations:
          description: 'Transaction Error Rate detected above 25 Percent of Total Transactions (current value is {{ $value }})'
          summary: 'Transaction Error Rate detected above 25 Percent of Total Transactions'
        expr: (sum(rate(oc_ingressgateway_http_responses_total{Status!~"2.*",kubernetes_namespace="ocnssf"}[2m]) or (up * 0 ) ) )/sum(rate(oc_ingressgateway_http_responses_total{kubernetes_namespace="ocnssf"}[2m])) * 100 >= 25 < 50
        labels:
          severity: Major
      - alert: OcnssfTransactionErrorRateAbove50Percent
        annotations:
          description: 'Transaction Error Rate detected above 50 Percent of Total Transactions (current value is {{ $value }})'
          summary: 'Transaction Error Rate detected above 50 Percent of Total Transactions'
        expr: (sum(rate(oc_ingressgateway_http_responses_total{Status!~"2.*",kubernetes_namespace="ocnssf"}[2m]) or (up * 0 ) ) )/sum(rate(oc_ingressgateway_http_responses_total{kubernetes_namespace="ocnssf"}[2m])) * 100 >= 50
        labels:
          severity: Critical
      - alert: ocnssfPolicyNotFoundWarn
        annotations:
          description: 'Policy Not Found Rate is above warning threshold i.e. 700 mps (current value is: {{ $value }})'
          summary: 'Policy Not Found Rate is above 70 Percent'
        expr: sum(rate(ocnssf_nsselection_policy_not_found_total[2m])) >= 700 < 850
        labels:
          severity: Warning
      - alert: ocnssfPolicyNotFoundMaj
        annotations:
          description: 'Policy Not Found Rate is above major threshold i.e. 850 mps (current value is: {{ $value }})'
          summary: 'Policy Not Found Rate is above 85 Percent'
        expr: sum(rate(ocnssf_nsselection_policy_not_found_total[2m])) >= 850 < 950
        labels:
          severity: Major
      - alert: ocnssfPolicyNotFoundCrit
        annotations:
          description: 'Policy Not Found Rate is above critical threshold i.e. 950 mps (current value is: {{ $value }})'
          summary: 'Policy Not Found Rate is above 95 Percent'
        expr: sum(rate(ocnssf_nsselection_policy_not_found_total[2m])) >= 950
        labels:
          severity: Critical
      - alert: ocnssfNrfDiscFailedWarn
        annotations:
          description: 'Rate of failed NRF discovery attempts is above warning threshold i.e. 500 mps (current value is {{ $value }})'
          summary: 'Failed NRF discovery Rate attempts is above 10 Percent'
        expr: sum(rate(ocnssf_nsselection_nrf_disc_failure_total[2m])) >= 100 < 300
        labels:
          severity: Warning
      - alert: ocnssfNrfDiscFailedMaj
        annotations:
          description: 'Rate of failed NRF discovery attempts is above major threshold i.e. 700 mps (current value is {{ $value }})'
          summary: 'Failed NRF discovery Rate attempts is above 30 Percent'
        expr: sum(rate(ocnssf_nsselection_nrf_disc_failure_total[2m])) >= 300 < 500
        labels:
          severity: Major
      - alert: ocnssfNrfDiscFailedCrit
        annotations:
          description: 'Rate of failed NRF discovery attempts is above critical threshold i.e. 900 mps (current value is {{ $value }})'
          summary: 'Failed NRF discovery Rate attempts is above 50 Percent'
        expr: sum(rate(ocnssf_nsselection_nrf_disc_failure_total[2m])) >= 500
        labels:
          severity: Critical
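
The transaction error rules in the sample divide the non-2xx response rate by the total response rate, multiply by 100, and then use chained PromQL comparisons (for example >= 1 < 10) so that each value falls into exactly one severity band. The sketch below mirrors that arithmetic outside Prometheus as a worked example. The function name and its two arguments are illustrative, and treating a zero total as "none" is an assumption made for the sketch.

```shell
# error_rate_severity NON_2XX_RATE TOTAL_RATE
# Prints the severity band that the sample rules would assign to the error percentage.
error_rate_severity() {
  awk -v err="$1" -v total="$2" 'BEGIN {
    if (total <= 0) { print "none"; exit }   # assumption: no traffic means no alert
    pct = err / total * 100
    if      (pct >= 50) print "Critical"     # OcnssfTransactionErrorRateAbove50Percent
    else if (pct >= 25) print "Major"        # OcnssfTransactionErrorRateAbove25Percent
    else if (pct >= 10) print "Minor"        # OcnssfTransactionErrorRateAbove10Percent
    else if (pct >= 1)  print "Warning"      # OcnssfTransactionErrorRateAbove1Percent
    else                print "none"
  }'
}

error_rate_severity 30 100   # 30 percent errors, prints Major
```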