Metrics, KPIs and Alerts

SEPP Metrics

This section provides information about the SEPP metrics.

Microservice Name : Consumer N32c

SL.No.	Prometheus Stat Metrics Name	Metrics Description(including pegging condition)	Dimensions
1.	sepp_n32c_handshake_failure	Pegged when the N32c handshake procedure fails; the corresponding alarm is raised.	app(consumer, producer) peer_domain peer_fqdn n32c_procedure vendor nfInstanceId kubernetes_namespace
2.	sepp_cn32c_handshake_requests_total	Total number of requests sent over n32c for the handshake procedure. Condition: When SEPP initiates a handshake procedure request towards the peer SEPP.	peer_domain peer_fqdn n32c_procedure vendor nfInstanceId kubernetes_namespace direction=egress
3.	sepp_cn32c_handshake_response_total	Total number of responses received over n32c for the handshake procedure. Condition: When SEPP receives a handshake procedure response from the peer SEPP. The response can indicate success or failure, depending on responseCode.	peer_domain peer_fqdn n32c_procedure responseCode(2xx,4xx,5xx) vendor nfInstanceId kubernetes_namespace direction=ingress
4.	sepp_cn32c_handshake_initiation_req_total	Total number of handshake initiation requests received from config-mgr. Condition: When a handshake initiation request is received from config-mgr.	peer_domain peer_fqdn vendor nfInstanceId kubernetes_namespace
5.	sepp_cn32c_handshake_delete_req_total	Total number of handshake context delete requests received from config-mgr. Condition: When a handshake context delete request is received from config-mgr.	peer_domain peer_fqdn vendor nfInstanceId kubernetes_namespace

Microservice Name : Producer N32c

S.No.	Prometheus Stat Metrics Name	Metrics Description(including pegging condition)	Dimensions
1.	sepp_pn32c_handshake_requests_total	Total number of requests received over n32c for the handshake procedure. Condition: When a handshake procedure request is received from the peer SEPP.	peer_domain peer_fqdn n32c_procedure vendor nfInstanceId kubernetes_namespace direction=ingress
2.	sepp_pn32c_handshake_response_total	Total number of responses sent over n32c for the handshake procedure. Condition: When SEPP sends a response to a received handshake procedure request. The response can indicate success or failure, depending on statusCode.	peer_domain peer_fqdn n32c_procedure statusCode(2xx,4xx,5xx) vendor nfInstanceId kubernetes_namespace direction=egress

Microservice Name : Consumer N32f

S.No.	Prometheus Stat Metrics Name	Metrics Description(including pegging condition)	Dimensions
1.	sepp_cn32f_requests_failure	Total number of requests that failed to be sent from cn32f to the remote SEPP. Condition: When an error or exception on the cn32f side prevents the request from being sent to pn32f.	peer_domain peer_id peer_fqdn vendor nfInstanceId kubernetes_namespace statuscode error_msg
2.	sepp_cn32f_response_failure	Total number of responses that failed to be sent from the cn32f pod to the NF. Condition: When an error or exception on the cn32f side prevents the response from being sent to the NF.	peer_domain peer_id peer_fqdn vendor nfInstanceId kubernetes_namespace statuscode error_msg
3.	sepp_cn32f_requests	Total number of requests received from the NF. Condition: When a request is received on the InboundInterface of cn32f.	peer_domain peer_id peer_fqdn vendor nfInstanceId kubernetes_namespace
4.	sepp_cn32f_response	Total number of responses received from the remote SEPP. Condition: When a response is received on the OutboundInterface of cn32f.	peer_domain peer_id peer_fqdn vendor nfInstanceId kubernetes_namespace

Microservice Name : Producer N32f

S.No	Prometheus Stat Metric Name	Metric Description(including pegging condition)	Dimensions
1	sepp_pn32f_requests_rx	Number of requests received from the peer SEPP. Condition: When a request reaches pn32f from the peer SEPP.	vendor nfInstanceId kubernetes_namespace kubernetes_podname peer_fqdn
2	sepp_pn32f_requests_tx	Number of requests transmitted to the NF. Condition: When pn32f transmits a request to the NF.	vendor nfInstanceId kubernetes_namespace kubernetes_podname peer_fqdn
3	sepp_pn32f_responses_rx	Number of responses received from EGW. Condition: When a response reaches pn32f from EGW.	vendor nfInstanceId kubernetes_namespace kubernetes_podname statusCode(2xx,4xx,5xx) error-reason(in case of EGW failure) peer_fqdn
4	sepp_pn32f_responses_tx	Number of responses transmitted to cSepp. Condition: When pn32f transmits a response to cSepp.	vendor nfInstanceId kubernetes_namespace kubernetes_podname statusCode(2xx,4xx,5xx) peer_fqdn

Note:

Dashboard templates are not included in the deliverables, so dashboards must be created manually as needed.

SEPP KPIs

This section provides information about the SEPP KPIs.

KPI Details	SEPP Microservice	Metrics Used	Service Operation	Response Code
cn32c Handshake Success rate	cn32c	(sum(ocsepp_cn32c_handshake_response_total)/sum(ocsepp_cn32c_handshake_requests_total))*100	n32c Handshake	200
cn32f Routing Success Rate	cn32f	(sum(ocsepp_cn32f_response_total)/sum(ocsepp_cn32f_requests_total))*100	n32f message forward	All
cn32f Requests Rate Per Remote SEPP	cn32f	sum(irate(ocsepp_cn32f_requests_total[2m]))by(PEER_DOMAIN, PEER_FQDN, PLMN_ID)	n32f message forward	All
cn32f Response Rate Per Remote SEPP	cn32f	sum(irate(ocsepp_cn32f_response_total[2m])) by(PEER_DOMAIN, PEER_FQDN, PLMN_ID)	n32f message forward	All
cn32c Handshake Failure Per Remote SEPP	cn32c	sum(ocsepp_n32c_handshake_failure)by(peer_domain, peer_fqdn, peer_plmn_id)	n32c Handshake	4xx and 5xx
pn32c Handshake Success rate	pn32c	(sum(ocsepp_pn32c_handshake_response_total)/sum(ocsepp_pn32c_handshake_requests_total))*100	n32c Handshake	200
pn32f Routing Success Rate	pn32f	(sum(ocsepp_pn32f_response_total)/sum(ocsepp_pn32f_requests_total))*100	n32f message forward	All
pn32f Requests Rate	pn32f	sum(irate(ocsepp_pn32f_requests_rx_total[2m]))	n32f message forward	All
pn32f Response Rate	pn32f	sum(irate(ocsepp_pn32f_response_rx_total[2m]))	n32f message forward	All
pn32c Handshake Failure Per Remote SEPP	pn32c	sum(ocsepp_n32c_handshake_failure)by(peer_domain, peer_fqdn, peer_plmn_id)	n32c Handshake	4xx and 5xx
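
The KPI expressions in the table can be evaluated against a running Prometheus server. The following is a minimal sketch, not part of the SEPP deliverables: it assumes Prometheus is reachable at a hypothetical `PROM_URL` (defaulting to `http://localhost:9090`), and the helper names `success_rate_expr` and `prom_query` are illustrative. It builds the cn32c handshake success-rate expression from the table and shows how it could be sent to the standard Prometheus HTTP API endpoint `/api/v1/query`.

```shell
#!/usr/bin/env bash

# Assumption: address of the Prometheus server; adjust for your deployment.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Print a success-rate expression of the form used in the KPI table:
#   (sum(<response counter>)/sum(<request counter>))*100
success_rate_expr() {
  printf '(sum(%s)/sum(%s))*100\n' "$1" "$2"
}

# Run an instant query through the Prometheus HTTP API (GET /api/v1/query).
prom_query() {
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# cn32c Handshake Success rate, using the metric names from the KPI table.
query="$(success_rate_expr \
  ocsepp_cn32c_handshake_response_total \
  ocsepp_cn32c_handshake_requests_total)"
echo "$query"

# Uncomment when a Prometheus server is reachable at PROM_URL:
# prom_query "$query"
```

The same helper can build the pn32c expression by passing the pn32c counter names. The curl call is left commented out so the sketch runs without a server.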

SEPP Alerts

This section provides information about the SEPP alerts and their configuration.

Table 3-1 SEPP Alerts

SL.no.	Alert	Severity	oid	Dimensions	Description	Remarks
1	SEPPCn32cHandshakeFailureAlert	Major	1.3.6.1.4.1.323.5.3.46.1.2.2001	reason	Handshake procedure has failed on Consumer sepp	SEPP-1.4
2	SEPPPn32cHandshakeFailureAlert	Major	1.3.6.1.4.1.323.5.3.46.1.2.3001	reason	Handshake procedure has failed on Producer sepp	SEPP-1.4
3	SEPPN32fRoutingFailure	warning	1.3.6.1.4.1.323.5.3.46.1.2.4001	error_msg	N32f service not able to forward message	SEPP-1.4
4	SEPPPodMemoryUsageAlert	warning	1.3.6.1.4.1.323.5.3.46.1.2.4003		Pod memory usage is above threshold ( 70% )	SEPP-1.4
5	SEPPPodCpuUsageAlert	warning	1.3.6.1.4.1.323.5.3.46.1.2.4002		Pod CPU usage is above threshold ( 70% )	SEPP-1.4

SEPP Alert Configuration

This section describes the measurement-based alert rule configuration for SEPP. The Alert Manager triggers alerts when the Prometheus measurement values reported by the microservices satisfy the conditions defined in the alert rules.
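
To make the relationship between a measurement, a condition, and an alert concrete, the sketch below writes a hypothetical rules file in the shape of a config map patch. It is an illustration only: the actual seppAlertRules.yaml ships with the product and may differ. The group name, the `expr` condition, and the choice of the `ocsepp_` metric prefix (taken from the KPI table) are assumptions; the alert name, severity, oid, and description come from the SEPP Alerts table.

```shell
#!/usr/bin/env bash
set -eu

# Assumption: a scratch file is used so nothing on the system is overwritten.
rules_file="$(mktemp)"

# Config map patch whose "alertssepp" key holds standard Prometheus rule groups.
cat > "$rules_file" <<'EOF'
data:
  alertssepp: |
    groups:
    - name: SEPPAlerts            # hypothetical group name
      rules:
      - alert: SEPPCn32cHandshakeFailureAlert
        # Hypothetical condition: fire while any handshake failure is pegged.
        expr: sum(ocsepp_n32c_handshake_failure) by (peer_fqdn) > 0
        labels:
          severity: major
          oid: "1.3.6.1.4.1.323.5.3.46.1.2.2001"
        annotations:
          summary: "Handshake procedure has failed on Consumer sepp"
EOF

# Show the generated file for review.
cat "$rules_file"
```

The `alertssepp` key is chosen to match the `/etc/config/alertssepp` entry that the configuration steps add under `rule_files`.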

Steps for SEPP Alert Configuration in Prometheus

_Namespace_: the Prometheus namespace

Find the config map to configure alerts in prometheus server by executing the following command:
```
kubectl get configmap -n <Namespace>
```
where <Namespace> is the Prometheus server namespace used in the helm install command.
Back up the current config map of the Prometheus server by executing the following command:
```
kubectl get configmaps _NAME_-server -o yaml -n _Namespace_ > /tmp/tempConfig.yaml
```
where _Namespace_ is the Prometheus server namespace used in the helm install command.
For example, if the chart name is "prometheus-alert", "_NAME_-server" becomes "prometheus-alert-server". Execute the following command to back up the config map:
```
kubectl get configmaps prometheus-alert-server -o yaml -n prometheus-alert2 > /tmp/tempConfig.yaml
```
Check if alertssepp is present in the tempConfig.yaml file by executing the following command:
```
grep alertssepp /tmp/tempConfig.yaml
```
If alertssepp is present, delete the alertssepp entry from the tempConfig.yaml file by executing the following command:
```
sed -i '/etc\/config\/alertssepp/d' /tmp/tempConfig.yaml
```
If alertssepp is not present, add the alertssepp entry to the tempConfig.yaml file by executing the following command:
```
sed -i '/rule_files:/a\    \- /etc/config/alertssepp'  /tmp/tempConfig.yaml
```
Reload the config map with the modified file by executing the following command:
```
kubectl replace configmap <Name> -f /tmp/tempConfig.yaml
```
Add the seppAlertRules.yaml file to the Prometheus config map, under the file name of the SEPP alert file, by executing the following command:
```
kubectl patch configmap <Name> -n <Namespace> --type merge --patch \
"$(cat <PATH>/seppAlertRules.yaml)"
```
Restart the prometheus-server pod.
Verify the alerts in the Prometheus GUI.
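
Before editing the real backup, the two sed commands from the steps above can be rehearsed on a scratch file. This is a minimal sketch assuming GNU sed (for `-i` and the one-line `a\` form) and a hypothetical, heavily shortened config map; the helper name `ensure_alertssepp` is illustrative. It removes any existing alertssepp entry and then appends one, so the file ends up with exactly one entry however often it is run.

```shell
#!/usr/bin/env bash
set -eu

# Scratch file standing in for the backed-up config map (hypothetical contents).
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
data:
  prometheus.yml: |
    rule_files:
    - /etc/config/rules
    - /etc/config/alerts
EOF

# Delete any existing alertssepp entry, then append one after "rule_files:".
ensure_alertssepp() {
  sed -i '/etc\/config\/alertssepp/d' "$1"
  sed -i '/rule_files:/a\    \- /etc/config/alertssepp' "$1"
}

ensure_alertssepp "$cfg"
ensure_alertssepp "$cfg"   # a second run must not create a duplicate

# Count the matching lines and print the result for review.
grep -c '/etc/config/alertssepp' "$cfg"
cat "$cfg"
```

Inspect the printed file to confirm that the new entry lines up with the other `rule_files` items before applying the same commands to the real backup.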

Note:

The Prometheus server automatically reloads the updated config map after a short delay (approximately 20 seconds).