5 Alerts, Metrics and Traces
This section provides information about the alerts, metrics, and traces of Service Communication Proxy (SCP).
Alerts
This section provides information about configuring alerts and supported alerts.
Configuring Alerts
You can configure alerts in Prometheus and in the SCPAlertrules.yaml file.
The following table provides information for Alerts for Service Communication Proxy.
Table 5-1 Alert Reference
Alert Name | Severity | Condition | OID used for SNMP Traps | Description |
---|---|---|---|---|
SCPIngressTrafficRateAboveMinorThreshold | minor | sum(rate(ocscp_metric_total_http_rx_req{app_kubernetes_io_name="scp-worker"}[2m])) by (kubernetes_namespace,ocscp_locality,kubernetes_pod_name) >= 1200 < 1400 | 1.3.6.1.4.1.323.5.3.35.1.2.7001 | Notifies that the traffic rate is above 1200 mps (the user-configured minor threshold), with the locality and the current traffic rate. |
SCPIngressTrafficRateAboveMajorThreshold | major | sum(rate(ocscp_metric_total_http_rx_req{app_kubernetes_io_name="scp-worker"}[2m])) by (kubernetes_namespace,ocscp_locality,kubernetes_pod_name) >= 1400 < 1600 | 1.3.6.1.4.1.323.5.3.35.1.2.7001 | Notifies that the traffic rate is above 1400 mps (the user-configured major threshold), with the locality and the current traffic rate. |
SCPIngressTrafficRateAboveCriticalThreshold | critical | sum(rate(ocscp_metric_total_http_rx_req{app_kubernetes_io_name="scp-worker"}[2m])) by (kubernetes_namespace,ocscp_locality,kubernetes_pod_name) >= 1600 | 1.3.6.1.4.1.323.5.3.35.1.2.7001 | Notifies that the traffic rate is above 1600 mps (the user-configured critical threshold), with the locality and the current traffic rate. |
SCPRoutingFailedForService | info | idelta(ocscp_metric_total_routing_send_fail{app_kubernetes_io_name="scp-worker",ocscp_nf_service_type!="Scpc-pilot"}[5m]) > 0 | 1.3.6.1.4.1.323.5.3.35.1.2.7005 | Notifies that routing failed for a service. Provides details such as NF Service Type, NF Type, locality, and value. |
SCPSoothsayerNotificationPodMemoryUsage | major | sum(container_memory_usage_bytes{image!="",pod_name=~".*scpc-notification.+"}) by (pod_name,namespace, instance) > 3006477107 | 1.3.6.1.4.1.323.5.3.35.1.2.3001 | Notifies that the memory usage of a Notification service pod is above the threshold. The threshold is 70% of the allocated memory (4 GB): 2.8 GB. |
SCPWorkerPodMemoryUsage | major | sum(container_memory_usage_bytes{image!="",pod_name=~".*scp-worker.+"}) by (pod_name,namespace, instance) > 6012954214 | 1.3.6.1.4.1.323.5.3.35.1.2.7004 | Notifies that the memory usage of a Worker pod is above the threshold. The threshold is 70% of the allocated memory (8 GB): 5.6 GB. |
SCPPilotPodMemoryUsage | major | sum(container_memory_usage_bytes{image!="",pod_name=~".*scpc-pilot.+"}) by (pod_name,namespace, instance) > 4509715660 | 1.3.6.1.4.1.323.5.3.35.1.2.5001 | Notifies that the memory usage of a Pilot pod is above the threshold. The threshold is 70% of the allocated memory (6 GB): 4.2 GB. |
SCPIngressGatewayPodMemoryUsage | major | sum(container_memory_usage_bytes{image!="",pod_name=~".*ingress-gateway.+"}) by (pod_name,namespace, instance) > 2147483648 | 1.3.6.1.4.1.323.5.3.35.1.2.7010 | Notifies that the memory usage of an Ingress Gateway pod is above the threshold. The threshold is 50% of the allocated memory (4 GB): 2 GB. |
SCPInstanceDown | critical | kube_pod_status_ready{pod =~ '.*scp.*|.*ingress-gateway.*',condition=~ 'true'} !=1 | 1.3.6.1.4.1.323.5.3.35.1.2.7006 | Notifies that a pod in the ocscp release is down. Provides information such as pod name, instance ID, and app name. |
SCPSoothsayerAuditErrorResponse | info | scp_soothsayer_audit_error_response > 0 | 1.3.6.1.4.1.323.5.3.35.1.2.4001 | Generated when the Audit module receives a 3xx, 4xx, or 5xx error from NRF. The alert is labeled with the specific nftype, servingscope, and auditmethod. It is cleared on the next audit cycle. |
SCPSoothsayerAuditEmptyNFArrayResponse | critical | scp_soothsayer_audit_2xx_empty_nf_array > 0 | 1.3.6.1.4.1.323.5.3.35.1.2.4002 | Generated when the Audit module receives a 2xx response with an empty NFInstance array from NRF. The alert is labeled with the specific nftype, servingscope, and auditmethod. It is cleared if Audit receives a success response with a non-empty NFInstance array, or on the next audit cycle if the topology source is changed to LOCAL. |
MediationServiceNotAvailable | critical | idelta(ocscp_metric_total_app_res{app_kubernetes_io_name="scp-worker",ocscp_app_name="nmediation-http",ocscp_response_code="503"}[2m]) > 0 | 1.3.6.1.4.1.323.5.3.35.1.2.7008 | Generated when SCP receives a '503' response from the mediation service. |
MediationProcessingFailure | info | idelta(ocscp_metric_total_mediation_processing_failures{app_kubernetes_io_name="scp-worker"}[2m]) > 0 | 1.3.6.1.4.1.323.5.3.35.1.2.7009 | Generated when SCP receives a response from the mediation service that indicates a failure in rules processing at the mediation service, for example, a JSON parsing error or an error in applying rules. |
NFInstanceConnectionDown | info | sum(hosts_cx_active{cluster_name!~".*pilot.*|.*jaeger.*|.*prometheus.*|.*elasticsearch.*|.*tiller.*|.*grafana.*|.*snmp-notifier.*|.*kibana.*|.*scpc-.*|.*metric-server.*|.*coredns.*|.*mysql.*|.*scp-worker.*|.*ingress-gateway.*"}) by (endpoint, cluster_name) == 0 | 1.3.6.1.4.1.323.5.3.35.1.2.7007 | Generated when any upstream connection goes down. |
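The three ingress traffic conditions in Table 5-1 partition the per-pod rate into non-overlapping bands. The following is a minimal sketch of that banding in Python, not part of SCP itself. The function and constant names are illustrative, and the thresholds are the default values from the table, which are user-configurable.

```python
from typing import Optional

# Default thresholds in messages per second (mps), copied from Table 5-1.
# They are user-configurable, so adjust them to match your rule file.
MINOR_MPS = 1200
MAJOR_MPS = 1400
CRITICAL_MPS = 1600


def traffic_severity(rate_mps: float) -> Optional[str]:
    """Return the severity band for a per-pod ingress rate, or None if no alert applies."""
    if rate_mps >= CRITICAL_MPS:
        return "critical"
    if rate_mps >= MAJOR_MPS:
        return "major"
    if rate_mps >= MINOR_MPS:
        return "minor"
    return None


if __name__ == "__main__":
    for rate in (1100.0, 1200.0, 1399.9, 1400.0, 1600.0):
        print(rate, traffic_severity(rate))
```

Because each band has an upper bound (for example, `>= 1200 < 1400`), at most one of the three traffic alerts matches a given pod at a time.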
Note:
All metrics operate on the namespace in which Service Communication Proxy is deployed. You must configure the SCP namespace when configuring SCPAlertrules.yaml.
Configuring Service Communication Proxy Alert in Prometheus
SCP Helm Chart Release Name: _NAME_
Prometheus NameSpace: _Namespace_
To configure the Service Communication Proxy alert in Prometheus, follow the procedure in Table 5-2.
Table 5-2 Configuring Service Communication Proxy Alert in Prometheus
Step No. | Procedure | Description |
---|---|---|
1. | Check the name of the config map | To check the name of the config map used by Prometheus, use the following command: |
2. | Take a backup of the current config map | Select the map name appended with "-server". In the above example, it is "lisa-prometheus-alert2-server". Take its backup using the following command: |
3. | Check and delete the "alertsscp" rule | Check whether the "alertsscp" rule is already configured in the Prometheus config map and, if it is, delete it. This step is optional when you configure the rule for the first time. |
4. | Add the "alertsscp" rule | Add the "alertsscp" rule in the config map dump file under the 'rule_files' tag. |
5. | Update the config map | Update the config map using the following command. Ensure that you use the same config map name that was used to take the backup of the Prometheus config map in step 2. |
6. | Add scpAlertrules in the config map | Patch the config map with the new "alertsscp" rule using the following command. Note that the patch file is the custom template file provided with SCP (that is, SCPAlertrules.yaml). |
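Steps 3 and 4 of Table 5-2 amount to a small text edit on the dumped config map: remove any existing reference to the "alertsscp" rule file, then add one under `rule_files`. The following Python sketch illustrates that edit on the dump file contents. It is an illustration only: the function name is hypothetical, and the rule path `/etc/config/alertsscp` is an assumption about where Prometheus mounts the config map, so verify it against your deployment.

```python
# Assumed mount path of the rule inside the Prometheus server pod.
# This value is NOT taken from the procedure above; adjust it as needed.
RULE_ENTRY = "/etc/config/alertsscp"


def ensure_rule_entry(config_text: str, entry: str = RULE_ENTRY) -> str:
    """Return config_text with exactly one '- <entry>' line under rule_files."""
    out = []
    for line in config_text.splitlines():
        # Step 3: drop any existing reference to the rule file.
        if line.strip() == f"- {entry}":
            continue
        out.append(line)
        # Step 4: add the entry directly under the rule_files key,
        # reusing the key's indentation for the list item.
        if line.strip() == "rule_files:":
            indent = line[: len(line) - len(line.lstrip())]
            out.append(f"{indent}- {entry}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    dump = "    rule_files:\n    - /etc/config/rules\n    scrape_configs: []\n"
    print(ensure_rule_entry(dump), end="")
```

Removing before adding keeps the edit idempotent, so running it a second time on the same dump file does not create a duplicate entry.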
Note:
Prometheus takes nearly 20 seconds to apply the updated config map.
Configuring Service Communication Proxy Alert using SCPAlertrules.yaml file
Note:
The default namespace for Service Communication Proxy is scpsvc. You can update the namespace as per the deployment.
The following is a sample YAML file.
apiVersion: v1
data:
  alertsscp: |
    groups:
    - name: SCPAlerts
      rules:
      # Alerts for SCP Ingress Traffic Rate; uses the namespace in which SCP is deployed
      - alert: SCPIngressTrafficRateAboveMinorThreshold
        annotations:
          description: 'Ingress Traffic Rate at locality: "{{$labels.ocscp_locality}}" is above minor threshold (i.e. 1200 mps)'
          summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Current Ingress Traffic Rate is {{ $value | printf "%.2f" }} mps which is above 70 Percent of Max MPS(1700)'
        # Provide app and kubernetes_namespace of scp deployed
        expr: sum(rate(ocscp_metric_total_http_rx_req{app_kubernetes_io_name="scp-worker"}[2m])) by (kubernetes_namespace,ocscp_locality,kubernetes_pod_name) >= 1200 < 1400
        for: 1m
        labels:
          severity: minor
          alertname: "SCPIngressTrafficRateAboveMinorThreshold"
          oid: "1.3.6.1.4.1.323.5.3.35.1.2.7001"
          namespace: ' {{ $labels.kubernetes_namespace }} '
          podname: ' {{$labels.kubernetes_pod_name}} '
          vendor: oracle
      - alert: SCPIngressTrafficRateAboveMajorThreshold
        annotations:
          timestamp: ' {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} '
          description: 'Ingress Traffic Rate at locality: {{$labels.ocscp_locality}} is above major threshold (i.e. 1400 mps)'
          summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Current Ingress Traffic Rate is {{ $value | printf "%.2f" }} mps which is above 80 Percent of Max MPS(1700)'
        # Provide app and kubernetes_namespace of scp deployed
        expr: sum(rate(ocscp_metric_total_http_rx_req{app_kubernetes_io_name="scp-worker"}[2m])) by (kubernetes_namespace,ocscp_locality,kubernetes_pod_name) >= 1400 < 1600
        for: 1m
        labels:
          severity: major
          alertname: "SCPIngressTrafficRateAboveMajorThreshold"
          oid: "1.3.6.1.4.1.323.5.3.35.1.2.7001"
          namespace: ' {{ $labels.kubernetes_namespace }} '
          podname: ' {{$labels.kubernetes_pod_name}} '
          vendor: oracle
      - alert: SCPIngressTrafficRateAbovecriticalThreshold
        annotations:
          description: 'Ingress Traffic Rate at locality: {{$labels.ocscp_locality}} is above critical threshold (i.e. 1600 mps)'
          summary: 'namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Current Ingress Traffic Rate is {{ $value | printf "%.2f" }} mps which is above 95 Percent of Max MPS(1700)'
        # Provide app and kubernetes_namespace of scp deployed
        expr: sum(rate(ocscp_metric_total_http_rx_req{app_kubernetes_io_name="scp-worker"}[2m])) by (kubernetes_namespace,ocscp_locality,kubernetes_pod_name) >= 1600
        for: 1m
        labels:
          severity: critical
          alertname: "SCPIngressTrafficRateAbovecriticalThreshold"
          oid: "1.3.6.1.4.1.323.5.3.35.1.2.7001"
          namespace: ' {{ $labels.kubernetes_namespace }} '
          podname: ' {{$labels.kubernetes_pod_name}} '
          vendor: oracle
      - alert: SCPRoutingFailedForService
        annotations:
          description: 'Routing failed for nfservicetype: {{$labels.ocscp_nf_service_type}}'
          summary: 'Routing failed for service: nfservicetype = {{$labels.ocscp_nf_service_type}}, nfserviceinstanceid = {{$labels.ocscp_service_instance_id}}, nftype = {{$labels.ocscp_nf_type}}, nfinstanceid = {{$labels.ocscp_nf_instance_id}}, locality = {{$labels.ocscp_locality}}, nfendpoint = {{$labels.ocscp_nf_end_point}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }} '
        # Provide app and kubernetes_namespace of scp deployed
        expr: idelta(ocscp_metric_total_routing_send_fail{app_kubernetes_io_name="scp-worker",ocscp_nf_service_type!="Scpc-pilot"}[5m]) > 0
        labels:
          severity: info
          alertname: "SCPRoutingFailedForService"
          oid: "1.3.6.1.4.1.323.5.3.35.1.2.7005"
          namespace: ' {{ $labels.kubernetes_namespace }} '
          podname: ' {{$labels.kubernetes_pod_name}} '
          nfservicetype: ' {{$labels.ocscp_nf_service_type}} '
          nfendpoint: ' {{$labels.ocscp_nf_end_point}} '
          nfserviceinstanceid: ' {{$labels.ocscp_service_instance_id}} '
          nfinstanceid: ' {{$labels.ocscp_nf_instance_id}} '
          vendor: oracle
      - alert: SCPSoothsayerNotificationPodMemoryUsage
        # Provide kubernetes_namespace of scp deployed and pod name substring as its regex match of pod name
        expr: sum(container_memory_usage_bytes{image!="",pod_name=~"scpc-notification.+"}) by (pod_name,namespace, instance) > 3006477107
        for: 2m
        labels:
          severity: major
          alertname: "SCPSoothsayerNotificationPodMemoryUsage"
          oid: "1.3.6.1.4.1.323.5.3.35.1.2.3001"
          namespace: ' {{ $labels.namespace }} '
          podname: ' {{$labels.pod_name}} '
          vendor: oracle
        annotations:
          description: 'instancename: {{$labels.instance}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod_name}}: Soothsayer Notification Pod High Memory usage detected'
          summary: 'instancename: {{$labels.instance}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Memory usage is above 70% (current value is: {{ $value }})'
      - alert: SCPWorkerPodMemoryUsage
        # Provide kubernetes_namespace of scp deployed and pod name substring as its regex match of pod name
        expr: sum(container_memory_usage_bytes{image!="",pod_name=~"scp-worker.+"}) by (pod_name,namespace, instance) > 6012954214
        for: 2m
        labels:
          severity: major
          alertname: "SCPWorkerPodMemoryUsage"
          oid: "1.3.6.1.4.1.323.5.3.35.1.2.7004"
          namespace: ' {{ $labels.namespace }} '
          podname: ' {{$labels.pod_name}} '
          vendor: oracle
        annotations:
          description: 'instancename: {{$labels.instance}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod_name}}: Worker Pod High Memory usage detected'
          summary: 'instancename: {{$labels.instance}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Memory usage is above 70% (current value is: {{ $value }})'
      - alert: SCPPilotPodMemoryUsage
        # Provide kubernetes_namespace of scp deployed and pod name substring as its regex match of pod name
        expr: sum(container_memory_usage_bytes{image!="",pod_name=~"scp-pilot.+"}) by (pod_name,namespace, instance) > 4509715660
        for: 5m
        labels:
          severity: major
          alertname: "SCPPilotPodMemoryUsage"
          oid: "1.3.6.1.4.1.323.5.3.35.1.2.5001"
          namespace: ' {{ $labels.namespace }} '
          podname: ' {{$labels.pod_name}} '
          vendor: oracle
        annotations:
          description: 'instancename: {{$labels.instance}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod_name}}: Pilot Pod High Memory usage detected'
          summary: 'instancename: {{$labels.instance}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Memory usage is above 70% (current value is: {{ $value }})'
      - alert: SCPIngressGatewayPodMemoryUsage
        # Provide kubernetes_namespace of scp deployed and pod name substring as its regex match of pod name
        expr: sum(container_memory_usage_bytes{image!="",pod_name=~".*ingress-gateway.+"}) by (pod_name,namespace, instance) > 2147483648
        for: 5m
        labels:
          severity: major
          alertname: "SCPIngressGatewayPodMemoryUsage"
          oid: "1.3.6.1.4.1.323.5.3.35.1.2.7010"
          namespace: ' {{ $labels.namespace }} '
          podname: ' {{$labels.pod_name}} '
          vendor: oracle
        annotations:
          description: 'instancename: {{$labels.instance}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod_name}}: SCP Ingress Gateway Pod High Memory usage detected'
          summary: 'instancename: {{$labels.instance}}, namespace: {{$labels.namespace}}, podname: {{$labels.pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}: Memory usage is above 50% (current value is: {{ $value }})'
      - alert: SCPInstanceDown
        expr: kube_pod_status_ready{pod =~ '.*scp.*|.*ingress-gateway.*',condition =~ 'true'} !=1
        for: 2m
        labels:
          severity: critical
          oid: "1.3.6.1.4.1.323.5.3.35.1.2.7006"
          namespace: '{{ $labels.kubernetes_namespace }} '
          podname: ' {{ $labels.kubernetes_pod_name }} '
          vendor: oracle
        annotations:
          description: 'Pod with podname: {{$labels.kubernetes_pod_name}}, instancename: {{$labels.instance}}, appname: {{$labels.app}} has been down for more than 2 minutes'
          summary: 'Pod with podname: {{$labels.kubernetes_pod_name}}, instancename: {{$labels.instance}}, appname: {{$labels.app}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} is Down '
      - alert: NFInstanceConnectionDown
        expr: sum(hosts_cx_active{cluster_name!~".*pilot.*|.*jaeger.*|.*prometheus.*|.*elasticsearch.*|.*tiller.*|.*grafana.*|.*snmp-notifier.*|.*kibana.*|.*scpc-.*|.*metric-server.*|.*coredns.*|.*mysql.*|.*scp-worker.*|.*ingress-gateway.*"}) by (endpoint, cluster_name) == 0
        labels:
          severity: info
          oid: "1.3.6.1.4.1.323.5.3.35.1.2.7007"
          namespace: '{{ $labels.kubernetes_namespace }} '
          podname: ' {{ $labels.kubernetes_pod_name }} '
          nfendpoint: ' {{ $labels.endpoint }} '
          nfclustername: ' {{ $labels.cluster_name }} '
          vendor: oracle
        annotations:
          description: 'Connection down with IP Endpoint: {{$labels.endpoint}}, clustername: {{$labels.cluster_name}}'
          summary: 'Connection down with IP Endpoint: {{$labels.endpoint}}, clustername: {{$labels.cluster_name}}'
      - alert: SCPSoothsayerAuditErrorResponse
        expr: scp_soothsayer_audit_error_response > 0
        labels:
          severity: info
          alertname: "SCPSoothsayerAuditErrorResponse"
          oid: "1.3.6.1.4.1.323.5.3.35.1.2.4001"
          namespace: ' {{ $labels.kubernetes_namespace }} '
          podname: ' {{$labels.kubernetes_pod_name}} '
          vendor: oracle
        annotations:
          description: 'SCP Audit received Error response for nftype {{$labels.nftype}}'
          summary: 'SCP Audit received Error response for nftype {{$labels.nftype}}, servingscope: {{$labels.servingscope}}, auditmethod: {{$labels.auditmethod}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
      - alert: SCPSoothsayerAuditEmptyNFArrayResponse
        expr: scp_soothsayer_audit_2xx_empty_nf_array > 0
        labels:
          severity: critical
          alertname: "SCPSoothsayerAuditEmptyNFArrayResponse"
          oid: "1.3.6.1.4.1.323.5.3.35.1.2.4002"
          namespace: ' {{ $labels.kubernetes_namespace }} '
          podname: ' {{$labels.kubernetes_pod_name}} '
          vendor: oracle
        annotations:
          description: 'SCP Audit received Empty NF Array Response for nftype {{$labels.nftype}}'
          summary: 'SCP Audit received Empty NF Array Response for nftype {{$labels.nftype}}, servingscope: {{$labels.servingscope}}, auditmethod: {{$labels.auditmethod}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }}'
      - alert: MediationServiceNotAvailable
        annotations:
          description: 'Mediation service not available'
          summary: 'Mediation Service is not available: triggerpoint = {{$labels.ocscp_trigger_name}}, nfservicetype = {{$labels.ocscp_nf_service_type}}, nfserviceinstanceid = {{$labels.ocscp_service_instance_id}}, nftype = {{$labels.ocscp_nf_type}}, nfinstanceid = {{$labels.ocscp_nf_instance_id}}, locality = {{$labels.ocscp_locality}}, nfendpoint = {{$labels.ocscp_nf_end_point}}, namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }} '
        # Provide app and kubernetes_namespace of scp deployed
        expr: idelta(ocscp_metric_total_app_res{app_kubernetes_io_name="scp-worker",ocscp_app_name="nmediation-http",ocscp_response_code="503"}[2m]) > 0
        labels:
          severity: critical
          alertname: "SCPMediationServiceNotAvailable"
          oid: "1.3.6.1.4.1.323.5.3.35.1.2.7008"
          namespace: ' {{ $labels.kubernetes_namespace }} '
          podname: ' {{$labels.kubernetes_pod_name}} '
          nfservicetype: ' {{$labels.ocscp_nf_service_type}} '
          nfendpoint: ' {{$labels.ocscp_nf_end_point}} '
          nfserviceinstanceid: ' {{$labels.ocscp_service_instance_id}} '
          nfinstanceid: ' {{$labels.ocscp_nf_instance_id}} '
          triggerpoint: ' {{$labels.ocscp_trigger_name}} '
          vendor: oracle
      - alert: MediationProcessingFailure
        annotations:
          description: 'Processing failure at mediation service'
          summary: 'Mediation service failed to respond properly. namespace: {{$labels.kubernetes_namespace}}, podname: {{$labels.kubernetes_pod_name}}, timestamp: {{ with query "time()" }}{{ . | first | value | humanizeTimestamp }}{{ end }} and value = {{ $value }} '
        # Provide app and kubernetes_namespace of scp deployed
        expr: idelta(ocscp_metric_total_mediation_processing_failures{app_kubernetes_io_name="scp-worker"}[2m]) > 0
        labels:
          severity: info
          alertname: "SCPMediationProcessingFailure"
          oid: "1.3.6.1.4.1.323.5.3.35.1.2.7009"
          namespace: ' {{ $labels.kubernetes_namespace }} '
          podname: ' {{$labels.kubernetes_pod_name}} '
          vendor: oracle
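The summaries of the three ingress traffic rules in the sample file describe their thresholds as 70, 80, and 95 percent of "Max MPS(1700)". The following short Python check computes the exact ratios, which may help if you change the thresholds or the maximum and want the summary text to stay in step. The function name is illustrative and not part of SCP.

```python
MAX_MPS = 1700  # "Max MPS" value quoted in the sample rule summaries


def percent_of_max(threshold_mps: float, max_mps: float = MAX_MPS) -> float:
    """Return a threshold as a percentage of the maximum rate, rounded to one decimal."""
    return round(100.0 * threshold_mps / max_mps, 1)


if __name__ == "__main__":
    for name, threshold in (("minor", 1200), ("major", 1400), ("critical", 1600)):
        print(f"{name}: {threshold} mps = {percent_of_max(threshold)}% of {MAX_MPS} mps")
```

The exact ratios are about 70.6, 82.4, and 94.1 percent, so the percentages in the summaries are rounded figures.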
Alerts Details
Description and Summary are added by Prometheus alert manager.
- SCP Ingress Traffic Rate Above Threshold
  - Has three threshold levels: Minor (1200 mps to below 1400 mps), Major (1400 mps to below 1600 mps), and Critical (1600 mps and above), as listed in Table 5-1. These values are configurable.
  - In the description, information is presented similar to: "Ingress Traffic Rate at Locality: <Locality of scp> is above <threshold level (minor/major/critical)> threshold (i.e. <value of threshold>)"
  - In Summary: "Namespace: <Namespace of scp deployment at that Locality>, Pod: <SCP-worker Pod name>: Current Ingress Traffic Rate is <Current rate of Ingress traffic> mps which is above 70 Percent of Max MPS(<upper limit of ingress traffic rate per pod>)"
Note:
Ingress traffic rate is measured per scp-worker pod in a namespace at a particular SCP locality. Currently, 2000 mps is the upper limit per scp-worker pod.
- SCP Routing Failed For Service
  - Indicates the NF Service Type and NF Type for which routing failed at a particular locality.
  - Description: "Routing failed for service"
  - Summary: "Routing failed for service: NFService Type = <Message NF Service Type>, NFType = <Message NF Type>, Locality = <SCP Locality where Routing Failed> and value = <Accumulated failures till now for such messages, for NFType and NFService Type>"
Note:
The value field currently does not provide the number of failures in a particular time interval. Instead, it provides the total number of routing failures.
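Because the underlying metric is a cumulative total, a per-interval count has to be derived from successive samples. The following Python sketch shows the arithmetic on a plain list of samples. `idelta` mirrors the PromQL function used in the alert condition (difference between the last two samples in the range), and `increase_over_window` is a simplified, hypothetical helper that ignores counter resets.

```python
from typing import Sequence


def idelta(samples: Sequence[float]) -> float:
    """Difference between the last two samples, as PromQL idelta() computes it."""
    if len(samples) < 2:
        # PromQL returns no result in this case; this sketch raises instead.
        raise ValueError("need at least two samples")
    return samples[-1] - samples[-2]


def increase_over_window(samples: Sequence[float]) -> float:
    """Failures across the whole window (simplified: assumes no counter reset)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return samples[-1] - samples[0]


if __name__ == "__main__":
    window = [10.0, 12.0, 12.0, 15.0]  # cumulative routing failures per scrape
    print(idelta(window))                # failures since the previous scrape
    print(increase_over_window(window))  # failures across the whole window
```

In this example the alert condition `idelta(...) > 0` holds because the last scrape added three failures, while the cumulative value reported with the alert would be 15.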
- SCP Pod Memory Usage: Three types of alerts, namely SCPSoothsayerNotificationPodMemoryUsage, SCPWorkerPodMemoryUsage, and SCPPilotPodMemoryUsage
  - Pod memory usage for SCP pods (Soothsayer, Worker, and Pilot) deployed at a particular node instance is provided.
  - The Soothsayer pod threshold is 70% of the allocated 4 GB, as listed in Table 5-1.
  - The Worker pod threshold is 70% of the allocated 8 GB, as listed in Table 5-1.
  - The Pilot pod threshold is 70% of the allocated 6 GB, as listed in Table 5-1.
  - Description: "Instance: <Node Instance name>, NameSpace: <Namespace of SCP deployment>, Pod: <(Soothsayer/Worker/Pilot) Pod name>: <Soothsayer/Worker/Pilot> Pod High Memory usage detected"
  - Summary: "Instance: <Node Instance name>, Namespace: <Namespace of SCP deployment>, Pod: <(Soothsayer/Worker/Pilot) Pod name>: Memory usage is above <threshold value>G (current value is: <current value of memory usage>)"
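The byte values used in the memory alert conditions of Table 5-1 follow from the allocated memory and the threshold percentage. The following Python sketch reproduces them, assuming that "GB" in the table means GiB (1024^3 bytes) and that the product is truncated to a whole number of bytes. Both assumptions are inferred from the numbers in the table rather than stated by it.

```python
GIB = 1024 ** 3  # bytes per GiB; assumed meaning of "GB" in Table 5-1


def threshold_bytes(allocated_gib: int, percent: int) -> int:
    """Return percent of the allocated memory in bytes, truncated to an integer."""
    return allocated_gib * GIB * percent // 100


if __name__ == "__main__":
    pods = (
        ("notification", 4, 70),
        ("worker", 8, 70),
        ("pilot", 6, 70),
        ("ingress-gateway", 4, 50),
    )
    for pod, allocated, percent in pods:
        print(pod, threshold_bytes(allocated, percent))
```

If you change the memory allocated to a pod, the same calculation gives the value to place on the right-hand side of the corresponding `container_memory_usage_bytes` condition.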
Configuring alert manager for SNMP notifier
Grouping of alerts is based on:
- podname
- alertname
- severity
- namespace
- nfServiceType
- nfServiceInstanceId
- servingscope
- nftype
Add sub-routes for SCP alerts in the Alertmanager config map as follows:
Table 5-3 Alert Manager configuration for SNMP Notifier
Step No. | Procedure | Description |
---|---|---|
1. | Take a backup of the current Alertmanager config map | Execute the following command: Example: |
2. | Edit the config map to add a sub-route for the SCP trap OID | Execute the following command: Example: |
3. | Add the sub-route under 'route' in the config map | |
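As an illustration of what such a sub-route could contain, the following Python sketch builds a route entry that groups by the labels listed above and matches the SCP trap OIDs by their common prefix. It is a sketch under assumptions, not the configuration shipped with SCP: the receiver name is hypothetical, the label names use the lowercase spelling from the sample rule file, and `match_re` is the classic Alertmanager route field, which newer Alertmanager releases deprecate in favor of `matchers`.

```python
import json
import re

# Common prefix of the OIDs listed in Table 5-1.
SCP_OID_PREFIX = "1.3.6.1.4.1.323.5.3.35"

# Grouping labels, spelled as in the labels of the sample rule file.
GROUP_BY = [
    "podname", "alertname", "severity", "namespace",
    "nfservicetype", "nfserviceinstanceid", "servingscope", "nftype",
]


def scp_subroute(receiver: str) -> dict:
    """Build a sub-route dict for alerts whose oid label starts with the SCP prefix."""
    return {
        "receiver": receiver,  # hypothetical receiver name supplied by the caller
        "group_by": list(GROUP_BY),
        "match_re": {"oid": re.escape(SCP_OID_PREFIX) + r"\..*"},
    }


if __name__ == "__main__":
    # Print the structure for review before translating it into the config map.
    print(json.dumps(scp_subroute("snmp-notifier"), indent=2))
```

Escaping the dots in the prefix keeps the pattern from matching unrelated OIDs that merely share the same digits.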
Metrics Reference
The following table provides information for metrics for Service Communication Proxy.
Table 5-4 Metrics Reference
Prometheus Stat Metric Name | Metric Description | Dimensions | Example usage to filter metric by dimension on Grafana GUI |
---|---|---|---|
ocscp_metric_nf_total_http_tx_req | Total number of HTTP requests forwarded by Service Communication Proxy to the upstream cluster | | |
ocscp_metric_nf_total_http_rx_req | Total number of incoming HTTP requests | | |
ocscp_metric_nf_total_http_rx_res | Total number of HTTP responses received by SCP with specific HTTP response codes (e.g., 201, 503) | | |
ocscp_metric_nf_total_http_rx_res_xx | Total number of HTTP responses received by SCP with aggregated HTTP response codes (e.g., 2xx, 5xx) | | |
ocscp_metric_nf_total_http_tx_res | Total number of HTTP responses forwarded by SCP with specific HTTP response codes (e.g., 201, 503) | | |
ocscp_metric_nf_total_http_tx_res_xx | Total number of HTTP responses forwarded by SCP with aggregated HTTP response codes (e.g., 2xx, 5xx) | | |
ocscp_metric_total_http_rx_req | Total number of incoming HTTP requests | | |
ocscp_metric_total_http_tx_req | Total number of HTTP requests forwarded by Service Communication Proxy to the upstream cluster | | |
ocscp_metric_total_http_rx_res | Total number of HTTP responses received by SCP with specific HTTP response codes (e.g., 201, 503) | | |
ocscp_metric_total_http_rx_res_xx | Total number of HTTP responses received by SCP with aggregated HTTP response codes (e.g., 2xx, 5xx) | | |
ocscp_metric_total_http_tx_res | Total number of HTTP responses forwarded by SCP with specific HTTP response codes (e.g., 201, 503) | | |
ocscp_metric_total_http_tx_res_xx | Total number of HTTP responses forwarded by SCP with aggregated HTTP response codes (e.g., 2xx, 5xx) | | |
ocscp_metric_total_http_rx_messages | Total incoming (rx) messages to SCP. This includes requests and responses. | | |
ocscp_metric_total_http_tx_messages | Total outgoing (tx) messages from SCP. This includes requests and responses. | | |
ocscp_metric_total_attempts_to_forward_route | Total number of requests that matched the catch-all-route routing rule | | |
ocscp_metric_total_upstream_send_fail | Total number of requests that SCP failed to send to the upstream cluster | | |
ocscp_metric_total_routing_send_fail | Total number of requests that SCP failed to route due to any reason | | |
ocscp_metric_request_processing_time | Captures the processing time by SCP for ingress requests in time buckets (e.g., 1ms, 2ms, 4ms, 8ms, and so on) | | |
ocscp_metric_response_processing_time | Captures the processing time by SCP for ingress responses in time buckets (e.g., 1ms, 2ms, 4ms, 8ms, and so on) | | |
ocscp_metric_request_per_try_timeout | Captures the number of incoming requests whose per-try timeout expired | | |
ocscp_metric_total_transaction_timeout | Captures the total number of requests whose transaction timed out | | |
ocscp_metric_max_routing_attempts_exhausted | Captures the total number of requests whose maximum routing attempts were exhausted during alternate routing | | |
ocscp_sds_dbmetrics_total | Total number of database operations (create, update, delete, and find). Metrics Pegging Condition: This metric is pegged whenever a create, update, delete, or find operation is triggered. | | AMF CREATE SUCCESS, AMF CREATE FAILURE, AMF UPDATE SUCCESS, AMF UPDATE FAILURE, AMF FIND SUCCESS, AMF FIND FAILURE, AMF DELETE SUCCESS, AMF DELETE FAILURE, AMF UNKNOWN FAILURE, SMF CREATE SUCCESS, SMF CREATE FAILURE, SMF UPDATE SUCCESS, SMF UPDATE FAILURE, SMF DELETE SUCCESS, SMF DELETE FAILURE, SMF FIND SUCCESS, SMF FIND FAILURE, SMF UNKNOWN FAILURE |
ocscp_sds_dbmetrics_latencies_seconds_bucket | Calculates the processing time of each database request and puts the value in a seconds bucket. The seconds buckets are configured with the values [0.1,0.2,0.4,0.8,1.0,2.0,4.0,8.0,10.0]. | | |
hosts_cx_total | Total number of connections attempted to the host. Metrics Pegging Condition: New connection attempted. | | hosts_cx_total{cluster_name="outbound|80||udm1_udm1svc_svc_cluster_local",endpoint="192.168.219.75:80"} |
hosts_cx_active | Total number of active connections to the host. Metrics Pegging Condition: New connection established and active (no failure or disconnect). | | hosts_cx_active{cluster_name="outbound|80||udm1_udm1svc_svc_cluster_local",endpoint="192.168.219.75:80"} |
hosts_cx_connect_fail | Total number of connection attempts to the host that resulted in failure (local + remote failures). Metrics Pegging Condition: Connection attempt failed. | | hosts_cx_connect_fail{cluster_name="outbound|80||udm1_udm1svc_svc_cluster_local",endpoint="192.168.219.75:80"} |
hosts_rq_total | Total requests sent to the host. Metrics Pegging Condition: Request forwarded to host. | | hosts_rq_total{cluster_name="outbound|80||udm1_udm1svc_svc_cluster_local",endpoint="192.168.219.75:80"} |
hosts_rq_timeout | Total timed out requests. Metrics Pegging Condition: Request timed out (upon expiry of timeout). | | hosts_rq_timeout{cluster_name="outbound|80||udm1_udm1svc_svc_cluster_local",endpoint="192.168.219.75:80"} |
hosts_rq_success | Total requests with non-5xx responses from the host. Metrics Pegging Condition: Success (non-5xx) response received from host. | | hosts_rq_success{cluster_name="outbound|80||udm1_udm1svc_svc_cluster_local",endpoint="192.168.219.75:80"} |
hosts_rq_error | Total requests with 5xx responses from the host. Metrics Pegging Condition: Failure (5xx) response received from host. | | hosts_rq_error{cluster_name="outbound|80||udm1_udm1svc_svc_cluster_local",endpoint="192.168.219.75:80"} |
hosts_rq_active | Total active requests (in-flight transactions). Metrics Pegging Condition: Request forwarded to host. It is decremented once the response is received. | | hosts_rq_active{cluster_name="outbound|80||udm1_udm1svc_svc_cluster_local",endpoint="192.168.219.75:80"} |
hosts_success_rate | Request success rate (0-100). If there was not enough request volume in the interval to calculate it. | | hosts_success_rate{cluster_name="outbound|80||udm1_udm1svc_svc_cluster_local",endpoint="192.168.219.75:80"} |
scp_notifications_rejected_topologysource_local_total | Number of NF notification messages rejected because "learning from NRF" was configured as "not allowed". Metrics Pegging Condition: This metric is pegged whenever a notification is received for an NF configured as LOCAL. | nftype (eg: UDM, AMF) | scp_local_topology_source_total{NFType = "UDM"} |
scp_topologysource_toggle_nrf_to_local_total | Number of times the topology source changed from "NRF" to "Local" for a given NF. | nftype (eg: UDM, AMF) | scp_topologysource_toggle_nrf_to_local_total{NFType = "UDM"} |
scp_topologysource_toggle_local_to_nrf_total | Number of times the topology source changed from "Local" to "NRF" for a given NF. | nftype (eg: UDM, AMF) | scp_topologysource_toggle_local_to_nrf_total{NFType = "UDM"} |
scp_soothsayer_audit_db_fetch_failure_total | Number of times audit failed due to database failure | scp_soothsayer_audit_db_fetch_failure | |
scp_soothsayer_subscription_db_fetch_failure_total | Number of times subscription failed due to database failure | scp_soothsayer_subscription_db_fetch_failure | |
scp_failure_processed_nf_notification_total | Number of times notification processing failed | | scp_failure_processed_nf_notification{nftype ="UDM",nfinstanceid="6faf3abc-6e4a-4454-a507-a14ef8e1bc4b"} |
ocscp_metric_total_app_req | Number of requests for ocscp_app (mediation and amf/smf db messages) at scp-worker. This metric can be used in conjunction with 'ocscp_metric_total_http_rx_req' to calculate the percentage of messages processed by mediation. | | ocscp_metric_total_app_req{ocscp_app_name="ocscp-sds-amf"} ocscp_metric_total_app_req{ocscp_trigger_name="OnRequestIngress"} ocscp_metric_total_app_req{ocscp_nf_type="UDM"} ocscp_metric_total_app_req{ocscp_nf_service_type="nudm-uecm"} ocscp_metric_total_app_req{ocscp_nf_end_point="10.96.166.65:80"} ocscp_metric_total_app_req{ocscp_locality="Loc7"} ocscp_metric_total_app_req{ocscp_nf_instance_id="6faf1bbc-6e4a-4454-a507-a14ef8e1bc5c"} ocscp_metric_total_app_req{ocscp_service_instance_id="fe137ab7-740a-46ee-aa5c-951806d77b02"} |
ocscp_metric_total_app_res | Number of responses for ocscp_app (mediation and amf/smf db messages) at scp-worker | | ocscp_metric_total_app_res{ocscp_app_name="ocscp-sds-amf"} ocscp_metric_total_app_res{ocscp_trigger_name="OnRequestIngress"} ocscp_metric_total_app_res{ocscp_nf_type="UDM"} ocscp_metric_total_app_res{ocscp_nf_service_type="nudm-uecm"} ocscp_metric_total_app_res{ocscp_nf_end_point="10.96.166.65:80"} ocscp_metric_total_app_res{ocscp_locality="Loc7"} ocscp_metric_total_app_res{ocscp_nf_instance_id="6faf1bbc-6e4a-4454-a507-a14ef8e1bc5c"} ocscp_metric_total_app_res{ocscp_service_instance_id="fe137ab7-740a-46ee-aa5c-951806d77b02"} Note: The dimensions "ocscp_trigger_name" and "ocscp_msg_size" cannot be used together. |
ocscp_http_downstream_cx_destroy_remote_active_rq | Number of downstream connections destroyed from the remote end with active requests, that is, SCP was still processing the request from the downstream peer. | ocscp_http_conn_manager_prefix - Stat prefix for the listener. If not specified, it is constructed using the listener address. For example, 0.0.0.0_8080 for the SCP signaling port. | ocscp_http_downstream_cx_destroy_remote_active_rq{ocscp_http_conn_manager_prefix="0.0.0.0_8080"} Note: This metric is usually accompanied by a warning log that shows the downstream peer address. |
scp_patch_subscription_nf_success_total | Number of successful patch operations on subscriptions, by nfType and ServingScope |
|
scp_patch_subscription_nf_success_total{nftype="AUSF",ServingScope="Reg1"} |
scp_patch_subscription_nf_failure_total | Number of failed patch operations on subscriptions, by nfType and ServingScope |
|
scp_patch_subscription_nf_failure_total{nftype="AUSF",ServingScope="Reg1"} |
scp_unsubscription_nf_success_total | Number of successful operations of unsubscription based on nftype and servingscope |
|
scp_unsubscription_nf_success_total{nftype="AUSF",ServingScope="Reg1"} |
scp_unsubscription_nf_failure_total | Number of failed operations of unsubscription based on nftype and servingscope |
|
scp_unsubscription_nf_failure_total{nftype="AUSF",ServingScope="Reg1"} |
scp_subscription_nf_success_total | Number of successful operations of subscription based on nftype and servingscope |
|
scp_subscription_nf_success_total{nftype="AUSF",ServingScope="Reg1"} |
scp_subscription_nf_failure_total | Number of failed operations of subscription based on nftype and servingscope |
|
scp_subscription_nf_failure_total{nftype="AUSF",ServingScope="Reg1"} |
scp_soothsayer_audit_2xx_empty_nf_array | The metric is set to 1 when SCP audit receives an empty NF array response for an nftype. It changes back to 0 when audit receives a non-empty NF array. | scp.soothsayer_audit_2xx_empty_nf_array{} | |
scp_soothsayer_audit_error_response |
The metric is set to 1 when SCP audit receives an error response for an nftype. It changes back to 0 when audit receives a non-error response for that nftype. |
scp.soothsayer_audit_error_response{} | |
scp_soothsayer_db_operation_success_total | Number of successful soothsayer db Operations |
|
scp_soothsayer_db_operation_success_total{dbOperation="insertOrUpdate",tableName="NF_SUBSCRIPTIONS"}
scp_soothsayer_db_operation_success_total{dbOperation="getAll",tableName="NF_SUBSCRIPTIONS"} scp_soothsayer_db_operation_success_total{dbOperation="get",tableName="NF_SUBSCRIPTIONS"} scp_soothsayer_db_operation_success_total{dbOperation="delete",tableName="NF_SUBSCRIPTIONS"} scp_soothsayer_db_operation_success_total{dbOperation="insert",tableName="NRF_NF_DETAILS"} scp_soothsayer_db_operation_success_total{dbOperation="getAll",tableName="NRF_NF_DETAILS"} |
scp_soothsayer_db_operation_failure_total | Number of failed soothsayer DB operations |
|
scp_soothsayer_db_operation_failure_total{dbOperation="insertOrUpdate",tableName="NF_SUBSCRIPTIONS"}
scp_soothsayer_db_operation_failure_total{dbOperation="getAll",tableName="NF_SUBSCRIPTIONS"} scp_soothsayer_db_operation_failure_total{dbOperation="get",tableName="NF_SUBSCRIPTIONS"} scp_soothsayer_db_operation_failure_total{dbOperation="delete",tableName="NF_SUBSCRIPTIONS"} scp_soothsayer_db_operation_failure_total{dbOperation="insert",tableName="NRF_NF_DETAILS"} scp_soothsayer_db_operation_failure_total{dbOperation="getAll",tableName="NRF_NF_DETAILS"} |
scp_soothsayer_nrf_registration_success_total | This metric pegs the number of times registration succeeded for a particular serving scope. |
servingScope ex: (Reg11,Reg1,Reg2) |
scp_soothsayer_nrf_registration_success_total{servingScope="Reg11"} |
scp_soothsayer_nrf_registration_failure_total | This metric pegs the number of times registration failed for a particular serving scope. |
servingScope ex: (Reg11,Reg1,Reg2) |
scp_soothsayer_nrf_registration_failure_total{servingScope="Reg11"} |
scp_soothsayer_nrf_heartbeat_success_total | This metric pegs the number of times the heartbeat succeeded for a particular serving scope. |
servingScope ex: (Reg11,Reg1,Reg2) |
scp_soothsayer_nrf_heartbeat_success_total{servingScope="Reg11"} |
scp_soothsayer_nrf_heartbeat_failures_total | This metric pegs the number of times the heartbeat failed for a particular serving scope. |
servingScope ex: (Reg11,Reg1,Reg2) |
scp_soothsayer_nrf_heartbeat_failures_total{servingScope="Reg11"} |
scp_soothsayer_mediation_total_rules_per_trigger |
Total number of mediation rules configured per trigger point. To get the cumulative value of all rules across all triggers, use the SUM function. |
nftype
nfservice trigger |
|
ocscp_metric_upstream_service_time | This metric captures the time taken by the upstream host to respond to the request, in time buckets (e.g., 1ms, 2ms, 4ms, 8ms and so on). |
ocscp_nf_type (e.g., UDM, PCF, AMF, etc.) ocscp_nf_service_type (e.g., nudm-uecm, nudm-sdm, etc.) ocscp_nf_end_point (Default value = 0.0:00) ocscp_locality (SCP site name e.g., Loc6, Loc7, etc.) ocscp_nf_instance_id (e.g., 6faf1bbc-6e4a-4454-a507-a14ef8e1bc5c, etc. Default value = NA) ocscp_service_instance_id (e.g., fe137ab7-740a-46ee-aa5c-951806d77b02, etc. Default value = NA) ocscp_upstream_service_time (e.g., 1ms, 2ms, 4ms, 8ms, etc.) |
ocscp_metric_upstream_service_time{ocscp_nf_type="UDM"} ocscp_metric_upstream_service_time{ocscp_nf_service_type="nudm-uecm"} ocscp_metric_upstream_service_time{ocscp_nf_end_point="10.96.166.65:80"} ocscp_metric_upstream_service_time{ocscp_locality="Loc7"} ocscp_metric_upstream_service_time{ocscp_nf_instance_id="6faf1bbc-6e4a-4454-a507-a14ef8e1bc5c"} ocscp_metric_upstream_service_time{ocscp_service_instance_id="fe137ab7-740a-46ee-aa5c-951806d77b02"} ocscp_metric_upstream_service_time{ocscp_upstream_service_time="1ms"} |
ocscp_metric_request_header_to_body_time | This metric captures the time between receiving the request headers to receiving the request body in time buckets (e.g., 1ms, 2ms, 4ms, 8ms and so on) |
ocscp_nf_type (e.g., UDM, PCF, AMF, etc.) ocscp_nf_service_type (e.g., nudm-uecm, nudm-sdm, etc.) ocscp_nf_end_point (Default value = 0.0:00) ocscp_locality (SCP site name e.g., Loc6, Loc7, etc.) ocscp_nf_instance_id (e.g., 6faf1bbc-6e4a-4454-a507-a14ef8e1bc5c, etc. Default value = NA) ocscp_service_instance_id (e.g., fe137ab7-740a-46ee-aa5c-951806d77b02, etc. Default value = NA) ocscp_header_to_body_time (e.g., 1ms, 2ms, 4ms, 8ms, etc.) |
ocscp_metric_request_header_to_body_time{ocscp_nf_type="UDM"} ocscp_metric_request_header_to_body_time{ocscp_nf_service_type="nudm-uecm"} ocscp_metric_request_header_to_body_time{ocscp_nf_end_point="10.96.166.65:80"} ocscp_metric_request_header_to_body_time{ocscp_locality="Loc7"} ocscp_metric_request_header_to_body_time{ocscp_nf_instance_id="6faf1bbc-6e4a-4454-a507-a14ef8e1bc5c"} ocscp_metric_request_header_to_body_time{ocscp_service_instance_id="fe137ab7-740a-46ee-aa5c-951806d77b02"} ocscp_metric_request_header_to_body_time{ocscp_header_to_body_time="1ms"} |
ocscp_metric_request_complete_time | This metric captures the time of receiving the complete request in time buckets (e.g., 1ms, 2ms, 4ms, 8ms and so on) |
ocscp_nf_type (e.g., UDM, PCF, AMF, etc.) ocscp_nf_service_type (e.g., nudm-uecm, nudm-sdm, etc.) ocscp_nf_end_point (Default value = 0.0:00) ocscp_locality (SCP site name e.g., Loc6, Loc7, etc.) ocscp_nf_instance_id (e.g., 6faf1bbc-6e4a-4454-a507-a14ef8e1bc5c, etc. Default value = NA) ocscp_service_instance_id (e.g., fe137ab7-740a-46ee-aa5c-951806d77b02, etc. Default value = NA) ocscp_request_complete_time (e.g., 1ms, 2ms, 4ms, 8ms, etc.) |
ocscp_metric_request_complete_time{ocscp_nf_type="UDM"} ocscp_metric_request_complete_time{ocscp_nf_service_type="nudm-uecm"} ocscp_metric_request_complete_time{ocscp_nf_end_point="10.96.166.65:80"} ocscp_metric_request_complete_time{ocscp_locality="Loc7"} ocscp_metric_request_complete_time{ocscp_nf_instance_id="6faf1bbc-6e4a-4454-a507-a14ef8e1bc5c"} ocscp_metric_request_complete_time{ocscp_service_instance_id="fe137ab7-740a-46ee-aa5c-951806d77b02"} ocscp_metric_request_complete_time{ocscp_request_complete_time="1ms"} |
ocscp_metric_downstream_connection | Total number of downstream connections per thread. | ocscp_thread_id (Thread Id of the thread which is handling the connection) | ocscp_metric_downstream_connection{ocscp_thread_id ="23434434234"} |
ocscp_metric_total_http_rx_downstream_req | Total number of incoming HTTP requests per downstream peers |
ocscp_nf_type (e.g., UDM, PCF, AMF, etc.) ocscp_nf_service_type (e.g., nudm-uecm, nudm-sdm, etc.) ocscp_thread_id (Thread Id of the thread which is processing the request) ocscp_locality (SCP site name e.g., Loc6, Loc7, etc.) ocscp_downstream_remote_address (IP address ("-" separated instead of ".") of downstream peer e.g. 10-233-73-63) |
ocscp_metric_total_http_rx_downstream_req{ocscp_nf_type="UDM"} ocscp_metric_total_http_rx_downstream_req{ocscp_nf_service_type="nudm-uecm"} ocscp_metric_total_http_rx_downstream_req{ocscp_thread_id ="23434434234"} ocscp_metric_total_http_rx_downstream_req{ocscp_locality="Loc7"} ocscp_metric_total_http_rx_downstream_req{ocscp_downstream_remote_address ="10-233-73-63"} |
ocscp_upstream_app_service_time_ms |
Bucketed response time of SCP apps, per the allowed dimensions. For mediation, ocscp_app_name is 'nmediation-http'. For ocscp_upstream_service_time, the buckets (ms) are 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048 |
ocscp_app_name (nmediation_http, ocscp-sds-amf and ocscp-sds-smf) ocscp_upstream_service_time (e.g., 1ms, 2ms, 4ms, 8ms, etc.) |
ocscp_nf_type
ocscp_nf_service_type ocscp_app_name ocscp_trigger_name ocscp_upstream_service_time |
ocscp_metric_total_mediation_processing_failures | Number of processing failures at nf-mediation service. | NA | NA |
ocscp_metric_downstream_request_reset |
For downstream messages that are reset at SCP, this metric records the bucketed elapsed time of such messages at SCP. Time buckets are 1ms, 2ms, 4ms, 8ms, up to 2048ms. |
ocscp_elapsed_time(Example: 1ms, 2ms, 4ms, 8ms, etc.) | sum(ocscp_metric_downstream_request_reset) by (ocscp_elapsed_time) |
ocscp_metric_scp_generated_response | Captures the error response generated by SCP. |
|
ocscp_metric_scp_generated_response{ocscp_nf_type="UDM",ocscp_nf_service_type="nudm-uecm",ocscp_response_code="503",ocscp_error_group="ocscp.http.routing_failure.connect_failure"} |
ocscp_metric_config_error_cluster_not_found | Captures the cluster name (destination endpoint) that is not configured properly at SCP. | ocscp_cluster_name (Example: outbound|80||udm1svctest_default_svc_cluster_local) | ocscp_metric_config_error_cluster_not_found{ocscp_cluster_name="outbound|80||udm1svctest_default_svc_cluster_local"}
Note: This metric is relevant for internal debugging purposes. It shows the cluster name used by SCP, which might not match what the user has configured in nf_profile. |
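The description of ocscp_metric_total_app_req says it can be combined with ocscp_metric_total_http_rx_req to calculate the percentage of messages processed by mediation. The following Python sketch shows that arithmetic only. It assumes both counter values were already read from Prometheus for the same time window and that the app counter was filtered to the mediation app; the function name and sample values are hypothetical.

```python
def mediation_percentage(app_req_total: float, http_rx_req_total: float) -> float:
    """Percentage of ingress requests that were handed to an SCP app.

    app_req_total     -- value of ocscp_metric_total_app_req for the window
    http_rx_req_total -- value of ocscp_metric_total_http_rx_req for the same window
    """
    if http_rx_req_total <= 0:
        # No ingress traffic in the window: report 0 instead of dividing by zero.
        return 0.0
    return 100.0 * app_req_total / http_rx_req_total


# Hypothetical sample values, not taken from a real deployment.
print(mediation_percentage(250, 1000))  # 25.0
```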
Traces Reference
The following table provides information for Traces for Service Communication Proxy.
Table 5-5 Traces Reference
Field Name | Request/ Response Type | Description |
---|---|---|
component | common | The software package, framework, library, or module that generated the associated Span. |
node_id | common | Local information |
method | request,common | HTTP method of the request for the associated Span. Example: "GET","POST" |
scheme | request | URL scheme (http). |
authority | request | Gives the registered name or server address, along with optional port and user information. |
path | request | A path consists of a sequence of path segments separated by a slash ("/") character. |
3gpp-sbi-message-priority | request | This header shall be included in HTTP/2 messages when a priority for the message needs to be conveyed |
x-forwarded-for | request | To identify the originating IP address of a client. |
x-forwarded-proto | request | To determine the protocol used between the client and the spf. |
x-envoy-internal | request | Indicates whether a request is of internal origin. |
via | request | It is used for tracking message forwards, avoiding request loops, and identifying the protocol capabilities of senders along the request/response chain |
x-request-id | common | The x-request-id header is used to uniquely identify a request as well as perform stable access logging and tracing |
payload | request, common | HTTP request body. |
status | response | Indicates whether the HTTP request succeeded. |
content-type | response | The Content-Type entity header is used to indicate the media type of the resource |
content-length | response | The Content-Length header indicates the size of the entity body in the message, in bytes |
server | response | The Server response-header field contains information about the software used by the origin server to handle the request |
date | response | The time and date, when the request is processed. |
x-envoy-upstream-service-time | response | Contains the time in milliseconds spent by the upstream host processing the request |
location | response | To provide information about the location of a newly created resource |
payload | response | HTTP response body. |
http.url | request,common | Specifies the request's URL. |
downstream_cluster | common | A downstream host connects to Envoy, sends requests, and receives responses |
user_agent | request,common | When your browser connects to a website, it includes a User-Agent field in its HTTP header |
http.protocol | common | The communication between client and server over HTTP/2. |
request_size | common | HTTP header size. |
upstream_cluster | common | An upstream host receives connections and requests from Envoy and returns responses. |
http.status_code | common | HTTP response status code for the associated Span. Example: 200, 503, 404 |
response_size | common | HTTP header size. |
response_flag | common | Additional details about the response or connection, if any |
span.kind | common | Specifies the role of the Span in an RPC communication. In the case of HTTP communication, this tag takes the values client and server. |
error | common | True, if and only if, the application considers the operation represented by the Span to have failed |
x-request-id | request | The x-request-id header is used by Envoy to uniquely identify a request as well as perform stable access logging and tracing |
x-b3-traceid | request | The x-b3-traceid HTTP header is used by the Zipkin tracer in Envoy. The TraceId is 64-bit in length and indicates the overall ID of the trace. Every span in a trace shares this ID |
x-b3-spanid | request | The x-b3-spanid HTTP header is used by the Zipkin tracer in Envoy. The SpanId is 64-bit in length and indicates the position of the current operation in the trace tree |
x-b3-sampled | request | The x-b3-sampled HTTP header is used by the Zipkin tracer in Envoy. When the Sampled flag is either not specified or set to 1, the span will be reported to the tracing system |
ueidentitytype | request | NF type. |
ueidentityvalue | request | SUPI range for the NF type. |
x-envoy-expected-rq-timeout-ms | request | This is the time in milliseconds the router expects the request to be completed |
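The x-b3-* rows above describe 64-bit trace and span IDs and the rule that a span is reported when x-b3-sampled is absent or set to 1. The following Python sketch illustrates those two rules. It is not how Envoy or SCP generates the headers; the function names are hypothetical, and encoding a 64-bit ID as 16 hexadecimal characters is an assumption.

```python
import secrets


def new_b3_id() -> str:
    """Return a random 64-bit identifier as 16 lowercase hex characters."""
    return secrets.token_hex(8)


def is_sampled(headers: dict) -> bool:
    """Report the span when x-b3-sampled is not specified or is set to "1"."""
    value = headers.get("x-b3-sampled")
    return value is None or value == "1"


def with_b3_headers(headers: dict) -> dict:
    """Copy the headers, keeping an incoming trace ID and adding a new span ID."""
    out = dict(headers)
    # Every span in a trace shares the trace ID, so an existing one is reused.
    out.setdefault("x-b3-traceid", new_b3_id())
    out["x-b3-spanid"] = new_b3_id()
    return out
```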
HTTP Status Code and applicability for rerouting
Description
This page describes the HTTP status codes usage on SBI. HTTP status codes are carried in ":status" pseudo header field in HTTP/2, as defined in subclause 8.1.2.4 in IETF RFC 7540.
The table below specifies the HTTP status codes supported on SBI for each HTTP method. Support of an HTTP status code is:
- Mandatory, which is marked in table as "M". This means that all 3GPP NFs shall support the processing of the specific HTTP status code for the specific HTTP method, when received in a HTTP response message. In such cases the 3GPP NF also supports the handling of the "ProblemDetails" JSON object with the Content-Type header field set to the value "application/problem+json" for HTTP status codes 4xx and 5xx, if the corresponding API definition in the related technical specification does not specify another response body for the corresponding status code;
- Service specific, which is marked in table as "SS" and means that the requirement to process the HTTP status code depends on the definition of the specific API; or
- Not applicable, which is marked in table as "N/A". This means that the specific HTTP status code shall not be used for the specific HTTP method within the 3GPP NFs.
- The Applicable for Rerouting column indicates whether the status code is applicable for rerouting at SPF. These status codes can be configured in the Routing options for each NF service.
NOTE 1: "200 OK" response used on SBI shall contain body.
NOTE 2: If the NF acting as an HTTP Client receives 2xx response code not appearing in table, the NF shall treat the received 2xx response: - as "204 No Content" if 2xx response does not contain body; and - as "200 OK" if 2xx response contains body.
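NOTE 2 can be read as a small normalization rule for an HTTP client. The following Python sketch assumes the set of known 2xx codes is the one listed in Table 5-6 (200, 201, 202, 204); the function name is hypothetical.

```python
# 2xx status codes that appear in Table 5-6.
KNOWN_2XX = frozenset({200, 201, 202, 204})


def effective_2xx(status_code: int, has_body: bool) -> int:
    """Return the status code a client should act on for a 2xx response."""
    if not 200 <= status_code <= 299:
        raise ValueError("not a 2xx status code")
    if status_code in KNOWN_2XX:
        return status_code
    # 2xx code not in the table: decide by presence of a response body.
    return 200 if has_body else 204
```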
HTTP status code supported on SBI
Table 5-6 HTTP status code supported on SBI
HTTP status code | DELETE | GET | PATCH | POST | PUT | Applicable for Rerouting | HTTP status code (Routing options) |
---|---|---|---|---|---|---|---|
100 Continue | N/A | N/A | N/A | N/A | N/A | No | |
200 OK (NOTE 1) | SS | M | SS | SS | SS | No | |
201 Created | N/A | N/A | N/A | SS | SS | No | |
202 Accepted | SS | N/A | SS | SS | SS | No | |
204 No Content (NOTE 2) | M | N/A | SS | SS | SS | No | |
300 Multiple Choices | N/A | N/A | N/A | N/A | N/A | No | |
303 See Other | SS | SS | N/A | SS | SS | No | |
307 Temporary Redirect | SS | SS | SS | SS | SS | Yes | 307 (Should be included as part of 3xx) |
308 Permanent Redirect | SS | SS | SS | SS | SS | Yes | 308 (Should be included as part of 3xx) |
400 Bad Request | M | M | M | M | M | No | |
401 Unauthorized | M | M | M | M | M | No | |
403 Forbidden | SS | SS | SS | SS | SS | No | |
404 Not Found | SS | SS | SS | SS | SS | Yes | 404 |
405 Method Not Allowed | SS | SS | SS | SS | SS | No | |
406 Not Acceptable | N/A | N/A | N/A | N/A | N/A | No | |
408 Request Timeout | SS | SS | SS | SS | SS | Yes | 408 |
409 Conflict | N/A | N/A | SS | SS | SS | Yes | 409 (should be included as part of "retriable-4xx" ) |
410 Gone | SS | SS | SS | SS | SS | Yes | 410 |
411 Length Required | N/A | N/A | M | M | M | No | |
412 Precondition Failed | SS | SS | SS | SS | SS | No | |
413 Payload Too Large | N/A | N/A | M | M | M | No | |
414 URI Too Long | N/A | M | N/A | N/A | N/A | No | |
415 Unsupported Media Type | N/A | N/A | M | M | M | No | |
500 Internal Server Error | M | M | M | M | M | Yes | 500 |
501 Not Implemented | SS | SS | SS | SS | SS | Yes | 501 |
503 Service Unavailable | M | M | M | M | M | Yes | 503 (Should be included as part of "5xx") |
504 Gateway Timeout | SS | SS | SS | SS | SS | Yes | 504 (Should be included as part of "5xx") |
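The Applicable for Rerouting column can be treated as a lookup. The following Python sketch copies the status codes marked "Yes" in Table 5-6 into a set and checks a list of configured codes against it. The function names and the idea of validating a configured list are illustrative assumptions, not SCP behavior.

```python
# Status codes marked "Yes" in the Applicable for Rerouting column of Table 5-6.
REROUTABLE_STATUS_CODES = frozenset(
    {307, 308, 404, 408, 409, 410, 500, 501, 503, 504}
)


def is_applicable_for_rerouting(status_code: int) -> bool:
    """True when the table marks the status code as applicable for rerouting."""
    return status_code in REROUTABLE_STATUS_CODES


def not_applicable(configured_codes) -> list:
    """Return, sorted, the configured codes that the table marks as "No"."""
    return sorted(c for c in configured_codes if c not in REROUTABLE_STATUS_CODES)
```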
NF as HTTP Client
Besides the HTTP Status Codes defined in the API specification, a NF as HTTP client should support handling of 1xx, 3xx, 4xx and 5xx HTTP Status Codes specified in above table, following the client behavior in corresponding IETF RFC where the received HTTP Status Code is defined.
When receiving a not recommended or not recognized 1xx, 3xx, 4xx or 5xx HTTP Status Code, a NF as HTTP client should treat it as x00 status code of the class, as described in clause 6 of IETF RFC 7231.
If 100, 200/204, 300, 400 or 500 response code is not defined by the API specification, the client may follow guidelines below:
- For 1xx (Informational):
- Discard the response and wait for final response.
- For 2xx (Successful):
- Consider the service operation is successful if no mandatory information is expected from the response payload in subsequent procedure.
- If mandatory information is expected from response payload in subsequent procedure, parse the payload following description in subclause 6.2.1 of IETF RFC 7231 [11]. If parse is successful and mandatory information is extracted, continue with subsequent procedure.
- Otherwise, consider the service operation failed and start failure handling.
- For 3xx (Redirection):
- Retry the request towards the directed resource referred in the Location header, using same request method.
- For 4xx (Client Error):
- Validate the request message and make correction before resending. Otherwise, stop process and go to error handling procedure.
- For 5xx (Server Error):
- Stop process and go to error handling process.
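The client guidelines above can be sketched as follows in Python. The fallback to the x00 code covers only the 1xx, 3xx, 4xx and 5xx classes named in the text; the guideline strings paraphrase the list, and all names are hypothetical.

```python
# One-line paraphrase of the guideline for each status class.
GUIDELINES = {
    1: "discard the response and wait for the final response",
    2: "treat as successful unless mandatory payload information cannot be parsed",
    3: "retry towards the resource in the Location header with the same method",
    4: "validate and correct the request before resending, else go to error handling",
    5: "stop processing and go to error handling",
}


def fallback_status(status_code: int) -> int:
    """Map an unrecognized 1xx/3xx/4xx/5xx code to the x00 code of its class."""
    status_class = status_code // 100
    if status_class not in (1, 3, 4, 5):
        raise ValueError("fallback applies to 1xx, 3xx, 4xx and 5xx only")
    return status_class * 100


def guideline_for(status_code: int) -> str:
    """Return the guideline text for the class of the given status code."""
    return GUIDELINES[status_code // 100]
```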
NF as HTTP Server
A NF acting as an HTTP server is able to generate HTTP status codes specified in above table per indicated HTTP method.
An HTTP method which is not supported by 5GC SBI API specification is rejected with the HTTP status code "501 Not Implemented".
NOTE 1: In this case, the NF does not need to include in the HTTP response the "cause" attribute indicating corresponding error since the HTTP status code "501 Not Implemented" itself provides enough information of the error, i.e. the NF does not recognize the HTTP method.
If the specified target resource does not exist, the NF rejects the HTTP method with the HTTP status code "404 Not Found".
If the NF supports the HTTP method but not by a target resource, the NF rejects the HTTP method with the HTTP status code "405 Method Not Allowed" and includes in the response an Allow header field containing the supported method(s) for that resource.
NOTE 2: In this case, the NF does not need to include in the HTTP response the "cause" attribute indicating corresponding error since the HTTP status code "405 Method Not Allowed" itself provides enough information of the error and hence the Allow header field lists HTTP method(s) supported by the target resource.
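The three rejection rules above (501, 404, 405) can be sketched as one decision function. The following Python sketch assumes the supported methods are the five columns of Table 5-6 and applies the checks in the order in which the text states them; that ordering and all names are assumptions.

```python
# HTTP methods that appear as columns in Table 5-6.
SBI_METHODS = frozenset({"DELETE", "GET", "PATCH", "POST", "PUT"})


def rejection_for(method: str, resource_methods):
    """Return (status_code, headers) for a rejected request, or None.

    resource_methods -- set of methods the target resource supports,
                        or None when the target resource does not exist.
    """
    if method not in SBI_METHODS:
        return 501, {}  # method not supported by the API specification
    if resource_methods is None:
        return 404, {}  # target resource does not exist
    if method not in resource_methods:
        # Allow lists the methods that the target resource does support.
        return 405, {"Allow": ", ".join(sorted(resource_methods))}
    return None  # no rejection: the request can be processed
```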
If the received HTTP request contains incorrect optional IE, the NF discards the incorrect IE.
If the NF supports the HTTP method by a target resource but the NF cannot successfully fulfil the received request, the following requirements apply.
A NF as HTTP Server should map application errors to the most similar 3xx/4xx/5xx HTTP status code specified in table 5.2.7.1-1. If no such code is applicable, it should use "400 Bad Request" status code for errors caused by client side or "500 Internal Server Error" status code for errors caused on server side.
If the received HTTP request contains unsupported payload format, the NF rejects the HTTP request with the HTTP status code "415 Unsupported Media Type". If the HTTP PATCH method is rejected, the NF includes the Accept-Patch header field set to the value of supported patch document media types for a target resource i.e. to "application/merge-patch+json" if the NF supports "JSON Merge Patch" and to "application/json-patch+json" if the NF supports "JSON Patch". If the received HTTP PATCH request contains both "JSON Merge Patch" and "JSON Patch" documents and the NF supports only one of them, the NF ignores unsupported patch document.
NOTE 3: The format problem might be due to the request's indicated Content-Type or Content-Encoding header fields, or as a result of inspecting the payload body directly.
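The "415 Unsupported Media Type" rule, including the Accept-Patch header for a rejected PATCH, can be sketched as follows in Python. The media type strings come from the paragraph above; the function name and the way supported formats are passed in are hypothetical.

```python
# Patch document media types named in the text.
PATCH_MEDIA_TYPES = {
    "JSON Merge Patch": "application/merge-patch+json",
    "JSON Patch": "application/json-patch+json",
}


def unsupported_media_type_response(method: str, supported_patch_formats):
    """Return (status_code, headers) for a request with an unsupported payload format.

    supported_patch_formats -- list of keys of PATCH_MEDIA_TYPES that the
                               target resource supports.
    """
    headers = {}
    if method == "PATCH":
        # Tell the client which patch document media types are accepted.
        headers["Accept-Patch"] = ", ".join(
            PATCH_MEDIA_TYPES[name] for name in supported_patch_formats
        )
    return 415, headers
```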
If the received HTTP request contains payload body larger than the NF is able to process, the NF rejects the HTTP request with the HTTP status code "413 Payload Too Large".
If the result of the received HTTP POST request used for a resource creation would be equivalent to the existing resource, the NF rejects the HTTP request with the HTTP status code "303 See Other" and includes in the HTTP response a Location header field set to the URI of the existing resource.
Protocol and application errors that are common to several 5GC SBI API specifications are listed in the table below. For these errors, the NF includes in the HTTP response a payload body ("ProblemDetails" data structure or application specific error data structure) with the "cause" attribute indicating the corresponding error.
Table 5-7 Protocol and application errors
Parameters | HTTP status code | Description |
---|---|---|
INVALID_API | 400 Bad Request | The HTTP request contains an unsupported API name or API version in the URI. |
INVALID_MSG_FORMAT | 400 Bad Request | The HTTP request has an invalid format. |
INVALID_QUERY_PARAM | 400 Bad Request | The HTTP request contains an unsupported query parameter in the URI. |
MANDATORY_IE_INCORRECT | 400 Bad Request | A mandatory IE or conditional IE in data structure, but mandatory required, for an HTTP method was received with a semantically incorrect value. (NOTE 1) |
MANDATORY_IE_MISSING | 400 Bad Request | IE which is defined as mandatory or as conditional in data structure, but mandatory required, for an HTTP method is not included in the payload body of the request. (NOTE 1) |
UNSPECIFIED_MSG_FAILURE | 400 Bad Request | The request is rejected due to unspecified client error. (NOTE 2) |
MODIFICATION_NOT_ALLOWED | 403 Forbidden | The request is rejected because the contained modification instructions attempt to modify IE which is not allowed to be modified. |
SUBSCRIPTION_NOT_FOUND | 404 Not Found | The request for modification or deletion of subscription is rejected because the subscription is not found in the NF. |
RESOURCE_URI_STRUCTURE_NOT_FOUND | 404 Not Found | The request is rejected because a fixed part after the first variable part of an "apiSpecificResourceUriPart" (as defined in subclause 4.4.1 of 3GPP TS 29.501) is not found in the NF. This fixed part of the URI may represent a sub-resource collection (e.g. contexts, subscriptions, policies) or a custom operation. (NOTE X) |
INCORRECT_LENGTH | 411 Length Required | The request is rejected due to incorrect value of a Content-length header field. |
NF_CONGESTION_RISK | 429 Too Many Requests | The request is rejected due to excessive traffic which, if continued over time, may lead to (or may increase) an overload situation. |
INSUFFICIENT_RESOURCES | 500 Internal Server Error | The request is rejected due to insufficient resources. |
UNSPECIFIED_NF_FAILURE | 500 Internal Server Error | The request is rejected due to unspecified reason at the NF. (NOTE 3) |
SYSTEM_FAILURE | 500 Internal Server Error | The request is rejected due to generic error condition in the NF. |
NF_CONGESTION | 503 Service Unavailable | The NF experiences congestion and performs overload control, which does not allow the request to be processed. (NOTE 4) |
NOTE 1: "invalidParams" attribute is included in the "ProblemDetails" data structure indicating missing or incorrect IE.
NOTE 2: This application error indicates error in the HTTP request and there is no other application error value that can be used instead.
NOTE 3: This application error indicates error condition in the NF and there is no other application error value that can be used instead.
NOTE 4: If the reason for rejection is a temporary overload, the NF may include in the response a Retry-After header field to indicate how long the service is expected to be unavailable.
NOTE X: If the request is rejected because of an error in an URI before the first variable part of an "apiSpecificResourceUriPart", the "404 Not Found" HTTP status code may be sent without "ProblemDetails" data structure indicating protocol or application error.
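Table 5-7 and its notes can be sketched as a helper that builds an error response. The cause-to-status mapping below is copied from the table, and the "cause" and "invalidParams" attributes and the "application/problem+json" content type come from the text. The "status", "param" and "reason" field names and the function name are assumptions made for illustration.

```python
import json

# "cause" value to HTTP status code, as listed in Table 5-7.
CAUSE_TO_STATUS = {
    "INVALID_API": 400,
    "INVALID_MSG_FORMAT": 400,
    "INVALID_QUERY_PARAM": 400,
    "MANDATORY_IE_INCORRECT": 400,
    "MANDATORY_IE_MISSING": 400,
    "UNSPECIFIED_MSG_FAILURE": 400,
    "MODIFICATION_NOT_ALLOWED": 403,
    "SUBSCRIPTION_NOT_FOUND": 404,
    "RESOURCE_URI_STRUCTURE_NOT_FOUND": 404,
    "INCORRECT_LENGTH": 411,
    "NF_CONGESTION_RISK": 429,
    "INSUFFICIENT_RESOURCES": 500,
    "UNSPECIFIED_NF_FAILURE": 500,
    "SYSTEM_FAILURE": 500,
    "NF_CONGESTION": 503,
}


def problem_response(cause, invalid_params=None, retry_after_seconds=None):
    """Return (status_code, headers, json_body) for an application error."""
    status = CAUSE_TO_STATUS[cause]
    body = {"status": status, "cause": cause}
    if invalid_params:
        # NOTE 1: invalidParams indicates the missing or incorrect IEs.
        body["invalidParams"] = [
            {"param": param, "reason": reason} for param, reason in invalid_params
        ]
    headers = {"Content-Type": "application/problem+json"}
    if cause == "NF_CONGESTION" and retry_after_seconds is not None:
        # NOTE 4: Retry-After may be included for a temporary overload.
        headers["Retry-After"] = str(retry_after_seconds)
    return status, headers, json.dumps(body)
```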