D Manual Shutdown and Recovery of NRF
D.1 Manual Shutdown of NRF
This section provides the procedure to gracefully shut down the NRF deployment.
- Run the following command to retrieve the replica count for the NRF microservices:

  Note:
  Before running graceful shutdown, note down the replica count for the NRF microservices.

  kubectl -n <NRF Namespace> get deployments

  Where,
  <NRF Namespace> is the namespace where NRF is deployed.

  For example:
  kubectl -n usw2az1001np-ns-or-nrf-001 get deployments

  Sample output:
  NAME                                                      READY   UP-TO-DATE   AVAILABLE   AGE
  deployment.apps/or-nrf001c-use1az2n01p-appinfo            1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-egressgateway      1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-ingressgateway     1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-nfdiscovery        1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-nfregistration     1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-nfsubscription     1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-nrfauditor         1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-nrfconfiguration   1/1     1            1           36h

  Note:
  Note down the replica count for NRF Auditor, Ingress Gateway, and Egress Gateway. This replica count is used when restoring the NRF deployment.

- Scale down the NRF Auditor, Ingress Gateway, and Egress Gateway pods as follows:
  - Change the NRF Auditor replica count to 0:

    kubectl -n <NRF Namespace> get deployments | egrep 'nrfauditor' | awk '{print $1}' | xargs -L1 -r kubectl -n <NRF Namespace> scale deployment --replicas=0

    For example:
    kubectl -n usw2az1001np-ns-or-nrf-001 get deployments | egrep 'nrfauditor' | awk '{print $1}' | xargs -L1 -r kubectl -n usw2az1001np-ns-or-nrf-001 scale deployment --replicas=0

    Sample output:
    deployment.apps/or-nrf001c-use1az2n01p-nrfauditor scaled

  - Change the Ingress Gateway replica count to 0:

    kubectl -n <NRF Namespace> get deployments | egrep 'ingressgateway' | awk '{print $1}' | xargs -L1 -r kubectl -n <NRF Namespace> scale deployment --replicas=0

    For example:
    kubectl -n usw2az1001np-ns-or-nrf-001 get deployments | egrep 'ingressgateway' | awk '{print $1}' | xargs -L1 -r kubectl -n usw2az1001np-ns-or-nrf-001 scale deployment --replicas=0

    Sample output:
    deployment.apps/or-nrf001c-use1az2n01p-ingressgateway scaled

  - Change the Egress Gateway replica count to 0:

    kubectl -n <NRF Namespace> get deployments | egrep 'egressgateway' | awk '{print $1}' | xargs -L1 -r kubectl -n <NRF Namespace> scale deployment --replicas=0

    For example:
    kubectl -n usw2az1001np-ns-or-nrf-001 get deployments | egrep 'egressgateway' | awk '{print $1}' | xargs -L1 -r kubectl -n usw2az1001np-ns-or-nrf-001 scale deployment --replicas=0

    Sample output:
    deployment.apps/or-nrf001c-use1az2n01p-egressgateway scaled
- Run the following command to verify that the deployments and pods for NRF Auditor, Ingress Gateway, and Egress Gateway are down:

  kubectl -n <NRF Namespace> get pods,deployments

  For example:
  kubectl -n usw2az1001np-ns-or-nrf-001 get pods,deployments

  Sample output:
  NAME                                                          READY   STATUS        RESTARTS   AGE
  pod/or-nrf001c-use1az2n01p-appinfo-7bb648b6cb-4knk2           2/2     Running       0          36h
  pod/or-nrf001c-use1az2n01p-egressgateway-7c44dd664-zg5rx      2/2     Terminating   0          36h
  pod/or-nrf001c-use1az2n01p-ingressgateway-7678c78c9d-gwdrd    2/2     Terminating   0          36h
  pod/or-nrf001c-use1az2n01p-nfdiscovery-6fdcfdbbd9-sfrwt       2/2     Running       0          36h
  pod/or-nrf001c-use1az2n01p-nfregistration-568986fd5-4j2pf     2/2     Running       0          36h
  pod/or-nrf001c-use1az2n01p-nfsubscription-549b474747-9s5qv    2/2     Running       0          36h
  pod/or-nrf001c-use1az2n01p-nrfauditor-69544798cc-q7pbw        2/2     Terminating   0          36h
  pod/or-nrf001c-use1az2n01p-nrfconfiguration-894759544-zzsqg   2/2     Running       0          36h

  NAME                                                      READY   UP-TO-DATE   AVAILABLE   AGE
  deployment.apps/or-nrf001c-use1az2n01p-appinfo            1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-egressgateway      0/0     0            0           36h
  deployment.apps/or-nrf001c-use1az2n01p-ingressgateway     0/0     0            0           36h
  deployment.apps/or-nrf001c-use1az2n01p-nfdiscovery        1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-nfregistration     1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-nfsubscription     1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-nrfauditor         0/0     0            0           36h
  deployment.apps/or-nrf001c-use1az2n01p-nrfconfiguration   1/1     1            1           36h
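The shutdown steps above can be sketched as a single helper script. This is a hypothetical sketch, not part of the NRF product: it assumes kubectl is installed and configured for the target cluster, and the namespace, deployment name patterns, and state file name are illustrative.

```shell
#!/bin/sh
# Hypothetical shutdown helper (a sketch, not an official NRF tool).
# Records replica counts first, then scales down the auditor and gateways.

NS="${NRF_NAMESPACE:-usw2az1001np-ns-or-nrf-001}"  # example namespace
STATE_FILE="nrf-replicas-${NS}.txt"                # where counts are saved

if command -v kubectl >/dev/null 2>&1; then
    # Record current replica counts so recovery can restore the same values.
    kubectl -n "$NS" get deployments \
        -o custom-columns=NAME:.metadata.name,REPLICAS:.spec.replicas \
        --no-headers > "$STATE_FILE"

    # Scale down only the auditor and the gateways, as in the manual steps.
    for pattern in nrfauditor ingressgateway egressgateway; do
        kubectl -n "$NS" get deployments | grep "$pattern" | awk '{print $1}' \
            | xargs -L1 -r kubectl -n "$NS" scale deployment --replicas=0
    done
else
    echo "kubectl not found; skipping scale-down"
fi
```

Saving the counts to a file avoids relying on an operator's notes when the deployment is restored later.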
D.2 Recovery of NRF
This section provides the procedure to bring NRF back by scaling up the replica count of the NRF microservices.
Note:
The replica count was noted down before running graceful shutdown. For more details, see Manual Shutdown of NRF.

- Run the following command to restore the Ingress Gateway replica count:

  kubectl -n <NRF Namespace> get deployments | egrep 'ingressgateway' | awk '{print $1}' | xargs -L1 -r kubectl -n <NRF Namespace> scale deployment --replicas=<Replica count before Ingress Gateway was scaled down>

  For example:
  kubectl -n usw2az1001np-ns-or-nrf-001 get deployments | egrep 'ingressgateway' | awk '{print $1}' | xargs -L1 -r kubectl -n usw2az1001np-ns-or-nrf-001 scale deployment --replicas=1

- Run the following command to restore the Egress Gateway replica count:

  kubectl -n <NRF Namespace> get deployments | egrep 'egressgateway' | awk '{print $1}' | xargs -L1 -r kubectl -n <NRF Namespace> scale deployment --replicas=<Replica count before Egress Gateway was scaled down>

  For example:
  kubectl -n usw2az1001np-ns-or-nrf-001 get deployments | egrep 'egressgateway' | awk '{print $1}' | xargs -L1 -r kubectl -n usw2az1001np-ns-or-nrf-001 scale deployment --replicas=1

- Run the following command to verify that the Ingress Gateway and Egress Gateway replicas restored above are running:

  kubectl -n <NRF Namespace> get deployments

  For example:
  kubectl -n usw2az1001np-ns-or-nrf-001 get deployments

  Sample output:
  NAME                                                      READY   UP-TO-DATE   AVAILABLE   AGE
  deployment.apps/or-nrf001c-use1az2n01p-appinfo            1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-egressgateway      1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-ingressgateway     1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-nfdiscovery        1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-nfregistration     1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-nfsubscription     1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-nrfauditor         0/0     0            0           36h
  deployment.apps/or-nrf001c-use1az2n01p-nrfconfiguration   1/1     1            1           36h

- Wait for three times the maximum heartbeat value configured at NRF (default 5 minutes), that is, 15 minutes by default. See the maxHbTimer attribute value configured in the NF Management Options API section of Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.

- Run the following command to restore the NRF Auditor replica count:

  kubectl -n <NRF Namespace> get deployments | egrep 'nrfauditor' | awk '{print $1}' | xargs -L1 -r kubectl -n <NRF Namespace> scale deployment --replicas=<Replica count before NRF Auditor was scaled down>

  For example:
  kubectl -n usw2az1001np-ns-or-nrf-001 get deployments | egrep 'nrfauditor' | awk '{print $1}' | xargs -L1 -r kubectl -n usw2az1001np-ns-or-nrf-001 scale deployment --replicas=1

- Run the following command to verify that the NRF Auditor replicas restored above are running:

  kubectl -n <NRF Namespace> get deployments

  For example:
  kubectl -n usw2az1001np-ns-or-nrf-001 get deployments

  Sample output:
  NAME                                                      READY   UP-TO-DATE   AVAILABLE   AGE
  deployment.apps/or-nrf001c-use1az2n01p-appinfo            1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-egressgateway      1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-ingressgateway     1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-nfdiscovery        1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-nfregistration     1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-nfsubscription     1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-nrfauditor         1/1     1            1           36h
  deployment.apps/or-nrf001c-use1az2n01p-nrfconfiguration   1/1     1            1           36h
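The recovery sequence, including the wait of maxHbTimer x 3 between restoring the gateways and the auditor, can be sketched as a single helper script. This is a hypothetical sketch, not part of the NRF product: the namespace, the replica counts of 1, and the cluster-reachability check are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical recovery helper (a sketch, not an official NRF tool).

NS="usw2az1001np-ns-or-nrf-001"     # example namespace
MAX_HB_TIMER_MIN=5                  # default maxHbTimer at NRF, in minutes
WAIT_MIN=$((MAX_HB_TIMER_MIN * 3))  # wait 3 x heartbeat timer = 15 minutes

# Scale every deployment whose name matches $1 back to $2 replicas.
restore() {
    kubectl -n "$NS" get deployments | grep "$1" | awk '{print $1}' \
        | xargs -L1 -r kubectl -n "$NS" scale deployment --replicas="$2"
}

# Only act when kubectl exists AND the cluster/namespace is reachable.
if command -v kubectl >/dev/null 2>&1 \
        && kubectl -n "$NS" get deployments >/dev/null 2>&1; then
    restore ingressgateway 1   # replica counts noted before shutdown
    restore egressgateway 1
    echo "Waiting ${WAIT_MIN} minutes before restoring the auditor..."
    sleep $((WAIT_MIN * 60))
    restore nrfauditor 1
else
    echo "cluster not reachable; would wait ${WAIT_MIN} minutes before restoring the auditor"
fi
```

Restoring the auditor last, after the computed wait, matches the manual procedure: it gives registered NFs time to refresh their heartbeats before audits resume.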
D.3 NRF Behavior Post Fault Recovery
During the database restore procedure, the NFProfile and subscription data is restored along with the configuration data, and it may not reflect the latest state at that moment. In this state, the NrfAuditor microservice may act upon NFProfiles and NFSubscriptions that are not yet up to date, while the NFs are still in the process of moving back to the NRF that is now available. If the audit procedure runs at this point, NRF suspends those NFs and sends out notifications to the consumer NFs. The same applies to NFSubscriptions, which may get deleted due to an older lastUpdatedTimestamp in the backup data.
To avoid this problem, the NrfAuditor microservice waits for a waiting period
before resuming the auditing of NFProfiles and subscriptions after it
transitions from NotReady to Ready state. An alert
("OcnrfAuditOperationsPaused") is raised to indicate that the audit
processes have paused; see the NRF Alerts section in Oracle Communications
Cloud Native Core, Network Repository Function User Guide. Once the waiting
period has elapsed, the audit processes resume and the alert is cleared.
To know about the computation of the waiting period, refer to "Controlled Shutdown of NRF" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.
Note:
The NrfAuditor pod goes to NotReady state whenever it loses connectivity with the database. During temporary connectivity fluctuations, the NrfAuditor pod may transition repeatedly between Ready and NotReady states, causing the cool-off period to kick in for every NotReady-to-Ready transition. To avoid such short and frequent transitions triggering the waiting period, the NrfAuditor microservice applies the waiting period only when the pod has been in NotReady state for more than 5 seconds.
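The threshold rule in the note above can be illustrated with a small, self-contained sketch. This is illustrative only: the function name and the THRESHOLD_SEC variable are made up for the example and are not NRF internals.

```shell
#!/bin/sh
# Illustrative sketch of the note above: the waiting period is applied only
# when the pod stayed NotReady for longer than a threshold (5 seconds),
# filtering out short connectivity blips. Not actual NRF source code.

THRESHOLD_SEC=5

# Decide whether an audit waiting period should start, given how long
# the NrfAuditor pod was in NotReady state before returning to Ready.
apply_waiting_period() {
    notready_duration_sec="$1"
    if [ "$notready_duration_sec" -gt "$THRESHOLD_SEC" ]; then
        echo "yes"   # sustained outage: pause audits for the waiting period
    else
        echo "no"    # brief blip: resume auditing immediately
    fi
}

apply_waiting_period 2    # prints "no"  (short fluctuation, under 5 s)
apply_waiting_period 120  # prints "yes" (real outage, waiting period applies)
```

The strict "more than 5 seconds" comparison means a transition lasting exactly 5 seconds is still treated as a blip.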