D Manual Shutdown and Recovery of NRF

D.1 Manual Shutdown of NRF

This section provides the procedure to gracefully shut down the NRF deployment.

  1. Run the following command to retrieve the replica count for the NRF microservices:

    Note:

    Before running the graceful shutdown, note down the replica count for the NRF microservices.

    kubectl -n <NRF Namespace> get deployments

    Where,

    <NRF Namespace> is the namespace where NRF is deployed.

    For example:

    kubectl -n usw2az1001np-ns-or-nrf-001 get deployments

    Sample output:
    NAME                                                     READY  UP-TO-DATE AVAILABLE   AGE
    deployment.apps/or-nrf001c-use1az2n01p-appinfo            1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-egressgateway      1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-ingressgateway     1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-nfdiscovery        1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-nfregistration     1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-nfsubscription     1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-nrfauditor         1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-nrfconfiguration   1/1    1         1           36h
    

    Note:

    Note down the replica count for NRF Auditor, Ingress Gateway, and Egress Gateway. This replica count value is used while restoring the NRF deployment.
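    The counts for all three microservices can be recorded in one pass. The following is a minimal sketch: the sample text stands in for live cluster output, and the file name replica-counts.txt is illustrative, not part of the product.

    ```shell
    # Sample output standing in for the live cluster; in practice, replace the
    # sample variable with the real command:
    #   kubectl -n <NRF Namespace> get deployments
    deployments='deployment.apps/or-nrf001c-use1az2n01p-egressgateway      1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-ingressgateway     1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-nrfauditor         1/1    1         1           36h'

    # Record each deployment name and its current replica count (the "n/n"
    # READY column) so the counts are available during recovery.
    echo "$deployments" \
      | egrep 'nrfauditor|ingressgateway|egressgateway' \
      | awk '{split($2, r, "/"); print $1, r[2]}' > replica-counts.txt

    cat replica-counts.txt
    ```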
  2. Scale down the NRF pods for NRF Auditor, Ingress Gateway, and Egress Gateway, as follows:
    1. Change the NRF Auditor replicaSet to 0:
      kubectl -n <NRF Namespace> get deployments | egrep 'nrfauditor' | awk '{print $1}' | xargs -L1 -r kubectl -n <NRF Namespace> scale deployment --replicas=0
      For example:
      kubectl -n usw2az1001np-ns-or-nrf-001 get deployments | egrep 'nrfauditor' | awk '{print $1}' | xargs -L1 -r kubectl -n usw2az1001np-ns-or-nrf-001 scale deployment --replicas=0
      Sample output:
      $ kubectl -n use1az2001np-ns-or-nrf-001 get deployments | egrep 'nrfauditor' | awk '{print $1}' | xargs -L1 -r kubectl -n use1az2001np-ns-or-nrf-001 scale deployment --replicas=0
      deployment.apps/or-nrf001c-use1az2n01p-nrfauditor scaled
      
    2. Change Ingress Gateway replicaSet to 0:
      kubectl -n <NRF Namespace> get deployments | egrep 'ingressgateway' | awk '{print $1}' | xargs -L1 -r kubectl -n <NRF Namespace> scale deployment --replicas=0
      For example:
      kubectl -n usw2az1001np-ns-or-nrf-001 get deployments | egrep 'ingressgateway' | awk '{print $1}' | xargs -L1 -r kubectl -n usw2az1001np-ns-or-nrf-001 scale deployment --replicas=0
      Sample output:
      $ kubectl -n use1az2001np-ns-or-nrf-001 get deployments | egrep 'ingressgateway' | awk '{print $1}' | xargs -L1 -r kubectl -n use1az2001np-ns-or-nrf-001 scale deployment --replicas=0
      deployment.apps/or-nrf001c-use1az2n01p-ingressgateway scaled
      
    3. Change Egress Gateway replicaSet to 0:
      kubectl -n <NRF Namespace> get deployments | egrep 'egressgateway' | awk '{print $1}' | xargs -L1 -r kubectl -n <NRF Namespace> scale deployment --replicas=0
      For example:
      kubectl -n usw2az1001np-ns-or-nrf-001 get deployments | egrep 'egressgateway' | awk '{print $1}' | xargs -L1 -r kubectl -n usw2az1001np-ns-or-nrf-001 scale deployment --replicas=0
      Sample output:
      $ kubectl -n use1az2001np-ns-or-nrf-001 get deployments | egrep 'egressgateway' | awk '{print $1}' | xargs -L1 -r kubectl -n use1az2001np-ns-or-nrf-001 scale deployment --replicas=0
      deployment.apps/or-nrf001c-use1az2n01p-egressgateway scaled
      
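    The three scale-down pipelines above differ only in the egrep pattern, so they can be combined into a single pass. A dry-run sketch follows: the leading `echo` before `kubectl` prints each command instead of executing it, and the sample variable stands in for live `kubectl get deployments` output. Drop both substitutions to run for real.

    ```shell
    NS="usw2az1001np-ns-or-nrf-001"   # example namespace from this procedure

    # Sample deployment names standing in for `kubectl -n "$NS" get deployments`.
    deployments='deployment.apps/or-nrf001c-use1az2n01p-nrfauditor
    deployment.apps/or-nrf001c-use1az2n01p-ingressgateway
    deployment.apps/or-nrf001c-use1az2n01p-egressgateway'

    # One pass over all three deployments; `echo` makes this a dry run.
    cmds=$(echo "$deployments" \
      | egrep 'nrfauditor|ingressgateway|egressgateway' \
      | awk '{print $1}' \
      | xargs -L1 -r echo kubectl -n "$NS" scale deployment --replicas=0)
    echo "$cmds"
    ```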
  3. Run the following command to verify that the deployments and pods for NRF Auditor, Ingress Gateway, and Egress Gateway are down:
    kubectl -n <NRF Namespace> get pods,deployments 
    For example:
    kubectl -n usw2az1001np-ns-or-nrf-001 get pods,deployments 
    Sample output:
    $ kubectl -n use1az2001np-ns-or-nrf-001 get pods,deployments
    NAME                                                        READY STATUS     RESTARTS  AGE
    pod/or-nrf001c-use1az2n01p-appinfo-7bb648b6cb-4knk2         2/2   Running       0      36h
    pod/or-nrf001c-use1az2n01p-egressgateway-7c44dd664-zg5rx    2/2   Terminating   0      36h
    pod/or-nrf001c-use1az2n01p-ingressgateway-7678c78c9d-gwdrd  2/2   Terminating   0      36h
    pod/or-nrf001c-use1az2n01p-nfdiscovery-6fdcfdbbd9-sfrwt     2/2   Running       0      36h
    pod/or-nrf001c-use1az2n01p-nfregistration-568986fd5-4j2pf   2/2   Running       0      36h
    pod/or-nrf001c-use1az2n01p-nfsubscription-549b474747-9s5qv  2/2   Running       0      36h
    pod/or-nrf001c-use1az2n01p-nrfauditor-69544798cc-q7pbw      2/2   Terminating   0      36h
    pod/or-nrf001c-use1az2n01p-nrfconfiguration-894759544-zzsqg 2/2   Running       0      36h
    NAME                                                     READY  UP-TO-DATE AVAILABLE   AGE
    deployment.apps/or-nrf001c-use1az2n01p-appinfo            1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-egressgateway      0/0    0         0           36h
    deployment.apps/or-nrf001c-use1az2n01p-ingressgateway     0/0    0         0           36h
    deployment.apps/or-nrf001c-use1az2n01p-nfdiscovery        1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-nfregistration     1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-nfsubscription     1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-nrfauditor         0/0    0         0           36h
    deployment.apps/or-nrf001c-use1az2n01p-nrfconfiguration   1/1    1         1           36h
     
    

D.2 Recovery of NRF

This section provides the procedure to restore NRF by scaling up the replica count for the NRF microservices.

Note:

Use the replica count noted down before running the graceful shutdown. For more details, see Manual Shutdown of NRF.
  1. Run the following command to restore the Ingress Gateway replicaSet value:
    kubectl -n <NRF Namespace> get deployments | egrep 'ingressgateway' | awk '{print $1}' | xargs -L1 -r kubectl -n <NRF Namespace> scale deployment --replicas=<Replicas count before ingress gateway was scaled down>
    For example:
    kubectl -n usw2az1001np-ns-or-nrf-001 get deployments | egrep 'ingressgateway' | awk '{print $1}' | xargs -L1 -r kubectl -n usw2az1001np-ns-or-nrf-001 scale deployment --replicas=1
  2. Run the following command to restore the Egress Gateway replicaSet value:
    kubectl -n <NRF Namespace> get deployments | egrep 'egressgateway' | awk '{print $1}' | xargs -L1 -r kubectl -n <NRF Namespace> scale deployment --replicas=<Replicas count before egress gateway was scaled down>
    For example:
    kubectl -n usw2az1001np-ns-or-nrf-001 get deployments | egrep 'egressgateway' | awk '{print $1}' | xargs -L1 -r kubectl -n usw2az1001np-ns-or-nrf-001 scale deployment --replicas=1
  3. Run the following command to verify that the Ingress Gateway and Egress Gateway replicas set in the above commands are running:
    kubectl -n <NRF Namespace> get deployments 
    For example:
    kubectl -n usw2az1001np-ns-or-nrf-001 get deployments
    Sample output:
    NAME                                                     READY  UP-TO-DATE AVAILABLE   AGE
    deployment.apps/or-nrf001c-use1az2n01p-appinfo            1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-egressgateway      1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-ingressgateway     1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-nfdiscovery        1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-nfregistration     1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-nfsubscription     1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-nrfauditor         0/0    0         0           36h
    deployment.apps/or-nrf001c-use1az2n01p-nrfconfiguration   1/1    1         1           36h
    
  4. Wait for three times the maximum heartbeat (HB) timer value configured at NRF (default 5 minutes), that is, 15 minutes by default.

    See the maxHbTimer attribute value configured in the NF Management Options API section of Oracle Communications Cloud Native Core, Network Repository Function REST Specification Guide.
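    The wait in this step is simple arithmetic over the configured maxHbTimer. A sketch assuming the default of 300 seconds; verify the actual value through the NF Management Options API before relying on it:

    ```shell
    MAX_HB_TIMER=300            # assumed default maxHbTimer in seconds (5 minutes)

    # Wait three full heartbeat periods so registered NFs can heartbeat against
    # the restored NRF before the auditor is scaled back up.
    WAIT_SECONDS=$((MAX_HB_TIMER * 3))
    echo "Waiting ${WAIT_SECONDS}s ($((WAIT_SECONDS / 60)) minutes) before restoring NRF Auditor"
    # sleep "$WAIT_SECONDS"     # uncomment to actually wait
    ```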

  5. Run the following command to restore the NRF Auditor replicaSet value:
    kubectl -n <NRF Namespace> get deployments | egrep 'nrfauditor' | awk '{print $1}' | xargs -L1 -r kubectl -n <NRF Namespace> scale deployment --replicas=<Replicas count before NRF Auditor was scaled down>
    For example:
    kubectl -n usw2az1001np-ns-or-nrf-001 get deployments | egrep 'nrfauditor' | awk '{print $1}' | xargs -L1 -r kubectl -n usw2az1001np-ns-or-nrf-001 scale deployment --replicas=1
  6. Run the following command to verify that the NRF Auditor replicas set in the above command are running:
    kubectl -n <NRF Namespace> get deployments
    For example:
    kubectl -n usw2az1001np-ns-or-nrf-001 get deployments
    Sample output:
    NAME                                                     READY  UP-TO-DATE AVAILABLE   AGE
    deployment.apps/or-nrf001c-use1az2n01p-appinfo            1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-egressgateway      1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-ingressgateway     1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-nfdiscovery        1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-nfregistration     1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-nfsubscription     1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-nrfauditor         1/1    1         1           36h
    deployment.apps/or-nrf001c-use1az2n01p-nrfconfiguration   1/1    1         1           36h
    
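If the replica counts noted during the shutdown were saved to a file, the restore commands can be generated from it. The file name and one-pair-per-line format below are illustrative, not part of the product, and the leading `echo` makes this a dry run. Note that, per steps 4 and 5 above, the NRF Auditor command must only be applied after the heartbeat wait.

```shell
# Illustrative file of "<deployment> <replicas>" pairs recorded at shutdown
# (names follow the examples in this procedure).
cat > replica-counts.txt <<'EOF'
deployment.apps/or-nrf001c-use1az2n01p-ingressgateway 1
deployment.apps/or-nrf001c-use1az2n01p-egressgateway 1
deployment.apps/or-nrf001c-use1az2n01p-nrfauditor 1
EOF

NS="usw2az1001np-ns-or-nrf-001"   # example namespace from this procedure

# Dry run: print each restore command. Remove the leading `echo` to execute.
cmds=$(while read -r name replicas; do
  echo kubectl -n "$NS" scale deployment "$name" --replicas="$replicas"
done < replica-counts.txt)
echo "$cmds"
```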

D.3 NRF Behavior Post Fault Recovery

During the database restore procedure, along with the configuration data, the NFProfile and subscription data is also restored, and this data may not reflect the latest state at that moment. In this state, the NrfAuditor microservice may act upon NFProfiles and NFSubscriptions that are not yet up-to-date, while the NFs are still in the process of moving to the NRF that is currently available. If the audit procedure runs in this state, NRF suspends those NFs and sends out notifications to the consumer NFs. The same applies to NFSubscriptions, where subscriptions may get deleted due to an older lastUpdatedTimestamp in the backup data.

To avoid this problem, the NrfAuditor microservice waits for a waiting period before resuming the auditing of NFProfiles and subscriptions after it transitions from the NotReady state to the Ready state. An alert ("OcnrfAuditOperationsPaused") is raised to indicate that the audit processes are paused; see the NRF Alerts section in Oracle Communications Cloud Native Core, Network Repository Function User Guide. Once the waiting period elapses, the audit processes resume and the alert is cleared.

To know how the waiting period is computed, refer to the "Controlled Shutdown of NRF" section in Oracle Communications Cloud Native Core, Network Repository Function User Guide.

Note:

The NrfAuditor pod goes to the NotReady state whenever it loses connectivity with the database. During temporary connectivity fluctuations, the NrfAuditor pod may transition between the Ready and NotReady states, causing the cool-off period to kick in for every NotReady-to-Ready transition. To avoid such short and frequent transitions from triggering the cool-off, the NrfAuditor microservice applies the waiting period only when the pod has been NotReady for more than 5 seconds.
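The 5-second debounce described in this note can be illustrated with a small sketch. This is illustrative logic only, not the actual NrfAuditor implementation:

```shell
NOTREADY_THRESHOLD=5   # seconds; per the note above

should_pause_audit() {
  # $1: seconds the pod spent in NotReady before returning to Ready
  if [ "$1" -gt "$NOTREADY_THRESHOLD" ]; then
    echo "pause"    # sustained outage: apply the waiting period
  else
    echo "resume"   # brief fluctuation: resume auditing immediately
  fi
}

should_pause_audit 2     # brief database connectivity blip -> resume
should_pause_audit 120   # sustained database outage -> pause
```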