4 Handling SLF Segments

This section describes SLF Segments and explains how to detect a failed SLF Segment in a Provisioning Gateway setup, remove it, and restore it to the setup after the issue is fixed.

Understanding the SLF Segment

The Provisioning Gateway receives provisioning requests from a provisioning client, forks each request, and sends it to all of its configured SLF Segments. It collects the responses from each SLF Segment and sends a final response to the provisioning client.

Note:

An SLF Segment can contain more than one SLF, and you can deploy more than one SLF Segment.

If all the SLFs in one SLF Segment are down while the other segment responds with success, the final response is a failure. This causes the provisioning system to retry the same message without informing the user of the original problem. In this case, the user must remove the faulty segment from the Provisioning Gateway.
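The rule in this note can be sketched as a small shell function. This is only an illustration of the behavior described above, not the Provisioning Gateway implementation: the function name final_response and the "success" and "failure" labels are assumptions made for the example.

```shell
# Illustrative only: the function name and the "success"/"failure" labels
# are assumptions, not Provisioning Gateway output.
# Each argument is the result reported by one SLF Segment.
final_response() {
  # With no segment results there is nothing to confirm, so report failure.
  [ "$#" -gt 0 ] || { echo "failure"; return 0; }
  for result in "$@"; do
    # A single failed segment makes the final response a failure.
    if [ "$result" != "success" ]; then
      echo "failure"
      return 0
    fi
  done
  echo "success"
}
```

For example, final_response failure success prints failure even though the second segment succeeded, which is why a segment that stays down must be removed from the configuration.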

Detecting SLF Segment Failure

Provisioning Gateway raises Prometheus alerts that help you detect SLF Segment failure or unavailability. You can monitor the following alerts using the Prometheus server:

ProvgwSegmentDownAbove1Percent (Warning): Indicates that more than 1% and less than 10% of the total number of messages are lost due to a segment failure.
ProvgwSegmentDownAbove10Percent (Minor): Indicates that more than 10% and less than 25% of the total number of messages are lost due to a segment failure.
ProvgwSegmentDownAbove25Percent (Major): Indicates that more than 25% and less than 50% of the total number of messages are lost due to a segment failure.
ProvgwSegmentDownAbove50Percent (Critical): Indicates that 50% or more of the total number of messages are lost due to a segment failure.
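The alerts listed above can also be checked from a command line. The sketch below assumes that the Prometheus server is reachable over HTTP and that its standard /api/v1/alerts endpoint, which returns the active alerts as JSON, is available; the host name and port in the comment are placeholders, and list_segment_alerts is a hypothetical helper name.

```shell
# Reads Prometheus alert JSON on stdin and prints each distinct
# ProvgwSegmentDownAbove<N>Percent alert name that it contains.
list_segment_alerts() {
  grep -o 'ProvgwSegmentDownAbove[0-9]*Percent' | sort -u
}

# Example call (host and port are placeholders for your Prometheus server):
#   curl -s "http://prometheus.example:9090/api/v1/alerts" | list_segment_alerts
```

The helper matches on the alert name only, so it prints nothing when no segment alert is active.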

Alternatively, you can inspect the Provisioning Gateway logs. These logs help you identify whether all the IP addresses in the SLF Segment are down.
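As a rough sketch of a log check, the helper below filters the log of one Provisioning Gateway pod for lines that mention a given SLF IP address. It assumes the logs are read with kubectl; the helper name segment_log_lines is hypothetical, and the wording of the log messages themselves is not assumed, so the filter matches on the address only.

```shell
# Prints Provisioning Gateway log lines that mention a given SLF address.
# $1 = provgw namespace, $2 = provgw pod name, $3 = SLF IP address
segment_log_lines() {
  # -F treats the address as a fixed string, so the dots are not wildcards.
  kubectl logs --namespace "$1" "$2" | grep -F -- "$3"
}
```

Running the helper once for each IP address configured in a segment shows whether every address in that segment is reported as unreachable.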

While investigating the problem, you can also examine the responses received to verify which segment is unavailable or has gone down.

Removing Failed SLF Segment

To remove the failed SLF Segment:

1. Execute the following command to create a backup copy of the provgw-custom-values.yaml file:
  cp provgw-custom-values.yaml provgw-custom-values.yaml.bkp
2. Edit the provgw-custom-values.yaml file and remove the faulty segment details completely.
3. Stop the Auditor Service.
Example:
1. In the initial deployment, there are two segments, SEG-1 and SEG-2.
  
  Figure 4-1 Segments Deployed
2. The user identifies that SEG-1 has gone down and wants to remove it. To remove SEG-1, the user must remove the entire segment configuration from the .yaml file, as shown below:
  
  Figure 4-2 Removing Segment Details
3. Set auditor-service.enabled to false to stop the Auditor Service.
  
  Figure 4-3 Stopping Auditor Service
4. After making the necessary configuration changes in the .yaml file, execute the following command to perform a Helm upgrade:
  helm upgrade <release-name> <helm chart path> --namespace <provgw namespace>
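The backup and upgrade steps of the removal procedure can be wrapped in two small shell functions, sketched below. The function names and the VALUES_FILE variable are illustrative. Passing the edited file to Helm with -f is an assumption: the command in this section does not show how the values file is supplied, so adjust the call to match your deployment.

```shell
# Sketch of the removal workflow. All names here are illustrative.
VALUES_FILE="${VALUES_FILE:-provgw-custom-values.yaml}"

# Keep a backup of the current values file before editing it.
backup_values() {
  cp "$VALUES_FILE" "$VALUES_FILE.bkp"
}

# Apply the edited values file with a Helm upgrade.
# Assumption: the file is passed with -f; the documented command
# does not show how the file is supplied.
# $1 = release name, $2 = Helm chart path, $3 = provgw namespace
apply_values() {
  helm upgrade "$1" "$2" -f "$VALUES_FILE" --namespace "$3"
}
```

A typical sequence would be to call backup_values, edit the file by hand to remove the faulty segment and disable the Auditor Service, and then call apply_values with the release name, chart path, and namespace.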

Restoring SLF Segment

After fixing the SLF Segment and confirming that it is ready to accept messages, the user can add the segment back to the configuration. To restore the SLF Segment:

1. Copy provgw-custom-values.yaml.bkp to provgw-custom-values.yaml if the restored SLF Segment is the same as the previous one. If it is a new SLF Segment, add the new segment details to the provgw-custom-values.yaml file.
2. After making the necessary changes, execute the following command to perform a Helm upgrade:
  helm upgrade <release-name> <helm chart path> --namespace <provgw namespace>
3. Verify that all the pods are up and running in the same namespace.
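The restore and verification steps can be sketched in the same way. The function names are hypothetical, and the pod check assumes the default column layout of kubectl get pods (NAME, READY, STATUS, and so on). It also assumes that every pod in the namespace is expected to be in the Running state; a namespace that contains completed job pods would need a looser check.

```shell
# Sketch of the restore workflow. All names here are illustrative.
VALUES_FILE="${VALUES_FILE:-provgw-custom-values.yaml}"

# Same segment as before: bring back the saved configuration.
restore_values() {
  cp "$VALUES_FILE.bkp" "$VALUES_FILE"
}

# Succeeds only if at least one pod is listed and every pod is
# Running with all of its containers ready.
# $1 = provgw namespace
all_pods_running() {
  kubectl get pods --namespace "$1" --no-headers |
    awk '{ split($2, ready, "/")
           if ($3 != "Running" || ready[1] != ready[2]) bad = 1 }
         END { exit (NR == 0 || bad) ? 1 : 0 }'
}
```

After the Helm upgrade completes, a call such as all_pods_running with the Provisioning Gateway namespace returns a zero exit status once every pod is ready, which makes it usable in a simple wait loop.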