4 Handling SLF Segments
This section describes SLF Segments and explains how to detect a failed SLF Segment in a Provisioning Gateway setup, remove the failed segment, and, after the issue is fixed, restore the segment to the setup.
Understanding the SLF Segment
Note:
An SLF Segment can contain more than one SLF, and you can deploy more than one SLF Segment. If all the SLFs in one SLF Segment are down while another segment responds with success, the final response is a failure. This causes the provisioning system to retry the same message without letting the user know about the original problem. In this case, the user must remove the faulty segment from the Provisioning Gateway.
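The failure mode described in the Note can be illustrated with a minimal shell sketch. It assumes, generalizing from the two-segment case in the Note, that the final response is SUCCESS only when every segment succeeds. The function name and the SUCCESS/FAILURE strings are hypothetical and do not reflect the Provisioning Gateway's actual response format.

```shell
#!/usr/bin/env bash
# Hypothetical illustration: pass one argument per SLF Segment response.
# Assumption: the final response is SUCCESS only if every segment succeeded,
# so a single failed segment turns the whole request into a FAILURE.
aggregate_response() {
  local result
  for result in "$@"; do
    if [[ $result != SUCCESS ]]; then
      echo FAILURE
      return 0
    fi
  done
  echo SUCCESS
}

# SEG-1 is down (all of its SLFs failed) while SEG-2 succeeds:
#   aggregate_response FAILURE SUCCESS   # prints FAILURE, so the provisioning
#                                        # system retries the same message
```

Because the healthy segment cannot turn the result into a success, every retry fails the same way until the faulty segment is removed.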
Detecting SLF Segment Failure
Provisioning Gateway raises Prometheus alerts that help you detect SLF Segment failure or unavailability. You can monitor the following alerts on the Prometheus server:
- ProvgwSegmentDownAbove1Percent (Warning): Indicates that more than 1% and less than 10% of all messages are lost due to a segment failure.
- ProvgwSegmentDownAbove10Percent (Minor): Indicates that more than 10% and less than 25% of all messages are lost due to a segment failure.
- ProvgwSegmentDownAbove25Percent (Major): Indicates that more than 25% and less than 50% of all messages are lost due to a segment failure.
- ProvgwSegmentDownAbove50Percent (Critical): Indicates that 50% or more of all messages are lost due to a segment failure.
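The thresholds above can be illustrated with a small shell function that maps a count of lost messages to the alert it falls under. This is only a sketch for reasoning about severities, not how Prometheus evaluates the alert rules. The function name is hypothetical, and because the list leaves exactly 10% and 25% unassigned, the function assumes each alert means "above N%" as its name suggests, with 50% inclusive as stated.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify_segment_loss LOST TOTAL
# Prints the alert that a given share of lost messages falls under.
# Assumption: boundaries follow the alert names ("above N%"), with 50% inclusive.
classify_segment_loss() {
  local lost=$1 total=$2
  if (( total <= 0 )); then
    echo "none"
    return 0
  fi
  # Integer-only comparison: lost/total > N/100  <=>  lost*100 > total*N
  if (( lost * 100 >= total * 50 )); then
    echo "ProvgwSegmentDownAbove50Percent (Critical)"
  elif (( lost * 100 > total * 25 )); then
    echo "ProvgwSegmentDownAbove25Percent (Major)"
  elif (( lost * 100 > total * 10 )); then
    echo "ProvgwSegmentDownAbove10Percent (Minor)"
  elif (( lost * 100 > total * 1 )); then
    echo "ProvgwSegmentDownAbove1Percent (Warning)"
  else
    echo "none"
  fi
}

# Example: 30 of 100 messages lost
#   classify_segment_loss 30 100   # prints ProvgwSegmentDownAbove25Percent (Major)
```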
Alternatively, you can check the Provisioning Gateway logs, which help you identify whether all the IP addresses in an SLF Segment are down.
You can also inspect the responses to verify which segment is unavailable or has gone down while investigating the problem.
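When checking the logs, it can help to keep only the lines that mention the SLF IP addresses of the suspect segment. The sketch below assumes that the relevant Provisioning Gateway log lines contain those IP addresses; the function name, pod name, and addresses are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the log lines (read on stdin) that mention
# at least one of the SLF IP addresses passed as arguments.
#   -F  match the addresses literally (dots are not regex wildcards)
#   -w  whole-word matches only, so 10.0.0.1 does not match 10.0.0.10
filter_segment_logs() {
  if (( $# == 0 )); then
    echo "usage: filter_segment_logs IP [IP...]" >&2
    return 2
  fi
  local ip
  local -a patterns=()
  for ip in "$@"; do
    patterns+=(-e "$ip")
  done
  grep -F -w "${patterns[@]}"
}

# Usage (pod name and addresses are placeholders):
#   kubectl logs -n <provgw namespace> <provgw pod> | filter_segment_logs <slf-ip-1> <slf-ip-2>
```

If every address of the segment shows only failures in the filtered output, the whole segment is likely down.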
Removing Failed SLF Segment
To remove the failed SLF Segment:
- Execute the following command to back up the provgw-custom-values.yaml file:
cp provgw-custom-values.yaml provgw-custom-values.yaml.bkp
- Edit the provgw-custom-values.yaml file and remove the faulty segment details completely.
- Stop the Auditor Service.
Example:
- In the initial deployment, there are two segments: SEG-1 and SEG-2.
Figure 4-1 Segments Deployed
- The user identifies that SEG-1 has gone down and wants to remove it. To remove SEG-1, the user must remove its entire segment configuration from the .yaml file, as shown in the following figure:
Figure 4-2 Removing Segment Details
- Set auditor-service.enabled to false to stop the Auditor service.
Figure 4-3 Stopping Auditor Service
- After making the necessary configuration changes in the .yaml file, execute the following command to perform a Helm upgrade:
helm upgrade <release-name> <helm chart path> --namespace <provgw namespace>
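The removal steps above can be sketched as two shell helpers, one for the backup and one for the Helm upgrade. The function names are hypothetical, and passing provgw-custom-values.yaml to Helm with -f is an assumption about how the deployment supplies its custom values; adjust it to match your installation. Deleting the segment block and setting auditor-service.enabled to false remain manual edits.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the removal steps.

# Keep a copy of the working configuration; the restore procedure reuses it.
backup_values() {
  local values=${1:-provgw-custom-values.yaml}
  cp -- "$values" "$values.bkp"
}

# Apply the edited values (faulty segment removed, Auditor service disabled).
# Assumption: the custom values file is passed explicitly with -f.
upgrade_provgw() {
  local release=$1 chart=$2 namespace=$3
  local values=${4:-provgw-custom-values.yaml}
  helm upgrade "$release" "$chart" --namespace "$namespace" -f "$values"
}

# Usage (placeholders as in the commands above):
#   backup_values
#   # edit provgw-custom-values.yaml: delete the SEG-1 block and
#   # set auditor-service.enabled to false
#   upgrade_provgw <release-name> <helm chart path> <provgw namespace>
```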
Restoring SLF Segment
After updating the SLF Segment and ensuring that it is ready to accept messages, the user can add the segment back to the previous configuration. To restore the SLF Segment:
- If the updated SLF Segment is the same as the previous one, copy provgw-custom-values.yaml.bkp to provgw-custom-values.yaml:
cp provgw-custom-values.yaml.bkp provgw-custom-values.yaml
If it is a new SLF Segment, add the new segment details to the provgw-custom-values.yaml file.
- After making the necessary changes, execute the following command to perform a Helm upgrade:
helm upgrade <release-name> <helm chart path> --namespace <provgw namespace>
- Verify that all the pods in the Provisioning Gateway namespace are up and running.
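The restore steps above can be sketched in the same way. The function names are hypothetical. all_pods_ready parses the standard `kubectl get pods --no-headers` columns (NAME, READY, STATUS, ...) and treats any pod that is not Running with all containers ready as a failure, so pods of completed Jobs would also be flagged; adapt the check if your namespace contains such pods.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the restore steps.

# Same segment as before: bring back the configuration saved before removal.
restore_values() {
  local values=${1:-provgw-custom-values.yaml}
  cp -- "$values.bkp" "$values"
}

# Succeed only if every listed pod is Running with all containers ready.
# Reads `kubectl get pods --no-headers` output on stdin; an empty listing
# counts as a failure.
all_pods_ready() {
  awk '
    {
      n++
      split($2, ready, "/")          # READY column, e.g. "1/1"
      if ($3 != "Running" || ready[1] != ready[2]) bad++
    }
    END { exit (n == 0 || bad > 0) }'
}

# Usage (placeholders as in the commands above):
#   restore_values
#   helm upgrade <release-name> <helm chart path> --namespace <provgw namespace>
#   kubectl get pods -n <provgw namespace> --no-headers | all_pods_ready && echo "all pods ready"
```

As an alternative to parsing the listing, `kubectl wait --for=condition=Ready pods --all -n <provgw namespace> --timeout=300s` blocks until the pods report Ready or the timeout expires.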