Configure
To deploy this automated failover solution, you need to configure the load balancer, set up alarms and notifications, create a function, and configure OCI API Gateway.
The configuration proceeds in four stages, each detailed in the sections that follow:
- The process begins with preparing the load balancer. This requires adding a secondary listener and a rule set that controls the redirection behavior.
- Next, configure an alarm and notification that trigger an action when all application servers become unhealthy, and again when healthy servers become available.
- You then enable the core automation by deploying a function with OCI Functions. The function programmatically attaches or detaches the load balancer's rule set based on the current alarm state.
- Finally, configure OCI API Gateway to host your custom static maintenance page.
Each of these configurations plays a specific role in enabling seamless, automated failover to a friendly maintenance page.
Configure the Load Balancer
The foundation of this solution lies in the Load Balancer, which already fronts the application and distributes traffic across its backend servers. These steps assume most of the deployment prerequisites are already in place, including an application listener (HTTP or HTTPS), a backend set with health checks configured, and routing through an Internet Gateway so users can reach the service.
Start with a primary listener on the load balancer, configured to handle regular traffic to the application. When everything is operating normally, this listener routes incoming requests to the backend set of VM instances. It listens on standard ports (HTTP/80 or HTTPS/443), and health checks ensure that only healthy VMs receive traffic.
To serve the maintenance page, add a second listener on the load balancer. Unlike the application listener, this one does not forward requests to the application servers. Instead, its backend set points to an OCI API Gateway instance, which hosts the static error page. This separation means that even if all application servers are down, the load balancer can still present a branded, informative maintenance page through the highly available API Gateway. Creating the secondary listener and the API Gateway is optional, because the maintenance page can be hosted anywhere on the internet.
The handoff between these two listeners is managed through a rule set. The rule set is attached to the application listener and defines the conditions under which traffic should be redirected. Under normal circumstances, the listener sends traffic directly to the application servers. However, when the application servers fail their health checks, the rule set comes into play. It tells the load balancer to redirect traffic to the maintenance listener, which in turn serves the custom page hosted in the API Gateway.
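The redirect rule set described above can be sketched as a request body for the Load Balancer CreateRuleSet operation, using the REST field names. This is a minimal sketch under stated assumptions. The maintenance listener is assumed to run on the same load balancer on port 8080 over HTTP. The rule set name maintenance-rs and the helper build_redirect_rule_set are hypothetical, not values from this solution.

```python
import json


def build_redirect_rule_set(
    name: str,
    maintenance_port: int,
    maintenance_protocol: str = "HTTP",
    maintenance_path: str = "/",
    response_code: int = 302,
) -> dict:
    """Request body (REST field names) for a rule set with one REDIRECT rule
    that sends every request path to the maintenance listener.

    Hypothetical helper; port, protocol and path are assumptions.
    """
    return {
        "name": name,
        "items": [
            {
                "action": "REDIRECT",
                # PREFIX_MATCH on "/" matches every request path
                "conditions": [
                    {"attributeName": "PATH", "attributeValue": "/", "operator": "PREFIX_MATCH"}
                ],
                # "{host}" keeps the host the client used, so the redirect stays on
                # the same load balancer; only protocol, port and path change
                "redirectUri": {
                    "protocol": maintenance_protocol,
                    "host": "{host}",
                    "port": maintenance_port,
                    "path": maintenance_path,
                },
                "responseCode": response_code,
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values: rule set "maintenance-rs", maintenance listener on port 8080
    print(json.dumps(build_redirect_rule_set("maintenance-rs", 8080), indent=2))
```

A temporary status code such as 302 is used so that browsers do not cache the redirect after the application recovers.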
The following steps describe how to create the rule set used for redirecting users to the maintenance page.
About Alarms
An Alarm acts as a bridge between detection and action.
OCI Monitoring collects health metrics from the components of your deployment, including the load balancer and the health status of its backend set of VMs. When an alarm condition you configured in OCI Alarms is met (for example, all monitored VMs unhealthy for more than one minute), the alarm immediately triggers a notification. This notification isn't only for human administrators. You can route it through OCI Notifications to invoke a custom function deployed with OCI Functions, and that function changes the load balancer configuration to show the custom error page.
The notification message sent to the function contains dimensions — key-value pairs that describe which resource and which backend set of VMs the metric event belongs to.
In the alarm body of your alarm configuration, enter the following text:
{{dimensions.resourceId}},{{dimensions.backendSetName}},<name of the ruleset>
This table describes the components of this alarm body:
| Element | Description | Purpose |
|---|---|---|
| {{dimensions.resourceId}} | The OCID of the load balancer resource that generated the metric event | The function uses this OCID to identify which load balancer needs the rule set update |
| {{dimensions.backendSetName}} | The name of the backend set that went unhealthy | The function uses it to find the listeners that serve this backend set, and can log which backend set failed; useful in environments with multiple backend sets |
| <name of the ruleset> | A static string: the name of the rule set to attach when all backends are unhealthy | Tells the function which rule set to apply when triggered |
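The three comma-separated elements in the table can be read back out of the alarm message the function receives. In this minimal sketch, the JSON message carries the state transition in "type" and the rendered alarm body in "body", which is how the example function later in this section reads it. SAMPLE_MESSAGE and parse_alarm_message are hypothetical. The OCID and names are placeholders, and a real notification carries more fields than shown.

```python
import json
from typing import Tuple

# Hypothetical, trimmed alarm message: the real message delivered through
# OCI Notifications carries more fields, but this solution reads only these two.
SAMPLE_MESSAGE = json.dumps(
    {
        "type": "OK_TO_FIRING",
        "body": "ocid1.loadbalancer.oc1..exampleuniqueID,app-backend-set,maintenance-rs",
    }
)


def parse_alarm_message(raw: str) -> Tuple[str, str, str, str]:
    """Return (alarm type, load balancer OCID, backend set name, rule set name)."""
    message = json.loads(raw)
    # Alarm body layout: <OCID>,<backend set name>,<rule set name>
    parts = [part.strip() for part in message["body"].split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 comma-separated values in alarm body, got {len(parts)}")
    lb_ocid, backend_set_name, rule_set_name = parts
    return message["type"], lb_ocid, backend_set_name, rule_set_name


if __name__ == "__main__":
    print(parse_alarm_message(SAMPLE_MESSAGE))
```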
This design lets you reuse the same function both to configure a load balancer to display the maintenance page and to route traffic back to the real application once services are restored. The same approach can be applied to any load balancer, or any application behind a load balancer, across your OCI deployment.
The OCI Load Balancer service automatically publishes a metric called UnHealthyBackendServers in the oci_lbaas namespace. It tracks the number of unhealthy backend servers in each backend set.
For the purposes of this solution, the important parts of the alarm definition for this metric are:
- Description
- Dimensions
- Trigger rule
- Message grouping
In this solution, the alarm should trigger when all backend servers (VMs) become unhealthy. That means the unhealthy server count must be greater than or equal to the total number of backend servers in the set.
Here is an example alarm trigger rule query:
UnHealthyBackendServers[1m]{lbName = <name of lb>, backendSetName = <name of the backend set>}.max() >= 1
The query translates to the following:
- If the maximum number of unhealthy backends is greater than or equal to a specified value (in this example, 1, which matches a backend set with a single server; use the total number of backends in your set),
- Evaluated over a 1-minute interval ([1m]),
- Then the alarm transitions to the FIRING state.
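Because the threshold must equal the number of backends in the set, it can help to generate the trigger rule rather than edit it by hand. This is a minimal sketch. The helper build_trigger_query and the names app-lb and app-backend-set are hypothetical. Quoting the dimension values in the MQL dimension filter is an assumption of this sketch.

```python
def build_trigger_query(lb_name: str, backend_set_name: str, total_backends: int) -> str:
    """MQL trigger rule that fires only when every backend in the set is unhealthy.

    The threshold is the total number of backends, so a set with three servers
    fires at 3 unhealthy, not at 1. Hypothetical helper.
    """
    if total_backends < 1:
        raise ValueError("a backend set needs at least one backend server")
    return (
        f'UnHealthyBackendServers[1m]{{lbName = "{lb_name}", '
        f'backendSetName = "{backend_set_name}"}}.max() >= {total_backends}'
    )


if __name__ == "__main__":
    # Hypothetical names: load balancer "app-lb", backend set "app-backend-set", 3 VMs
    print(build_trigger_query("app-lb", "app-backend-set", 3))
```

Because the threshold is a fixed number, the rule has to be regenerated whenever backend servers are added to or removed from the set.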
The placeholders in the alarm body ({{dimensions.resourceId}} and {{dimensions.backendSetName}}) are populated dynamically only when Split notifications is enabled under Message grouping. Split notifications makes OCI send one notification per metric stream, that is, per unique combination of dimension values, instead of grouping everything together. Because of this, the alarm notification that reaches your custom function contains the exact load balancer OCID and the exact backend set name where the failure occurred. As a result, the same function is fully reusable across multiple load balancers, backend sets, or environments, without hardcoding load balancer details.
This configuration is what makes the entire automation chain work. The alarm publishes dynamic context, and the function reads it and attaches the rule set to the exact listener that serves the application to end users.
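The effect of split notifications on the alarm body can be modeled locally. OCI Alarms performs the real substitution. This sketch only imitates it to show why each message identifies exactly one load balancer and backend set. The rule set name maintenance-rs, the sample OCIDs, and the helpers render_body and split_notification_bodies are hypothetical.

```python
import re
from typing import Dict, List

# Alarm body from this solution; "maintenance-rs" is a hypothetical rule set name.
ALARM_BODY_TEMPLATE = "{{dimensions.resourceId}},{{dimensions.backendSetName}},maintenance-rs"
PLACEHOLDER = re.compile(r"\{\{dimensions\.(\w+)\}\}")


def render_body(template: str, dimensions: Dict[str, str]) -> str:
    """Replace each {{dimensions.<name>}} placeholder with one stream's value."""
    return PLACEHOLDER.sub(lambda match: dimensions[match.group(1)], template)


def split_notification_bodies(template: str, streams: List[Dict[str, str]]) -> List[str]:
    """Local model of split notifications: one rendered body per metric stream."""
    return [render_body(template, dims) for dims in streams]


if __name__ == "__main__":
    # Two hypothetical firing streams on different load balancers
    streams = [
        {"resourceId": "ocid1.loadbalancer.oc1..aaa", "backendSetName": "bs-a"},
        {"resourceId": "ocid1.loadbalancer.oc1..bbb", "backendSetName": "bs-b"},
    ]
    for body in split_notification_bodies(ALARM_BODY_TEMPLATE, streams):
        print(body)
```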
Configure Alarms and Notifications
Perform the following steps to configure the alarm and notification for this solution.
Create a Function
At the heart of the automation is a function that is triggered whenever the alarm changes state: when all application backends become unhealthy, and again when they recover.
The function’s role is simple yet powerful: it dynamically updates the Load Balancer configuration by attaching or detaching the rule set that handles traffic redirection.
The Python code inside the function follows three logical steps:
- Authentication with OCI: The function begins by establishing a secure session with OCI using the Resource Principal (this is how functions in OCI are allowed to call other OCI services without manually managing keys). This ensures that the code can safely interact with the Load Balancer service. For more information on the authentication, refer to the links in Explore More.
- API call to modify the load balancer listener: Once authenticated, the code makes a call to the load balancer API.
  - If the backends are failing, the function attaches the redirect rule set to the application listener, redirecting users to the custom error page.
  - If the backends recover, the function detaches the rule set, restoring normal traffic flow to the application servers.
- Logging and Validation: The code also includes simple logging so administrators can track what action was taken: for example, “Attached Maintenance-Page rule set to listener-1”. This becomes extremely useful during troubleshooting or audits.
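The attach/detach decision in these steps can be isolated into a pure helper, so that it can be unit-tested without calling OCI. This is a sketch. next_rule_sets is a hypothetical name, and it mirrors the logic of the example function that follows: the maintenance rule set is added to the front of the listener's rule sets on OK_TO_FIRING and removed on FIRING_TO_OK.

```python
from typing import List, Optional


def next_rule_sets(
    alarm_type: str, current: Optional[List[str]], maintenance: str
) -> Optional[List[str]]:
    """Return the listener's new rule set list, or None when no update is needed."""
    rule_sets = list(current or [])  # a listener may have no rule sets (None)
    if alarm_type == "OK_TO_FIRING" and maintenance not in rule_sets:
        return [maintenance] + rule_sets  # attach: enter maintenance mode
    if alarm_type == "FIRING_TO_OK" and maintenance in rule_sets:
        return [rs for rs in rule_sets if rs != maintenance]  # detach: back to operation
    # Already in the desired state, or any other message type: leave listener alone
    return None


if __name__ == "__main__":
    print(next_rule_sets("OK_TO_FIRING", ["headers-rs"], "maintenance-rs"))
```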
Use the following example Python code to create your function in OCI Functions, modifying it as needed.
Function.py
```python
import io
import json
import logging

import oci
from fdk import response

logger = logging.getLogger()


def update_rule_sets(composite_ops, lb_ocid, listener_name, listener, rule_sets):
    """Set one listener's rule sets and wait for the work request to succeed.

    The listener's other current settings are copied so that the update
    changes only the rule set list.
    """
    details = oci.load_balancer.models.UpdateListenerDetails(
        default_backend_set_name=listener.default_backend_set_name,
        port=listener.port,
        protocol=listener.protocol,
        hostname_names=listener.hostname_names,
        path_route_set_name=listener.path_route_set_name,
        ssl_configuration=listener.ssl_configuration,  # model object, not a dict
        connection_configuration=listener.connection_configuration,
        routing_policy_name=listener.routing_policy_name,
        rule_set_names=rule_sets,
    )
    composite_ops.update_listener_and_wait_for_state(
        details,
        lb_ocid,
        listener_name,
        wait_for_states=[oci.load_balancer.models.WorkRequest.LIFECYCLE_STATE_SUCCEEDED],
    )


def handler(ctx, data: io.BytesIO = None):
    message = "start of function"
    logger.info("Function start")
    try:
        payload_bytes = data.getvalue()
        if payload_bytes == b"":
            raise KeyError("No keys in payload")
        alarm = json.loads(payload_bytes)
        alarm_type = alarm["type"]  # only OK_TO_FIRING and FIRING_TO_OK are acted on
        # Alarm body: <load balancer OCID>,<backend set name>,<rule set name>
        lb_ocid, backend_set_name, maintenance = [
            part.strip() for part in alarm["body"].split(",")[:3]
        ]

        # Resource Principal authentication: no API keys stored in the function
        signer = oci.auth.signers.get_resource_principals_signer()
        lb_client = oci.load_balancer.LoadBalancerClient(config={}, signer=signer)
        composite_ops = oci.load_balancer.LoadBalancerClientCompositeOperations(lb_client)

        # listeners is a dict of listener name -> Listener model
        listeners = lb_client.get_load_balancer(lb_ocid).data.listeners
        for name, listener in listeners.items():
            # Only change listeners that serve the backend set named in the alarm
            if listener.default_backend_set_name != backend_set_name:
                continue
            rule_sets = list(listener.rule_set_names or [])  # may be None
            if alarm_type == "OK_TO_FIRING":
                if maintenance in rule_sets:
                    message = "Already in maintenance mode"
                else:
                    rule_sets.insert(0, maintenance)  # attach the redirect rule set
                    update_rule_sets(composite_ops, lb_ocid, name, listener, rule_sets)
                    message = "Entering maintenance mode"
            elif alarm_type == "FIRING_TO_OK":
                if maintenance in rule_sets:
                    rule_sets.remove(maintenance)  # detach the redirect rule set
                    update_rule_sets(composite_ops, lb_ocid, name, listener, rule_sets)
                    message = "Entering operation mode"
                else:
                    message = "Already in operation mode"
            logger.info("%s (rule set %s, listener %s)", message, maintenance, name)
    except Exception as ex:
        message = "Error: " + str(ex)
        logger.error(message)
    return response.Response(
        ctx,
        response_data=json.dumps({"message": message}),
        headers={"Content-Type": "application/json"},
    )
```
Configure OCI API Gateway
In this solution, the OCI API Gateway is configured to directly serve a static web page.
Note:
The use of OCI API Gateway is optional: you could also host your maintenance or error page outside of OCI.
Unlike the typical use of OCI API Gateway, where requests are routed to dynamic backends such as functions or compute instances, this approach uses OCI API Gateway's ability to return a static response. This static page acts as a friendly maintenance message, informing users that the service is temporarily unavailable because of scheduled maintenance or other issues. The static page is fully managed by OCI API Gateway, removing the need for additional infrastructure such as web servers or object storage.
When the system detects that all backend servers are unhealthy, the function triggered by the alarm configures the load balancer to redirect traffic to the secondary listener that fronts the OCI API Gateway instance. Users then see a friendly maintenance page instead of a default error page.
This example focuses only on the steps required to configure a static response in OCI API Gateway. For more information, review the resources in Explore More.
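The static response can be expressed as an API deployment specification whose single route uses a stock response backend. This is a minimal sketch. It assumes the STOCK_RESPONSE_BACKEND type with status, body, and headers fields, a GET route on "/", and a 503 status so that clients and crawlers treat the outage as temporary. MAINTENANCE_HTML and stock_response_spec are hypothetical, and you would substitute your own page.

```python
import json

# Hypothetical maintenance page; replace with your own branded HTML.
MAINTENANCE_HTML = """<!DOCTYPE html>
<html><head><title>Scheduled maintenance</title></head>
<body><h1>We'll be back soon</h1>
<p>The service is temporarily unavailable. Please try again later.</p></body></html>"""


def stock_response_spec(html: str, path: str = "/", status: int = 503) -> dict:
    """API deployment specification with one route served by a stock response."""
    return {
        "routes": [
            {
                "path": path,
                "methods": ["GET"],
                "backend": {
                    "type": "STOCK_RESPONSE_BACKEND",
                    "status": status,
                    "body": html,
                    # Tell browsers to render the body as HTML
                    "headers": [{"name": "Content-Type", "value": "text/html"}],
                },
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(stock_response_spec(MAINTENANCE_HTML), indent=2))
```

If your pages or monitoring expect a successful status code, pass status=200 instead. The body is served the same way in either case.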