Configure

To deploy this automated failover solution, you need to configure the load balancer, set up alarms and notifications, create a function, and configure OCI API Gateway.

The process involves the following steps, each detailed in the sections below:

  1. The process begins with preparing the Load Balancer, which requires setting secondary listeners and a specific rule set to control the redirection behavior.
  2. Following this, configure an alarm and notification to trigger action when the application servers are all in an unhealthy status, and when healthy servers become available.
  3. You then enable the core automation by deploying a function with OCI Functions, which programmatically attaches or detaches the load balancer's rule set based on the current alarm state.
  4. Finally, configure OCI API Gateway to host your custom static maintenance page.

Each of these configurations plays a specific and integrated role in enabling the seamless, automated failover to a friendly maintenance page.

Configure the Load Balancer

The foundation of this solution lies in the Load Balancer, which already fronts the application and distributes traffic across its backend servers. These steps assume most of the deployment prerequisites are already in place, including an application listener (HTTP or HTTPS), a backend set with health checks configured, and routing through an Internet Gateway so users can reach the service.

Start with a primary listener on the load balancer, configured to handle regular traffic to the application. When everything is operating normally, this listener routes incoming requests to the backend set of VM instances. It listens on standard ports (HTTP/80 or HTTPS/443), and health checks ensure that only healthy VMs receive traffic.

To serve the maintenance page, add a second listener on the load balancer. Unlike the application listener, this one does not forward requests to the application servers. Instead, its backend set points to an OCI API Gateway instance, which is responsible for hosting the static error page. This separation ensures that even if all application servers are down, the load balancer can still present a branded and informative maintenance page through the highly available API Gateway. The secondary listener and API Gateway steps are optional: the maintenance page can be hosted anywhere on the internet.

The handoff between these two listeners is managed through a rule set. The rule set is attached to the application listener and defines the conditions under which traffic should be redirected. Under normal circumstances, the listener sends traffic directly to the application servers. However, when the application servers fail their health checks, the rule set comes into play. It tells the load balancer to redirect traffic to the maintenance listener, which in turn serves the custom page hosted in the API Gateway.

The following steps describe how to create the rule set used for redirecting users to the maintenance page; a Python SDK sketch of the equivalent call follows the steps.

  1. In the OCI Console, select Networking, then Load Balancers, and select your load balancer.
  2. Select Rule Sets and then Create Rule Set. Use the following values:
    • Name: <a name for your rule set>
    • URL redirect rules:
      • Condition: select Path and PREFIX_MATCH, and set the value to /. This matches all requests that reach the load balancer.
      • Action: select Redirect and configure the redirect target:
        • Protocol: HTTPS (or HTTP)
        • Host: the URL of your redirection target
        • Path: /
        • Response code: 307 - Temporary Redirect
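
If you prefer to script this step instead of using the console, the following minimal sketch shows how an equivalent rule set could be created with the OCI Python SDK. The load balancer OCID, rule set name, and redirect host are placeholders, and the model and parameter names should be verified against your SDK version.

import oci

# Minimal sketch: create the redirect rule set with the OCI Python SDK.
# The OCID, rule set name, and host below are placeholder values.
config = oci.config.from_file()
lb_client = oci.load_balancer.LoadBalancerClient(config)

redirect_rule = oci.load_balancer.models.RedirectRule(
    conditions=[
        # Match every request path that reaches the application listener.
        oci.load_balancer.models.PathMatchCondition(
            attribute_value="/",
            operator="PREFIX_MATCH",
        )
    ],
    redirect_uri=oci.load_balancer.models.RedirectUri(
        protocol="HTTPS",
        host="maintenance.example.com",  # your redirection target
        port=443,
        path="/",
    ),
    response_code=307,  # temporary redirect
)

lb_client.create_rule_set(
    load_balancer_id="ocid1.loadbalancer.oc1..example",
    create_rule_set_details=oci.load_balancer.models.CreateRuleSetDetails(
        name="maintenance-redirect",
        items=[redirect_rule],
    ),
)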

About Alarms

An Alarm acts as a bridge between detection and action.

OCI Monitoring tracks the health metrics of the components in your deployment, including the load balancer and the status of its backend set of VMs. When an alarm condition you configured in OCI Alarms is met (for example, all monitored VMs unhealthy for more than one minute), the alarm immediately triggers a notification. This notification isn’t just for human administrators. You can route it through OCI Notifications to invoke a custom function deployed with OCI Functions. That function changes the load balancer configuration to show the custom error page.

The notification message sent to the function contains dimensions — key-value pairs that describe which resource and which backend set of VMs the metric event belongs to.

In the body of your alarm configuration, you will include the following code:

{{dimensions.resourceId}},{{dimensions.backendSetName}},<name of the ruleset>

The following list describes the components of this alarm body:

  • {{dimensions.resourceId}}: The OCID of the load balancer resource that generated the metric event. The function uses this OCID to identify which load balancer needs the rule set update.
  • {{dimensions.backendSetName}}: The name of the backend set that went unhealthy. The function can validate or log which backend set failed, which is useful for dynamic environments with multiple backend sets.
  • <name of the ruleset>: A static string containing the name of the rule set to attach when all backends are unhealthy. It tells the function which rule set to apply when triggered.

This design allows you to reuse the same function to handle tasks like configuring a load balancer to display the server maintenance page, and routing traffic back to the real application once services are restored. This approach can also be applied to manage all load balancers or applications on load balancers across your OCI deployment.

The OCI Load Balancer service automatically publishes a metric called UnHealthyBackendServers in the oci_lbaas namespace. It tracks the number of unhealthy backends in each backend set.

For the purposes of this solution, the important items in this metric are:

  • Description
  • Dimensions
  • Trigger rule
  • Message grouping

In this solution, the alarm should trigger when all backend servers (VMs) become unhealthy. That means the unhealthy server count should be greater than or equal to the total number of backend servers in the set.

Here is an example Alarm Trigger Rule query:

UnHealthyBackendServers[1m]{lbName = <name of lb>, backendSetName = <name of the backend set>}.max() >= 1

The query translates to the following:

  • If the maximum number of unhealthy backends is greater than or equal to a specified value (in this example, 1)
  • for a defined period of one minute,
  • then the alarm transitions to the FIRING state.

However, this dynamic population of values only works when Split Notification is enabled under message grouping. Split notification forces OCI to send one notification per dimension value, instead of grouping everything together. Because of this, the alarm notification that reaches your custom function contains the exact load balancer OCID and the exact backend set name where the failure occurred. As a result, the same function becomes fully reusable across multiple load balancers, backend sets, or environments, without hardcoding load balancer details.
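
For reference, the notification that reaches the function is a JSON message. The sketch below shows an assumed minimal shape, limited to the two fields the function in this solution reads (type and body), with placeholder values.

# Assumed minimal shape of a split, formatted alarm notification as consumed by the
# function in this solution (only the fields the function reads are shown).
example_message = {
    "type": "OK_TO_FIRING",  # or "FIRING_TO_OK" when the backends recover
    "body": "ocid1.loadbalancer.oc1..example,my-backend-set,maintenance-redirect",
}

# The function splits the body to recover the dynamic context.
lb_ocid, backend_set_name, rule_set_name = example_message["body"].split(",")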

This configuration allows the entire automation chain to work: the alarm publishes dynamic context, and the function reads it and performs the correct rule-set attachment on the exact listener that is serving the application to the end user.

Configure Alarms and Notifications

Perform the following steps to configure the alarm and notification for this solution. A Python SDK sketch equivalent to these steps follows the procedure.

  1. In the OCI Console, navigate to: Observability & Management, select Monitoring, and select Alarms Status.
  2. Select Create Alarm. In the Alarm Name field, create a name for your alarm.
  3. Enter values for the Metric:
    • Compartment: <Select the one where your Load Balancer exists>
    • Metric Namespace: oci_lbaas
    • Metric Name: <Select UnHealthyBackendServers>
    • Interval: <Frequency of the polling interval>
    • Statistic: Max
    • Metric Dimensions:
      • Dimension name: <select the load balancer name>
      • Dimension value: <select the name of the backend set>
  4. Create a Trigger Rule with the following values:
    • Operator: greater than or equal to (≥)
    • Value: <Total number of backend servers in the backend set>
    • Trigger delay minutes: <time delay, in minutes, before the alarm triggers>
  5. Set the Severity to the desired severity for the alert.
  6. Set the Alarm Body: {{dimensions.resourceId}},{{dimensions.backendSetName}},<ruleset name>
  7. Define the alarm notification with the following values:
    • Destination service: Notifications
    • Compartment: select your compartment containing the services
    • Topic: <name of the topic for notification>
    • Message grouping: Split notifications per metric stream
    • Message format: Send formatted messages
After creating your new alarm, enable it in the console.
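
If you prefer to create the alarm programmatically, a minimal sketch with the OCI Python SDK might look like the following. The OCIDs, names, and threshold are placeholders, and the field names should be verified against your SDK version; in particular, is_notifications_per_metric_dimension_enabled is assumed to correspond to the Split notifications per metric stream option, and the message format option is omitted.

import oci

# Minimal sketch: create the alarm with the OCI Python SDK (placeholder OCIDs and names).
config = oci.config.from_file()
monitoring_client = oci.monitoring.MonitoringClient(config)

alarm_details = oci.monitoring.models.CreateAlarmDetails(
    display_name="all-backends-unhealthy",
    compartment_id="ocid1.compartment.oc1..example",
    metric_compartment_id="ocid1.compartment.oc1..example",
    namespace="oci_lbaas",
    # Fire when every backend in the set is unhealthy (two backends in this example).
    query='UnHealthyBackendServers[1m]{lbName = "my-lb", backendSetName = "my-backend-set"}.max() >= 2',
    severity="CRITICAL",
    # The body the function parses: load balancer OCID, backend set name, rule set name.
    body="{{dimensions.resourceId}},{{dimensions.backendSetName}},maintenance-redirect",
    pending_duration="PT1M",  # trigger delay of one minute
    destinations=["ocid1.onstopic.oc1..example"],
    # Assumed to map to "Split notifications per metric stream" in the console.
    is_notifications_per_metric_dimension_enabled=True,
    is_enabled=True,
)

monitoring_client.create_alarm(alarm_details)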

Create a Function

At the heart of the automation lies a function, which is triggered whenever the alarm detects that all application backends are unhealthy.

The function’s role is simple yet powerful: it dynamically updates the Load Balancer configuration by attaching or detaching the rule set that handles traffic redirection.

The Python code inside the function follows three logical steps:

  • Authentication with OCI: The function begins by establishing a secure session with OCI using the Resource Principal (this is how functions in OCI are allowed to call other OCI services without manually managing keys). This ensures that the code can safely interact with the Load Balancer service. For more information on the authentication, refer to the links in Explore More.
  • API call to modify the load balancer listener: Once authenticated, the code makes a call to the load balancer API.
    • If the backends are failing, the function attaches the redirect rule set to the application listener, redirecting users to the custom error page.
    • If the backends recover, the function detaches the rule set, restoring normal traffic flow to the application servers.
  • Logging and Validation: The code also includes simple logging so administrators can track what action was taken: for example, “Attached Maintenance-Page rule set to listener-1”. This becomes extremely useful during troubleshooting or audits.

Use the following example Python code to create your function in Oracle Functions, modifying it as needed.

Function.py

import io
import json
import os
import oci
from fdk import response
import logging

def handler(ctx, data: io.BytesIO=None):
    message = "start of function"
    logging.getLogger().info("HTTP function start")
    try:
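        # Parse the alarm notification payload: the alarm state ("type") and the
        # comma-separated alarm body (load balancer OCID, backend set, rule set name).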
        payload_bytes = data.getvalue()
        if payload_bytes == b'':
            raise KeyError('No keys in payload')
        body1 = json.loads(payload_bytes)
        type1 = body1["type"]
        query = body1["body"]
        load_balancer_ocid = query.split(",")[0]
        maintenance = query.split(",")[2]
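        # Authenticate with the Resource Principal so the function can call the
        # Load Balancer service without managing API keys.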
        signer = oci.auth.signers.get_resource_principals_signer()
        load_balancer_client = oci.load_balancer.LoadBalancerClient(config={}, signer=signer)
        load_balancer_client_composite_ops = oci.load_balancer.LoadBalancerClientCompositeOperations(load_balancer_client)
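        # Retrieve the load balancer configuration and walk through its listeners
        # to find the one that fronts the backend set named in the alarm body.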
        load_balancer_data = json.loads(str(load_balancer_client.get_load_balancer(load_balancer_ocid).data))
        lb_config = load_balancer_data['listeners']
        list1 = json.dumps(lb_config)
        for key,value in json.loads(list1).items():
            if value['default_backend_set_name'] == query.split(",")[1]:
                f_list = key
                rulesets = value['rule_set_names']
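                # Alarm fired: attach the maintenance rule set to this listener
                # if it is not already attached.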
                if type1=="OK_TO_FIRING":
                    message = "FIRE"
                    if maintenance in rulesets:
                        message = "Already in Maintenance Mode"
                        logging.getLogger().info("Already in Manintenance mode")
                    else:
                        rulesets.insert(0, maintenance)
                        message = "Entering Maintenance Mode"
                        logging.getLogger().info("Entering Main mode")
                        load_balancer_client_composite_ops.update_listener_and_wait_for_state(
                            oci.load_balancer.models.UpdateListenerDetails(
                                default_backend_set_name=value["default_backend_set_name"],
                                rule_set_names=rulesets,
                                port=value["port"],
                                protocol=value["protocol"],
                                ssl_configuration=value["ssl_configuration"]
                            ),
                            load_balancer_ocid,
                            key,
                            wait_for_states=[oci.load_balancer.models.WorkRequest.LIFECYCLE_STATE_SUCCEEDED]
                        )
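                # Alarm cleared: detach the maintenance rule set to restore
                # normal traffic flow to the application servers.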
                elif type1=="FIRING_TO_OK":
                    message = "OK"
                    if maintenance in rulesets:
                        message = "Entering Operation Mode"
                        logging.getLogger().info("Entering Operation Mode")
                        rulesets.remove(maintenance)
                        load_balancer_client_composite_ops.update_listener_and_wait_for_state(
                            oci.load_balancer.models.UpdateListenerDetails(
                                default_backend_set_name=value["default_backend_set_name"],
                                rule_set_names=rulesets,
                                port=value["port"],
                                protocol=value["protocol"],
                                ssl_configuration=value["ssl_configuration"]
                            ),
                            load_balancer_ocid,
                            key,
                            wait_for_states=[oci.load_balancer.models.WorkRequest.LIFECYCLE_STATE_SUCCEEDED]
                        )   

                    else:
                        message = "Already in operation Mode"
                        logging.getLogger().info("Already in Operation mode")


    except Exception as ex:
        message = "Error:" + str(ex)

    return message

Configure OCI API Gateway

In this solution, the OCI API Gateway is configured to directly serve a static web page.

Note:

The use of OCI API Gateway is optional: you could also host your maintenance/error page outside of OCI.

Unlike the typical use of OCI API Gateway where requests are routed to dynamic backends such as functions or compute instances, this approach leverages OCI API Gateway’s ability to host a static response. This static page acts as a friendly maintenance message, informing users that the service is temporarily unavailable due to scheduled maintenance or other issues. The static page is fully managed by OCI API Gateway, removing the need for additional infrastructure like web servers or object storage.

When the system detects that all backend servers are unhealthy, the function triggered by the alarm responds by configuring the load balancer to redirect traffic to the secondary listener fronting the OCI API Gateway instance, ensuring a seamless and user-friendly experience without exposing default error pages.

This example focuses only on the steps required to configure a static response using OCI API Gateway; a Python SDK sketch of the equivalent deployment follows the steps. For more information, review the resources in Explore More.

  1. In the OCI console, navigate to the gateway, open it, select Deployments, and then select Create Deployment.
  2. Choose Create a new API.
  3. Configure Basic Information:
    • Name: webpage
    • Path Prefix: /
    • Compartment: use the same compartment as the gateway

    Leave the remaining options at their default values.

  4. Configure Authentication.
    You can use the default configuration.
  5. Configure Routes:
    • Path: /{req*} (a wildcard match)
    • Methods: GET
    • Click Edit to add a single backend.
    • Backend Type: Stock response
    • Status code: 200
    • Body: <HTML content of the maintenance page>
    • Header name: content-type
    • Header value: text/html
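
If you prefer to script this deployment, the following minimal sketch shows how an equivalent stock-response deployment could be created with the OCI Python SDK. The gateway and compartment OCIDs and the HTML body are placeholders, and the model names should be verified against your SDK version.

import oci

# Minimal sketch: publish the static maintenance page as a stock response
# through OCI API Gateway with the Python SDK (placeholder OCIDs and HTML).
config = oci.config.from_file()
deployment_client = oci.apigateway.DeploymentClient(config)

maintenance_html = "<html><body><h1>We will be right back</h1></body></html>"

route = oci.apigateway.models.ApiSpecificationRoute(
    path="/{req*}",  # wildcard match, as in the console steps
    methods=["GET"],
    backend=oci.apigateway.models.StockResponseBackend(
        status=200,
        body=maintenance_html,
        headers=[
            oci.apigateway.models.HeaderFieldSpecification(
                name="content-type", value="text/html"
            )
        ],
    ),
)

deployment_client.create_deployment(
    oci.apigateway.models.CreateDeploymentDetails(
        display_name="webpage",
        gateway_id="ocid1.apigateway.oc1..example",
        compartment_id="ocid1.compartment.oc1..example",
        path_prefix="/",
        specification=oci.apigateway.models.ApiSpecification(routes=[route]),
    )
)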