6 Adding or Removing Nodes to an Existing Cluster

This chapter provides instructions for adding nodes to, and removing nodes from, an existing cluster.

Adding a New Control Plane Node to a Cluster

To add a new control plane node to a cluster, do the following:
  1. Prepare the new hosts, as described in Setting Up the Network and Enabling Access to the Oracle Linux Automation Manager Packages.
  2. Configure the host by following the instructions in Setting up Hosts. Do not run the awx-manage migrate or awx-manage createsuperuser commands; they are needed only when you first create the cluster.
  3. Set up the service mesh for the new control plane node by following the instructions in Configuring and Starting the Control Plane Service Mesh.
  4. Set up the service mesh for the execution plane nodes you want to connect to your new control plane node by following the instructions in Configuring and Starting the Execution Plane Service Mesh.
  5. Set up the hop nodes you want to connect to your new control plane node by following the instructions in Configuring and Starting the Hop Nodes.
  6. Provision the node as the control node type, register it to an appropriate instance group (called a queuename in the command), and establish the peer relationships among the execution, hop, and control nodes, as described in Configuring the Control, Execution, and Hop Nodes.
  7. Start the control plane node as described in Starting the Control, Execution, and Hop Nodes. Do not run the command to create preloaded data.
  8. If required, apply TLS verification and signed work requests as described in Configuring TLS Verification and Signed Work Requests.
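
Step 6 can be scripted. The following is a minimal sketch, not part of the documented procedure. It assumes that you run it as the awx user on an existing control plane node, that the awx-manage provision_instance, register_queue, and register_peers subcommands from the upstream AWX project are available, and that the control instance group is named controlplane, as in the awx-manage list_instances output shown later in this chapter. The run and add_control_node function names, the DRY_RUN variable, and the example host names are hypothetical. Set DRY_RUN=1 to print the commands for review instead of running them.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper, not the documented procedure):
# register a new control plane node with the cluster.
# Assumptions: run as the awx user on an existing control plane node;
# AWX awx-manage subcommands provision_instance, register_queue, and
# register_peers are available; "controlplane" is the control instance group.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: add_control_node NEW_CONTROL_NODE [EXEC_OR_HOP_NODE...]
add_control_node() {
  if [ "$#" -lt 1 ]; then
    echo "usage: add_control_node NEW_CONTROL_NODE [EXEC_OR_HOP_NODE...]" >&2
    return 2
  fi
  local new_node="$1" peer
  shift
  run awx-manage provision_instance --hostname="$new_node" --node_type=control || return 1
  run awx-manage register_queue --queuename=controlplane --hostnames="$new_node" || return 1
  # Assumption: each listed execution or hop node opens the connection to the
  # new control node, so it is the source (first argument) of register_peers.
  for peer in "$@"; do
    run awx-manage register_peers "$peer" --peers "$new_node" || return 1
  done
}
```

For example, to print the commands for a hypothetical new control node that an existing execution node connects to:

    DRY_RUN=1 add_control_node control2.example.com exec1.example.com

Check the peer direction against Configuring the Control, Execution, and Hop Nodes before running the commands without DRY_RUN.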

Adding a New Execution Plane Node to a Cluster

To add a new execution node to a cluster, do the following:
  1. Prepare the new hosts, as described in Setting Up the Network and Enabling Access to the Oracle Linux Automation Manager Packages.
  2. Configure the host by following the instructions in Setting up Hosts. Do not run the awx-manage migrate or awx-manage createsuperuser commands; they are needed only when you first create the cluster.
  3. Set up the service mesh for the new execution plane node by following the instructions in Configuring and Starting the Execution Plane Service Mesh.
  4. Provision the node as the execution node type, register it to an appropriate instance group (called a queuename in the command), and establish the peer relationships between the execution node and the control plane nodes or hop nodes it connects to, as described in Configuring the Control, Execution, and Hop Nodes.
  5. Start the execution plane node as described in Starting the Control, Execution, and Hop Nodes. Do not run the command to create preloaded data.
  6. If required, apply TLS verification and signed work requests as described in Configuring TLS Verification and Signed Work Requests.
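
Step 4 can be scripted in the same way. The following is a minimal sketch, not part of the documented procedure. It assumes that you run it as the awx user on a control plane node, that the AWX awx-manage subcommands provision_instance, register_queue, and register_peers are available, and that the execution instance group is named execution, as in the awx-manage list_instances output shown later in this chapter. The run and add_execution_node function names, the DRY_RUN variable, and the example host names are hypothetical.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper, not the documented procedure):
# register a new execution node and peer it with control or hop nodes.
# Assumptions: run as the awx user on a control plane node; AWX awx-manage
# subcommands are available; "execution" is the execution instance group.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: add_execution_node NEW_EXEC_NODE PEER_NODE [PEER_NODE...]
# Each PEER_NODE is a control plane or hop node the new node connects to.
add_execution_node() {
  if [ "$#" -lt 2 ]; then
    echo "usage: add_execution_node NEW_EXEC_NODE PEER_NODE [PEER_NODE...]" >&2
    return 2
  fi
  local new_node="$1"
  shift
  run awx-manage provision_instance --hostname="$new_node" --node_type=execution || return 1
  run awx-manage register_queue --queuename=execution --hostnames="$new_node" || return 1
  # Assumption: the new execution node is the source that opens connections
  # to the listed peers; --peers accepts several host names.
  run awx-manage register_peers "$new_node" --peers "$@"
}
```

For example, to print the commands for a hypothetical execution node that connects to two existing nodes:

    DRY_RUN=1 add_execution_node exec2.example.com control1.example.com hop1.example.com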

Adding a New Hop Node to a Cluster

To add a new hop node to a cluster, do the following:
  1. Prepare the new hosts, as described in Setting Up the Network and Enabling Access to the Oracle Linux Automation Manager Packages.
  2. Configure the host by following the instructions in Setting up Hosts. Do not run the awx-manage migrate or awx-manage createsuperuser commands; they are needed only when you first create the cluster.
  3. Set up the new hop node you want to connect to your control plane nodes by following the instructions in Configuring and Starting the Hop Nodes.
  4. Set up the execution nodes you want to connect to your new hop node by following the instructions in Configuring and Starting the Execution Plane Service Mesh.
  5. Provision the node as the hop node type, register any new execution nodes to the execution instance group (called a queuename in the command), and establish the peer relationships among the execution, hop, and control nodes, as described in Configuring the Control, Execution, and Hop Nodes.
  6. Start the hop node and execution nodes as described in Starting the Control, Execution, and Hop Nodes. Do not run the command to create preloaded data.
  7. If required, apply TLS verification and signed work requests as described in Configuring TLS Verification and Signed Work Requests.
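
Step 5 can also be scripted. The following is a minimal sketch, not part of the documented procedure. It assumes that you run it as the awx user on a control plane node, that the AWX awx-manage subcommands provision_instance, register_queue, and register_peers are available, that the hop node opens connections to the control plane nodes, and that each new execution node opens a connection to the hop node. The run and add_hop_node function names, the -- separator convention, the DRY_RUN variable, and the example host names are hypothetical.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper, not the documented procedure):
# register a new hop node and any new execution nodes placed behind it.
# Assumptions: run as the awx user on a control plane node; the hop node
# connects out to control nodes; execution nodes connect out to the hop node.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: add_hop_node HOP_NODE CONTROL_NODE... [-- EXEC_NODE...]
add_hop_node() {
  if [ "$#" -lt 2 ]; then
    echo "usage: add_hop_node HOP_NODE CONTROL_NODE... [-- EXEC_NODE...]" >&2
    return 2
  fi
  local hop="$1" node seen_sep=0
  shift
  run awx-manage provision_instance --hostname="$hop" --node_type=hop || return 1
  for node in "$@"; do
    if [ "$node" = "--" ]; then
      seen_sep=1
    elif [ "$seen_sep" = 0 ]; then
      # Control plane node: the hop node is the source of the connection.
      run awx-manage register_peers "$hop" --peers "$node" || return 1
    else
      # New execution node: provision it, add it to the execution queue,
      # and make it the source of a connection to the hop node.
      run awx-manage provision_instance --hostname="$node" --node_type=execution || return 1
      run awx-manage register_queue --queuename=execution --hostnames="$node" || return 1
      run awx-manage register_peers "$node" --peers "$hop" || return 1
    fi
  done
}
```

Because register_peers adds links rather than replacing them, the sketch registers one control plane peer per call. For example:

    DRY_RUN=1 add_hop_node hop1.example.com control1.example.com -- exec3.example.com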

Removing a Node from a Cluster

To remove a node from a cluster, do the following:
  1. Log on to the node you want to remove.
  2. Stop Oracle Linux Automation Manager on the node.
    sudo systemctl stop ol-automation-manager.service
  3. Stop the service mesh.
    sudo systemctl stop receptor-awx
  4. Delete the /etc/tower/SECRET_KEY file.
  5. Open the /etc/tower/settings.py file and remove the database password from the DATABASES setting. If you provide the database password by some other means, remove that configuration instead.
  6. From any control plane node, verify that the node you want to remove reports zero capacity and no heartbeat information. For example, the following output shows that the node with IP address 192.0.124.44 has zero capacity and no heartbeat information.
    sudo su -l awx -s /bin/bash
    awx-manage list_instances
    [controlplane capacity=126]
            192.0.119.192 capacity=126 node_type=control version=19.5.1 heartbeat="2022-10-20 06:55:44"
            192.0.124.44 capacity=0 node_type=control version=19.5.1
    
    [execution capacity=126]
            192.0.114.137 capacity=126 node_type=execution version=19.5.1 heartbeat="2022-10-20 06:56:20"
  7. Deprovision the instance from the cluster.
    awx-manage deprovision_instance --hostname=<IP address or host name>

    In the preceding command, <IP address or host name> is the IP address or host name of the node you want to remove from the cluster.

  8. Check the status of the remaining control and execution plane nodes to verify that the deprovisioned instance no longer appears. For example, the deprovisioned node with IP address 192.0.124.44 from the previous example no longer appears:
    awx-manage list_instances
    [controlplane capacity=126]
            192.0.119.192 capacity=126 node_type=control version=19.5.1 heartbeat="2022-10-20 06:55:44"
    
    [execution capacity=126]
            192.0.114.137 capacity=126 node_type=execution version=19.5.1 heartbeat="2022-10-20 06:56:20"
  9. Exit the awx shell environment.
    exit
  10. If required, remove any tcp-peer entries that point to the deprovisioned node from the /etc/receptor/receptor.conf file on each remaining cluster node, and then restart the service mesh on those nodes.
    sudo systemctl restart receptor-awx
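
The checks in steps 6 and 8 can be made scriptable. The following minimal sketch defines two hypothetical shell functions, node_is_drained and node_is_absent, that read awx-manage list_instances output on standard input. It assumes the output format shown in this chapter, where each instance line starts with the host name or IP address followed by key=value fields; other versions might format the output differently.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helpers): checks for steps 6 and 8.
# Both functions read "awx-manage list_instances" output on stdin and
# assume each instance line starts with the host, then key=value fields.

# Succeeds if HOST is listed and every line for it shows capacity=0
# and no heartbeat= field.
node_is_drained() {
  awk -v host="$1" '
    $1 == host {
      found = 1
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^capacity=/ && $i != "capacity=0") busy = 1
        if ($i ~ /^heartbeat=/) busy = 1
      }
    }
    END { if (found && !busy) exit 0; exit 1 }
  '
}

# Succeeds if HOST no longer appears as an instance line.
node_is_absent() {
  awk -v host="$1" '
    $1 == host { found = 1 }
    END { if (found) exit 1; exit 0 }
  '
}
```

For example, as the awx user on a control plane node:

    awx-manage list_instances | node_is_drained 192.0.124.44 && echo "ready to deprovision"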
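
For step 10, the following minimal sketch prints a copy of a receptor.conf file with any tcp-peer entry for a given host removed, so that you can review the result before replacing the file. It assumes a layout in which each top-level entry starts with a line beginning with - in the first column, and each tcp-peer entry has an address: host:port line whose value is unquoted or double-quoted. The drop_tcp_peer function name is hypothetical; compare the output with your actual configuration before using it.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): print receptor.conf without the
# tcp-peer entries whose address host matches HOST. Does not edit in place.
# Assumption: top-level entries start with "-" in column 1 and each
# tcp-peer entry contains an "address: host:port" line.

# Usage: drop_tcp_peer HOST CONF_FILE
drop_tcp_peer() {
  awk -v host="$1" '
    # Print the buffered entry unless it is a tcp-peer for HOST.
    function flush() {
      if (buf != "" && !(is_peer && hit)) printf "%s", buf
      buf = ""; is_peer = 0; hit = 0
    }
    /^-/ { flush() }
    {
      buf = buf $0 "\n"
      if ($0 ~ /^- tcp-peer:/) is_peer = 1
      if ($1 == "address:") {
        addr = $2
        gsub(/"/, "", addr)
        split(addr, parts, ":")
        if (parts[1] == host) hit = 1
      }
    }
    END { flush() }
  ' "$2"
}
```

For example, on a remaining node, write the result to a new file and compare it with the original before copying it into place and restarting receptor-awx:

    drop_tcp_peer 192.0.124.44 /etc/receptor/receptor.conf > /tmp/receptor.conf.new
    diff /etc/receptor/receptor.conf /tmp/receptor.conf.new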