5 Installing Oracle Linux Automation Manager in a Clustered Deployment

This chapter discusses how to prepare hosts in an Oracle Linux Automation Manager multi-host deployment. When you prepare the hosts, you must install the Oracle Linux Automation Manager software packages and configure them as part of the Oracle Linux Automation Manager service mesh. Configure and start the Service Mesh nodes before configuring and starting the control plane and execution plane nodes.

Configuring and Starting the Control Plane Service Mesh

You configure each node in the control plane of a cluster by editing the /etc/receptor/receptor.conf file. This file contains the following elements:
  • node ID: The node ID must be the IP address or host name of the host.
  • log-level: Available options are: Error, Warning, Info and Debug. Log level options provide increasing verbosity, such that Error generates the least information and Debug generates the most.
  • tcp-listener port: This is the port on which the node listens for incoming tcp-peer connections configured on other nodes. For example, if the node ID represents a control node that listens on port 27199, then all other nodes that want to establish a connection to this control node would have to specify port 27199 in the tcp-peer element they configure in their /etc/receptor/receptor.conf file.
  • control-service: All nodes in a cluster run the control service which reports status and launches and monitors work.
  • work-command: This element defines the type of work that can be done on a node. For control plane nodes, the work type is always local. The command it runs is the Ansible Runner tool, which provides an abstraction layer for running Ansible and Ansible playbook tasks and can be configured to send status and event data to other systems. For more information about Ansible Runner, see https://ansible-runner.readthedocs.io/en/stable/.

On each host intended for use as a control plane node, do the following:

  1. Remove any default configuration for Receptor and edit /etc/receptor/receptor.conf to contain the following configuration with control plane-specific information:

    ---
    - node:
        id: <IP address or host name>
     
    - log-level: info
     
    - tcp-listener:
        port: <port_number>
    
    - control-service:
        service: control
        filename: /var/run/receptor/receptor.sock
     
    - work-command:
        worktype: local
        command: /var/lib/ol-automation-manager/venv/awx/bin/ansible-runner
        params: worker
        allowruntimeparams: true
        verifysignature: false

    In the previous example, IP address or host name is the IP address or host name of the node, and port_number is the port number that this node listens on (a filled-in sketch using example values follows this procedure). For example, you can use a node ID such as control1-192.0.121.28, which combines a node name with the IP address of the node, and configure the tcp-listener to listen on port 27199. The worktype parameter must be local on control plane nodes.

  2. Start the Oracle Linux Automation Manager mesh service.
    sudo systemctl start receptor-awx
  3. Verify the Service Mesh. For more information, see Viewing Service Mesh Status for a Cluster Node.

    Note:

    At this point in the process, the peer relationships between service mesh nodes have not been established yet. Status information only exists for the individual servers running the Service Mesh.
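
For reference, the following is a minimal filled-in sketch of the node-specific values in the previous template, using the illustrative IP address 192.0.121.28 as the node ID and 27199 as the listener port; substitute the values for your own host. The control-service and work-command elements are identical to the template in step 1.

    ---
    - node:
        id: 192.0.121.28

    - log-level: info

    - tcp-listener:
        port: 27199

    # The control-service and work-command elements are unchanged from the
    # template shown in step 1.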

Configuring and Starting the Execution Plane Service Mesh

You configure each node in the execution plane of a cluster by editing the /etc/receptor/receptor.conf file. This file contains the following elements:
  • node ID: The node ID must be the IP address or hostname of the host.
  • log-level: Available options are: Error, Warning, Info and Debug. Log level options provide increasing verbosity, such that Error generates the least information and Debug generates the most.
  • tcp-listener port: This is the port on which the node listens for incoming tcp-peer connections configured on other nodes. For example, if the node ID represents an execution node that listens on port 27199, then all other nodes that want to establish a connection to this execution node would have to specify port 27199 in the tcp-peer element they configure in their /etc/receptor/receptor.conf file.
  • tcp-peer port: This element must include the hostname and port number of the host it is connecting with. For example, if this execution node needs to connect to more than one control plane node to provide redundancy, you would need to add tcp-peer elements for each control plane node that the execution node connects with. In the address field, enter the host name or IP address of the control plane node, followed by the port number. The redial element, if enabled, attempts to periodically reestablish a connection to the host if connectivity fails.

    You can also configure tcp-peer elements to include the hostnames and port numbers of other execution nodes or hop nodes based on your service mesh topology requirements.

  • control-service: All nodes in a cluster run the control service which reports status and launches and monitors work.
  • work-command: This element defines the type of work that can be done on a node. For execution plane nodes, the work type is always ansible-runner. The command it runs is the Ansible Runner tool which provides an abstraction layer for running Ansible and Ansible playbook tasks and can be configured to send status and event data to other systems. For more information about Ansible Runner, see https://ansible-runner.readthedocs.io/en/stable/.

On each host intended for use as an execution plane node, do the following:

  1. Remove any default configuration for Receptor and edit /etc/receptor/receptor.conf to contain the following configuration with execution plane-specific information:

    ---
    - node:
        id: <IP address or hostname>
     
    - log-level: debug
     
    - tcp-listener:
        port: <port_number>
     
    - tcp-peer:
        address: <hostname or IP address>:<target_port_number>
        redial: true
    
    - tcp-peer:
        address: <hostname or IP address>:<target_port_number>
        redial: true
     
    - control-service:
        service: control
        filename: /var/run/receptor/receptor.sock
     
    - work-command:
        worktype: ansible-runner
        command: /var/lib/ol-automation-manager/venv/awx/bin/ansible-runner
        params: worker
        allowruntimeparams: true
        verifysignature: false

    In the previous example,

    • IP address or hostname is the IP address or hostname of the node.
    • port_number is the port number that this node is listening on.
    • target_port_number is the port number of the peer node that you are configuring this node to connect with.
    • hostname or IP address is the hostname or IP address of the execution, control, or hop node being connected with.
    • The worktype parameter must be ansible-runner in execution plane nodes.

    If the execution node is associated with more than one control, execution, or hop node, enter additional - tcp-peer: entries for each instance that the execution node is associated with (a filled-in sketch follows this procedure).

  2. Start the Oracle Linux Automation Manager mesh service.
    sudo systemctl start receptor-awx
  3. Verify the Service Mesh. For more information, see Viewing Service Mesh Status for a Cluster Node.

    Note:

    At this point in the process, the peer relationships between service mesh nodes have not been established yet. Status information only exists for the individual servers running the Service Mesh.
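
For reference, the following is a minimal filled-in sketch of the node-specific values for an execution node, using illustrative addresses only: execution node 192.0.114.137 listens on port 27199 and peers with two control nodes, 192.0.119.192 and 192.0.124.44, which also listen on port 27199. Substitute the addresses and ports for your own topology; the control-service and work-command elements are identical to the template in step 1.

    ---
    - node:
        id: 192.0.114.137

    - log-level: debug

    - tcp-listener:
        port: 27199

    - tcp-peer:
        address: 192.0.119.192:27199
        redial: true

    - tcp-peer:
        address: 192.0.124.44:27199
        redial: true

    # The control-service and work-command elements are unchanged from the
    # template shown in step 1.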

Configuring and Starting the Hop Nodes

You configure each hop node in the cluster by editing the /etc/receptor/receptor.conf file. This file contains the following elements:
  • node ID: The node ID must be the IP address or hostname of the host.
  • log-level: Available options are: Error, Warning, Info and Debug. Log level options provide increasing verbosity, such that Error generates the least information and Debug generates the most.
  • tcp-listener port: This is the port on which the node listens for incoming tcp-peer connections configured on other nodes. For example, if the node ID represents a hop node that listens on port 27199, then all other nodes that want to establish a connection to this hop node would have to specify port 27199 in the tcp-peer element they configure in their /etc/receptor/receptor.conf file.
  • tcp-peer port: This element must include the hostname and port number of the host it is connecting with. For example, you might configure your hop node to connect to a control node as the intermediate node between the control node and an execution node. In the address field, enter the host name or IP address of the control plane node, followed by the port number. The redial element, if enabled, attempts to periodically reestablish a connection to the host if connectivity fails.
  • control-service: All nodes in a cluster run the control service which reports status and launches and monitors work.
  • work-command: This element defines the type of work that can be done on a node. Hop nodes do not run playbooks; however, you must configure the default fields. The work type for hop nodes is always local, as shown in the following configuration.

On each host intended for use as a hop node, do the following:

  1. Remove any default configuration for Receptor and edit /etc/receptor/receptor.conf to contain the following configuration with hop node specific information:

    ---
    - node:
        id: <node IP address or hostname>
     
    - log-level: debug
     
    - tcp-listener:
        port: <port_number>
     
    - tcp-peer:
        address: <control hostname or IP address>:<target_port_number>
        redial: true
    
    - tcp-peer:
        address: <control hostname or IP address>:<target_port_number>
        redial: true
     
    - control-service:
        service: control
        filename: /var/run/receptor/receptor.sock
     
    - work-command:
        worktype: local
        command: /var/lib/ol-automation-manager/venv/awx/bin/ansible-runner
        params: worker
        allowruntimeparams: true
        verifysignature: false

    In the previous example,

    • node IP address or hostname is the IP address or hostname of the node.
    • port_number is the port number that this node is listening on.
    • target_port_number is the port number of the peer node that you are configuring this node to connect with.
    • control hostname or IP address is the hostname or IP address of the control nodes that the hop node is connecting with.

    If the hop node is associated with more than one control node, enter additional - tcp-peer: entries for each instance that the hop node is associated with.

  2. Start the Oracle Linux Automation Manager mesh service.
    sudo systemctl start receptor-awx
  3. Verify the Service Mesh (an example status command follows this procedure). For more information, see Viewing Service Mesh Status for a Cluster Node.

    Note:

    At this point in the process, the peer relationships between service mesh nodes have not been established yet. Status information only exists for the individual servers running the Service Mesh.
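
To verify the Service Mesh in step 3 of each of the preceding procedures, you can query the local node through its control service socket. The following is a minimal sketch that assumes the receptorctl utility shipped with Receptor is installed on the node and that the socket path matches the control-service configuration shown earlier. Until peering is configured, the output lists only the local node.

    # Query the local Receptor node through its control-service socket.
    sudo receptorctl --socket /var/run/receptor/receptor.sock status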

Configuring the Control, Execution, and Hop Nodes

To configure the control plane, execution plane, and hop nodes, perform the following steps on one control plane host; the configuration applies to all Oracle Linux Automation Manager instances, and a consolidated worked example follows the procedure:

  1. Run the following command to enter the awx shell environment:
    sudo su -l awx -s /bin/bash
    
  2. Do the following:

    • Repeat the following command for each host you want to designate as control node type, changing the IP address or host name each time you run the command:

      awx-manage provision_instance --hostname=<control hostname or IP address> --node_type=control

      In the previous example, control hostname or IP address is the hostname or IP address of the system running Oracle Linux Automation Manager. The host name or IP address that you choose must match the host name or IP address used when you configured the node ID in the /etc/receptor/receptor.conf file (see Configuring and Starting the Control Plane Service Mesh). If a hostname is used, the host must be resolvable.

    • Repeat the following command for each host you want to designate as execution node type, changing the IP address or host name each time you run the command:

      awx-manage provision_instance --hostname=<execution hostname or IP address> --node_type=execution

      In the previous example, execution hostname or IP address is the hostname or IP address of the system running Oracle Linux Automation Manager. The host name or IP address that you choose must match the host name or IP address used when you configured the node ID in the /etc/receptor/receptor.conf file (see Configuring and Starting the Execution Plane Service Mesh). If a hostname is used, the host must be resolvable.

    • Repeat the following command for each host you want to designate as the hop node type, changing the IP address or host name each time you run the command:

      awx-manage provision_instance --hostname=<hop hostname or IP address> --node_type=hop

      In the previous example, hop hostname or IP address is the hostname or IP address of the system running Oracle Linux Automation Manager. The host name or IP address that you choose must match the host name or IP address used when you configured the node ID in the /etc/receptor/receptor.conf file (see Configuring and Starting the Hop Nodes). If a hostname is used, the host must be resolvable.

  3. Run the following command to register the default execution environments, which are:
    • Control Plane Execution Environment
    • OLAM EE: (Latest)
    awx-manage register_default_execution_environments
  4. Run the following command to create the controlplane instance group (specified as a queue in the command) and associate it with a control plane host. Repeat the command with the same queue name for each control plane host in your cluster:

    awx-manage register_queue --queuename=controlplane --hostnames=<control hostname or IP address>
  5. Run the following command to create the execution instance group (specified as a queue in the command) and associate it with an execution plane host. Repeat the command with the same queue name for each execution plane host in your cluster:

    awx-manage register_queue --queuename=execution --hostnames=<execution hostname or IP address>
  6. Run the awx-manage list_instances command to ensure that each host you registered is available under the correct instance group. For example, the following shows the IP addresses of two control plane and three execution plane nodes running under the controlplane and execution instance groups. The nodes are not yet running, and therefore do not show available capacity or heartbeat information.
    awx-manage list_instances
    [controlplane capacity=0]
    	192.0.119.192 capacity=0 node_type=control version=?
    	192.0.124.44 capacity=0 node_type=control version=?
    
    [execution capacity=0]
    	192.0.114.137 capacity=0 node_type=execution version=ansible-runner-???
    	192.0.117.98 capacity=0 node_type=execution version=ansible-runner-???
    	192.0.125.241 capacity=0 node_type=execution version=ansible-runner-???

    Note:

    Hop nodes do not appear in this list because they are not associated with any instance group.
  7. Run the following command to register the Oracle Linux Automation Manager service mesh peer relationship between each node in the cluster:
    awx-manage register_peers <execution or hop hostname or IP address> --peers <execution, hop, or control hostname or IP address>
                

    This command must be run for each pair of nodes that requires a peer relationship. For example, in the topology described in Service Mesh Topology Examples, the command is run twice for each execution node so that each execution node is connected to two different control nodes. This ensures that each execution node always has a backup control node if one of the control nodes fails.

    Additional topologies are possible, such as those where an isolated execution node must peer to a hop node, and the hop node must peer to a control node. In this case the command must be run one time to peer the execution node with the hop node, and again to peer the hop node with the control node.

  8. Exit the awx shell environment.
    exit
  9. For each control and execution plane host, edit the /etc/tower/settings.py file to include the following lines:
    DEFAULT_EXECUTION_QUEUE_NAME = 'execution'
    DEFAULT_CONTROL_PLANE_QUEUE_NAME = 'controlplane'
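
The following consolidates steps 2 through 7 into a single sketch for the illustrative five-node cluster shown in the list_instances output in step 6 (two control nodes and three execution nodes). The addresses are examples only, and the peering follows the redundant topology described in step 7, where each execution node connects to both control nodes.

    # Run inside the awx shell on one control plane host.
    awx-manage provision_instance --hostname=192.0.119.192 --node_type=control
    awx-manage provision_instance --hostname=192.0.124.44 --node_type=control
    awx-manage provision_instance --hostname=192.0.114.137 --node_type=execution
    awx-manage provision_instance --hostname=192.0.117.98 --node_type=execution
    awx-manage provision_instance --hostname=192.0.125.241 --node_type=execution

    awx-manage register_default_execution_environments

    awx-manage register_queue --queuename=controlplane --hostnames=192.0.119.192
    awx-manage register_queue --queuename=controlplane --hostnames=192.0.124.44
    awx-manage register_queue --queuename=execution --hostnames=192.0.114.137
    awx-manage register_queue --queuename=execution --hostnames=192.0.117.98
    awx-manage register_queue --queuename=execution --hostnames=192.0.125.241

    # Peer each execution node with both control nodes, one peer per command.
    # Repeat the pattern for 192.0.117.98 and 192.0.125.241.
    awx-manage register_peers 192.0.114.137 --peers 192.0.119.192
    awx-manage register_peers 192.0.114.137 --peers 192.0.124.44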

Starting the Control, Execution, and Hop Nodes

To start the control, execution, and hop nodes, do the following (a quick check that the services are active follows the procedure):

  1. Start the service on each node:

    sudo systemctl enable --now ol-automation-manager.service
  2. On one control plane node, run the following commands to preload data, such as:
    • Demo Project
    • Default Galaxy Credentials
    • Demo Organization
    • Demo Inventory
    • Demo Job template
    • And so on
    sudo su -l awx -s /bin/bash
    awx-manage create_preload_data 

    Note:

    This command only needs to be run one time because the preloaded data persists in the database that all cluster nodes connect with.
  3. Run the awx-manage list_instances command to ensure that the control and execution plane nodes are now running, show available capacity, and display heartbeat information. For example, the following shows all control and execution plane instances running, with available capacity and active heartbeat information.
    awx-manage list_instances
    [controlplane capacity=270]
    	192.0.119.192 capacity=135 node_type=control version=19.5.1 heartbeat="2022-09-22 14:38:29"
    	192.0.124.44 capacity=135 node_type=control version=19.5.1 heartbeat="2022-09-22 14:39:09"
    
    [execution capacity=405]
    	192.0.114.137 capacity=135 node_type=execution version=19.5.1 heartbeat="2022-09-22 14:40:07"
    	192.0.117.98 capacity=135 node_type=execution version=19.5.1 heartbeat="2022-09-22 14:40:35"
    	192.0.125.241 capacity=135 node_type=execution version=19.5.1 heartbeat="2022-09-22 14:40:55"

    Note:

    Hop nodes do not appear in this list because they are not associated with any instance group.
  4. Exit the awx shell environment.
    exit
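
As a quick check after starting the services in step 1, you can confirm on each node that both the Service Mesh and Oracle Linux Automation Manager units are active; systemctl is-active prints the unit state, and active indicates that the service started correctly.

    # Expect "active" for each unit on every node.
    sudo systemctl is-active receptor-awx.service
    sudo systemctl is-active ol-automation-manager.service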

Configuring TLS Verification and Signed Work Requests

Oracle recommends that you secure your Service Mesh communication within your cluster with TLS verification and signed work requests sent between cluster nodes. TLS verification ensures secure communication in the Service Mesh network and signed work requests ensure secure job execution.

To configure TLS verification and signed work requests, do the following:
  1. On each host in the cluster (every execution, hop, and control plane node), enable signed work requests by adding the following line to the /etc/tower/settings.py file:
    RECEPTOR_NO_SIG = False
  2. From one of your control nodes, in the /etc/tower folder, do the following:
    • If you are using IP addresses for the node_id field, run the following commands to create the certs folder and generate TLS certificates:
      sudo mkdir -p certs
      sudo receptor --cert-init commonname="test CA" bits=2048 outcert=certs/ca.crt outkey=certs/ca.key
      node=<node_id>; sudo receptor --cert-makereq bits=2048 commonname="$node test cert"  ipaddress=$node nodeid=$node outreq=certs/$node.csr outkey=certs/$node.key
      node=<node_id>; sudo receptor --cert-signreq req=certs/$node.csr cacert=certs/ca.crt cakey=certs/ca.key outcert=certs/$node.crt

      In the previous example, node_id is the IP address of the node that you are creating keys for, as set in the node ID field of the /etc/receptor/receptor.conf file on the execution, hop, or control plane node.

    • If you are using a host name for the node_id field, run the following commands to create the certs folder and generate TLS certificates:
      sudo mkdir -p certs
      sudo receptor --cert-init commonname="test CA" bits=2048 outcert=certs/ca.crt outkey=certs/ca.key
      node=<node_id>; sudo receptor --cert-makereq bits=2048 commonname="$node test cert"  dnsname=$node nodeid=$node outreq=certs/$node.csr outkey=certs/$node.key
      node=<node_id>; sudo receptor --cert-signreq req=certs/$node.csr cacert=certs/ca.crt cakey=certs/ca.key outcert=certs/$node.crt

      In the previous example, node_id is the host name of the node that you are creating keys for, as set in the node ID field of the /etc/receptor/receptor.conf file on the execution, hop, or control plane node.

  3. When you run the receptor --cert-signreq command (the second command for each node), type yes to confirm that you want to sign the certificate.
    For example, the following generates certificates for a cluster with two hosts:
    node=192.0.250.40; sudo receptor --cert-makereq bits=2048 commonname="$node test cert"  ipaddress=192.0.250.40 nodeid=$node outreq=certs/$node.csr outkey=certs/$node.key
    node=192.0.250.40; sudo receptor --cert-signreq req=certs/$node.csr cacert=certs/ca.crt cakey=certs/ca.key outcert=certs/$node.crt
    Requested certificate:
      Subject: CN=192.0.250.40 test cert
      Encryption Algorithm: RSA (2048 bits)
      Signature Algorithm: SHA256-RSA
      Names:
        IP Address: 192.0.250.40
        Receptor Node ID: 192.0.250.40
    Sign certificate (yes/no)? yes
                                                
    node=192.0.251.206; sudo receptor --cert-makereq bits=2048 commonname="$node test cert" ipaddress=192.0.251.206 nodeid=$node outreq=certs/$node.csr outkey=certs/$node.key
    node=192.0.251.206; sudo receptor --cert-signreq req=certs/$node.csr cacert=certs/ca.crt cakey=certs/ca.key outcert=certs/$node.crt
    Requested certificate:
      Subject: CN=192.0.251.206 test cert
      Encryption Algorithm: RSA (2048 bits)
      Signature Algorithm: SHA256-RSA
      Names:
        IP Address: 192.0.251.206
        Receptor Node ID: 192.0.251.206
    Sign certificate (yes/no)? yes
                                            
  4. From the /etc/tower/certs folder, run the following commands to generate certificates for work request signing and verification:
    sudo openssl genrsa -out signworkprivate.pem 2048
    sudo openssl rsa -in signworkprivate.pem -pubout -out signworkpublic.pem
  5. From the /etc/tower folder, run the following command to change the ownership of the certs folder and all files within it:
    sudo chown -R awx:awx certs
  6. Check that you have all the files you need in the /etc/tower/certs folder. For example, the following shows the generated key information for a four node cluster.
    ls -al
    total 68
    drwxr-xr-x. 2 awx awx 4096 Sep 12 18:26 .
    drwxr-xr-x. 4 awx awx  132 Sep 12 16:49 ..
    -rw-------. 1 awx awx 1180 Sep 12 18:19 192.0.113.178.crt
    -rw-------. 1 awx awx 1001 Sep 12 18:19 192.0.113.178.csr
    -rw-------. 1 awx awx 1679 Sep 12 18:19 192.0.113.178.key
    -rw-------. 1 awx awx 1176 Sep 12 18:20 192.0.121.28.crt
    -rw-------. 1 awx awx 1001 Sep 12 18:20 192.0.121.28.csr
    -rw-------. 1 awx awx 1675 Sep 12 18:20 192.0.121.28.key
    -rw-------. 1 awx awx 1180 Sep 12 18:20 192.0.126.172.crt
    -rw-------. 1 awx awx 1001 Sep 12 18:19 192.0.126.172.csr
    -rw-------. 1 awx awx 1679 Sep 12 18:19 192.0.126.172.key
    -rw-------. 1 awx awx 1176 Sep 12 18:19 192.0.127.70.crt
    -rw-------. 1 awx awx 1001 Sep 12 18:19 192.0.127.70.csr
    -rw-------. 1 awx awx 1675 Sep 12 18:19 192.0.127.70.key
    -rw-------. 1 awx awx 1107 Sep 12 16:54 ca.crt
    -rw-------. 1 awx awx 1679 Sep 12 16:54 ca.key
    -rw-------. 1 awx awx 1675 Sep 12 18:26 signworkprivate.pem
    -rw-r--r--. 1 awx awx  451 Sep 12 18:26 signworkpublic.pem
  7. On each node in the cluster, in the /etc/tower folder, create a certs folder and change the ownership and group of the certs folder to awx:awx:
    sudo mkdir -p certs
    sudo chown -R awx:awx certs
  8. Copy the ca.crt file, the node-specific .crt, .csr, and .key files, and the signworkprivate.pem and signworkpublic.pem files to each node in the cluster (an example scp command follows this procedure).
  9. For each control plane node, add the following lines to the /etc/receptor/receptor.conf file:
    
    ---
    - node:
        id: <IP address or host name>
     
    - log-level: debug
    
    # Add the tls: controller line that specifies the tls-server name for the listener
    - tcp-listener:
        port: 27199
        tls: controller
    
    # Add the TLS server configuration
    - tls-server:
        name: controller
        cert: /etc/tower/certs/<IP address or host name>.crt
        key: /etc/tower/certs/<IP address or host name>.key
        requireclientcert: true
        clientcas: /etc/tower/certs/ca.crt
     
    - control-service:
        service: control
        filename: /var/run/receptor/receptor.sock
    
    # Add the work-signing and work-verification elements
    - work-signing:
        privatekey: /etc/tower/certs/signworkprivate.pem
        tokenexpiration: 30m
     
    - work-verification:
        publickey: /etc/tower/certs/signworkpublic.pem 
    
    # Set verifysignature to true.
    - work-command:
        worktype: local
        command: /var/lib/ol-automation-manager/venv/awx/bin/ansible-runner
        params: worker
        allowruntimeparams: true
        verifysignature: true
                   

    In the previous example, IP address or host name is the host name or IP address of the control plane host. If host name is used, the host must be resolvable.

  10. For each execution plane node, add the following lines to the /etc/receptor/receptor.conf file:
    ---
    - node:
        id: <execution IP address or host name>
     
    - log-level: debug
     
    - tcp-listener:
        port: 27199
    
    # Add tls: client that specifies the tls-client name.
    - tcp-peer:
        address: <hostname or IP address>:27199
        redial: true
        tls: client
    
    - tcp-peer:
        address: <hostname or IP address>:27199
        redial: true
        tls: client
    
    # Add the tls-client element.
    - tls-client:
        name: client
        rootcas: /etc/tower/certs/ca.crt
        insecureskipverify: false
        cert: /etc/tower/certs/<execution IP address or host name>.crt
        key: /etc/tower/certs/<execution IP address or host name>.key
     
    - control-service:
        service: control
        filename: /var/run/receptor/receptor.sock
    
    # Add the work-verification element.
    - work-verification:
        publickey: /etc/tower/certs/signworkpublic.pem
    
    # Set verifysignature to true.
    - work-command:
        worktype: ansible-runner
        command: /var/lib/ol-automation-manager/venv/awx/bin/ansible-runner
        params: worker
        allowruntimeparams: true
        verifysignature: true
                   

    In the previous example,

    • execution IP address or host name is the IP address or host name of the node.
    • hostname or IP address is the host name or IP address of the execution, control, or hop node you are peering with.
  11. (If required) For each hop node, add the following lines to the /etc/receptor/receptor.conf file:
    
    ---
    - node:
        id: <node IP address or hostname>
     
    - log-level: debug
    
    # Add the tls: controller line that specifies the tls-server name for the listener
    - tcp-listener:
        port: 27199
        tls: controller
    
    # Add tls: client that specifies the tls-client name.
    - tcp-peer:
        address: <control hostname or IP address>:27199
        redial: true
        tls: client
    
    # Add the tls-client element.
    - tls-client:
        name: client
        rootcas: /etc/tower/certs/ca.crt
        insecureskipverify: false
        cert: /etc/tower/certs/<node IP address or hostname>.crt
        key: /etc/tower/certs/<node IP address or hostname>.key
    
    
    # Add the work-signing and work-verification elements
    - work-signing:
        privatekey: /etc/tower/certs/signworkprivate.pem
        tokenexpiration: 30m
     
    - work-verification:
        publickey: /etc/tower/certs/signworkpublic.pem
    
    # Add the TLS server configuration
    - tls-server:
        name: controller
        cert: /etc/tower/certs/<node IP address or hostname>.crt
        key: /etc/tower/certs/<node IP address or hostname>.key
        requireclientcert: true
        clientcas: /etc/tower/certs/ca.crt
     
    - control-service:
        service: control
        filename: /var/run/receptor/receptor.sock
    
    
    # Set verifysignature to true.
    - work-command:
        worktype: local
        command: /var/lib/ol-automation-manager/venv/awx/bin/ansible-runner
        params: worker
        allowruntimeparams: true
        verifysignature: true
                   

    In the previous example,

    • node IP address or host name is the IP address or host name of the node.
    • control hostname or IP address is the host name or IP address of the control plane host you are peering with.
  12. On each node, restart the Service Mesh and Oracle Linux Automation Manager.
    sudo systemctl restart receptor-awx
    sudo systemctl restart ol-automation-manager
  13. Verify the Service Mesh. See Viewing the Service Mesh for more information.
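
For reference, the following is one possible way to perform the file copy in step 8, sketched with scp under the assumption that you have SSH access to each node as an account with sudo privileges; 192.0.251.206 and <user> are illustrative placeholders, and you should adjust the file names, account, and address for each node.

    # Run on the control node where the certificates were generated.
    cd /etc/tower/certs
    sudo scp ca.crt 192.0.251.206.crt 192.0.251.206.csr 192.0.251.206.key \
        signworkprivate.pem signworkpublic.pem <user>@192.0.251.206:/tmp/

    # On the remote node, move the files into the certs folder created in
    # step 7 and reapply the ownership.
    sudo mv /tmp/ca.crt /tmp/192.0.251.206.* /tmp/signwork*.pem /etc/tower/certs/
    sudo chown -R awx:awx /etc/tower/certs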