Microservice Cluster Setup

Before you can deploy microservices, you must set up Kubernetes clusters. To minimize the complexities of creating and maintaining Kubernetes clusters, Unified Assurance provides the clusterctl application, which operates as a frontend to the Rancher Kubernetes Engine (RKE2) command line tool and provides the opinionated setup configuration.

Before setting up a cluster:

Review the architecture and components described in Understanding Microservices in Unified Assurance Concepts.
Ensure your system meets the Linux prerequisites and that you have opened the ports in your firewall to enable communication between servers as described in Opening Ports in Unified Assurance Installation Guide.

Setting up a microservice cluster involves the following steps:

Setting Environment Variables
Installing Roles on Servers
Updating the Default Kubernetes IP Range
Creating SSH Keys
Confirming the SELinux Configuration
(Optional) Setting Up Cluster FQDN
Creating Clusters
Updating the Helm Repository
(Optional) Installing MetalLB for Load Balancing
Installing Helm Packages to Deploy Microservices
(Optional) Replacing the Default Kubernetes Ingress SSL Certificate

You can optionally customize the cluster configuration file as described in Customizing the Cluster Configuration File.

See Troubleshooting for tips on using Kubernetes commands to get information about your Kubernetes pods and microservices.

Setting Environment Variables

Most of the commands for setting up clusters must be run as the root user. Before running the commands, as the root user, set environment variables by running the following command:

source $A1BASEDIR/.bashrc

Installing Roles on Servers

You must install the Cluster.Master and Cluster.Worker roles on one or more servers. For single-server development systems, install both roles. For production systems, each data center or availability zone should have at least three servers with the Cluster.Master role. These servers can also have the Cluster.Worker role, depending on the resources available.

Run the following commands as the root user to install the roles:

Install both the Cluster.Master and Cluster.Worker roles by using the Cluster meta-role:
```
$A1BASEDIR/bin/Package install-role Cluster
```

Install only the Cluster.Master role:

$A1BASEDIR/bin/Package install-role Cluster.Master

Install only the Cluster.Worker role:
```
$A1BASEDIR/bin/Package install-role Cluster.Worker
```
Note:

You can install roles on new servers when you run SetupWizard by specifying the roles in the --Roles option.

Updating the Default Kubernetes IP Range

Unified Assurance uses a runtime container for Kubernetes, which uses a Container Network Interface (CNI) plugin to assign IP address to pods from ranges configured for the cluster.

The default IP addresses set in the RKE2 cluster configuration file are:

cluster-cidr: 10.42.0.0/16
service-cidr: 10.43.0.0/16
cluster-dns: 10.43.0.10

If any of these IPs are already in use, you will encounter networking errors when you create the microservice cluster.

To change the default settings:

Open the config-tmpl.yaml file, available in the following location:
```
$A1BASEDIR/etc/rke2/config-tmpl.yaml
```
By default, this file contains only the parameters relevant to Unified Assurance. All other parameters, including the CIDR and DNS ones, use the default values.
Add any of the following parameters that you need to update:
```
service-cidr: <new_IP_range>
cluster-cidr: <new_IP_range>
cluster-dns: <new_IP>
```
Note:

cluster-dns must be an IPv4 address within the range set in service-cidr.
Save and close the file.

These ranges will apply to any new clusters you create.

Creating SSH Keys

You enable access between servers in the same Unified Assurance instance by creating SSH keys.

On each server other than the primary presentation server, run the following command as the assure1 user:

$A1BASEDIR/bin/CreateSSLCertificate --Type SSH

Confirming the SELinux Configuration

When SELinux is enabled, the $A1BASEDIR/var/rke2 directory is required with context type container_var_lib_t.

Confirm that it exists with the correct context by running the following command:

ls -ldZ var/rke2

The output should include container_var_lib_t, similar to the following:

drwxr-xr-x. 4 root root unconfined_u:object_r:container_var_lib_t:s0 <date_and_time> var/rke2

If the directory is missing or the context type does not match, recreate it and reset the configuration by running the following commands as the root user:

mkdir $A1BASEDIR/var/rke2
restorecon -R -v $A1BASEDIR/var/rke2

Setting Up Cluster FQDN

In a multi-node Kubernetes cluster, services are often accessed using the fully qualified domain name (FQDN) of individual hosts (host FQDN). However, if the node associated with a particular host FQDN becomes unavailable, access to services through that FQDN is disrupted.

To ensure seamless and reliable access to cluster services, even in the event of node failures, Unified Assurance uses a cluster-level FQDN (cluster FQDN) that dynamically routes traffic to healthy nodes within the cluster. The cluster FQDN is associated with a virtual IP (VIP) that is initially attached to one of the cluster nodes. Should that node become unavailable, the system automatically reassigns the VIP to another healthy node within the cluster using the MetalLB load balancer.

You can specify the cluster FQDN in the --clusterfqdn flag when you create a cluster. If you do not explicitly set it, the host FQDN is used.

If you have set up a cluster FQDN with a load balancer, make sure to use the cluster FQDN where you would normally use host FQDN.

Creating Clusters

You use the clusterctl command line application to manage clusters. The application determines the servers in each cluster by the cluster master and worker roles, and whether those servers have been associated with an existing cluster.

To create a cluster, run the following command as the root user:

$A1BASEDIR/bin/cluster/clusterctl create <cluster_name> --clusterfqdn fqdn.example.com --enable-load-balancer

In the command:

The --clusterfqdn flag and --enable-load-balancer flag are optional. Include them to explicitly set a cluster-level FQDN. If you do not explicitly set --clusterfqnd, the host FQDN is used.

Note:

The --enable-load-balancer option is not supported when the cluster FQDN is the same as one of the hosts (including if you do not explicitly set --clusterfqdn). Even if you include --enable-load-balancer in this situation, the system will not set it up correctly.

See Installing MetalLB for Load Balancing for more information on setting up the load balancer when using cluster FQDN.
<cluster_name> is the name of the cluster. Use a name relevant to the servers being added to the cluster. For example:
- PrimaryPresentationCluster
- RedundantPresentationCluster
- PrimaryCollectionCluster
- RedundantCollectionCluster

The default namespaces are automatically created and added to the cluster. Unless you specify otherwise, this includes the a1-zone1-pri device zone and namespace. Optionally:

To create a cluster for an additional device zone, add the --zone X option, where X is a unique number for the new zone. The clusterctl command automatically creates the relevant zone and namespace.

For example, if you create a cluster for zone 2 with the following command, a zone and namespace named a1-zone2-pri will automatically be created along with the cluster:
```
$A1BASEDIR/bin/cluster/clusterctl create PrimaryZone2Cluster --zone 2
```
Note:

If you are using a cluster FQDN and load balancer, include the --clusterfqdn and --enable-load-balancer flags.
To create a cluster for a component where zones are not needed, such as Vision, add the --no-zone option. All the default namespaces are created except the zone namespace. See Namespaces in Unified Assurance Concepts for information about the default namespaces.

For a non-redundant cluster, if you are using the Historical database (OpenSearch) for Event Analytics, logs, Flow Analytics, or metrics, you must run the following command after creating the cluster:

$A1BASEDIR/bin/cluster/clusterctl update-config

This command updates the Historical database hosts in the a1-config configmap for all associated namespaces, using the Historical database hosts from the assure1.yaml file. This is required for microservices such as Metric Sink to communicate with the Historical database.

For redundant clusters, you will run this after creating the redundant cluster.

Creating Redundant Clusters

By default, clusterctl adds all available servers with roles to a single cluster. You can instead create multiple redundant clusters by specifying the hosts to add to each cluster in the create commands.

To create redundant clusters, as the root user:

Create the primary cluster by running the following command, replacing the example hosts with your hosts:
```
$A1BASEDIR/bin/cluster/clusterctl create <primary_cluster_name> --host cluster-pri1.example.com --host cluster-pri2.example.com --host cluster-pri3.example.com
```
Note:

If you are using a cluster FQDN and load balancer, include the --clusterfqdn and --enable-load-balancer flags.
Create the redundant cluster by running the following command, replacing the example hosts with your hosts:
```
$A1BASEDIR/bin/cluster/clusterctl create <secondary_cluster_name> --host cluster-sec1.example.com --host cluster-sec2.example.com --host cluster-sec3.example.com --secondary
```
While creating the cluster, the a1-zone1-sec device zone and namespace are automatically created and added to the secondary cluster.
Combine the clusters into a redundant pair by running one of the following commands:
- On a server in the primary or secondary cluster:
```
$A1BASEDIR/bin/cluster/clusterctl join --primaryCluster <PrimaryHostFQDN> --secondaryCluster <SecondaryHostFQDN>
```
- On a server outside the cluster, add the --repo flag to specify the primary presentation server's FQDN:
```
$A1BASEDIR/bin/cluster/clusterctl join --primaryCluster <PrimaryHostFQDN> --secondaryCluster <SecondaryHostFQDN> --repo <PrimaryPresentationWebFQDN>
```
Tip:

Add the --debug option to the commands to show additional information about the cluster joining process.
If you are using the Historical database (OpenSearch) for Event Analytics, logs, Flow Analytics, or metrics, ensure that the a1-config configmap is updated with the appropriate Historical database hosts by running the following command:
```
$A1BASEDIR/bin/cluster/clusterctl update-config
```

Detaching Redundant Clusters

To remove a redundant pairing relationship between the cluster containing the current server and its redundant pair, run the following command as the root user from one of the servers in the cluster:

$A1BASEDIR/bin/cluster/clusterctl detach

The command automatically identifies which cluster pair to detach based on the cluster association of the server host where the command was run.

Managing Clusters

You can manage existing clusters by using the clusterctl command with the add and drop options and specifying arguments for hosts, namespaces, or device zones.

Both options also support the following flags:

debug: Sets the log level to DEBUG. By default, it is INFO.
dry-run: Prints the actions that the command performs without actually performing them.

You run this command as the root user.

The syntax for the clusterctl command for adding or dropping a host, namespace, or device zone is:

clusterctl [add | drop] [host --host <hostname> | zone --zone <zoneID> | namespace --namespace <namespace>] [--debug | --dry-run]

You can only perform one action (add or drop) on one type (host, zone, or namespace) at a time, but you can include multiple arguments to add or drop multiple items of the same type in one command. For example, to add multiple hosts:

clusterctl add host --host host1.example.com --host host2.example.com --host host3.example.com

Adding and Dropping Zones

By default, clusterctl adds and removes zones from the primary cluster. If you have redundant clusters, to add or drop a zone on a secondary cluster, include the --secondary flag.

When you add a zone, a corresponding namespace is created in the form a1-zone<zonedId>-pri or a1-zone<zonedId>-sec.

For example, to add zone 6 to the secondary cluster and create the a1-zone6-sec namespace:

clusterctl add zone --zone 6 --secondary

Dropping Hosts

When dropping hosts, the syntax for the clusterctl drop command is the same, but you must run additional commands to cordon off, drain, and delete the associated node.

a1k cordon <node>
a1k drain <node> --ignore-daemonsets --delete-emptydir-data
clusterctl drop host --host <host_FQDN>
a1k delete node <node>

where <node> is the node where the host was deployed. You can find the nodes in your cluster by running the a1k get nodes command.

Updating the Helm Repository

On at least one primary server in the cluster, update the Helm repository by running the following commands as the assure1 user:

export WEBFQDN=<Primary Presentation Web FQDN> 
a1helm repo update

Installing MetalLB for Load Balancing

MetalLB consists of the following components:

A Controller that assigns the VIP to the Kubernetes service (typically the ingress controller).
A Speaker that advertises the VIP on the local network of the cluster, mapping it to the appropriate node. When a node hosting the VIP goes down, the Speaker automatically reassigns the VIP to another node, enabling continuous access to services through the cluster FQDN.

Unified Assurance delivers MetalLB as a microservice.

After creating a cluster with a cluster-level FQDN, install MetalLB components for load balancing:

Run the following commands:

export WEBFQDN=<PresFQDN>
a1helm repo update
a1helm install metallb-controller assure1/metallb-controller -n metallb-system --set global.imageRegistry=$WEBFQDN
a1helm install metallb-speaker assure1/metallb-speaker -n metallb-system --set global.imageRegistry=$WEBFQDN

Create a file called l2-pool.yaml with the following content:

apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
name: first-l2
namespace: metallb-system
spec:
ipAddressPools:
- first-pool
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
name: first-pool
namespace: metallb-system
spec:
addresses:
- <VIP>
avoidBuggyIPs: true
serviceAllocation:
namespaces:
- kube-system

Replace <VIP> with the virtual IP.

Apply the file to your Kubernetes cluster by running the following command as the assure1 user:
```
a1k apply -n metallb-system -f l2-pool.yaml
```

Installing Helm Packages to Deploy Microservices

After setting up the cluster and updating the Helm repository, you are ready to deploy your microservices by installing Helm packages. You deploy multiple microservices in your cluster to accomplish a common goal as part of a microservice pipeline. The Prometheus and KEDA microservices are automatically deployed to the a1-monitoring namespace when you create the cluster.

You can install Helm packages to deploy microservices by using the command line or the Unified Assurance user interface.

Helm packages are installed as releases, which can have unique names. Oracle recommends following the default convention of giving the release the same name as the Helm chart. When installing each Helm chart, you define the location of the image registry and the namespace to install in. You can use default configurations, or set additional configurations during installation, depending on the options provided for each chart.

If you are using redundant clusters, you can configure some microservices for redundancy by deploying pairs of microservices on each cluster.

See the following topics for more information:

Managing Microservices for general information about deploying, updating, and undeploying microservices, and configuring microservice redundancy.
The documentation for each microservice for specific configuration requirements and options.
Understanding Microservice Pipelines in Unified Assurance Concepts for general information about Unified Assurance microservice pipelines.
Understanding the Event Pipeline in Unified Assurance Concepts for a description of a pipeline that includes multiple microservices

Replacing the Default Kubernetes Ingress SSL Certificate

By default, the Kubernetes ingress controller uses the Unified Assurance SSL certificate bundle. If you want to use a custom certificate for the ingress controller, you can optionally replace the default one.

To do this:

Obtain or generate your certificate key pair in a PEM encoded form.
Use your custom PEM encoded certificate to generate a Kubernetes secret by running the following command:
```
a1k -n ingress-nginx create secret tls ingress-custom-cert --cert=<custom_cert>.cert --key=<custom_key>.key -o yaml
```
where <custom_cert> and <custom_key> are the names for your custom PEM encoded certificate and key.

In $A1BASEDIR/etc/rke2/config-tmpl.yml, in the ingress section, under extra_args, add the following line:

default-ssl-certificate: "ingress-nginx/ingress-custom-cert"

The ingress section should now look like this:

ingress:
  provider: "nginx"
  extra_args:
    default-ssl-certificate: "ingress-nginx/ingress-custom-cert"

From one of the servers in the cluster, run the following command as the root user:
```
clusterctl upgrade
```
Optionally, verify the new certificate by running following cURL command:
```
curl -k -v https://<server_address>:9443
```

Customizing the Cluster Configuration File

You can optionally customize the configuration file that is used when creating clusters. You can do this before or after creating the cluster.

Making Customizations Before Creating a Cluster

Before creating a new cluster:

Update the $A1BASEDIR/etc/rke2/config-tmpl.yml template file.

For example, if you want to change the maximum file size for the Vision ingress controller, locate the proxy-body-size setting under ingress, and update the value.
Create the cluster as described in Creating Clusters.

The clusterctl create command uses the customized file to create the cluster.

Making Customizations to an Existing Cluster

To make customizations in a cluster that already exists:

Update the $A1BASEDIR/etc/rke2/config-tmpl.yml file as needed.
From one of the servers in the cluster, run the following command as the root user:
```
$A1BASEDIR/bin/cluster/clusterctl upgrade
```

Testing Cluster Failover

Before testing cluster failover, ensure that you have attached the clusters and correctly deployed microservices with redundancy enabled wherever possible. You can configure redundancy for most pollers and collectors, and for other microservices, you deploy independent instances on the primary and redundant cluster. See Configuring Microservice Redundancy for more information.

Note:

This topic covers microservice cluster failover only, not database or complete system failover. See Scalability and Redundancy in Unified Assurance Concepts for general information about redundancy and failover.

To test failover:

Do one of the following, depending on your desired level of testing:
- For an individual microservice, undeploy the instance on the primary cluster. See Undeploying a Microservice.
  
  The secondary instance should seamlessly take over.
- For all microservices, do one of the following:
  - Shut down all nodes in the primary microservice cluster, starting with the worker nodes and ending with the master node. The secondary nodes should seamlessly take over.
  - Simulate a complete primary cluster outage by temporarily blocking port 9092 in the firewall between the two clusters. The secondary cluster sends periodic availability checks through this port to the primary cluster. If the secondary cluster cannot detect the primary, it assumes the primary is down, and takes over.
  Caution:
  
  Do this in test environments only. This simulates an outage so that the secondary cluster becomes active, but in reality, the primary and secondary are both online. This results in double polling; both will be polling the same devices and writing the same data to the primary database.
Monitor the microservice logs and optionally send test events to confirm that the secondary microservices took over and end-to-end functionality is maintained. You can find microservice logs in the Workloads UI.
Bring the primary microservices back up according to the level of testing you performed by redeploying the primary microservice, starting up the primary nodes (starting with the master node ending with workers), or reopening the port in the firewall.

The primary should seamlessly take over from the secondary, which goes back to standby.
Monitor the microservice logs and optionally send test events to confirm that the primary microservices took over and end-to-end functionality is maintained.

Troubleshooting

Helm deployments and the associated Kubernetes pods, services, and other components can fail to initialize or crash unexpectedly.

To help troubleshoot these issues, you can run the following commands as the assure1 user:

To see information about all running pods, including pod names and namespaces that you can use in other commands:
```
a1k get pods --all-namespaces
```
To describe a pod to get events if it fails to start:
```
a1k describe pod <pod_name> -n <namespace>
```
where <pod_name> is the name of the pod you want to describe and <namespace> is the namespace the pod is running in. You can get these by running the a1k get pods command.
To get and tail logs of a running pod:
```
a1k logs <pod_name> -n <namespace> -f
```
To list all running microservices across all namespaces:
```
a1helm list --all-namespaces
```
To uninstall a microservice:
```
a1helm uninstall <microservice_release_name> -n <namespace>
```
where <microservice_release_name> is the release name for the microservice you are uninstalling. You can get the exact name by running the a1helm list command.

Title and Copyright Information

Implementation Guide

G49441-01