Deploy GitLab Runners on Oracle Container Engine for Kubernetes with Cluster Autoscaling

Deploy GitLab Runners on Oracle Container Engine for Kubernetes with autoscaling functionality that scales worker nodes automatically based on load, so that jobs in the CI/CD pipeline run smoothly.

Architecture

This architecture shows GitLab Runners deployed in an Oracle Container Engine for Kubernetes cluster on Oracle Cloud Infrastructure.

The following diagram illustrates this reference architecture.

Diagram: git-lab-runner-kubernetes.png

The architecture has the following components:

  • Region

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

  • Availability domains

    Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain is unlikely to affect the other availability domains in the region.

  • Fault domains

    A fault domain is a grouping of hardware and infrastructure within an availability domain. Each availability domain has three fault domains with independent power and hardware. When you distribute resources across multiple fault domains, your applications can tolerate physical server failure, system maintenance, and power failures inside a fault domain.

  • Virtual cloud network (VCN) and subnets

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • Service gateway

    The service gateway provides access from a VCN to other services, such as Oracle Cloud Infrastructure Object Storage. The traffic from the VCN to the Oracle service travels over the Oracle network fabric and never traverses the internet.

  • Container Engine for Kubernetes

    Oracle Cloud Infrastructure Container Engine for Kubernetes is a fully managed, scalable, and highly available service that you can use to deploy your containerized applications to the cloud. You specify the compute resources that your applications require, and Container Engine for Kubernetes provisions them on Oracle Cloud Infrastructure in an existing tenancy. Container Engine for Kubernetes uses Kubernetes to automate the deployment, scaling, and management of containerized applications across clusters of hosts.

  • Cloud Guard

    You can use Oracle Cloud Guard to monitor and maintain the security of your resources in Oracle Cloud Infrastructure. Cloud Guard uses detector recipes that you can define to examine your resources for security weaknesses and to monitor operators and users for risky activities. When any misconfiguration or insecure activity is detected, Cloud Guard recommends corrective actions and assists with taking those actions, based on responder recipes that you can define.

  • Security zone

    Security zones ensure Oracle's security best practices from the start by enforcing policies such as encrypting data and preventing public access to networks for an entire compartment. A security zone is associated with a compartment of the same name and includes security zone policies or a "recipe" that applies to the compartment and its sub-compartments. You can't add or move a standard compartment to a security zone compartment.

  • Kubernetes Cluster Autoscaler

    Kubernetes Cluster Autoscaler automatically increases or decreases the size of a node pool based on resource requests, rather than on the resource utilization of the nodes in the node pool.

  • OKE services

    A Kubernetes service in OKE is an abstraction that defines a logical set of pods and a policy by which to access them. The set of pods that a service targets is usually determined by a selector. In this architecture, the OKE services include the components that manage autoscaling.

  • OKE Workers node pool

    An OKE worker node pool is a subset of worker nodes within a cluster that all have the same configuration. Node pools enable you to create pools of machines within a cluster that have different configurations. For example, you might create one pool of nodes in a cluster as virtual machines, and another pool of nodes as bare metal machines. A cluster must have a minimum of one node pool, but a node pool need not contain any worker nodes.

    Worker nodes in a node pool are connected to a worker node subnet in your VCN.

  • Internet gateway

    The internet gateway allows traffic between the public subnets in a VCN and the public internet.

  • Network address translation (NAT) gateway

    A NAT gateway enables private resources in a VCN to access hosts on the internet, without exposing those resources to incoming internet connections.
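The request-based behavior of the Cluster Autoscaler described above can be illustrated with a simplified Python sketch. It is not the autoscaler's actual algorithm: it assumes a single node pool in which every node has the same allocatable CPU and memory, uses plain first-fit placement, and models only the scale-up decision for pending pods. The Capacity class, the nodes_to_add function, and the node size in the example are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Capacity:
    """CPU in millicores and memory in MiB (units chosen for this sketch)."""
    cpu_m: int
    mem_mi: int

    def fits(self, req: Capacity) -> bool:
        return req.cpu_m <= self.cpu_m and req.mem_mi <= self.mem_mi

    def take(self, req: Capacity) -> None:
        self.cpu_m -= req.cpu_m
        self.mem_mi -= req.mem_mi


def nodes_to_add(pending: list[Capacity], free_per_node: list[Capacity],
                 node_size: Capacity, max_new_nodes: int) -> int:
    """Hypothetical scale-up decision: place pending pod *requests* first-fit.

    Actual CPU or memory utilization of the nodes is never consulted.
    """
    # Work on copies so the caller's node objects are not modified.
    bins = [Capacity(f.cpu_m, f.mem_mi) for f in free_per_node]
    added = 0
    for req in pending:
        if not node_size.fits(req):
            continue  # larger than an empty node: adding nodes of this shape cannot help
        target = next((b for b in bins if b.fits(req)), None)
        if target is None:
            if added == max_new_nodes:
                continue  # node pool maximum reached: this pod stays pending
            target = Capacity(node_size.cpu_m, node_size.mem_mi)
            bins.append(target)
            added += 1
        target.take(req)
    return added


# Two existing nodes whose capacity is fully booked by running jobs,
# even if those jobs are currently idle.
busy = [Capacity(0, 0), Capacity(0, 0)]
node = Capacity(2000, 16384)  # hypothetical allocatable size of one worker node
jobs = [Capacity(1000, 4096) for _ in range(3)]  # three pending jobs
print(nodes_to_add(jobs, busy, node, max_new_nodes=5))  # 2
```

Because the existing nodes are fully booked, the sketch adds nodes for all three pending jobs, two per new node by CPU request, regardless of how busy the existing nodes actually are.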

Recommendations

Use the following recommendations as a starting point. Your requirements might differ from the architecture described here.

  • VCN

    When you create a VCN, determine the number of CIDR blocks required and the size of each block based on the number of resources that you plan to attach to subnets in the VCN. Use CIDR blocks that are within the standard private IP address space.

    Select CIDR blocks that don't overlap with any other network (in Oracle Cloud Infrastructure, your on-premises data center, or another cloud provider) to which you intend to set up private connections.

    After you create a VCN, you can change, add, and remove its CIDR blocks.

    When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.

    Use regional subnets.

  • Security

    Use Oracle Cloud Guard to monitor and maintain the security of your resources in Oracle Cloud Infrastructure proactively. Cloud Guard uses detector recipes that you can define to examine your resources for security weaknesses and to monitor operators and users for risky activities. When any misconfiguration or insecure activity is detected, Cloud Guard recommends corrective actions and assists with taking those actions, based on responder recipes that you can define.

    For resources that require maximum security, Oracle recommends that you use security zones. A security zone is a compartment associated with an Oracle-defined recipe of security policies that are based on best practices. For example, the resources in a security zone must not be accessible from the public internet and they must be encrypted using customer-managed keys. When you create and update resources in a security zone, Oracle Cloud Infrastructure validates the operations against the policies in the security-zone recipe, and denies operations that violate any of the policies.

  • Cloud Guard

    Clone and customize the default recipes provided by Oracle to create custom detector and responder recipes. These recipes enable you to specify what type of security violations generate a warning and what actions are allowed to be performed on them. For example, you might want to detect Object Storage buckets that have visibility set to public.

    Apply Cloud Guard at the tenancy level to cover the broadest scope and to reduce the administrative burden of maintaining multiple configurations.

    You can also use the Managed List feature to apply certain configurations to detectors.

  • Network security groups (NSGs)

    You can use NSGs to define a set of ingress and egress rules that apply to specific VNICs. We recommend using NSGs rather than security lists, because NSGs enable you to separate the VCN's subnet architecture from the security requirements of your application.

  • Container Engine for Kubernetes

    Although GitLab Runner supports any generic Kubernetes cluster, this architecture uses Oracle Container Engine for Kubernetes clusters. The cluster shown has three worker nodes spread across different availability and fault domains, and therefore across different physical hosts. You can create up to 1000 nodes in a cluster.

  • Compute

    For the nodes of the Kubernetes cluster, choose shapes with the appropriate combination of OCPUs and memory, and provision local NVMe storage, block storage, or both, as needed.
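The VCN recommendations (CIDR blocks from the standard private address space, no overlap with networks you plan to connect to, and non-overlapping subnets inside the VCN) can be checked before you deploy. The following Python sketch uses only the standard ipaddress module; the check_cidrs function and all CIDR values are hypothetical examples, not values taken from the Terraform code.

```python
from __future__ import annotations

import ipaddress
from itertools import combinations

# RFC 1918 ranges: the standard private IPv4 address space.
PRIVATE = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def check_cidrs(vcn_cidrs: list[str], connected: list[str], subnets: list[str]) -> list[str]:
    """Return problems with a planned VCN layout (an empty list means none found)."""
    vcn = [ipaddress.ip_network(c) for c in vcn_cidrs]
    subs = [ipaddress.ip_network(s) for s in subnets]
    problems = []
    # VCN blocks should come from private address space.
    for net in vcn:
        if not any(net.subnet_of(p) for p in PRIVATE):
            problems.append(f"{net} is outside the private address space")
    # VCN blocks must not overlap each other.
    for a, b in combinations(vcn, 2):
        if a.overlaps(b):
            problems.append(f"VCN blocks {a} and {b} overlap")
    # VCN blocks must not overlap networks you plan to connect privately.
    for other in map(ipaddress.ip_network, connected):
        for net in vcn:
            if net.overlaps(other):
                problems.append(f"{net} overlaps connected network {other}")
    # Every subnet must sit inside a VCN block, and subnets must not overlap.
    for s in subs:
        if not any(s.subnet_of(net) for net in vcn):
            problems.append(f"subnet {s} is not inside any VCN CIDR block")
    for a, b in combinations(subs, 2):
        if a.overlaps(b):
            problems.append(f"subnets {a} and {b} overlap")
    return problems


# Hypothetical plan: one VCN block, an on-premises network, and two subnets.
print(check_cidrs(["10.0.0.0/16"], ["192.168.0.0/16"], ["10.0.10.0/24", "10.0.20.0/24"]))  # []
print(check_cidrs(["10.0.0.0/16"], ["10.0.128.0/17"], ["10.0.0.0/24", "10.0.0.128/25"]))
# ['10.0.0.0/16 overlaps connected network 10.0.128.0/17', 'subnets 10.0.0.0/24 and 10.0.0.128/25 overlap']
```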

Considerations

Consider the following points when deploying this reference architecture:

  • Performance

    Cluster autoscaling is based on resource booking (the CPU and memory that pods request), not on the actual utilization of the nodes. You can control the resource booking of each job by setting variables in the .gitlab-ci.yml file. For example, the following variables book 1 CPU and 4000M of memory for a job:

    variables:
      KUBERNETES_CPU_REQUEST: "1"
      KUBERNETES_MEMORY_REQUEST: "4000M"

  • Security

    Use policies that restrict who can access which Oracle Cloud Infrastructure (OCI) resources that your company has and how.

    Oracle Cloud Infrastructure Container Engine for Kubernetes is integrated with Oracle Cloud Infrastructure Identity and Access Management, which provides easy authentication with native OCI identity functionality.

  • Scalability

    You can scale out your application by increasing the number of worker nodes in the Kubernetes cluster, depending on the load. Similarly, you can scale in by reducing the number of worker nodes in the cluster. When you create a service on the Kubernetes cluster, you can create a load balancer to distribute service traffic among the nodes assigned to that service. In this architecture, cluster autoscaling adjusts the number of worker nodes based on deployment resource booking, which you can control by editing the .gitlab-ci.yml file.

    Note:

    Job resource booking set by using parameters in the .gitlab-ci.yml file should not exceed the maximum allowed bookings defined for GitLab Runners in the following lines of the locals.tf file:

    cpu_request_overwrite_max_allowed = "1"
    memory_request_overwrite_max_allowed = "4096M"

  • Cost

    Using Oracle Container Engine for Kubernetes and the Oracle container registry is free of cost. The nodes in the Kubernetes cluster are charged at the same rate as any other compute instances with the same shape.
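Because a job's resource booking should not exceed the maximums configured for the runner, a small pre-flight check can catch oversized requests before a pipeline runs. The following Python sketch compares KUBERNETES_CPU_REQUEST and KUBERNETES_MEMORY_REQUEST with cpu_request_overwrite_max_allowed and memory_request_overwrite_max_allowed, and estimates how many jobs fit on one node by requests alone. It is not the GitLab Runner implementation: its quantity parser handles only a subset of Kubernetes suffixes (m, k, M, G, Ki, Mi, Gi), and the functions and the node size in the example are hypothetical.

```python
from __future__ import annotations

import math
from fractions import Fraction

# Subset of Kubernetes quantity suffixes handled by this sketch (an assumption,
# not a complete implementation of the Kubernetes quantity format).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,  # binary multiples
    "k": 10**3, "M": 10**6, "G": 10**9,     # decimal multiples
    "m": Fraction(1, 1000),                 # milli, used for CPU
}

# Job variable (.gitlab-ci.yml) -> runner maximum (locals.tf).
PAIRS = [
    ("KUBERNETES_CPU_REQUEST", "cpu_request_overwrite_max_allowed"),
    ("KUBERNETES_MEMORY_REQUEST", "memory_request_overwrite_max_allowed"),
]


def parse_quantity(text: str) -> Fraction:
    """Convert '1', '500m', '4000M' or '4Gi' to an exact number of base units."""
    text = text.strip()
    for suffix, factor in _SUFFIXES.items():
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * factor
    return Fraction(text)


def check_job_booking(job_vars: dict[str, str], limits: dict[str, str]) -> list[str]:
    """Return one message per job variable that exceeds its runner maximum."""
    errors = []
    for var, limit_key in PAIRS:
        if var in job_vars and parse_quantity(job_vars[var]) > parse_quantity(limits[limit_key]):
            errors.append(f"{var}={job_vars[var]} exceeds {limit_key}={limits[limit_key]}")
    return errors


def jobs_per_node(node_cpu: str, node_mem: str, job_cpu: str, job_mem: str) -> int:
    """Upper bound on concurrent jobs per node, counting resource requests only."""
    by_cpu = parse_quantity(node_cpu) / parse_quantity(job_cpu)
    by_mem = parse_quantity(node_mem) / parse_quantity(job_mem)
    return math.floor(min(by_cpu, by_mem))


limits = {"cpu_request_overwrite_max_allowed": "1",
          "memory_request_overwrite_max_allowed": "4096M"}
print(check_job_booking({"KUBERNETES_CPU_REQUEST": "1",
                         "KUBERNETES_MEMORY_REQUEST": "4000M"}, limits))  # []
print(check_job_booking({"KUBERNETES_CPU_REQUEST": "2"}, limits))
# ['KUBERNETES_CPU_REQUEST=2 exceeds cpu_request_overwrite_max_allowed=1']
print(jobs_per_node("4", "32Gi", "1", "4000M"))  # 4 (hypothetical node: 4 CPUs, 32Gi)
```

The per-node estimate ignores system pods and resources that Kubernetes reserves on each node, so treat it as an upper bound when sizing the node pool.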

Deploy

The Terraform code for creating an Oracle Container Engine for Kubernetes (OKE) cluster with all dependent resources (networking and the worker node pool), and for deploying cluster autoscaling and GitLab Runners, is available in GitHub.

  • Deploy by using Oracle Cloud Infrastructure Resource Manager:
    1. Click Deploy to Oracle Cloud

      If you aren't already signed in, enter the tenancy and user credentials.

    2. Review and accept the terms and conditions.
    3. Select the region where you want to deploy the stack.
    4. Follow the on-screen prompts and instructions to create the stack.
    5. After creating the stack, click Terraform Actions, and select Plan.
    6. Wait for the job to be completed, and review the plan.

      To make any changes, return to the Stack Details page, click Edit Stack, and make the required changes. Then, run the Plan action again.

    7. If no further changes are necessary, return to the Stack Details page, click Terraform Actions, and select Apply.
  • Deploy using the Terraform code in GitHub:
    1. Go to GitHub.
    2. Clone or download the repository to your local computer.
    3. Follow the instructions in the README document.

Acknowledgments

  • Authors: Chandrashekar Avadhani, Andrei Ilas
  • Contributors: Ben Romine, Lukasz Feldman