Note:
- This tutorial requires access to Oracle Cloud. To sign up for a free account, see Get started with Oracle Cloud Infrastructure Free Tier.
- It uses example values for Oracle Cloud Infrastructure credentials, tenancy, and compartments. When completing your lab, substitute these values with ones specific to your cloud environment.
Install Tetragon and configure TracingPolicies on Oracle Container Engine for Kubernetes
Introduction
As eBPF-based tools become more popular for cloud native applications, this tutorial describes how to run Tetragon, an eBPF-based tool for security observability and enforcement, on Oracle Cloud Infrastructure (OCI).
What is eBPF and why is it popular?
An operating system kernel is usually the best place to observe and influence a system, as there are very few barriers between the tool performing a function and the activity it is trying to observe or influence. Executing within the kernel space often gives a program very low overhead, but at the cost of security and maintainability.
eBPF provides a method of introducing new functionality that can execute in a sandboxed yet privileged context without changing kernel source code or loading a kernel module. It essentially creates a paradigm similar to a virtual machine-based programming language. Much as modern Java programs are compiled into byte-code that the JVM (Java Virtual Machine) compiles to native code using a JIT compiler to get native performance, eBPF programs have a byte-code representation. eBPF is deeply tied to the Linux kernel, and the in-kernel JIT compiler compiles the eBPF byte-code into native code that can execute in kernel space.
eBPF uses an event-based model to load programs, and eBPF programs are written to "hook" into network events, system calls, and more. When an event that an eBPF program hooks into occurs, the eBPF program is loaded into the kernel after verification and JIT compilation. The verification step ensures that the program is safe to run, has the right privileges, and can run to completion, while the JIT compilation ensures native performance. In many cases, eBPF programs are written in higher-level languages and compiled into the byte-code representation. These are then loaded into a running kernel after JIT compilation, based on the events that the programs are hooked into.
Tetragon - eBPF based security observability and enforcement
Tetragon is a cloud-native, eBPF-based tool that performs security observability and enforcement. It is a component of the Cilium project. Using eBPF, Tetragon is able to filter and observe events and apply policies in real time without sending events to an agent running outside the kernel. Tetragon can address numerous security and observability use cases by filtering for events like a workload opening a network connection, accessing a file, or even starting a process inside a container. For instance, a shell process being started inside an application container could be considered a security event. It could be someone trying to troubleshoot an issue, or it could be malicious activity; either way, it is something that should trigger a security check to rule out an attack on the system. The same could be said about network connections being opened or files being read. Tetragon can trace and filter these activities while introducing little to no overhead, usually at the earliest stage at which these events can be detected in software.
Tetragon is ideally suited for Kubernetes workloads, and it runs as a daemonset on each node in the cluster. Tetragon can then pull metadata from the Kubernetes API server and correlate that metadata with the events observed within the kernel of each node. Tetragon makes it easy to set up real-time filters for these activities and more using TracingPolicies. TracingPolicy is a custom resource created by Tetragon that lets admins and DevSecOps teams create and deploy filters for kernel events as Kubernetes resources. A TracingPolicy can match system calls, process attributes, and arguments, and trigger an action on matches.
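As an illustration of the shape of a TracingPolicy, the sketch below is modeled on the file-monitoring examples shipped with the Tetragon project. The policy name is hypothetical, and the argument types and selector operators should be verified against the Tetragon documentation for your version:

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "monitor-etc-passwd"      # illustrative name
spec:
  kprobes:
  - call: "sys_openat"            # hook the openat system call
    syscall: true
    args:
    - index: 0
      type: "int"                 # directory file descriptor
    - index: 1
      type: "string"              # path being opened
    selectors:
    - matchArgs:
      - index: 1
        operator: "Equal"
        values:
        - "/etc/passwd"           # emit an event when this file is opened
```

Applying a policy like this causes Tetragon to emit an event whenever any process on the node opens the matched path, enriched with the pod and container metadata described above.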
Objective
- Learn how to set up Tetragon on an OKE cluster in OCI.
- Learn how to use TracingPolicies to observe and correlate events.
Prerequisites
- Sign up or sign in to your Oracle Cloud account
- Create an OKE cluster
Note: Tetragon can be deployed to Kubernetes clusters like Oracle Container Engine for Kubernetes (OKE) using the helm chart published by the Tetragon project. Once installed, the TracingPolicy CRD is created and Tetragon runs on the cluster nodes as a daemonset.
Prerequisites for Oracle Linux
OKE uses Oracle Linux, and Tetragon relies on BTF (BPF Type Format) support in the kernel. Recent Oracle Linux kernels include this out-of-the-box, so use a kernel that is 5.4.17-2136.305.3.el7uek or newer. Tetragon also does not support the Arm (linux/arm64) architecture and, at the time of writing, provides only x86 (linux/amd64) support. If you have Arm nodes in your OKE cluster, the daemon set pods on those nodes will stay in an Init:CrashLoopBackOff status.
Note: Recent versions of the OKE node images are based on kernels that include BTF support. This caveat applies only to clusters where the node OS has not been updated in a while, not to newly created clusters. If you are unsure, the best way to check for BTF support is to SSH to the node and run ls /sys/kernel/btf; you should see the kernel (vmlinux) as well as modules listed there. As a general note, Oracle Linux 8 based nodes are preferred for this deployment.
To check the version of the kernel that your nodes are running, run uname -a on the node. If you are running an older kernel, you can upgrade the version in the node pool configuration. However, this affects only newly created nodes; existing nodes are not upgraded automatically, to ensure continuity for the workloads that might be running on them. You can follow the node pool upgrade process to bring your existing nodes up to newer kernel versions.
- Once you have ensured that your nodes are running a recent kernel, you can install Tetragon using the Tetragon helm chart. You can also follow the instructions from Tetragon's GitHub page.

    helm repo add cilium https://helm.cilium.io
    helm repo update
    helm install tetragon cilium/tetragon -n kube-system
    kubectl rollout status -n kube-system ds/tetragon -w
- Once the daemon set is ready and the Tetragon pods are in the Running state, you can start listening to events on your nodes. Out-of-the-box, Tetragon monitors process execution. It emits the matching events in JSON format, and the logs can be observed with the following command (assuming you have jq installed):

    kubectl logs -n kube-system -l app.kubernetes.io/name=tetragon -c export-stdout -f | jq
- Depending on the activity occurring on your cluster, you will see a stream of JSON objects that represent these events. The following snippet is sample output from a cluster that was running ArgoCD, captured while it was cloning a Git repository.
    {
      "process_exec": {
        "process": {
          "exec_id": "MTAuMC4xMC4yMTg6OTE0MTQ2NjAzODU0MDcwOjEwNDA4Ng==",
          "pid": 104086,
          "uid": 999,
          "cwd": "/tmp/_argocd-repo/83c509d8-f9ba-48c3-a217-a9278134963e/",
          "binary": "/usr/bin/git",
          "arguments": "rev-parse HEAD",
          "flags": "execve clone",
          "start_time": "2022-06-07T17:03:42.519Z",
          "auid": 4294967295,
          "pod": {
            "namespace": "argocd",
            "name": "argocd-repo-server-7db4cc4b45-cpvlt",
            "container": {
              "id": "cri-o://1c361244fcb1d89c02ef297e69a13bd80fd4d575ae965a92979deec740711e17",
              "name": "argocd-repo-server",
              "image": {
                "id": "quay.io/argoproj/argocd@sha256:85d55980e70f8f7073e4ce529a7bbcf6d55e51f8a7fc4b45d698f0a7ffef0fea",
                "name": "quay.io/argoproj/argocd:v2.3.4"
              },
              "start_time": "2022-05-31T16:57:53Z",
              "pid": 319
            }
          },
          "docker": "1c361244fcb1d89c02ef297e69a13bd",
          "parent_exec_id": "MTAuMC4xMC4yMTg6MzA4OTk3NTAyODQyMTEzOjExMjQ3",
          "refcnt": 1
        }
      },
      "node_name": "10.0.10.218",
      "time": "2022-06-07T17:03:42.519Z"
    }
The JSON event stream is verbose and hard to read, but it is information-dense. There are several ways of ingesting this JSON data and deriving analytical information from it. The most obvious is the tetra CLI tool. Isovalent, the company behind Cilium and Tetragon, also offers a full-featured commercial product that can analyze and visualize this data to make it more actionable and easier to assimilate.
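As a minimal illustration of custom processing, the line-delimited JSON can be reduced to a compact summary with a few lines of scripting. The following Python sketch is our own (not a Tetragon API); the field names follow the sample event shown above:

```python
import json

def summarize_exec_events(lines):
    """Reduce raw Tetragon JSON event lines to (namespace/pod, binary, args)
    tuples for process_exec events; other event types are skipped."""
    summaries = []
    for line in lines:
        event = json.loads(line)
        exec_event = event.get("process_exec")
        if not exec_event:
            continue
        process = exec_event["process"]
        pod = process.get("pod", {})
        summaries.append((
            f"{pod.get('namespace', '?')}/{pod.get('name', '?')}",
            process["binary"],
            process.get("arguments", ""),
        ))
    return summaries

# Example with a trimmed-down event, mirroring the sample output above.
sample = json.dumps({
    "process_exec": {
        "process": {
            "binary": "/usr/bin/git",
            "arguments": "rev-parse HEAD",
            "pod": {"namespace": "argocd",
                    "name": "argocd-repo-server-7db4cc4b45-cpvlt"},
        }
    },
    "node_name": "10.0.10.218",
})
print(summarize_exec_events([sample]))
# [('argocd/argocd-repo-server-7db4cc4b45-cpvlt', '/usr/bin/git', 'rev-parse HEAD')]
```

In practice you would feed this from the kubectl logs pipeline shown earlier, one JSON object per line.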
Task 1: Install the tetra CLI
- The Tetragon CLI tool tetra is useful for filtering events by pod, host, namespace, or process. The CLI can be downloaded from the GitHub releases page. Download the tool based on your operating system and CPU architecture, and untar it to a standard location such as /usr/local/bin, or add the path to the binary to your PATH variable for your shell.
- Alternatively, if you have go installed on the workstation where you want to run the CLI, you can download and install it with the following commands:

    GOOS=$(go env GOOS)
    GOARCH=$(go env GOARCH)
    curl -L --remote-name-all https://github.com/cilium/tetragon/releases/latest/download/tetra-${GOOS}-${GOARCH}.tar.gz{,.sha256sum}
    sha256sum --check tetra-${GOOS}-${GOARCH}.tar.gz.sha256sum
    sudo tar -C /usr/local/bin -xzvf tetra-${GOOS}-${GOARCH}.tar.gz
    rm tetra-${GOOS}-${GOARCH}.tar.gz{,.sha256sum}
- After the CLI is installed, events can be pretty-printed by piping the JSON output to tetra getevents:

    kubectl logs -n kube-system ds/tetragon -c export-stdout -f | tetra getevents -o compact
The -o compact option displays compact output instead of JSON. The tool also allows you to restrict the output to certain namespaces, processes, and more. The complete list of flags is shown below:

    Usage:
      tetra getevents [flags]

    Flags:
          --color string        Colorize compact output. auto, always, or never (default "auto")
      -h, --help                help for getevents
          --host                Get host events
      -n, --namespace strings   Get events by Kubernetes namespaces
      -o, --output string       Output format. json or compact (default "json")
          --pod strings         Get events by pod name regex
          --process strings     Get events by process name regex
          --timestamps          Include timestamps in compact output

    Global Flags:
      -d, --debug                   Enable debug messages
          --server-address string   gRPC server address (default "localhost:54321")
Task 2: Configure TracingPolicies for FileAccess and Network Observability
TracingPolicies are custom resources that make it easy to set up real-time filters for kernel events. A TracingPolicy matches and filters system calls for observability and can also trigger an action on these matches. Tetragon offers a few examples that showcase this ability and can be used as a starting point for configuring your own TracingPolicies.
- Apply the example tracing policies for file access and network observability:

    kubectl apply -f https://raw.githubusercontent.com/cilium/tetragon/main/crds/examples/sys_write_follow_fd_prefix.yaml
    kubectl apply -f https://raw.githubusercontent.com/cilium/tetragon/main/crds/examples/tcp-connect.yaml
- With these additional TracingPolicies enabled, Tetragon starts tracing file access as well as network activity. The output below shows the activity seen from the kernel when a curl command is invoked. It shows the curl program accessing files like /etc/hosts and /etc/resolv.conf, opening TCP connections, and transferring data.

    $ kubectl logs -n kube-system ds/tetragon -c export-stdout -f | tetra getevents -o compact
    ...[output truncated]
    🚀 process default/xwing /usr/bin/curl -Lv https://cloud.oracle.com
    📬 open    default/xwing /usr/bin/curl /etc/ssl/openssl.cnf
    📪 close   default/xwing /usr/bin/curl
    📬 open    default/xwing /usr/bin/curl /etc/hosts
    📪 close   default/xwing /usr/bin/curl
    📬 open    default/xwing /usr/bin/curl /etc/resolv.conf
    📪 close   default/xwing /usr/bin/curl
    🔌 connect default/xwing /usr/bin/curl tcp 10.244.1.152:65175 -> 23.212.250.69:443
    📬 open    default/xwing /usr/bin/curl /etc/ssl/certs/ca-certificates.crt
    📪 close   default/xwing /usr/bin/curl
    📤 sendmsg default/xwing /usr/bin/curl tcp 10.244.1.152:65175 -> 23.212.250.69:443 bytes 517
    📤 sendmsg default/xwing /usr/bin/curl tcp 10.244.1.152:65175 -> 23.212.250.69:443 bytes 126
    📤 sendmsg default/xwing /usr/bin/curl tcp 10.244.1.152:65175 -> 23.212.250.69:443 bytes 109
    📤 sendmsg default/xwing /usr/bin/curl tcp 10.244.1.152:65175 -> 23.212.250.69:443 bytes 31
    🧹 close   default/xwing /usr/bin/curl tcp 10.244.1.152:65175 -> 23.212.250.69:443
    💥 exit    default/xwing /usr/bin/curl -Lv https://cloud.oracle.com 0
    💥 exit    default/xwing /bin/bash 0
    ...[output truncated]
As these events are monitored directly within the kernel, tracing these calls adds very little overhead, and very little can be obfuscated or masked by a malicious actor.
The primary downside of this approach is that the actions you can take, such as killing a process that reads a file, are reactive: we know about the event as it is happening, not before. Still, it is extremely powerful to have a low-overhead solution for filtering and matching events at the kernel level, and to be able to create policies that help us observe and act on them.
Troubleshooting
Error: If you see that Tetragon pods are in a CrashLoopBackOff state, this could be caused by one of two reasons.
Possible reason: The most likely reason is that this is occurring on Arm-based nodes, if you have them in your cluster. Tetragon does not yet run on Arm.
Troubleshoot:
- To confirm, use kubectl describe pod to inspect the init container named tetragon-operator. This container is likely failing and in a terminated state with an exit code of 1.
- To view the init container logs, use kubectl logs <pod_name> -c tetragon-operator -n kube-system. You might see the reason for the termination reported as standard_init_linux.go:228: exec user process caused: exec format error, indicating the binary is not meant for use on the Arm CPU architecture.
The second reason could be that you have an old kernel on your node that does not include BTF support. To verify whether this is in fact the issue, get the container logs for the failing container in the pod as described above. If lack of BTF support in the kernel is the issue, you will see an error message similar to:

    aborting. kernel autodiscovery failed: Kernel version BTF search failed kernel is not included in supported list. Use --btf option to specify BTF path and/or '--kernel' to specify kernel version
This is expected on nodes that have not had their OS updated recently or that run older versions of the OS. To resolve this, switch to the latest Oracle Linux 8 based OKE or platform images for your worker nodes. Update the node pool configuration with this choice and upgrade the nodes following the standard node pool upgrade process.
Related Links
- OKE documentation
- Tetragon documentation
- Oracle Cloud Free Tier
- Sign in to your Oracle Cloud Account
Acknowledgments
- Author - Jeevan Joseph (Senior Principal Product Manager)
More Learning Resources
Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.
For product documentation, visit Oracle Help Center.
Install Tetragon and configure TracingPolicies on Oracle Container Engine for Kubernetes
F74876-01
December 2022
Copyright © 2022, Oracle and/or its affiliates.