Install Tetragon and configure TracingPolicies on Oracle Container Engine for Kubernetes

Introduction

As eBPF-based tools become more popular for cloud native applications, this tutorial describes how to run Tetragon, an eBPF-based tool for security observability and enforcement, on Oracle Cloud Infrastructure (OCI).

An operating system kernel is usually the best place to observe and influence a system as there are very few barriers between the tool performing a function and the activities it is trying to observe and influence. Executing within the kernel space often lets a program have very low overhead but at the cost of security and maintenance.

eBPF provides a way to introduce new functionality that executes in a sandboxed and privileged context without changing kernel source code or loading a kernel module. It works on a model similar to a virtual machine-based programming language. Just as modern Java programs are compiled into byte-code that the JVM (Java Virtual Machine) compiles to native code with a JIT compiler to get native performance, eBPF programs have a byte-code representation. eBPF is deeply tied to the Linux kernel, and the in-kernel JIT compiler compiles the eBPF byte-code into native code that can execute in the kernel space.

eBPF uses an event-based model, and eBPF programs are written to "hook" into network events, system calls, and more. Before a program can run, it is loaded into the kernel, where it goes through verification and JIT compilation; it then executes whenever an event that it hooks into occurs. The verification step ensures that the program is safe to run, has the right privileges, and can run to completion, while the JIT compilation ensures native performance. In many cases, eBPF programs are written in higher-level languages and compiled into the byte-code representation. These are then loaded into a running kernel and attached to the events that the programs hook into.

Tetragon - eBPF based security observability and enforcement

Tetragon is a cloud-native, eBPF-based tool that performs security observability and enforcement. It is a component of the Cilium project. Using eBPF, Tetragon can filter and observe events and apply policies in real time without sending events to an agent running outside the kernel. Tetragon can address numerous security and observability use cases by filtering for events such as a workload opening a network connection, accessing a file, or even starting a process inside a container. For instance, a shell process being started inside an application container could be considered a security event. It could be someone troubleshooting an issue or it could be malicious activity. Either way, it is something that should trigger a security check to rule out an attack on the system. The same could be said about network connections being opened or files being read. Tetragon can trace and filter these activities while introducing little to no overhead, and usually at the earliest stage at which these events can be detected in software.
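The shell-in-a-container case described above can be sketched as a small filter over a single exported event. This is an illustration only: it assumes the event is a JSON object with a top-level process_exec key containing a process object with binary and pod fields, and the list of shell names and the sample event are hypothetical.

```python
import json
import os

# Assumption: binaries treated as interactive shells; adjust for your images.
SHELL_NAMES = {"sh", "bash", "dash", "zsh", "ash"}


def is_shell_exec(event: dict) -> bool:
    """Return True if the event is a process_exec for a shell running in a pod."""
    process = event.get("process_exec", {}).get("process", {})
    binary = os.path.basename(process.get("binary", ""))
    # Host processes carry no pod metadata, so they are ignored here.
    return binary in SHELL_NAMES and "pod" in process


# Hypothetical sample event, trimmed to the fields this sketch reads.
sample = json.loads(
    '{"process_exec": {"process": {"binary": "/bin/bash", '
    '"pod": {"namespace": "default", "name": "web-0"}}}, '
    '"node_name": "node-1"}'
)

print(is_shell_exec(sample))  # prints True
```

A check like this only flags the event for review; deciding whether the shell was legitimate troubleshooting or an attack still needs context from outside the event.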

Tetragon is ideally suited for Kubernetes workloads, and it runs as a DaemonSet on each node in the cluster. Tetragon can then pull metadata from the Kubernetes API server and correlate that metadata with the events observed within the kernel of each node. Tetragon makes it easy to set up real-time filters for these activities and more using TracingPolicies. TracingPolicy is a custom resource created by Tetragon that lets admins and DevSecOps teams create and deploy filters for kernel events as Kubernetes resources. A TracingPolicy can match system calls, process attributes, and arguments, and trigger an action on matches.

Objective

Prerequisites

Prerequisites for Oracle Linux

OKE uses Oracle Linux, and Tetragon relies on BTF (BPF Type Format) support in the kernel. Recent Oracle Linux kernels include this out of the box, so users should use kernel 5.4.17-2136.305.3.el7uek or newer. At the time of writing, Tetragon supports only the x86 (linux/amd64) architecture and does not support Arm (linux/arm64). If you have Arm nodes in your OKE cluster, the Tetragon pods on those nodes will stay in an Init:CrashLoopBackOff status.

Note: Recent versions of the OKE node images are based on kernels that include BTF support. This caveat applies only to clusters where the node OS has not been updated in a while, not to newly created clusters. If you are unsure, the best way to check for BTF support is to SSH to the node and run ls /sys/kernel/btf. You should see the kernel (vmlinux) as well as modules listed there. As a general note, Oracle Linux 8 based nodes are preferred for this deployment.
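The same BTF check can be scripted when Python is available on the node. The sketch below simply tests for the vmlinux entry mentioned above; the function name has_btf is made up for this example.

```python
from pathlib import Path


def has_btf(btf_dir: str = "/sys/kernel/btf") -> bool:
    """Return True if the kernel's BTF data (vmlinux) is present under btf_dir."""
    return (Path(btf_dir) / "vmlinux").exists()


if __name__ == "__main__":
    print("BTF available" if has_btf() else "BTF not found")
```

The directory is a parameter only so that the check can be exercised against a test directory; on a real node the default path is the one to use.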

To check the version of the kernel that your nodes are running, run uname -a on the node. If you are running an older version of the kernel, you can upgrade the version in the node pool configuration. However, this affects only newly created nodes; existing nodes are not upgraded automatically, to ensure continuity for the workloads that might be running on them. You can follow the node pool upgrade process to bring your existing nodes up to the newer kernel versions.
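To compare a node's kernel release against the minimum version given above, a rough numeric comparison is usually enough. The sketch below assumes that UEK release strings order correctly when their leading numeric fields are compared left to right; the helper names are illustrative, not part of any Oracle or Tetragon tooling.

```python
import platform
import re

# Minimum kernel named in this tutorial.
MIN_UEK = "5.4.17-2136.305.3.el7uek"


def kernel_tuple(release: str) -> tuple:
    """Extract the leading numeric fields of a kernel release string."""
    fields = []
    for part in re.split(r"[.-]", release):
        if not part.isdigit():
            break  # stop at the first non-numeric field, e.g. "el7uek"
        fields.append(int(part))
    return tuple(fields)


def meets_minimum(release: str, minimum: str = MIN_UEK) -> bool:
    """Return True if release is at least the minimum, compared field by field."""
    return kernel_tuple(release) >= kernel_tuple(minimum)


if __name__ == "__main__":
    current = platform.release()  # same value as `uname -r`
    print(current, "OK" if meets_minimum(current) else "older than minimum")
```

A passing comparison is only a hint. The presence of /sys/kernel/btf/vmlinux on the node remains the direct evidence that BTF is available.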

The JSON event stream is verbose and hard to read, but it is information dense. There are several ways to ingest this JSON data and derive analytical information from it. The obvious one is to use tetra, the Tetragon CLI tool. Isovalent, the company behind Cilium and Tetragon, also offers a full-featured commercial product that can analyze and visualize this data to make it more actionable and easier to assimilate.
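As one example of deriving a summary from the raw stream, the sketch below counts events per event type and workload. It assumes one JSON object per line, with the event type as a top-level key (such as process_exec) whose value contains a process object carrying pod metadata. The sample lines are hypothetical and trimmed to the fields the code reads.

```python
import json
from collections import Counter


def count_events(lines):
    """Count events per (event type, namespace/pod) from JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # The event type is taken to be the top-level key holding a "process" object.
        for kind, body in event.items():
            if isinstance(body, dict) and "process" in body:
                pod = body["process"].get("pod", {})
                workload = f'{pod.get("namespace", "-")}/{pod.get("name", "-")}'
                counts[(kind, workload)] += 1
    return counts


# Hypothetical sample stream.
SAMPLE = [
    '{"process_exec": {"process": {"binary": "/bin/bash", "pod": {"namespace": "default", "name": "web-0"}}}, "node_name": "node-1"}',
    '{"process_exit": {"process": {"binary": "/bin/bash", "pod": {"namespace": "default", "name": "web-0"}}}, "node_name": "node-1"}',
    '{"process_exec": {"process": {"binary": "/usr/bin/curl", "pod": {"namespace": "default", "name": "web-0"}}}, "node_name": "node-1"}',
]

for (kind, workload), n in sorted(count_events(SAMPLE).items()):
    print(f"{kind:14} {workload:16} {n}")
```

The function accepts any iterable of lines, so it can read from a saved log file or from standard input just as well as from the in-memory sample.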

Task 1: Install the tetra CLI

Task 2: Configure TracingPolicies for FileAccess and Network Observability

TracingPolicies are custom resources that make it easy to set up real-time filters for kernel events. A TracingPolicy matches and filters system calls for observability and can also trigger an action on these matches. Tetragon offers a few examples that showcase this ability and can be used as a starting point to configure your TracingPolicies.
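To give a feel for the shape of such a resource, the sketch below builds a file-access TracingPolicy as a Python dictionary and prints it as JSON, which kubectl accepts in place of YAML. The field names (kprobes, call, args, selectors, matchArgs, matchActions) and the security_file_permission hook are modeled on the upstream Tetragon examples and should be treated as assumptions to verify against the CRD version installed in your cluster. The policy name and path prefix are hypothetical.

```python
import json


def file_access_policy(name, prefixes, action=None):
    """Build a TracingPolicy manifest that matches file access under given prefixes."""
    selector = {
        "matchArgs": [
            # Match when the file argument (index 0) starts with one of the prefixes.
            {"index": 0, "operator": "Prefix", "values": list(prefixes)}
        ]
    }
    if action is not None:
        # Optional enforcement, e.g. "Sigkill"; omit to observe only.
        selector["matchActions"] = [{"action": action}]
    return {
        "apiVersion": "cilium.io/v1alpha1",
        "kind": "TracingPolicy",
        "metadata": {"name": name},
        "spec": {
            "kprobes": [
                {
                    "call": "security_file_permission",
                    "syscall": False,
                    "args": [
                        {"index": 0, "type": "file"},
                        {"index": 1, "type": "int"},
                    ],
                    "selectors": [selector],
                }
            ]
        },
    }


policy = file_access_policy("watch-etc", ["/etc/"])
print(json.dumps(policy, indent=2))
```

Starting without an action keeps the policy observe-only, which is a safer way to confirm that the selectors match what you expect before adding enforcement.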

Because these events are monitored directly from within the kernel, tracing them adds very little overhead, and very little can be obfuscated or masked by a malicious actor.

The primary downside to this approach is that the actions you can take, such as killing a process that reads a file, are reactive: you learn about the event as it is happening, not before. Still, it is extremely powerful to have a low-overhead solution for filtering and matching events at the kernel level and to be able to create policies that help you observe and act on them.

Troubleshooting

Error: If you see that Tetragon pods are in a CrashLoopBackOff state, there are two likely causes.

Possible reason: The most likely reason is that the pods are running on Arm-based nodes, if you have them in your cluster. Tetragon does not yet run on Arm.
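One way to confirm whether Arm nodes are present is to group the cluster's nodes by CPU architecture. The sketch below reads a node list in the JSON form produced by kubectl get nodes -o json and uses the status.nodeInfo.architecture field of each node. The node names in the sample are hypothetical.

```python
import json


def nodes_by_arch(node_list: dict) -> dict:
    """Group node names by the architecture reported in status.nodeInfo."""
    result = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        arch = node.get("status", {}).get("nodeInfo", {}).get("architecture", "unknown")
        result.setdefault(arch, []).append(name)
    return result


# Hypothetical, trimmed output of `kubectl get nodes -o json`.
sample_nodes = json.loads("""
{"items": [
  {"metadata": {"name": "node-a"}, "status": {"nodeInfo": {"architecture": "amd64"}}},
  {"metadata": {"name": "node-b"}, "status": {"nodeInfo": {"architecture": "arm64"}}},
  {"metadata": {"name": "node-c"}, "status": {"nodeInfo": {"architecture": "amd64"}}}
]}
""")

print(nodes_by_arch(sample_nodes))
# prints {'amd64': ['node-a', 'node-c'], 'arm64': ['node-b']}
```

Any node listed under arm64 is a candidate for the failing pods described here.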

Troubleshoot:

The second reason could be that you have an old kernel on your node and BTF support is not included. To verify whether this is in fact the issue, get the container logs for the failing container in the pod as described above. If lack of BTF support in the kernel is the issue, you will see an error message similar to the following:

aborting. kernel autodiscovery failed: Kernel version BTF search failed kernel is not included in supported list.
Use --btf option to specify BTF path and/or '--kernel' to specify kernel version

This is expected on nodes that have not had their OS updated recently or that run older versions of the OS. To resolve this, you can switch to the latest Oracle Linux 8 based OKE or platform images for your worker nodes. The node pool configuration needs to be updated with this choice, and the nodes upgraded following the standard node pool upgrade process.

Acknowledgments

More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.