Note:
- This tutorial requires access to Oracle Cloud. To sign up for a free account, see Get started with Oracle Cloud Infrastructure Free Tier.
- It uses example values for Oracle Cloud Infrastructure credentials, tenancy, and compartments. When completing your lab, substitute these values with ones specific to your cloud environment.
Install Tetragon and configure TracingPolicies on Oracle Container Engine for Kubernetes
Introduction
As eBPF-based tools become more popular for cloud native applications, this tutorial describes how to run Tetragon, an eBPF-based tool for security observability and enforcement, on Oracle Cloud Infrastructure (OCI).
What is eBPF and why is it popular?
An operating system kernel is usually the best place to observe and influence a system, as there are very few barriers between the tool performing a function and the activity it is trying to observe or influence. Executing within the kernel space often gives a program very low overhead, but at the cost of security and maintainability.
eBPF provides a method of introducing new functionality that can execute in a sandboxed yet privileged context without changing kernel source code or loading a kernel module. It essentially creates a paradigm similar to a virtual machine-based programming language. Much as modern Java programs are compiled into byte-code that the JVM (Java Virtual Machine) compiles to native code using a JIT compiler to get native performance, eBPF programs have a byte-code representation. eBPF is deeply tied to the Linux kernel, and the in-kernel JIT compiler compiles the eBPF byte-code into native code that can execute in kernel space.
eBPF uses an event-based model to load programs, and eBPF programs are written to "hook" into network events, system calls, and more. When an event that an eBPF program hooks into occurs, the eBPF program is loaded into the kernel after verification and JIT compilation. The verification step ensures that the program is safe to run, has the right privileges, and can run to completion, while the JIT compilation ensures native performance. In many cases, eBPF programs are written in higher-level languages and compiled into the byte-code representation. These are then loaded into a running kernel after JIT compilation, based on the events that the programs are hooked into.
Tetragon - eBPF based security observability and enforcement
Tetragon is a cloud-native, eBPF-based tool that performs security observability and enforcement. It is a component of the Cilium project. Using eBPF, Tetragon is able to filter and observe events and apply policies in real time without sending events to an agent running outside the kernel. Tetragon can address numerous security and observability use cases by filtering for events like a workload opening a network connection, accessing a file, or even starting a process inside a container. For instance, a shell process being started inside an application container could be considered a security event. It could be someone trying to troubleshoot an issue, or it could be malicious activity; either way, it is something that should trigger a security check to rule out an attack on the system. The same could be said about network connections being opened or files being read. Tetragon can trace and filter these activities while introducing little to no overhead, usually at the earliest stage at which these events can be detected in software.
Tetragon is ideally suited for Kubernetes workloads, and it runs as a daemonset on each node in the cluster. Tetragon can then pull metadata from the Kubernetes API server and correlate that metadata with the events observed within the kernel of each node. Tetragon makes it easy to set up real-time filters for these activities and more using TracingPolicies. TracingPolicy is a custom resource created by Tetragon that lets admins and DevSecOps teams create and deploy filters for kernel events as Kubernetes resources. A TracingPolicy can match system calls, process attributes, and arguments, and trigger an action on matches.
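As an illustration of the shape of a TracingPolicy, the sketch below is modeled on the file-monitoring examples shipped with the Tetragon project. The policy name is hypothetical, and the argument types and selector operators should be verified against the Tetragon documentation for your version:

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "monitor-etc-passwd"      # illustrative name
spec:
  kprobes:
  - call: "sys_openat"            # hook the openat system call
    syscall: true
    args:
    - index: 0
      type: "int"                 # directory file descriptor
    - index: 1
      type: "string"              # path being opened
    selectors:
    - matchArgs:
      - index: 1
        operator: "Equal"
        values:
        - "/etc/passwd"           # emit an event when this file is opened
```

Applying a policy like this causes Tetragon to emit an event whenever any process on the node opens the matched path, enriched with the pod and container metadata described above.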
Objective
- Learn how to set up Tetragon on an OKE cluster in OCI.
- Learn how to use TracingPolicies to observe and correlate events.
Prerequisites
- Sign up or sign in to your Oracle Cloud account
- Create an OKE cluster
Note: Tetragon can be deployed to Kubernetes clusters like Oracle Container Engine for Kubernetes (OKE) using the helm chart published by the Tetragon project. Once installed, the TracingPolicy CRD is created and Tetragon runs on the cluster nodes as a daemonset.
Prerequisites for Oracle Linux
OKE uses Oracle Linux, and Tetragon relies on BTF (BPF Type Format) support in the kernel. Recent Oracle Linux kernels include this out-of-the-box, so use a kernel that is 5.4.17-2136.305.3.el7uek or newer. Tetragon also does not support the Arm (linux/arm64) architecture and, at the time of writing, provides only x86 (linux/amd64) support. If you have Arm nodes in your OKE cluster, the daemon set pods on those nodes will stay in an Init:CrashLoopBackOff status.
Note: Recent versions of the OKE node images are based on kernels that include BTF support. This caveat applies only to clusters where the node OS has not been updated in a while, not to newly created clusters. If you are unsure, the best way to check for BTF support is to SSH to the node and run ls /sys/kernel/btf; you should see the kernel (vmlinux) as well as modules listed there. As a general note, Oracle Linux 8 based nodes are preferred for this deployment.
To check the version of the kernel that your nodes are running, run uname -a on the node. If you are running an older kernel, you can upgrade the version in the node pool configuration. However, this affects only newly created nodes; existing nodes are not upgraded automatically, to ensure continuity for the workloads that might be running on them. You can follow the node pool upgrade process to bring your existing nodes up to newer kernel versions.
- Once you have ensured that your nodes are running a recent kernel, you can install Tetragon using the Tetragon helm chart. You can also follow the instructions from Tetragon's GitHub page.

    helm repo add cilium https://helm.cilium.io
    helm repo update
    helm install tetragon cilium/tetragon -n kube-system
    kubectl rollout status -n kube-system ds/tetragon -w
- Once the daemon set is ready and the Tetragon pods are in the Running state, you can start listening to events on your nodes. Out-of-the-box, Tetragon monitors process execution. It emits the matching events in JSON format, and the logs can be observed with the following command (assuming you have jq installed):

    kubectl logs -n kube-system -l app.kubernetes.io/name=tetragon -c export-stdout -f | jq
- Depending on the activity occurring on your cluster, you will see a stream of JSON objects that represent these events. The following snippet is sample output from a cluster that was running ArgoCD, captured while it was cloning a Git repository.
    {
      "process_exec": {
        "process": {
          "exec_id": "MTAuMC4xMC4yMTg6OTE0MTQ2NjAzODU0MDcwOjEwNDA4Ng==",
          "pid": 104086,
          "uid": 999,
          "cwd": "/tmp/_argocd-repo/83c509d8-f9ba-48c3-a217-a9278134963e/",
          "binary": "/usr/bin/git",
          "arguments": "rev-parse HEAD",
          "flags": "execve clone",
          "start_time": "2022-06-07T17:03:42.519Z",
          "auid": 4294967295,
          "pod": {
            "namespace": "argocd",
            "name": "argocd-repo-server-7db4cc4b45-cpvlt",
            "container": {
              "id": "cri-o://1c361244fcb1d89c02ef297e69a13bd80fd4d575ae965a92979deec740711e17",
              "name": "argocd-repo-server",
              "image": {
                "id": "quay.io/argoproj/argocd@sha256:85d55980e70f8f7073e4ce529a7bbcf6d55e51f8a7fc4b45d698f0a7ffef0fea",
                "name": "quay.io/argoproj/argocd:v2.3.4"
              },
              "start_time": "2022-05-31T16:57:53Z",
              "pid": 319
            }
          },
          "docker": "1c361244fcb1d89c02ef297e69a13bd",
          "parent_exec_id": "MTAuMC4xMC4yMTg6MzA4OTk3NTAyODQyMTEzOjExMjQ3",
          "refcnt": 1
        }
      },
      "node_name": "10.0.10.218",
      "time": "2022-06-07T17:03:42.519Z"
    }
The JSON event stream is verbose and hard to read, but it is information-dense. There are several ways of ingesting this JSON data and deriving analytical information from it. The most obvious is the tetra CLI tool. Isovalent, the company behind Cilium and Tetragon, also offers a full-featured commercial product that can analyze and visualize this data to make it more actionable and easier to assimilate.
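As a minimal illustration of custom processing, the line-delimited JSON can be reduced to a compact summary with a few lines of scripting. The following Python sketch is our own (not a Tetragon API); the field names follow the sample event shown above:

```python
import json

def summarize_exec_events(lines):
    """Reduce raw Tetragon JSON event lines to (namespace/pod, binary, args)
    tuples for process_exec events; other event types are skipped."""
    summaries = []
    for line in lines:
        event = json.loads(line)
        exec_event = event.get("process_exec")
        if not exec_event:
            continue
        process = exec_event["process"]
        pod = process.get("pod", {})
        summaries.append((
            f"{pod.get('namespace', '?')}/{pod.get('name', '?')}",
            process["binary"],
            process.get("arguments", ""),
        ))
    return summaries

# Example with a trimmed-down event, mirroring the sample output above.
sample = json.dumps({
    "process_exec": {
        "process": {
            "binary": "/usr/bin/git",
            "arguments": "rev-parse HEAD",
            "pod": {"namespace": "argocd",
                    "name": "argocd-repo-server-7db4cc4b45-cpvlt"},
        }
    },
    "node_name": "10.0.10.218",
})
print(summarize_exec_events([sample]))
# [('argocd/argocd-repo-server-7db4cc4b45-cpvlt', '/usr/bin/git', 'rev-parse HEAD')]
```

In practice you would feed this from the kubectl logs pipeline shown earlier, one JSON object per line.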
Task 1: Install the tetra CLI
- The Tetragon CLI tool tetra is useful for filtering events by pod, host, namespace, or process. The CLI can be downloaded from the GitHub releases page. Download the tool based on your operating system and CPU architecture, and untar it to a standard location such as /usr/local/bin, or add the path to the binary to your PATH variable for your shell.
- Alternatively, if you have go installed on the workstation where you want to run the CLI, you can download and install it with the following commands:

    GOOS=$(go env GOOS)
    GOARCH=$(go env GOARCH)
    curl -L --remote-name-all https://github.com/cilium/tetragon/releases/latest/download/tetra-${GOOS}-${GOARCH}.tar.gz{,.sha256sum}
    sha256sum --check tetra-${GOOS}-${GOARCH}.tar.gz.sha256sum
    sudo tar -C /usr/local/bin -xzvf tetra-${GOOS}-${GOARCH}.tar.gz
    rm tetra-${GOOS}-${GOARCH}.tar.gz{,.sha256sum}
- After the CLI is installed, events can be pretty-printed by piping the JSON output to tetra getevents:

    kubectl logs -n kube-system ds/tetragon -c export-stdout -f | tetra getevents -o compact
The -o compact option displays compact output instead of JSON. The tool also allows you to restrict the output to certain namespaces, processes, and more. The complete list of flags is shown below:

    Usage:
      tetra getevents [flags]

    Flags:
          --color string        Colorize compact output. auto, always, or never (default "auto")
      -h, --help                help for getevents
          --host                Get host events
      -n, --namespace strings   Get events by Kubernetes namespaces
      -o, --output string       Output format. json or compact (default "json")
          --pod strings         Get events by pod name regex
          --process strings     Get events by process name regex
          --timestamps          Include timestamps in compact output

    Global Flags:
      -d, --debug                   Enable debug messages
          --server-address string   gRPC server address (default "localhost:54321")
Task 2: Configure TracingPolicies for FileAccess and Network Observability
TracingPolicies are custom resources that make it easy to set up real-time filters for kernel events. A TracingPolicy matches and filters system calls for observability and can also trigger an action on these matches. Tetragon offers a few examples that showcase this ability and can be used as a starting point for configuring your own TracingPolicies.
- Apply the example tracing policies for file access and network observability:

    kubectl apply -f https://raw.githubusercontent.com/cilium/tetragon/main/crds/examples/sys_write_follow_fd_prefix.yaml
    kubectl apply -f https://raw.githubusercontent.com/cilium/tetragon/main/crds/examples/tcp-connect.yaml
- With these additional TracingPolicies enabled, Tetragon starts tracing file access as well as network activity. The output below shows the activity seen from the kernel when a curl command is invoked. It shows the curl program accessing files like /etc/hosts and /etc/resolv.conf, opening TCP connections, and transferring data.

    $ kubectl logs -n kube-system ds/tetragon -c export-stdout -f | tetra getevents -o compact
    ...[output truncated]
    🚀 process default/xwing /usr/bin/curl -Lv https://cloud.oracle.com
    📬 open    default/xwing /usr/bin/curl /etc/ssl/openssl.cnf
    📪 close   default/xwing /usr/bin/curl
    📬 open    default/xwing /usr/bin/curl /etc/hosts
    📪 close   default/xwing /usr/bin/curl
    📬 open    default/xwing /usr/bin/curl /etc/resolv.conf
    📪 close   default/xwing /usr/bin/curl
    🔌 connect default/xwing /usr/bin/curl tcp 10.244.1.152:65175 -> 23.212.250.69:443
    📬 open    default/xwing /usr/bin/curl /etc/ssl/certs/ca-certificates.crt
    📪 close   default/xwing /usr/bin/curl
    📤 sendmsg default/xwing /usr/bin/curl tcp 10.244.1.152:65175 -> 23.212.250.69:443 bytes 517
    📤 sendmsg default/xwing /usr/bin/curl tcp 10.244.1.152:65175 -> 23.212.250.69:443 bytes 126
    📤 sendmsg default/xwing /usr/bin/curl tcp 10.244.1.152:65175 -> 23.212.250.69:443 bytes 109
    📤 sendmsg default/xwing /usr/bin/curl tcp 10.244.1.152:65175 -> 23.212.250.69:443 bytes 31
    🧹 close   default/xwing /usr/bin/curl tcp 10.244.1.152:65175 -> 23.212.250.69:443
    💥 exit    default/xwing /usr/bin/curl -Lv https://cloud.oracle.com 0
    💥 exit    default/xwing /bin/bash 0
    ...[output truncated]
As these events are monitored directly within the kernel, tracing these calls adds very little overhead, and very little can be obfuscated or masked by a malicious actor.
The primary downside of this approach is that the actions you can take, such as killing a process that reads a file, are reactive: we know about the event as it is happening, not before. Still, it is extremely powerful to have a low-overhead solution for filtering and matching events at the kernel level, and to be able to create policies that help us observe and act on them.
Troubleshooting
Error: If you see that Tetragon pods are in a CrashLoopBackOff state, this could be caused by one of two reasons.
Possible reason: The most likely reason is that this is occurring on Arm-based nodes, if you have them in your cluster. Tetragon does not yet run on Arm.
Troubleshoot:
- To confirm, use kubectl describe pod to inspect the init container named tetragon-operator. This container is likely failing and in a terminated state with an exit code of 1.
- To view the init container logs, use kubectl logs <pod_name> -c tetragon-operator -n kube-system. You might see the reason for the termination reported as standard_init_linux.go:228: exec user process caused: exec format error, indicating the binary is not meant for use on the Arm CPU architecture.
The second reason could be that you have an old kernel on your node that does not include BTF support. To verify whether this is in fact the issue, get the container logs for the failing container in the pod as described above. If lack of BTF support in the kernel is the issue, you will see an error message similar to:

    aborting. kernel autodiscovery failed: Kernel version BTF search failed kernel is not included in supported list. Use --btf option to specify BTF path and/or '--kernel' to specify kernel version
This is expected on nodes that have not had their OS updated recently or that run older versions of the OS. To resolve this, switch to the latest Oracle Linux 8 based OKE or platform images for your worker nodes. Update the node pool configuration with this choice and upgrade the nodes following the standard node pool upgrade process.
Related Links
- OKE documentation
- Tetragon documentation
- Oracle Cloud Free Tier
- Sign in to your Oracle Cloud Account
Acknowledgments
- Author - Jeevan Joseph (Senior Principal Product Manager)
More Learning Resources
Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.
For product documentation, visit Oracle Help Center.
Install Tetragon and configure TracingPolicies on Oracle Container Engine for Kubernetes
F74876-01
December 2022
Copyright © 2022, Oracle and/or its affiliates.