Run Nextflow Pipelines on OCI using OKE and OCI File Storage Service

Introduction

There are many use cases where you need to run multi-step data pipelines that process large files, track intermediate results, and support parallel execution. While Oracle Cloud Infrastructure (OCI) Functions is well-suited for event-driven and stateless tasks, some workflows require a system designed for orchestration and dataflow across multiple steps.

Nextflow addresses this need with built-in support for containers, input/output handling, and scalable execution on Oracle Cloud Infrastructure Kubernetes Engine (OKE). It enables reproducible pipelines that run efficiently in distributed environments.

Objectives

• Prepare OKE worker nodes to mount an OCI File Storage service volume.
• Configure a namespace, Persistent Volume and claim, service account, and RBAC for Nextflow on OKE.
• Run a containerized Nextflow pipeline as a Kubernetes job, monitor its pods, and use the shared volume for inputs and outputs.

Prerequisites

• Access to an OCI tenancy with a running OKE cluster and SSH access to its worker nodes.
• An OCI File Storage service file system with a mount target reachable from the cluster subnet.
• A bastion host (or VM) with access to the cluster's Kubernetes API endpoint.

Task 1: Prepare the Worker Node(s)

OKE nodes must have Network File System (NFS) client utilities installed to mount the OCI File Storage service volume.

SSH into each worker node and run the following commands.

sudo yum install -y nfs-utils
sudo systemctl enable --now nfs-client.target || true

Note: You do not need to mount the OCI File Storage service manually; OKE handles mounting automatically through the Persistent Volume.

Task 2: Set up the OCI Bastion Host

If you are starting with a new bastion host, install the following:

• kubectl, configured with a kubeconfig for your OKE cluster.
• The OCI Command Line Interface (OCI CLI), which you can use to generate that kubeconfig.

Security Recommendations:

Task 3: Configure Nextflow from the Bastion

Run all the steps from your bastion VM.

  1. Create a Namespace.

    Run the following command to create a dedicated namespace.

    kubectl create namespace nextflow-ns
    
  2. Configure Persistent Volume and Persistent Volume Claim (PVC).

    1. Create a file named nextflow-fss.yaml and download the content from here: nextflow-fss.yaml. (A minimal sketch of this manifest follows after this step.)

    2. Make sure to replace <MOUNT_TARGET_IP> with the actual mount target IP (for example, 10.0.10.163), found in the OCI File Storage service mount target details in the OCI Console.

      Mount target IP

    3. Also take note of the export path and replace its placeholder in the same file.

    4. Run the following command to apply the file.

      kubectl apply -f nextflow-fss.yaml
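
      For reference, nextflow-fss.yaml defines an NFS-backed Persistent Volume pointing at the mount target, plus a matching claim. The following is an illustrative sketch, not the downloaded file; the resource names nextflow-pv and nextflow-pvc are assumptions, so keep the names from the actual file.

      apiVersion: v1
      kind: PersistentVolume
      metadata:
        name: nextflow-pv                  # assumed name
      spec:
        capacity:
          storage: 50Gi
        accessModes: ["ReadWriteMany"]
        persistentVolumeReclaimPolicy: Retain
        nfs:
          server: <MOUNT_TARGET_IP>        # mount target IP from the OCI Console
          path: /nextflow-fss              # replace with your export path
      ---
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: nextflow-pvc                 # assumed name
        namespace: nextflow-ns
      spec:
        accessModes: ["ReadWriteMany"]
        storageClassName: ""
        volumeName: nextflow-pv
        resources:
          requests:
            storage: 50Gi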
      
  3. Create a Service Account and Role-based access control (RBAC).

    Create these to ensure that the Nextflow job running in your OKE cluster has the permissions it needs to interact with Kubernetes resources during pipeline execution.

    Nextflow, when running on OKE, needs to:

    • Launch pods for each process step.
    • Monitor their status.
    • Access logs.
    • Bind to the PVC.

    However, by default, Kubernetes jobs do not have permission to perform these actions unless they are explicitly granted through a service account with the proper RBAC bindings.

    Run the following commands to create a service account, a role, and a role binding.

    kubectl create serviceaccount nextflow-sa -n nextflow-ns
    
    kubectl apply -f - <<EOF
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: nextflow-pod-role
      namespace: nextflow-ns
    rules:
    - apiGroups: [""]
      resources: ["pods", "pods/log", "pods/status", "persistentvolumeclaims"]
      verbs: ["create", "get", "watch", "list", "delete"]
    EOF

    kubectl apply -f - <<EOF
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: nextflow-pod-binding
      namespace: nextflow-ns
    subjects:
    - kind: ServiceAccount
      name: nextflow-sa
      namespace: nextflow-ns
    roleRef:
      kind: Role
      name: nextflow-pod-role
      apiGroup: rbac.authorization.k8s.io
    EOF
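
    Optionally, verify the binding with kubectl's built-in access check. This quick sanity test is an addition to the lab steps:

    kubectl auth can-i create pods \
      --as=system:serviceaccount:nextflow-ns:nextflow-sa \
      -n nextflow-ns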
    
  4. (Optional) Create Test Data Files for Nextflow Pipeline.

    Run the following commands to create some files for later use in Nextflow. These paths assume the file system export is mounted at /mnt/nextflow-fss on the bastion (see the mount sketch after this step).

    mkdir -p /mnt/nextflow-fss/data
    echo -e "line1\nline2\nline3" > /mnt/nextflow-fss/data/test1.txt
    echo -e "nextflow\nrocks"     > /mnt/nextflow-fss/data/test2.txt
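
    If the export is not yet mounted on the bastion, a standard NFS mount along the following lines makes it available. It reuses the mount target IP and export path from nextflow-fss.yaml; /nextflow-fss is an assumed export path:

    sudo mkdir -p /mnt/nextflow-fss
    sudo mount -t nfs <MOUNT_TARGET_IP>:/nextflow-fss /mnt/nextflow-fss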
    
  5. Create the Pipeline Files.

    This section defines the Nextflow pipeline logic (main.nf) and its Kubernetes-specific configuration (nextflow.config), specifying how the workflow should run, which container to use, and how to mount shared storage in the cluster.

    Create and download these files (main.nf and nextflow.config) on your bastion VM from here:
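
    For orientation, a minimal pipeline consistent with this tutorial's test data and .count outputs could look like the following sketch. It is illustrative, not the exact downloadable content; the process name COUNT_LINES and the claim name nextflow-pvc are assumptions.

    // main.nf (sketch): count the lines of each input file and publish <name>.count
    nextflow.enable.dsl = 2

    params.input = '/mnt/nextflow-fss/data/*.txt'

    process COUNT_LINES {
        // copy results next to the inputs on the shared volume
        publishDir '/mnt/nextflow-fss/data', mode: 'copy'

        input:
        path txt

        output:
        path "${txt}.count"

        script:
        """
        wc -l < ${txt} > ${txt}.count
        """
    }

    workflow {
        Channel.fromPath(params.input) | COUNT_LINES
    }

    // nextflow.config (sketch): run each task as a pod in the cluster
    process.executor = 'k8s'

    k8s {
        namespace        = 'nextflow-ns'
        serviceAccount   = 'nextflow-sa'
        storageClaimName = 'nextflow-pvc'      // assumed PVC name from nextflow-fss.yaml
        storageMountPath = '/mnt/nextflow-fss'
    }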

  6. Create a Kubernetes ConfigMap.

    Create a Kubernetes ConfigMap to package the main.nf and nextflow.config files so they can be injected into the Nextflow pod at runtime.

    kubectl create configmap nextflow-code \
    --from-file=main.nf \
    --from-file=nextflow.config \
    -n nextflow-ns \
    --dry-run=client -o yaml | kubectl apply -f -
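
    The --dry-run=client -o yaml | kubectl apply -f - pattern makes this command idempotent: rerunning it after you edit main.nf or nextflow.config updates the existing ConfigMap in place.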
    
  7. Create and Run Nextflow Job YAML.

    This section defines the Kubernetes job that runs the Nextflow workflow inside a container. It uses the previously created service account for permissions, mounts the shared volume and ConfigMap, and sets the working directory and environment variables needed for Kubernetes execution.

    1. Create a file named nextflow-job.yaml and download the content from here: nextflow-job.yaml. (An illustrative sketch of this manifest follows after this step.)

    2. Run the following command to apply the file.

      kubectl apply -f nextflow-job.yaml
      

    Applying this file creates the job, which executes the nextflow run command with the mounted pipeline code and configuration, launching each pipeline process as a Kubernetes pod and handling input/output through the mounted OCI File Storage service volume.
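
    For orientation, a job manifest along these lines ties the earlier pieces together. This is a sketch, not the downloadable file: the image tag and the claim name nextflow-pvc are assumptions.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: nextflow-job
      namespace: nextflow-ns
    spec:
      backoffLimit: 0
      template:
        spec:
          serviceAccountName: nextflow-sa      # RBAC identity from Task 3
          restartPolicy: Never
          containers:
          - name: nextflow
            image: nextflow/nextflow:24.10.0   # assumed tag; pin the version you tested
            command: ["nextflow", "run", "/pipeline/main.nf", "-c", "/pipeline/nextflow.config"]
            env:
            - name: NXF_WORK                   # keep the work directory on the shared volume
              value: /mnt/nextflow-fss/work
            volumeMounts:
            - name: fss
              mountPath: /mnt/nextflow-fss
            - name: code
              mountPath: /pipeline
          volumes:
          - name: fss
            persistentVolumeClaim:
              claimName: nextflow-pvc          # assumed claim name
          - name: code
            configMap:
              name: nextflow-code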

  8. Monitor the pod execution.

    You can watch the launched pods and follow the job log using the following commands.

    kubectl get pods -n nextflow-ns -w
    kubectl logs -n nextflow-ns -l job-name=nextflow-job --tail=100
    
  9. Find the output file.

    You should find the .count output files using the following command.

    ls /mnt/nextflow-fss/data/*.count
    
  10. Cleanup for retesting.

    # Remove old job and pods
    kubectl delete job --all -n nextflow-ns
    kubectl delete pod --all -n nextflow-ns
    
    # Delete only count files to rerun
    sudo find /mnt/nextflow-fss -name "*.count" -type f -delete
    

Note: If you make changes to the main.nf or nextflow.config files, recreate the ConfigMap as well before rerunning the job.

Troubleshooting tips:

Task 4: Evaluate CPU Scheduling and Pod Parallelism

Nextflow parallelizes processes by launching a separate pod for each task. If your OKE node has limited CPU resources, such as only 1 vCPU, Kubernetes can only schedule one pod at a time if each pod requests a full CPU.

During execution, you may see the following warning in the job log.

WARN: K8s pod cannot be scheduled -- 0/1 nodes are available: 1 Insufficient cpu.

Why this happens:

• Each Nextflow task pod carries a CPU request (from the process cpus directive, which defaults to 1), and the scheduler only places a pod on a node with that much unallocated CPU.
• Part of a node's vCPU is reserved for system components, so a 1 vCPU node cannot run two such pods at once; the remaining task pods stay Pending until a running pod finishes.

Solutions:

• Scale the node pool out, or use a shape with more OCPUs, so several task pods can be scheduled in parallel.
• Lower the per-process resource requests in nextflow.config (for example, the cpus and memory directives) so more tasks fit on each node.
• Accept serialized execution: the pipeline still completes, with pods scheduled one after another.
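
A minimal scale-out sketch using the OCI CLI (the node pool OCID is a placeholder):

oci ce node-pool update --node-pool-id <NODE_POOL_OCID> --size 3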

Acknowledgments

More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.