Note:
- This tutorial requires access to Oracle Cloud. To sign up for a free account, see Get started with Oracle Cloud Infrastructure Free Tier.
- It uses example values for Oracle Cloud Infrastructure credentials, tenancy, and compartments. When completing your lab, substitute these values with ones specific to your cloud environment.
Run Nextflow Pipelines on OCI using OKE and OCI File Storage Service
Introduction
There are many use cases where you need to run multi-step data pipelines that process large files, track intermediate results, and support parallel execution. While Oracle Cloud Infrastructure (OCI) Functions is well-suited for event-driven and stateless tasks, some workflows require a system designed for orchestration and dataflow across multiple steps.
Nextflow addresses this need with built-in support for containers, input/output handling, and scalable execution on Kubernetes, including OKE. It enables reproducible pipelines that run efficiently in distributed environments.
Objectives
- Deploy a minimal Nextflow pipeline on Oracle Cloud Infrastructure Kubernetes Engine (OKE), using Oracle Cloud Infrastructure File Storage service for shared volumes and a bastion host for access and control.
Prerequisites
- Access to an OCI tenancy.
- Permissions to create and manage OCI Compute instances, OKE clusters, and networking resources.
- An existing OKE cluster in a Virtual Cloud Network (VCN) that has a public subnet.
- An existing OCI File Storage service volume.
- An existing OCI Compute instance to act as a bastion jumphost, in a public subnet in the same VCN as the OKE cluster.
Task 1: Prepare the Worker Node(s)
OKE nodes must have Network File System (NFS) client utilities installed to mount the OCI File Storage service volume.
SSH into each worker node and run the following commands.
sudo yum install -y nfs-utils
sudo systemctl enable --now nfs-client.target || true
Note: You do not need to mount the OCI File Storage service volume manually on the worker nodes; OKE handles mounting automatically through the Persistent Volume.
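Optionally, you can confirm from a worker node that the NFS client is in place and that the file system export is visible. The mount target IP below is the example value used throughout this tutorial; substitute your own.

# Confirm the NFS client utilities are installed
rpm -q nfs-utils

# List the exports offered by the mount target (example IP)
showmount -e 10.0.10.163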
Task 2: Set up the OCI Bastion Host
If you are starting with a new bastion host, install and configure the following:

- Oracle Cloud Infrastructure Command Line Interface (OCI CLI). For more information, see Installing the CLI.
- kubectl and kubeconfig setup. For more information, see Accessing a Cluster Using Kubectl.
- Cluster connection: Follow the Access Your Cluster instructions, which you can find on the cluster details page in the OCI Console.
- Mount the OCI File Storage service volume on the bastion, for example under /mnt/nextflow-fss. Using the private IP of the mount target referenced in Task 3, the export path of the file system, and an existing /mnt/nextflow-fss directory on the bastion, the command would be:

sudo mount -t nfs 10.0.10.163:/nextflow /mnt/nextflow-fss

Make sure nfs-utils is also installed on the bastion, as described above for the worker nodes.
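If you want this mount to survive bastion reboots, an /etc/fstab entry along the following lines can be added. The IP address and export path are the example values above; adjust them to your environment.

# /etc/fstab entry (example values)
10.0.10.163:/nextflow  /mnt/nextflow-fss  nfs  defaults,_netdev,nofail  0 0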
Security Recommendations:

- Use Network Security Groups (NSGs) or security lists to restrict SSH access to the bastion. Make sure port 2049 is open for access to the OCI File Storage service.
- Ensure the private SSH key and kubeconfig are securely stored.
Task 3: Configure Nextflow from the Bastion
Run all the steps from your bastion VM.
- Create a Namespace.

  Run the following command to create a dedicated namespace.

kubectl create namespace nextflow-ns
- Configure Persistent Volume (PV) and Persistent Volume Claim (PVC).

  - Create a file named nextflow-fss.yaml and download its content from here: nextflow-fss.yaml.
  - Make sure to replace <MOUNT_TARGET_IP> with the actual mount target IP (for example, 10.0.10.163), found in the OCI File Storage service mount target details in the OCI Console.
  - Take note of the export path as well and replace it in the same file.
  - Run the following command to apply the file.

kubectl apply -f nextflow-fss.yaml
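For orientation, the downloaded nextflow-fss.yaml typically defines an NFS-backed PV and a matching PVC along the lines of the following sketch. The resource names, capacity, and export path here are illustrative assumptions; the values in the downloaded file take precedence.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nextflow-fss-pv           # illustrative name
spec:
  capacity:
    storage: 50Gi                 # required by Kubernetes, not enforced by NFS
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: <MOUNT_TARGET_IP>     # for example, 10.0.10.163
    path: /nextflow               # your OCI File Storage service export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextflow-fss-pvc          # illustrative name; the job must reference the same claim
  namespace: nextflow-ns
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""            # bind to the pre-created PV instead of dynamic provisioning
  volumeName: nextflow-fss-pv
  resources:
    requests:
      storage: 50Gi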
- Create a Service Account and Role-Based Access Control (RBAC).

  These are created to ensure that the Nextflow job running in your OKE cluster has the necessary permissions to interact with OKE resources during pipeline execution.

  Nextflow, when running on OKE, needs to:

  - Launch pods for each process step.
  - Monitor their status.
  - Access logs.
  - Bind to the PVC.

  By default, Kubernetes jobs do not have permission to perform these actions unless they are explicitly granted through a service account with the proper RBAC bindings.

  Run the following commands to create the service account, role, and role binding.

kubectl create serviceaccount nextflow-sa -n nextflow-ns

kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: nextflow-pod-role
  namespace: nextflow-ns
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log", "pods/status", "persistentvolumeclaims"]
  verbs: ["create", "get", "watch", "list", "delete"]
EOF

kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nextflow-pod-binding
  namespace: nextflow-ns
subjects:
- kind: ServiceAccount
  name: nextflow-sa
  namespace: nextflow-ns
roleRef:
  kind: Role
  name: nextflow-pod-role
  apiGroup: rbac.authorization.k8s.io
EOF
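To confirm the role binding took effect, a quick check like the following can be run from the bastion; it should print yes.

kubectl auth can-i create pods \
  --as=system:serviceaccount:nextflow-ns:nextflow-sa \
  -n nextflow-ns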
- (Optional) Create Test Data Files for the Nextflow Pipeline.

  Run the following commands to create some files for later use in Nextflow.

mkdir -p /mnt/nextflow-fss/data
echo -e "line1\nline2\nline3" > /mnt/nextflow-fss/data/test1.txt
echo -e "nextflow\nrocks" > /mnt/nextflow-fss/data/test2.txt
- Create the Pipeline Files.

  This step defines the Nextflow pipeline logic (main.nf) and its Kubernetes-specific configuration (nextflow.config), specifying how the workflow should run, which container to use, and how to mount shared storage in the cluster.

  Create these files on your bastion VM by downloading their content from here: main.nf, nextflow.config. An illustrative sketch of what such files contain follows below.
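The downloadable files are authoritative and are not reproduced in this tutorial. As a rough sketch only, assuming a simple line-counting pipeline that produces the .count files used later in this task, and assuming a generic Ubuntu container image, main.nf and nextflow.config might look like this:

// main.nf -- illustrative sketch; the downloaded file takes precedence
params.input = '/mnt/nextflow-fss/data/*.txt'

process COUNT_LINES {
    // copy each result next to its input on the shared volume
    publishDir '/mnt/nextflow-fss/data', mode: 'copy'

    input:
    path txt

    output:
    path "${txt.baseName}.count"

    script:
    """
    wc -l < ${txt} > ${txt.baseName}.count
    """
}

workflow {
    Channel.fromPath(params.input) | COUNT_LINES
}

// nextflow.config -- illustrative sketch; the downloaded file takes precedence
process {
    executor  = 'k8s'
    container = 'ubuntu:22.04'               // assumed container image
    cpus      = 0.1                          // matches the value discussed in Task 4
}

k8s {
    namespace        = 'nextflow-ns'
    serviceAccount   = 'nextflow-sa'
    storageClaimName = 'nextflow-fss-pvc'    // assumed PVC name from the earlier sketch
    storageMountPath = '/mnt/nextflow-fss'
}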
- Create a Kubernetes ConfigMap.

  Create a Kubernetes ConfigMap to package the main.nf and nextflow.config files so they can be injected into the Nextflow pod at runtime.

kubectl create configmap nextflow-code \
  --from-file=main.nf \
  --from-file=nextflow.config \
  -n nextflow-ns \
  --dry-run=client -o yaml | kubectl apply -f -
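To verify that both files were packaged, you can describe the ConfigMap; main.nf and nextflow.config should appear under its Data section.

kubectl describe configmap nextflow-code -n nextflow-ns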
- Create and Run the Nextflow Job YAML.

  This step defines the Kubernetes job that runs the Nextflow workflow inside a container. It uses the previously created service account for permissions, mounts the shared volume and ConfigMap, and sets the working directory and environment variables needed for Kubernetes execution. A sketch of a typical job manifest follows below.

  - Create a file named nextflow-job.yaml and download its content from here: nextflow-job.yaml.
  - Run the following command to apply the file.

kubectl apply -f nextflow-job.yaml

  Applying the file creates the job, which runs the nextflow run command with the mounted pipeline code and configuration, launching pipeline processes as Kubernetes pods and handling input and output through the mounted OCI File Storage service volume.
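The downloadable nextflow-job.yaml is authoritative. As a sketch only, and under the assumptions that the PVC is named nextflow-fss-pvc and that a public Nextflow container image is used (neither value is confirmed by the downloaded file), such a job could look like this:

apiVersion: batch/v1
kind: Job
metadata:
  name: nextflow-job
  namespace: nextflow-ns
spec:
  backoffLimit: 0
  template:
    spec:
      serviceAccountName: nextflow-sa
      restartPolicy: Never
      containers:
      - name: nextflow
        image: nextflow/nextflow:latest      # assumed image; pin a specific version in practice
        workingDir: /mnt/nextflow-fss
        command: ["nextflow", "run", "/opt/nextflow-code/main.nf",
                  "-c", "/opt/nextflow-code/nextflow.config"]
        env:
        - name: NXF_WORK                     # keep Nextflow work files on the shared volume
          value: /mnt/nextflow-fss/work
        volumeMounts:
        - name: fss
          mountPath: /mnt/nextflow-fss
        - name: pipeline-code
          mountPath: /opt/nextflow-code
      volumes:
      - name: fss
        persistentVolumeClaim:
          claimName: nextflow-fss-pvc        # assumed PVC name from the earlier sketch
      - name: pipeline-code
        configMap:
          name: nextflow-code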
- Monitor the pod execution.

  You can monitor the pods that are launched and the job log using the following commands.

kubectl get pods -n nextflow-ns -w
kubectl logs -n nextflow-ns -l job-name=nextflow-job --tail=100
- Find the output files.

  You should find the .count output files using the following command.

ls /mnt/nextflow-fss/data/*.count
- Cleanup for retesting.

# Remove old job and pods
kubectl delete job --all -n nextflow-ns
kubectl delete pod --all -n nextflow-ns

# Delete only the .count files so the pipeline can be rerun
sudo find /mnt/nextflow-fss -name "*.count" -type f -delete

  Note: If you make changes to the main.nf or nextflow.config files, also recreate the ConfigMap.
Troubleshooting tips:

- Ensure the volume is correctly mounted by Kubernetes on all pods.
- Ensure nfs-utils is installed on all OKE worker nodes.
- Make sure the OCI File Storage service export allows access from the OKE node subnet.
- If needed, describe failing pods using the following command.

kubectl describe pod <pod-name> -n nextflow-ns
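When the volume is suspect, it can also help to confirm that the PV and PVC are bound before examining individual pods.

# Both should report STATUS Bound
kubectl get pv
kubectl get pvc -n nextflow-ns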
Task 4: Evaluate CPU Scheduling and Pod Parallelism
Nextflow parallelizes processes by launching a separate pod for each task. If your OKE node has limited CPU resources, such as only 1 vCPU, Kubernetes can only schedule one pod at a time if each pod requests a full CPU.
During execution, you may see the following warning in the job log.
WARN: K8s pod cannot be scheduled -- 0/1 nodes are available: 1 Insufficient cpu.
Why this happens:
- Nextflow submits all tasks in parallel by default.
- If you have a 1 CPU worker node and the node is already running a pod that uses 1 CPU, additional pods are queued until resources become available.
- This results in slower execution, but all jobs will eventually complete.
Solutions:

- Option 1: Reduce CPU requests per process.

  You can limit the CPU request per task in the nextflow.config file.

process {
    cpus = 0.5
}

  In the provided nextflow.config this line already exists and is set to 0.1, which is enough for our very simple demo data. You can keep it as is or modify the value as required.
- Option 2: Use a larger node.

  Upgrade your node shape to one with 2+ vCPUs to allow more pods to run in parallel.
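To see how much CPU is actually allocatable on your nodes before choosing an option, a quick check such as the following can help.

# Allocatable CPU per node; compare it with the per-process cpus request
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu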
Related Links
Acknowledgments
- Author - Adina Nicolescu (Senior Cloud Engineer)
More Learning Resources
Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.
For product documentation, visit Oracle Help Center.
Copyright ©2025, Oracle and/or its affiliates.