Run Nextflow Pipelines on OCI using OKE and OCI File Storage Service

Introduction

There are many use cases where you need to run multi-step data pipelines that process large files, track intermediate results, and support parallel execution. While Oracle Cloud Infrastructure (OCI) Functions is well-suited for event-driven and stateless tasks, some workflows require a system designed for orchestration and dataflow across multiple steps.

Nextflow addresses this need with built-in support for containers, input/output handling, and scalable execution on Oracle Cloud Infrastructure Kubernetes Engine (OKE). It enables reproducible pipelines that run efficiently in distributed environments.

Objectives

• Prepare OKE worker nodes to mount an OCI File Storage service volume.
• Configure a namespace, Persistent Volume and claim, service account, and RBAC for Nextflow on OKE.
• Run a containerized Nextflow pipeline as a Kubernetes job, monitor its pods, and use the shared volume for inputs and outputs.

Prerequisites

• Access to an OCI tenancy with a running OKE cluster and SSH access to its worker nodes.
• An OCI File Storage service file system with a mount target reachable from the cluster subnet.
• A bastion host (or VM) with access to the cluster's Kubernetes API endpoint.

Task 1: Prepare the Worker Node(s)

OKE nodes must have Network File System (NFS) client utilities installed to mount the OCI File Storage service volume.

SSH into each worker node and run the following commands.

sudo yum install -y nfs-utils
sudo systemctl enable --now nfs-client.target || true

Note: You do not need to mount the OCI File Storage service manually; OKE handles mounting automatically through the Persistent Volume.

Task 2: Set up the OCI Bastion Host

If you are starting with a new bastion host, install the following:

• kubectl, configured with a kubeconfig for your OKE cluster.
• The OCI Command Line Interface (OCI CLI), which you can use to generate that kubeconfig.

Security Recommendations:

Task 3: Configure Nextflow from the Bastion

Run all the steps from your bastion VM.

  1. Create a Namespace.

    Run the following command to create a dedicated namespace.

    kubectl create namespace nextflow-ns
    
  2. Configure Persistent Volume and Persistent Volume Claim (PVC).

    1. Create a file named nextflow-fss.yaml and download the content from here: nextflow-fss.yaml. (A minimal sketch of this manifest follows after this step.)

    2. Make sure to replace <MOUNT_TARGET_IP> with the actual mount target IP (for example, 10.0.10.163), found in the OCI File Storage service mount target details in the OCI Console.

      Mount target IP

    3. Also take note of the export path and replace its placeholder in the same file.

    4. Run the following command to apply the file.

      kubectl apply -f nextflow-fss.yaml
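
      For reference, nextflow-fss.yaml defines an NFS-backed Persistent Volume pointing at the mount target, plus a matching claim. The following is an illustrative sketch, not the downloaded file; the resource names nextflow-pv and nextflow-pvc are assumptions, so keep the names from the actual file.

      apiVersion: v1
      kind: PersistentVolume
      metadata:
        name: nextflow-pv                  # assumed name
      spec:
        capacity:
          storage: 50Gi
        accessModes: ["ReadWriteMany"]
        persistentVolumeReclaimPolicy: Retain
        nfs:
          server: <MOUNT_TARGET_IP>        # mount target IP from the OCI Console
          path: /nextflow-fss              # replace with your export path
      ---
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: nextflow-pvc                 # assumed name
        namespace: nextflow-ns
      spec:
        accessModes: ["ReadWriteMany"]
        storageClassName: ""
        volumeName: nextflow-pv
        resources:
          requests:
            storage: 50Gi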
      
  3. Create a Service Account and Role-based access control (RBAC).

    Create these to ensure that the Nextflow job running in your OKE cluster has the permissions it needs to interact with Kubernetes resources during pipeline execution.

    Nextflow, when running on OKE, needs to:

    • Launch pods for each process step.
    • Monitor their status.
    • Access logs.
    • Bind to the PVC.

    However, by default, Kubernetes jobs do not have permission to perform these actions unless they are explicitly granted through a service account with the proper RBAC bindings.

    Run the following commands to create a service account, a role, and a role binding.

    kubectl create serviceaccount nextflow-sa -n nextflow-ns
    
    kubectl apply -f - <<EOF
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: nextflow-pod-role
      namespace: nextflow-ns
    rules:
    - apiGroups: [""]
      resources: ["pods", "pods/log", "pods/status", "persistentvolumeclaims"]
      verbs: ["create", "get", "watch", "list", "delete"]
    EOF

    kubectl apply -f - <<EOF
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: nextflow-pod-binding
      namespace: nextflow-ns
    subjects:
    - kind: ServiceAccount
      name: nextflow-sa
      namespace: nextflow-ns
    roleRef:
      kind: Role
      name: nextflow-pod-role
      apiGroup: rbac.authorization.k8s.io
    EOF
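
    Optionally, verify the binding with kubectl's built-in access check. This quick sanity test is an addition to the lab steps:

    kubectl auth can-i create pods \
      --as=system:serviceaccount:nextflow-ns:nextflow-sa \
      -n nextflow-ns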
    
  4. (Optional) Create Test Data Files for Nextflow Pipeline.

    Run the following commands to create some files for later use in Nextflow. These paths assume the file system export is mounted at /mnt/nextflow-fss on the bastion (see the mount sketch after this step).

    mkdir -p /mnt/nextflow-fss/data
    echo -e "line1\nline2\nline3" > /mnt/nextflow-fss/data/test1.txt
    echo -e "nextflow\nrocks"     > /mnt/nextflow-fss/data/test2.txt
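
    If the export is not yet mounted on the bastion, a standard NFS mount along the following lines makes it available. It reuses the mount target IP and export path from nextflow-fss.yaml; /nextflow-fss is an assumed export path:

    sudo mkdir -p /mnt/nextflow-fss
    sudo mount -t nfs <MOUNT_TARGET_IP>:/nextflow-fss /mnt/nextflow-fss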
    
  5. Create the Pipeline Files.

    This section defines the Nextflow pipeline logic (main.nf) and its Kubernetes-specific configuration (nextflow.config), specifying how the workflow should run, which container to use, and how to mount shared storage in the cluster.

    Create and download these files (main.nf and nextflow.config) on your bastion VM from here:
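
    For orientation, a minimal pipeline consistent with this tutorial's test data and .count outputs could look like the following sketch. It is illustrative, not the exact downloadable content; the process name COUNT_LINES and the claim name nextflow-pvc are assumptions.

    // main.nf (sketch): count the lines of each input file and publish <name>.count
    nextflow.enable.dsl = 2

    params.input = '/mnt/nextflow-fss/data/*.txt'

    process COUNT_LINES {
        // copy results next to the inputs on the shared volume
        publishDir '/mnt/nextflow-fss/data', mode: 'copy'

        input:
        path txt

        output:
        path "${txt}.count"

        script:
        """
        wc -l < ${txt} > ${txt}.count
        """
    }

    workflow {
        Channel.fromPath(params.input) | COUNT_LINES
    }

    // nextflow.config (sketch): run each task as a pod in the cluster
    process.executor = 'k8s'

    k8s {
        namespace        = 'nextflow-ns'
        serviceAccount   = 'nextflow-sa'
        storageClaimName = 'nextflow-pvc'      // assumed PVC name from nextflow-fss.yaml
        storageMountPath = '/mnt/nextflow-fss'
    }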

  6. Create a Kubernetes ConfigMap.

    Create a Kubernetes ConfigMap to package the main.nf and nextflow.config files so they can be injected into the Nextflow pod at runtime.

    kubectl create configmap nextflow-code \
    --from-file=main.nf \
    --from-file=nextflow.config \
    -n nextflow-ns \
    --dry-run=client -o yaml | kubectl apply -f -
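
    The --dry-run=client -o yaml | kubectl apply -f - pattern makes this command idempotent: rerunning it after you edit main.nf or nextflow.config updates the existing ConfigMap in place.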
    
  7. Create and Run Nextflow Job YAML.

    This section defines the Kubernetes job that runs the Nextflow workflow inside a container. It uses the previously created service account for permissions, mounts the shared volume and ConfigMap, and sets the working directory and environment variables needed for Kubernetes execution.

    1. Create a file named nextflow-job.yaml and download the content from here: nextflow-job.yaml. (An illustrative sketch of this manifest follows after this step.)

    2. Run the following command to apply the file.

      kubectl apply -f nextflow-job.yaml
      

    Applying this file creates the job, which executes the nextflow run command with the mounted pipeline code and configuration, launching each pipeline process as a Kubernetes pod and handling input/output through the mounted OCI File Storage service volume.
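
    For orientation, a job manifest along these lines ties the earlier pieces together. This is a sketch, not the downloadable file: the image tag and the claim name nextflow-pvc are assumptions.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: nextflow-job
      namespace: nextflow-ns
    spec:
      backoffLimit: 0
      template:
        spec:
          serviceAccountName: nextflow-sa      # RBAC identity from Task 3
          restartPolicy: Never
          containers:
          - name: nextflow
            image: nextflow/nextflow:24.10.0   # assumed tag; pin the version you tested
            command: ["nextflow", "run", "/pipeline/main.nf", "-c", "/pipeline/nextflow.config"]
            env:
            - name: NXF_WORK                   # keep the work directory on the shared volume
              value: /mnt/nextflow-fss/work
            volumeMounts:
            - name: fss
              mountPath: /mnt/nextflow-fss
            - name: code
              mountPath: /pipeline
          volumes:
          - name: fss
            persistentVolumeClaim:
              claimName: nextflow-pvc          # assumed claim name
          - name: code
            configMap:
              name: nextflow-code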

  8. Monitor the pod execution.

    You can watch the launched pods and follow the job log using the following commands.

    kubectl get pods -n nextflow-ns -w
    kubectl logs -n nextflow-ns -l job-name=nextflow-job --tail=100
    
  9. Find the output file.

    You should find the .count output files using the following command.

    ls /mnt/nextflow-fss/data/*.count
    
  10. Cleanup for retesting.

    # Remove old job and pods
    kubectl delete job --all -n nextflow-ns
    kubectl delete pod --all -n nextflow-ns
    
    # Delete only count files to rerun
    sudo find /mnt/nextflow-fss -name "*.count" -type f -delete
    

Note: If you make changes to the main.nf or nextflow.config files, recreate the ConfigMap as well before rerunning the job.

Troubleshooting tips:

Task 4: Evaluate CPU Scheduling and Pod Parallelism

Nextflow parallelizes processes by launching a separate pod for each task. If your OKE node has limited CPU resources, such as only 1 vCPU, Kubernetes can only schedule one pod at a time if each pod requests a full CPU.

During execution, you may see the following warning in the job log.

WARN: K8s pod cannot be scheduled -- 0/1 nodes are available: 1 Insufficient cpu.

Why this happens:

• Each Nextflow task pod carries a CPU request (from the process cpus directive, which defaults to 1), and the scheduler only places a pod on a node with that much unallocated CPU.
• Part of a node's vCPU is reserved for system components, so a 1 vCPU node cannot run two such pods at once; the remaining task pods stay Pending until a running pod finishes.

Solutions:

• Scale the node pool out, or use a shape with more OCPUs, so several task pods can be scheduled in parallel.
• Lower the per-process resource requests in nextflow.config (for example, the cpus and memory directives) so more tasks fit on each node.
• Accept serialized execution: the pipeline still completes, with pods scheduled one after another.
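
A minimal scale-out sketch using the OCI CLI (the node pool OCID is a placeholder):

oci ce node-pool update --node-pool-id <NODE_POOL_OCID> --size 3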

Acknowledgments

More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.