Access OCI Object Storage Buckets from Oracle Big Data Service Cluster Using Resource Principal

Introduction

Oracle Big Data Service is a cloud service from Oracle for creating and managing Hadoop clusters, Spark clusters, Trino, and other big data services. Oracle Cloud Infrastructure (OCI) lets Oracle Big Data Service clusters authenticate to OCI Object Storage with a resource principal.

This tutorial walks through configuring an Oracle Big Data Service cluster to access an OCI Object Storage bucket, using the resource principal for secure authentication.

Objectives

Prerequisites

Task 1: Create Policies

  1. Log in to the OCI Console, navigate to Identity & Security and click Policies.

  2. Create the following policies to grant the Oracle Big Data Service cluster access to the OCI Object Storage bucket.

    allow any-user to read buckets in tenancy where ALL {request.principal.id='<BDS OCID>', target.bucket.name='<bucket name>'}
    
    allow any-user to read objects in tenancy where ALL {request.principal.id='<BDS OCID>', target.bucket.name='<bucket name>'}
    

    Note: Replace <BDS OCID> and <bucket name> with your Oracle Big Data Service cluster OCID and bucket name, respectively.
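
The two statements differ only in the resource type (buckets versus objects), so they can be generated from the same two inputs before pasting them into the policy editor. The following is a minimal sketch; `render_policies` is a hypothetical helper name, and the OCID and bucket name in the example call are made-up placeholders, not real values.

```shell
#!/bin/sh
# Render the two policy statements from Task 1 for a given cluster OCID and bucket.
# render_policies and its argument order are illustrative, not part of any OCI tool.
render_policies() {
  bds_ocid="$1"
  bucket_name="$2"
  for resource in buckets objects; do
    printf "allow any-user to read %s in tenancy where ALL {request.principal.id='%s', target.bucket.name='%s'}\n" \
      "$resource" "$bds_ocid" "$bucket_name"
  done
}

# Example call with placeholder values (replace both with your own).
render_policies "ocid1.bigdataservice.oc1..example" "my-bucket"
```

The script only prints text. The statements still have to be created in the OCI Console as described in step 2.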

Task 2: Create Resource Principal in Oracle Big Data Service Cluster

  1. Go to the OCI Console, navigate to Analytics and AI and click Big Data Service.

  2. Click your deployed cluster.

  3. Click Resource Principal, and then click Create Resource Principal.

  4. Enter Display name and Session token life-span duration (in hours) and click Create.

Task 3: Update Hadoop Distributed File System (HDFS) Configuration

  1. Go to the Apache Ambari Console, navigate to HDFS, click Configurations, and then click Advanced.

  2. Update the following properties for your HDFS configuration.

    fs.oci.client.custom.authenticator=com.oracle.oci.bds.commons.auth.BDSResourcePrincipalAuthenticator
    fs.oci.client.regionCodeOrId=us-region-1
    fs.oci.rp.pem.passphrase.path=/etc/security/tokens/rpst.pass
    fs.oci.rp.pem.path=/etc/security/tokens/rpst.pem
    fs.oci.rp.rpst.path=/etc/security/tokens/rpst.token
    

    Note: The bds_rp_users group owns the RPST token and keys for this resource principal on the cluster.

  3. Save the changes and restart the necessary services in Apache Ambari.
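
Because the token and keys are group-owned, a quick readability check on a cluster node can catch a permissions problem before it shows up as an authentication error. The following is a sketch under that assumption; `check_rp_files` is a hypothetical helper name, and it only tests the three file names that appear in the properties above.

```shell
#!/bin/sh
# Check that the resource principal files named in the HDFS properties are readable
# by the current user. check_rp_files is an illustrative helper, not a cluster tool.
# The default directory is the one used in this tutorial.
check_rp_files() {
  token_dir="${1:-/etc/security/tokens}"
  missing=0
  for name in rpst.pem rpst.pass rpst.token; do
    if [ -r "$token_dir/$name" ]; then
      echo "ok: $token_dir/$name"
    else
      echo "not readable: $token_dir/$name" >&2
      missing=1
    fi
  done
  # Return non-zero if at least one file could not be read.
  return "$missing"
}
```

Run `check_rp_files` as the operating system user that will access the bucket. A "not readable" line may indicate that the user is not a member of the bds_rp_users group.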

Task 4: Set Environment Variables (Optional)

For applications accessing the bucket through the OCI Software Development Kit (SDK) or other clients, ensure the following environment variables are set.

OCI_RESOURCE_PRINCIPAL_VERSION=2.2
OCI_RESOURCE_PRINCIPAL_PRIVATE_PEM=/etc/security/tokens/rpst.pem
OCI_RESOURCE_PRINCIPAL_REGION=us-region-1
OCI_RESOURCE_PRINCIPAL_RPST=/etc/security/tokens/rpst.token
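
In a shell session, a plain assignment is not passed on to child processes, so the variables generally need `export` for an application started from that shell to see them. The sketch below exports the four values listed above and adds a small check; `rp_env_missing` is a hypothetical helper name, and `us-region-1` is the tutorial's placeholder rather than a real region.

```shell
#!/bin/sh
# Export the resource principal variables from Task 4 so child processes inherit them.
export OCI_RESOURCE_PRINCIPAL_VERSION=2.2
export OCI_RESOURCE_PRINCIPAL_PRIVATE_PEM=/etc/security/tokens/rpst.pem
export OCI_RESOURCE_PRINCIPAL_REGION=us-region-1   # placeholder; use your own region
export OCI_RESOURCE_PRINCIPAL_RPST=/etc/security/tokens/rpst.token

# Print the name of every variable from the list that is unset or empty.
# rp_env_missing is an illustrative helper name.
rp_env_missing() {
  for var in OCI_RESOURCE_PRINCIPAL_VERSION OCI_RESOURCE_PRINCIPAL_PRIVATE_PEM \
             OCI_RESOURCE_PRINCIPAL_REGION OCI_RESOURCE_PRINCIPAL_RPST; do
    eval "value=\${$var:-}"
    [ -n "$value" ] || echo "$var"
  done
}
```

Calling `rp_env_missing` prints nothing when all four variables are set. Where to place the exports (a user profile, a service unit, or a job wrapper) depends on how the application is launched.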

Task 5: Test OCI Object Storage Bucket Access

SSH into the Oracle Big Data Service cluster and test access to the bucket using HDFS commands.

hdfs dfs -ls oci://<bucket name>@<namespace>/

Note: Replace <bucket name> with the OCI bucket named in your policies and <namespace> with your Object Storage namespace.
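
The URI has the form oci://<bucket name>@<namespace>/, and a swapped bucket and namespace is an easy mistake to make. A small wrapper can assemble the URI from its two parts, as sketched below; `oci_uri` and `list_bucket` are hypothetical helper names, and the wrapper runs `hdfs dfs -ls` only when the `hdfs` client is on the PATH.

```shell
#!/bin/sh
# Build the oci:// URI used in Task 5 from a bucket name and a namespace.
# oci_uri and list_bucket are illustrative helper names.
oci_uri() {
  printf 'oci://%s@%s/\n' "$1" "$2"
}

# List the bucket if the hdfs client is available; otherwise report what would
# have been run and return a non-zero status.
list_bucket() {
  uri="$(oci_uri "$1" "$2")"
  if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -ls "$uri"
  else
    echo "hdfs client not found; would run: hdfs dfs -ls $uri" >&2
    return 127
  fi
}
```

On a cluster node, `list_bucket "<bucket name>" "<namespace>"` runs the same command shown above.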

Troubleshooting and Tips

Next Steps

Acknowledgments

More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.