Note:
- This tutorial requires access to Oracle Cloud. To sign up for a free account, see Get started with Oracle Cloud Infrastructure Free Tier.
- It uses example values for Oracle Cloud Infrastructure credentials, tenancy, and compartments. When completing your lab, substitute these values with ones specific to your cloud environment.
Access OCI Object Storage Buckets from Oracle Big Data Service Cluster Using Resource Principal
Introduction
Oracle Big Data Service is a cloud-based service provided by Oracle that enables users to create and manage Hadoop clusters, Spark clusters, Trino clusters, and other big data services. Oracle Cloud Infrastructure (OCI) allows seamless integration between Oracle Big Data Service clusters and OCI Object Storage using resource principal.
This tutorial will guide you through the tasks to configure and access OCI Object Storage bucket from an Oracle Big Data Service cluster, leveraging the resource principal for secure authentication.
Objectives
- Configure Oracle Big Data Service to access OCI Object Storage using resource principal.
- Set up policies in OCI.
- Test bucket access from the Oracle Big Data Service cluster.
Prerequisites
- Oracle Big Data Service cluster deployed.
- Access to an OCI tenancy.
- Permissions to create policies in OCI.
Task 1: Create Policies
- Log in to the OCI Console, navigate to Identity & Security and click Policies.
- Create the following policies to grant the Oracle Big Data Service cluster access to the OCI Object Storage bucket.

allow any-user to read buckets in tenancy where ALL {request.principal.id='<BDS OCID>', target.bucket.name='<bucket name>'}
allow any-user to read objects in tenancy where ALL {request.principal.id='<BDS OCID>', target.bucket.name='<bucket name>'}

Note: Replace <BDS OCID> and <bucket name> with your Oracle Big Data Service cluster OCID and bucket name, respectively.
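For illustration, a filled-in pair of policies might look like the following; the OCID and bucket name are made-up placeholders, not values to copy. Note that these statements grant read-only access, so listing and reading objects will work, but writes from the cluster will not.

allow any-user to read buckets in tenancy where ALL {request.principal.id='ocid1.bdsinstance.oc1.iad.exampleuniqueID', target.bucket.name='my-bds-bucket'}
allow any-user to read objects in tenancy where ALL {request.principal.id='ocid1.bdsinstance.oc1.iad.exampleuniqueID', target.bucket.name='my-bds-bucket'}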
Task 2: Create Resource Principal in Oracle Big Data Service Cluster
- Go to the OCI Console, navigate to Analytics & AI and click Big Data Service.
- Click your deployed cluster.
- Click Resource Principal, and then click Create Resource Principal.
- Enter Display name and Session token life-span duration (in hours) and click Create.
Task 3: Update Hadoop Distributed File System (HDFS) Configuration
- Go to the Apache Ambari console, navigate to HDFS, Configs and Advanced.
- Update the following properties for your HDFS configuration, substituting your own region code or ID for the example value us-region-1.

fs.oci.client.custom.authenticator=com.oracle.oci.bds.commons.auth.BDSResourcePrincipalAuthenticator
fs.oci.client.regionCodeOrId=us-region-1
fs.oci.rp.pem.passphrase.path=/etc/security/tokens/rpst.pass
fs.oci.rp.pem.path=/etc/security/tokens/rpst.pem
fs.oci.rp.rpst.path=/etc/security/tokens/rpst.token

Note: The bds_rp_users group owns the rpst token and keys for this resource principal in the cluster.

- Save the changes and restart the necessary services in Apache Ambari.
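As a quick sanity check after the restart, you can confirm the settings propagated and that the token files exist. This sketch assumes the Hadoop client configuration lives at /etc/hadoop/conf/core-site.xml, which is typical but can vary by cluster.

# Confirm the resource principal authenticator made it into core-site.xml
grep -A 1 "fs.oci.client.custom.authenticator" /etc/hadoop/conf/core-site.xml

# The token and keys should exist and be owned by the bds_rp_users group
ls -l /etc/security/tokens/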
Task 4: Set Environment Variables (Optional)
For applications accessing the bucket through the OCI Software Development Kit (SDK) or other clients, ensure the following environment variables are set.
OCI_RESOURCE_PRINCIPAL_VERSION=2.2
OCI_RESOURCE_PRINCIPAL_PRIVATE_PEM=/etc/security/tokens/rpst.pem
OCI_RESOURCE_PRINCIPAL_REGION=us-region-1
OCI_RESOURCE_PRINCIPAL_RPST=/etc/security/tokens/rpst.token
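For example, assuming the OCI CLI is installed on a cluster node, the installed version supports the --auth resource_principal option, and the shell user can read the token files (that is, it belongs to the bds_rp_users group), a quick check might look like this; <bucket name> is a placeholder.

export OCI_RESOURCE_PRINCIPAL_VERSION=2.2
export OCI_RESOURCE_PRINCIPAL_PRIVATE_PEM=/etc/security/tokens/rpst.pem
export OCI_RESOURCE_PRINCIPAL_REGION=us-region-1
export OCI_RESOURCE_PRINCIPAL_RPST=/etc/security/tokens/rpst.token

# List objects using resource principal authentication
oci os object list --bucket-name <bucket name> --auth resource_principal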
Task 5: Test OCI Object Storage Bucket Access
SSH into the Oracle Big Data Service cluster and test access to the bucket using HDFS commands.
hdfs dfs -ls oci://<bucket name>@<namespace>/
Note: Replace <bucket name> with the bucket used in your policies and <namespace> with your tenancy's Object Storage namespace.
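Because the Task 1 policies grant read-only access, listing and reading should succeed, while writes will be rejected until broader verbs (for example, manage objects) are granted. As a further check, assuming an object already exists in the bucket, you can read it back; <existing object> is a placeholder for its name.

# Read back an existing object to confirm object-level access
hdfs dfs -cat oci://<bucket name>@<namespace>/<existing object>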
Troubleshooting and Tips
- core-site.xml Verification: If the bucket content is not accessible, check whether the core-site.xml file was properly generated by Apache Ambari and contains the correct resource principal configuration values.
- HDFS Restart: After updating the HDFS configuration, ensure all necessary services are restarted to apply the changes.
- Policy Scope: Double-check that the policies are correctly defined and applied to your Oracle Big Data Service cluster.
Next Steps
- Explore advanced OCI SDK integrations for more sophisticated data access.
- Upload and Download Data Using DistCp: After setting up bucket access, you can use Hadoop’s DistCp command to efficiently transfer large datasets between HDFS and OCI Object Storage. This is especially useful for backup, migration, or large-scale data movement; see the sketch after this list.
- Schedule Data Transfer with Oozie: Automate the upload and download process by scheduling regular DistCp jobs using Oozie for recurring backups or data synchronization.
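As a sketch of the DistCp approach, a copy from HDFS into the bucket could look like the following; the paths are placeholders, and the read-only policies from Task 1 would need to be broadened (for example, to manage objects) before such writes succeed.

# Copy a directory from HDFS to the bucket over the oci:// connector
hadoop distcp hdfs:///data/source oci://<bucket name>@<namespace>/backups/source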
Acknowledgments
- Authors - Pavan Upadhyay (Principal Cloud Engineer), Saket Bihari (Principal Cloud Engineer)
More Learning Resources
Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.
For product documentation, visit Oracle Help Center.