About the Advantages of Deploying Hadoop on Oracle Cloud Infrastructure

If you have an on-premises Hadoop deployment, you may be considering migrating it to the cloud or extending it with cloud resources. You are looking for approaches that use the dynamic nature of the cloud to make your business more agile and improve your price-performance ratio. By moving to the cloud, you can take advantage of elasticity to pay only for the resources you use, gain interoperability with open source standards and software, and improve your storage performance with your data always running on the latest, fastest hardware.

This solution presents the Oracle-recommended approach for deploying Hadoop to Oracle Cloud Infrastructure by describing the key implementation concerns, technical requirements, and existing business challenges that need to be addressed as part of a migration or extension. In addition, it summarizes the supporting cloud services, third-party integrations, and deployment practices that best align with your application environment and requirements. It provides reference architectures across several use cases that have been validated by prior successful deployments, and provides templates for deploying Hadoop using Cloudera, Hortonworks, MapR, and Apache.

Value Proposition

Most on-premises Hadoop deployments can be migrated to run on Oracle Cloud Infrastructure without requiring significant configuration, integration, or process changes. The resulting implementation will be more flexible and more reliable, perform better, and cost less than on-premises deployments or deployments in other clouds.

Hadoop benefits from the dynamic nature of cloud IaaS, enhanced by data tiering (leveraging multiple tiers of cloud storage). This provides a more robust and cost-effective solution that lets customers tune their compute requirements to meet workload demands with flexible storage density for HDFS.

Oracle has a validated solution to accomplish these goals, quickly and reliably. This solution includes procedures, supporting Oracle Cloud Infrastructure platform services, and reference architectures. These consider real production needs, like security, network configuration, high availability (HA), disaster recovery (DR), identity integration, and cost management.

Oracle's solution provides:

  • 37% lower total cost of ownership (TCO) than on-premises deployments and 68% lower TCO than competing cloud solutions
  • CAPEX management and reduction: keep the data centers you maintain efficient, eliminate server hardware, and take advantage of cloud flexibility where possible
  • Rapid in-place technology refresh and patching
  • Proactive monitoring of usage and costs
  • Near-instant scaling up or down to handle business growth or workload bursts
  • Federated identity management with your existing systems
  • Rapid deployment with Terraform templates, which deploy a Hadoop cluster in minutes instead of days
  • Extreme performance of non-volatile memory express (NVMe)-backed Hadoop Distributed File System (HDFS)

Total Cost of Ownership Analysis

Beyond being straightforward to migrate, easier to manage, and more flexible to scale, running Hadoop on Oracle Cloud Infrastructure costs less than running it on premises or in another cloud.

The estimated total cost of ownership of this solution can be 37% less than running Hadoop on premises and 68% less than running it on another cloud. The estimate is based on the following assumptions and comparisons:

  • Two environments: one for production and one combined for development and testing
  • Oracle's significant cost advantages for Block Volumes and Database storage
  • 21 nodes for Hadoop, 3 for Hadoop services, 2 for active/backup Cloudera Manager, 3 for perimeter access, 500 TB of object storage, and 7.25 TB of block volumes
  • A comparison with two on-premises environments with 58 servers with 8 to 52 cores, 64 to 768 GB of memory, 2.2 PB of NAS storage, and 500 GB of backups
  • A comparison with similar available resources from competing non-Oracle clouds
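
To make the two percentages easier to compare, the following minimal sketch works out the relative costs they imply. It is illustrative arithmetic only: it normalizes the Oracle Cloud Infrastructure cost to 1.0 and derives what the other two options would cost if the quoted savings held exactly. The function name and the normalization are assumptions made for this example, not part of Oracle's analysis.

```python
# Illustrative arithmetic only: relative costs implied by the quoted savings.
ON_PREM_SAVINGS = 0.37      # from the text: 37% lower TCO than on premises
OTHER_CLOUD_SAVINGS = 0.68  # from the text: 68% lower TCO than another cloud


def implied_relative_costs(oci_cost: float = 1.0) -> dict:
    """Return the cost implied for each option, relative to the OCI cost.

    If OCI costs (1 - savings) times the alternative, then the alternative
    costs oci_cost / (1 - savings).
    """
    return {
        "oci": oci_cost,
        "on_premises": oci_cost / (1 - ON_PREM_SAVINGS),
        "other_cloud": oci_cost / (1 - OTHER_CLOUD_SAVINGS),
    }


costs = implied_relative_costs()
for name, cost in costs.items():
    print(f"{name:12s} {cost:.2f}x")
```

Read this way, a 37% saving means the on-premises estimate is roughly 1.6 times the Oracle Cloud Infrastructure estimate, and a 68% saving means the other-cloud estimate is roughly 3.1 times as large.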

Unique Infrastructure and Tools

Oracle offers ideal infrastructure and tools for hosting Hadoop.

Bare metal dense shapes offer the best performance for Hadoop workloads, with high memory density and blazing fast local NVMe storage for HDFS. Block volumes can be used to augment local storage, so you can achieve your HDFS storage targets without scaling wide on the number of worker nodes. One of the bigger problems for many Hadoop deployments is that storage requirements typically scale much faster than workload requirements, and the static nature of physical hardware deployments can lead to idle compute resources. Additionally, bare metal instances have dual 25-Gbps network interfaces, which drive high-speed, low-latency, intracluster communication. Combine all this with high availability deployments across fault domains, and you have a robust, scalable, performant cloud-based Hadoop solution.
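
The effect of augmenting local NVMe with block volumes can be sketched with a simple capacity calculation. The example below is a rough sizing aid under stated assumptions, not an Oracle sizing tool: the 400 TB usable target, the 25% free-space headroom, the 32 TB of block volumes per worker, and the 51.2 TB of local NVMe per worker are all hypothetical inputs chosen for illustration. The replication factor of 3 is the HDFS default.

```python
import math

HDFS_REPLICATION = 3  # HDFS default replication factor (dfs.replication)


def workers_needed(usable_target_tb, local_nvme_tb, block_volume_tb=0.0, headroom=0.25):
    """Return the number of workers needed to hold usable_target_tb of HDFS data.

    headroom is the fraction of raw capacity kept free (a hypothetical value).
    """
    raw_per_node = local_nvme_tb + block_volume_tb
    usable_per_node = raw_per_node * (1 - headroom) / HDFS_REPLICATION
    return math.ceil(usable_target_tb / usable_per_node)


# Hypothetical inputs: 400 TB usable target, 51.2 TB local NVMe per worker.
local_only = workers_needed(400, 51.2)
with_block = workers_needed(400, 51.2, block_volume_tb=32)
print(f"local NVMe only: {local_only} workers")
print(f"with block volumes: {with_block} workers")
```

With these inputs, adding block volumes meets the same storage target with fewer workers, which is the "without scaling wide" point made above. Real sizing also has to account for compute, memory, and throughput requirements.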

Proven Customer Success

A financial services industry customer chose to migrate their on-premises Hadoop cluster to Oracle Cloud Infrastructure. Their deployment serves as a useful real-world example of an Oracle Cloud Infrastructure-based Hadoop deployment and the advantages it provides to customers.

This migration included moving all production data, data feeds, and additional application infrastructure. The Hadoop environment included bare metal DenseIO Intel hosts, which leverage local NVMe for Hadoop. The customer initially sized the environment at a 1:1 server ratio and then “right-sized” it. Because the Oracle Cloud Infrastructure deployment had three times the memory of their on-premises deployment at the same node count, they were able to reduce the footprint to hit a specific memory target (for HBase and Spark).
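
The right-sizing reasoning above can be sketched as a small calculation. The numbers below are hypothetical and are not the customer's actual figures; the function name and the minimum node count are also assumptions made for this example. The sketch considers memory only, whereas a real sizing exercise would also check HDFS capacity and CPU.

```python
import math


def right_sized_nodes(on_prem_nodes, on_prem_mem_gb, cloud_mem_gb, min_nodes=3):
    """Return the fewest cloud nodes whose total memory meets the on-premises total."""
    target_gb = on_prem_nodes * on_prem_mem_gb
    return max(min_nodes, math.ceil(target_gb / cloud_mem_gb))


# Hypothetical numbers: 30 on-premises workers with 256 GB each, replaced by
# cloud workers with three times the memory per node (768 GB).
nodes = right_sized_nodes(30, 256, 768)
print(f"{nodes} cloud workers match the memory of 30 on-premises workers")
```

When each cloud node has three times the memory, a 1:1 migration leaves the cluster with three times the memory it had before, so the node count can drop until the memory target is met.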

Moving to Oracle Cloud Infrastructure enabled the customer to achieve the following goals:

  • Increase performance
  • Reduce cost
  • Decrease provisioning time for resources from 120 days to just a few hours
  • Improve the ability to scale to meet demand in minutes, compared to weeks or months

The following diagram represents the customer’s production deployment in Oracle Cloud Infrastructure:


[Figure: architecture-customer-reference.png]

The deployment has the following configuration:

  • 21 BM.DenseIO2.52 bare metal workers for Hadoop (1 PB of raw NVMe for HDFS)
  • 3 VM.Standard2.24 master nodes for Hadoop services
  • 2 VM.Standard2.16 utility nodes for Active/Backup Cloudera Manager
  • 3 VM.Standard2.8 edge VMs for perimeter access
  • 500 TB of Object Storage for cold data
  • 7.25 TB of block volumes to augment the OS for logs, parcels, and application data
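
As a quick check on the figures in this list, the following sketch tallies the node counts and the raw NVMe capacity. The shape names and counts come from the list above. The 51.2 TB of local NVMe per BM.DenseIO2.52 worker is an assumption used for illustration, and the class and variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    role: str
    shape: str
    count: int
    local_nvme_tb: float = 0.0  # assumed per-node local NVMe; 0 for the VM shapes


CLUSTER = [
    # 51.2 TB of local NVMe per worker is an assumption, not stated in the text.
    NodeGroup("worker", "BM.DenseIO2.52", 21, local_nvme_tb=51.2),
    NodeGroup("master", "VM.Standard2.24", 3),
    NodeGroup("utility", "VM.Standard2.16", 2),
    NodeGroup("edge", "VM.Standard2.8", 3),
]

total_nodes = sum(group.count for group in CLUSTER)
raw_nvme_tb = sum(group.count * group.local_nvme_tb for group in CLUSTER)
print(f"{total_nodes} nodes, {raw_nvme_tb / 1000:.2f} PB raw NVMe")
```

Under that assumption, the 21 workers provide a little over 1 PB of raw NVMe, consistent with the figure quoted in the list, and the cluster totals 29 nodes.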

Validated Solutions That Address Your Business Requirements

Cloudera, Hortonworks, and MapR are validated, supported Hadoop independent software vendors (ISVs) on Oracle Cloud Infrastructure. Details for each of these products are included in this solution.

Because Oracle Cloud Infrastructure was built for the usage patterns of enterprise production applications, existing Hadoop deployments can be easily moved to, and even improved in, Oracle Cloud Infrastructure. Oracle provides architectural patterns that meet all your networking, connectivity, performance, HA, DR, and multiple-region requirements. In fact, most customers find that the performance of Hadoop on Oracle Cloud Infrastructure exceeds the performance of their on-premises deployments.

Oracle has also developed Terraform templates for rapid deployment and configuration of Hadoop on Oracle Cloud Infrastructure. These templates reduce the complexity and time to provision Hadoop on Oracle Cloud Infrastructure, resulting in frameworks that customers can customize and leverage to streamline Hadoop migrations or deployments.