Considerations for Selecting a Solution Pattern

When implementing your data lake on the cloud, consider our recommended design patterns for migrating your current data lake to Oracle Cloud.

Prepare for Migration Projects

When migrating your data to Oracle Cloud, plan your project and staffing first. Gather information about networking and storage, and weigh the advantages and disadvantages before selecting a solution pattern. Create a high-level description of the systems and applications in scope for migration.

Consider our recommendations based on your environment, timelines, and team's skill level.

Plan your project and scope. Identify your project team including the project manager, application owner, big data engineers, OCI engineers for infrastructure and security, and developers. Ensure that you include application developers and performance and test engineers. Determine the key dates and project milestones.

Use the following example to create a high-level description of systems and applications.

| Component | Description |
| --- | --- |
| Big Data Appliance (BDA) | Running BDA appliance with CDH distribution; 24-node BDA (6x Dev, 6x DR, 12x Prod); 2x 22-Core Xeon; 2x40 IB, 4x10 Ethernet; 96 TB Disk and 256 GB RAM |
| Usage | 300 TB HDFS (ingesting 500 GB/day); 30 percent CPU; 1 TB RAM; Online 24x7 |
| Environments | Production, Development, Disaster Recovery |
| Solution components | Hive; HBase; HDFS; Spark (Scala); Kerberos and Active Directory; Sqoop; Oozie; Analytics: OBIEE; JDBC drivers to connect to external sources |

Considerations for Networking and Storage

When planning your data lake migration, gather information about all networking and storage assets and determine the most suitable method to migrate your data to OCI.

The following table provides general, high-level guidance on data migration options for OCI.

| Migration Source | Data Volumes < 1 TB | Data Volumes between 1 and 50 TB | Data Volumes > 50 TB |
| --- | --- | --- | --- |
| Big Data Appliance (BDA) or on-premises self-managed Hadoop clusters | Hardware VPN tunnels (if FastConnect is not available) | FastConnect (preferred); hardware VPN tunnels may be used if bandwidth > 100 Mbps | Data Transfer Appliance |
| Big Data Cloud Service (BDCS) | Software VPN tunnels | | |

Select one of these options based on your organizational requirements and constraints. The time required for data transfer depends on the migration method you choose.
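
The guidance for Big Data Appliance and on-premises Hadoop sources can be sketched as a small decision function. This is a minimal illustration, not an Oracle tool: the function name `suggest_transfer_method` and its parameters are hypothetical, and the final fallback to the Data Transfer Appliance when FastConnect is unavailable and VPN bandwidth is 100 Mbps or less is an assumption, since the table does not cover that case.

```python
def suggest_transfer_method(data_tb, fastconnect_available, vpn_bandwidth_mbps=0):
    """Suggest a migration method for BDA or on-premises Hadoop sources.

    data_tb: data volume to migrate, in terabytes.
    fastconnect_available: True if a FastConnect link is already configured.
    vpn_bandwidth_mbps: usable hardware VPN bandwidth in Mbps (hypothetical input).
    """
    # Volumes above 50 TB: the table points to the Data Transfer Appliance.
    if data_tb > 50:
        return "Data Transfer Appliance"

    # FastConnect is preferred whenever it is available, including low volumes.
    if fastconnect_available:
        return "FastConnect"

    # Below 1 TB without FastConnect: hardware VPN tunnels.
    if data_tb < 1:
        return "Hardware VPN tunnels"

    # 1 to 50 TB without FastConnect: VPN only if bandwidth exceeds 100 Mbps.
    if vpn_bandwidth_mbps > 100:
        return "Hardware VPN tunnels"

    # Assumption: with a slow link, fall back to offline transfer.
    return "Data Transfer Appliance"


if __name__ == "__main__":
    print(suggest_transfer_method(0.5, fastconnect_available=False))
    print(suggest_transfer_method(20, fastconnect_available=False, vpn_bandwidth_mbps=200))
    print(suggest_transfer_method(120, fastconnect_available=True))
```

Treat the thresholds as starting points; organizational constraints such as link reliability or shipping restrictions may override them.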

  • For offline transfer, a single Data Transfer Appliance can carry up to 150 TB of data at a time, and you can use multiple appliances for each data transfer job. Including the shipping time, the migration will take a few days to complete.
  • For online data transfer over the internet using VPN tunnels or FastConnect, you can use this formula to estimate the approximate time required:

    Number of days = (Total Bytes)/(Megabits per second * 125 * 1000 * Network Utilization * 60 seconds * 60 minutes * 24 hours)

    According to this formula, transferring up to 50 TB of data over a 1 Gbps FastConnect connection at 100 percent network utilization takes about 6 days. You can also use FastConnect for lower volumes if you have it configured. With a 10 Gbps FastConnect connection, the transfer takes one-tenth of this time.

  • Over VPN tunnels, transferring 1 TB at 10 Mbps with 80 percent network utilization takes around 13 days. Alternatively, use the Data Transfer Appliance if your network connectivity is slower than this or not very reliable.
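
The online transfer formula above can be written as a short function. This is a minimal sketch: the function name `transfer_days` and the constants are illustrative, and the choice between decimal terabytes (10^12 bytes) and binary terabytes (2^40 bytes) is an assumption you need to make explicit for your own data.

```python
SECONDS_PER_DAY = 60 * 60 * 24
BYTES_PER_SECOND_PER_MBPS = 125 * 1000  # 1 megabit per second = 125,000 bytes per second

TB = 10**12   # decimal terabyte
TIB = 2**40   # binary terabyte (tebibyte)


def transfer_days(total_bytes, megabits_per_second, network_utilization=1.0):
    """Estimate online transfer time in days using the formula from the text."""
    if megabits_per_second <= 0:
        raise ValueError("megabits_per_second must be positive")
    if not 0 < network_utilization <= 1:
        raise ValueError("network_utilization must be in the range (0, 1]")
    bytes_per_second = (
        megabits_per_second * BYTES_PER_SECOND_PER_MBPS * network_utilization
    )
    return total_bytes / (bytes_per_second * SECONDS_PER_DAY)


if __name__ == "__main__":
    # 1 TB (read as 2^40 bytes) over a 10 Mbps VPN at 80 percent utilization.
    print(round(transfer_days(1 * TIB, 10, 0.8), 1))    # 12.7
    # 50 TB (read as 10^12 bytes each) over 1 Gbps at 100 percent utilization.
    print(round(transfer_days(50 * TB, 1000, 1.0), 1))  # 4.6
```

The result is sensitive to the unit convention and to the utilization you assume, so it can differ somewhat from rounded, published estimates. Run it with your measured sustained throughput rather than the nominal link speed.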

The following table presents an estimate of the approximate data upload time to OCI, based on the connection bandwidth and the size of your data set.

| Data Set Size | 10 Mbps | 100 Mbps | 1 Gbps | 10 Gbps | Data Transfer Service |
| --- | --- | --- | --- | --- | --- |
| 10 TB | 92 days | 9 days | 22 hours | 2 hours | 1 week |
| 100 TB | 1,018 days | 101 days | 10 days | 24 hours | 1 week |
| 500 TB | 5,092 days | 509 days | 50 days | 5 days | 1 week |
| 1 PB | 10,185 days | 1,018 days | 101 days | 10 days | 2 weeks |

Design Your Solution Architecture

When planning your solution pattern, consider the advantages and disadvantages in the following table before making your decision.

| Solution Pattern | Advantages | Disadvantages |
| --- | --- | --- |
| Cloud Native (Greenfield) | You can transition to a modern and future-proof stack; least ongoing operations and management overhead; maximum Return-on-Investment (ROI) and lowest cost option for most customers | There can be some gaps in functionality that require your own implementation of certain components; more work required for implementation than some of the other patterns |
| Big Data Service (Greenfield) | You benefit from lower cost and operational overhead from using managed data and AI services; works as a long-term and short-term solution as you transition into Oracle Cloud | More work required for implementation than some of the other patterns |
| Rebuild (Migration) | You can transition to a modern and future-proof stack; least ongoing operations and management overhead; maximum ROI and lowest cost option for most customers | There may be some gaps in functionality that may require your own implementation of some components; more work required for implementation than some of the other patterns |
| Replatform (Migration) | You benefit from lower cost and operational overhead from using managed data and AI services; works as a long-term as well as a short-term solution as you transition into Oracle Cloud | More work required for implementation than some of the other patterns |
| Rehost (Migration) | Minimum disruption in functionality; nothing new to learn from a usage point of view | Your responsibility increases for operations and support; existing licensing may not be valid |

Review Criteria for Solution Pattern Selection

Consider these criteria when deciding which pattern is most suitable for your organization: relative degree of modernization, Return-on-Investment (ROI) and Total Cost of Ownership (TCO) savings, ease and duration of implementation, ongoing costs, operational efficiency, elasticity, scalability, availability, and relative changes to existing code.

The following table lists some high-level criteria to help you decide which patterns meet the needs of your organization.

| Solution Pattern | Relative Degree of Modernization | Relative Potential for ROI and TCO Saving | Relative Ease and Duration of Implementation | Relative Ongoing Cost-Savings, Operational Efficiency | Relative Elasticity, Scalability and Availability | Relative Changes to Existing Code and Workflows |
| --- | --- | --- | --- | --- | --- | --- |
| Cloud Native (Greenfield) | High (Best) | High (Best) | Medium (Better) | High (Best) | High (Best) | NA |
| Big Data Service (Greenfield) | Medium (Better) | Medium (Better) | Medium (Better) | Medium (Better) | Medium (Better) | NA |
| Rebuild (Migration) | High (Best) | High (Best) | Low (Good) | High (Best) | High (Best) | High (Good) |
| Replatform (Migration) | Medium (Better) | Medium (Better) | Medium (Better) | Medium (Better) | Medium (Better) | Medium (Better) |
| Rehost (Migration) | Low (Good) | Low (Good) | High (Best) | Low (Good) | Low (Good) | Low (Best) |

Depending on your environment requirements, timeline, and team skills, Oracle recommends using the pattern that best meets your needs.
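
One possible way to compare the patterns against your own priorities is a weighted score over the ratings in the criteria table. The sketch below is illustrative only: the numeric mapping (Best = 3, Better = 2, Good = 1), the weights, and the function names are assumptions, not part of Oracle's guidance, and NA entries are simply left out of the average.

```python
# Ratings transcribed from the criteria table; numeric values are an assumption.
BEST, BETTER, GOOD, NA = 3, 2, 1, None

# Column order: modernization, ROI/TCO, ease of implementation,
# ongoing savings, elasticity/scalability, changes to existing code.
RATINGS = {
    "Cloud Native (Greenfield)": [BEST, BEST, BETTER, BEST, BEST, NA],
    "Big Data Service (Greenfield)": [BETTER, BETTER, BETTER, BETTER, BETTER, NA],
    "Rebuild (Migration)": [BEST, BEST, GOOD, BEST, BEST, GOOD],
    "Replatform (Migration)": [BETTER] * 6,
    "Rehost (Migration)": [GOOD, GOOD, BEST, GOOD, GOOD, BEST],
}


def weighted_score(ratings, weights):
    """Weighted average of the ratings, skipping criteria marked NA."""
    pairs = [(r, w) for r, w in zip(ratings, weights) if r is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("at least one applicable criterion needs a non-zero weight")
    return sum(r * w for r, w in pairs) / total_weight


def rank_patterns(weights):
    """Return (pattern, score) pairs sorted from highest to lowest score."""
    scores = {name: weighted_score(r, weights) for name, r in RATINGS.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical weights: equal priority on every criterion.
    for name, score in rank_patterns([1, 1, 1, 1, 1, 1]):
        print(f"{name}: {score:.2f}")
```

With equal weights, the highest score goes to Cloud Native; if only ease of implementation and minimal code changes carry weight, Rehost ranks first. A score like this is an aid for discussion, not a substitute for evaluating your specific use cases.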

Consider these points when deciding on the most suitable solution for your organization.

  • Many customers use more than one pattern in their cloud adoption journey.
  • The actual ranking depends on the specific customer context and use cases.
  • There is no single pattern that fits all our customers' needs.
  • Additional criteria include customer preferences, expertise, and unique requirements.