Considerations for Selecting a Solution Pattern
When implementing your data lake on the cloud, consider our recommended design patterns for migrating your current data lake to Oracle Cloud.
Prepare for Migration Projects
When migrating your data to Oracle Cloud, you should plan your project and staffing. Gather information about networking and storage and weigh the advantages and disadvantages before selecting a solution pattern. Create a high-level description for the systems and applications in scope for migration.
Consider our recommendations based on your environment, timelines, and team's skill level.
Plan your project and scope. Identify your project team including the project manager, application owner, big data engineers, OCI engineers for infrastructure and security, and developers. Ensure that you include application developers and performance and test engineers. Determine the key dates and project milestones.
Use the following example to create a high-level description of systems and applications.
Component | Description |
---|---|
Big Data Appliance (BDA) | Running BDA appliance with CDH distribution, 24-node BDA (6x Dev, 6x DR, 12x Prod) |
Usage | |
Environments | Production, Development, Disaster Recovery |
Solution components | |
Considerations for Networking and Storage
When planning your data lake migration, gather information about all networking and storage assets and determine the most suitable method to migrate your data to OCI.
The following table provides general, high-level guidance on data migration options for OCI.
Migration Source | Data Volumes < 1 TB | Data Volumes between 1 and 50 TB | Data Volumes > 50 TB |
---|---|---|---|
Big Data Appliance (BDA) or on-premises self-managed Hadoop clusters | Hardware VPN tunnels (if FastConnect is not available) | FastConnect (preferred). Hardware VPN tunnels may be used if bandwidth > 100 Mbps | Data Transfer Appliance |
Big Data Cloud Service (BDCS) | Software VPN Tunnels |
Select one of these options based on your organizational requirements and constraints. The time required for data transfer depends on the migration method you choose.
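The guidance in the table above for BDA or on-premises Hadoop sources can be sketched as a small decision function. This is only an illustration: the function name, its parameters, the handling of the exact 1 TB and 50 TB boundaries, and the fallback to the Data Transfer Appliance when neither FastConnect nor a VPN faster than 100 Mbps is available are assumptions, not part of the Oracle guidance.

```python
def suggest_migration_method(data_tb, fastconnect_available, vpn_bandwidth_mbps=0.0):
    """Suggest a migration method for a BDA or on-premises Hadoop source.

    Hypothetical helper that mirrors the table: thresholds come from the
    column headings, boundary handling is an assumption.
    """
    if data_tb < 0:
        raise ValueError("data_tb must be non-negative")

    # More than 50 TB: the table points to the Data Transfer Appliance.
    if data_tb > 50:
        return "Data Transfer Appliance"

    # FastConnect is preferred whenever it is already configured.
    if fastconnect_available:
        return "FastConnect"

    # Less than 1 TB without FastConnect: hardware VPN tunnels.
    if data_tb < 1:
        return "Hardware VPN tunnels"

    # 1 to 50 TB without FastConnect: VPN only if bandwidth exceeds 100 Mbps.
    if vpn_bandwidth_mbps > 100:
        return "Hardware VPN tunnels"

    # Assumed fallback when the network is too slow for an online transfer.
    return "Data Transfer Appliance"


if __name__ == "__main__":
    print(suggest_migration_method(0.5, fastconnect_available=False))
    print(suggest_migration_method(20, fastconnect_available=False, vpn_bandwidth_mbps=200))
    print(suggest_migration_method(80, fastconnect_available=True))
```

Treat the result as a starting point for discussion; organizational constraints such as security policy or shipping logistics can override it.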
- For offline transfer, a single Data Transfer Appliance can carry up to 150 TB of data at a time, and you can use multiple appliances for each data transfer job. Including the shipping time, the migration will take a few days to complete.
- For online data transfer over the internet using VPN tunnels or FastConnect, you can use this formula to get the approximate time required:
Number of days = (Total Bytes)/(Megabits per second * 125 * 1000 * Network Utilization * 60 seconds * 60 minutes * 24 hours)
Using this formula, transferring 50 TB of data over a 1 Gbps FastConnect connection at 100 percent network utilization takes about 5 days. You can also use FastConnect for lower volumes if you have it configured. With 10 Gbps FastConnect, the transfer takes one tenth of this time.
- For VPN tunnels, transferring 1 TB at 10 Mbps with 80 percent network utilization takes around 13 days. Alternatively, use the Data Transfer Appliance if your network connectivity is slower than this or unreliable.
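The transfer-time formula and the appliance capacity mentioned above can be sketched in a few lines of Python. The function and constant names are illustrative. The block also assumes that 1 TB means 2^40 bytes, a convention that reproduces the figure of around 13 days for 1 TB at 10 Mbps and 80 percent utilization; with decimal terabytes the results are roughly 9 percent smaller.

```python
import math

BYTES_PER_MEGABIT = 125 * 1000       # 1 megabit = 125,000 bytes
SECONDS_PER_DAY = 60 * 60 * 24
TB = 2 ** 40                         # assumption: binary terabyte
APPLIANCE_CAPACITY_TB = 150          # per Data Transfer Appliance, from the text


def transfer_days(total_bytes, megabits_per_second, network_utilization=1.0):
    """Approximate number of days for an online transfer (formula above)."""
    if megabits_per_second <= 0:
        raise ValueError("megabits_per_second must be positive")
    if not 0 < network_utilization <= 1:
        raise ValueError("network_utilization must be in (0, 1]")
    bytes_per_day = (megabits_per_second * BYTES_PER_MEGABIT
                     * network_utilization * SECONDS_PER_DAY)
    return total_bytes / bytes_per_day


def appliances_needed(total_tb):
    """Number of appliances for an offline transfer at 150 TB each."""
    return math.ceil(total_tb / APPLIANCE_CAPACITY_TB)


if __name__ == "__main__":
    # 1 TB over a 10 Mbps VPN tunnel at 80 percent utilization
    print(f"{transfer_days(1 * TB, 10, 0.8):.1f} days")
    # 50 TB over 1 Gbps and 10 Gbps FastConnect at full utilization
    print(f"{transfer_days(50 * TB, 1000):.1f} days")
    print(f"{transfer_days(50 * TB, 10000):.1f} days")
    # Appliances required for a 400 TB offline transfer
    print(appliances_needed(400))
```

Real transfers rarely sustain full utilization, so it is reasonable to run the estimate with several utilization values before committing to a schedule.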
The following table presents an estimate of the approximate data upload time to OCI, based on the connection bandwidth and the size of your data set.
Data Set Size | 10 Mbps | 100 Mbps | 1 Gbps | 10 Gbps | Data Transfer Service |
---|---|---|---|---|---|
10 TB | 92 days | 9 days | 22 hours | 2 hours | 1 week |
100 TB | 1,018 days | 101 days | 10 days | 24 hours | 1 week |
500 TB | 5,092 days | 509 days | 50 days | 5 days | 1 week |
1 PB | 10,185 days | 1,018 days | 101 days | 10 days | 2 weeks |
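When you start from a deadline rather than a bandwidth, the same formula can be rearranged to give the bandwidth needed to finish in time. The sketch below does this and then picks the smallest bandwidth column from the table that is sufficient. The function names are hypothetical, the terabyte is assumed to be 2^40 bytes, and a result of `None` is only a hint that an offline option such as the Data Transfer Service may be worth evaluating.

```python
BYTES_PER_MEGABIT = 125 * 1000       # 1 megabit = 125,000 bytes
SECONDS_PER_DAY = 60 * 60 * 24
TB = 2 ** 40                         # assumption: binary terabyte
BANDWIDTH_TIERS_MBPS = (10, 100, 1000, 10000)  # columns of the table above


def required_mbps(total_bytes, deadline_days, network_utilization=1.0):
    """Bandwidth in Mbps needed to move total_bytes within deadline_days."""
    if deadline_days <= 0:
        raise ValueError("deadline_days must be positive")
    if not 0 < network_utilization <= 1:
        raise ValueError("network_utilization must be in (0, 1]")
    return total_bytes / (BYTES_PER_MEGABIT * network_utilization
                          * SECONDS_PER_DAY * deadline_days)


def smallest_sufficient_tier(total_bytes, deadline_days, network_utilization=1.0):
    """Smallest listed bandwidth tier that meets the deadline, or None."""
    needed = required_mbps(total_bytes, deadline_days, network_utilization)
    for tier in BANDWIDTH_TIERS_MBPS:
        if tier >= needed:
            return tier
    return None  # no listed tier is fast enough


if __name__ == "__main__":
    # 100 TB within two weeks, then within two days
    print(smallest_sufficient_tier(100 * TB, 14))
    print(smallest_sufficient_tier(100 * TB, 2))
    # 1,000 TB within one week
    print(smallest_sufficient_tier(1000 * TB, 7))
```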
Design Your Solution Architecture
When planning your solution pattern, consider the advantages and disadvantages in the following table before making your decision.
Solution Pattern | Advantages | Disadvantages |
---|---|---|
Cloud Native (Greenfield) | | |
Big Data Service (Greenfield) | | |
Rebuild (Migration) | | |
Replatform (Migration) | | |
Rehost (Migration) | | |
Review Criteria for Solution Pattern Selection
When deciding which pattern is most suitable for your organization, consider criteria such as the relative degree of modernization, return on investment (ROI) and total cost of ownership (TCO) savings, ease and duration of implementation, ongoing costs, operational efficiency, elasticity, scalability, availability, and relative changes to existing code.
The following table lists some high-level criteria to help you decide which patterns meet the needs of your organization.
Solution Pattern | Relative Degree of Modernization | Relative Potential for ROI and TCO Saving | Relative Ease and Duration of Implementation | Relative Ongoing Cost-Savings, Operational Efficiency | Relative Elasticity, Scalability and Availability | Relative Changes to Existing Code and Workflows |
---|---|---|---|---|---|---|
Cloud Native (Greenfield) | High (Best) | High (Best) | Medium (Better) | High (Best) | High (Best) | NA |
Big Data Service (Greenfield) | Medium (Better) | Medium (Better) | Medium (Better) | Medium (Better) | Medium (Better) | NA |
Rebuild (Migration) | High (Best) | High (Best) | Low (Good) | High (Best) | High (Best) | High (Good) |
Replatform (Migration) | Medium (Better) | Medium (Better) | Medium (Better) | Medium (Better) | Medium (Better) | Medium (Better) |
Rehost (Migration) | Low (Good) | Low (Good) | High (Best) | Low (Good) | Low (Good) | Low (Best) |
Depending on your environment requirements, timeline, and team skills, Oracle recommends using the pattern that best meets your needs.
Consider these points when you're deciding on the most suitable solution for your organization.
- Many customers use more than one pattern in their cloud adoption journey.
- The actual ranking depends on the specific customer context and use cases.
- There is no single pattern that fits all our customers' needs.
- Additional criteria include customer preferences, expertise, and unique requirements.
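Because the ranking depends on context, one way to apply the criteria table is to weight each criterion by its importance to your organization and compare the resulting scores. The following is a minimal sketch of that idea, not an Oracle-provided method. The ratings are copied from the table, while the numeric scale (Good = 1, Better = 2, Best = 3), the criterion keys, the weights, and the choice to skip NA entries are all assumptions made for illustration.

```python
RATING = {"Good": 1, "Better": 2, "Best": 3}   # assumed numeric scale

# Criterion keys are hypothetical short names for the table columns.
CRITERIA = ("modernization", "roi_tco", "ease",
            "ongoing_savings", "elasticity", "code_changes")

# Ratings copied from the criteria table; None stands for NA.
PATTERNS = {
    "Cloud Native (Greenfield)": ("Best", "Best", "Better", "Best", "Best", None),
    "Big Data Service (Greenfield)": ("Better", "Better", "Better", "Better", "Better", None),
    "Rebuild (Migration)": ("Best", "Best", "Good", "Best", "Best", "Good"),
    "Replatform (Migration)": ("Better", "Better", "Better", "Better", "Better", "Better"),
    "Rehost (Migration)": ("Good", "Good", "Best", "Good", "Good", "Best"),
}


def score(ratings, weights):
    """Weighted average rating; NA entries are skipped (an assumption)."""
    total = 0.0
    weight_sum = 0.0
    for criterion, rating in zip(CRITERIA, ratings):
        if rating is None:
            continue
        weight = weights.get(criterion, 1.0)  # unlisted criteria weigh 1
        total += weight * RATING[rating]
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def rank(weights, candidates=None):
    """Pattern names ordered from highest to lowest weighted score."""
    names = list(candidates) if candidates else list(PATTERNS)
    return sorted(names, key=lambda name: score(PATTERNS[name], weights),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical weights for a team that values a fast, low-change migration
    weights = {"ease": 5, "code_changes": 5}
    migration_only = [name for name in PATTERNS if "Migration" in name]
    for name in rank(weights, migration_only):
        print(f"{name}: {score(PATTERNS[name], weights):.2f}")
```

A score like this is only a conversation aid. Customer preferences, expertise, and unique requirements still need to be weighed separately.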