Load

You learn about three architecture and intermediate storage options to load Oracle Cloud ERP data into Snowflake.

Load Data into Snowflake Using Low-Code

This architecture uses OCI GoldenGate and Oracle Data Transforms as the main data integration tools. Use this option if you're already using these tools and have at least one Oracle Database running on OCI that can be used as staging for Snowflake.



migrate-fa-snowflake-goldengate-data-int-oracle.zip

The option uses Oracle Data Transforms and OCI GoldenGate as main extraction and replication tools. The data is first loaded into an Oracle Database staging area and then into Snowflake. The GoldenGate-Snowflake connector has different configuration options. This architecture uses the default configuration. The data is refreshed on the target every 30 seconds. It's real time on source but near real time on target.

Follow the OCI GoldenGate Data Transforms extracting data from Oracle Fusion ERP blog to configure Oracle Data Transforms to extract data from Oracle Cloud ERP. On OCI GoldenGate, you need to create two deployments. The first one for Oracle technology where an extract is configured for the list of tables created by Oracle Data Transforms. The second deployment is for Big Data technologies where Snowflake is included.

Review this blog to Use OCI GoldenGate for Snowflake Initial Load and Real-time Data Sync to configure both deployments and the extract and replication process to Snowflake.

This architecture supports the following components:

  • Autonomous Data Warehouse

    Oracle Autonomous Data Warehouse is a self-driving, self-securing, self-repairing database service that is optimized for data warehousing workloads. You do not need to configure or manage any hardware, or install any software. Oracle Cloud Infrastructure handles creating, backing up, patching, upgrading, and tuning the database.

  • Object storage

    OCI Object Storage provides access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can safely and securely store data directly from the internet or from within the cloud platform. You can scale storage without experiencing any degradation in performance or service reliability.

    Use standard storage for "hot" storage that you need to access quickly, immediately, and frequently. Use archive storage for "cold" storage that you retain for long periods of time and seldom or rarely access.

  • Oracle Data Transforms

    Oracle Data Transforms allows you to design graphical data transformations in the form of data loads, data flows, and workflows without requiring you to write any code. Data Loads provide a convenient way of loading data into Oracle Autonomous Database. Data flows define how the data is moved and transformed between different systems, while the workflows define the sequence in which the data flows are executed.

  • OCI GoldenGate

    Oracle Cloud Infrastructure GoldenGate is a managed service that provides a real-time data mesh platform, replication to keep data highly available, and real-time analysis. You can design, run, and monitor your data replication and data streaming solutions without allocating or managing compute environments.

Load Data into Snowflake Using Spark

This architecture uses Spark (Data Flow) to read data from OCI Object Storage, followed by the Snowflake Python API to load the data into Snowflake.

After you configure the BICC jobs, the ZIP files will land on OCI Object Storage. This reference architecture uses two different tools. Data Science for development and testing. Data Flow as a Spark tool to run the code and load the data from OCI Object Storage to Snowflake. The spark uses a JDBC driver that can be downloaded from Snowflake documentation.



migrate-fa-snowflake-spark-object-storage-oracle.zip

This architecture supports the following components:

  • Data Science

    Oracle Cloud Infrastructure Data Science is a fully-managed, serverless platform that data science teams can use to build, train, and manage machine learning (ML) models on Oracle Cloud Infrastructure (OCI). It can easily integrate with other OCI services such as Oracle Autonomous Data Warehouse, Oracle Cloud Infrastructure Object Storage, and more. You can build and evaluate high-quality machine learning models that increase business flexibility by putting enterprise-trusted data to work quickly, and you can support data-driven business objectives with easier deployment of ML models. Data Science enables data scientists and machine learning engineers to use packages from the Anaconda Repository for free.

  • Object storage

    OCI Object Storage provides access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can safely and securely store data directly from the internet or from within the cloud platform. You can scale storage without experiencing any degradation in performance or service reliability.

    Use standard storage for "hot" storage that you need to access quickly, immediately, and frequently. Use archive storage for "cold" storage that you retain for long periods of time and seldom or rarely access.

  • Data Flow

    Data Flow is a fully-managed Apache Spark service that performs processing tasks on extremely large datasets — without infrastructure to deploy or manage. Data Flow is a visual program representing the flow of data from source data assets, such as a database or flat file, to target data assets, such as a data lake or data warehouse. The OCI Data Integration intuitive UI designer opens when you create, view, or edit a data flow.

Load Data into Cloud Storage and Copy into Snowflake

This architecture uses various cloud storage APIs with Python to copy the data from OCI Object Storage to other cloud providers such as AWS S3, Google Cloud Storage, or Azure Blob Storage.

You must first install and configure the relevant cloud storage for your organization:

  1. Load data into Amazon S3 and copy into Snowflake:
    1. Install Boto3
    2. S3 configuration for Snowflake storage integration
    The process reads the ZIP files from OCI Object Storage and extracts the contents in the target Amazon S3. After the files are copied, you can use the COPY INTO Snowflake command to load the data into tables.
  2. Load data into Google Cloud Storage and copy into Snowflake:
    1. Install Google Cloud SDK
    2. Google Cloud Storage configuration for Snowflake storage integration
    The process reads the ZIP files from OCI Object Storage and extracts the contents in the target Google Cloud. After the files are copied, you can use COPY INTO Snowflake command to load the data into tables.
  3. Load data into Azure Blob Storage and copy into Snowflake: Azure container configuration for Snowflake storage integration. The process reads the ZIP files from OCI Object Storage and extracts the contents in the target Azure Blob Storage. After the files are copied, you can use COPY INTO Snowflake command to load the data into tables.


migrate-fa-snowflake-third-party-storage-oracle.zip

This architecture supports the following components:

  • Data Science

    Oracle Cloud Infrastructure Data Science is a fully-managed, serverless platform that data science teams can use to build, train, and manage machine learning (ML) models on Oracle Cloud Infrastructure (OCI). It can easily integrate with other OCI services such as Oracle Autonomous Data Warehouse, Oracle Cloud Infrastructure Object Storage, and more. You can build and evaluate high-quality machine learning models that increase business flexibility by putting enterprise-trusted data to work quickly, and you can support data-driven business objectives with easier deployment of ML models. Data Science enables data scientists and machine learning engineers to use packages from the Anaconda Repository for free.

  • Object storage

    OCI Object Storage provides access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can safely and securely store data directly from the internet or from within the cloud platform. You can scale storage without experiencing any degradation in performance or service reliability.

    Use standard storage for "hot" storage that you need to access quickly, immediately, and frequently. Use archive storage for "cold" storage that you retain for long periods of time and seldom or rarely access.