Skip to content

oracle-quickstart/oci-arch-load-file-into-adw-python

Repository files navigation

oci-arch-load-file-into-adw-python

Oracle Functions is a serverless, highly scalable, fully managed Functions-as-a-Service platform built on Oracle Cloud Infrastructure and powered by the open-source Fn Project engine. Developers can use Oracle Functions to write and deploy code that delivers business value without worrying about provisioning or managing the underlying infrastructure. Oracle Functions is container-native, with functions packaged as Docker container images.

This reference architecture shows how you can use a serverless function to automate the process of extracting data from files generated by various databases or applications and loading the data into a data warehouse for analysis.

The architecture diagram shows the data flow. When zipped CSV files are uploaded to a specific bucket in Oracle Cloud Infrastructure Object Storage, the Events service is triggered. The emitted event invokes a function, which extracts the data from the uploaded files and loads the data into an Oracle Autonomous Data Warehouse instance. After the data is loaded to the data warehouse, the processed CSV files are moved to a different bucket in Oracle Cloud Infrastructure Object Storage.

For details of the architecture, see Automate loading data to a data warehouse using a serverless application

Prerequisites

  • Permission to manage the following types of resources in your Oracle Cloud Infrastructure tenancy: vcns, internet-gateways, route-tables, security-lists, subnets, autonomous-database-family, and functionss.

  • Quota to create the following resources: 1 VCN, 1 subnets, 1 Internet Gateway, 1 route rules, 1 function, and ADW instance.

If you don't have the required permissions and quota, contact your tenancy administrator. See Policy Reference, Service Limits, Compartment Quotas.

Deploy Using Oracle Resource Manager

  1. Click Deploy to Oracle Cloud

    If you aren't already signed in, when prompted, enter the tenancy and user credentials.

  2. Review and accept the terms and conditions.

  3. Select the region where you want to deploy the stack.

  4. Follow the on-screen prompts and instructions to create the stack.

  5. After creating the stack, click Terraform Actions, and select Plan.

  6. Wait for the job to be completed, and review the plan.

    To make any changes, return to the Stack Details page, click Edit Stack, and make the required changes. Then, run the Plan action again.

  7. If no further changes are necessary, return to the Stack Details page, click Terraform Actions, and select Apply.

Deploy Using the Terraform CLI

Clone the Module

Now, you'll want a local copy of this repo. You can make that with the commands:

git clone https://github.com/oracle-quickstart/oci-arch-load-file-into-adw-python.git
cd oci-arch-load-file-into-adw-python
ls

Prerequisites

First off, you'll need to do some pre-deploy setup for Docker and Fn Project inside your machine:

sudo su -
yum update
yum install yum-utils
yum-config-manager --enable *addons
yum install docker-engine
groupadd docker
service docker restart
usermod -a -G docker opc
chmod 666 /var/run/docker.sock
exit
curl -LSs https://raw.githubusercontent.com/fnproject/cli/master/install | sh
exit

Set Up and Configure Terraform

  1. Complete the prerequisites described here.

  2. Create a terraform.tfvars file, and specify the following variables:

# Authentication
tenancy_ocid         = "<tenancy_ocid>"
user_ocid            = "<user_ocid>"
fingerprint          = "<finger_print>"
private_key_path     = "<pem_private_key_path>"

# Region
region = "<oci_region>"

# Compartment
compartment_ocid = "<compartment_ocid>"

# dbpwd-cipher
dbpwd-cipher = "<ADW_password>"

# OCIR
ocir_user_name         = "<ocir_user_name>"
ocir_user_password     = "<ocir_user_password>"

Create the Resources

Run the following commands:

terraform init
terraform plan
terraform apply

Test the stack

You can test the stack by uploading CSV files into input-bucket bucket (files can be found in csv_files subdirectory of this repo). If event emission will be successful, the function should be executed and all CVS files should be analyzed and moved to the processed-bucket bucket. Ultimate outcome should be also visible in the ADW instance. Below you can find example outputs from the automation in a form of the command to be executed in the bash shell. This command will query ADW instance and database table date will be printed out:

Outputs:

ADW_query_URL_for_JSON_formatted_with_python = curl -X POST -u 'ADMIN:BEstrO0ng_#11' -H "Content-Type: application/json" --data '{}' https://BFGORDHAP4HZNLE-ADWDB.adb.us-ashburn-1.oraclecloudapps.com/ords/admin/soda/latest/regionsnumbers?action=query | python -m json.tool 

$ curl -X POST -u 'ADMIN:BEstrO0ng_#11' -H "Content-Type: application/json" --data '{}' https://BFGORDHAP4HZNLE-ADWDB.adb.us-ashburn-1.oraclecloudapps.com/ords/admin/soda/latest/regionsnumbers?action=query | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1682    0  1680  100     2   1044      1  0:00:02  0:00:01  0:00:01  1045
{
    "count": 4,
    "hasMore": false,
    "items": [
        {
            "created": "2021-05-13T15:32:12.377319000Z",
            "etag": "1DB1E870628E4890BCB02992C01E5F8C",
            "id": "A55A0BC8565745829880D9C1B5DBE6A4",
            "lastModified": "2021-05-13T15:32:12.377319000Z",
            "links": [
                {
                    "href": "https://bfgordhap4hznle-adwdb.adb.us-ashburn-1.oraclecloudapps.com:443/ords/admin/soda/latest/regionsnumbers/A55A0BC8565745829880D9C1B5DBE6A4",
                    "rel": "self"
                }
            ],
            "value": {
                "col1": "1",
                "col2": "2",
                "col3": "3",
                "region": "AMER"
            }
        },
        {
            "created": "2021-05-13T15:32:21.636972000Z",
            "etag": "49915FAFE75041A890BC1B79A0674A16",
            "id": "D18048C5D68F44E6A9008113D71
(...)

Destroy the Deployment

When you no longer need the deployment, you can run this command to destroy the resources:

terraform destroy

Architecture Diagram