Analysts need an efficient way to consolidate data from multiple spreadsheets and other flat-file data sources into a trusted, maintainable, and query-optimized source.
With Oracle Autonomous Data Warehouse and Oracle Analytics Cloud, you can load and optimize data from multiple flat-file sources into a centralized data warehouse so departments can analyze the data and gain actionable insights.
This reference architecture positions the technology solution within the overall business context:
Lines of business typically don't have timely or efficient access to data and information. Analysts gather the data manually, work with it on an individual basis, and then share copies of files through email or file servers. The data is not centralized, so ensuring the accuracy and the security of the data is difficult. Analysis can take a long time and the results are not easily repeatable.
A data mart is a simple form of a data warehouse that is focused on a single subject or functional area, such as sales, marketing, or finance and is often built and controlled by a single department within an organization. Given their single-subject focus, data marts usually draw data from only a few sources. The sources could include internal systems, a central data warehouse, or external data.
Governed data warehouses and data marts can provide rich information to business users and more effectively deliver the organization's key performance indicators without relying heavily on IT resources and availability.
At a conceptual level, the technology solution addresses the problem as follows:
This architecture uses Oracle Autonomous Data Warehouse to load and optimize data from multiple flat-file sources into a centralized data warehouse and then uses Oracle Analytics Cloud to analyze the data to provide actionable insights.
The architecture focuses on the following logical divisions:
- Data refinery
Ingests and refines the data for use in each of the data layers in the architecture. The shape is intended to illustrate the differences in processing costs for storing and refining data at each level and for moving data between them.
- Data persistence platform (curated information layer)
Facilitates access and navigation of the data to show the current business view. For relational technologies, data may be logically derived or physically structured in simple relational, longitudinal, dimensional or OLAP forms. For non-relational data, this layer contains one or more pools of data, either output from an analytical process or data optimized for a specific analytical task.
- Access and interpretation
Abstracts the logical business view of the data for the consumers. This abstraction facilitates agile approaches to development, migration to the target architecture, and the provision of a single reporting layer from multiple federated sources.
The architecture has the following components:
- Network access control lists (ACLs)
You can control access to your Oracle Autonomous Data Warehouse by setting network access control lists (ACLs). Access control lists specify client IP addresses that are allowed to connect to an Oracle Autonomous Data Warehouse instance and block all other IP addresses.
- Autonomous Data Warehouse
Oracle Autonomous Data Warehouse is a self-driving, self-securing, self-repairing database service that is optimized for data warehousing workloads. You do not need to configure or manage any hardware, or install any software. Oracle Cloud Infrastructure handles creating the database, as well as backing up, patching, upgrading, and tuning the database.
Oracle Analytics Cloud is a scalable and secure public cloud service that provides a full set of capabilities to explore and perform collaborative analytics for you, for your workgroup, and for your enterprise.
With Oracle Analytics Cloud you also get flexible service management capabilities, including fast setup, easy scaling and patching, and automated lifecycle management.
Use the following recommendations as a starting point to load and optimize data from multiple flat-file sources into a centralized data warehouse location for analysis.
Your requirements might differ from the architecture described here.
- Oracle SQL Developer
Use Oracle SQL Developer on your desktop or Oracle SQL Developer Web from a browser to run SQL statements and scripts in a worksheet and to perform other tasks with Oracle Autonomous Data Warehouse.
Before you connect Oracle Analytics Cloud to Oracle Autonomous Data Warehouse, have a database administrator add the IP address (or address range) for your Oracle Analytics Cloud instance to the allowlist. The database administrator must add a security rule that allows TCP/IP traffic from Oracle Analytics Cloud to the database.
When loading and optimizing data from multiple flat-file sources into a centralized data warehouse location for analysis, consider these implementation options.
|Guidance||Data Refinery||Data Persistence Platform||Access & Interpretation|
|Recommended||Oracle Autonomous Data Warehouse||Oracle Autonomous Data Warehouse||Oracle Analytics Cloud|
|Other Options||Oracle Data Integrator||Oracle Database Exadata Cloud Service|
|Rationale||Oracle Autonomous Data Warehouse is an easy-to-use, fully autonomous database that offers direct access to and ingestion of a variety of Data Sources such as CSV files and relational data sources.||Oracle Autonomous Data Warehouse is an easy-to-use, fully autonomous database that scales elastically, delivers fast query performance and requires no database administration. It also offers direct access to the data from object storage via external tables.||Oracle Analytics Cloud is a fully managed and tightly integrated with the Curated Data Layer (Oracle Autonomous Data Warehouse).|
The Terraform code for this reference architecture is available in GitHub. You can pull the code into Oracle Cloud Infrastructure Resource Manager with a single click, create the stack, and deploy it. Alternatively, you can download the code from GitHub to your computer, customize the code, and deploy the architecture by using the Terraform command line interface (CLI).
- Deploy by using Oracle Cloud Infrastructure Resource
If you aren't already signed in, enter the tenancy and user credentials.
- Review and accept the terms and conditions.
- Select the region where you want to deploy the stack.
- Follow the on-screen prompts and instructions to create the stack.
- After creating the stack, click Terraform Actions, and select Plan.
- Wait for the job to complete, then review the plan.
To make any changes, return to the Stack Details page, click Edit Stack, and make the required changes. Then, run the Plan action again.
- If no further changes are necessary, return to the Stack Details page, click Terraform Actions, and select Apply.
- Deploy using the Terraform code in GitHub:
- Go to GitHub.
- Clone or download the repository to your local computer.
- Follow the instructions in the
Learn more about related architectures and about the features of this architecture.
This log lists only the significant changes:
|December 11, 2020||Updated the Deploy section to deploy the stack in GitHub directly to Oracle Cloud Infrastructure Resource Manager with a single click.|
|September 21, 2020||Updated the Deploy section to include a link to Oracle Cloud Infrastructure Resource Manager as a source for the Terraform stack.|
|June 29, 2020||Updated the GitHub link in the Deploy section to point to the specific use-case folder in the project folder.|