Overview of Analyzing OCI Data Flow SQL Endpoints

You can use Oracle Analytics Cloud to analyze data that OCI Data Flow SQL Endpoints make available from object storage, data lakes, and applications.

Data Flow SQL Endpoints are designed for developers, data scientists, and advanced analysts to interactively query data directly where it lives in a data lake.

Benefits of Using OCI Data Flow SQL Endpoints

You can analyze large volumes of event and time-series data in place in the data lake, without having to move or summarize it for performance.
You can consolidate data from multiple applications and data stores (for example, Enterprise Resource Planning systems) into object storage and run ad hoc queries regardless of where the data originates.
You can dispense with extracts and pre-aggregation and work on live data at any level of granularity. This saves time and effort when preparing the data and gives you more powerful analysis capabilities.
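
To illustrate what working at any level of granularity without pre-aggregation can look like, here is a minimal sketch that builds a Spark SQL roll-up query over raw event rows. The helper name build_rollup_query and the table and column names (events, event_time, amount) are hypothetical, and the sketch is not the SQL that Oracle Analytics Cloud generates; it only shows how the same raw data could be summarized at different time grains on demand.

```python
# Illustrative only: the table and column names are assumptions, not part of
# any Oracle schema. date_trunc is a built-in Spark SQL function.
ALLOWED_GRAINS = {"YEAR", "MONTH", "DAY", "HOUR", "MINUTE"}


def build_rollup_query(table, ts_column, measure, grain):
    """Return a Spark SQL string that sums `measure` per time bucket.

    Identifiers are interpolated as-is, so pass only trusted names.
    """
    grain = grain.upper()
    if grain not in ALLOWED_GRAINS:
        raise ValueError(f"unsupported grain: {grain}")
    bucket = f"date_trunc('{grain}', {ts_column})"
    return (
        f"SELECT {bucket} AS bucket, SUM({measure}) AS total "
        f"FROM {table} "
        f"GROUP BY {bucket} "
        f"ORDER BY bucket"
    )


if __name__ == "__main__":
    # The same raw table queried at two grains, with no extract in between.
    print(build_rollup_query("events", "event_time", "amount", "hour"))
    print(build_rollup_query("events", "event_time", "amount", "month"))
```

Changing only the grain argument changes the level of detail, which is the point of querying the raw data directly rather than a pre-aggregated extract.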

Best Practices for Performance

[Illustration: oci-data-flow-1.png]

To take advantage of the indexing and caching at the Spark Cluster tier, create a dataset based on a single table or view. Datasets based on multi-table joins are supported, but not recommended.
When you configure the OCI Data Flow SQL Endpoints cluster, set incrementalCollect to true, for example:
spark.sql.thriftServer.incrementalCollect=true;
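
As a minimal sketch of the setting above, the following Python helper merges the recommended property into an existing set of Spark properties and renders them as key=value lines. The helper names with_recommended and render_properties are hypothetical, and how the properties are actually supplied to the SQL Endpoints cluster is an assumption that this topic does not describe.

```python
# Hypothetical helpers: they only prepare property text and do not call
# any OCI or Spark API.
RECOMMENDED = {"spark.sql.thriftServer.incrementalCollect": "true"}


def with_recommended(properties):
    """Return a copy of `properties` with the recommended setting applied."""
    merged = dict(properties)
    merged.update(RECOMMENDED)  # the recommendation overrides any old value
    return merged


def render_properties(properties):
    """Render properties as sorted key=value lines, one per line."""
    return "\n".join(f"{key}={value}" for key, value in sorted(properties.items()))


if __name__ == "__main__":
    existing = {"spark.sql.shuffle.partitions": "200"}
    print(render_properties(with_recommended(existing)))
```

Keeping the recommended value in one place makes it easy to confirm that incrementalCollect is enabled before the cluster configuration is applied.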

Visualizing Data From OCI Data Flow SQL Endpoints

In the Oracle Analytics Cloud workbook editor, add multiple OCI Data Flow SQL Endpoints tables or cubes. When you select a table or cube, you can add dimension columns and measure columns to your datasets for analysis.