3 Big Data and Data Warehousing Solutions

Analytic Views

Analytic View Enhancements to SQL and PL/SQL

SQL and PL/SQL enhancements for analytic views are the following:

  • Star views can be created on fact and dimension tables
  • Analytic views are created automatically for queries of the tables and views
  • Level group caches are created and maintained autonomously
  • Query-scoped base measures
  • Support for remote sources
  • Support for aggregate tables
  • Star support for dimension caches

With these enhancements:

  • Star view and star table queries can use the benefits of analytic views without any code changes
  • Front-end tools can use analytic view calculations by inserting a new expression
  • Level group caches can become more efficient over time based on usage patterns

Related Topics

Analytical SQL and Statistical Functions

Bitwise Aggregate Functions

New aggregate functions BIT_AND_AGG, BIT_OR_AGG, and BIT_XOR_AGG enable bitwise aggregation of integer columns and columns that can be converted or rounded to integer values.

Bitwise aggregation functions enable bitwise type processing directly in SQL. Use of these new functions improves overall query performance by eliminating unnecessary data movement and by taking full advantage of other database capabilities such as parallel processing.

Related Topics

Enhanced Analytic Functions

Window functions now support the EXCLUDE options of the SQL standard window frame clause. The query_block clause of a SELECT statement now supports the window_clause, which implements the window clause of the SQL standard table expression as defined in the SQL:2011 standard.

Supporting the full ANSI standard enables easier migration of applications that were developed against other standard-compliant database systems.

Related Topics

New Analytical and Statistical Aggregate Functions

New analytical and statistical aggregate functions are available in SQL:

CHECKSUM computes the checksum of the input values or expression.

KURTOSIS functions KURTOSIS_POP and KURTOSIS_SAMP measure the tailedness of a data set where a higher value means more of the variance within the data set is the result of infrequent extreme deviations as opposed to frequent modestly sized deviations. Note that a normal distribution has a kurtosis of zero.

SKEWNESS functions SKEWNESS_POP and SKEWNESS_SAMP are measures of asymmetry in data. A positive skewness is means the data skews to the right of the center point. A negative skewness means the data skews to the left.

All of these new aggregate functions support the keywords ALL, DISTINCT, and UNIQUE.

ANY_VALUE, a function to simplify and optimize the performance of GROUP BY statements, returns a random value in a group and is optimized to return the first value in the group. It ensures that there are no comparisons for any incoming row and eliminates the necessity to specify every column as part of the GROUP BY clause.

With these additional SQL aggregation functions, you can write more efficient code and benefit from faster in-database processing.

Related Topics

Machine Learning for SQL

Adam Optimization Solver for the Neural Network Algorithm

Adam is an optimization solver for the Neural Network algorithm that is computationally efficient, requires little memory, and is well suited for problems that are large in terms of data or parameters or both.

Adam is a popular extension to stochastic gradient descent. It uses mini-batch optimization and can make progress faster while seeing less data than the other Neural Network optimization solver, Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) with line search.

Related Topics

Oracle Machine Learning MSET-SPRT Algorithm

The Multivariate State Estimation Technique-Sequential Probability Ratio Test (MSET-SPRT) algorithm is a nonlinear, nonparametric anomaly detection technique for monitoring critical processes.

The DBMS_DATA_MINING.ALGO_MSET_SPRT algorithm detects subtle anomalies while producing minimal false alarms. It calibrates expected behavior from available, historical data of the normal operational sequence of monitored signals. It incorporates the learned behavior of the system into a persistent MSET-SPRT model. You can apply the model to new records to detect anomalous behavior.

Related Topics

Oracle Machine Learning XGBoost Algorithm

XGBoost is a highly-efficient, scalable gradient tree boosting machine learning algorithm for regression and classification.

The DBMS_DATA_MINING.ALGO_XBGOOST algorithm prepares training data, builds and persists a model, and applies the model for prediction. You can use it as a stand-alone predictor or incorporate it into real-world production pipelines for a wide range of problems such as ad click-through rate prediction, hazard risk prediction, web text classification, and so on.

Related Topics

Machine Learning for Python

Oracle Machine Learning for Python (OML4Py)

Oracle Machine Learning for Python (OML4Py) enables the open source Python programming language and environment to operate on database data at scale. Python users can run Python commands and scripts for statistical analysis and machine learning on data stored in Oracle Database.

With OML4Py, you can do the following:

  • Use a wide range of in-database machines learning algorithms
  • Minimize data movement
  • Leverage Oracle Database as a high performance compute engine for data exploration and preparation
  • Use AutoML for automatic algorithm selection, feature selection, and model tuning
  • Execute user-defined Python functions in non-parallel, data-parallel, and task-parallel fashion

Related Topics

Oracle Machine Learning for Python Configuration in DBCA

The Database Configuration Assistant (DBCA) supports configuring Oracle Machine Learning for Python (OML4Py).

Configuration through DBCA is supported for container databases and pluggable databases.

Related Topics

Query Optimization

In-Memory Vectorized Joins

The In-Memory vectorized joins feature is based on the deep vectorization framework. Using SIMD vector processing, the framework optimizes aspects of hash joins such as hashing, building, probing, and gathering. This optimization can improve the performance of join processing by 100% or more. The In-Memory vectorized joins feature is transparent to the user, requiring no plan changes.

In-Memory deep vectorization is a SIMD-based query processing framework that supports vectorization for higher-level query operators in the query plan. The framework includes optimizations such as SIMD, hardware acceleration, and pipelined execution.

Joins can account for a large percentage of SQL execution time for data warehouse workloads. Improving performance of hash joins by 100% or more can significantly improve performance.

Related Topics

Spatial and Graph

Property Graph: Graph Server and Client Kit

You no longer have to copy the Property Graph in-memory analytics server (PGX) and client tools and libraries from $ORACLE_HOME.

With the simplified packaging and the availability of the Oracle Graph Server and Client kit on OCI Marketplace, application developers can quickly and securely install and deploy the components required to work with Property Graphs. This makes it easier to start developing applications.

Related Topics

Property Graph: Native Python Client

Property Graph has a native Python API that allows you to create a graph, run graph queries (using PGQL), and analyze the graph using Python and Jupyter notebooks.

This makes it easier to get started with graph and simplifies integration in data science applications where Python and Jupyter are especially popular.

Related Topics

Property Graph: New Features in PGQL

You can now do graph DDL and graph DML operations with the graph query language PGQL. Additionally, you can use PGQL for CHEAPEST path queries using cost functions.

This simplifies development by eliminating the need for configuration files and Java APIs when doing several common graph operations.

Related Topics

Property Graph: Optimized Graph Representation for Faster Performance

The In-memory analytics server (PGX, or Property Graph AnalytiX) has an optimized representation of a property graph that uses less memory. Larger graphs can be analyzed in the same amount of memory.

Graph sizes are continuously growing larger. With this optimization you can analyze larger graphs using less memory than previously required. This not only enables analysis of more data, but also reduces system costs. The optimized graph representation gives you faster performance and is transparent; existing applications will run faster with no change.

Related Topics

Property Graph: User-defined Graph Algorithms

You can create or extend graph algorithms using Java syntax, in addition to the dozens of pre-built graph analytics that come with the product. These user-defined algorithms will execute as fast as native algorithms in the product because they are compiled with the same optimizations.

For unique and specialized use cases, customizing graph algorithms lets you add analysis that analysts and data scientists design specifically for your applications.

Related Topics

Property Graph Visualization

A rich set of visualization features lets you interactively explore the graph, customize layouts, and highlight interesting relationships in your data.

Seeing graph data and relationships visually lets analysts, data scientists, and developers quickly understand and explore clusters, outliers, anomalies, patterns, communities, and critical connections in their data. This makes you more productive and helps you share and communicate results more clearly.

Related Topics

RDF Graph: Native Unicode Storage and Processing

When creating the RDF (Resource Description Framework) network, you can now store RDF data in a native unicode format for virtually all use cases.

This reduces the storage required and enhances query performance.

Related Topics

Self-service Low-code Spatial Studio

Oracle Spatial Studio is a self-service web application that makes it easy for you to create interactive maps and perform spatial analysis on your business data. You can also use Spatial Studio to publish spatial analyses as REST services and generate SQL statements for spatial analysis using low-code and UI components.

With Spatial Studio, you no longer need to write Javascript or SQL or use third-party tools to take advantage of the spatial capabilities in Oracle Database.

Related Topics

Spatial Network Data Model Contraction Hierarchy

Contraction hierarchy, a precomputed in-memory approach, speeds up path computations in the Oracle Spatial and Graph network data model.

By using the contraction hierarchy Java API, you can evaluate shortest path computations, drive time polygon analysis, and traveling salesperson analysis functions more efficiently. These functions will perform 10 to 100 times faster than with previous releases. You can run more network analysis functions using fewer CPUs, and support more concurrent requests using the same hardware.

Related Topics

Spatial Support for Database In-Memory

A new In-Memory Spatial index is created when the INMEMORY SPATIAL clause is specified for a spatial geometry column during a CREATE or ALTER TABLE statement.

This means that you no longer have to create and maintain conventional disk-based spatial indexes when spatial tables are stored using Database In-Memory. The in-memory spatial index provides much faster query performance for SDO_FILTER() operations without having to maintain a separate, on-disk spatial index.

Related Topics