E Change History for Previous Releases

The following are changes in previous releases of the product.

E.1 Changes in Oracle Big Data SQL 3.2

Oracle Big Data SQL Release 3.2 includes major improvements in performance, secure network connectivity, authentication, and user administration, as well as installation and configuration.

JSON CLOB Predicate Pushdown

Improved filtering and parsing of JSON CLOB data in Hadoop enables Oracle Big Data SQL to push more processing for these large objects down to the Hadoop cluster. JSON data can now be filtered on the Oracle Big Data SQL cells in Hadoop for CLOB columns up to 1 MB, depending on the character set of the input document. The eligible JSON filter expressions for storage-layer evaluation include simplified syntax, JSON_VALUE, and JSON_QUERY. In addition, Oracle Big Data SQL can project up to 32 KB of CLOB data from select-list expression evaluation in Hadoop to Oracle Database. Processing falls back to Oracle Database only when column sizes exceed these two limits.
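For example, a filter of the following form, where doc is a JSON CLOB column in an external table over Hadoop data, is now eligible for evaluation on the Oracle Big Data SQL cells. The table and column names here are illustrative only:

SELECT JSON_VALUE(doc, '$.customer.name')
FROM   order_docs_hdfs
WHERE  JSON_VALUE(doc, '$.status') = 'SHIPPED';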

Customers can disable or re-enable this functionality to suit their own needs.

In Release 3.2, this enhancement applies only to JSON expressions returning CLOB data. The same support will be provided for other CLOB operations (such as SUBSTR and INSTR), as well as for BLOB data, in a future release.

Note:

The new JSON CLOB predicate pushdown functionality requires Oracle Database version 12.1.0.2.180417 or greater, as well as the following patches:
  • The April 2018 Proactive DBBP (Database Bundle Patch). This is patch 27486326.

  • The one-off patch 27767148.

    Install the one-off patch on all database compute nodes.

    The one-off patch 26170659, which is required on top of earlier DBBPs, is not required on top of the April DBBP.

This functionality is not available through the January 2018 and August 2017 Proactive DBBPs.

See the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1 in My Oracle Support) for the most up-to-date information on software version and patch requirements.

Support for Querying Kafka Topics

Release 3.2 provides Hive and Oracle Big Data SQL with the ability to query Kafka topics via a new Hive storage handler. You can use this storage handler to create external Hive tables backed by data residing in Kafka. Oracle Big Data SQL or Hive can then query the Kafka data through the external tables. The Kafka key, value, offset, topic name, and partition ID are mapped to Hive columns. You can explicitly designate the offset for each topic/partition pair; otherwise, reading starts from the earliest offset and ends with the latest offset in the topic for each partition.
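Once an external Hive table backed by a Kafka topic is defined through the storage handler, the mapped fields can be queried with ordinary SQL. The following sketch assumes a hypothetical external table kafka_orders_ext whose columns correspond to the mapped topic name, partition ID, offset, and message value; these names are placeholders, not part of the product API:

SELECT topic_name, partition_id, msg_offset, msg_value
FROM   kafka_orders_ext
WHERE  msg_offset > 1000
ORDER  BY partition_id, msg_offset;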

Improved Processing of Parquet Files

Oracle has introduced its own Parquet reader for processing data in Parquet format. This new reader provides significant performance and resource utilization improvements over the existing Hive Parquet driver. These include:

  • More intelligent column retrieval. The reader uses “lazy materialization” to process only columns with rows that satisfy the filter, thereby reducing I/O.

  • Leveraging of dictionaries during filter predicate processing to improve CPU usage.

  • Streamlined data conversion, which also contributes to more efficient CPU usage.

The Oracle Big Data SQL installation enables Oracle's Parquet reader by default. You have the option to disable it and revert to the generic Parquet reader.

Multi-User Authorization

In previous releases of Oracle Big Data SQL, all queries against Hadoop and Hive data were executed as the oracle user, and there was no option to change users. Although oracle is still the underlying user in all cases, Oracle Big Data SQL 3.2 now uses Hadoop Secure Impersonation to direct the oracle account to execute tasks on behalf of other designated users. This enables HDFS data access based on the user that is currently executing the query, rather than the single oracle user.

Administrators set up the rules for identifying the query user. They can provide rules for identifying the currently connected user and for mapping the connected user to the user that is impersonated. Because there are numerous ways in which users can connect to Oracle Database, this user may be a database user, a user sourced from LDAP or Kerberos, or a user from another source. Authorization rules on the files apply to that user, and HDFS auditing identifies the actual user running the query.
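One way a mapping rule might be defined is through the ADD_USER_MAP procedure of the DBMS_BDSQL package. The call below is only a sketch: the user names are invented, and the parameter names are assumptions that should be verified against the DBMS_BDSQL package reference for your release.

BEGIN
  -- Map the database user BDSQL_APP to the Hadoop user app_etl
  -- (illustrative names; parameter names assumed from the package reference).
  DBMS_BDSQL.ADD_USER_MAP(
      current_database_user       => 'BDSQL_APP',
      syscontext_parm_hadoop_user => 'app_etl');
END;
/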

See Also:

Administration for Multi-User Authorization is done through the DBMS_BDSQL PL/SQL Package.

Authentication Between Oracle Database and Oracle Big Data SQL Cells

Authentication is now performed between Oracle Database and the Oracle Big Data SQL cells on the Hadoop cluster, facilitating secure communication between the two. The Database Authentication enhancement provides a safeguard against impersonation attacks, in which a rogue service attempts to connect to the Oracle Big Data SQL offload server process running on a cluster node.

Kerberos Ticket Renewal Automation

On a Kerberos-secured network, you can configure the installation to set up automated Kerberos ticket renewal for the oracle account used by Oracle Big Data SQL. This is done for both the Hadoop cluster and Oracle Database sides of the installation. You must provide the principal name and the path to the keytab file in the bds-config.json configuration file. A template is provided in the configuration file:

"kerberos" : {
"principal" : "oracle/mycluster@MY.DOMAIN.COM",
"keytab" : "/home/oracle/security/oracle.keytab"
}

If you provide the Kerberos parameters in the configuration file, then the Oracle Big Data SQL installation sets up cron jobs on both the Hadoop cluster and the Oracle Database servers. These jobs renew the Kerberos ticket for the principal once per day.

The principal and keytab file must already exist.

Automatic Upgrade

The current release can now be installed over an earlier release with no need to remove the older software on either the Hadoop or Oracle Database side. The previous installation is upgraded to the current release level.

Common Installation Bundle for all Platforms

In previous releases, customers needed to unpack the Oracle Big Data SQL installation bundle and choose the correct package for their Hadoop system (CDH or HDP). Now the bundle contains a single installation package that works for all supported Hadoop systems.

Simpler and Faster Installation with the new “Jaguar” Installer

The Jaguar installer replaces setup-bds.sh, the installer used in previous releases. Jaguar includes these changes:

  • Automatic Check for Installation Prerequisites on Hadoop Nodes

    Jaguar checks for installation readiness on each Hadoop DataNode and reports any missing prerequisites.

  • No Need to Manually Generate the Database-Side Installation Bundle

    The database-side installation bundle that previously was manually generated by the customer can now be generated automatically. You still need to copy the bundle to the Oracle Database nodes and install it.

  • Faster Overall Installation Time on the Hadoop Side

    Installation time will vary, but on the Hadoop side the installation may take approximately eight minutes if all resources are local, or about 20 minutes if Hadoop clients must be downloaded from the Internet, depending on download speed.

  • Prerequisite Apache Services on CDH can now be Installed as Either Packages or Parcels

    Previously on CDH systems, the Oracle Big Data SQL installation required that the HDFS, YARN, and Hive components had been installed as parcels. These components can now be installed on CDH as either packages or parcels. There is no change for HDP, where they must be installed as stacks.

    Note:

    On CDH systems, if the Hadoop services required by Oracle Big Data SQL are installed as packages, be sure that they are installed from within Cloudera Manager. Otherwise, Cloudera Manager will not be able to manage these services. This is not an issue with parcels.
  • In the CLI, the Jaguar Utility Replaces ./setup-bds.sh

    The Jaguar utility is now the primary tool for Hadoop-side installation, de-installation, and configuration changes, as in these examples:
    # ./jaguar install bds-config.json
    # ./jaguar reconfigure bds-config.json
    # ./jaguar uninstall bds-config.json 
  • The Default Configuration File Name is bds-config.json, but Alternate File Names are Also Accepted

    You can now drop the explicit bds-config.json argument and allow the installer to default to bds-config.json, as in the first example below. You can also specify an alternate configuration file of any name, though it must adhere to the same internal format as bds-config.json and should be given the .json file type.
    # ./jaguar install 
    # ./jaguar install cluster2-config.json
    You can create configuration files with settings that are tailored to the requirements of each cluster. For example, you may want to apply different security parameters to Oracle Big Data SQL installations on test and production clusters.
  • Configuration Parameters Have Changed Significantly

    Users of previous releases will see that the Jaguar configuration file includes a number of new parameters. Most of them are “optional” in the sense that they are not uniformly required, although your particular installation may require some of them. See the Related Links section below for links to the table of installer parameters as well as an example of a configuration file that uses all available parameters.

  • New updatenodes Command for Easier Maintenance

    Oracle Big Data SQL must be installed on each Hadoop cluster node that is provisioned with the DataNode role. It has no function on nodes where the DataNode role is not present. The new Jaguar utility includes the updatenodes command, which scans the cluster for instances of the DataNode role. If the DataNode role has been removed or relocated, or if nodes provisioned with the DataNode role have been added or removed, then the command installs or uninstalls the Oracle Big Data SQL components on nodes as needed.

  • An Extra Installation Step is Required to Enable Some Security Features

    If you choose to enable Database Authentication between Oracle Database and the Oracle Big Data SQL cells in the Hadoop cluster, or Hadoop Secure Impersonation, then an additional “Database Acknowledge” step is required. In this process, the installation on the database server generates a ZIP file of configuration information that you must copy back to the Hadoop cluster management server for processing.

  • On the Database Side, Connections to Clusters are no Longer Classified as Primary and Secondary

    An Oracle Database system can have Oracle Big Data SQL connections to multiple Hadoop clusters. In previous releases, the first of these connections was considered the primary (and had to be uninstalled last) and the others were secondary. In the current release, management of multiple installations is simpler, and the --uninstall-as-primary and --uninstall-as-secondary parameters of the database-side installer are obsolete. However, there is now a default cluster. The Important Terms and Concepts section of this guide explains the significance of the default cluster.

Support for Oracle Tablespaces in HDFS Extended to Include All Non-System Permanent Tablespaces

Previous releases supported moving only permanent online tablespaces to HDFS. This functionality now supports online, read-only, and offline permanent tablespaces.
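As a hedged illustration, the following query lists the permanent tablespaces that are candidates for storage in HDFS under the extended support, whether their status is online, read-only, or offline. The exclusion list is an assumption about which tablespaces count as system tablespaces:

SELECT tablespace_name, status
FROM   dba_tablespaces
WHERE  contents = 'PERMANENT'
  AND  tablespace_name NOT IN ('SYSTEM', 'SYSAUX');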

Important Change in Behavior of the “mtactl start” Command

Oracle Big Data SQL 3.1 introduced the option to install Oracle Big Data SQL on servers where Oracle Grid Infrastructure is not present. In these environments, you can use the start subcommand of the mtactl utility (mtactl start) to start the MTA (Multi-Threaded Agent) extproc.

Note that in the current release, the mtactl start command works differently from the original Release 3.1 implementation.

  • Current behavior: mtactl start starts an MTA extproc using the init parameter values that are stored in the repository. It uses the default values only if the repository does not exist.

  • Previous behavior (Oracle Big Data SQL 3.1): mtactl start always uses the default init parameters regardless of whether or not init parameter values are stored in the repository.