Installing or Upgrading the Oracle Database Side of Oracle Big Data SQL

3 Installing or Upgrading the Oracle Database Side of Oracle Big Data SQL

Oracle Big Data SQL must be installed on both the Hadoop cluster management server and the Oracle Database server. This section describes the installation of Oracle Big Data SQL on Oracle Database systems, including single-node and RAC systems.

3.1 Before You Start the Database-Side Installation

Review the points in this section before starting the installation.

If the current version of Oracle Big Data SQL is already installed on the Oracle Database system and you are making changes to the existing configuration, you do not need to repeat the entire installation process. See Reconfiguring an Existing Oracle Big Data SQL Installation.

Important:

For multi-node databases (such as Oracle RAC systems), you must repeat this installation on every compute node of the database. If this is not done, you will see RPC connection errors when the Oracle Big Data SQL service is started.
It is recommended that you set up passwordless SSH for root on the database nodes where Grid is running. Otherwise, you will need to supply the credentials during the installation on each node.
If the diskmon process is not already running prior to the installation, a Grid infrastructure restart will be required in order to complete the installation.
If you set up Oracle Big Data SQL connections from more than one Hadoop cluster to the same database, be sure that the configurations for each connection are the same. Do not set up one connection to use Infiniband and another to use Ethernet. Likewise, if you enable database authentication in one configuration, then this feature must be enabled in all Oracle Big Data SQL connections between different Hadoop clusters and the same database.
If the database system is Kerberos-secured, then it is important to note that authentication for the database owner principal can be performed by only one KDC. Oracle Big Data SQL currently does not support multiple Kerberos tickets. If two or more Hadoop cluster connections are installed by the same database owner, all must use the same KDC.

The KRB5_CONF environment variable must point to only one configuration file. The configuration file must exist, and the variable itself must be set for the database owner account at system startup.
For database running under Oracle Grid Infrastructure, if the system has more than one network interfaces of the same type (two or more Ethernet, two or more InfiniBand, or two or more Ethernet-over-InfiniBand interfaces), then the installation always selects the one whose name is first in alphabetical order.

See Also:

Details are provide in Special Considerations When a System Under Grid Infrastructure has Multiple Network Interfaces of the Same Type.

3.1.1 Check for Required Grid Patches With bds-validate-grid-patches.sh

The script bds-validate-grid-patches.sh checks to determine whether or not patches to the Oracle Grid Infrastructure that are required by Oracle Big Data SQL have been installed.

Note:

If you want to run this script, do so before Oracle Big Data SQL is installed on the database node.

Prerequisites for running this script:

ORACLE_HOME must be set to the Grid home.

$ echo $ORACLE_HOME
/u01/app/18c/gridhome

Copy the file bdsql_db_patches from the Oracle Big Data SQL installation directory on the Hadoop cluster to the same directory where you will run bds-validate-grid-patches.sh on the database node. The script uses bdsql_db_patches to determine if the required Grid patches are present for this release of Orace Big Data SQL on the currently-installed Oracle Database release.
You can find bdsql_db_patches at the following location on the Hadoop cluster node where you ran that side of the installation:
```
<Oracle Big Data SQL installation directory>/BDSJaguar/bdsrepo/db/bdsql_db_patches>
```
By default, the installation directory on the Hadoop cluster is /opt/oracle.
You must run this script as the Oracle Grid Infrastructure installation owner (Grid user).

3.1.2 Potential Requirement to Restart Grid Infrastructure

In certain database environments bds-database-install.sh needs to create cellinit.ora and/or celliniteth.ora. In these cases, the script will attempt to propagate similar changes across all nodes in the Grid Infrastructure. To do this, the script expects that passwordless SSH is set up between oracle and root or it will prompt for the root password for each node during execution. If the nature of the changes requires a restart of the Grid Infrastructure, the script will also display messages indicating that grid infrastructure needs to be restarted manually. Because the installlation cannnot complete without Grid credentials if a restart is necessary, be sure that you have the Grid password at hand.

3.1.2.1 Understanding When Grid or Database Restart is Required

On the database side of Oracle Big Data SQL, the diskmon process is the agent in charge of communications with the Hadoop cluster. This is similar to its function on the Oracle Exadata Database Machine, where it manages communications between compute nodes and storage nodes.

In Grid environments, diskmon is owned by the Grid user. In a non-Grid environment, it is owned by the Oracle Database owner.

In Oracle Big Data SQL, diskmon settings are stored on cellinit.ora and celliniteth.ora files in the /etc/oracle/cell/network-config/ directory. The installer updates these files in accordance with the cluster connection requirements.

This is how the installer determines when Grid or Oracle Database needs to be restarted:

If the installer detects that no previous cellinit.ora or celliniteth.ora file exists, this means that no diskmon process is running. In this case, if the environment includes Oracle Grid then you must restart the Grid. If the environment does not include Grid, then you must restart Oracle Database.
If previous cellinit.ora and/or celliniteth.ora file exist, this indicates that diskmon process is running. In this case, if the installer needs to make a change to these files, then only the database needs to be restarted.
In multi-node Grid environments, diskmon works on all nodes as a single component and cellinit.ora and celliniteth.ora must be synchronized on all nodes. This task is done through SSH. If passwordless SSH is set up on the cluster, no user interaction is required. If passwordless SSH is not set up, then the script will pause for you to input the root credentials for all nodes. When the cellinit.ora and celliniteth.ora files across all nodes are synchronized, then the script will continue. Then the script finishes and in this case, you must restart the Grid infrastructure.

3.1.3 Special Considerations When a System Under Grid Infrastructure has Multiple Network Interfaces of the Same Type

The Oracle Big Data SQL installation or SmartScan operation may sometimes fail in these environments because the wrong IP address is selected for communication with the cells.

When installing Oracle Big Data SQL within Oracle Grid Infrastructure, you cannot provide the installer with a specific IP address to select for communication with the Oracle Big Data SQL cells on the nodes. Network interface selection is automatically determined in this environment. This determination is not always correct and there are instances where the database-side installation may fail, or, you may later discover that SmartScan is not working.

You can manually correct this problem, but first it is helpful to understand how the installation decides which network interfaces to select.

How the Installation Selects From Among Multiple Interfaces of the Same Type on a System Under Grid Infrastructure

The diskmon process is controlled by Oracle Grid Infrastructure and not by the database. Grid manages communications with the Oracle Big Data SQL cells.
The Oracle Big Data SQL installer in these cases does not create cellinit.ora and celliniteth.ora, nor does it update the cell settings stored in these files. In these environments, the task is handled by Grid, because it is a cluster-wide task that must be synchronized across all nodes.
If there are multiple network interfaces, the Grid-managed update to the cells automatically selects the first network interface of the appropriate type on each node. It selects the interface whose name is first in alphabetical order.

For example, here is a system that is under Grid Infrastructure. It has multiple InfiniBand, Ethernet, and Ethernet over InfiniBand (bondeth*) network interfaces. This is the list of interfaces:

[root@mynode ~]# ip -o -f inet addr show
1: lo    inet 127.0.0.1/8 
2: eth0    inet 12.17.207.156/21 
3: eth1    inet 16.10.145.12/21 
19: bondeth0    inet 12.17.128.15/20 
20: bondeth2    inet 16.10.230.160/20 
21: bondeth4    inet 192.168.1.45/20 
30: bondib0    inet 192.168.31.178/21 
31: bondib1    inet 192.168.129.2/21 
32: bondib2    inet 192.168.199.205/21 
33: bondib3    inet 192.168.216.31/21 
34: bondib4    inet 192.168.249.129/21

When the Oracle Big Data SQL installer runs on this system, the following interfaces would be selected.

192.168.31.178/21 is selected for the InfiniBand connection configured in cellinit.ora.

Among the InfiniBand interfaces on this system, the interface name bondib0 is first in an ascending alphabetical sort.
12.17.128.15/20 is selected for an Ethernet-over-InfniBand connection configured in celliniteth.ora.

Note:
This example demonstrates an additional selection factor – Ethernet over InfiniBand takes precedence over standard Ethernet.
The bondeth0 interface name is first in this case.

How the Installation (or SmartScan) may Fail Under These Conditions

It is possible that diskmon cannot connect to the Oracle Big Data SQL cells via the network interface selected according to the logic described above. The correct subnet (one that diskmon can reach) may not appear first in an alphabetical sort.

How to Fix the Problem

You can manually change the IP addresses in thecellinit.ora and celliniteth.ora files. These files are at /etc/oracle/cell/network-config on each node.

Stop the CRS process. (Be sure to do this before the cell edit. If you do not, diskmon may hang.)
Edit cellinit.ora and/or celliniteth.ora . Change the IP addresses as needed.
Restart CRS.

3.2 About the Database-Side Installation Directory

You start the database side of the installation by unpacking the database-side installation bundle and executing the run file it contains. The run file creates an installation directory under $ORACLE_HOME/BDSJaguar-4.0.0. For example:

$ORACLE_HOME/BDSJaguar-4.0.0/cdh510-6-node1.mycluster.mydomain.com

The installation of Oracle Big Data SQL is not finished when this directory is created. The directory is a staging area that contains all of the files needed to complete the installation on the database node.

There can be Oracle Big Data SQL connections between the database and multiple Hadoop clusters. Each connection is established through a separate database-side installation and therefore creates a separate installation directory. The segments in the name of the installation directory enable you to identify the Hadoop cluster in this specific connection:

<Hadoop cluster name>-<Number nodes in the cluster>-<FQDN of the cluster management server node>

You should keep this directory. It captures the latest state of the installation and you can use it to regenerate the installation if necessary. Furthermore, if in the future you need to adjust the database-side of the installation to Hadoop cluster changes, then the updates generated by the Jaguar reconfigure command are applied to this directory.

Consider applying permissions that would prevent the installation directory from being modified or deleted by any user other than oracle (or other database owner).

3.3 Steps for Installing on Oracle Database Nodes

To install the database side of Oracle Big Data SQL, copy over the zip file containing the database-side installation bundle that was created on the Hadoop cluster management server, unzip it, execute the run file it contains, then run the installer.

Perform the procedure in this section as the database owner (oracle or other ). You stage the bundle in a temporary directory, but after you unpack the bundle and execute the run file it contains, then the installation package is installed in a subdirectory under $ORACLE_HOME. For example: $ORACLE_HOME/BDSJaguar-4.0.0/cdh510-6-node1.mycluster.mydomain.com.

Before starting, check that ORACLE_HOME and ORACLE_SID are set correctly.

How Many Times Do I Run the Installation?

You must perform the installation for each instance of each database. For example, if you have a non-CBD database and a CBD database (DBA and DBB respectively, in the example below) on a single two-node RAC, then you would install Oracle Big Data SQL on both nodes as follows:

On node 1

./bds-database-install.sh --db-resource=DBA1 --cdb=false   
./bds-database-install.sh --db-resource=DBB1 --cdb=true

On node 2

./bds-database-install.sh --db-resource=DBA2 --cdb=false
./bds-database-install.sh --db-resource=DBB2 --cdb=true

Copy Over the Components and Prepare for the Installation

Copy over and install the software on each database node in sequence from node 1 to node n.

If you have not already done so, use your preferred method to copy over the database-side installation bundle from the Hadoop cluster management server to the Oracle Database server. If there are multiple bundles, be sure to select the correct bundle for cluster that you want to connect to Oracle Database. Copy it to any location that you would like to use as a temporary staging area. To perform the copy operation, it may be easiest for you to log on to the Hadoop cluster management server as root, navigate down to the bundle location within the Oracle Big Data SQL installation directory, and push the bundle over to the database server. Use the database owner account (usually oracle ) for the remote logon on the database side.
In this example the bundle is copied over to /opt/tmp. You can copy the bundle to any secure directory, but first confirm that the directory exists.
```
# cd <Big Data SQL Install Directory>/BDSjaguar-4.0.0/db-bundles
# scp bds-4.0.0-db-<cluster>-<yymmdd.hhmi>.zip oracle@<database_node>:/opt/tmp
```

If you generated a database request key, then also copy that key over to the Oracle Database server.

# cd <Big Data SQL Install Directory>/BDSjaguar-4.0.0/dbkeys
# scp <database name or other name>.reqkey oracle@<database_node>:/opt/tmp

Log on to the database server host as oracle (or whichever user is the database owner) and cd to the directory where you copied the file (or files).
Unzip the bundle. You will see that it contains a single, compressed run file.
Check to ensure that the prerequisite environment variables are set – $ORACLE_HOME and $ORACLE_SID.
Run the file in order to unpack the bundle into $ORACLE_HOME. For example:
```
$ ./bds-4.0.0-db-cdh510-170309.1918.run
```
Because you can set up independent Oracle Big Data SQL connections between an Oracle Database instance and multiple Hadoop clusters, the run command unpacks the bundle to a cluster-specific directory under $ORACLE_HOME/BDSJaguar-4.0.0 . For example:
```
$ ls $ORACLE_HOME/BDSJaguar-4.0.0
  cdh510-6-node1.mycluster.mydomain.com
  test1-3-node1.myothercluster.mydomain.com
```
If the BDSJaguar-4.0.0 directory does not already exist, it is created as well.
If you generated a database request key, then copy it into the newly created installation directory. For example:
```
$ cp /opt/tmp/mydb.reqkey $ORACLE_HOME/BDSJaguar-4.0.0/cdh510-6-node1.mycluster.mydomain.com
```
Tip:

You also have the option to leave the key file in the temporary location and use the --reqkey parameter in the installation command in order to tell the script where the key file is located. This parameter lets you specify a non-default request key filename and/or path.

However, the install script only detects the key in the installation directory when the key filename is the same as the database name. Otherwise, even if the key is local, if you gave it a different name then you must still use --reqkey to identify to the install script.

Now that the installation directory for the cluster is in place, you are ready to install the database side of Oracle Big Data SQL.

Install Oracle Big Data SQL on the Oracle Database Node

Important:

The last part of the installation may require a single restart of Oracle Database under either or both of these conditions:

If Oracle Database does not include the Oracle Grid Infrastructure. In this case, the installation script makes a change to the pfile or spfile configuration file in order to support standalone operation of diskmon.
If there are changes to the IP address and the communication protocol recorded in cellinit.ora. These parameters define the connection to the cells on the Hadoop cluster. For example, if this is a re-installation and the IP address for the Hadoop/Oracle Database connection changes from an Ethernet address to an InfiniBand address and/or the protocol changes (between TCP and UDP), then a database restart is required.

Start the database if it is not running.
Log on to the Oracle Database node as oracle and change directories to the cluster-specific installation directory under $ORACLE_HOME/BDSJaguar-4.0.0. For example:
```
$ cd $ORACLE_HOME/BDSJaguar-4.0.0/cdh510-6-node1.my.domain.com 
```
Run bds-database-install.sh, the database-side Oracle Big Data SQL installer. You may need to include some optional parameters.
```
$ ./bds-database-install.sh [options]
```
Restart the database (optional).

3.3.1 Command Line Parameter Reference for bds-database-install.sh

The bds-database-install.sh script accepts a number of command line parameters. Each parameter is optional, but the script requires at least one.

Table 3-1 Parameters for bds-database-install.sh

Parameter	Function
`--install`	Install the Oracle Big Data SQL connection to this cluster. Note: In mid-operation, this script will pause and prompt you to run a second script as root: `bds-database-install: root shell script /u03/app/masha/12.1.0/ dbhome_mydb/install/bds-database-install-10657-root-scriptclust1.sh please run as root: <enter> to continue checking.. q<enter> to quit bds-database-install: root` As root, open a session in a second terminal and run the script there. When that script is complete, return to the original terminal and press Enter to resume the `bds-database-install.sh` session. Because Oracle Big Data SQL is installed on the database side as a regular user (not a superuser), tasks that must be done as root and/or the Grid user require the installer to spawn shells to run other scripts under those accounts while `bds-database-install.sh` is paused. In some earlier releases, the `--install` parameter was implicit if no other parameters are supplied. You must now explicitly include this parameter in order to do an installation.
`--version`	Show the `bds-database-install.sh` script version.
`--info`	Show information about the cluster, such as the cluster name, cluster management server host, and the web server.
`--grid-home`	Specifies the Oracle Grid home directory.
`--crs`	Use `crsctl` to set up Oracle Big Data SQL MTA extprocs. Ignored in non-Grid environments. `--crs={true\|false}` The installer always checks to verify that Grid is running. If Grid is not running, then the installer assumes that Grid is not installed and that the database is single-instance. It then automatically sets the `crs` flag to `false`. If `--crs=true` is explicitly set and Grid cannot be found, the installer terminates with an error message stating that `GI_HOME` must be set.
`--cdb`	Create database objects on all PDBs for CDB databases. `--cdb={true\|false}`
`--db-resource`	Deprecated. Oracle Big Data SQL scripts will run on the database specified by `ORACLE_SID` or `ORACLE_HOME` environment variables. `--db-resource=$ORACLE_SID`
`--db-name`	Deprecated. Oracle Big Data SQL scripts will run on the database specified by `ORACLE_SID` or `ORACLE_HOME` environment variables. `--db-name=$ORACLE_HOME`
`--reqkey`	This parameter tells the installer the name and location of the request key file. Database Authentication is enabled by default in the configuration. Unless you do not intend to enable this feature, then as one of the steps to complete the configuration, you must use `--reqkey`. To provide the key file name and the path if the key file is not in the local directory: `$ ./bds-database-install.sh --reqkey=/opt/tmp/some_name.reqkey --install` To provide the key file name if the file is local, but the filename is not the same as the database name: `$ ./bds-database-install.sh --reqkey=some_name.reqkey --install` If the request key filename is provided without a path, then the key file is presumed to be in the installation directory. The installer will find the key if the filename is the same as the database name. For example, in this case the request key file is in the install directory and the name is the same as the database name. `$ ./bds-database-install.sh --install` This file is consumed only once on the database side in order to connect the first Hadoop cluster to the database. Subsequent cluster installations to connect to the same database use the configured key. You do not need to resubmit the `.reqkey` file.
`--uninstall`	Uninstall the Oracle Big Data SQL connection to this cluster.
`--reconfigure`	Reconfigures bd_cell network parameters, Hadoop configuration files, and Big Data SQL parameters. The Oracle Big Data SQL installation to connect to this cluster must already exist. Run reconfigure when a change has occurred on the Hadoop cluster side, such as a change in the DataNode inventory.
`--databaseack-only`	Create the Database Acknowledge zip file. (You must then copy the zip file back to `/opt/oracle/DM/databases/conf` on the Hadoop cluster management server and run `./jaguar databaseack <configuration file>`.)
`--mta-restart`	MTA extproc restart. (Oracle Big Data SQL must already be installed on the database server.)
`--mta-setup`	Set up MTA extproc with no other changes. (Oracle Big Data SQL must already be installed on the database server.)
`--mta-destroy`	Destroy MTA extproc and make no other changes. (Oracle Big Data SQL must already be installed on the database server.)
`--aux-run-mode`	Because Oracle Big Data SQL is installed on the database side as a regular user (not a superuser), tasks that must be done as `root` and/or the Grid user require the installer to spawn shells to run other scripts under those accounts while `bds-database-install.sh` is paused. The`--aux-run-mode` parameter specifies a mode for running these auxiliary scripts. `--aux-run-mode=<mode>` Mode options are: `session` – through a spawned session. `su` — as a substitute user. `sudo` — through sudo. `ssh` — through secure shell.
`--root-script`	Enables or disables the startup root script execution. `--root-script={true\|false}`
`--no-root-script`	Skip root script creation and execution.
`--root-script-name`	Set a name for the root script (the default name is based on the PID).
`--pdb-list-to-install`	For container-type databases, Oracle Big Data SQL is by default set up on all open PDBs. This parameter limits the setup to the specified list of PDBs.
`--restart-db`	If a database restart is required, then by default the install script prompts the user to choose between doing the restart now or later. Setting `--restart-db=yes` tells the script in advance to proceed with any needed restart without prompting the user. If `--restart-db=no` then the prompt is displayed and the installation waits for a response. This is useful for unattended executions.
`--skip-db-patches-check`	To skip the patch validation, add this parameter when you run the installer: `./bds-database-install.sh --skip-db-patches-check` By default, database patch requirements are checked when you run `./bds-database-install.sh`. If the patch validation fails, the installer returns a prompt with a warning message, indicating that there are some patches missing. Installation will continue after the warning.

The --root-script-only parameter from previous releases is obsolete.

3.4 Granting User Access to Oracle Big Data SQL

In Oracle Big Data SQL releases prior to 3.1, access is granted to the PUBLIC user group. In the current release, you must do the following for each user who needs access to Oracle Big Data SQL:

Grant the BDSQL_USER role.
Grant read privileges on the BigDataSQL configuration directory object.
Grant read privileges on the Multi-User Authorization security table to enable impersonation.

For example, to grant access to user1:

SQL> grant BDSQL_USER to user1; 
SQL> grant read on directory ORACLE_BIGDATA_CONFIG to user1;
SQL> grant read on BDSQL_USER_MAP to user1;

3.5 Enabling Object Store Access

If you want to use Oracle Big Data SQL to query object stores, certain database properties and Network ACL values must be set on the Oracle Database side of the installation. The installation provides two SQL scripts you can use to do this.

As of Oracle Big Data SQL 4.0, the ORACLE_BIGDATA driver enables you to create external tables over data within object stores in the cloud. Currently, Oracle Object Store and Amazon S3 are supported. You can create external tables over Parquet, Avro, and text files in these stores. The first step is set up access to the object stores as follows.

Run set_parameters_cdb.sql and allow_proxy_pdb.sql to Enable Object Store Access

After you run bds-database-install.sh to execute the database-side installation, find these two SQL script files under $ORACLE_HOME, in the cluster subdirectory under the BDSJaguar directory:
```
set_parameters_cdb.sql
allow_proxy_pdb.sql
```
Open and read each of these files. Confirm that the configuration is correct.

Important:
Because there are security implications, carefully check that the HTTP server setting and other settings are correct.
In CBD ROOT, run set_parameters_cdb.sql.
In each PDB that needs access to object stores, run allow_proxy_pdb.sql.

In a RAC database, you only need to run these scripts on one instance of the database.

3.6 Maintaining WebLogic Server

The WebLogic Server is included as component of Oracle Big Data SQL. Download and run the patching tool described here to keep WebLogic Server up-to-date with the official security patches provided by Oracle.

Web Logic Server is deployed on all DataNodes where Big Data SQL is installed. It is embedded in the bd_cells (Big Data SQL processing cells) running on these nodes. At the time of Big Data SQL 4.0 release, the WebLogic Server 10.3.6 component is up-to-date with the 2019 April PSU. To maintain WebLogic Server, you should run the patching tool after the Big Data SQL installation and then again later as newer PSUs become available.

The patching tool is available as patch 31188867 on My Oracle Support. Click on the Patches and Updates tab and enter "31188867" into the search field. Installation instructions are in the readme file provided with the patch. However, also review Document 2662568.1 in My Oracle Support. It provides some information that is not in the readme file.