3 Installing or Upgrading the Oracle Database Side of Oracle Big Data SQL

Oracle Big Data SQL must be installed on both the Hadoop cluster management server and the Oracle Database system. This section describes the installation of Oracle Big Data SQL on both single-instance and RAC systems.

3.1 Before You Start the Database-Side Installation

Review the points in this section before starting the installation.

If the current version of Oracle Big Data SQL is already installed on the Oracle Database system and you are making changes to the existing configuration, then you do not need to repeat the entire installation process. See Reconfiguring an Existing Oracle Big Data SQL Installation.

Important:

  • If you intend to log in as the database owner user with an anonymous connection to SQL*Plus, first be sure that the ORACLE_HOME and ORACLE_SID environment variables are set properly. GI_HOME should also be set if Grid Infrastructure is running on the system and its location is not relative to ORACLE_HOME. (See the example after this list.)
  • For multi-node databases (such as Oracle RAC systems), you must repeat this installation on every compute node of the database. If this is not done, you will see RPC connection errors when the Oracle Big Data SQL service is started.

  • It is recommended that you set up passwordless SSH for root on the database nodes where Grid is running. Otherwise, you will need to supply the credentials during the installation on each node.

  • If the diskmon process is not already running prior to the installation, a Grid infrastructure restart will be required in order to complete the installation.

  • If you set up Oracle Big Data SQL connections from more than one Hadoop cluster to the same database, be sure that the configurations for each connection are the same. Do not set up one connection to use InfiniBand and another to use Ethernet. Likewise, if you enable database authentication in one configuration, then this feature must be enabled in all Oracle Big Data SQL connections between different Hadoop clusters and the same database.

  • If the database system is Kerberos-secured, then it is important to note that authentication for the database owner principal can be performed by only one KDC. Oracle Big Data SQL currently does not support multiple Kerberos tickets. If two or more Hadoop cluster connections are installed by the same database owner, all must use the same KDC.

    The KRB5_CONF environment variable must point to only one configuration file. The configuration file must exist, and the variable itself must be set for the database owner account at system startup.

    Be sure that you have installed a Kerberos client on each of the database compute nodes.

    Also, if the principal used on the Hadoop side is not the same as the principal of the database owner, then before running the database-side installation, you must copy the correct principal from the KDC to the Oracle Database side. Then when you run the installer, include the --alternate-principal parameter in the command line in order to direct the installer to the correct keytab. (See Command Line Parameter Reference for bds-database-install.sh.)

  • For databases running under Oracle Grid Infrastructure, if the system has more than one network interface of the same type (two or more Ethernet, two or more InfiniBand, or two or more Ethernet-over-InfiniBand interfaces), then the installation always selects the one whose name is first in alphabetical order.
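
The following is a minimal sketch of the environment settings described in this list, suitable for the database owner's shell profile. The paths and SID are examples only; substitute the values for your system, and set GI_HOME and KRB5_CONF only where they apply.

export ORACLE_HOME=/u01/app/oracle/product/12.2.0/dbhome_1    # example path
export ORACLE_SID=orcl                                         # example SID
export GI_HOME=/u01/app/12.2.0/grid    # only if Grid is installed and not located relative to ORACLE_HOME
export KRB5_CONF=/etc/krb5.conf        # Kerberos-secured systems only; must point to a single configuration file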

3.1.1 Check for Required Grid Patches

Make sure all patches to the Oracle Grid Infrastructure required by Oracle Big Data SQL have been installed.

Check the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1 in My Oracle Support) to find up-to-date information on patch requirements.
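
One way to list the patches currently applied to the Grid home is to run OPatch as the Grid Infrastructure owner and compare the output against the matrix. The Grid home path shown is an example only.

$ /u01/app/12.2.0/grid/OPatch/opatch lspatches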

3.1.2 Potential Requirement to Restart Grid Infrastructure

In certain database environments, bds-database-install.sh needs to create cellinit.ora and/or celliniteth.ora. In these cases, the script attempts to propagate the same changes across all nodes in the Grid Infrastructure. To do this, the script expects that passwordless SSH is set up between the oracle user and root; otherwise, it prompts for the root password for each node during execution. If the nature of the changes requires a restart of the Grid Infrastructure, the script also displays messages indicating that the Grid Infrastructure must be restarted manually. Because the installation cannot complete without Grid credentials if a restart is necessary, be sure that you have the Grid password at hand.
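
If a restart does become necessary, it is typically performed manually as root with crsctl on each node, for example:

# $GRID_HOME/bin/crsctl stop crs
# $GRID_HOME/bin/crsctl start crs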

3.1.2.1 Understanding When Grid or Database Restart is Required

On the database side of Oracle Big Data SQL, the diskmon process is the agent in charge of communications with the Hadoop cluster. This is similar to its function on the Oracle Exadata Database Machine, where it manages communications between compute nodes and storage nodes.

In Grid environments, diskmon is owned by the Grid user. In a non-Grid environment, it is owned by the Oracle Database owner.

In Oracle Big Data SQL, diskmon settings are stored in the cellinit.ora and celliniteth.ora files in the /etc/oracle/cell/network-config/ directory. The installer updates these files in accordance with the cluster connection requirements.
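
For reference, each of these files typically contains the IP address and subnet that diskmon uses to reach the cells. The address below is illustrative only; your files contain the addresses chosen for your configuration. You can also check whether diskmon is currently running with ps.

$ cat /etc/oracle/cell/network-config/cellinit.ora
ipaddress1=192.168.31.178/21
$ ps -ef | grep diskmon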

This is how the installer determines when Grid or Oracle Database needs to be restarted:

  • If the installer detects that no previous cellinit.ora or celliniteth.ora file exists, this means that no diskmon process is running. In this case, if the environment includes Oracle Grid then you must restart the Grid. If the environment does not include Grid, then you must restart Oracle Database.

  • If a previous cellinit.ora and/or celliniteth.ora file exists, this indicates that the diskmon process is running. In this case, if the installer needs to make a change to these files, then only the database needs to be restarted.

  • In multi-node Grid environments, diskmon works on all nodes as a single component, and cellinit.ora and celliniteth.ora must be synchronized on all nodes. This task is done through SSH. If passwordless SSH is set up on the cluster, no user interaction is required. If passwordless SSH is not set up, then the script pauses for you to enter the root credentials for each node. When the cellinit.ora and celliniteth.ora files are synchronized across all nodes, the script continues to completion. In this case, you must then restart the Grid Infrastructure. (A quick way to verify passwordless SSH is shown in the example below.)
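
A quick way to confirm that passwordless SSH for root is in place is to attempt a non-interactive connection from the node where you run the installer to each of the other Grid nodes. The host name below is hypothetical; if the remote host name is printed without a password prompt, passwordless SSH is set up for that node.

$ ssh -o BatchMode=yes root@dbnode2 hostname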

3.1.3 Special Considerations When a System Under Grid Infrastructure has Multiple Network Interfaces of the Same Type

The Oracle Big Data SQL installation or SmartScan operation may sometimes fail in these environments because the wrong IP address is selected for communication with the cells.

When installing Oracle Big Data SQL within Oracle Grid Infrastructure, you cannot provide the installer with a specific IP address to select for communication with the Oracle Big Data SQL cells on the nodes. Network interface selection is automatically determined in this environment. This determination is not always correct and there are instances where the database-side installation may fail, or, you may later discover that SmartScan is not working.

You can manually correct this problem, but first it is helpful to understand how the installation decides which network interfaces to select.

How the Installation Selects From Among Multiple Interfaces of the Same Type on a System Under Grid Infrastructure

  • The diskmon process is controlled by Oracle Grid Infrastructure and not by the database. Grid manages communications with the Oracle Big Data SQL cells.

  • The Oracle Big Data SQL installer in these cases does not create cellinit.ora and celliniteth.ora, nor does it update the cell settings stored in these files. In these environments, the task is handled by Grid, because it is a cluster-wide task that must be synchronized across all nodes.

  • If there are multiple network interfaces, the Grid-managed update to the cells automatically selects the first network interface of the appropriate type on each node. It selects the interface whose name is first in alphabetical order.

For example, here is a system that is under Grid Infrastructure. It has multiple InfiniBand, Ethernet, and Ethernet over InfiniBand (bondeth*)  network interfaces. This is the list of interfaces:

[root@mynode ~]# ip -o -f inet addr show
1: lo    inet 127.0.0.1/8 
2: eth0    inet 12.17.207.156/21 
3: eth1    inet 16.10.145.12/21 
19: bondeth0    inet 12.17.128.15/20 
20: bondeth2    inet 16.10.230.160/20 
21: bondeth4    inet 192.168.1.45/20 
30: bondib0    inet 192.168.31.178/21 
31: bondib1    inet 192.168.129.2/21 
32: bondib2    inet 192.168.199.205/21 
33: bondib3    inet 192.168.216.31/21 
34: bondib4    inet 192.168.249.129/21 

When the Oracle Big Data SQL installer runs on this system, the following interfaces would be selected.

  • 192.168.31.178/21 is selected for the InfiniBand connection configured in cellinit.ora.

    Among the InfiniBand interfaces on this system, the interface name bondib0 is first in an ascending alphabetical sort.

  • 12.17.128.15/20 is selected for an Ethernet-over-InfiniBand connection configured in celliniteth.ora.

    Note:

    This example demonstrates an additional selection factor – Ethernet over InfiniBand takes precedence over standard Ethernet.

    The bondeth0 interface name is first in this case.

How the Installation (or SmartScan) may Fail Under These Conditions

It is possible that diskmon cannot connect to the Oracle Big Data SQL cells via the network interface selected according to the logic described above. The correct subnet (one that diskmon can reach) may not appear first in an alphabetical sort.
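
Before making any changes, you can confirm which local interface would be used to reach a given cell (DataNode) address and whether that address responds. The address below is hypothetical; use a DataNode address from your Hadoop cluster.

$ ip route get 192.168.129.10
$ ping -c 3 192.168.129.10

If the interface reported by ip route is not the one recorded in cellinit.ora or celliniteth.ora, or if the address is unreachable, apply the fix described below.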

How to Fix the Problem

You can manually change the IP addresses in the cellinit.ora and celliniteth.ora files. These files are at /etc/oracle/cell/network-config on each node. An illustrative end-to-end sequence follows these steps.

  1. Stop the CRS process. (Be sure to do this before the cell edit. If you do not, diskmon may hang.)

  2. Edit cellinit.ora and/or celliniteth.ora. Change the IP addresses as needed.

  3. Restart CRS.
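
The following is an illustrative end-to-end sequence for this fix, run as root on each affected node. Change the IP addresses in the two files to addresses that diskmon can reach before restarting. The Grid home path shown is an example only.

# $GRID_HOME/bin/crsctl stop crs
# vi /etc/oracle/cell/network-config/cellinit.ora
# vi /etc/oracle/cell/network-config/celliniteth.ora
# $GRID_HOME/bin/crsctl start crs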

3.2 About the Database-Side Installation Directory

You start the database side of the installation by unpacking the database-side installation bundle and executing the run file it contains. The run file creates an installation directory under $ORACLE_HOME/BDSJaguar-4.1.1. For example:

$ORACLE_HOME/BDSJaguar-4.1.1/cdh510-6-node1.mycluster.<domain>.com

The installation of Oracle Big Data SQL is not finished when this directory is created. The directory is a staging area that contains all of the files needed to complete the installation on the database node.

There can be Oracle Big Data SQL connections between the database and multiple Hadoop clusters. Each connection is established through a separate database-side installation and therefore creates a separate installation directory. The segments in the name of the installation directory enable you to identify the Hadoop cluster in this specific connection:

<Hadoop cluster name>-<Number nodes in the cluster>-<FQDN of the cluster management server node>

You should keep this directory. It captures the latest state of the installation and you can use it to regenerate the installation if necessary. Furthermore, if in the future you need to adjust the database-side of the installation to Hadoop cluster changes, then the updates generated by the Jaguar reconfigure command are applied to this directory.

Consider applying permissions that would prevent the installation directory from being modified or deleted by any user other than oracle (or other database owner).
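
For example, one simple way to do this is to restrict the directory to its owner. The path shown is the example installation directory used throughout this chapter.

$ chmod 700 $ORACLE_HOME/BDSJaguar-4.1.1/cdh510-6-node1.mycluster.<domain>.com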

3.3 Steps for Installing on Oracle Database Nodes

To install the database side of Oracle Big Data SQL, copy over the zip file containing the database-side installation bundle that was created on the Hadoop cluster management server, unzip it, execute the run file it contains, then run the installer.

Perform the procedure in this section as the database owner (oracle or other). You stage the bundle in a temporary directory, but after you unpack the bundle and execute the run file it contains, then the installation package is installed in a subdirectory under $ORACLE_HOME. For example: $ORACLE_HOME/BDSJaguar-4.1.1/cdh510-6-node1.mycluster.<domain>.com.

Before starting, check that ORACLE_HOME and ORACLE_SID are set correctly. You should also review the script parameters in Command Line Parameter Reference for bds-database-install.sh so that you know which ones you should include in the install command for a given environment.

How Many Times Do I Run the Installation?

You must perform the installation for each instance of each database. For example, if you have a non-CDB database and a CDB database (DBA and DBB respectively, in the example below) on a single two-node RAC, then you would install Oracle Big Data SQL on both nodes as follows:

  • On node 1
    ./bds-database-install.sh --db-resource=DBA1 --cdb=false   
    ./bds-database-install.sh --db-resource=DBB1 --cdb=true
  • On node 2
    ./bds-database-install.sh --db-resource=DBA2 --cdb=false
    ./bds-database-install.sh --db-resource=DBB2 --cdb=true  

Copy Over the Components and Prepare for the Installation

Copy over and install the software on each database node in sequence from node 1 to node n.

  1. If you have not already done so, use your preferred method to copy over the database-side installation bundle from the Hadoop cluster management server to the Oracle Database server. If there are multiple bundles, be sure to select the correct bundle for the cluster that you want to connect to Oracle Database. Copy the bundle to any secure directory that you would like to use as a temporary staging area.

    In this example the bundle is copied over to /opt/tmp.
    scp root@<hadoop node name>:/opt/oracle/BDSJaguar/db-bundles/bds-4.1.1-db-<Hadoop
        cluster>-200421.0517.zip /opt/tmp
  2. If you generated a database request key, then also copy that key over to the Oracle Database server.

    
    # scp root@<hadoop node name>:/opt/oracle/BDSJaguar/dbkeys/<database name or other name>.reqkey oracle@<database_node>:/opt/tmp
  3. Then, cd to the directory where you copied the file (or files). (Steps 3 through 6 are recapped in a consolidated example after this list.)

  4. Unzip the bundle. You will see that it contains a single, compressed run file.

  5. Check to ensure that the prerequisite environment variables are set – $ORACLE_HOME and $ORACLE_SID.

  6. Run the file in order to unpack the bundle into $ORACLE_HOME. For example:
    $ ./bds-4.1.1-db-cdh510-170309.1918.run

    Because you can set up independent Oracle Big Data SQL connections between an Oracle Database instance and multiple Hadoop clusters, the run command unpacks the bundle to a cluster-specific directory under $ORACLE_HOME/BDSJaguar-4.1.1. For example:

    $ ls $ORACLE_HOME/BDSJaguar-4.1.1
      cdh510-6-node1.mycluster.<domain>.com
      test1-3-node1.myothercluster.<domain>.com

    If the BDSJaguar-4.1.1 directory does not already exist, it is created as well.

  7. If you generated a database request key, then copy it into the newly created installation directory. For example:

    $ cp /opt/tmp/mydb.reqkey $ORACLE_HOME/BDSJaguar-4.1.1/cdh510-6-node1.mycluster.<domain>.com
    

    Tip:

    You also have the option to leave the key file in the temporary location and use the --reqkey parameter in the installation command in order to tell the script where the key file is located. This parameter lets you specify a non-default request key filename and/or path.

    However, the install script only detects the key in the installation directory when the key filename is the same as the database name. Otherwise, even if the key is local, if you gave it a different name, then you must still use --reqkey to identify it to the install script.

  8. If you plan to authenticate with a Kerberos principal that is not the same as the Kerberos principal used on the Hadoop side, then copy the keytab for that principal into the installation directory.

    Skip this step if you plan to use the same principal on both sides of the installation. A copy of the keytab that was used for authentication in the Hadoop cluster is already in the installation directory.
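
The following is a consolidated recap of steps 3 through 6, using the staging directory and example file names from this section. Substitute the actual bundle and run file names for your cluster.

$ cd /opt/tmp
$ unzip bds-4.1.1-db-<Hadoop cluster>-200421.0517.zip
$ echo $ORACLE_HOME $ORACLE_SID
$ ./bds-4.1.1-db-cdh510-170309.1918.run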

Now that the installation directory for the cluster is in place, you are ready to install the database side of Oracle Big Data SQL.

Install Oracle Big Data SQL on the Oracle Database Node

Important:

The last part of the installation may require a single restart of Oracle Database under either or both of these conditions:

  • If Oracle Database does not include the Oracle Grid Infrastructure. In this case, the installation script makes a change to the pfile or spfile configuration file in order to support standalone operation of diskmon.

  • If there are changes to the IP address and the communication protocol recorded in cellinit.ora. These parameters define the connection to the cells on the Hadoop cluster. For example, if this is a re-installation and the IP address for the Hadoop/Oracle Database connection changes from an Ethernet address to an InfiniBand address and/or the protocol changes (between TCP and UDP), then a database restart is required.

  1. Start the database if it is not running.

  2. Log on to the Oracle Database node as the database owner and change directories to the cluster-specific installation directory under $ORACLE_HOME/BDSJaguar-4.1.1. For example:

    $ cd $ORACLE_HOME/BDSJaguar-4.1.1/cdh510-6-node1.my.<domain>.com 
    
  3. Run bds-database-install.sh, the database-side Oracle Big Data SQL installer. You may need to include some optional parameters.

    $ ./bds-database-install.sh [options]
  4. Restart the database (optional).

See Also:

The bds-database-install.sh installer command supports parameters that are ordinarily optional, but may be required for some configurations. See the Command Line Parameter Reference for bds-database-install.sh.

Extra Step If You Enabled Database Authentication or Hadoop Secure Impersonation

  • If database_auth_enabled or impersonation_enabled was set to “true” in the configuration file used to create this installation bundle, copy the ZIP file generated by the database-side installer back to the Hadoop cluster management server and run the Jaguar “Database Acknowledge” operation. This completes the set up of login authentication between Hadoop cluster and Oracle Database.

    Find the zip file that contains the GUID-key pair in the installation directory. The file is named according to the format below.
    <name of the Hadoop cluster>-<number nodes in the cluster>-<FQDN of the node where the cluster management server is running>-<FQDN of this database node>.zip
    For example:
    $ ls $ORACLE_HOME/BDSJaguar-4.1.1/cdh510-6-node1.my.<domain>.com/*.zip 
    mycluster1-18-mycluster1node03.<domain>.com-myoradb1.<domain>.com.zip
    1. Copy the ZIP file back to /opt/oracle/DM/databases/conf on the Hadoop cluster management server.

    2. Log on to the cluster management server as root, cd to /BDSJaguar under the directory where Oracle Big Data SQL was installed, and run databaseack (the Jaguar “database acknowledge” routine). Pass in the configuration file that was used to generate the installation bundle (bds-config.json or other).

      # cd <Big Data SQL Install Directory>/BDSJaguar
      # ./jaguar databaseack bds-config.json

3.3.1 Command Line Parameter Reference for bds-database-install.sh

The bds-database-install.sh script accepts a number of command line parameters. Each parameter is optional, but the script requires at least one.

Table 3-1 Parameters for bds-database-install.sh

Parameter Function
--install

Install the Oracle Big Data SQL connection to this cluster.

Note:

In mid-operation, this script will pause and prompt you to run a second script as root:

bds-database-install: root shell script /u03/app/masha/12.1.0/
dbhome_mydb/install/bds-database-install-10657-root-scriptclust1.sh
please run as root:
<enter> to continue checking..
q<enter> to quit
bds-database-install: root

As root, open a session in a second terminal and run the script there. When that script is complete, return to the original terminal and press Enter to resume the bds-database-install.sh session.

Because Oracle Big Data SQL is installed on the database side as a regular user (not a superuser), tasks that must be done as root and/or the Grid user require the installer to spawn shells to run other scripts under those accounts while bds-database-install.sh is paused.

In some earlier releases, the --install parameter was implicit if no other parameters were supplied. You must now explicitly include this parameter in order to do an installation.

--version Show the bds-database-install.sh script version.
--info

Show information about the cluster, such as the cluster name, cluster management server host, and the web server.

--grid-home

Specifies the Oracle Grid home directory.

--crs

Use crsctl to set up Oracle Big Data SQL MTA extprocs. Ignored in non-Grid environments.

--crs={true|false}

The installer always checks to verify that Grid is running. If Grid is not running, then the installer assumes that Grid is not installed and that the database is single-instance. It then automatically sets the crs flag to false.

If --crs=true is explicitly set and Grid cannot be found, the installer terminates with an error message stating that GI_HOME must be set.

--alternate-principal This optional parameter enables you to specify a Kerberos principal that is other than the database owner. It can be used in conjunction with the --install, --reconfigure, and --uninstall parameters.

See About Using an Alternate Kerberos Principal on the Database Side below for details.

--alternate-repo

CDH 6 uses yum for the client installations. The --alternate-repo parameter enables you to specify an arbitrary yum repository for the installer to acquire the Hadoop, Hive, and HBase RPMs needed to complete the Big Data SQL installation on the database side. (The optional url and dir parameters in the bds-config.json configuration file on the Hadoop side of the installation are also for creating a pointer to a repository, but these parameters are ignored on CDH 6 systems.)

For CDH 6.x prior to 6.3.3, --alternate-repo is optional. If Internet access is open, then by default the Hadoop-side Jaguar installer should be able to download the clients from the public Cloudera repository and integrate them into the database-side installation bundle that it generates. If the Internet is not accessible or if you want to download the clients from a different local or remote repository, use --alternate-repo. To enable the installer to download directly from the Cloudera repository (or elsewhere) pass the URL, as in this example for CDH 6.2.1:
# ./bds-database-install.sh --alternate-repo="https://archive.cloudera.com/cdh6/6.2.1/<Red Hat OS version>/yum/" <other parameters>
For CDH 6.3.3 and later, --alternate-repo is required, because you must provide Cloudera login credentials in order to download files for these releases from Cloudera. The Big Data SQL installer cannot automatically download CDH 6.3.3 and later files from the public repository. For these later releases, include your Cloudera user account and password, as in this example:
# ./bds-database-install.sh --alternate-repo="https://<Cloudera account name>:<Cloudera password>@archive.cloudera.com/p/cdh6/6.3.3/<Red Hat OS version>/yum/" <other parameters>

Note:

You can set up a local yum repository that includes the correct version of the HDFS, Hive, and HBase clients and then point to that local repo instead of the Cloudera public repository:
# ./bds-database-install.sh --alternate-repo="<valid yum repository baseurl>" <other parameters>
See Cloudera's instructions on Configuring a Local Package Repository.

There are several options if HTTP is not available on the database system:

  • Temporarily install a lightweight HTTP server as described in the Cloudera article at the URL above and then remove it after the Big Data SQL installation.
  • If HTTP is not an option, then on a machine that has Internet access, create a local mirror of the Cloudera repository (also described in the article) and then move the files into a directory that is accessible from the database server. This could be a local directory on the database machine itself or an NFS share elsewhere. Then, use a tool such as createrepo to set up the directory as a yum repository. This enables you to use the FILE:/// protocol for the baseurl passed to --alternate-repo, as in:
    # ./bds-database-install.sh  --alternate-repo="file:///path/to/local/repo"  <other parameters>

The --alternate-repo parameter is required for CDH 6.3.3 and greater, optional for CDH 6.x prior to 6.3.3, unavailable for CDH 5 as well as for HDP systems.

--cdb

Create database objects on all PDBs for CDB databases.

--cdb={true|false} 
--reqkey

This parameter tells the installer the name and location of the request key file. Database authentication is enabled by default in the configuration. Unless you intend to leave this feature disabled, you must use --reqkey as one of the steps to complete the configuration.

  • To provide the key file name and the path if the key file is not in the local directory:

    $ ./bds-database-install.sh --reqkey=/opt/tmp/some_name.reqkey --install

  • To provide the key file name if the file is local, but the filename is not the same as the database name:

    $ ./bds-database-install.sh --reqkey=some_name.reqkey --install

If the request key filename is provided without a path, then the key file is presumed to be in the installation directory. The installer will find the key if the filename is the same as the database name.

For example, in this case the request key file is in the install directory and the name is the same as the database name.

$ ./bds-database-install.sh --install

This file is consumed only once on the database side in order to connect the first Hadoop cluster to the database. Subsequent cluster installations to connect to the same database use the configured key. You do not need to resubmit the .reqkey file.

--uninstall
Uninstall the Oracle Big Data SQL connection to this cluster. If you installed using the --alternate-principal parameter, then also include this parameter when you uninstall.
./bds-database-install.sh --uninstall --alternate-principal="<primary>/<instance>@<REALM>"
See --alternate-principal in this table for more detail.
--reconfigure

Reconfigures bd_cell network parameters, Hadoop configuration files, and Big Data SQL parameters. The Oracle Big Data SQL installation to connect to this cluster must already exist.

Run reconfigure when a change has occurred on the Hadoop cluster side, such as a change in the DataNode inventory.

--databaseack-only

Create the Database Acknowledge zip file. (You must then copy the zip file back to /opt/oracle/DM/databases/conf on the Hadoop cluster management server and run ./jaguar databaseack <configuration file>.)

--mta-restart

Restart the MTA extproc. (Oracle Big Data SQL must already be installed on the database server.)

--mta-setup

Set up MTA extproc with no other changes. (Oracle Big Data SQL must already be installed on the database server.)

--mta-destroy

Destroy MTA extproc and make no other changes. (Oracle Big Data SQL must already be installed on the database server.)

--aux-run-mode

Because Oracle Big Data SQL is installed on the database side as a regular user (not a superuser), tasks that must be done as root and/or the Grid user require the installer to spawn shells to run other scripts under those accounts while bds-database-install.sh is paused.

The--aux-run-mode parameter specifies a mode for running these auxiliary scripts.

--aux-run-mode=<mode>

Mode options are:

  • session – through a spawned session.

  • su — as a substitute user.

  • sudo — through sudo.

  • ssh — through secure shell.

--root-script
Enables or disables the startup root script execution.
--root-script={true|false}
--no-root-script

Skip root script creation and execution.

--root-script-name Set a name for the root script (the default name is based on the PID).
--pdb-list-to-install For container-type databases, Oracle Big Data SQL is by default set up on all open PDBs. This parameter limits the setup to the specified list of PDBs.
--restart-db If a database restart is required, then by default the install script prompts the user to choose between doing the restart now or later. Setting --restart-db=yes tells the script in advance to proceed with any needed restart without prompting the user. If --restart-db=no then the prompt is displayed and the installation waits for a response. This is useful for unattended executions.
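For example, an unattended installation that allows any required restart to proceed automatically might be invoked as follows (other parameters needed for your environment are not shown):
./bds-database-install.sh --install --restart-db=yes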
--skip-db-patches-check
To skip the patch validation, add this parameter when you run the installer:
./bds-database-install.sh
      --skip-db-patches-check

By default, database patch requirements are checked when you run ./bds-database-install.sh. If the patch validation fails, the installer returns a prompt with a warning message, indicating that there are some patches missing. Installation will continue after the warning.

--set-default-cluster
To set the default cluster to the current cluster in a multi-cluster installation, add this parameter when you run the installer:
./bds-database-install.sh
      --set-default-cluster

In a multi-cluster installation, the first cluster installed is considered the default cluster. If more clusters are installed, all queries use the Java clients provided by the default cluster, regardless of the version of the cluster that actually receives the query. In some cases you may want to change the default, for example, to use newer Java clients. If you pass this parameter, the current cluster is set as the default. The required database links and file system links are regenerated accordingly, and the existing multi-threading agents are restarted.

--force-incompatible To allow different versions of Hadoop, set this parameter when you run the installer:
./bds-database-install.sh
      --force-incompatible

Without this parameter, the installer prevents the installation of clusters with different major or minor Hadoop versions. For example, if a Hadoop 2.6 cluster is installed and then a Hadoop 3.0 cluster is installed, the script does not allow the second install. This behavior avoids possible Hadoop version incompatibilities. However, this parameter can be used to override this behavior. When the --force-incompatible parameter is used and two clusters with different Hadoop versions are detected, the user simply gets a warning.

The --root-script-only parameter from previous releases is obsolete.

3.3.2 About Using an Alternate Kerberos Principal on the Database Side

The Kerberos principal that represents the Oracle Database owner is the default, but you can also choose a different principal for Oracle Big Data SQL authentication on Oracle Database.

The figure below shows how alternate principals are handled in the installation. On the Hadoop side of the installation, the Kerberos principal and keytab specified in the bds-config.json file are used to set up Kerberos authentication for Big Data SQL on the Hadoop cluster. The same keytab is included in the installation bundle that you unzip on the database side. When you unzip the bundle and run the installer on the database side, you have three choices for setting up Kerberos authentication:

a. Ignore the principal and keytab provided with the installation bundle and use the database owner principal.

b. Use the principal and keytab provided with the bundle.

c. Specify a different principal.

Figure 3-1 Using the --alternate-principal Parameter with bds-database-install.sh


This is the explanation of options a, b, and c in the figure above.

Option a: Use the database owner principal for Kerberos authentication.

If the keytab of the database owner principal is present in the staging directory and you want to use that principal, then do not include the --alternate-principal parameter when you invoke the installer. By default the installer will use the database owner principal.

# ./bds-database-install.sh --install <other parameters, except --alternate-principal>

Option b: Use the principal included with the installation bundle for authentication.

Include --alternate-principal in the install command, but do not supply a value for this parameter. This tells the installer to use the principal that was specified in the Jaguar bds-config.json configuration file on the Hadoop side of the installation. This principal is included in the database-side installation bundle and already exists in the staging directory. This is the directory that the installation run file generated. It includes the installation files for connection to a specific Hadoop cluster and node. For example, $ORACLE_HOME/BDSJaguar-4.1.1/cdh510-6-node1.mycluster.mydomain.com.

# ./bds-database-install.sh --install --alternate-principal <other parameters>

Option c: Use a different principal for authentication.

Include --alternate-principal on the command line and pass the principal as the value.

This last case requires two extra steps before you run bds-database-install.sh. When you ran the run file that unpacked the database-side install bundle, it created a database-side installation directory. Copy the alternate keytab file into this directory. Then, rename the file by prefixing "bds_" to the principal. The prefix tells the installer to look for the principal in this keytab file. You can then run the command as follows:
./bds-database-install.sh --reconfigure --alternate-principal=<primary>/<instance>@<REALM>

Note:

Any bds-database-install.sh operation that includes the --alternate-principal parameter generates a script that must be run as root. You will be prompted to run this script and then to press Enter to continue and complete the operation. For example:
bds-database-install: root shell script : /home/oracle/db_home2/install/bds-database-install-28651-root-script-scaj42.sh
please run as root:
/home/oracle/db_home2/install/bds-database-install-28651-root-script-<cluster name>.sh
waiting for root script to complete, press <enter> to continue checking.. q<enter> to quit

Changing the Kerberos Principal Post Release

You can use the reconfigure operation to switch to a different principal.

./bds-database-install.sh --reconfigure --alternate-principal="<other_primary>/<instance>@<REALM>"

Important:

If the keytab of the database owner principal is found by the installer in the staging directory (the directory where the run file unpacked the installer files), then the database owner principal is used and the --alternate-principal parameter is ignored.

Removing a Kerberos Principal and Keytab When You Uninstall Big Data SQL

If you specified an alternate principal when you installed Big Data SQL, then when you later uninstall Big Data SQL, include the same --alternate-principal parameter and value so that the uninstall operation removes the principal and keytab from the system.

./bds-database-install.sh --uninstall --alternate-principal="bds_<primary>/<instance>@<REALM>"

3.4 Granting User Access to Oracle Big Data SQL

In Oracle Big Data SQL releases prior to 3.1, access is granted to the PUBLIC user group. In the current release, you must do the following for each user who needs access to Oracle Big Data SQL:

  • Grant the BDSQL_USER role.
  • Grant read privileges on the BigDataSQL configuration directory object.
  • Grant read privileges on the Multi-User Authorization security table to enable impersonation.

For example, to grant access to user1:

SQL> grant BDSQL_USER to user1; 
SQL> grant read on directory ORACLE_BIGDATA_CONFIG to user1;
SQL> grant read on BDSQL_USER_MAP to user1;

See Also:

Use the DBMS_BDSQL PL/SQL Package in the Oracle Big Data SQL User's Guide to indirectly provide users access to Hadoop data. The Multi-User Authorization feature that this package implements uses Hadoop Secure Impersonation to enable the oracle account to execute tasks on behalf of other designated users.

3.5 Enabling Object Store Access

If you want to use Oracle Big Data SQL to query object stores, certain database properties and Network ACL values must be set on the Oracle Database side of the installation. The installation provides two SQL scripts you can use to do this.

The ORACLE_BIGDATA driver enables you to create external tables over data within object stores in the cloud. Currently, Oracle Object Store, Microsoft Azure, and Amazon S3 are supported. You can create external tables over Parquet, Avro, and text files in these stores. The first step is to set up access to the object stores, as follows.

Run set_parameters_cdb.sql and allow_proxy_pdb.sql to Enable Object Store Access

  1. After you run bds-database-install.sh to execute the database-side installation, find these two SQL script files under $ORACLE_HOME, in the cluster subdirectory under the BDSJaguar directory:
    set_parameters_cdb.sql
    allow_proxy_pdb.sql
  2. Open and read each of these files. Confirm that the configuration is correct.

    Important:

    Because there are security implications, carefully check that the HTTP server setting and other settings are correct.
  3. In the CDB root (CDB$ROOT), run set_parameters_cdb.sql.
  4. In each PDB that needs access to object stores, run allow_proxy_pdb.sql.

In a RAC database, you only need to run these scripts on one instance of the database.
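
The following is a minimal sketch of running the two scripts from SQL*Plus as a SYSDBA user, using the example installation directory from earlier in this chapter. The PDB name is hypothetical; repeat the last two steps for each PDB that needs access.

$ cd $ORACLE_HOME/BDSJaguar-4.1.1/cdh510-6-node1.my.<domain>.com
$ sqlplus / as sysdba
SQL> @set_parameters_cdb.sql
SQL> ALTER SESSION SET CONTAINER = mypdb1;
SQL> @allow_proxy_pdb.sql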

3.6 Installing and Configuring Oracle SQL Access to Kafka

If you work with Apache Kafka clusters as well as Oracle Database, Oracle SQL Access to Kafka (OSAK) can give you access to Kafka brokers through Oracle SQL. You can then query data in the Kafka records and also join the Kafka data with data from Oracle Database tables.

After completing the Oracle Database side of the Oracle Big Data SQL installation, you can configure OSAK connections to your Kafka cluster.

Installation Requirements

  • A cluster running Apache Kafka. Kafka version 2.3.0 is supported, as are all versions of Kafka that run on Oracle Big Data Appliance.
  • A version of Oracle Database from 12.2 to 19c.

    Note:

    Big Data SQL supports Oracle Database 12.1, but OSAK does not.

Steps Performed by the Oracle Big Data SQL Installer

OSAK is included in the database-side installation bundle. As part of the installation, the bds-database-install.sh installer described in the previous section of this guide does the following:

  • Copies the OSAK kit into the Big Data SQL directory ($ORACLE_HOME/bigdatasql/), unzips it, and verifies the contents of the kit.
  • Creates the symlink $ORACLE_HOME/bigdatasql/orakafka, which links to the extracted OSAK directory.
  • Sets JAVA_HOME for OSAK to $ORACLE_HOME/bigdatasql/jdk.

You can check for the OSAK installation steps in the bds-database-install.sh output. This example is from an Oracle Database 19c system. It is the same on earlier database releases except for the shiphome path.
 ...
 bds-database-setup: installing orakafka kit
 
 Step1: Check for valid JAVA_HOME
 --------------------------------------------------------------
 Found /u03/app/oracle/19.1.0/dbhome_orcl/shiphome/database/bigdatasql/jdk/bin/java, \
JAVA_HOME path is valid.
 Step1 succeeded.
 
 Step2: JAVA version check
 --------------------------------------------------------------
 java version "1.8.0_171"
 Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
 Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
 Java version >= 1.8
 Step2 succeeded.
 
 Step3: Creating configure_java.sh script
 --------------------------------------------------------------
 Wrote to /u03/app/oracle/19.1.0/dbhome_orcl/shiphome/database/bigdatasql/orakafka-1.1.0\
/bin/scripts/configure_java.sh
 Step3 succeeded.
 
 Successfully configured JAVA_HOME in /u03/app/oracle/19.1.0/dbhome_orcl/shiphome/database\
/bigdatasql/orakafka-1.1.0/bin/scripts/configure_java.sh
 
 The above information is written to /u03/app/oracle/19.1.0/dbhome_orcl/shiphome/database\
/bigdatasql/orakafka-1.1.0/logs/set_java_home.log.2020.02.20-17.40.23
...

Steps You Must Perform to Complete the OSAK Installation and Configuration

To complete the installation and configuration, refer to the instructions in the Oracle SQL Access to Kafka README_INSTALL file at $ORACLE_HOME/bigdatasql/orakafka/doc. The README_INSTALL file tells you how to use orakafka.sh. This script can help you perform the remaining configuration steps:

  • Add a new Kafka cluster configuration directory under $ORACLE_HOME/bigdatasql/orakafka/clusters/.
  • Allow a database user to access the Kafka cluster.
  • Install the ORA_KAFKA PL/SQL packages in a user schema and create the required database directories.

Important:

If you are setting up access to a secured Kafka cluster, then after generating the cluster directory, you must also set the appropriate properties in this properties file.
$ORACLE_HOME/bigdatasql/orakafka/clusters/<cluster_name>/conf/orakafka.properties

The properties file includes some tested subsets of possible security configurations. For general information about Kafka security, refer to the Apache Kafka documentation at https://kafka.apache.org/documentation/#security

When you have completed the configuration, the next step is to register your Kafka cluster. After registering the cluster, you can start using Oracle SQL Access to Kafka to create views over Kafka topics and query the data.

See Also:

Oracle SQL Access to Kafka in the Oracle Big Data SQL User's Guide shows you how to register your Kafka cluster and start working with OSAK.