3 Installing the Oracle Database Side of Oracle Big Data SQL

3.1 Installing on the Oracle Database Server

Oracle Big Data SQL must be installed on both the Hadoop cluster management server and the Oracle Database server nodes. This section describes the full installation of Oracle Big Data SQL 3.1 on Oracle Database.

Prerequisites for Installing on an Oracle Database Server

If Oracle Big Data SQL is already installed on a database node and you are updating the existing configuration, see Reconfiguring an Existing Oracle Big Data SQL Installation on Database Nodes.

Important:

  • For multi-node databases (such as Oracle RAC systems), you must repeat this installation on every node of the database. If this is not done, you will see RPC connection errors when the Oracle Big Data SQL service is started. You can reuse the same installation bundle on each database node by using a parameter to pass in the correct network interface address when you run the installation script. This is described in the section, If You Need to Change the Configured Database_IP Address, below.

  • On Oracle Big Appliance, Oracle Big Data SQL can connect to Oracle Database on either Exadata Database Machine or an non-Exadata server, but not to both at the same time.


Required Software

See the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1) in My Oracle Support for supported Linux distributions, Oracle Database release levels, and required patches.

Note:

Be sure that the correct Bundle Patch and any one-off patches identified in the Compatibility Matrix have been pre-applied before starting this installation.

Recommended Network Connections to the Hadoop Cluster

For Ethernet connections between Oracle Database and the Hadoop cluster, Oracle recommends 10 Gb/s Ethernet.

Environment Variables

The following are required:

  • ORACLE_SID

  • ORACLE_HOME

GI_HOME (Grid Infrastructure home directory) is optional if it is located on a relative path accessible from ORACLE_HOME. Otherwise, if Grid home exists then GI_HOME must be set. If Grid home does not exist on a node, then you must use the crs=false option when you run the bds-database-install.sh script as described in the steps below

Copy, Extract, and Run the Big Data SQL Installation Script on all Database Nodes

Perform the procedure in this section as the oracle user, except where sudo is indicated.

The latter part of the installation may require a single restart of Oracle Database under either or both of these conditions:

  • If Oracle Database does not include the Oracle Grid Infrastructure. In this case, the installation script makes a change to the pfile or spfile configuration file in order to support standalone operation of diskmon.

  • If there are changes to the IP address and the communication protocol recorded in cellinit.ora. These parameters define the connection to the cells on the Hadoop cluster. For example, if the IP address changes from an Ethernet address to an InfiniBand address and/or the protocol changes (between TCP and UDP), then a database restart is required.

  1. If you haven’t done so already as part of the instructions in Creating the Database-Side Installation Bundle, copy the installation bundle from the BDSSetup directory on the Hadoop cluster management server to /home/oracle on the database node.

    $ scp bds-database-install.zip oracle@<database_node>:/home/oracle
    
  2. Log on to the database node as oracle in order perform the installation.

  3. Unzip the bundle and then cd into the bds-database-install directory that was extracted from the zip file.

    $ unzip bds-database-install.zip
    $ cd bds-database-install
    
  4. Run bds-database-install.sh.

    Note the optional bds-database-install parameters listed in Table 3-1.

    [oracle@mydbserver bds-database-install]$ ./bds-database-install.sh
    

    Special Conditions:

    • If the database does not use Oracle Grid, then use the --crs=false option with the bds-database-install.sh command. For example:

      [oracle@mydbserver bds-database-install]$ ./bds-database-install.sh --crs=false --dbname=<db-name> 
      

      If Oracle Grid is not running, you must also start MTA extproc manually, as in ORACLE_HOME/bin/mtactl start <MTA_SID>. In the default installation, MTA_SID is formulated as bds_<ORACLE_SID>_<Hadoop_cluster_name>.

    If bds-database-install.sh finds and updates /etc/oracle/cell/network-config/cellinit.ora, then the installation is complete and you will see this message:
    bds-database-install: installation successful for cluster <cluster name>
    

Table 3-1 Optional Parameters for bds-database-install.sh

Parameter Function
--crs Specifies use of Oracle Grid support. If Grid is not running on this database node, or, if this database does not use Grid (CRS/ASM) in any way, then apply the --crs=false option with the bds-database-install.sh script.
--cdb Specifies the use of multitier databases.
--db-resource Specify the Oracle Database SID (Oracle Database instance name). The parameter --db-name is synonym from older releases and is still supported.
--debug Activate shell trace mode. If you report a problem, Oracle Support may want to see this output.
--grid-home Specifies the Oracle Grid home directory.
--info Show information about the cluster.
--ip-cell Set a particular IP address for db_cell process. See If You Need to Change the Configured Database_IP Address below.
--install-as-secondary

Specifies a secondary installation. See Primary and Secondary Installations below.

--jdk-home Specifies the JDK home directory.
--reconfigure Modify the parameters of the connection of this node with the Hadoop cluster.
--root-script-only Generate the root script only.
--uninstall-as-primary Removes information about the primary cluster from the database.

Important:

Note that --uninstall-as-primary also removes key information that is also used by any secondary clusters. To avoid this, use --uninstall-as-secondary to uninstall the primary unless you do intend to unregister all Hadoop clusters from the database. Otherwise to restore the connection to earch secondary Hadoop cluster, you would need to reinstall the software from the database bundle that was generated for that cluster.
--uninstall-as-secondary Removes the metadata about a Hadoop cluster from Oracle Database, but does not remove key information that is common to all clusters.
--version Show the bds-database-install.sh script version.

Primary and Secondary Installations

Oracle Big Data SQL on the Oracle Database side can provide access to a single Hadoop cluster or multiple Hadoop clusters (which can be on the same rack or on different on racks in the network).

Each installation registers a Hadoop cluster with the database. The first cluster registered with the database becomes the default/primary Hadoop cluster for the database. This first installation also installs the JDK, Hadoop client, and Hive client needed to access data on a Hadoop cluster.

If you want the database to connect to an additional Hadoop cluster, deploy the database-side installation bundle generated on that cluster to the database server. Unpack it, and run bds-database-install.sh with the --install-as-secondary option. This “secondary installation” adds the metadata about the additional Hadoop cluster to the Oracle Database without unnecessarily repeating the installation of the clients and the JDK.

When there are multiple clusters (one primary and any number of secondaries) registered with the database, an Oracle Big Data SQL user can query each Hadoop cluster separately or do JOINs to combine data from different clusters.

If You Need to Change the Configured Database_IP Address

The DATABASE_IP parameter in the bds-config.json file identifies the network interface of the database node. If you run bds-database-install.sh with no parameter passed in, it will search for that IP address (with that length, specifically) among the available network interfaces. You can pass the ––ip-cell parameter to bds-database-install.sh in order to override the configured DATABASE_IP setting:

$ ./bds-database-install.sh --ip-cell=10.20.30.40/24

Possible reasons for doing this are:

  • bds-database-install.sh terminates with an error. The configured IP address may be wrong.

  • There is an additional database node in the cluster and the defined DATABASE_IP address is not a network interface of the current node.

  • The connection is to a multi-node database. In this case, perform the installation on each database node. On each node, use the ––ip-cell parameter to set the correct DATABASE_IP value.

To determine the correct value for ip-cell, you can use list all network interfaces on a node as follows:
/sbin/ip -o -f inet addr show

3.2 Potential Requirement to Restart Grid Infrastructure

In certain database environments bds-database-install.sh needs to create cellinit.ora and/or celliniteth.ora. If so, then similar changes need to be propagated across all nodes in the grid infrastructure. In those cases, the script expects passwordless SSH setup between oracle and root or it will prompt for the password for each node during execution. If the nature of the changes requires a restart of the grid infrastructure, the script will also display messages indicating that grid infrastructure needs to be restarted manually. Because the installlation cannnot complete without grid credentials if a restart is necessary, be sure that you have the grid password at hand.

3.2.1 Understanding When Grid or Database Restart is Required

On the database side of Oracle Big Data SQL, the diskmon process is the agent in charge of communications with the Hadoop cluster. This is similar to its function on the Oracle Exadata Database Machine, where it manages communications between compute nodes and storage nodes.

In Grid environments, diskmon is owned by the Grid user. In a non-Grid environment, it is owned by the Oracle Database owner.

In Oracle Big Data SQL, diskmon settings are stored on cellinit.ora and celliniteth.ora files in the /etc/oracle/cell/network-config/ directory. The installer updates these files in accordance with the cluster connection requirements.

This is how the installer determines when Grid or Oracle Database needs to be restarted:

  • If the installer detects that no previous cellinit.ora or celliniteth.ora file exists, this means that no diskmon process is running. In this case, if the environment includes Oracle Grid then you must restart the Grid. If the environment does not include Grid, then you must restart Oracle Database.

  • If previous cellinit.ora and/or celliniteth.ora file exist, this indicates that diskmon process is running. In this case, if the installer needs to make a change to these files, then only the database needs to be restarted.

  • In multi-node Grid environments, diskmon works on all nodes as a single component and cellinit.ora and celliniteth.ora must be synchronized on all nodes. This task is done through SSH. If passwordless SSH is setup on the cluster, no user interaction is required. If not passwordless SSH is not set up, then the script will pause for you to input the root credentials for all nodes. When the cellinit.ora and celliniteth.ora files across all nodes are synchronized, then the script will continue. Then the script finishes and in this case, you must restart the Grid infrastructure.

3.3 Granting Oracle Big Data Access to Users

In Oracle Big Data SQL releases prior to 3.1, access is granted to the PUBLIC user group. In 3.1 (and later releases), you must do the following for each user who needs access to Oracle Big Data SQL:

  • Grant the BDSQL_USER role.

  • Grant read privileges on the BigDataSQL configuration directory object.

For example, to grant access to user1:

SQL> grant BDSQL_USER to user1; 
SQL> grant read on directory ORACLE_BIGDATA_CONFIG to user1;