2 Installing or Upgrading the Hadoop Side of Oracle Big Data SQL
After downloading the Oracle Big Data SQL deployment bundle and extracting the files, the next step is to configure the installer and then run the installation.
Oracle Big Data SQL is deployed on the Hadoop cluster using the services provided by the cluster management server (Cloudera Manager or Ambari). The Jaguar install command uses the management server API to register the BDS service and start the deployment task. From there, the management server controls the process, deploying the software to the nodes of the cluster and installing it.
The installation also generates the deployment bundle for the database side of the installation.
If a previous version of Oracle Big Data SQL is already installed, the installer upgrades the installation to Release 3.2.1.2. Users of previous Oracle Big Data SQL releases should note that setup-bds, the previous Hadoop-side installer, has been replaced by the Jaguar installer and that the configuration parameters for the installation have changed significantly in this release.
2.1 Before You Start
Check to Ensure that all DataNodes of the Cluster Meet the Prerequisites
Prerequisites for both the Hadoop and Oracle Database sides of the installation are described in Chapter 1.
If you miss a prerequisite, the installer will alert you. You can see this in the output examples in Oracle Big Data SQL Installation Examples.
Plan the Configuration and Edit the Jaguar Configuration File
Review the Jaguar Configuration Parameter and Command Reference in this chapter. You should know the answers to these questions before editing the configuration file:
- Does your Hadoop environment need to connect to both Oracle Database 12.1 and 12.2 or just one of these releases?
By default, Big Data SQL allows connections from both 12.1 and 12.2 databases. However, the offloaders that support each of these connections are resource-intensive, particularly for memory consumption on the nodes of the cluster. If you do not need dual support, you can save resources by restricting connections to databases of one version. You set this in the configuration passed to the Jaguar installer. For example, you can enter this string to allow connections from 12.2 databases only:
"database_compatibility" : [ "12.2" ]
- Do you want to enable Database Authentication in order to validate connection requests from Oracle Database to Big Data SQL server processes on the Hadoop DataNodes?
Database Authentication in the network connection between the Oracle Database and Hadoop is set to "true" in the configuration by default. You have the option to disable it by setting database_auth_enabled to "false":
"database_auth_enabled" : "false",
- Do you want to use the Multi-User Authorization feature?
Multi-User Authorization enables you to grant users other than oracle permissions to run SQL queries against the Hadoop cluster. The first step in setting up Multi-User Authorization is to set these parameters in the security section of the configuration file:
"impersonation_enabled" : "true", "impersonation_usehosts" : "true", "impersonation_blacklist" : "hdfs,hive"
Note that you can add any account to the blacklist.
- Are the Hadoop cluster and the Oracle Database system going to communicate over Ethernet or over InfiniBand? Also, do the Hadoop nodes have more than one network interface?
See the use_infiniband and selection_subnet parameters. (The selection_subnet parameter does not apply to Oracle Big Data Appliance.)
"use_infiniband" : "false", "selection_subnet" : "5.32.128.10/21"
By default, use_infiniband is set to false. Ethernet is the default protocol.
- Are you going to pre-download the Hadoop and Hive client tarballs and set up a local repository or directory where the installer can acquire them, or do you want the installer to download them directly from the Cloudera or Hortonworks repository on the Internet?
On Oracle Big Data Appliance, the tarballs are stored on and automatically retrieved from a repository within the cluster. Therefore, the related configuration parameters do not apply. However, for other CDH systems or Hortonworks, you can choose to specify a URL, a directory path, or both for the tarballs. If instead you choose to accept the default and let the installer download the files from the Internet, then ensure that there is Internet access from the node where you are running the installation.
"url" : [ "http://repo1.domain.com/loc/hadoop", "http://repo2.domain.com/loc", "http://alternate.domain.com/backup/repo" ], "dir" : [ "/root/hadooprepo" ]
If you choose to let the installer download the tarballs from the Internet, you can configure the installer to work with HTTP/HTTPS proxies within your environment:
"http_proxy" : "http://my.proxy.server.com:80", "https_proxy" : "http://mysecure.proxy.server.com:80"
You can either set the proxy values in the configuration file or use preexisting proxy environment variables.
- If the network is Kerberos-secured, do you want the installer to set up automatic Kerberos ticket renewal for the oracle account used by Oracle Big Data SQL?
See the parameters in the kerberos section:
"principal" : "oracle/mycluster@MY.DOMAIN.COM", "keytab" : "/home/oracle/security/oracle.keytab", "hdfs-principal" : "hdfs/mycluster@MY.DOMAIN.COM", "hdfs-keytab" : "/home/hdfs/security/hdfs.keytab"
- Do you want the Oracle Big Data SQL install process to automatically restart services that are in a stale state?
By default, stale services are restarted automatically. If you want to suppress this, you can set the restart_stale parameter in the configuration file to "false".
- Is the Hadoop cluster using the default REST API port for CDH or Ambari?
If not, set the ports parameter.
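The proxy fallback described in the list above (configuration-file values first, preexisting environment variables otherwise) can be illustrated with a short Python sketch. This is not Jaguar's code: the function name `resolve_proxies` is hypothetical, the environment variable names `http_proxy` and `https_proxy` are assumed to follow the common convention, and the function takes the proxy settings as a flat dictionary without assuming where they sit in the configuration file.

```python
import os


def resolve_proxies(proxy_settings, environ=None):
    """Return the (http_proxy, https_proxy) pair that would take effect.

    proxy_settings: dict holding the optional "http_proxy" and "https_proxy"
    strings from the configuration file (hypothetical flat view).
    environ: mapping of environment variables; defaults to os.environ.
    """
    environ = os.environ if environ is None else environ
    http = proxy_settings.get("http_proxy", "")
    https = proxy_settings.get("https_proxy", "")
    if not http and not https:
        # Both strings are empty: fall back to the OS environment settings.
        # The variable names used here are an assumption.
        http = environ.get("http_proxy", "")
        https = environ.get("https_proxy", "")
    return http, https


if __name__ == "__main__":
    print(resolve_proxies({"http_proxy": "http://my.proxy.server.com:80"}))
```

Passing the environment as an argument keeps the sketch easy to try without changing the shell environment.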
Note:
Setting these parameters in the configuration file does not complete the setup for some features. For example, to enable Database Authentication, you must also pass a special --requestdb parameter to the Jaguar utility in order to identify the target database or databases. There are also steps required to generate and install the security key used by this feature. To enable Multi-User Authorization, you start by setting the Hadoop Impersonation parameters in the configuration file, but you also need to set up the authorization rules. The steps to complete these setups are provided where needed as you work through the instructions in this guide.
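As a rough illustration of how the answers to the planning questions map onto a configuration file, the following Python sketch assembles a configuration dictionary and prints it as JSON. It is a sketch under stated assumptions, not the product's tooling: the function `build_config` is hypothetical, the top-level key names "security" and "kerberos" are assumed from the section names mentioned above, and placing database_auth_enabled in the security section is also an assumption. Compare the result with the example configuration files shipped with the installer before using it.

```python
import json


def build_config(cluster_name, db_versions=None, db_auth=True,
                 impersonation=None, kerberos=None):
    """Assemble a Jaguar-style configuration dictionary (illustrative only)."""
    # The cluster name is the only parameter that is always required.
    config = {"cluster": {"name": cluster_name}}
    if db_versions:
        # Lists stay lists, even with a single item, e.g. ["12.2"].
        config["cluster"]["database_compatibility"] = list(db_versions)

    security = {}
    if not db_auth:
        # Database Authentication is on by default; write the key only to
        # disable it. ASSUMPTION: the key belongs in the security section.
        security["database_auth_enabled"] = "false"
    if impersonation:
        security.update(impersonation)
    if security:
        config["security"] = security      # ASSUMED section key name

    if kerberos:
        config["kerberos"] = dict(kerberos)  # ASSUMED section key name
    return config


if __name__ == "__main__":
    example = build_config(
        "mycluster",
        db_versions=["12.2"],
        impersonation={
            "impersonation_enabled": "true",
            "impersonation_usehosts": "true",
            "impersonation_blacklist": "hdfs,hive",
        },
    )
    print(json.dumps(example, indent=2, sort_keys=True))
```

Generating the file through `json.dumps` guarantees well-formed JSON, which avoids the bracket and comma mistakes that hand editing can introduce.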
2.2 About the Jaguar Utility
Jaguar is a multifunction command line utility that you use to perform all Oracle Big Data SQL operations on the Hadoop cluster.
Jaguar currently supports these operations:
- install
  - Deploys Oracle Big Data SQL binaries to each cluster node that is provisioned with the DataNode service.
  - Configures Linux and network settings for bd_cell (the Oracle Big Data SQL service) on each of these nodes.
  - Generates the bundle that installs Oracle Big Data SQL on the Oracle Database side. It uses the parameter values that you set in the configuration file in order to configure the Oracle Database connection to the cluster.
- updatenodes
  Checks and updates the installation on the DataNodes of the Hadoop cluster.
- reconfigure
  Modifies the current configuration of the installation (according to the settings in the configuration file provided).
- databasereq
  Generates a request key file that contains one segment of the GUID-key pair used in Database Authentication. (The databasereq operation performs this function only. For install, updatenodes, and reconfigure, request key generation is an option that can be included as part of the larger operation.)
- databaseack
  Performs the last step in Database Authentication setup: installs the GUID-key pair on all Hadoop DataNodes in a cluster in order to allow queries from the Oracle Database that provided it.
- uninstall
  Uninstalls Oracle Big Data SQL from all DataNodes of the Hadoop cluster.
See Also:
Jaguar Operations in the next section provides details and examples.
2.2.1 Jaguar Configuration Parameter and Command Reference
Configuration Parameters
The table below describes all parameters available for use in bds-config.json or your own configuration file. Only the cluster name parameter is always required. Others are required under certain conditions stated in the description.
Note:
When editing the configuration file, be sure to maintain the JSON format. Square brackets are required around lists, even in the case of a list with a single item.
Table 2-1 Configuration Parameters in bds-config.json (or in Customer-Created Configuration Files)
Parameter | Type | Description |
---|---|---|
name | String | The name of the cluster. For CDH clusters (Oracle Big Data Appliance or other), this name can be either the physical cluster name or the display name. The installer searches first by physical name and then by display name. Required. |
database_compatibility | List | Supported database versions: "12.1" or "12.2". All incoming requests from databases of the same version are handled by a single offloader process. By default, there is an offloader running for 12.1 and another for 12.2. To conserve system resources (primarily memory), it is highly recommended that you use this configuration parameter to specify which offloaders are required. If you exclude one or the other offloader from the list, that offloader process is not started. Optional. |
port | Integer | Cloudera Manager or Ambari REST API port. By default, on CDH clusters this port is 7183 for secured and 7180 for unsecured access. For Ambari, it is 8443 for secured and 8080 for unsecured access. Optional. |
restart_stale | Boolean | Controls whether services that are in a stale state are restarted automatically by the install process. Set to "false" to suppress the restart. Optional. By default, stale services are restarted automatically. |
dir | List | List of directories where the Hadoop clients for deployment on the database side are located. These directories can be on the local file system or on NFS. Directories are searched in the order listed. By default, the list is empty. Optional. Not applicable to Oracle Big Data Appliance. |
url | List | The list of URLs where the Hadoop client tarballs for deployment on the database side are located. If your data center already has repositories set up for access via HTTP, then you may prefer to maintain the Hadoop tarballs in that repository and use the url parameter for Oracle Big Data SQL installations. The URLs can be to the localhost, an internal network, or a site on the Internet (if the node has Internet access). The URLs are tried in the order listed. If access to all listed repositories fails and/or Internet access is blocked, the database installation bundle is not created and a warning message is displayed. After correcting any problems and providing access to a repository, you can re-run the installer. Not applicable to Oracle Big Data Appliance. The tarballs are stored in a local repository in the cluster and the location is automatically added to the configuration file. |
use_infiniband | Boolean | If "true", communication is over InfiniBand. If "false", communication is over Ethernet. Used for Oracle Big Data Appliance clusters only. Default value: "false". |
selection_subnet | String | If Hadoop cluster nodes have several network interfaces, you can use selection_subnet to select a specific network. If the Hadoop cluster nodes have only one network interface, this parameter is ignored. The default value depends upon certain conditions. Note for Oracle Big Data Appliance Users: It is possible to configure several networks on an Oracle Big Data Appliance. If multiple networks exist, then this parameter must be set in order to select a specific network. |
http_proxy, https_proxy | String | Specify the proxy settings to enable download of the Hadoop client tarballs and cluster settings. If both of these strings are empty, the OS environment proxy settings are used. By default, both strings are empty. Using these two parameters in the configuration file is optional. If they are needed, you could instead set them externally through the OS proxy environment variables. Not applicable to Oracle Big Data Appliance. |
database_auth_enabled | Boolean | If "true", Database Authentication is enabled for connection requests from Oracle Database to the Big Data SQL server processes on the Hadoop DataNodes. If "false", it is disabled. Default value: "true". |
impersonation_enabled | Boolean | If "true", Hadoop impersonation, which Multi-User Authorization relies on, is enabled. |
impersonation_blacklist | String | The Hadoop proxy users blacklisted for impersonation. This parameter is used only if Hadoop impersonation is enabled. Since this is a required setting on the Oracle Database side, it is provided with a default value. |
impersonation_usehosts | Boolean | If "true", the proxy hosts variable is set to the IP address of the database node. If "false", the proxy hosts variable is set to the wildcard: "*". |
principal | String | The fully-qualified Kerberos principal name for the oracle account used by Oracle Big Data SQL. Required for secured clusters. |
keytab | String | Fully-qualified location for the principal's keytab file name. Copy the keytab file to a location accessible to the Jaguar installer and set the path as the value of this parameter. |
hdfs-principal | String | Fully-qualified Kerberos principal name for the "hdfs" user. Required for secured clusters. |
hdfs-keytab | String | Fully-qualified path to the principal keytab file. A keytab file is created for each principal on the KDC server. It must exist in a location accessible to the Jaguar installer. Required for secured clusters. |
min_hard_limit | Integer | The minimum amount of memory reserved for Big Data SQL, in megabytes. This parameter is used on CDH clusters (Oracle Big Data Appliance and others). It is not used on HDP clusters. By default, the value is 8192 (8 GB). |
max_percentage | Integer | On CDH clusters (Oracle Big Data Appliance and others), this parameter specifies the percentage of memory on each node to reserve for Big Data SQL. If the YARN ResourceManager is enabled for the node, the percentage is based on the total amount of memory used by the NodeManager. Otherwise, it is a percentage of physical memory. This parameter is ignored on HDP clusters. |
excluded_nodes | Array | If the installer does not correctly identify a node as an edge node, then it will install the BDS agent on the node. The BDS agent checks for installation readiness. If the edge node does not meet the prerequisites, then the Oracle Big Data SQL installation fails. Use this parameter to exclude such nodes. This parameter is in the network section of the configuration file. The installer is now almost always successful at detecting edge nodes, so you should rarely need to use this parameter. |
Note:
After Oracle Big Data SQL is installed on the Hadoop cluster management server, you can find configuration file examples that demonstrate various parameter combinations in the <Big Data SQL Install directory>/BDSjaguar-3.2.1.2 directory:
example-bda-config.json
example-cdh-config.json
example-kerberos-config.json
example-localrepos-config.json
example-subnetwork-config.json
example-unsecure-config.json
You can see all possible parameter options in use in example-cdh-config.json.
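Before passing a hand-edited configuration file to Jaguar, it can help to check the two rules stated earlier in this section: the file must remain valid JSON, and list-valued parameters need square brackets even for a single item. The Python sketch below is one way to do that. It is not part of the product; `find_problems` and `LIST_PARAMS` are hypothetical names, and the list of list-typed parameters is taken from the Type column of Table 2-1.

```python
import json

# Parameters whose Type in Table 2-1 is List or Array.
LIST_PARAMS = ("database_compatibility", "dir", "url", "excluded_nodes")


def find_problems(text):
    """Return a list of human-readable problems found in a configuration."""
    try:
        config = json.loads(text)
    except ValueError as err:  # json raises a ValueError subclass on bad input
        return ["not valid JSON: {0}".format(err)]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    problems = []
    cluster = config.get("cluster")
    if not isinstance(cluster, dict) or not cluster.get("name"):
        problems.append('the "cluster" section must contain a "name"')

    def walk(node):
        # Visit every nested object so the check works in any section.
        if isinstance(node, dict):
            for key, value in node.items():
                if key in LIST_PARAMS and not isinstance(value, list):
                    problems.append('"{0}" must be a list in square brackets'.format(key))
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(config)
    return problems


if __name__ == "__main__":
    sample = '{"cluster": {"name": "mycluster", "database_compatibility": "12.2"}}'
    for problem in find_problems(sample):
        print(problem)
```

The sketch only checks structure. It does not verify that values such as subnet strings or keytab paths are meaningful.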
See Also:
See the Appendix Determining the Correct Software Version and Composing the Download Paths for Hadoop Clients for suggestions that can help with the setup of client tarball downloads.
Jaguar Operations
The table below lists the full set of operations performed by the Jaguar utility on the Hadoop side of the Oracle Big Data SQL installation.
The general syntax for Jaguar commands is as follows. The --requestdb parameter does not apply to all Jaguar commands.
# ./jaguar {--requestdb <comma-separated database names> | NULL } <action> { bds-config.json | <myfilename>.json | NULL }
Examples:
# ./jaguar install
# ./jaguar install bds-config.json
# ./jaguar install mycustomconfig.json
# ./jaguar --requestdb orcl,testdb,proddb install
# ./jaguar --requestdb orcl install
You can use the default bds-config.json or your own configuration file, or omit the configuration file argument (which defaults to bds-config.json).
About --requestdb:
The --requestdb parameter is required for the databasereq command, optional for install, updatenodes, and reconfigure, and not applicable to other Jaguar commands. The parameter must be passed in to one of these operations in order to enable Database Authentication in the connection between a Hadoop cluster and a database. Unless you prefer to disable Database Authentication, it is recommended that you include --requestdb with the initial install operation. Otherwise, you will need to perform an additional step later in order to generate the request key.
This parameter is functional only when Database Authentication (database_auth_enabled) is set to "true" in the configuration. (This setting is a configuration default and does not need to be explicitly set in the configuration file.)
Jaguar needs the database names in order to generate a unique .reqkey (request key) file for each database. When database_auth_enabled is set to "true" at installation time, the --requestdb parameter is still optional. Post-installation, you have the same option to send the request key in the updatenodes, reconfigure, and databasereq commands. Database Authentication is not implemented until you do all of the following:
- Ensure that database_auth_enabled is either absent from the configuration file or is set to "true". (It is "true" by default.)
- Include --requestdb in a Jaguar command:
  - Run the Jaguar install, updatenodes, or reconfigure command and install the updated database-side installation bundle, or
  - Run Jaguar databasereq to generate an acknowledge key from the existing database-side installation.
- Copy the generated ZIP file that contains the .ackkey file from the database-side installation directory to /opt/oracle/DM/databases/conf on the Hadoop cluster management server.
- Run the Jaguar databaseack command as described in the table below.
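The syntax and the --requestdb rules described above can be captured in a small Python helper that assembles the argument list for a Jaguar invocation and rejects combinations the text rules out. This is an illustrative sketch only: `build_jaguar_command` is a hypothetical helper, not part of Jaguar, and it only builds the argument list without running anything.

```python
# Which operations take --requestdb, as described in the text above.
REQUESTDB_REQUIRED = {"databasereq"}
REQUESTDB_OPTIONAL = {"install", "updatenodes", "reconfigure"}
REQUESTDB_NOT_APPLICABLE = {"databaseack", "uninstall"}


def build_jaguar_command(action, databases=None, config_file=None):
    """Return the argv list for ./jaguar following the general syntax:

    ./jaguar [--requestdb <comma-separated database names>] <action> [<config file>]
    """
    known = REQUESTDB_REQUIRED | REQUESTDB_OPTIONAL | REQUESTDB_NOT_APPLICABLE
    if action not in known:
        raise ValueError("unknown Jaguar operation: {0}".format(action))
    if databases and action in REQUESTDB_NOT_APPLICABLE:
        raise ValueError("--requestdb does not apply to {0}".format(action))
    if not databases and action in REQUESTDB_REQUIRED:
        raise ValueError("{0} requires --requestdb".format(action))

    argv = ["./jaguar"]
    if databases:
        argv += ["--requestdb", ",".join(databases)]
    argv.append(action)
    if config_file:
        # When omitted, Jaguar falls back to bds-config.json.
        argv.append(config_file)
    return argv


if __name__ == "__main__":
    print(" ".join(build_jaguar_command("install", ["orcl", "testdb", "proddb"])))
    print(" ".join(build_jaguar_command("install", config_file="mycustomconfig.json")))
```

Returning a list rather than a single string keeps each argument separate, which is the safer form if the command is later handed to a process launcher.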
The table below shows the available Jaguar commands.
Table 2-2 Jaguar Operations
Jaguar Operation | Supports --requestdb? | Usage and Examples |
---|---|---|
install | Y | Installs Oracle Big Data SQL on the Hadoop cluster identified in the configuration file and creates an installation bundle for the database side based on the parameters included in the configuration file (or default values for parameters not explicitly assigned a value in the configuration file). Example: # ./jaguar --requestdb orcl,testdb,proddb install. No configuration file parameter is included in this example, so bds-config.json is used. |
updatenodes | Y | Expands or shrinks the cluster. Deploys Oracle Big Data SQL to any new DataNodes and updates the cells inventory if the cluster has grown since the last Oracle Big Data SQL installation. |
reconfigure | Y | Modifies the current installation by applying changes you have made to the configuration file. |
databasereq | Y | Use this command to create the request key (.reqkey) file used in Database Authentication. |
databaseack | N | The "Database Acknowledge" process provides confirmation to the Oracle Big Data SQL installation on the Hadoop cluster that security features you enabled in the configuration file have been successfully implemented in the database-side installation. It then completes implementation of the selected security features on the Hadoop cluster side. Before running databaseack, copy the ZIP archive generated by the database-side installation back to /opt/oracle/DM/databases/conf on the Hadoop cluster management server. |
uninstall | N | Uninstalls Oracle Big Data SQL from the Hadoop cluster. |
Note:
When Oracle Big Data SQL is uninstalled on the Hadoop side, any queries against Hadoop data that are in process on the database side will fail. It is strongly recommended that you uninstall Oracle Big Data SQL from all database systems shortly after uninstalling the Hadoop component of the software.
See Also:
Uninstalling Oracle Big Data SQL.
2.3 Steps for Installing on the Hadoop Cluster
After you have set up the Jaguar configuration file according to your requirements, follow these steps to run the Jaguar installer, which installs Oracle Big Data SQL on the Hadoop cluster and also generates a database-side installation bundle that you deploy to the Oracle Database system. In these steps, bds-config.json is the configuration filename passed to Jaguar. This is the default. Any file name is accepted, so you can create a separate configuration file for each cluster.
Note:
Jaguar requires Python 2.7 to 3.0. If necessary, you can add a Jaguar-compatible version of Python as a secondary installation. Revisit the prerequisites section in the Introduction for details. If you are using Oracle Big Data Appliance (and possibly other Hadoop platforms), do not overwrite the installed Python release.
- Log on to the cluster management server node as root and cd to the directory where you extracted the downloaded Oracle Big Data SQL installation bundle.
- Cd to the BDSjaguar subdirectory under the path where you unzipped the bundle.
# cd <Big Data SQL Install Directory>/BDSjaguar
- Edit the file bds-config.json. Provide the required cluster name for the cluster where you want to install Oracle Big Data SQL:
{ "cluster": { "name": "<Your cluster name>" } }
Add the parameters that you want to use in this installation. For example:
  - Start the Oracle Database 12.1 offloader only:
    "cluster" : { "database_compatibility" : [ "12.1" ] }
  - Start both offloaders:
    "cluster" : { "database_compatibility" : [ "12.1", "12.2" ] }
    The default is to start both offloaders, so you can accomplish the same thing by leaving out the database_compatibility parameter.
See Also:
The cluster name is the only required parameter in this version of Oracle Big Data SQL. See the Jaguar Configuration Parameter and Command Reference for a description of all available parameters. You can see an example of a bds-config.json file populated with all available parameters in bds-config.json Configuration Example.
- In the BDSjaguar directory, run the Jaguar install operation. Pass the install parameter and the configuration file name (bds-config.json is the implicit default) as arguments to the Jaguar command. You may or may not need to include the --requestdb option.
[root@myclusteradminserver:BDSjaguar] # ./jaguar install <config file name>
Note:
By default, Database Authentication is set to true unless you set database_auth_enabled to "false" in the configuration file. If you enable Database Authentication, then either as part of the install operation or later, generate a "request key." This is half of a GUID/key pair used in the authentication process. To generate this key, include the --requestdb parameter in the Jaguar install command line:
[root@myclusteradminserver:BDSjaguar] # ./jaguar --requestdb mydb install
If the install was run with database_auth_enabled set to "true", you can use the Jaguar databasereq command to generate the key after the database-side installation. Several other Jaguar commands can also generate the request key if you pass them the --requestdb parameter.
Jaguar prompts for the cluster management service administrator credentials and then installs Oracle Big Data SQL throughout the Hadoop cluster. It also generates the database-side installation bundle in the db-bundles subdirectory. The following message is returned if the installation completed without error.
BigDataSQL: INSTALL workflow completed.
- Check for the existence of the database-side installation bundle:
# ls <Big Data SQL Install Directory>/BDSjaguar/db-bundles
bds-3.2.1.2-db-<cluster>-<yymmdd.hhmi>.zip
This bundle is for setting up Oracle Big Data SQL connectivity between Oracle Database and the specific cluster defined in the bds-config.json (or other) configuration file. It contains all packages and settings files required except for an optional database request key file.
If you included --requestdb in the install command, then the installation also generates one or more database request key files under the dbkeys subdirectory. You should check to see that this key exists.
# ls <Big Data SQL Install Directory>/BDSjaguar/dbkeys
cluster1db.reqkey
This completes the Oracle Big Data SQL installation on the Hadoop cluster.
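If you prefer to script the final check rather than run ls by hand, a sketch along the following lines looks for the generated bundle and any request key files. It is an illustration, not part of the product: `find_install_artifacts` is a hypothetical helper, and the file name patterns are assumptions derived from the sample names shown in the steps above (bds-3.2.1.2-db-<cluster>-<yymmdd.hhmi>.zip and cluster1db.reqkey).

```python
import glob
import os


def find_install_artifacts(jaguar_dir):
    """Return (bundles, reqkeys) found under the BDSjaguar directory.

    bundles: database-side installation bundles in db-bundles/
    reqkeys: database request key files in dbkeys/ (present only if
             --requestdb was used)
    """
    # ASSUMPTION: patterns generalized from the sample file names in the steps.
    bundle_pattern = os.path.join(jaguar_dir, "db-bundles", "bds-*-db-*.zip")
    reqkey_pattern = os.path.join(jaguar_dir, "dbkeys", "*.reqkey")
    return sorted(glob.glob(bundle_pattern)), sorted(glob.glob(reqkey_pattern))


if __name__ == "__main__":
    bundles, reqkeys = find_install_artifacts(".")
    print("bundles:", bundles or "none found")
    print("request keys:", reqkeys or "none found")
```

Run it from the BDSjaguar directory, or pass that directory's path to the function. An empty request key list is expected when the install was run without --requestdb.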
What Next?
After Jaguar has successfully installed Oracle Big Data SQL on the Hadoop cluster, you are done with the first half of the installation. The next step is to install Oracle Big Data SQL on the Oracle Database system that will run queries against the data on the Hadoop cluster.
To do this, copy the database-side installation bundle to any location on the Oracle Database system. Unless you set database_auth_enabled to "false" in the configuration file, also copy over the .reqkey file generated by Jaguar.
Tip:
You only need to send a request key to a database once. A single request key is valid for all Hadoop cluster connections to the same database. If you have already completed the installation to connect one Hadoop cluster to a specific database, then the database has the key permanently and you do not need to generate it again or copy it over to the database again in subsequent cluster installations.
Go to Installing or Upgrading the Oracle Database Side of Oracle Big Data SQL for instructions on unpacking the bundle and installing the database-side components of the software.
See Also:
An example of the complete standard output from a successful installation is provided in Oracle Big Data SQL Installation Examples.