2 Installing or Upgrading the Hadoop Side of Oracle Big Data SQL

After downloading the Oracle Big Data SQL deployment bundle and extracting the files, the next step is to configure the installer and then run the installation.

The installation of Oracle Big Data SQL on the Hadoop cluster is deployed using the services provided by the cluster management server (Cloudera Manager or Ambari). The Jaguar install command uses the management server API to register the BDS service and start the deployment task. From there, the management server controls the process that deploys the software to the nodes of the cluster and installs it.

The Hadoop-side installation also generates the deployment bundle for the database side of the installation.

If a previous version of Oracle Big Data SQL is already installed, the installer upgrades the installation to Release 4.0.

If you used a previous Oracle Big Data SQL release, note that the BDSJaguar configuration parameters available in this release have changed.

2.1 About Support for Multiple Database Versions (18c, 12.2, and 12.1)

Oracle Big Data SQL now supports Oracle Database 18c and also provides backward compatibility for Oracle Database 12.2 and 12.1.

To take advantage of the new capabilities in Oracle Big Data SQL 4.0, you need to use Oracle Database 18c or later. However, use of Oracle Database 12.1 and 12.2 is fully supported in this release (even though you cannot leverage the new 4.0 capabilities with these database versions). This backward compatibility enables you to install and administer release 4.0 in a mixed environment that includes both Oracle Database 18c and 12c.

See Also:

The Jaguar Configuration Parameter and Command Reference in this chapter shows you how to configure support for Oracle Database versions when you install the Hadoop side of Big Data SQL.

2.2 Before You Start

Check to Ensure that all DataNodes of the Cluster Meet the Prerequisites

2.2.1 Check for Hadoop-Side Prerequisites With bds_node_check.sh

You can run bds_node_check.sh on all cluster DataNodes prior to installing Oracle Big Data SQL. This is a quick way to check if each node meets the installation criteria. You can see exactly what needs to be updated.

Running bds_node_check.sh is not required, but is recommended. The Jaguar installer runs the same pre-checks internally, but when Jaguar runs the pre-checks it also starts and stops the cluster management server. Furthermore, the installation stops in place when it encounters a node that does not meet the prerequisites. Each time this happens, you then need to fix the readiness errors on the node in order to continue. Running bds_node_check.sh as a first step contributes to a smoother installation.

You can use this same script to check for the prerequisites when you add new nodes to the cluster.

Deploying and Running bds_node_check.sh

The script checks the local node where it is run. It does not check all nodes in the cluster.

  1. Find the script on the cluster management server in the install directory created when you executed ./BDSJaguar-4.0.0.run.
    $ ls <Big Data SQL Install Directory>
    BDSJaguar
    bds_node_check.sh
    $ cd <Big Data SQL Install Directory>
  2. Use your preferred method to copy the script to a node that you want to check.
    $ scp bds_node_check.sh oracle@<node_IP_address>:/opt/tmp 
  3. Log on to the node and run the script.
    $ ./bds_node_check.sh
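
Because the script checks only the node where it runs, checking several DataNodes means repeating steps 2 and 3 for each one. The Python sketch below only builds the scp and ssh command lines for a list of nodes; it does not execute anything. The function name build_node_check_commands and the example host names are hypothetical. The sketch also assumes that the oracle account can write to /opt/tmp on each node and that the script is executable there.

```python
from typing import List, Tuple


def build_node_check_commands(
    nodes: List[str],
    user: str = "oracle",
    remote_dir: str = "/opt/tmp",
    script: str = "bds_node_check.sh",
) -> List[Tuple[List[str], List[str]]]:
    """Return one (copy, run) command pair per node without executing anything."""
    commands = []
    for node in nodes:
        # Step 2: copy the script to the node (same form as the scp example above).
        copy_cmd = ["scp", script, f"{user}@{node}:{remote_dir}"]
        # Step 3: log on to the node and run the script from the copy location.
        run_cmd = ["ssh", f"{user}@{node}", f"cd {remote_dir} && ./{script}"]
        commands.append((copy_cmd, run_cmd))
    return commands


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for copy_cmd, run_cmd in build_node_check_commands(
        ["node01.example.com", "node02.example.com"]
    ):
        print(" ".join(copy_cmd))
        print(" ".join(run_cmd))
```

To actually run the checks, each command list could be passed to subprocess.run, one node at a time, so that the output of each node stays separate.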

Checking for Missing Prerequisites in the bds_node_check.sh Output

The report returned by bds_node_check.sh covers both the Jaguar installer prerequisites and the prerequisites for supporting communications with Query Server on its edge node. If you do not intend to install Query Server, you can ignore that subset of the prerequisites.

bds_node_check.sh:
Starting pre-requirements checks for BDS Jaguar
Total memory 50066 >= 20480 correct
vm_overcommit_memory=0 correct
shmmax=1099511627776, shmall=268435456, PAGE_SIZE=4096 correct
shmmax=1099511627776 >= total_memory=52498006016 + 1024 correct
swappiness=10 correct
Total cores 9 >= 8 correct
Size of socket buffer rmem_default 4194304 >= 4194304 correct
Size of socket buffer rmem_max 8388608 >= 4194304 correct
Size of socket buffer wmem_default 4194304 >= 4194304 correct
Size of socket buffer wmem_max 8388608 >= 4194304 correct
dmidecode installed
net-snmp not installed
net-snmp-utils not installed
perl-XML-SAX not installed
perl-XML-LibXML not installed
perl-libwww-perl not installed
perl-libxml-perl not installed
libaio installed
glibc installed
libgcc installed
libstdc++ installed
libuuid installed
perl-Time-HiRes not installed
perl-libs not installed
rpm found
scp found
curl found
unzip found
zip found
tar found
wget not found
uname found
perl not found
10 error(s) found for BDS Jaguar pre-requirements

Starting pre-requirements checks for BDS Query Server
expect installed
procmail installed
rpm found
scp found
curl found
unzip found
zip found
tar found
wget not found
uname found
perl not found

2 error(s) found for BDS Query Server pre-requirements
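
When you collect reports from many nodes, it can help to reduce each one to the list of missing items. The Python sketch below is one possible way to do that. It assumes the line format shown in the sample output above ("<name> not installed" and "<name> not found" under a "Starting pre-requirements checks for ..." header); other releases of the script may format the report differently. The function name missing_prerequisites is hypothetical. The sample does not show what a failed kernel-setting check (memory, shmmax, and so on) looks like, so the sketch ignores those lines.

```python
import re
from typing import Dict, List, Optional

# Patterns inferred from the sample report; they are assumptions, not a documented format.
SECTION_RE = re.compile(r"^Starting pre-requirements checks for (.+)$")
FAILURE_RE = re.compile(r"^(\S+) not (installed|found)$")


def missing_prerequisites(report: str) -> Dict[str, List[str]]:
    """Group 'not installed' / 'not found' items by report section."""
    missing: Dict[str, List[str]] = {}
    section: Optional[str] = None
    for raw_line in report.splitlines():
        line = raw_line.strip()
        header = SECTION_RE.match(line)
        if header:
            section = header.group(1)
            missing[section] = []
            continue
        failure = FAILURE_RE.match(line)
        if failure and section is not None:
            missing[section].append(failure.group(1))
    return missing


# Shortened excerpt of the sample report above.
SAMPLE = """\
Starting pre-requirements checks for BDS Jaguar
dmidecode installed
net-snmp not installed
wget not found
perl not found
Starting pre-requirements checks for BDS Query Server
expect installed
wget not found
"""

if __name__ == "__main__":
    for section_name, items in missing_prerequisites(SAMPLE).items():
        print(f"{section_name}: {', '.join(items) or 'nothing missing'}")
```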

2.2.2 Greater Memory Requirement if You Intend to Support Oracle Database 12.1

Support for Oracle Database 12.1 may require additional memory.

Oracle Big Data SQL 4.0 provides backward compatibility with Oracle Database 12.1 and 12.2. However, compatibility with Oracle Database 12.1 incurs an additional cost in memory. Oracle Database 12.1 support requires that the Hadoop nodes run an older offload server (in addition to the offload server normally present). The overhead of running this additional offload server is a resource expense that you can avoid if you do not need to support Oracle Database 12.1.

If You Need to Support Oracle Database 12.1 for This Cluster:

Be sure that the DataNodes in the Hadoop cluster have enough memory. The minimum memory requirement per Hadoop node for an installation that supports full database compatibility (18c, 12.2, and 12.1) is 64 GB.

Also check to be sure that the memory cgroup upper limit is set to allow Oracle Big Data SQL to consume this much memory.

If You do not Need to Support Oracle Database 12.1 for This Cluster:

Be sure to choose the right setting for the database_compatibility value in the Jaguar configuration file (bds-config.json or other). The options for this parameter are: "12.1", "12.2", "18", and "full". Both "12.1" and "full" trigger the startup of the additional offload server. Use whichever of the options is appropriate for your environment.

Also note that the new features that Oracle Big Data SQL provides cannot be leveraged on the older 12c databases.

See Also:

The Jaguar Configuration Parameter and Command Reference describes the database_compatibility parameter.
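
The rules in this section can be summarized in a small check. The Python sketch below encodes only what this section states: "12.1" and "full" start the additional offload server, and a node needs at least 64 GB of memory when Oracle Database 12.1 is supported. The function names and the idea of running such a check before installation are illustrative assumptions; Jaguar does not provide these functions.

```python
from typing import List, Optional

# Values listed for database_compatibility in this chapter.
VALID_VALUES = {"12.1", "12.2", "18", "full"}
# Per this section, these two values start the additional (older) offload server.
EXTRA_OFFLOADER_VALUES = {"12.1", "full"}
# Minimum memory per Hadoop node when Oracle Database 12.1 is supported.
MIN_GB_WITH_12_1 = 64


def needs_extra_offload_server(database_compatibility: List[str]) -> bool:
    """Return True if the setting starts the additional offload server."""
    if not database_compatibility:
        raise ValueError("database_compatibility must list at least one value")
    unknown = set(database_compatibility) - VALID_VALUES
    if unknown:
        raise ValueError(f"unsupported value(s): {sorted(unknown)}")
    return any(value in EXTRA_OFFLOADER_VALUES for value in database_compatibility)


def memory_warning(
    database_compatibility: List[str], node_memory_gb: float
) -> Optional[str]:
    """Return a warning if a node is below 64 GB while 12.1 support is on."""
    if (
        needs_extra_offload_server(database_compatibility)
        and node_memory_gb < MIN_GB_WITH_12_1
    ):
        return (
            f"Node has {node_memory_gb} GB; at least {MIN_GB_WITH_12_1} GB is "
            "required when Oracle Database 12.1 is supported."
        )
    return None


if __name__ == "__main__":
    print(memory_warning(["full"], 48))
    print(memory_warning(["12.2"], 48))
```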

2.2.3 Plan the Configuration and Edit the Jaguar Configuration File

Before you start, consider the questions below.

Answering these questions will help clarify how you should edit the Jaguar configuration file. (See the Jaguar Configuration Parameter and Command Reference in this chapter.)

  • Do you plan to access data in object storage?

    If so, then in the Jaguar configuration file you need to enable this access and also define some proxy settings that are specific to object store access.

  • Do you want to install the optional Query Server?

    If so, several things are required before you run Jaguar in order to install Oracle Big Data SQL:

    • Identify a cluster edge node to host the optional Query Server.

      A dedicated node is strongly recommended. Query Server is resource-intensive and cannot run on a node hosting either the DataNode or BDSSERVER roles.

    • Download and unzip the Query Server bundle and then execute the run file.
      The Query Server is in a separate deployment bundle from the Jaguar installer (BDSExtra-4.0.0-QueryServer.zip). Before running Jaguar, download and unzip this bundle from eDelivery. Then execute the Query Server run file. Also, in the configuration file submitted to Jaguar, define the two related parameters in the edgedb section of the file -- node and enabled.
          "edgedb": {
              "node"    : "<some server>.<some domain>.com",
              "enabled" : "true"
          }
  • Does your Hadoop environment need to connect to more than one Oracle Database version?
    By default, Big Data SQL allows connections from 12.1, 12.2, and 18c databases. However, the offloaders that support these connections are resource-intensive, particularly for memory consumption on the nodes of the cluster. If you do not need to support all three releases, you can save resources by turning off either the 12.1 or 12.2 offloader. (Note that the 12.2 offloader actually supports both 12.2 and 18c.) You set this in the configuration passed to the Jaguar installer. For example, you can enter this string to allow connections from 12.1 databases only:
    "database_compatibility" : [ "12.1" ]
    To turn off the 12.1 offloader, enter the value "12.2":
    "database_compatibility" : [ "12.2" ]
  • Do you want to enable Database Authentication in order to validate connection requests from Oracle Database to Big Data SQL server processes on the Hadoop DataNodes?

    Database Authentication in the network connection between Oracle Database and Hadoop is enabled by default (database_auth_enabled is “true”). You have the option to disable it by setting database_auth_enabled to “false”:

    "database_auth_enabled" : "false",
  • Do you want to use the Multi-User Authorization feature?

    Multi-User Authorization enables you to grant users other than oracle permissions to run SQL queries against the Hadoop cluster. Multi-User Authorization can be used in conjunction with Sentry's role-based access control to provide improved control over user access.

    The first step in setting up Multi-User Authorization is to set these parameters in the security section of the configuration file:
    "impersonation_enabled" : "true",
    "impersonation_usehosts" : "true",
    "impersonation_blacklist" : "hdfs,hive"
    Note that you can add any account to the blacklist.
  • Are the Hadoop cluster and the Oracle Database system going to communicate over Ethernet or over InfiniBand? Also, do the Hadoop nodes have more than one network interface?

    See the use_infiniband and selection_subnet parameters. (The selection_subnet does not apply to Oracle Big Data Appliance.)
    "use_infiniband" : "false",
    "selection_subnet" : "5.32.128.10/21"
    By default use_infiniband is set to false. Ethernet is the default protocol.
  • Are you going to pre-download the Hadoop and Hive client tarballs and set up a local repository or directory where the installer can acquire them, or will you configure Jaguar to download them directly from the Cloudera or Hortonworks repository on the Internet?

    Jaguar needs to know where to find the Hadoop and Hive tarballs. You can specify a URL, a directory path, or both for the tarballs.

    "url" : [
    "http://repo1.domain.com/loc/hadoop",
    "http://repo2.domain.com/loc",
    "http://alternate.domain.com/backup/repo"
    ],
    "dir" : [ "/root/hadooprepo" ]

    If you choose to download the tarballs from the Internet, ensure that the repository is accessible from the node where you are running the installation. If needed, you can set the http_proxy and https_proxy parameters to configure the installer to work with HTTP/HTTPS proxies within your environment:

    "http_proxy" : "http://my.proxy.server.com:80",
    "https_proxy" : "http://mysecure.proxy.server.com:80"

    You can either set the proxy values in the configuration file or use preexisting proxy environment variables.

    Note:

    On Big Data Appliance only, the Mammoth installer provides an option for you to include Big Data SQL in the Big Data Appliance release installation. You can also install it later using the bdacli utility. If you use either of these appliance-specific methods, the clients are installed for you and you do not need to provide any information on where to find them. However, if you use Jaguar to install Big Data SQL on Big Data Appliance or on other supported Hadoop platforms, you do have to provide the path to the repositories in the Jaguar configuration file using either the dir or url parameter.
  • If the network is Kerberos-secured, do you want the installer to set up automatic Kerberos ticket renewal for the oracle account used by Oracle Big Data SQL?

    See the parameters in the kerberos section:
    "principal" : "oracle/mycluster@MY.DOMAIN.COM",
    "keytab" : "/home/oracle/security/oracle.keytab",
    "hdfs-principal" : "hdfs/mycluster@MY.DOMAIN.COM",
    "hdfs-keytab" : "/home/hdfs/security/hdfs.keytab"
  • Do you want the Oracle Big Data SQL install process to automatically restart services that are in a stale state?

    By default, stale services are restarted automatically. If you want to suppress this, you can set the restart_stale parameter in the configuration file to “false”.

  • Is the Hadoop cluster using the default REST API port for CDH or Ambari?

    If not, set the ports parameter.

Note:

Setting these parameters in the configuration file does not complete the setup for some features. For example, to enable Database Authentication, you must also pass a special --requestdb parameter to the Jaguar utility in order to identify the target database or databases. There are also steps required to generate and install the security key used by this feature. To enable Multi-User Authorization, you start by setting the Hadoop Impersonation parameters in the configuration file, but you also need to set up the authorization rules. The steps to complete these setups are provided where needed as you work through the instructions in this guide.
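
As a way to turn the answers to the planning questions into a starting configuration, the Python sketch below builds a dictionary and writes it out as JSON. The parameter names and their sections (cluster, edgedb, security, repositories) are taken from this chapter, and Boolean values are written as quoted strings, as in the snippets above. The function name build_config, its arguments, and the assumption that each section is a top-level JSON object are illustrative; compare the result with the example configuration files shipped with the installer before using it.

```python
import json
from typing import Any, Dict, Optional, Sequence


def build_config(
    cluster_name: str,
    *,
    query_server_node: Optional[str] = None,
    support_12_1: bool = True,
    database_auth: bool = True,
    multi_user_auth: bool = True,
    repo_dirs: Sequence[str] = (),
    repo_urls: Sequence[str] = (),
) -> Dict[str, Any]:
    """Assemble a configuration dictionary from planning answers."""

    def flag(value: bool) -> str:
        # The snippets in this chapter show Booleans as quoted strings.
        return "true" if value else "false"

    config: Dict[str, Any] = {
        "cluster": {
            "name": cluster_name,
            # Lists keep their square brackets even with a single item.
            "database_compatibility": ["full"] if support_12_1 else ["12.2"],
        },
        "security": {
            "database_auth_enabled": flag(database_auth),
            "impersonation_enabled": flag(multi_user_auth),
        },
    }
    if query_server_node:
        config["edgedb"] = {"node": query_server_node, "enabled": "true"}
    repositories: Dict[str, Any] = {}
    if repo_dirs:
        repositories["dir"] = list(repo_dirs)
    if repo_urls:
        repositories["url"] = list(repo_urls)
    if repositories:
        config["repositories"] = repositories
    return config


if __name__ == "__main__":
    # Hypothetical cluster and host names, for illustration only.
    example = build_config(
        "mycluster",
        query_server_node="edge1.example.com",
        support_12_1=False,
        repo_dirs=["/root/hadooprepo"],
    )
    print(json.dumps(example, indent=2))
```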

2.3 About the Jaguar Utility

Jaguar is a multifunction command line utility that you use to perform all Oracle Big Data SQL operations on the Hadoop cluster.

Jaguar currently supports these operations:

  • install

    • Deploys Oracle Big Data SQL binaries to each cluster node that is provisioned with the DataNode service.

    • Configures Linux and network settings for bd_cell (the Oracle Big Data SQL service) on each of these nodes.

    • Generates the bundle that installs Oracle Big Data SQL on the Oracle Database side. It uses the parameter values that you set in the configuration file in order to configure the Oracle Database connection to the cluster.

  • reconfigure

    Modifies the current configuration of the installation (according to the settings in the configuration file provided).

  • databasereq

    Generates a request key file that contains one segment of the GUID-key pair used in Database Authentication. (The databasereq operation performs this function only. For install and reconfigure, request key generation is an option that can be included as part of the larger operation.)

  • databaseack

    Performs the last step in Database Authentication setup -- installs the GUID-key pair on all Hadoop DataNodes in a cluster in order to allow queries from the Oracle Database that provided it.

  • sync_principals

    Gets a list of principals from a KDC running on a cluster node and uses it to create externally-identified database users for Query Server.

  • uninstall

    Uninstalls Oracle Big Data SQL from all DataNodes of the Hadoop cluster.

See Also:

Jaguar Operations in the next section provides details and examples.

2.3.1 Jaguar Configuration Parameter and Command Reference

This section describes the parameters within the Jaguar configuration file as well as Jaguar command line parameters.

Configuration Parameters

The table below describes all parameters available for use in bds-config.json or your own configuration file. Only the cluster name parameter is always required. Others are required under certain conditions stated in the description.

Note:

When editing the configuration file, be sure to maintain the JSON format. Square brackets are required around lists, even in the case of a list with a single item.
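
A quick check before running Jaguar can catch both malformed JSON and a list parameter written without square brackets. The Python sketch below is one way to do this. The list of list-type parameters is taken from Table 2-1 and from the database_compatibility examples in this chapter; the function name find_list_format_errors and the assumption that each section is a top-level JSON object are illustrative.

```python
import json
from typing import List

# (section, parameter) pairs that this chapter shows or describes as lists.
LIST_PARAMETERS = [
    ("cluster", "database_compatibility"),
    ("network", "extra_nodes"),
    ("network", "excluded_nodes"),
    ("repositories", "dir"),
    ("repositories", "url"),
]


def find_list_format_errors(config_text: str) -> List[str]:
    """Return messages for list parameters that are not JSON lists.

    Raises json.JSONDecodeError if the text is not valid JSON at all.
    """
    config = json.loads(config_text)
    if not isinstance(config, dict):
        return ["top level of the configuration must be a JSON object"]
    errors = []
    for section, parameter in LIST_PARAMETERS:
        section_obj = config.get(section, {})
        if not isinstance(section_obj, dict):
            continue  # Section has an unexpected shape; nothing to check here.
        if parameter in section_obj and not isinstance(section_obj[parameter], list):
            errors.append(f"{section}.{parameter} must be a list in square brackets")
    return errors


if __name__ == "__main__":
    bad = '{"repositories": {"dir": "/root/hadooprepo"}}'
    good = '{"repositories": {"dir": ["/root/hadooprepo"]}}'
    print(find_list_format_errors(bad))
    print(find_list_format_errors(good))
```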

Table 2-1 Configuration Parameters in bds-config.json (or in Customer-Created Configuration Files)

Section Parameter Type Description
cluster name String

The name of the cluster.

For CDH clusters (Oracle Big Data Appliance or other), this name can be either the physical cluster name or the display name. The installer searches first by physical name and then by display name.

The name parameter is required only if Cloudera Manager will manage two or more clusters.

cluster database_compatibility string Select which Oracle Database versions must be supported.

Possible values: "12.1"| "12.2"| "18"| "full"

For example, either of these settings enables support for Oracle Database 12.1, 12.2, and 18c.

"database_compatibility" : [ "12.1" ]
"database_compatibility" : [ "full" ]

Either of the following settings enables support for Oracle Database 12.2 and 18c, but disables support for Oracle Database 12.1. By disabling support for Oracle Database 12.1 if it is not needed, you conserve some system resources, particularly memory.

"database_compatibility" : [ "12.2" ]
"database_compatibility" : [ "18" ]

The default is "full".

api hostname String Visible hostname for the cluster management server. In some scenarios, the visible hostname for Cloudera Manager or Ambari is not the same as the current hostname, for example, in High Availability environments.

Default: the local hostname.

api skip_health_check Boolean If "true", the cluster health check is skipped.

The cluster health check verifies that the HDFS, Hive, and YARN services are running with good health and are not stale. Additionally, for CDH clusters, management services should be running with good health and not be stale.

Default: "false".

api port Integer

Cloudera Manager or Ambari REST API port.

By default, on CDH clusters this port is 7183 for secured and 7180 for unsecured access. For Ambari, it is 8443 for secured and 8080 for unsecured access.

Optional.

api restart_stale Boolean

If “true”, then services with stale configurations are restarted at the end of the install process. These services are HDFS NameNode, YARN NodeManager and/or Hive (depending upon the settings selected).

If “false”, the installation will finish but those services will remain in a stale state. This is useful for avoiding unwanted service interruptions. You can then restart the services later when it is more convenient.

Optional. The default is “true”.

edgedb enabled Boolean Determines whether the Query Server functionality is enabled.

Default: "false".

edgedb node String

Hostname of the node where the Query Server database will be running (if enabled).

Note:

Because Query Server is resource-intensive, it is highly recommended that you install the database on a dedicated node. Query Server cannot run on a node that is running either the DataNode role or the BDSSERVER role.
object_store_support enabled boolean If "true", Oracle Wallet is set up both in the cluster and on the database system in order to allow access to Object Store.

Default: "false".

object_store_support cell_http_proxy string If object store access support is enabled, this parameter is required for access to an object store from the Hadoop cluster side, even if its value is empty. Follows the same rules as the Linux http_proxy variable. For example: http://myproxy.domain.com:80. No default value.
object_store_support cell_no_proxy string Like cell_http_proxy, supports access to object stores and is also required if this access is enabled, even if its value is empty. Follows the same syntax rules as the Linux no_proxy environment variable. For example: localhost,127.0.0.1,.domain.com. No default value.
object_store_support database_http_proxy string Same description as cell_http_proxy, except that this parameter supports object store access from the database side, not the Hadoop side.
object_store_support database_no_proxy string Same description as cell_no_proxy, except that this parameter supports object store access from the database side, not the Hadoop side.
network http_proxy, https_proxy String

Specify the proxy settings to enable download of the Hadoop client tarballs and cluster settings files.

If both of these strings are empty, the OS environment proxy settings are used.

By default, both strings are empty.

Using these two parameters in the configuration file is optional. If they are needed, you could instead set them externally, as in export http_proxy=<proxy value>

Not applicable to Oracle Big Data Appliance.

network extra_nodes List

Lists additional nodes where the BDSAgent should be installed.

The BDSAgent and BDSServer roles are installed on all DataNode instances. In addition, BDSAgent is installed on cluster nodes running HiveServer2 and HiveMetaStore instances. All remaining nodes are automatically excluded unless you add them here.

Default: empty

network excluded_nodes List Nodes that are not hosting the DataNode role can be excluded by listing them within this parameter.
security impersonation_enabled Boolean

If "true", Hadoop impersonation for Multi-User Authorization support is enabled. This sets up the oracle OS user as the Hadoop proxy user and propagates the proxy user’s blacklist to the database nodes. If "false", this feature is not enabled.

Default value: "true" in Oracle Big Data SQL 4.x and higher.

Note:

For CDH clusters, if the Sentry service is running, this setting is overridden and impersonation is enabled regardless of the value of this parameter.
kerberos principal String

The fully-qualified Kerberos principal name for a user.

Note:

In earlier releases, only the principal for the oracle user was supported. Other principals are now supported as well.

The principal has three parts:

  • The User Name:

    Kerberos principal names are case-sensitive. Be sure the User Name is in the same format used for the Kerberos principal name.

  • Qualifier: "/<qualifier>". This is optional information to help you organize and identify principals.

  • Domain: "@MY.DOMAIN.COM". This is required information managed by the KDC.

The Oracle Big Data SQL installation uses the Kerberos principal field (and keytab field below) to set up automated Kerberos ticket renewal for the user represented by the principal. It does this on both the Hadoop and Oracle Database sides of the installation. The installer does not create the principal or the keytab. These must already exist.

Required for secured clusters.

kerberos db-service-principal String

Specifies a principal on the KDC server. This is the service principal Query Server uses to validate the Kerberos ticket presented by a client.

Both db-service-principal and db-service-keytab are used to validate the Kerberos ticket presented by a client. Note that the parameters SQLNET.AUTHENTICATION_KERBEROS5_SERVICE and SQLNET.KERBEROS5_KEYTAB in sqlnet.ora will be set accordingly.

The qualifier for the principal name must match the fully qualified domain name of the node where the Query Server will be running.

Required for secured clusters.

kerberos db-service-keytab String

Fully-qualified location of the keytab file for the principal specified with db-service-principal.

Be sure to store the keytab in a location that is accessible to the Jaguar installer.

kerberos sync_principals Boolean

The sync_principals parameter specifies whether or not Jaguar automatically gets a list of principals from a KDC running on a cluster node and then uses the list to create externally-identified database users for Query Server.

If set to true, then an automatic synchronization with Kerberos principals occurs during Jaguar install and reconfigure operations. The user can also call this synchronization at any time by invoking the sync_principals operation of Jaguar on the command line.

Default: "true".

kerberos hdfs-keytab String

Fully-qualified path to the principal keytab file. A keytab file is created for each principal on the KDC server. It must exist in a location accessible to the Jaguar installer.

Required for secured clusters.

kerberos keytab String

Fully-qualified location for the principal’s keytab file name.

Copy the keytab file to a location accessible to the Jaguar installer and set the path as the value of this parameter.

kerberos hdfs-principal String

Fully-qualified Kerberos principal name for the "hdfs" user. It has three parts: User name, Qualifier, and Domain.

The User name is the fully-qualified principal name for the hdfs user. Qualifier is the cluster name prefixed by a forward slash, as in /mycluster. Domain is specified in the form @MY.DOMAIN.COM. All three are required. The principal name is defined on the KDC.

Required for secured clusters.

repositories dir List

List of directories where the Hadoop clients for deployment on the database side are located. These directories can be on the local file system or on NFS. Directories are searched in the order listed. By default, the list is empty. If the dir list has any entries, these are searched before the URL list is searched, since this option should provide the fastest access to the clients. To give the installer the quickest access to the tarballs, you could set up a local repository, download the tarballs separately through a direct Internet connection, copy them into a directory on the same node where the Oracle Big Data SQL installer will run, and list that directory in the dir parameter.

Optional.

Not applicable to Oracle Big Data Appliance.

repositories url List

This is the list of URLs where the Hadoop client tarballs for deployment on the database side are located. If your data center already has repositories set up for access via HTTP, then you may prefer to maintain the Hadoop tarballs in that repository and use the url parameter for Oracle Big Data SQL installations. The URLs can point to the localhost, an internal network, or a site on the Internet (if the node has Internet access). The URLs are tried in the order listed. Note that internal proxy values and/or OS environment proxy settings must be set to allow this access if needed.

If access to all listed repositories fails and/or Internet access is blocked, the database installation bundle is not created and a warning message is displayed. After correcting any problems and providing access to a repository, you can re-run the installer using the reconfigure operation, and the installer should successfully generate the database-side installation bundle. Note that reconfigure detects and implements changes according to the current directives in the configuration file. It does not uninstall and reinstall Oracle Big Data SQL on the cluster.

Not applicable to Big Data Appliance, where the tarballs are stored in a local repository in the cluster and the location is automatically added to the configuration file.

network use_infiniband Boolean

If “true”, communication is set up through the private network interface. If “false”, it is set up through the client network interface.

Used for Oracle Big Data Appliance clusters only.

Default value: “false”.

network selection_subnet String

If Hadoop cluster nodes have several network interfaces, you can use selection_subnet to select one. The selected IP address will be the nearest to the selection subnetwork.

If the Hadoop cluster nodes have only one network interface, this parameter is ignored.

The default value depends upon these conditions:

  • On non-Oracle commodity Hadoop clusters (CDH or HDP) the default selection is 0.0.0.0/0. (If a cluster node has several IP addresses, the lowest address is selected.)

  • On Oracle Big Data Appliance, the default is either the private or client IP address, depending upon the setting of the use_infiniband parameter.

Note for Oracle Big Data Appliance Users:

It's possible to configure several networks on an Oracle Big Data Appliance. If multiple networks exist, then this parameter must be set in order to select a specific network.
security database_auth_enabled Boolean

If "true", the database authentication through the GUID-key mechanism is enabled. This requires an extra step in the installation process in order to set up the database GUID-key pair on the cluster side.

If "false", the feature will not be enabled.

Default value: "true".

security impersonation_blacklist String

The Hadoop proxy users blacklisted for impersonation. This parameter is used only if Hadoop impersonation is enabled.

Since this is a required setting on the Oracle Database side, it is provided with a default value of "dummy" in order to avoid extproc errors that can occur if Hadoop Impersonation is not enabled.

security impersonation_usehosts Boolean

If "true", the proxy hosts variable is set to the IP address of the database node.

If "false", the proxy hosts variable is set to the wildcard: "*".

Default value: "true".

memory min_hard_limit Integer

The minimum amount of memory reserved for Big Data SQL, in megabytes. This parameter is used on CDH clusters (Oracle Big Data Appliance and others). It is not used on HDP clusters. By default, the value is 16384 MB (16 GB).

memory max_percentage Integer

On CDH clusters (Oracle Big Data Appliance and others), this parameter specifies the percentage of memory on each node to reserve for Big Data SQL.

If the YARN ResourceManager is enabled for the node, the percentage is based on the total amount of memory used by the NodeManager. Otherwise, it is a percentage of physical memory.

This parameter is ignored on HDP clusters.

Note:

After Oracle Big Data SQL is installed on the Hadoop cluster management server, you can find configuration file examples that demonstrate various parameter combinations in the <Big Data SQL Install directory>/BDSJaguar directory:
example-bda-config.json
example-cdh-config.json 
example-kerberos-config.json
example-localrepos-config.json
example-subnetwork-config.json
example-unsecure-config.json
You can see all possible parameter options in use in example-cdh-config.json.

See Also:

See the Appendix Determining the Correct Software Version and Composing the Download Paths for Hadoop Clients for suggestions that can help with the setup of client tarball downloads.

Jaguar Operations

The table below lists the full set of operations performed by the Jaguar utility on the Hadoop side of the Oracle Big Data SQL installation.

The general syntax for Jaguar commands is as follows. The --requestdb parameter does not apply to all Jaguar commands.

# ./jaguar {--requestdb <comma-separated database names> | NULL } <action> { bds-config.json | <myfilename>.json | NULL } 

Examples:

# ./jaguar install
# ./jaguar install bds-config.json
# ./jaguar install mycustomconfig.json 
# ./jaguar --requestdb orcl,testdb,proddb install
# ./jaguar --requestdb orcl install
# ./jaguar sync_principals

You can use the default bds-config.json or your own configuration file, or omit the configuration file argument (which defaults to bds-config.json).
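
If you script the installation, the general syntax above can be assembled programmatically. The Python sketch below builds the argument list in the order shown (optional --requestdb with comma-separated database names, then the action, then an optional configuration file). The function name build_jaguar_command is hypothetical, and the sketch only constructs the command; it does not run Jaguar.

```python
from typing import List, Optional, Sequence


def build_jaguar_command(
    action: str,
    config_file: Optional[str] = None,
    requestdb: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build a Jaguar argument list that follows the general syntax above."""
    command = ["./jaguar"]
    if requestdb:
        # Database names are passed as a single comma-separated value.
        command += ["--requestdb", ",".join(requestdb)]
    command.append(action)
    if config_file:
        # When omitted, Jaguar falls back to bds-config.json.
        command.append(config_file)
    return command


if __name__ == "__main__":
    print(" ".join(build_jaguar_command("install")))
    print(" ".join(build_jaguar_command("install", "mycustomconfig.json")))
    print(" ".join(build_jaguar_command("install", requestdb=["orcl", "testdb", "proddb"])))
```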

About --requestdb:

The --requestdb parameter is required for the databasereq command, optional for install, updatenodes, and reconfigure, and not applicable to other Jaguar commands. The parameter must be passed in to one of these operations in order to enable Database Authentication in the connection between a Hadoop cluster and a database. Unless you prefer to disable Database Authentication, it is recommended that you include --requestdb with the initial install operation. Otherwise, you will need to perform an additional step later in order to generate the request key.

This parameter is functional only when Database Authentication (database_auth_enabled) is set to “true” in the configuration. (This setting is a configuration default and does not need to be explicitly set in the configuration file.)
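
The applicability rules for --requestdb described above can be expressed as a small lookup. The Python sketch below encodes only those rules (required for databasereq; optional for install, updatenodes, and reconfigure; not applicable otherwise). The function name check_requestdb is hypothetical, and how Jaguar itself reacts to a misplaced --requestdb is not stated in this chapter, so the errors raised here are a design choice of the sketch.

```python
# Operations grouped according to the --requestdb rules in this section.
REQUESTDB_REQUIRED = {"databasereq"}
REQUESTDB_OPTIONAL = {"install", "updatenodes", "reconfigure"}


def check_requestdb(operation: str, requestdb_given: bool) -> str:
    """Classify --requestdb for an operation and reject inconsistent usage."""
    if operation in REQUESTDB_REQUIRED:
        if not requestdb_given:
            raise ValueError(f"{operation} requires --requestdb")
        return "required"
    if operation in REQUESTDB_OPTIONAL:
        return "optional"
    if requestdb_given:
        raise ValueError(f"--requestdb does not apply to {operation}")
    return "not applicable"


if __name__ == "__main__":
    for op, given in [("install", True), ("databasereq", True), ("uninstall", False)]:
        print(op, "->", check_requestdb(op, given))
```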

Jaguar needs the database names in order to generate a unique .reqkey (request key) file for each database. When database_auth_enabled is set to “true” at installation time, the --requestdb parameter is still optional. After installation, you can still generate the request key by including --requestdb in the updatenodes, reconfigure, or databasereq commands. Database Authentication is not implemented until you do all of the following:

  1. Ensure that database_auth_enabled is either absent from the configuration file or is set to “true”. (It is “true” by default.)

  2. Include --requestdb in a Jaguar command:

    1. Run the Jaguar install, updatenodes, or reconfigure operation and then install the updated database-side installation bundle, or

    2. Run Jaguar databasereq to generate an acknowledge key from the existing database-side installation.

  3. Copy the generated ZIP file that contains the .ackkey file from the database-side installation directory to /opt/oracle/DM/databases/conf on the Hadoop cluster management server.

  4. Run the Jaguar databaseack command as described in the table below.

The table below shows the available Jaguar commands.

Table 2-2 Jaguar Operations

Jaguar Operation Supports --requestdb? Usage and Examples
install Y

--requestdb <comma-separated database list>

The --requestdb parameter is not strictly required by the install operation, but you cannot enable Database Authentication if you do not generate a request key for each database.

Installs Oracle Big Data SQL on the Hadoop cluster identified in the configuration file and creates an installation bundle for the database side based on the parameters included in the configuration file (or default values for parameters not explicitly assigned a value in the configuration file). Examples:

# ./jaguar --requestdb orcl,testdb,proddb install

No configuration file parameter is included in the above example, so bds-config.json is the implicit default. You can specify a different configuration file, as in: ./jaguar --requestdb mydb install myconfig.json

Note:

You may need to use the scl utility to ensure that the correct Python version is invoked:

scl enable python27 "./jaguar install"

On Big Data Appliance clusters running Oracle Linux 6 and Oracle Linux 7, scl is not needed in order to call the correct Python version for Jaguar.

updatenodes Y

Expand or shrink the cluster. Installs Oracle Big Data SQL on any new DataNodes and updates the cells inventory if the cluster has grown since the last Oracle Big Data SQL installation.

reconfigure Y

Modify the current installation by applying changes you have made to the configuration file (bds-config.json or other).

# ./jaguar reconfigure myconfigfile.json

Note that if you run ./jaguar reconfigure <config file> to reconfigure Oracle Big Data SQL on the Hadoop cluster, a corresponding reconfiguration is required on the Oracle Database side. The two sides cannot communicate if the configurations do not match. The Jaguar utility regenerates the database-side bundle files to incorporate the changes. You must redeploy the bundle on all database servers where it was previously installed.

The --requestdb argument is required if database_auth_enabled is set to “true” in the updated configuration file. This is so that Jaguar will generate .reqkey files that are included in the database-side installation bundle. Note that we let the configuration file parameter default to bds-config.json.

# ./jaguar --requestdb demodb,testdb,proddb1 reconfigure

databasereq Y

Use this command to create the .reqkey file without repeating the Hadoop-side installation or doing an updatenodes or reconfigure operation. For example, if you forgot to include the --requestdb argument with the Jaguar install command, you can create a request key later with databasereq. This operation requires that database_auth_enabled is set to “true” (the default value) in the configuration.

# ./jaguar --requestdb demodb,testdb,proddb1 databasereq

databaseack N

The “Database Acknowledge” process provides confirmation to the Oracle Big Data SQL installation on the Hadoop cluster that security features you enabled in the configuration file have been successfully implemented in the database-side installation. It then completes implementation of the selected security features on the Hadoop cluster side.

# ./jaguar databaseack bds-config.json

Only run databaseack if you chose to enable security features by setting either of these parameters in the configuration file to “true”:

  • "impersonation_enabled" : "true"

  • "database_auth_enabled" : "true"

If a database-side installation bundle is built with either of these features set to “true”, then the database-side installation from that bundle generates a ZIP file in the installation directory under $ORACLE_HOME on the database server. The format of the ZIP file name is <Hadoop cluster name>-<Number of nodes in the cluster>-<FQDN of the cluster management server node>-<FQDN of this database node>.zip. For example:

$ ls $ORACLE_HOME/BDSJaguar-4.0.0/cdh510-6-node1.my.domain.com/*.zip
 cdh510-6-node1.my.domain.com-myoradb1.mydomain.com.zip

Copy this zip archive back to /opt/oracle/DM/databases/conf on the Hadoop cluster management server after the database-side installation is complete. Then, to fully enable the security features, run databaseack.
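As a quick illustration of the ZIP file naming format described above, the following shell sketch assembles the expected file name from its four parts. The ack_zip_name function is a hypothetical helper and is not supplied with Oracle Big Data SQL. It may be useful for predicting which file to look for before you copy it to the cluster management server.

```shell
#!/bin/sh
# Hypothetical helper: build the expected name of the ZIP file that the
# database-side installation generates.
#   $1  Hadoop cluster name
#   $2  number of nodes in the cluster
#   $3  FQDN of the cluster management server node
#   $4  FQDN of the database node
ack_zip_name() {
    printf '%s-%s-%s-%s.zip\n' "$1" "$2" "$3" "$4"
}

# Values taken from the example listing above.
ack_zip_name cdh510 6 node1.my.domain.com myoradb1.mydomain.com
```

With the sample values, the function prints the same file name shown in the listing above.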

sync_principals N/A

Gets a list of principals from a KDC running on a cluster node and uses it to create externally-identified database users in Query Server. You can do the same by including the similarly-named sync_principals parameter in a Jaguar configuration file during Jaguar install and reconfigure operations.

--object-store-http-proxy N/A Specify a different proxy for Object Store access than the one set in the configuration file.
--object-store-no-proxy N/A Sets a no-proxy value and overrides the no_proxy value that may be set in the configuration file.
uninstall N/A

Uninstall Oracle Big Data SQL from the Hadoop cluster.

The uninstall process stops the bd_cell process (the Oracle Big Data SQL process) on all Hadoop cluster nodes, removes all instances from the Hadoop cluster, and releases all related resources.

Note:

When Oracle Big Data SQL is uninstalled on the Hadoop side, any queries against Hadoop data that are in process on the database side will fail. It is strongly recommended that you uninstall Oracle Big Data SQL from all database systems shortly after uninstalling the Hadoop component of the software.

2.4 Steps for Installing on the Hadoop Cluster

After you have set up the Jaguar configuration file according to your requirements, follow these steps to run the Jaguar installer, which installs Oracle Big Data SQL on the Hadoop cluster and also generates a database-side installation bundle that you deploy to the Oracle Database system. In these steps, bds-config.json is the configuration file name passed to Jaguar. This is the default. Any file name is accepted, so you can maintain a separate configuration file for each cluster.

Note:

Jaguar requires Python 2.7 to 3.0. Versions greater than 3.0 are not supported by Oracle Big Data SQL at this time. If necessary, you can add a Jaguar-compatible version of Python as a secondary installation. Revisit the prerequisites section in the Introduction for details. If you are using Oracle Big Data Appliance, do not overwrite the Mammoth-installed Python release.
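The version requirement in this note can be checked before you start. The following shell sketch is only an illustration: python_version_supported and detect_python_version are hypothetical helper names, and the sketch assumes that "2.7 to 3.0" means any 2.7.x or 3.0.x release, since no other releases exist between them.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for Python 2.7.x or 3.0.x.
#   $1  version string, for example "2.7.5" or "3.0"
python_version_supported() {
    case "$1" in
        2.7|2.7.*|3.0|3.0.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print "major.minor" for an interpreter on the PATH.
#   $1  interpreter name (defaults to "python")
detect_python_version() {
    "${1:-python}" -c 'import sys; print("%d.%d" % sys.version_info[:2])'
}

# Demonstrate the check with literal version strings.
for v in 2.6 2.7.5 3.0 3.6.8; do
    if python_version_supported "$v"; then
        echo "$v: supported by Jaguar"
    else
        echo "$v: not supported by Jaguar"
    fi
done
```

On a real host you could combine the two helpers, for example python_version_supported "$(detect_python_version python)", and fall back to scl or a secondary Python installation if the check fails.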
  1. Log on to the cluster management server node as root and cd to the directory where you extracted the downloaded Oracle Big Data SQL installation bundle.

  2. Change to the BDSJaguar subdirectory under the path where you unzipped the bundle.

    # cd <Big Data SQL Install Directory>/BDSJaguar

  3. Edit the file bds-config.json.

    {
      "cluster": {
        "name": "<Your cluster name>"
      }
    }

    Add the parameters that you want to use in this installation.

    See Also:

    The cluster name is the only required parameter, but it is required only in environments where the configuration management service must manage more than one cluster. See the Jaguar Configuration Parameter and Command Reference for a description of all available parameters. You can see an example of a bds-config.json file populated with all available parameters in bds-config.json Configuration Example.

    In the BDSJaguar directory, run the Jaguar install operation. Pass the install parameter and the configuration file name as arguments to the Jaguar command (bds-config.json is the implicit default). You may or may not need to include the --requestdb option.

    [root@myclusteradminserver:BDSjaguar] #  ./jaguar install <config file name>

    Note:

    Database Authentication is enabled by default unless you set database_auth_enabled to “false” in the configuration file. If Database Authentication is enabled, then either as part of the install operation or later, generate a “request key.” This is half of a GUID/key pair used in the authentication process. To generate this key, include the --requestdb parameter in the Jaguar install command line:

    [root@myclusteradminserver:BDSjaguar] # ./jaguar --requestdb mydb install
    
    If the install was run with database_auth_enabled set to “true”, you can use the Jaguar databasereq command to generate the key after the database-side installation. Several other Jaguar commands can also generate the request key if you pass them the --requestdb parameter.

    Jaguar prompts for the cluster management service administrator credentials and then installs Oracle Big Data SQL throughout the Hadoop cluster. It also generates the database-side installation bundle in the db-bundles subdirectory. The following message is returned if the installation completed without error.

    BigDataSQL: INSTALL workflow completed.

  4. Check for the existence of the database-side installation bundle:

    # ls <Big Data SQL Install Directory>/BDSJaguar/db-bundles
     bds-4.0.0-db-<cluster>-<yymmdd.hhmi>.zip

    This bundle is for setting up Oracle Big Data SQL connectivity between Oracle Database and the specific cluster defined in the bds-config.json (or other) configuration file. It contains all packages and settings files required except for an optional database request key file.

    If you included --requestdb in the install command, then the installation also generates one or more database request key files under the dbkeys subdirectory. Check that each expected key file exists.

    # ls <Big Data SQL Install Directory>/BDSJaguar/dbkeys
     cluster1db.reqkey
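The checks in the last step can be combined into a single shell sketch. The check_jaguar_outputs function is a hypothetical helper, not a Jaguar command. It assumes that bundle files match the pattern bds-*-db-*.zip in the db-bundles subdirectory and that request keys end in .reqkey in the dbkeys subdirectory, as in the listings above.

```shell
#!/bin/sh
# Hypothetical helper: verify that a Jaguar install left the expected files.
#   $1  path to the BDSJaguar directory
#   $2  "yes" if --requestdb was used and request keys are expected
check_jaguar_outputs() {
    jg_dir=$1
    jg_expect_keys=${2:-no}

    # Look for at least one database-side installation bundle.
    jg_found_bundle=no
    for jg_file in "$jg_dir"/db-bundles/bds-*-db-*.zip; do
        if [ -f "$jg_file" ]; then
            jg_found_bundle=yes
            echo "bundle: $jg_file"
        fi
    done
    if [ "$jg_found_bundle" != yes ]; then
        echo "error: no database-side bundle under $jg_dir/db-bundles" >&2
        return 1
    fi

    # Look for request keys only when they are expected.
    if [ "$jg_expect_keys" = yes ]; then
        jg_found_key=no
        for jg_file in "$jg_dir"/dbkeys/*.reqkey; do
            if [ -f "$jg_file" ]; then
                jg_found_key=yes
                echo "request key: $jg_file"
            fi
        done
        if [ "$jg_found_key" != yes ]; then
            echo "error: no .reqkey file under $jg_dir/dbkeys" >&2
            return 1
        fi
    fi
    return 0
}

# Example call (replace the placeholder with your own install directory):
#   check_jaguar_outputs "<Big Data SQL Install Directory>/BDSJaguar" yes
```

The function returns a nonzero status when a bundle or an expected key is missing, so it can gate a later step that copies the files to the database system.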

See Also:

If you chose to install Query Server, you can connect and start working with it now. It is not dependent on completion of the Oracle Database side of the installation. See Working With Query Server in the Oracle Big Data SQL User's Guide.

This completes the Oracle Big Data SQL installation on the Hadoop cluster.

What Next?

After Jaguar has successfully installed Oracle Big Data SQL on the Hadoop cluster, you are done with the first half of the installation. The next step is to install Oracle Big Data SQL on the Oracle Database system that will run queries against the data on the Hadoop cluster.

To do this, copy the database-side installation bundle to any location on the Oracle Database system. Unless you set database_auth_enabled to “false” in the configuration file, also copy over the .reqkey file generated by Jaguar.

Tip:

You only need to send a request key to a database once. A single request key is valid for all Hadoop cluster connections to the same database. If you have already completed the installation to connect one Hadoop cluster to a specific database, then the database has the key permanently and you do not need to generate it again or copy it over to the database again in subsequent cluster installations.

Go to Installing or Upgrading the Oracle Database Side of Oracle Big Data SQL for instructions on unpacking the bundle and installing the database-side components of the software.

See Also:

An example of the complete standard output from a successful installation is provided in Oracle Big Data SQL Installation Examples.