2 Installing or Upgrading the Hadoop Side of Oracle Big Data SQL

After downloading the Oracle Big Data SQL deployment bundle and extracting the files, the next step is to configure the installer and then run the installation.

Oracle Big Data SQL is deployed to the Hadoop cluster using the services provided by the cluster management server (Cloudera Manager or Ambari). The Jaguar install command uses the management server API to register the BDS service and start the deployment task. From there, the management server controls the process that deploys the software to the nodes of the cluster and installs it.

The Hadoop-side installation also generates the deployment bundle for the database side of the installation.

If a previous version of Oracle Big Data SQL is already installed, Oracle Big Data SQL upgrades the installation to Release 4.1.1.

If you have used previous Oracle Big Data SQL releases, note that the Jaguar configuration parameters available in this release have changed.

2.1 About Support for Multiple Database Versions (19c, 18c, 12.2, and 12.1)

Oracle Big Data SQL now supports Oracle Database 19c and also provides backward compatibility for Oracle Database 18c, 12.2, and 12.1.

You can use Oracle Big Data SQL 4.1.1 with any Oracle Database from release 12.1 to 19c. The database-related feature set available to you in Big Data SQL is determined by the Oracle Database version where it is installed. Each release of the database provides some advantages for Big Data SQL that its predecessors do not.

  • Oracle Database 19c (first supported in Big Data SQL 4.1) provides the ability to create hybrid partitioned tables that can include data in CSV or Parquet files, and other formats accessible to tools in Spark, Hadoop, and other big data technologies. See Hybrid Partitioned Tables in the Oracle Database VLDB and Partitioning Guide.

    Another Oracle Database 19c new feature that is useful to Big Data SQL is In-Memory External Tables.

    In addition, an installation of Big Data SQL on a 19c database system has all of the functionality available to Big Data SQL on 18c databases.

  • With Oracle Database 18c (which is supported by Oracle Big Data SQL 4.0 and later), you can access object stores in the cloud through the ORACLE_BIGDATA driver. 18c also enables Big Data SQL to perform aggregation offload, in which processing of aggregations in queries against data in Hadoop is pushed down to the Hadoop cluster.
  • Oracle Database 12.1 and 12.2 are fully supported in this release. However, Big Data SQL installations on these databases do not enable you to leverage the newer capabilities that are available with 18c and 19c. With 12.1 and 12.2, Big Data SQL functionality is equivalent to Big Data SQL 3.2.1.1.

This backward compatibility enables you to install and administer release 4.1.1 in a mixed environment that includes Oracle Database 19c, 18c, and 12c.

See Also:

The Jaguar Configuration Parameter and Command Reference in this chapter shows you how to configure support for Oracle Database versions when you install the Hadoop side of Big Data SQL.

2.2 Before You Start the Hadoop-Side Installation

Check to ensure that all DataNodes of the cluster meet the prerequisites.

2.2.1 Check Hadoop-Side Prerequisites

You can run bds_node_check.sh on all cluster DataNodes prior to installing Oracle Big Data SQL. This is a quick way to check if each node meets the installation criteria. You can see exactly what needs to be updated.

Running bds_node_check.sh is not required, but is recommended. The Jaguar installer runs the same pre-checks internally, but when Jaguar runs the pre-checks it also starts and stops the cluster management server. Furthermore, the installation stops in place when it encounters a node that does not meet the prerequisites. Each time this happens, you then need to fix the readiness errors on the node in order to continue. Running bds_node_check.sh as a first step contributes to a smoother installation.

You can use this same script to check for the prerequisites when you add new nodes to the cluster.

Deploying and Running bds_node_check.sh

The script checks the local node where it is run. It does not check all nodes in the cluster.

  1. Find the script on the cluster management server in the install directory created when you executed ./BDSJaguar-4.1.1.run.
    $ ls <Big Data SQL Install Directory>
    BDSJaguar
    bds_node_check.sh
    $ cd <Big Data SQL Install Directory>
  2. Use your preferred method to copy the script to a node that you want to check.
    $ scp bds_node_check.sh oracle@<node_IP_address>:/opt/tmp 
  3. Log on to the node and run the script.
    $ ./bds_node_check.sh

Checking for Missing Prerequisites in the bds_node_check.sh Output

The bds_node_check.sh report covers both the Jaguar installer prerequisites and the prerequisites for supporting communication with Query Server on its edge node. If you do not intend to install Query Server, you can ignore that subset of the prerequisites.

bds_node_check.sh: BDS version 4.1.1 (c) 2020 Oracle Corporation
bds_node_check.sh:
bds_node_check.sh: Starting pre-requirements checks for BDS Jaguar
bds_node_check.sh: Total memory 64240 >= 40960 correct
bds_node_check.sh: vm_overcommit_memory=0 correct
bds_node_check.sh: shmmax=4398046511104, shmall=1073741824, PAGE_SIZE=4096 correct
bds_node_check.sh: shmmax=4398046511104 >= total_memory=67360522240 + 1024 correct
bds_node_check.sh: swappiness=10 correct
bds_node_check.sh: Total cores 32 >= 8 correct
bds_node_check.sh: Size of socket buffer rmem_default 4194304 >= 4194304 correct
bds_node_check.sh: Size of socket buffer rmem_max 8388608 >= 4194304 correct
bds_node_check.sh: Size of socket buffer wmem_default 4194304 >= 4194304 correct
bds_node_check.sh: Size of socket buffer wmem_max 8388608 >= 4194304 correct
bds_node_check.sh: dmidecode installed
bds_node_check.sh: net-snmp installed
bds_node_check.sh: net-snmp-utils installed
bds_node_check.sh: perl-XML-SAX installed
bds_node_check.sh: perl-XML-LibXML installed
bds_node_check.sh: perl-libwww-perl installed
bds_node_check.sh: perl-libxml-perl installed
bds_node_check.sh: libaio installed
bds_node_check.sh: glibc installed
bds_node_check.sh: libgcc installed
bds_node_check.sh: libstdc++ installed
bds_node_check.sh: libuuid installed
bds_node_check.sh: perl-Time-HiRes installed
bds_node_check.sh: perl-libs installed
bds_node_check.sh: perl-Env installed
bds_node_check.sh: libcgroup-tools installed
bds_node_check.sh: rpm found
bds_node_check.sh: scp found
bds_node_check.sh: curl found
bds_node_check.sh: unzip found
bds_node_check.sh: zip found
bds_node_check.sh: tar found
bds_node_check.sh: uname found
bds_node_check.sh: perl found
bds_node_check.sh: cgget found
bds_node_check.sh:
bds_node_check.sh: Optionally, if this node will be running the Jaguar installer,
bds_node_check.sh: it must have at least python version 2.7.5
bds_node_check.sh: with cryptography module available
bds_node_check.sh: Testing with /usr/bin/python
bds_node_check.sh: Python version 2.7.5, correct
bds_node_check.sh: Python cryptography module available, correct
bds_node_check.sh:
bds_node_check.sh: All pre-requirements were met for BDS Jaguar
bds_node_check.sh:
bds_node_check.sh: Starting pre-requirements checks for BDS Query Server
bds_node_check.sh: Open files 131072 >= 131072 correct
bds_node_check.sh: expect installed
bds_node_check.sh: procmail not installed
bds_node_check.sh: oracle-database-preinstall-19c not installed
bds_node_check.sh: rpm found
bds_node_check.sh: scp found
bds_node_check.sh: curl found
bds_node_check.sh: unzip found
bds_node_check.sh: zip found
bds_node_check.sh: tar found
bds_node_check.sh: uname found
bds_node_check.sh: perl found
bds_node_check.sh: cgget found
bds_node_check.sh: No database instances running on this node, correct
bds_node_check.sh: /etc/oracle/olr.loc file does not exist
bds_node_check.sh: /etc/oracle/ocr.loc file does not exist
bds_node_check.sh:
bds_node_check.sh: 2 error(s) found for BDS Query Server pre-requirements

2.2.2 Check Memory Requirements

Support for Oracle Database 12.1 may require additional memory.

Oracle Big Data SQL provides backward compatibility with Oracle Database 12.1 and 12.2. However, compatibility with Oracle Database 12.1 incurs an additional cost in memory. Oracle Database 12.1 support requires that the Hadoop nodes run an older offload server (in addition to the offload server normally present). The overhead of running this additional offload server is a resource expense that you can avoid if you do not need to support Oracle Database 12.1.

If You Need to Support Oracle Database 12.1 for this Cluster:

Be sure that the DataNodes in the Hadoop cluster have enough memory. The minimum memory requirement per Hadoop node for an installation that supports full database compatibility (including 19c, 18c, 12.2, and 12.1) is 64 GB.

Also check to be sure that the memory cgroup upper limit is set to allow Oracle Big Data SQL to consume this much memory.

If You do not Need to Support Oracle Database 12.1 for this Cluster:

Be sure to choose the right setting for the database_compatibility value in the Jaguar configuration file (bds-config.json or other). The options for this parameter are: "12.1", "12.2", "18", "19", and "full". Note that both "12.1" and "full" trigger the startup of the additional offload server that supports 12.1.
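As a minimal sketch, the relevant part of the configuration file might look like the following if you only need the 12.2 offloader (which also serves 18c and 19c connections). The cluster name is a placeholder:

```json
{
  "cluster": {
    "name": "mycluster",
    "database_compatibility": [ "12.2" ]
  }
}
```

With a setting like this, the additional 12.1 offload server is not started, which conserves memory on each DataNode.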

See Also:

The Jaguar Configuration Parameter and Command Reference describes the database_compatibility parameter.

2.2.3 Plan the Configuration and Edit the Jaguar Configuration File

Before you start, consider the questions below.

Answering these questions will help clarify how you should edit the Jaguar configuration file. (See the Jaguar Configuration Parameter and Command Reference in this chapter.)

  • Do you plan to access data in object stores (S3, Azure, or Oracle Object Store)?

    If so, then in the Jaguar configuration file you need to enable this access and also define some proxy settings that are specific to object store access.
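    As a sketch, an object_store_support section enabling this access might look like the following. The proxy values are placeholders patterned on the examples in the parameter reference:

```json
{
  "object_store_support": {
    "enabled": "true",
    "cell_http_proxy": "http://myproxy.<domain>.com:80",
    "cell_no_proxy": "localhost,127.0.0.1,.<domain>.com",
    "database_http_proxy": "http://myproxy.<domain>.com:80",
    "database_no_proxy": "localhost,127.0.0.1,.<domain>.com"
  }
}
```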

  • Do you want to install the optional Query Server?

    If so, complete several tasks before you run Jaguar to install Oracle Big Data SQL:

    • Identify a cluster edge node to host the optional Query Server.

      A dedicated node is strongly recommended. Query Server is resource-intensive and cannot run on a node hosting either the DataNode or BDSSERVER role.

    • Download and unzip the Query Server bundle and then execute the run file.
      The Query Server is in a separate deployment bundle from the Jaguar installer (BDSExtra-4.1.1-QueryServer.zip). Before running Jaguar, download this bundle from https://edelivery.oracle.com/ and unzip it. Then execute the Query Server run file. Also, in the configuration file submitted to Jaguar, define the two related parameters in the edgedb section of the file -- node and enabled.
         "edgedb": {                                                                     
              "node"    : "<some server>.<some domain>.com",                                
              "enabled" : "true"                                                          
          }              
  • Does your Hadoop environment need to connect to more than one Oracle Database version?
    By default, Big Data SQL allows connections from 12.1, 12.2, 18c, and 19c databases. However, the offloaders that support these connections are resource-intensive, particularly in memory consumption on the nodes of the cluster. If you do not need to support all of these releases, you can save resources by turning off either the 12.1 or the 12.2 offloader. (Note that the 12.2 offloader actually supports 12.2, 18c, and 19c.) You set this in the configuration passed to the Jaguar installer. For example, you can enter this string to allow connections from 12.1 databases only:
    "database_compatibility" : [ "12.1" ]
    If you specify "12.2", "18", or "19", the 12.1 offloader is not enabled:
    "database_compatibility" : [ "12.2" ]
  • Do you want to enable Database Authentication in order to validate connection requests from Oracle Database to Big Data SQL server processes on the Hadoop DataNodes?

    Database Authentication in the network connection between Oracle Database and Hadoop is set to "true" in the configuration by default. You can disable it by setting database_auth_enabled to "false":

    "database_auth_enabled" : "false",
  • Do you want to use the Multi-User Authorization feature?

    Multi-User Authorization enables you to grant users other than oracle permissions to run SQL queries against the Hadoop cluster. Multi-User Authorization can be used in conjunction with Sentry's role-based access control to provide improved control over user access.

    The first step in setting up Multi-User Authorization is to set these parameters in the security section of the configuration file:
    "impersonation_enabled" : "true",
    "impersonation_usehosts" : "true",
    "impersonation_blacklist" : "hdfs,hive"
    Note that you can add any account to the blacklist.
  • Are the Hadoop cluster and the Oracle Database system going to communicate over Ethernet or over InfiniBand? Also, do the Hadoop nodes have more than one network interface?

    See the use_infiniband and selection_subnet parameters. (The selection_subnet does not apply to Oracle Big Data Appliance.)
    "use_infiniband" : "false",
    "selection_subnet" : "5.32.128.10/21"
    By default use_infiniband is set to false. Ethernet is the default protocol.
  • Are you going to pre-download the Hadoop and Hive client tarballs and set up a local repository or directory where the installer can acquire them, or, will you allow Jaguar to download them directly from the Cloudera or Hortonworks repository on the Internet (the default behavior)?

    For Cloudera releases prior to 6.0, you can use the url or dir parameters in bds-config.json (the Jaguar installer's configuration file) to specify an arbitrary download location. If Internet access is via proxies, you can also set the http_proxy and https_proxy parameters in bds-config.json.

    Note:

    On Big Data Appliance only, if you use the built-in Mammoth or bdacli utilities to install Big Data SQL, the clients are automatically installed for you. However, if you use Jaguar to install Big Data SQL (on Big Data Appliance or any other supported Hadoop platform), you must provide the path in bds-config.json if the location is not the public repository.

    For Big Data SQL on Cloudera 6.x, the default is also automatic download of the clients from the public repository. However, in these environments you cannot specify a different repository in the Jaguar configuration file. Instead, the database-side installer CLI provides the --alternate-repo parameter. Use this parameter to pass the client download location to the installer. See --alternate-repo in the Command Line Parameter Reference for bds-database-install.sh.
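    For the pre-6.0 case, a repositories section along these lines tells the installer to search a local directory first and then an HTTP repository. The path and URL are placeholders:

```json
{
  "repositories": {
    "dir": [ "/opt/bds-tarballs" ],
    "url": [ "http://repo.<domain>.com/hadoop-clients" ]
  }
}
```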

  • If the network is Kerberos-secured, do you want the installer to set up automatic Kerberos ticket renewal for the Kerberos principal on the Hadoop side and the Oracle Database side?

    See the parameters in the kerberos section:
    "principal" : "<oracle or other>/mycluster@MY.<DOMAIN>.COM",
    "keytab" : "/home/oracle/security/oracle.keytab",
    "hdfs-principal" : "hdfs/mycluster@MY.<DOMAIN>.COM",
    "hdfs-keytab" : "/home/hdfs/security/hdfs.keytab"

    The Kerberos principal and keytab identified here are used on the Hadoop side. They are also copied into the database-side installation bundle. You can use either the same principal or a different principal on the database side. See --alternate-principal in the Command Line Parameter Reference for bds-database-install.sh.

  • Do you want the Oracle Big Data SQL install process to automatically restart services that are in a stale state?

    By default, stale services are restarted automatically. If you want to suppress this, you can set the restart_stale parameter in the configuration file to “false”.

  • Is the Hadoop cluster using the default REST API port for CDH or Ambari?

    If not, set the port parameter in the api section.
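    For example, if Cloudera Manager listens on a non-default secured port, a sketch of the api section might look like this (the port number is illustrative):

```json
{
  "api": {
    "port": 7184
  }
}
```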

  • Are the HDFS or Hive daemons in the Hadoop cluster owned by non-default groups and/or users?

    By default, HDFS daemons are owned by the hdfs user in the hdfs group, and Hive daemons are owned by the hive user in the hive group. If these defaults have been changed, use the parameters in the hadoop_ids section of the configuration file to identify the current groups and users for these daemons: hdfs_user, hdfs_group, hive_user, hive_group.
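    A sketch of the hadoop_ids section for a cluster whose daemon accounts were renamed; all names shown are placeholders:

```json
{
  "hadoop_ids": {
    "hdfs_user": "hdfsadmin",
    "hdfs_group": "hdfsadmin",
    "hive_user": "hiveadmin",
    "hive_group": "hiveadmin"
  }
}
```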

Note:

Setting these parameters in the configuration file does not complete the setup for some features. For example, to enable Database Authentication, you must also pass a special --requestdb parameter to the Jaguar utility in order to identify the target database or databases. There are also steps required to generate and install the security key used by this feature. To enable Multi-User Authorization, you start by setting the Hadoop Impersonation parameters in the configuration file, but you also need to set up the authorization rules. The steps to complete these setups are provided where needed as you work through the instructions in this guide.
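Pulling the answers to these questions together, a bds-config.json skeleton might combine several of the sections discussed above. All values here are illustrative; include only the sections you need:

```json
{
  "cluster": {
    "name": "mycluster",
    "database_compatibility": [ "full" ]
  },
  "edgedb": {
    "node": "<edge node>.<domain>.com",
    "enabled": "false"
  },
  "object_store_support": {
    "enabled": "false"
  },
  "security": {
    "database_auth_enabled": "true",
    "impersonation_enabled": "true",
    "impersonation_usehosts": "true",
    "impersonation_blacklist": "hdfs,hive"
  },
  "network": {
    "use_infiniband": "false"
  },
  "api": {
    "restart_stale": "true"
  }
}
```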

2.3 About the Jaguar Utility

Jaguar is a multifunction command line utility that you use to perform all Oracle Big Data SQL operations on the Hadoop cluster.

Jaguar currently supports these operations:

  • install

    • Deploys Oracle Big Data SQL binaries to each cluster node that is provisioned with the DataNode service.

    • Configures Linux and network settings for bd_cell (the Oracle Big Data SQL service) on each of these nodes.

    • Generates the bundle that installs Oracle Big Data SQL on the Oracle Database side. It uses the parameter values that you set in the configuration file in order to configure the Oracle Database connection to the cluster.

  • reconfigure

    Modifies the current configuration of the installation (according to the settings in the configuration file provided).

  • databasereq

    Generates a request key file that contains one segment of the GUID-key pair used in Database Authentication. (The databasereq operation performs this function only. For install and reconfigure, request key generation is an option that can be included as part of the larger operation.)

  • databaseack

    Performs the last step in Database Authentication setup -- installs the GUID-key pair on all Hadoop DataNodes in the cluster in order to allow queries from the Oracle Database that provided it.

  • sync_principals

    Gets a list of principals from a KDC running on a cluster node and uses it to create externally-identified database users for Query Server.

  • uninstall

    Uninstalls Oracle Big Data SQL from all DataNodes of the Hadoop cluster.

See Also:

Jaguar Operations in the next section provides details and examples.

2.3.1 Jaguar Configuration Parameter and Command Reference

This section describes the parameters within the Jaguar configuration file as well as Jaguar command line parameters.

Configuration Parameters

The table below describes all parameters available for use in bds-config.json or your own configuration file. Only the cluster name parameter is always required. Others are required under certain conditions stated in the description.

Note:

When editing the configuration file, be sure to maintain the JSON format. Square brackets are required around lists, even in the case of a list with a single item.
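Because a malformed file causes the installer to fail at startup, it can save time to validate the JSON first. The minimal Python check below is a sketch: the sample string stands in for the contents of your own bds-config.json, and it confirms both that the JSON parses and that a single-item database_compatibility value is still a bracketed list:

```python
import json

# Sample configuration text; in practice, read it from your bds-config.json.
sample = '''
{
  "cluster": {
    "name": "mycluster",
    "database_compatibility": [ "18" ]
  }
}
'''

cfg = json.loads(sample)   # raises ValueError if the JSON is malformed
compat = cfg["cluster"]["database_compatibility"]

# A single-item value must still be a bracketed list, not a bare string.
assert isinstance(compat, list)
print("valid:", compat)
```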

Table 2-1 Configuration Parameters in bds-config.json (or in Customer-Created Configuration Files)

Section Parameter Type Description
cluster name String

The name of the cluster.

For CDH clusters (Oracle Big Data Appliance or other), this name can be either the physical cluster name or the display name. The installer searches first by physical name and then by display name.

The name parameter is required only if Cloudera Manager will manage two or more clusters.

cluster database_compatibility String Specifies which Oracle Database versions to support.

Possible values: "12.1"| "12.2"| "18"|"19"| "full"

For example, either of these settings enables support for Oracle Database 12.1, 12.2, 18c, and 19c.

"database_compatibility" : [ "12.1" ]
"database_compatibility" : [ "full" ]

Either of the following settings enables support for Oracle Database 12.2, 18c, and 19c, but disables support for Oracle Database 12.1. By disabling support for Oracle Database 12.1 if it is not needed, you conserve some system resources, particularly memory.

"database_compatibility" : [ "12.2" ]
"database_compatibility" : [ "18" ]

The default is "full".

api hostname String Visible hostname for the cluster management server. In some scenarios, such as High Availability environments, the visible hostname for Cloudera Manager or Ambari is not the same as the local hostname.

Default: the local hostname.

api skip_health_check Boolean If "true", the cluster health check is skipped.

The cluster health check verifies that HDFS, Hive and Yarn services are running with good health and are not stale. Additionally, for CDH clusters, management services should be running with good health and not stale.

Default: "false".

api port Integer

Cloudera Manager or Ambari REST API port.

By default, on CDH clusters this port is 7183 for secured and 7180 for unsecured access. For Ambari, the port is 8443 for secured and 8080 for unsecured access.

Optional.

api restart_stale Boolean

If “true”, then services with stale configurations are restarted at the end of the install process. These services are HDFS NameNode, YARN NodeManager, and/or Hive (depending upon the settings selected).

If “false”, the installation finishes but those services remain in a stale state. This is useful for avoiding unwanted service interruptions; you can restart the services later when it is more convenient.

Optional. The default is “true”.

edgedb enabled Boolean Determines whether the Query Server functionality is enabled.

Default: "false".

edgedb node String

Hostname of the node where the Query Server database will be running (if enabled).

Note:

Because Query Server is resource-intensive, it is highly recommended that you install the database on a dedicated node. Query Server cannot run on a node that hosts either the DataNode or the BDSSERVER role.
object_store_support enabled boolean If "true", Oracle Wallet is set up both in the cluster and on the database system in order to allow access to Object Store.

Default: "false".

object_store_support cell_http_proxy string If object store access support is enabled, this parameter is required for access to an object store from the Hadoop cluster side, even for empty values. Follows same rules as the Linux http_proxy variable. For example: http://myproxy.<domain>.com:80. No default value.
object_store_support cell_no_proxy string Like cell_http_proxy, supports access to object stores and is also required if this access is enabled, even for empty values. Follows same syntax rules as the Linux no_proxy environment variable. For example: localhost,127.0.0.1,.<domain>.com. No default value.
object_store_support database_http_proxy string Same description as cell_http_proxy, except that this parameter supports object store access from the database side, not the Hadoop side.
object_store_support database_no_proxy string Same description as cell_no_proxy, except that this parameter supports object store access from the database side, not the Hadoop side.
network http_proxy

https_proxy

String

Specify the proxy settings to enable download of the Hadoop client tarballs and cluster settings files.

If both of these strings are empty, the OS environment proxy settings are used.

By default, both strings are empty.

Using these two parameters in the configuration file is optional. If they are needed, you could instead set them externally as in export http_proxy=<proxy value>

Not applicable to Oracle Big Data Appliance

network extra_nodes List

List of additional nodes where the BDSAgent should be installed.

The BDSAgent and BDSServer roles are installed on all DataNode instances. In addition, BDSAgent is installed on cluster nodes running HiveServer2 and HiveMetaStore instances. All remaining nodes are automatically excluded unless you add them here.

Default: empty

network excluded_nodes List Nodes that are not hosting the DataNode role can be excluded by listing them within this parameter.
security impersonation_enabled Boolean

If "true", Hadoop impersonation for Multi-user Authorization support is enabled. This sets up the oracle OS user as the Hadoop proxy user and propagates the proxy user’s black list to the database nodes. If "false", this feature is not enabled.

Default value: "true" for Oracle Big Data SQL 4.x and higher.

Note:

For CDH clusters, if the Sentry service is running, this setting is overridden and impersonation is enabled regardless of the value of this parameter.
kerberos principal String

The fully-qualified Kerberos principal name for a user. Before release 4.1, the principal had to be the oracle user. Starting with 4.1, the principal can be the oracle user or any other user. The Jaguar installer does not create the principal or the keytab. These must already exist.

The principal has three parts:

  • The User Name:

    The name of the Linux account associated with the principal. Note that Kerberos principal names are case-sensitive.

  • Qualifier: "/<qualifier>." This is optional information to help you organize and identify principals.

  • Domain: "@MY.<DOMAIN>.COM." This is required information managed by the KDC.

The Oracle Big Data SQL installation uses the Kerberos principal field (and keytab field below) to set up Kerberos authentication and automated Kerberos ticket renewal for the user represented by the principal. It does this on the Hadoop cluster. It also does it on the Oracle Database system automatically if two conditions are true:

  • The principal to be used on the Hadoop cluster and the Oracle Database system are the same.
  • The principal for the database owner does not exist. If the database owner principal does exist, it takes precedence and is used for authentication.

Required for secured clusters.

Note: Later, when you perform the database-side installation of Big Data SQL, review the description of the --alternate-principal install parameter in the Command Line Parameter Reference for bds-database-install.sh. This reference provides steps to take prior to the database-side installation if the principals used on the Hadoop and database sides are not the same, or if the required principal is not that of the database owner.

kerberos db-service-principal String

Specifies a principal on the KDC server for use by Query Server (and only Query Server). It is not used for authentication against an external Oracle Database.

Both db-service-principal and db-service-keytab are used to validate the Kerberos ticket presented by a client. Note that the parameters SQLNET.AUTHENTICATION_KERBEROS5_SERVICE and SQLNET.KERBEROS5_KEYTAB in sqlnet.ora are set accordingly.

The qualifier for the principal name must match the fully qualified domain name of the node where the Query Server will be running.

Required for secured clusters.

kerberos db-service-keytab String

Fully-qualified location of the keytab file for the principal specified with db-service-principal.

Be sure to store the keytab in a location that is accessible to the Jaguar installer.

kerberos sync_principals Boolean

The sync_principals parameter specifies whether or not Jaguar automatically gets a list of principals from a KDC running on a cluster node and then uses the list to create externally-identified database users for Query Server.

If set to "true", an automatic synchronization with Kerberos principals occurs during Jaguar install and reconfigure operations. You can also run this synchronization at any time by invoking the sync_principals operation of Jaguar on the command line.

Default: "true".

kerberos hdfs-keytab String

Fully-qualified path to the principal keytab file. A keytab file is created for each principal on the KDC server. It must exist in a location accessible to the Jaguar installer.

Required for secured clusters.

kerberos keytab String

Fully-qualified path to the principal’s keytab file.

Copy the keytab file to a location accessible to the Jaguar installer and set the path as the value of this parameter.

kerberos hdfs-principal String

Fully-qualified Kerberos principal name for the "hdfs" user. It has three parts: User name, Qualifier, and Domain.

The User name is the fully-qualified principal name for the hdfs user. Qualifier is the cluster name prefixed by a forward slash, as in /mycluster. Domain is specified in the form @MY.<DOMAIN>.COM. All three are required. The principal name is defined on the KDC.

Required for secured clusters.

repositories dir List

List of directories where the Hadoop clients for deployment on the database side are located. These directories can be on the local file system or on NFS. Directories are searched in the order listed. By default, the list is empty. If the dir list has any entries, these are searched before the URL list is searched, since this option should provide the fastest access to the clients. To give the installer the quickest access to the tarballs, you could set up a local repository, download the tarballs separately though a direct Internet connection, copy them into a directory on the same node where the Oracle Big Data SQL installer will run, and list that directory in the dir parameter.

Optional.

Not applicable to Oracle Big Data Appliance, which already includes the required clients.

Important: The dir and url parameters are not supported on Cloudera 6.x systems. On these systems, you can specify a repository when you run the installer on the database side.

repositories url List

This is the list of URLs where the Hadoop client tarballs for deployment on the database side are located. If your data center already has repositories set up for access via HTTP, then you may prefer to maintain the Hadoop tarballs in that repository and use the url parameter for Oracle Big Data SQL installations. The URLs can point to the localhost, an internal network, or a site on the Internet (if the node has Internet access). The URLs are tried in the order listed. Note that internal proxy values and/or OS environment proxy settings must be set to allow this access if needed.

If access to all listed repositories fails and/or Internet access is blocked, the database installation bundle is not created and a warning message is displayed. After correcting any problems and providing access to a repository, you can re-run the installer using the reconfigure operation, and the installer should successfully generate the database-side installation bundle. Note that reconfigure detects and implements changes according to the current directives in the configuration file. It does not uninstall and reinstall Oracle Big Data SQL on the cluster.

Not applicable to Big Data Appliance, where the tarballs are stored in a local repository in the cluster and the location is automatically added to the configuration file.

network use_infiniband Boolean

If “true”, communication is set up through the private network interface; if “false”, through the client network interface.

Used for Oracle Big Data Appliance clusters only.

Default value: “false”.

network selection_subnet String

If the Hadoop cluster nodes have several network interfaces, you can use selection_subnet to select one. The IP address nearest to the specified subnet is selected.

If the Hadoop cluster nodes have only one network interface, this parameter is ignored.

The default value depends upon these conditions:

  • On non-Oracle commodity Hadoop clusters (CDH or HDP) the default selection is 0.0.0.0/0. (If a cluster node has several IP addresses, the lowest address is selected.)

  • On Oracle Big Data Appliance, the default is either the private or client IP address, depending upon the setting of the use_infiniband parameter.

Note for Oracle Big Data Appliance Users:

It's possible to configure several networks on an Oracle Big Data Appliance. If multiple networks exist, then this parameter must be set in order to select a specific network.
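
The "nearest to the selection subnetwork" rule can be pictured with a short sketch. This is an illustrative model of the documented behavior only, not Jaguar's actual implementation: among the node's addresses that fall inside the subnet, the numerically lowest wins, which also reproduces the 0.0.0.0/0 default described above (every address matches, so the lowest is selected).

```python
import ipaddress

def select_address(candidates, selection_subnet="0.0.0.0/0"):
    """Pick the node IP to use, modeling the selection_subnet rule:
    among addresses inside the subnet, choose the numerically lowest.
    (Illustrative model only -- not Jaguar's actual implementation.)"""
    net = ipaddress.ip_network(selection_subnet)
    inside = [ipaddress.ip_address(a) for a in candidates
              if ipaddress.ip_address(a) in net]
    return str(min(inside)) if inside else None

# With the default 0.0.0.0/0, every address matches, so the lowest wins:
select_address(["192.168.8.5", "10.0.0.7"])                    # -> "10.0.0.7"
# A narrower subnet restricts the choice to the matching interface:
select_address(["192.168.8.5", "10.0.0.7"], "192.168.8.0/24")  # -> "192.168.8.5"
```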

security database_auth_enabled Boolean

If "true", the database authentication through the GUID-key mechanism is enabled. This requires an extra step in the installation process in order to set up the database GUID-key pair on the cluster side.

If "false", the feature will not be enabled.

Default value: "true".

security impersonation_blacklist String

The Hadoop proxy users blacklisted for impersonation. This parameter is used only if Hadoop impersonation is enabled.

Because this is a required setting on the Oracle Database side, it is given a default value of "dummy" in order to avoid extproc errors that can occur if Hadoop impersonation is not enabled.

security impersonation_usehosts Boolean

If "true", the proxy hosts variable is set to the IP address of the database node.

If "false", the proxy hosts variable is set to the wildcard: "*".

Default value: "true".
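
Taken together, the security parameters nest under a single section. The fragment below simply restates the documented defaults to illustrate the layout (omitting the section entirely yields the same behavior); the cluster name is a placeholder.

```json
{
  "cluster": {
    "name": "mycluster"
  },
  "security": {
    "database_auth_enabled": "true",
    "impersonation_blacklist": "dummy",
    "impersonation_usehosts": "true"
  }
}
```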

memory min_hard_limit Integer

The minimum amount of memory reserved for Big Data SQL, in megabytes. This parameter is used on CDH clusters (Oracle Big Data Appliance and others). It is not used on HDP clusters. By default, the value is 32768 MB (32 GB).

If you set the database_compatibility parameter to "full", then the value of min_hard_limit must be 64 MB.

memory max_percentage Integer

On CDH clusters (Oracle Big Data Appliance and others), this parameter specifies the percentage of memory on each node to reserve for Big Data SQL.

If the YARN ResourceManager is enabled for the node, the percentage is based on the total amount of memory used by the NodeManager. Otherwise, it is a percentage of physical memory.

This parameter is ignored on HDP clusters.
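
For CDH clusters, a hypothetical memory section might look like the following. The min_hard_limit value restates the documented default; the max_percentage value is an arbitrary illustration, not a documented default.

```json
{
  "cluster": {
    "name": "mycluster"
  },
  "memory": {
    "min_hard_limit": "32768",
    "max_percentage": "25"
  }
}
```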

hadoop_ids hdfs_user String

The operating system user that runs the Hadoop HDFS daemons.

Default value: "hdfs"

Note: By default, Jaguar assumes Hive and HDFS usernames and groups. But if you used different names in your Hadoop installation, then use the hadoop_ids parameters to identify them.

hadoop_ids hdfs_group String

The operating system group that runs the Hadoop HDFS daemons.

Default value: "hdfs"

hadoop_ids hive_user String

The operating system user that runs the Hadoop Hive daemons.

Default value: "hive"

hadoop_ids hive_group String

The operating system group that runs the Hadoop Hive daemons.

Default value: "hive"
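
If your Hadoop installation uses non-default account names, declare them in the hadoop_ids section. This fragment restates the defaults; substitute your own names as needed (the nesting is inferred from the section/parameter layout above, and the cluster name is a placeholder).

```json
{
  "cluster": {
    "name": "mycluster"
  },
  "hadoop_ids": {
    "hdfs_user": "hdfs",
    "hdfs_group": "hdfs",
    "hive_user": "hive",
    "hive_group": "hive"
  }
}
```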

Note:

After Oracle Big Data SQL is installed on the Hadoop cluster management server, you can find configuration file examples that demonstrate various parameter combinations in the <Big Data SQL Install directory>/BDSjaguar directory:
example-bda-config.json
example-cdh-config.json 
example-kerberos-config.json
example-localrepos-config.json
example-subnetwork-config.json
example-unsecure-config.json
You can see all possible parameter options in use in example-cdh-config.json.

See Also:

See the Appendix Downloading the Correct Versions of the Hadoop, Hive, and HBase Clients for a Local Repository for suggestions that can help with the setup of client tarball downloads.

Jaguar Operations

The table below lists the full set of operations performed by the Jaguar utility on the Hadoop side of the Oracle Big Data SQL installation.

The general syntax for Jaguar commands is as follows. The --requestdb parameter does not apply to all Jaguar commands.

# ./jaguar {--requestdb <comma-separated database names> | NULL } <action> { bds-config.json | <myfilename>.json | NULL } 

Examples:

# ./jaguar install
# ./jaguar install bds-config.json
# ./jaguar install mycustomconfig.json 
# ./jaguar --requestdb orcl,testdb,proddb install
# ./jaguar --requestdb orcl install
# ./jaguar sync_principals

You can use the default bds-config.json or your own configuration file, or omit the configuration file argument (which defaults to bds-config.json).

About --requestdb:

The --requestdb parameter is required for the databasereq command, optional for install and reconfigure, and not applicable to other Jaguar commands. The parameter must be passed to one of these operations in order to enable Database Authentication in the connection between a Hadoop cluster and a database. Unless you prefer to disable Database Authentication, it is recommended that you include --requestdb with the initial install operation. Otherwise, you will need to perform an additional step later in order to generate the request key.

This parameter is functional only when Database Authentication (database_auth_enabled) is set to “true” in the configuration. (This setting is a configuration default and does not need to be explicitly set in the configuration file.)

Jaguar needs the database names in order to generate a unique .reqkey (request key) file for each database. When database_auth_enabled is set to "true" at installation time, the --requestdb parameter is still optional. Post-installation, you have the same option to send the request key using the reconfigure or databasereq operations. Database Authentication is not implemented until you do all of the following:

  1. Ensure that database_auth_enabled is either absent from the configuration file or is set to "true". (It is "true" by default.)

  2. Include --requestdb in a Jaguar command:

    1. Run the Jaguar install or reconfigure and install the updated database-side installation bundle.

    2. Run Jaguar databasereq to generate an acknowledge key from the existing database side installation.

  3. Copy the generated ZIP file that contains the .ackkey file from the database-side installation directory to /opt/oracle/DM/databases/conf on the Hadoop cluster management server.

  4. Run the Jaguar databaseack command as described in the table below.

The table below shows the available Jaguar commands.

Table 2-2 Jaguar Operations

Jaguar Operation Supports --requestdb? Usage and Examples
install Y

The --requestdb parameter is not strictly required by the install operation, but you cannot enable Database Authentication if you do not generate a request key for each database.

--requestdb <comma-separated database list>

Installs Oracle Big Data SQL on the Hadoop cluster identified in the configuration file and creates an installation bundle for the database side based on the parameters included in the configuration file (or default values for parameters not explicitly assigned value in the configuration file). Examples:

# ./jaguar --requestdb orcl,testdb,proddb install

No configuration file parameter is included in the above example; bds-config.json is the implicit default. You can specify a different configuration file, as in: ./jaguar --requestdb mydb install myconfig.json

Note:

You may need to use the scl utility to ensure that the correct Python version is invoked:

scl enable python27 "./jaguar install"

On Big Data Appliance clusters running Oracle Linux 6 and Oracle Linux 7, scl is not needed in order to call the correct Python version for Jaguar.

reconfigure Y

Modify the current installation by applying changes you have made to the configuration file (bds-config.json or other).

# ./jaguar reconfigure myconfigfile.json

Note that if you run ./jaguar reconfigure <config file> to reconfigure Oracle Big Data SQL on the Hadoop cluster, a corresponding reconfiguration is required on the Oracle Database side. The two sides cannot communicate if the configurations do not match. The Jaguar utility regenerates the database-side bundle files to incorporate the changes. You must redeploy the bundle on all database servers where it was previously installed.

The --requestdb argument is required if database_auth_enabled is set to "true" in the updated configuration file, so that Jaguar generates the .reqkey files that are included in the database-side installation bundle. In the following example, the configuration file parameter is omitted and defaults to bds-config.json.

# ./jaguar --requestdb demodb,testdb,proddb1 reconfigure 
databasereq Y

Use this command to create the .reqkey file without repeating the Hadoop-side installation or performing a reconfigure operation. For example, if you forgot to include the --requestdb argument with the Jaguar install command, you can create a request key later with databasereq. This operation requires that database_auth_enabled is set to "true" (the default value) in the configuration.

# ./jaguar --requestdb demodb,testdb,proddb1 databasereq  
databaseack N

The “Database Acknowledge” process provides confirmation to the Oracle Big Data SQL installation on the Hadoop cluster that security features you enabled in the configuration file have been successfully implemented in the database-side installation. It then completes implementation of the selected security features on the Hadoop cluster side.

# ./jaguar databaseack bds-config.json

Only run databaseack if you chose to enable security features by setting either of these parameters in the configuration file to “true”:

  • "impersonation_enabled" : "true"

  • "database_auth_enabled" : "true"

If a database-side installation bundle is built with either of these features set to "true", then the database-side installation from that bundle generates a ZIP file in the installation directory under $ORACLE_HOME on the database server. The format of the ZIP file name is <Hadoop cluster name>-<Number nodes in the cluster>-<FQDN of the cluster management server node>-<FQDN of this database node>.zip. For example:

$ ls $ORACLE_HOME/BDSJaguar-4.1.1/cdh510-6-node1.my.<domain>.com/*.zip
cdh510-6-node1.my.<domain>.com-myoradb1.my.<domain>.com.zip

Copy this zip archive back to /opt/oracle/DM/databases/conf on the Hadoop cluster management server after the database-side installation is complete. Then, to fully enable the security features, run databaseack.

sync_principals N/A

Gets a list of principals from a KDC running on a cluster node and uses it to create externally-identified database users in Query Server. You can accomplish the same by including the similarly-named sync_principals parameter in a Jaguar configuration file during Jaguar install and reconfigure operations.

--object-store-http-proxy N/A Specify a different proxy for Object Store access than the one set in the configuration file.
--object-store-no-proxy N/A Sets a no-proxy value and overrides the no_proxy value that may be set in the configuration file.
uninstall N/A

Uninstall Oracle Big Data SQL from the Hadoop cluster.

The uninstall process stops the bd_cell process (the Oracle Big Data SQL process) on all Hadoop cluster nodes, removes all instances from the Hadoop cluster, and releases all related resources.

Note:

When Oracle Big Data SQL is uninstalled on the Hadoop side, any queries against Hadoop data that are in process on the database side will fail. It is strongly recommended that you uninstall Oracle Big Data SQL from all database systems shortly after uninstalling the Hadoop component of the software.

2.4 Steps for Installing on the Hadoop Cluster

After you have set up the Jaguar configuration file according to your requirements, follow these steps to run the Jaguar installer, which installs Oracle Big Data SQL on the Hadoop cluster and also generates a database-side installation bundle that you deploy to the Oracle Database system. In these steps, bds-config.json is the configuration filename passed to Jaguar. This is the default. Any file name is accepted, so you can create separate configuration files for installation on different clusters.

Note:

Jaguar requires Python 2.7 to 3.0. Versions greater than 3.0 are not supported by Oracle Big Data SQL at this time. If necessary, you can add a Jaguar-compatible version of Python as a secondary installation. Revisit the prerequisites section in the Introduction for details. If you are using Oracle Big Data Appliance, do not overwrite the Mammoth-installed Python release.

  1. Log on to the cluster management server node as root and cd to the directory where you extracted the downloaded Oracle Big Data SQL installation bundle.

  2. Change to the BDSJaguar subdirectory under the path where you unzipped the bundle.

    # cd <Big Data SQL Install Directory>/BDSJaguar
  3. Edit the file bds-config.json.

    {
    "cluster": {
               "name": "<Your cluster name>"
               }
    }

    Add the parameters that you want to use in this installation.

    See Also:

    The cluster name is the only required parameter, but it is required only in environments where the configuration management service must manage more than one cluster. See the Jaguar Configuration Parameter and Command Reference for a description of all available parameters. You can see an example of a bds-config.json file populated with all available parameters in bds-config.json Configuration Examples.

    In the BDSJaguar directory, run the Jaguar install operation. Pass the install parameter and the configuration file name (bds-config.json is the implicit default) as arguments to the Jaguar command. You may or may not need to include the --requestdb option.

    [root@myclusteradminserver:BDSjaguar] #  ./jaguar install <config file name>

    Note:

    By default, Database Authentication is enabled unless you set database_auth_enabled to "false" in the configuration file. If you enable Database Authentication, then generate a "request key," either as part of the install operation or later. This is half of a GUID/key pair used in the authentication process. To generate this key, include the --requestdb parameter in the Jaguar install command line:
    [root@myclusteradminserver:BDSjaguar] # ./jaguar --requestdb mydb install
    
    If the install was run with database_auth_enabled set to "true", you can use the Jaguar databasereq command to generate the key after the database-side installation. Several other Jaguar commands can also generate the request key if you pass them the --requestdb parameter.

    Jaguar prompts for the cluster management service administrator credentials and then installs Oracle Big Data SQL throughout the Hadoop cluster. It also generates the database-side installation bundle in the db-bundles subdirectory. The following message is returned if the installation completed without error.

    BigDataSQL: INSTALL workflow completed.
  4. Check for the existence of the database side installation bundle:

    # ls <Big Data SQL Install Directory>/BDSJaguar/db-bundles
     bds-4.1.1-db-<cluster>-<yymmdd.hhmi>.zip

    This bundle sets up Oracle Big Data SQL connectivity between an Oracle database and the specific cluster defined in the bds-config.json (or other) configuration file. It contains all required packages and settings files except for an optional database request key file.

    If you included --requestdb in the install command, then the installation also generates one or more database request key files under the dbkeys subdirectory. You should check to see that this key exists.
    # ls <Big Data SQL Install Directory>/BDSJaguar/dbkeys
     cluster1db.reqkey

This completes the Oracle Big Data SQL installation on the Hadoop cluster.

See Also:

  • Working With Query Server in the Oracle Big Data SQL User's Guide. If you chose to install Query Server, you can connect and start working with it now. It is not dependent on completion of the Oracle Database side of the installation.
  • Post-Installation Tasks in this guide. Most of the tasks described are performed on the Hadoop system. You may want to complete those tasks before proceeding to the second half of the installation on the Oracle Database system. All of them are optional.

What Next?

After Jaguar has successfully installed Oracle Big Data SQL on the Hadoop cluster, you are done with the first half of the installation. The next step is to install Oracle Big Data SQL on the Oracle Database system that will run queries against the data on the Hadoop cluster.

To do this, copy the database-side installation bundle to any location on the Oracle Database system. Unless you set database_auth_enabled to "false" in the configuration file, also copy over the .reqkey file generated by Jaguar.

Tip:

You only need to send a request key to a database once. A single request key is valid for all Hadoop cluster connections to the same database. If you have already completed the installation to connect one Hadoop cluster to a specific database, then the database has the key permanently and you do not need to generate it again or copy it over to the database again in subsequent cluster installations.

Go to Installing or Upgrading the Oracle Database Side of Oracle Big Data SQL for instructions on unpacking the bundle and installing the database-side components of the software.

See Also:

An example of the complete standard output from a successful installation is provided in Oracle Big Data SQL Installation Example.