Oracle Big Data SQL is deployed using the services provided by the cluster management server. The installer program uses the management server API to register the service and start the deployment task. From there, the management server controls the process, deploying and installing the software on the nodes of the cluster.
After installing Big Data SQL on the cluster management server, use the tools provided in the bundle to generate an installation package for the database server side.
Important Notes for Oracle Big Data Appliance Installations Prior to Release 4.8:
Oracle Big Data SQL 3.1 is decoupled from the bdacli utility on Oracle Big Data Appliance systems prior to Release 4.8. On these systems (Oracle Big Data Appliance 4.7 and earlier), you cannot use the following bdacli commands with Release 3.1:
bdacli {enable|disable} big_data_sql
bdacli getinfo cluster_big_data_sql_enabled
bdacli {start | stop | restart | status} {big_data_sql_cluster | big_data_sql_server node_name}
On all supported Hadoop systems (listed in the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1) in My Oracle Support), you can use the setup-bds script as described in this guide to install, extend, reconfigure, and uninstall Oracle Big Data SQL 3.1 on a Hadoop cluster. You can also use the cluster management server interface to start or stop Oracle Big Data SQL processes.
On Oracle Big Data Appliance 4.8 and later, because Oracle Big Data SQL integration with bdacli
has been restored, you have the additional option of using bdacli
to install and uninstall (enable and disable) Oracle Big Data SQL. You also have access to all other bdacli
commands for administering Oracle Big Data SQL.
See Also:
olink:BIGOG-GUID-685D1923-EC2A-42B6-8D97-1DFB8239D57C in the Oracle Big Data Appliance Owner’s Guide.

The following active services, installed packages, and available system tools are prerequisites to the Oracle Big Data SQL installation.
Platform requirements, such as supported Linux distributions and versions, as well as supported Oracle Database releases and required patches, are not listed here. See the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1) in My Oracle Support for this information.
Services Running
These Apache Hadoop services must be running on the cluster.
HDFS
YARN
Hive
You do not need to take any extra steps to ensure that the correct HDFS and Hive client URLs are specified in the database-side installation bundle.
Important:
The Apache Hadoop services listed above must be installed as parcels on Cloudera CDH and as stacks on Hortonworks HDP. Installation of these services via RPM is not supported in either case.

About HBase:
HBase is not a prerequisite for the installation of Oracle Big Data SQL on the Hadoop cluster. However, the installation on the Oracle Database nodes does require an HBase client. If HBase is present on the cluster during the Hadoop-side installation, then no action on your part is required. In this case, when you create the package that is installed on the Oracle Database nodes, the installation package builder automatically includes the URL to the download site of the compatible HBase client JAR.
If HBase is not installed on the Hadoop cluster, then when you create the database-side installation package, use the --hbase-client-ws parameter to add the URL to the installation package. The URL should point to the download site for a client compatible with your Hadoop distribution. When you install the database side of Oracle Big Data SQL, the installer will download the HBase client from the URL provided.
Packages
The following packages must be pre-installed on all Hadoop cluster nodes before installing Oracle Big Data SQL.
Oracle JDK version 1.7 or later
dmidecode
net-snmp, net-snmp-utils
perl
Perl LibXML – 1.70 or higher, e.g. perl-XML-LibXML-1.70-5.el6.x86_64.rpm
perl-libwww-perl, perl-libxml-perl, perl-Time-HiRes, perl-libs, perl-XML-SAX, perl-Env
The Java JDK is available for download on the Oracle Technology Network.
The yum
utility is the recommended method for installing these packages:
yum -y install dmidecode
yum -y install net-snmp net-snmp-utils
yum -y install perl perl-libs
yum -y install perl-Time-HiRes perl-libwww-perl
yum -y install perl-libxml-perl perl-XML-LibXML perl-XML-SAX
yum -y install perl-Env
yum -y localinstall <JDK RPM that you downloaded>
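After running the commands above, a quick sanity check can confirm that the packages registered with rpm. This is a sketch, not part of the official installer; the package list mirrors a subset of the prerequisites above:

```shell
# Hedged sanity check (a sketch, not part of the official installer):
# query rpm for each prerequisite package listed above.
missing=0
for pkg in dmidecode net-snmp net-snmp-utils perl perl-libs perl-Time-HiRes; do
  if rpm -q "$pkg" >/dev/null 2>&1; then
    echo "$pkg: installed"
  else
    echo "$pkg: MISSING"
    missing=$((missing + 1))
  fi
done
echo "missing packages: $missing"
```

Any package reported as MISSING can then be installed with the yum commands above.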
System Tools
curl
gcc
libaio
rpm
scp
tar
unzip
wget
yum
zip
yum install -y libaio gcc
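A quick way to see which of the tools above still need to be installed is to probe the PATH for each one. This is a hedged sketch rather than an official check; libaio is a shared library rather than a command, so it is not probed here:

```shell
# Hedged check that the required command line tools are on PATH.
# libaio is a shared library rather than a command, so it is skipped.
found=0; absent=0
for tool in curl gcc rpm scp tar unzip wget yum zip; do
  if command -v "$tool" >/dev/null 2>&1; then
    found=$((found + 1))
  else
    echo "$tool: not on PATH"
    absent=$((absent + 1))
  fi
done
echo "found $found of 9 required tools"
```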
Environment Settings
The following environment settings are required prior to the installation.
NTP enabled
Ensure that /usr/java/default
exists and is linked to the appropriate Java version (if $JAVA_HOME does not exist).
The path to the Java binaries must exist in /usr/java/latest.
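The Java link requirement above can be verified with a short hedged check (a sketch; the example link target in the message is a placeholder for your actual JDK install directory):

```shell
# Hedged check (a sketch) that /usr/java/default points at a usable JDK.
# The example link target in the message is a placeholder.
JAVA_DIR=/usr/java/default
if [ -x "$JAVA_DIR/bin/java" ]; then
  java_status="$JAVA_DIR present -> $(readlink -f "$JAVA_DIR")"
else
  java_status="$JAVA_DIR missing: link it, e.g. ln -s /usr/java/<your JDK dir> $JAVA_DIR"
fi
echo "$java_status"
```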
The installation process requires Internet access in order to download some packages from Cloudera or Hortonworks sites. If a proxy is needed for this access, ensure that the following Linux environment variables are properly set:
http_proxy
https_proxy
no_proxy
Set no_proxy to include the following: "localhost,127.0.0.1,<comma-separated list of the hostnames in the cluster (in FQDN format)>".
On Cloudera CDH, clear any proxy settings in Cloudera Manager administration before running the installation. You can restore them after running the script that creates the database-side installation bundle (bds-database-create-bundle.sh
).
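The proxy variables can be set as in this sketch; the proxy host and node FQDNs below are placeholders (hypothetical values), to be replaced with your own:

```shell
# Sketch of the proxy environment settings; every value below is a
# placeholder (hypothetical proxy host and node FQDNs).
cluster_nodes="node01.example.com,node02.example.com,node03.example.com"
export http_proxy="http://proxy.example.com:80"
export https_proxy="http://proxy.example.com:80"
export no_proxy="localhost,127.0.0.1,${cluster_nodes}"
echo "$no_proxy"
```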
Python 2.7 (for the Oracle Big Data SQL Installer)
The Oracle Big Data SQL installer requires Python 2.7 locally on the node where you run the installer. This should be the same node where the cluster management service (CM or Ambari) is running.
If any version of Python 2.7.x is already installed, you can use it to run the Oracle Big Data SQL installer.
If an earlier version than Python 2.7.0 is already installed on the cluster management server and you need to avoid overwriting this existing installation, you can add Python 2.7.x as a secondary installation.

Restriction:
On Oracle Big Data Appliance, do not overwrite or update the pre-installed Python release. This restriction may also apply to other supported Hadoop platforms. Consult the documentation for the CDH or HDP platform you are using.

On Oracle Linux 5, add Python 2.7 as a secondary installation. On Oracle Linux 6, both Python 2.6 and 2.7 are pre-installed, and you should use the provided version of Python 2.7 for the installer. Check whether the default interpreter is Python 2.6 or 2.7. To run the Oracle Big Data SQL installer, you may need to invoke Python 2.7 explicitly. On Oracle Big Data Appliance, SCL is installed, so you can use it to enable version 2.7 for the shell, as in this example:
# scl enable python27 "./setup-bds install bds-config.json"
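Before running the installer, you can check which interpreter name resolves to a 2.7.x release. This hedged helper is a sketch; the interpreter names tried are conventional assumptions, so adjust them for your platform:

```shell
# Hedged helper (a sketch): find an interpreter whose version is 2.7.x,
# trying the conventional names python, python2.7, and python2 in turn.
PYBIN=""
for candidate in python python2.7 python2; do
  ver=$("$candidate" -c 'import sys; print(".".join(map(str, sys.version_info[:2])))' 2>/dev/null || true)
  if [ "$ver" = "2.7" ]; then
    PYBIN="$candidate"
    break
  fi
done
if [ -n "$PYBIN" ]; then
  echo "run the installer with: $PYBIN ./setup-bds install bds-config.json"
else
  echo "no Python 2.7 found; add a secondary installation"
fi
```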
Below is a procedure for adding Python 2.7.5 as a secondary installation.
Tip:
If you manually install Python, first ensure that the openssl-devel
package is installed:
# yum install -y openssl-devel
If you create a secondary installation of Python, it is strongly recommended that you apply Python updates regularly to include new security fixes. Do not update the Mammoth-installed Python unless directed to do so by Oracle.
# pyversion=2.7.5
# cd /tmp/
# mkdir py_install
# cd py_install
# wget https://www.python.org/static/files/pubkeys.txt
# gpg --import pubkeys.txt
# wget https://www.python.org/ftp/python/$pyversion/Python-$pyversion.tgz.asc
# wget https://www.python.org/ftp/python/$pyversion/Python-$pyversion.tgz
# gpg --verify Python-$pyversion.tgz.asc Python-$pyversion.tgz
# tar xfzv Python-$pyversion.tgz
# cd Python-$pyversion
# ./configure --prefix=/usr/local/python/2.7
# make
# mkdir -p /usr/local/python/2.7
# make install
# export PATH=/usr/local/python/2.7/bin:$PATH
If Oracle Big Data SQL is Already Installed
If Oracle Big Data SQL is already installed, please read Upgrading From a Prior Release of Oracle Big Data SQL before proceeding.
Important:
If Oracle Big Data SQL 3.0.1 or earlier is already enabled on the Hadoop cluster, it must be disabled or removed before installing Release 3.1.
For Release 3.0.1 or earlier on Oracle Big Data Appliance, run bdacli disable big_data_sql
. This will disable Oracle Big Data SQL on all nodes of the cluster.
The uninstall on other Hadoop platforms (HDP or non-Oracle CDH-based systems) differs for Oracle Big Data SQL 3.0.1 and 3.0:
For Release 3.0.1, run setup-bds
from the original installation with the --uninstall
parameter, as in ./setup-bds --uninstall bds-config.json
.
For Release 3.0 there is no programmatic uninstall. You can find the manual procedure for uninstalling Release 3.0 in the Oracle Big Data SQL 3.0 User's Guide.
Pre-3.0 releases of Oracle Big Data SQL were not supported on systems other than Oracle Big Data Appliance.
You can check the DataNodes of the cluster for Oracle Big Data SQL installation prerequisites as follows. As root, run the following checks on each DataNode:
# yum -y install dmidecode
# yum -y install net-snmp net-snmp-utils
# yum -y install perl perl-libs
# yum -y install perl-Time-HiRes perl-libwww-perl
# yum -y install perl-libxml-perl perl-XML-LibXML perl-XML-SAX
# ls -l /usr/java
The first step of the Oracle Big Data SQL installation is to run the installer on the Hadoop cluster management server (where Cloudera Manager runs on a CDH system or Ambari runs on an HDP system). As a post-installation task on the management server, you then run the script that prepares the installation bundle for the database server.
On the Hadoop side, you manually install the software on the cluster management server only. The installer uses Cloudera Manager or Ambari to analyze the cluster configuration and automatically deploys and installs Oracle Big Data SQL on all nodes where it is required.
Note:
About Installer Security

Passwords for Ambari and Cloudera Manager are not passed on the command line and are not saved in any persistent files (including log or trace files) during the installation or after the installation is complete.
No temporary or persistent world-writable files are created.
No setuid or setgid files are used.
The installer works with hardened Oracle Database environments as well as hardened CDH and HDP clusters as described in the Cloudera CDH and Hortonworks HDP security documentation.
Temporary Workaround may be Required:
The setup-bds
installation script sets some Hive auxiliary parameters, which may overwrite existing custom settings. For example, to enable Hive operation logging for the installation, the script sets hive.server2.logging.operation.enabled=true
. After setup-bds
has finished (and before you run the next script in the process, bds-database-create-bundle.sh
), check your Hive auxiliary parameter settings and, if needed, restore any that may have been overwritten. This workaround will be unnecessary in a subsequent release of Oracle Big Data SQL.
In an Oracle Big Data SQL uninstall (which also uses setup-bds), the uninstall completely removes the hive.server2.logging.operation.enabled parameter, which effectively sets it to true, the default value.
Extract the files from BigDataSQL-<Hadoop_distribution>-<version>.zip
archive. Then, configure bds-config.json
and run setup-bds
with the install
argument. This installs Oracle Big Data SQL on the cluster management server as well as on all Hadoop DataNodes in the cluster.
Run the database bundle creation script, bds-database-create-bundle
. This generates the database bundle file that you will run on the Oracle Database server in order to install Oracle Big Data SQL on the Oracle Database server.
Check the parameters in the database bundle file and adjust as needed.
After you have checked and (if necessary) edited the database bundle file, copy it over to the Oracle Database server and run it as described in Installing on the Oracle Database Server.
Install Oracle Big Data SQL on the Cluster Management Server
Important: Patch 25796576 Required For Some Systems:
Install this patch if your system meets the following criteria:
For systems other than Oracle Big Data Appliance: using CDH, and running on Oracle Linux 6, Oracle Linux 7, Red Hat 6 or Red Hat 7.
For Oracle Big Data Appliance: when Oracle Big Data SQL is installed with the installer described in this document (in other words, when it is not installed by Mammoth or the bdacli utility).
For instructions on installing the patch and more detail on when the patch is required, see the following document in My Oracle Support: One-Off Patch 25796576 On An Oracle Big Data Appliance CDH Cluster OL6 with Big Data SQL 3.1 (Doc ID 2269180.1). This does not apply to Oracle Linux 5.
To install Big Data SQL on the cluster management server:
Copy the appropriate zip file (BigDataSQL-<Hadoop_distribution>-<version>.zip
) to a temporary location on the cluster management server (the node where Cloudera Manager or Ambari is running).
Unzip the file.
For CDH systems, install Patch 25796576 at this point and then proceed with the Oracle Big Data SQL installation.
Change directories to BDSSetup
.
Edit the configuration file.
Table 2-1 below describes the use of each configuration parameter.
For CDH, edit bds-config.json
, as shown in this example. Any unused port will work as the web server port.
{
  "cluster": {
    "name": "cluster1",
    "display_name": "Cluster 1"
  },
  "database": {
    "ip": "10.11.12.13/14"
  },
  "memory": {
    "min_hard_limit": 8192
  },
  "webserver": {
    "port": 80
  }
}
For HDP, edit bds-config.json
as in this example. Notice that the HDP configuration file does not include the display_name
or min_hard_limit
parameters.
{
  "cluster": {
    "name": "cluster1"
  },
  "database": {
    "ip": "10.11.12.13/14"
  },
  "webserver": {
    "port": 80
  }
}
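Before running setup-bds, it can be worth validating the JSON syntax of the edited configuration file, since a stray trailing comma or missing brace will break the installer. This is a hedged pre-flight sketch, not part of the bundle; it uses whatever Python is available purely as a JSON parser, and the config values are placeholders copied from the examples above:

```shell
# Hedged pre-flight check (a sketch): validate bds-config.json syntax.
# The file contents here are placeholder values from the examples above.
cat > /tmp/bds-config.json <<'EOF'
{
  "cluster": { "name": "cluster1" },
  "database": { "ip": "10.11.12.13/14" },
  "webserver": { "port": 80 }
}
EOF
if command -v python3 >/dev/null 2>&1 && python3 -m json.tool /tmp/bds-config.json >/dev/null 2>&1; then
  json_ok=yes
else
  json_ok=no
fi
echo "bds-config.json: json_ok=$json_ok"
```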
The database:{ip} value must be the correct network interface address for the database node where you will perform the installation. You can confirm this by running /sbin/ip -o -f inet addr show on the database node.
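The fourth whitespace-separated field of the matching line from that command's output holds the CIDR address to use. The sketch below parses a captured sample line (the interface name and address shown are hypothetical):

```shell
# Sketch: extract the CIDR address from a sample line of
# `/sbin/ip -o -f inet addr show` output (the line is hypothetical).
sample='2: eth0    inet 10.11.12.13/14 brd 10.11.255.255 scope global eth0'
db_ip=$(echo "$sample" | awk '{print $4}')
echo "database ip: $db_ip"
```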
Note:
The next step requires the cluster administrator user ID and password.

In the BDSSetup directory, become root and run setup-bds, passing it the install parameter and the configuration file name (bds-config.json) as arguments. Note that the cluster management service is restarted in this process.
[root@myclusteradminserver:BDSSetup] # ./setup-bds install bds-config.json
BigDataSQL: INSTALL workflow completed.
See Also:
This is a condensed example. An example of the complete standard output from a successful installation is provided in Oracle Big Data SQL Install/Uninstall/Reconfigure Examples.

Parameters in bds-config.json
The table below explains edits you need to make to bds-config.json
for CDH and HDP platforms.
Table 2-1 Configuration Parameters in bds-config.json
Configuration Parameter | Description | Applies To | Required/Optional |
---|---|---|---|
cluster:{name} | The name of the cluster. See the description of display_name below for behavior that applies to both of these parameters. | CDH, HDP | Required for CDH. Optional for HDP. |
cluster:{display_name} | The display name of the cluster. | CDH only | Optional |
database:{ip} | The IP address of the Oracle Database server that will make connection requests. This must be configured on one interface on the database node. The address must include the prefix length (as in 100.112.10.36/24). Although only one IP address is specified in the configuration file, it is possible to install the database-side software on multiple database servers (as in a RAC environment) by using a command line parameter to override this value. | CDH, HDP | Required |
memory:{min_hard_limit} | The minimum amount of memory required for Oracle Big Data SQL. | CDH only | Optional |
webserver:{port} | Port for the temporary repository used for deploying tasks and gathering responses from the DataNodes during installation. This can be any port that does not conflict with current cluster operation. | CDH, HDP | Required |
api:{port} | Port for the Cloudera Manager or Ambari REST API in cases where this is different from the default port. | CDH, HDP | Optional |
Operations Performed by setup-bds
The table below lists the full set of operations performed by setup-bds
.
The syntax for all options is:
# ./setup-bds <option> bds-config.json
For example:
# ./setup-bds install bds-config.json
Table 2-2 Command Line Options for setup-bds
setup-bds Option | Use |
---|---|
install | Install the Oracle Big Data SQL software on the cluster management server. |
extend | Extend Oracle Big Data SQL to any new DataNodes and update the cells inventory if the cluster has grown since the last Oracle Big Data SQL installation. |
remove | Remove Oracle Big Data SQL components from any nodes where the DataNode service no longer exists. This must be done if a DataNode service is moved or removed. A possible scenario where this would be necessary is if a DataNode service has been moved to another node for better cluster load balancing. |
reconfigure | Modify the current installation by applying changes you have made to the configuration in bds-config.json. For reconfigurations, a lightweight database bundle is provided so that the changes can be deployed relatively quickly. This bundle does not need and does not include the tarballs that are required by the initial installation. |
uninstall | Uninstall Oracle Big Data SQL from the Hadoop cluster management server. See Also: Uninstalling Oracle Big Data SQL. |
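The option names accepted by setup-bds can be validated before invoking the script. The wrapper below is a hypothetical sketch, not part of the bundle; it echoes the command it would run rather than executing setup-bds:

```shell
# Hypothetical wrapper (a sketch, not part of the bundle): validate the
# setup-bds option name, echoing the command instead of executing it.
run_bds() {
  case "$1" in
    install|extend|remove|reconfigure|uninstall)
      echo "./setup-bds $1 bds-config.json" ;;
    *)
      echo "usage: run_bds {install|extend|remove|reconfigure|uninstall}" >&2
      return 1 ;;
  esac
}
run_bds install
```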
After installing Oracle Big Data SQL on the cluster management server, run the script BDSSetup/db/bds-database-create-bundle.sh
. This script creates the corresponding Oracle Big Data SQL installation bundle for any Oracle Database servers that will query data on the Hadoop system.
In addition to running bds-database-create-bundle
at installation time, you also need to run it when you have made configuration changes to the cluster management server that must be communicated to the database server, such as:
A change in cluster configuration, such as from unsecure to secure HTTP.
Migration of the Hive Metastore from one node to another.
Change of the CM or Ambari port on the cluster management server.
Change of IP address used for installation on the database side.
External Resources Required for the Database-Side Bundle
bds-database-create-bundle
attempts to download the following external resources.
Hadoop and Hive client tarballs from the Cloudera or Hortonworks repository website.
Configuration files for Yarn and Hive from the cluster management server, via Cloudera Manager (for the CDH versions) or Ambari (for the HDP versions).
For HDP only, HDFS and MapReduce configuration files from Ambari.
If some of these resources are not accessible from the management server, you can add them manually. You can also use the command line switches described in Table 2-3 to manually turn off selected resource downloads so that these resources are not added to the bundle.
Running bds-database-create-bundle
Run this script as root
. Note that you are prompted for the cluster management service administrator credentials.
Change directories to BDSSetup/db
.
Run the BDS database bundle creation script. See Table 2-3 below for optional parameters that you can pass to the script in order to override any of the default settings.
[root@myclusteradminserver: db] # ./bds-database-create-bundle.sh <optional parameters>
This message is returned if the operation is successful:
bds-database-create-bundle: database bundle creation script completed all steps
The bds-database-create-bundle
script generates two different database bundles in the BDSSetup directory (not in the BDSSetup/db directory):
bds-database-install.zip
Deploy this bundle to the database servers for the initial installation of Oracle Big Data SQL (or if a full re-installation is required for other reasons). It contains all of the files needed to install the software, including the resources that were downloaded from Cloudera or Hortonworks (tarballs and configuration files).
bds-database-install-config.zip
Deploy this bundle to the database servers instead of bds-database-install.zip
when you change the existing Oracle Big Data SQL configuration on the cluster management server. This package makes the corresponding changes to the database-side configuration so that the two sides (Hadoop and Oracle Database) are aligned. A reconfiguration of Oracle Big Data SQL does not require the external resources needed in the full installation (such as the client tarballs). Therefore this bundle is smaller and can be deployed faster than the full installation bundle.
The database bundle file includes a number of parameters. When you run bds-database-create-bundle.sh
, you can use the switches in the table below to override any of these parameters as necessary. Any URLs specified must be accessible from the cluster management server at the time you run bds-database-create-bundle.sh
.
Table 2-3 Command Line Switches for bds-database-create-bundle.sh
Parameter | Value |
---|---|
--hadoop-client-ws | Specifies a URL for the Hadoop client tarball download. |
--no-hadoop-client-ws | Exclude this download. |
--hive-client-ws | Specifies a URL for the Hive client tarball download. |
--no-hive-client-ws | Exclude this download. |
--yarn-conf-ws | Specifies a URL for the YARN configuration zip file download. |
--no-yarn-conf-ws | Exclude this download. |
--hive-conf-ws | Specifies a URL for the Hive configuration zip file download. |
--no-hive-conf-ws | Exclude this download. |
--ignore-missing-files | Create the bundle file even if some files are missing. |
--jdk-tar-path | Override the default JDK path. Do not specify a relative path; use --jdk-tar-path=<jdk tarfile absolute path>. |
--clean-previous | Deletes previous bundle files and directories from bds-database-install/. If the cluster settings on the cluster management server have changed (for example, because of an extension, service node migration, or adding/removing security), then it is necessary to redo the installation on the database server. As part of this re-installation, you must run --clean-previous to purge the cluster information left on the database server side from the previous installation. |
--script-only | This is useful for re-installations on the database side when there are no cluster configuration changes to communicate to the database server and there is no need to refresh files (such as client tarballs) on the database side. With this switch, bds-database-create-bundle.sh generates a zip file that contains only the database installation script and does not bundle in other components, such as the tarballs. If these already exist on the database server, you can use --script-only to bypass the downloading and packaging of these large files. Do not include --clean-previous in this case. |
--hbase-client-ws | This parameter is required only if HBase is not installed on the Hadoop cluster. It specifies the URL where the HBase tarball can be downloaded from the Cloudera or Hortonworks website. The URL should point to the specific version of HBase that is supported by the CDH or HDP installation you are using. If HBase is not installed, include this parameter when you run the script: $ bds-database-create-bundle.sh --hbase-client-ws <URL>. If HBase is not installed and you do not supply this parameter when you run bds-database-create-bundle.sh, the database-side installer will have no URL from which to download the required HBase client. |
--hdfs-conf-ws | Specifies a URL for the HDFS configuration zip file download (HDP only). |
--no-hdfs-conf-ws | Exclude this download (HDP only). |
--mapreduce-conf-ws | Specifies a URL for the MapReduce configuration zip file download (HDP only). |
--no-mapreduce-conf-ws | Exclude this download (HDP only). |
--reconfigure | Creates the bundle with the existing files, but forces a download of new configuration files. |
Manually Adding Resources if Download Sites are not Accessible to the BDS Database Bundle Creation Script
If one or more of the default download sites is inaccessible from the cluster management server, there are two ways around this problem:
Download the files from another server first and then provide bds-database-create-bundle.sh
with the alternate path as an argument. For example:
$ ./bds-database-create-bundle.sh --yarn-conf-ws='http://nodexample:1234/config/yarn'
Because the script will first search locally in bds-database-install for resources, you can download the files to another server, move the files into bds-database-install on the cluster management server, and then run the bundle creation script with no additional argument. For example:
$ cp hadoop-xxxx.tar.gz bds-database-install/
$ cp hive-xxxx.tar.gz bds-database-install/
$ cp yarn-conf.zip bds-database-install/
$ cp hive-conf.zip bds-database-install/
$ cd db
$ ./bds-database-create-bundle.sh
Copying the Database Bundle to the Oracle Database Server
Use scp
to copy the database bundle you created to the Oracle Database server. In the example below, dbnode
is the database server. The Linux account and target directory here are arbitrary. Use any account authorized to scp
to the specified path.
For a first-time installation of the current release of Oracle Big Data SQL, copy bds-database-install.zip
(the full installation bundle) to each database node.
$ scp bds-database-install.zip oracle@dbnode:/home/oracle
If you are updating the configuration of an existing Oracle Big Data SQL installation on the database servers, copy the smaller configuration update bundle (bds-database-install-config.zip) to the database nodes instead of bds-database-install.zip.
$ scp bds-database-install-config.zip oracle@dbnode:/home/oracle
The next step is to log on to the Oracle Database server and install the bundle.