6 Securing Big Data SQL

This section describes measures you can take to secure Big Data SQL and to configure the software within secured environments.

6.1 Big Data SQL Communications and Secure Hadoop Clusters

It is good security practice to minimize HDFS file access permissions to prevent unauthorized read and write access. This is true whether or not the Hadoop cluster is secured by Kerberos.
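
For example, you can tighten the permissions on a data directory so that only its owning user and group can read it. This is a minimal sketch; the path, owner, and group shown are hypothetical.

$ hdfs dfs -chown oracle:hadoop /data/sales    # hypothetical owner, group, and path
$ hdfs dfs -chmod 750 /data/sales              # owner: rwx, group: r-x, others: none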

See MOS Document 2123125.1 at My Oracle Support for detailed guidelines on securing Hadoop clusters for use with Oracle Big Data SQL.

6.2 Installing a Kerberos Client on the Database Server

If Kerberos is enabled on the Hadoop system, you must configure Oracle Big Data SQL on the database server to work with Kerberos. This requires a Kerberos client on each Oracle Database node.

For commodity servers, download the Kerberos client software from a repository of your choice. If the database server is an Oracle Exadata Database Machine, download and install the software from the Oracle repository as shown below. The process should be similar for downloads from non-Oracle repositories.

Log on to the database server as root and use yum to install the krb5-libs and krb5-workstation packages. Download from the Oracle Linux 6 or Oracle Linux 5 repository as appropriate.

  1. Check that the Oracle public-yum-ol6 or public-yum-ol5 repository ID is listed.

    # yum repolist
    
  2. Temporarily disable all repository IDs and then enable only the Oracle repository (Oracle Linux 6 in this example).

    # yum --disablerepo="*" --enablerepo="public-yum-ol6" list available
    
  3. Install the Kerberos packages.

    # yum --disablerepo="*" --enablerepo="public-yum-ol6" install krb5-libs krb5-workstation
    
  4. Copy the /etc/krb5.conf file from the Key Distribution Center (KDC) to the same path on the database server.

These steps must be performed for each Oracle Database node.

You must also register the oracle Linux user (or other Linux user) and password in the KDC for the cluster, as described in Enabling Oracle Big Data SQL Access to a Kerberized Cluster.

6.3 Enabling Oracle Big Data SQL Access to a Kerberized Cluster

You must configure Oracle Big Data SQL to use Kerberos in environments where user access is Kerberos-controlled.

There are two situations in which this is required:

  • When enabling Oracle Big Data SQL on a Kerberos-enabled cluster.

  • When enabling Kerberos on a cluster where Oracle Big Data SQL is already installed.

Oracle Big Data SQL processes run on the nodes of the Hadoop cluster as the oracle Linux user. On the Oracle Database server, the owner of the Oracle Database process is also usually the oracle Linux user. When Kerberos is enabled on the Hadoop system, the following are required to give this user access to HDFS.

  • The oracle Linux user needs to be able to authenticate as a principal in the Kerberos database on the Kerberos Key Distribution Center (KDC) server. The principal name in Kerberos does not have to be 'oracle'. However, the principal must have access to the underlying Hadoop data being requested by Oracle Big Data SQL.

  • The following are required on all Oracle Database nodes and all Hadoop cluster nodes running Oracle Big Data SQL (a quick verification sketch follows this list):

    • Kerberos client software installed.

    • A copy of the Kerberos configuration file from the KDC.

    • A copy of the Kerberos keytab file generated on the KDC for the oracle user.

    • A valid Kerberos ticket for the oracle Linux user.
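
You can quickly confirm these prerequisites on any node, as shown in the following sketch. It assumes a yum-based system (for the package query) and the keytab path used throughout this section.

$ rpm -q krb5-workstation                                  # Kerberos client installed?
$ ls -l /etc/krb5.conf /home/oracle/oracle.keytab          # configuration file and keytab in place?
$ klist -s && echo "ticket OK" || echo "no valid ticket"   # klist -s silently tests for a valid ticket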

Installing the Kerberos Client

If the Kerberos client is not installed, see Installing a Kerberos Client on the Database Server for instructions.

Creating a Kerberos Principal for the oracle User

On the Kerberos Key Distribution Center (KDC) server, become root and use kadmin.local to add a principal for the oracle user.

  1. Become root and run the following:

    # kadmin.local
  2. Within kadmin.local, type:

    add_principal <user>@<realm>
    quit
    

    You can optionally include the password, as in:

    add_principal -pw <password> <user>@<realm>
    quit
    
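    For example, assuming a hypothetical realm named EXAMPLE.COM, the principal for the oracle user is created as follows:

    add_principal oracle@EXAMPLE.COM
    quit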

Creating a Kerberos Keytab for the oracle User

  1. On the KDC, become root and run the following:

    # kadmin.local
    
  2. Within kadmin.local, type:

    xst -norandkey -k /home/oracle/oracle.keytab oracle
    quit
    

    This creates the oracle.keytab file for the Kerberos oracle user in the /home/oracle directory.

  3. Ensure that oracle.keytab is owned by the oracle Linux user and is readable by that user only.

    # chown oracle oracle.keytab
    # chmod 400 oracle.keytab
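
    Optionally, verify the keys stored in the keytab. The -k option of klist lists the entries in a keytab file and -t shows their timestamps:

    $ klist -k -t /home/oracle/oracle.keytab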
    

Distributing the Keytab and Kerberos Configuration Files

Log on to the KDC and copy these local files to the same path on each Hadoop cluster node and all Oracle Database compute nodes that use Oracle Big Data SQL.

  • Become the oracle user and copy /home/oracle/oracle.keytab to /home/oracle/oracle.keytab on each node.

  • Become root and copy the Kerberos configuration file /etc/krb5.conf to /etc/krb5.conf on each node.

Be sure to retain the original ownership and permissions on the files.
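
For example, using scp (the node name node01 is hypothetical; the -p option preserves the files' modes and timestamps):

$ scp -p /home/oracle/oracle.keytab oracle@node01:/home/oracle/oracle.keytab    # as the oracle user
# scp -p /etc/krb5.conf root@node01:/etc/krb5.conf                              # as root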

Acquiring a Kerberos Ticket for oracle on Each Node

The oracle user on each Hadoop DataNode and the database owner on each Oracle Database compute node (usually the oracle Linux user as well) need a valid ticket to connect through Kerberos.

After the Kerberos client is installed and the krb5.conf and oracle.keytab files are in place, log on to each node as the oracle user and obtain a ticket for that user.

  1. Log on as the oracle user.

  2. Run kinit on the oracle account.

     $ /usr/bin/kinit oracle -k -t /home/oracle/oracle.keytab
    

    A password is not required when a keytab for the user is supplied, as shown in this example. You can also run /usr/bin/kinit oracle without passing a keytab file, but in that case you are prompted for the password defined for the principal.

Checking for a Valid Ticket

After you acquire a ticket for each node (or if you are unsure about the validity of a ticket), you can check for a valid ticket using klist.

$ /usr/bin/klist

Kerberos Utilities for Linux: Version 12.1.0.2.0 - Production on 19-SEP-2016 07:45:20
Copyright (c) 1996, 2014 Oracle. All rights reserved.
Ticket cache: /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/krbcache
Default principal: <userID>@<realm>

If there is no valid ticket, you will see a message similar to this:

klist: No credentials cache found

Cleaning up After Ticket Expirations

When the bd_cell process is running on the nodes of a secured Hadoop cluster but the Kerberos ticket is not valid, the cell goes into quarantine status. After you restore a valid ticket, drop all such quarantines.

  1. Check that the oracle user has a valid Kerberos ticket on all Hadoop cluster nodes.

  2. On each cluster node, become oracle and run the following:

    $ /opt/oracle/bd_cell/cellsrv/bin/bdscli
    
  3. In the bdscli shell, type:

    list quarantine
    
  4. While still in bdscli, drop each quarantine on the list:

    drop quarantine <id>
    
  5. Type exit to exit bdscli.
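
If many quarantines accumulate, you can script the cleanup. The following sketch makes two assumptions that you should verify against your release: that bdscli accepts the -e option to run a single command (as the Exadata cellcli utility does), and that the quarantine ID is the first field of each line of the list quarantine output.

$ BDSCLI=/opt/oracle/bd_cell/cellsrv/bin/bdscli
$ $BDSCLI -e "list quarantine" | awk '{print $1}' | while read id; do
>   $BDSCLI -e "drop quarantine $id"    # drop each listed quarantine by ID
> done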

Automating Kerberos Ticket Renewal to Avoid Expirations

The oracle user needs a valid Kerberos ticket on every Oracle Database instance that is accessing the Hadoop cluster. A valid ticket is also required for the oracle user on the Hadoop nodes, since this user owns each Big Data SQL process.

It is best to automate ticket renewal. Use cron or a similar utility to run kinit on a schedule that acquires a new ticket for the user before the current ticket expires. (The ticket lifetime is recorded in /etc/krb5.conf.)

For example, on all nodes you could add a job to the oracle user’s crontab as follows.

  1. Log on as the user (oracle or other).

  2. Use crontab -e to edit the user’s crontab.

    $ crontab -e
    
  3. Add a line with a schedule and the command that renews the ticket before it expires. The cron fields are minute, hour, day of month, month, and day of week, so the following entry runs kinit at 01:15 and 13:15 every day, which renews the ticket well within a typical two-week lifetime.

    15 1,13 * * * /usr/bin/kinit oracle -k -t /home/oracle/oracle.keytab
    

6.4 Using Oracle Secure External Password Store to Manage Database Access for Oracle Big Data SQL

On the Oracle Database server, you can use the Oracle Secure External Password Store (SEPS) to manage database access credentials for Oracle Big Data SQL.

This is done by creating an Oracle wallet for the oracle Linux user (or other database owner). An Oracle wallet is a password-protected container used to store authentication and signing credentials, including private keys, certificates, and trusted certificates needed by SSL.
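
As a general illustration of SEPS setup (not specific to Big Data SQL), the following sketch creates a wallet with the mkstore utility and stores a database credential in it. The wallet directory, TNS alias, and user name shown are hypothetical; both commands prompt for the passwords they need.

$ mkstore -wrl /home/oracle/wallet -create                         # create the wallet
$ mkstore -wrl /home/oracle/wallet -createCredential mydb scott    # hypothetical alias and user

For the database to use the wallet, sqlnet.ora must point to the wallet directory through a WALLET_LOCATION entry and set SQLNET.WALLET_OVERRIDE=TRUE.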

See MOS Document 2126903.1 at My Oracle Support for information on using SEPS with Oracle Big Data SQL.

6.5 About Data Security on Oracle Big Data Appliance

If your Hadoop system is an Oracle Big Data Appliance, the following security tools are already available.

  • Kerberos authentication: Requires users and client software to provide credentials before accessing the cluster.

  • Apache Sentry authorization: Provides fine-grained, role-based authorization to data and metadata.

  • HDFS Transparent Encryption: Protects data at rest on disk. Encryption and decryption are transparent to applications using the data.

  • HTTPS/Network Encryption: Provides HTTPS for Cloudera Manager, Hue, Oozie, and Hadoop Web UIs. Also enables network encryption for other internal Hadoop data transfers, such as those made through YARN shuffle and RPC.

  • Oracle Audit Vault and Database Firewall monitoring: The Audit Vault plug-in on Oracle Big Data Appliance collects audit and logging data from MapReduce, HDFS, and Oozie services. You can then use Audit Vault Server to monitor these services on Oracle Big Data Appliance.

See Also:

The Oracle Big Data Appliance Guide provides details on available security features. You can find this guide and other documentation for your release of the Oracle Big Data Appliance software in the “Big Data” section of the Oracle Help Center.