7 Securing Oracle Big Data SQL

This section describes security features provided by Oracle Big Data SQL, measures you can take to secure Oracle Big Data SQL, and pointers to the information you need to configure Oracle Big Data SQL within secured environments.

7.1 Security Overview

This section covers installer, network, file, and password security.

Network Security

In Oracle Big Data SQL, network traffic between the database and the Hadoop cluster is no longer guaranteed to be over a private InfiniBand network, but can occur over a client network. This network traffic is not currently secured. Therefore, when operating a secured Hadoop cluster (e.g., Kerberos-enabled, RPC encryption), Oracle Big Data SQL requires one of the following:

  • All members of the client network are trusted, or
  • Private network connectivity is used exclusively for communication between the Hadoop nodes and Oracle database instances.

This private network is commonly referred to as the Big Data SQL interconnect network. The interconnect network must be a private network restricted to trusted users, must use at least one switch, and must use 10 Gigabit Ethernet adapters. Ensure that only the nodes in the Hadoop cluster and the Oracle RAC cluster can access the interconnect network. Do not use the interconnect network for other user communication.

Installer File Security

The new Jaguar installer incorporates the following best practices for secure Linux installers and applications:

  • No persistent or temporary world-writable files are created.

  • No setuid or setgid files are used.

In addition, the installer works with hardened Oracle Database environments as well as hardened CDH and HDP clusters, as described in the Cloudera CDH and Hortonworks HDP security documentation.

Password Safety

The Jaguar installer provides these password security measures:

  • Passwords for the Ambari and Cloudera Manager cluster management servers are not passed on the command line and are not saved in any persistent file during or after the installation.

  • Passwords are not logged to any log or trace files.

Security of Related Software

Oracle Big Data SQL relies on other installed software, including third party projects. It is the customer’s responsibility to ensure that such software is kept up-to-date with the latest security patches. This software includes (but is not limited to):

  • Python

  • Perl

7.2 Big Data SQL Communications and Secure Hadoop Clusters

It is generally a good security practice to ensure that HDFS file access permissions are minimized in order to prevent unauthorized write/read access. This is true regardless of whether or not the Hadoop cluster is secured by Kerberos.

Please refer to MOS Document 2123125.1 at My Oracle Support for detailed guidelines on securing Hadoop clusters for use with Oracle Big Data SQL.

7.2.1 Connecting a Database to Both Secure and Unsecured Clusters

Oracle Big Data SQL can connect a single database to multiple clusters at the same time, whether they are Kerberos-secured or unsecured.

There is a limitation on the use of Hive metadata queries when a database has connections to both secured and unsecured clusters. If the default cluster (the first available connection) is secured, then Hive metadata on unsecured clusters is inaccessible. The reverse is also true: if the default cluster is unsecured, then Hive metadata on secured clusters is inaccessible. This is because the database can support only one Hadoop authentication method at a time.

Suppose the database has these connections: cluster_1 (the default cluster, secured), cluster_2 (unsecured), and cluster_3 (secured). In this case, a query such as the following succeeds against cluster_1, since it is the default. It also succeeds against cluster_3. However, it fails against cluster_2, which is not secured.

select cluster_id, database_name, table_name from all_hive_tables;

If cluster_1 (the default) were not secured, then instead the query would succeed against cluster_2 and fail against cluster_3.

If you already know the metadata (which you can acquire through other tools), then you can successfully create external tables over Hive sources. For example, the following create table command will work against both secured and unsecured clusters, regardless of the status of the default cluster.

  create table myhivetable (  
      id number(10),  
      name varchar2(4000),  
      type varchar2(4000))  
    organization external (
      type ORACLE_HIVE
      default directory default_dir
      access parameters (
        com.oracle.bigdata.cluster=<cluster name>
        com.oracle.bigdata.tablename=<Hive database name>.<Hive table name>)
    ) 
    parallel 2 reject limit unlimited;

7.3 Configuring Oracle Big Data SQL in a Kerberos-Secured Environment

If Kerberos is enabled on the Hadoop system, you must configure Oracle Big Data SQL on the database server to work with Kerberos. This requires a Kerberos client on each database node where Oracle Big Data SQL is installed. Also, the OS account that owns the database (oracle or another account) must be provisioned as a user principal.

As described in Security Overview, when operating a secured Hadoop cluster (e.g., Kerberos-enabled, RPC encryption), Oracle Big Data SQL requires either that all members of the client network be trusted, or that a private Big Data SQL interconnect network be used exclusively for communication between the Hadoop nodes and the Oracle Database instances. The same interconnect network requirements apply here: the network must be restricted to trusted users, use at least one switch, use 10 Gigabit Ethernet adapters, be accessible only to the nodes of the Hadoop cluster and the Oracle RAC cluster, and carry no other user communication.

7.3.1 Enabling Oracle Big Data SQL Access to a Kerberized Cluster

You must configure Oracle Big Data SQL to use Kerberos in environments where user access is Kerberos-controlled.

There are two situations when this is required:

  • When enabling Oracle Big Data SQL on a Kerberos-enabled cluster.

  • When enabling Kerberos on a cluster where Oracle Big Data SQL is already installed.

Oracle Big Data SQL processes run on the nodes of the Hadoop cluster as the oracle Linux user. On the Oracle Database server, the owner of the Oracle Database process is also (usually) the oracle Linux user. When Kerberos is enabled on the Hadoop system, the following are required in order to give this user access to HDFS:

  • The oracle Linux user needs to be able to authenticate as a principal in the Kerberos database on the Kerberos Key Distribution Center (KDC) server. The principal name in Kerberos does not have to be 'oracle'. However, the principal must have access to the underlying Hadoop data being requested by Oracle Big Data SQL.

  • The following are required on all Oracle Database nodes and all Hadoop cluster nodes running Oracle Big Data SQL:

    • Kerberos client software installed.

    • A copy of the Kerberos configuration file from the KDC.

    • A copy of the Kerberos keytab file generated on the KDC for the oracle user.

    • A valid Kerberos ticket for the oracle Linux user (a check for this is sketched after this list).
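
A quick way to confirm the last requirement is to check the ticket cache as the oracle Linux user. This is a minimal verification sketch; the keytab path and realm are placeholders:

    # As the oracle Linux user, list any cached Kerberos tickets:
    $ klist
    # If no valid ticket is present, obtain one from the keytab:
    $ kinit -kt /home/oracle/oracle.keytab oracle@<realm>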

Installing the Kerberos Client

If the Kerberos client is not installed, see Installing a Kerberos Client on the Oracle Database Nodes for instructions.

Creating a Kerberos Principal for the oracle User

On the Kerberos Key Distribution Center (KDC) server, become root and use kadmin.local to add a principal for the oracle user.

  1. Run the following:

    # kadmin.local
  2. Within kadmin.local, type:

    add_principal <user>@<realm>
    quit

    You have the option to include the password, as in:

    add_principal -pw <password> <user>@<realm>
    quit
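
You can optionally confirm that the principal was created by querying it from the same kadmin.local session before quitting. The realm name is a placeholder:

    getprinc <user>@<realm>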

Creating a Kerberos Keytab for the oracle User

  1. On the KDC, become root and run the following:

    # kadmin.local
  2. Within kadmin.local, type:
    xst -norandkey -k /home/oracle/oracle.keytab oracle
    quit

    This creates the oracle.keytab file for the Kerberos oracle user in the /home/oracle directory.

    Note:

    If you run the xst (or ktadd) command without the -norandkey option, it will invalidate other keytabs previously created for this user.
  3. Ensure that oracle.keytab is owned by the oracle Linux user and is readable by that user only.
    $ chown oracle oracle.keytab 
    $ chmod 400 oracle.keytab
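
Before distributing the keytab, it is worth verifying that it contains the expected key and can be used to authenticate. A minimal check, using the paths from the steps above:

    # List the principals and key versions stored in the keytab:
    $ klist -kt /home/oracle/oracle.keytab
    # Authenticate non-interactively using the keytab:
    $ kinit -kt /home/oracle/oracle.keytab oracle
    # Confirm that a ticket was granted:
    $ klist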

Kerberos Tasks Automated by Oracle Big Data SQL

The following Kerberos tasks are now automated:

  • Distribution of the keytab and Kerberos configuration files.

    The Oracle Big Data SQL installation can now be configured to automatically distribute the keytab and Kerberos configuration files for the oracle user or other database owner to the Hadoop DataNodes (and Oracle Database compute nodes). This is done if the principal name and keytab file location parameters are set in the Jaguar configuration file. This automation is performed on both the Hadoop and Oracle Database side.

    On Oracle Big Data Appliance, the keytab file distribution is done by default for the oracle account and you do not need to add the principal and keytab file path for this account to the configuration file.

  • Acquiring a Kerberos Ticket for designated principals.

    For oracle and other principals listed in the Jaguar configuration file, the installation acquires a Kerberos ticket on each Hadoop DataNode and Oracle Database compute node.

  • Ticket renewal

    The installation automatically sets up cron jobs on the Hadoop cluster and on the Oracle Database nodes that run kinit to obtain a new ticket for each principal in the configuration four times daily, as sketched below.
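
The cron entries created by the installer are equivalent in effect to the following sketch. This is illustrative only; the actual schedule, paths, and principal names are set by the installer from your configuration:

    # Run kinit every six hours to refresh the ticket for the oracle principal:
    0 0,6,12,18 * * * /usr/bin/kinit -kt /home/oracle/oracle.keytab oracle@<realm>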

Cleaning up After Ticket Expirations

If the bd_cell process is running on the nodes of a secured Hadoop cluster but the Kerberos ticket is not valid, the cell goes into quarantine status. You should drop all such quarantines by following the steps below; a combined example appears after them.

  1. Check that the oracle user has a valid Kerberos ticket on all Hadoop cluster nodes.

  2. On each cluster node, become oracle and run the following:

    $ /opt/oracle/bd_cell/cellsrv/bin/bdscli
  3. In the bdscli shell, type:

    list quarantine
  4. While still in bdscli, drop each quarantine on the list:
    drop quarantine <id>
  5. Type exit to exit bdscli.
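
Put together, a cleanup pass consists of the following bdscli commands. The quarantine IDs shown are illustrative, and session output is omitted:

    list quarantine
    drop quarantine 1
    drop quarantine 2
    exit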

7.3.2 Installing a Kerberos Client on the Oracle Database Nodes

If Kerberos is enabled on the Hadoop system, then Oracle Big Data SQL requires a Kerberos client on the database server. The client must be installed on each compute node of the database.

For commodity servers, download the Kerberos client software from a repository of your choice. If the database server is an Oracle Exadata Database Machine, download and install the software from the Oracle repository as shown below. The process should be similar for downloads from non-Oracle repositories.

Log on to the database server as root and use yum to install the krb5-libs and krb5-workstation packages. Download from the Oracle Linux 6 or Oracle Linux 5 repository as appropriate.

  1. Check that the Oracle public-yum-ol6 or public-yum-ol5 repository ID is installed.

    # yum repolist
  2. Temporarily disable all repository IDs and then enable the Oracle repository only (Oracle Linux 6 in this example).
    # yum --disablerepo="*" --enablerepo="public-yum-ol6" list available 
  3. Install the Kerberos packages.

    # yum install krb5-libs krb5-workstation 
  4. Copy the /etc/krb5.conf file from the Key Distribution Center (KDC) to the same path on the database server, as in the example below.
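
For example, the configuration file can be pulled from the KDC with scp; the KDC host name is a placeholder:

    # scp root@<kdc-host>:/etc/krb5.conf /etc/krb5.conf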

These steps must be performed for each Oracle Database node.

You must also register the oracle Linux user (or other Linux user) and password in the KDC for the cluster, as described in Enabling Oracle Big Data SQL Access to a Kerberized Cluster.

7.4 Using Oracle Secure External Password Store to Manage Database Access for Oracle Big Data SQL

On the Oracle Database server, you can use the Oracle Secure External Password Store (SEPS) to manage database access credentials for Oracle Big Data SQL.

This is done by creating an Oracle wallet for the oracle Linux user (or other database owner). An Oracle wallet is a password-protected container used to store authentication and signing credentials, including private keys, certificates, and trusted certificates needed by SSL.
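
At a high level, the standard SEPS setup uses the mkstore utility to create the wallet and store a credential, and sqlnet.ora settings to direct clients to the wallet. The sketch below shows these generic steps with placeholder paths and names; see the MOS document cited below for the Oracle Big Data SQL specifics:

    # As the oracle user, create a wallet and store a credential for a TNS alias:
    $ mkstore -wrl /home/oracle/wallet -create
    $ mkstore -wrl /home/oracle/wallet -createCredential <tns_alias> <username> <password>

    # Then reference the wallet in sqlnet.ora:
    WALLET_LOCATION = (SOURCE = (METHOD = FILE) (METHOD_DATA = (DIRECTORY = /home/oracle/wallet)))
    SQLNET.WALLET_OVERRIDE = TRUE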

See MOS Document 2126903.1 at My Oracle Support for information on using SEPS with Oracle Big Data SQL.

7.5 About Data Security on Oracle Big Data Appliance

If your Hadoop system is an Oracle Big Data Appliance, the following tools for strengthening security are already available:

  • Kerberos authentication: Requires users and client software to provide credentials before accessing the cluster.

  • Apache Sentry authorization: Provides fine-grained, role-based authorization to data and metadata.

  • HDFS Transparent Encryption: Protects the data on disk and at rest. Data encryption and decryption is transparent to applications using the data.

  • HTTPS/Network Encryption: Provides HTTPS for Cloudera Manager, Hue, Oozie, and Hadoop Web UIs. It also enables network encryption for other internal Hadoop data transfers, such as those made through YARN shuffle and RPC.

See Also:

Security for Oracle Big Data Appliance in Oracle Big Data Appliance Software User’s Guide.

7.6 Authentication Between Oracle Database and Oracle Big Data SQL Offload Cell Server Processes

The Database Authentication feature described in this guide prevents unauthorized and potentially malicious processes (which can originate from anywhere) from connecting to Oracle Big Data SQL cell server processes in the DataNodes of the Hadoop cluster.

When Ethernet is selected for the connection between Oracle Databases and Oracle Big Data SQL, then by default this secured authentication framework is set up automatically during the installation. Database Authentication is also available as a configuration option for InfiniBand connections.

7.7 The Multi-User Authorization Model

Multi-User Authorization gives you the ability to use Hadoop Secure Impersonation to direct the oracle account to execute tasks on behalf of other designated users.

This enables HDFS data access based on the user that is currently executing the query, rather than the singular oracle user.

Administrators set up the rules for identifying the query user (the currently connected user) and for mapping this user to the user that is impersonated. Because there are numerous ways in which users can connect to Oracle Database, this user may be a database user, a user sourced from LDAP, from Kerberos, or other sources. Authorization rules on the files apply to the query user and audits will identify the user as the query user.
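
For illustration, a user map entry might be added through the DBMS_BDSQL package as sketched below. The user names are hypothetical, and the exact procedure signature should be confirmed in the package documentation cited below:

    BEGIN
      DBMS_BDSQL.ADD_USER_MAP(
        current_database_user       => 'SCOTT',
        syscontext_parm_hadoop_user => 'scott');
    END;
    /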

See Also:

The DBMS_BDSQL PL/SQL Package in the Oracle Big Data SQL User’s Guide describes the Multi-User Authorization security table and the procedures for adding user maps to the table and removing them from the table.

7.8 Sentry Authorization in Oracle Big Data SQL

In addition to supporting authorization for HDFS file access, Oracle Big Data SQL supports Sentry policies, which authorize access to Hive metadata. Sentry enables fine-grained control over user access, down to the column level.

Note:

Sentry is not supported for Big Data SQL installations on Oracle Database 12.1 systems.

7.8.1 Sentry and Multi-User Authorization

You can use Oracle Big Data SQL's Multi-User Authorization system to enhance Sentry security.

Oracle Big Data SQL utilizes Sentry policies whenever Sentry is enabled on the Hadoop cluster. Support for Sentry in Oracle Big Data SQL is most effective when used in conjunction with the Multi-User Authorization system. Multi-User Authorization enables Sentry authorization based on the actual identity of the currently connected user.

If Multi-User Authorization is not enabled, then the oracle proxy user is used for authorization for all queries.

7.8.2 Groups, Users, and Role-Based Access Control in Sentry

Oracle Big Data SQL does not directly control access to Hive tables. It respects the access controls defined by the Hive database administrator. For Sentry, these controls are role-based. A given user's access rights are defined by their group memberships and the roles assigned to those groups.

The administrator uses tools such as the HiveServer2 Beeline utility to assign privileges (such as SELECT and INSERT) on Hive tables and their columns. The administrator also creates Sentry roles, assigns roles to different user groups, and grants privileges to those roles. A Hadoop user inherits all of the privileges from each role that is assigned to their group or groups.

For example, to provide access to salary data, an administrator may create a role for this purpose and then grant SELECT privileges to the role:

    CREATE ROLE auditfixedcosts;
    GRANT SELECT ON TABLE salary TO ROLE auditfixedcosts;

Grants may also be given to the auditfixedcosts role for access to data on other servers or in other databases, tables, or columns.

The administrator assigns the auditfixedcosts role to a group:

    GRANT ROLE auditfixedcosts TO GROUP finance;

Members of the finance group then have SELECT access to all data in the salary table, any other SELECT access granted to the auditfixedcosts role, as well as access inherited from any other roles granted to the finance group.
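
To verify the setup, the administrator can list the privileges attached to a role from Beeline. This is an illustrative check:

    SHOW GRANT ROLE auditfixedcosts;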

7.8.3 How Oracle Big Data SQL Uses Sentry

In Oracle Big Data SQL, Sentry provides a way to grant or withhold the SELECT privilege for individual Oracle Database users who attempt to query Oracle external tables over Hive tables.

As a result, Sentry policies do not need to be replicated to the Oracle external tables (i.e. using GRANT), which simplifies administration.

Sentry can be used to control the SELECT privilege at these levels of scope:

  • Server (cluster node)
  • Hive database
  • Specific columns within a Hive table

Oracle Big Data SQL does not support DML operations. Therefore, only the SELECT privilege applies to Oracle Big Data SQL queries. The Sentry privileges ALL, OWNER, CREATE, INSERT, and REFRESH are not relevant to Oracle Big Data SQL access.

How Oracle Big Data SQL Uses Sentry to Determine Access at Query Execution

When a user executes a query against an Oracle external table over a Hive table, Oracle Big Data SQL checks the role-based Sentry privileges granted to the Hadoop user and applies them to the Oracle external table created over that Hive table. It then compares these with the privileges required to execute the query. If the privileges granted to the user do not meet the requirements of the query, an exception is raised.
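
For example, given a hypothetical external table sales_ext created over a Sentry-protected Hive table, this check applies to every query:

    -- Succeeds only if the mapped Hadoop user's roles include SELECT on both columns:
    SELECT cust_id, cust_name FROM sales_ext;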

7.8.4 Oracle Big Data SQL Privilege-Related Exceptions for Sentry

Oracle Big Data SQL raises one of two exceptions when a user's Sentry privileges are not sufficient to execute a query.

The exceptions and the errors returned are as follows.

  • Table-level privilege exception:
    "User <the user> does not have SELECT privileges on oracle table <external table name>
    for Server=<server name>->Db=<Hive db name>->Table=<table name>"
  • Column-level privilege exception:
    "User <the user> does not have SELECT privileges on oracle table <external table name>
    column <column name> for Server=<server name>->Db=<Hive db name>->Table=<table name>->Field=<field name>"

7.8.5 Sentry Usage Examples

The following statements, run in a Hive client such as Beeline, illustrate typical Sentry administration tasks:

CREATE ROLE marketing_role;
CREATE ROLE hr_role;
CREATE ROLE developer_role;
CREATE ROLE dba_role;

SHOW ROLES;

# Assign roles to groups.  Here, the groups come from either LDAP or Linux groups:
GRANT ROLE developer_role, hr_role, marketing_role TO GROUP developer;
GRANT ROLE hr_role TO GROUP hr;
GRANT ROLE marketing_role TO GROUP marketing;
GRANT ROLE dba_role TO GROUP dba;

# What roles does the developer have?
SHOW ROLE GRANT GROUP developer;

# Make the dba_role (user dataadmin) the "data boss":
GRANT ALL ON SERVER server1 TO ROLE dba_role;

# For the different Hive databases, assign power users and groups:
GRANT ALL ON DATABASE marketing_db TO ROLE marketing_role WITH GRANT OPTION;
GRANT ALL ON DATABASE hr_db TO ROLE hr_role WITH GRANT OPTION;


# Load data into a Hive table. First check that you have rights to access the file
GRANT ALL ON URI 'file:///tmp/salaries.txt' TO ROLE hr_role;
LOAD DATA LOCAL INPATH 'file:///tmp/salaries.txt'  INTO TABLE salaries;  

# Try some queries.
# In Hue, as marketinguser1, try to select from salaries.
# This fails: marketing_role has no rights to hr_db.
SELECT * FROM hr_db.salaries;

# Allow the marketing user to select every column except salary from the table:
GRANT SELECT(emp, mgr) ON TABLE salaries TO ROLE marketing_role;

# As marketinguser1, select all columns. (This fails: the salary column is still not visible.)
SELECT * FROM hr_db.salaries;

# As marketinguser1, select only the granted columns. (This succeeds.)
SELECT emp, mgr FROM hr_db.salaries;

7.9 Securing Object Store Access

Oracle Big Data SQL enables object store access from Oracle Database and also from the Oracle Big Data SQL offload server, which runs on Hadoop nodes.

Access to object stores requires the appropriate proxy setting for the Oracle Big Data SQL offload server. This is set by including either the cell_http_proxy or cell_no_proxy parameter in the Jaguar configuration file, as in the fragment below.
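
For reference, a fragment of the Jaguar configuration file might look like the following. The parameter name comes from the text above, but the surrounding JSON structure and the value shown are assumptions; consult the installation guide for the exact format:

    {
      "cell_http_proxy": "proxy.mycompany.com:80"
    }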

Note:

Outbound HTTP connections from Hadoop nodes to object stores should be safeguarded. It is recommended that you use a firewall to whitelist target object store domain names.

The credential information required for object store access is encrypted before it is sent from Oracle Database to the Oracle Big Data SQL offload server.
