2 Security for Oracle Big Data Appliance

Oracle Big Data Appliance development focuses on delivering an engineered system that is highly secure. This spans all aspects of the product: strong authentication (Kerberos), authorization, network encryption, encryption for data at rest, auditing, and lineage/impact analysis.

2.1 Overview

You can take the precautions described in this section to thwart unauthorized use of the software and data on Oracle Big Data Appliance.

See Also:

Oracle Big Data Appliance development abides by Oracle's comprehensive OSSA (Oracle Software Security Assurance) standards.

https://www.oracle.com/corporate/security-practices/assurance/

2.2 About Predefined Users and Groups

Every open-source package installed on Oracle Big Data Appliance creates one or more users and groups. Most of these users do not have login privileges, shells, or home directories. They are used by daemons and are not intended as an interface for individual users. For example, Hadoop operates as the hdfs user, MapReduce operates as mapred, and Hive operates as hive.

You can use the oracle identity to run Hadoop and Hive jobs immediately after the Oracle Big Data Appliance software is installed. This user account has login privileges, a shell, and a home directory.

Oracle NoSQL Database and Oracle Data Integrator run as the oracle user. Its primary group is oinstall.

Note:

Do not delete, re-create, or modify the users that are created during installation, because they are required for the software to operate.

The following table identifies the operating system users and groups that are created automatically during installation of Oracle Big Data Appliance software for use by CDH components and other software packages.

Table 2-1 Operating System Users and Groups

| User Name | Group | Used By | Login Rights |
|---|---|---|---|
| flume | flume | Apache Flume parent and nodes | No |
| hbase | hbase | Apache HBase processes | No |
| hdfs | hadoop | NameNode, DataNode | No |
| hive | hive | Hive metastore and server processes | No |
| hue | hue | Hue processes | No |
| mapred | hadoop | ResourceManager, NodeManager, Hive Thrift daemon | Yes |
| mysql | mysql | MySQL server | Yes |
| oozie | oozie | Oozie server | No |
| oracle | dba, oinstall | Oracle NoSQL Database, Oracle Loader for Hadoop, Oracle Data Integrator, and the Oracle DBA | Yes |
| puppet | puppet | Puppet parent (puppet nodes run as root) | No |
| sqoop | sqoop | Apache Sqoop metastore | No |
| svctag | | Auto Service Request | No |
| zookeeper | zookeeper | ZooKeeper processes | No |
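To compare a server against the Login Rights column, one option is to look at each account's shell in the passwd database. The following is a minimal sketch, not an Oracle-supplied tool. It assumes that "no login rights" shows up as a shell ending in nologin or false, which is a common Linux convention but not something this guide states. The function name is hypothetical.

```shell
# Accounts listed in Table 2-1.
bda_users="flume hbase hdfs hive hue mapred mysql oozie oracle puppet sqoop svctag zookeeper"

# Print the Table 2-1 accounts whose shell field (field 7) does not end in
# "nologin" or "false". Assumption: such a shell means the account can log in.
# Reads /etc/passwd unless another passwd-format file is given as $1.
list_login_users() {
  awk -F: -v users="$bda_users" '
    BEGIN { n = split(users, u, " "); for (i = 1; i <= n; i++) want[u[i]] = 1 }
    ($1 in want) && $7 !~ /(nologin|false)$/ { print $1 }
  ' "${1:-/etc/passwd}"
}

# Usage on an appliance server:
#   list_login_users
```

The output is only a starting point for review. Do not change the accounts themselves, because the installed software depends on them.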

2.3 About HDFS Transparent Encryption

HDFS Transparent Encryption protects Hadoop data that is at rest on disk. After HDFS Transparent Encryption is enabled for a cluster on Oracle Big Data Appliance, data writes and reads to encrypted zones (HDFS directories) on the disk are automatically encrypted and decrypted. This process is “transparent” because it is invisible to the application working with the data.

HDFS Transparent Encryption does not affect user access to Hadoop data, although it can have a minor impact on performance.

HDFS Transparent Encryption is an option that you can select during the initial installation of the software by the Mammoth utility. You can also enable or disable HDFS Transparent Encryption at any time by using the bdacli utility. Note that HDFS Transparent Encryption can be installed only on a Kerberos-secured cluster.

Oracle recommends that you set up the Navigator Key Trustee (the service that manages keys and certificates) on a separate server, external to the Oracle Big Data Appliance.

See the following My Oracle Support (MOS) documents for instructions on installing and enabling HDFS Transparent Encryption.

| Title | MOS Doc ID |
|---|---|
| How to Setup Highly Available Active and Passive Key Trustee Servers on BDA V4.4 Using 5.5 Parcels | 2112644.1 |
| How to Enable/Disable HDFS Transparent Encryption on Oracle Big Data Appliance V4.4 with bdacli | 2111343.1 |
| How to Create Encryption Zones on HDFS on Oracle Big Data Appliance V4.4 | 2111829.1 |

Installing with parcels, as described in Doc ID 2112644.1, is recommended over package-based installation. See Cloudera’s comments on Parcels.

Note:

If either HDFS Transparent Encryption or Kerberos is disabled, data stored in the HDFS Transparent Encryption zones in the cluster will remain encrypted and therefore inaccessible. To restore access to the data, re-enable HDFS Transparent Encryption using the same key provider.
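For orientation, the following sketch shows how an encryption zone is typically created with the standard Apache Hadoop command-line tools once HDFS Transparent Encryption is enabled. It is not a substitute for the MOS procedures listed above. The function name, key name, and directory are hypothetical, and the sketch assumes you have already obtained a Kerberos ticket for a user with the required key and HDFS privileges.

```shell
# Create an encryption key and turn a directory into an encryption zone.
# $1 = key name, $2 = HDFS directory (both hypothetical, chosen by you).
# The directory must be empty when it becomes an encryption zone.
create_encryption_zone() {
  key_name="$1"
  zone_path="$2"
  hadoop key create "$key_name" &&
    hdfs dfs -mkdir -p "$zone_path" &&
    hdfs crypto -createZone -keyName "$key_name" -path "$zone_path" &&
    hdfs crypto -listZones
}

# Usage (after kinit as a suitably privileged user):
#   create_encryption_zone demo_key /data/secure
```

Many sites separate duties so that one user creates keys and another, the HDFS superuser, creates zones. In that case the commands would be run in two steps by different users rather than from a single function.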

See Also:

Cloudera documentation about HDFS at-rest encryption at http://www.cloudera.com for more information about managing files in encrypted zones.

2.4 About User Authentication

Oracle Big Data Appliance supports Kerberos security as a software installation option. See Supporting User Access to Oracle Big Data Appliance for details about setting up clients and users to access a Kerberos-protected cluster.

2.5 About Fine-Grained Authorization

The typical authorization model on Hadoop is at the HDFS file level, such that users either have access to all of the data in the file or none. In contrast, Apache Sentry integrates with the Hive and Impala SQL-query engines to provide fine-grained authorization to data and metadata stored in Hadoop.

Oracle Big Data Appliance automatically configures Sentry during software installation, beginning with Mammoth utility version 2.5.
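As an illustration of what fine-grained authorization looks like in practice, the following sketch prints the kind of SQL statements that Sentry accepts through Hive: a role is created, assigned to a group, and granted SELECT on a single table. The function name, role, group, and table are hypothetical, and the statements are a generic example rather than a configuration that Oracle Big Data Appliance ships with.

```shell
# Print Sentry-style grant statements for one role, group, and table.
# $1 = role, $2 = group, $3 = table (all hypothetical names).
sentry_grant_sql() {
  role="$1"
  group="$2"
  table="$3"
  printf 'CREATE ROLE %s;\n' "$role"
  printf 'GRANT ROLE %s TO GROUP %s;\n' "$role" "$group"
  printf 'GRANT SELECT ON TABLE %s TO ROLE %s;\n' "$table" "$role"
}

sentry_grant_sql analyst_role analysts sales

# To apply the statements, save them to a file and run it with beeline as a
# Sentry administrator. The JDBC URL below is a placeholder:
#   sentry_grant_sql analyst_role analysts sales > grants.sql
#   beeline -u "jdbc:hive2://<host>:10000/default;principal=hive/<host>@<REALM>" -f grants.sql
```

With grants like these, members of the analysts group can query the sales table without needing read access to every file under the warehouse directory.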

2.6 Port Numbers Used on Oracle Big Data Appliance

The following table identifies the port numbers that might be used in addition to those used by CDH.

To view the ports used on a particular server:

  1. In Cloudera Manager, click the Hosts tab at the top of the page to display the Hosts page.

  2. In the Name column, click a server link to see its detail page.

  3. Scroll down to the Ports section.

See Also:

For the full list of CDH port numbers, go to the Cloudera website at

https://www.cloudera.com/documentation/enterprise/5-15-x/topics/cm_ig_ports.html

Table 2-2 Oracle Big Data Appliance Port Numbers

| Service | Port |
|---|---|
| Automated Service Monitor (ASM) | 30920 |
| MySQL Database | 3306 |
| Oracle Data Integrator Agent | 20910 |
| Oracle NoSQL Database administration | 5001 |
| Oracle NoSQL Database processes | 5010 to 5020 |
| Oracle NoSQL Database registration | 5000 |
| Port map | 111 |
| Puppet master service | 8140 |
| Puppet node service | 8139 |
| rpc.statd | 668 |
| ssh | 22 |
| xinetd (service tag) | 6481 |
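As a command-line complement to the Cloudera Manager Ports section, the following sketch filters the output of the Linux ss utility down to the ports in Table 2-2. It assumes that ss is available on the server and that its -tln output has the local address and port in the fourth column. The function name is hypothetical.

```shell
# Single ports from Table 2-2; the 5010 to 5020 range is handled in awk.
bda_ports="22 111 668 3306 5000 5001 6481 8139 8140 20910 30920"

# Read `ss -tln` output on stdin and print each Table 2-2 port that has a
# listening TCP socket, sorted numerically without duplicates.
listening_bda_ports() {
  awk -v ports="$bda_ports" '
    BEGIN { n = split(ports, p, " "); for (i = 1; i <= n; i++) want[p[i]] = 1 }
    $1 == "LISTEN" {
      m = split($4, a, ":")          # the port is the last colon-separated part
      port = a[m]
      if ((port in want) || (port + 0 >= 5010 && port + 0 <= 5020)) print port
    }
  ' | sort -un
}

# Usage on an appliance server:
#   ss -tln | listening_bda_ports
```

The sketch covers TCP listeners only. Services that use UDP, such as the port mapper on some configurations, would need `ss -uln` and a different state filter.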

2.7 About Puppet Security

The puppet node service (puppetd) runs continuously as root on all servers. It listens on port 8139 for "kick" requests, which trigger it to request updates from the puppet master. It does not receive updates on this port.

The puppet master service (puppetmasterd) runs continuously as the puppet user on the first server of the primary Oracle Big Data Appliance rack. It listens on port 8140 for requests to push updates to puppet nodes.

The puppet nodes generate and send certificates to the puppet master to register initially during installation of the software. For updates to the software, the puppet master signals ("kicks") the puppet nodes, which then request all configuration changes from the puppet master node that they are registered with.

The puppet master sends updates only to puppet nodes that have known, valid certificates. Puppet nodes only accept updates from the puppet master host name they initially registered with. Because Oracle Big Data Appliance uses an internal network for communication within the rack, the puppet master host name resolves using /etc/hosts to an internal, private IP address.

2.8 About HTTPS/Network Encryption

HTTPS/Network Encryption on Oracle Big Data Appliance has two components:

  • Web Interface Encryption

    Configures HTTPS for the following web interfaces: Cloudera Manager, Oozie, and Hue. This encryption is enabled automatically in new Mammoth installations. For existing installations, it can be enabled with the bdacli utility. This feature does not require Kerberos to be enabled.

  • Encryption for Data in Transit and Services

    There are two subcomponents to this feature. Both are options that can be enabled in the Configuration Utility at installation time or enabled/disabled using the bdacli utility at any time. Both require that Kerberos is enabled.
    • Encrypt Hadoop Services

      This includes SSL encryption for the HDFS, MapReduce, and YARN web interfaces, as well as encrypted shuffle for MapReduce and YARN. It also enables authentication for access to the web consoles for the MapReduce and YARN roles.

    • Encrypt HDFS Data Transport

      This option enables encryption of data transferred between DataNodes and clients, and among DataNodes.

HTTPS/Network Encryption is enabled and disabled on a per-cluster basis. The Configuration Utility, described in the Oracle Big Data Appliance Owner’s Guide, includes settings for enabling encryption for Hadoop Services and HDFS Data Transport when a cluster is created. The bdacli utility reference pages (also in the Oracle Big Data Appliance Owner’s Guide) provide HTTPS/Network Encryption command-line options.
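One way to see whether these options appear to be in effect is to read the related properties from the client configuration on a cluster node. The sketch below uses the standard `hdfs getconf -confKey` command with three Apache Hadoop property names that are commonly associated with these features. Whether Oracle Big Data Appliance sets exactly these properties is an assumption, and the function name is hypothetical.

```shell
# Print the client-side values of three standard Hadoop properties:
#   dfs.encrypt.data.transfer  (encryption of DataNode data transfer)
#   hadoop.rpc.protection      (authentication, integrity, or privacy)
#   dfs.http.policy            (HTTP_ONLY, HTTPS_ONLY, or HTTP_AND_HTTPS)
# Assumption: these are the properties affected by the options above.
show_encryption_settings() {
  for key in dfs.encrypt.data.transfer hadoop.rpc.protection dfs.http.policy; do
    printf '%s=%s\n' "$key" "$(hdfs getconf -confKey "$key" 2>/dev/null)"
  done
}

# Usage on a cluster node:
#   show_encryption_settings
```

The values reflect the configuration files visible to the node where the command runs, so Cloudera Manager remains the authoritative place to confirm cluster-wide settings.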

See Also:

Supporting User Access to Oracle Big Data Appliance for an overview of how Kerberos is used to secure CDH clusters.

About HDFS Transparent Encryption for information about Oracle Big Data Appliance security for Hadoop data at-rest.

Cloudera documentation at http://www.cloudera.com for more information about HTTPS communication in Cloudera Manager and network-level encryption in CDH.

2.8.1 Configuring Web Browsers to use Kerberos Authentication

If web interface encryption is enabled, each web browser that accesses an encrypted HDFS, MapReduce, or YARN web interface must be configured to authenticate with Kerberos. Note that this is not necessary for the Cloudera Manager, Oozie, and Hue web interfaces, which do not require Kerberos.

The following are the steps to configure Mozilla Firefox (Footnote 1), Microsoft Internet Explorer (Footnote 2), and Google Chrome (Footnote 3) for Kerberos authentication.

To configure Mozilla Firefox:

  1. Enter about:config in the Location Bar.

  2. In the Search box on the about:config page, enter: network.negotiate-auth.trusted-uris

  3. Under Preference Name, double-click network.negotiate-auth.trusted-uris.

  4. In the Enter string value dialog, enter the hostname or the domain name of the web server that is protected by Kerberos. Separate multiple domains and hostnames with a comma.
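If you prefer to set this preference outside the about:config page, Firefox also reads preferences from a user.js file in the profile directory. The following sketch prints a user_pref line for that file. The function name and the host and domain values are hypothetical; substitute the names of your own Kerberos-protected servers.

```shell
# Print a user.js line that sets network.negotiate-auth.trusted-uris to a
# comma-separated list built from the arguments (hypothetical hosts/domains).
firefox_negotiate_pref() {
  uris="$(printf '%s,' "$@")"
  printf 'user_pref("network.negotiate-auth.trusted-uris", "%s");\n' "${uris%,}"
}

firefox_negotiate_pref .example.com bda1node01.example.com

# To use the line, append it to user.js in the Firefox profile directory
# and restart Firefox.
```

This produces the same setting as steps 1 through 4, which can be convenient when many workstations need the same value.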

To configure Microsoft Internet Explorer:

  1. Configure the Local Intranet Domain:

    1. Open Microsoft Internet Explorer and click the Settings "gear" icon in the top-right corner. Select Internet options.

    2. Select the Security tab.

    3. Select the Local intranet zone and click Sites.

    4. Make sure that the first two options, Include all local (intranet) sites not listed in other zones and Include all sites that bypass the proxy server, are checked.

    5. Click Advanced on the Local intranet dialog box and, one at a time, add the names of the Kerberos-protected domains to the list of websites.

    6. Click Close.

    7. Click OK to save your configuration changes, then click OK again to exit the Internet Options panel.

  2. Configure Intranet Authentication for Microsoft Internet Explorer:

    1. Click the Settings "gear" icon in the top-right corner. Select Internet Options.

    2. Select the Security tab.

    3. Select the Local Intranet zone and click the Custom level... button to open the Security Settings - Local Intranet Zone dialog box.

    4. Scroll down to the User Authentication options and select Automatic logon only in Intranet zone.

    5. Click OK to save your changes.

To configure Google Chrome:

If you are using Microsoft Windows, use the Control Panel to navigate to the Internet Options dialog box. The required configuration changes are the same as those described above for Microsoft Internet Explorer.

On Mac OS (Footnote 4) or on Linux, add the --auth-server-whitelist parameter to the google-chrome command. For example, to run Chrome from a Linux prompt, run the google-chrome command as follows:

 google-chrome --auth-server-whitelist="hostname/domain"

Note:

On Microsoft Windows, the Windows user must be a user in the Kerberos realm and must possess a valid ticket. If these requirements are not met, an HTTP 403 error is returned to the browser when it attempts to access a Kerberos-secured web interface.
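When a browser is refused, it can help to check from a Linux client whether Kerberos authentication itself works before adjusting browser settings. The following sketch uses curl with its --negotiate option to request a page and print only the HTTP status code. It assumes a curl build with SPNEGO (GSS-API) support and a valid ticket obtained with kinit. The function name and URL are placeholders.

```shell
# Request a Kerberos-protected URL with SPNEGO and print the HTTP status code.
# "-u :" tells curl to take the identity from the Kerberos ticket cache.
# Assumption: curl was built with GSS-API/SPNEGO support.
spnego_status() {
  curl --negotiate -u : -s -o /dev/null -w '%{http_code}\n' "$1"
}

# Usage (placeholders for principal, host, and port):
#   kinit <user>@<REALM>
#   klist
#   spnego_status "https://<host>:<port>/"
```

A 200 response suggests that the ticket and the service are working and that any remaining problem is in the browser configuration. A 401 or 403 response points back to the ticket or the realm membership described in the note above.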

2.9 Additional Guidance for Securing Clusters

Use the following resources to learn how to further strengthen cluster security.

Oracle Blogs



Footnote Legend

Footnote 1: Mozilla Firefox is a registered trademark of the Mozilla Foundation.
Footnote 2: Microsoft Internet Explorer is a registered trademark of Microsoft Corporation.
Footnote 3: Google Chrome is a registered trademark of Google Inc.
Footnote 4: Mac OS is a registered trademark of Apple, Inc.