2 Security for Oracle Big Data Appliance
Oracle Big Data Appliance development focuses on delivering an engineered system that is highly secure. This spans all aspects of the product: strong authentication (Kerberos), authorization, network encryption, encryption for data at rest, auditing and lineage/impact analysis.
2.1 Overview
You can take the precautions described in this section to thwart unauthorized use of the software and data on Oracle Big Data Appliance:
- About Predefined Users and Groups
- About User Authentication
- About Fine-Grained Authorization
- About HDFS Transparent Encryption
- About HTTPS/Network Encryption
- Port Numbers Used on Oracle Big Data Appliance
- About Puppet Security
- Additional Guidance for Securing Clusters
See Also:
Oracle Big Data Appliance development abides by Oracle's comprehensive OSSA (Oracle Software Security Assurance) standards.
https://www.oracle.com/corporate/security-practices/assurance/
2.2 About Predefined Users and Groups
Every open-source package installed on Oracle Big Data Appliance creates one or more users and groups. Most of these users do not have login privileges, shells, or home directories. They are used by daemons and are not intended as an interface for individual users. For example, Hadoop operates as the hdfs user, MapReduce operates as mapred, and Hive operates as hive.
You can use the oracle identity to run Hadoop and Hive jobs immediately after the Oracle Big Data Appliance software is installed. This user account has login privileges, a shell, and a home directory.
Oracle NoSQL Database and Oracle Data Integrator run as the oracle user. Its primary group is oinstall.
Note:
Do not delete, re-create, or modify the users that are created during installation, because they are required for the software to operate.
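One way to confirm which accounts can log in is to inspect the shell field of /etc/passwd. The following is a minimal Python sketch, not part of the Oracle tooling. The sample entries (UIDs, GIDs, home directories, and shells) are made-up placeholders, and the set of non-login shells is an assumption about how the service accounts are configured.

```python
# Shells that conventionally deny interactive login (an assumption; adjust
# to match the shells actually configured on your servers).
NON_LOGIN_SHELLS = {"/sbin/nologin", "/usr/sbin/nologin", "/bin/false"}


def parse_passwd(text):
    """Parse passwd-format text into {user: {"uid", "gid", "home", "shell"}}."""
    accounts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:
            continue  # skip malformed entries
        name, _password, uid, gid, _gecos, home, shell = fields
        accounts[name] = {
            "uid": int(uid),
            "gid": int(gid),
            "home": home,
            "shell": shell,
        }
    return accounts


def has_login_shell(account):
    """Return True if the account's shell permits an interactive login."""
    return account["shell"] not in NON_LOGIN_SHELLS


# Hypothetical sample data; real values come from /etc/passwd on a server.
SAMPLE_PASSWD = """\
hdfs:x:1001:1001:Hadoop HDFS:/var/lib/hadoop-hdfs:/sbin/nologin
hive:x:1002:1002:Hive:/var/lib/hive:/sbin/nologin
oracle:x:1000:1000:Oracle:/home/oracle:/bin/bash
"""

if __name__ == "__main__":
    for user, account in parse_passwd(SAMPLE_PASSWD).items():
        print(user, "login" if has_login_shell(account) else "no login")
```

To check a real server, pass the contents of /etc/passwd to parse_passwd instead of the sample text. The sketch only reads account data, which is consistent with the note above about not modifying the predefined users.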
The following table identifies the operating system users and groups that are created automatically during installation of Oracle Big Data Appliance software for use by CDH components and other software packages.
Table 2-1 Operating System Users and Groups
User Name | Group | Used By | Login Rights |
---|---|---|---|
 |  | Apache Flume parent and nodes | No |
 |  | Apache HBase processes | No |
 |  |  | No |
 |  |  | No |
 |  | Hue processes | No |
 |  | ResourceManager, NodeManager, Hive Thrift daemon | Yes |
 |  |  | Yes |
 |  | Oozie server | No |
 |  | Oracle NoSQL Database, Oracle Loader for Hadoop, Oracle Data Integrator, and the Oracle DBA | Yes |
 |  | Puppet parent (puppet nodes run as root) | No |
 |  | Apache Sqoop metastore | No |
 |  | Auto Service Request | No |
 |  | ZooKeeper processes | No |
2.3 About User Authentication
Oracle Big Data Appliance supports Kerberos security as a software installation option. See Supporting User Access to Oracle Big Data Appliance for details about setting up clients and users to access a Kerberos-protected cluster.
2.4 About Fine-Grained Authorization
The typical authorization model on Hadoop is at the HDFS file level, such that users either have access to all of the data in the file or none. In contrast, Apache Sentry integrates with the Hive and Impala SQL-query engines to provide fine-grained authorization to data and metadata stored in Hadoop.
Oracle Big Data Appliance automatically configures Sentry during software installation, beginning with Mammoth utility version 2.5.
See Also:
- Cloudera Manager Help
- How to Add or Remove Sentry on Oracle Big Data Appliance v4.2 or Higher with bdacli (Doc ID 2052733.1) on My Oracle Support.
2.5 About HDFS Transparent Encryption
HDFS Transparent Encryption protects Hadoop data that is at rest on disk. After HDFS Transparent Encryption is enabled for a cluster on Oracle Big Data Appliance, data writes and reads to encrypted zones (HDFS directories) on the disk are automatically encrypted and decrypted. This process is “transparent” because it is invisible to the application working with the data.
HDFS Transparent Encryption does not affect user access to Hadoop data, although it can have a minor impact on performance.
HDFS Transparent Encryption is an option that you can select during the initial installation of the software by the Mammoth utility. You can also enable or disable HDFS Transparent Encryption at any time by using the bdacli utility. Note that HDFS Transparent Encryption can be installed only on a Kerberos-secured cluster.
Oracle recommends that you set up the Navigator Key Trustee (the service that manages keys and certificates) on a separate server, external to the Oracle Big Data Appliance.
See the following MOS documents at My Oracle Support for instructions on installing and enabling HDFS Transparent Encryption.
Title | MOS Doc ID |
---|---|
How to Setup Highly Available Active and Passive Key Trustee Servers on BDA V4.4 Using 5.5 Parcels | 2112644.1. Installing using parcels as described in this MOS document is recommended over package-based installation. See Cloudera’s comments on Parcels. |
How to Enable/Disable HDFS Transparent Encryption on Oracle Big Data Appliance V4.4 with bdacli | 2111343.1 |
How to Create Encryption Zones on HDFS on Oracle Big Data Appliance V4.4 | 2111829.1 |
Note:
If either HDFS Transparent Encryption or Kerberos is disabled, data stored in the HDFS Transparent Encryption zones in the cluster will remain encrypted and therefore inaccessible. To restore access to the data, re-enable HDFS Transparent Encryption using the same key provider.
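The MOS documents listed above remain the authoritative procedure on the appliance. For orientation only, the following Python sketch assembles the standard Apache Hadoop commands that create a key, an empty directory, and an encryption zone on that directory. The key name and path are placeholders, and the script only builds and prints the commands; it does not run them.

```python
import shlex


def encryption_zone_commands(key_name, zone_path):
    """Return the Hadoop CLI commands, as argument lists, that create an
    encryption key, an empty directory, and an encryption zone on it."""
    return [
        # Create the zone key in the configured key provider.
        ["hadoop", "key", "create", key_name],
        # The zone root must be an existing, empty HDFS directory.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # Turn the directory into an encryption zone that uses the key.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        # List zones to confirm the result.
        ["hdfs", "crypto", "-listZones"],
    ]


if __name__ == "__main__":
    # "mykey" and "/data/secure" are placeholder names for illustration.
    for argv in encryption_zone_commands("mykey", "/data/secure"):
        print(shlex.join(argv))
```

On a real cluster, creating and listing zones generally requires HDFS superuser privileges, and key creation requires the appropriate rights in the key provider, so who runs each command depends on how the cluster is set up.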
See Also:
Cloudera documentation about HDFS at-rest encryption at http://www.cloudera.com for more information about managing files in encrypted zones.
2.6 About HTTPS/Network Encryption
HTTPS/Network Encryption on Oracle Big Data Appliance has two components:
- Web Interface Encryption
  Configures HTTPS for the following web interfaces: Cloudera Manager, Oozie, and Hue. This encryption is enabled automatically in new Mammoth installations. For existing installations, it can be enabled with the bdacli utility. This feature does not require Kerberos to be enabled.
- Encryption for Data in Transit and Services
  This feature has two subcomponents. Both are options that can be enabled in the Configuration Utility at installation time, or enabled or disabled with the bdacli utility at any time. Both require Kerberos to be enabled.
  - Encrypt Hadoop Services
    This includes SSL encryption for the HDFS, MapReduce, and YARN web interfaces, as well as encrypted shuffle for MapReduce and YARN. It also enables authentication for access to the web consoles for the MapReduce and YARN roles.
  - Encrypt HDFS Data Transport
    This option enables encryption of data transferred between DataNodes and clients, and among DataNodes.
HTTPS/Network Encryption is enabled and disabled on a per-cluster basis. The Configuration Utility, described in the Oracle Big Data Appliance Owner’s Guide, includes settings for enabling encryption for Hadoop Services and HDFS Data Transport when a cluster is created. The bdacli utility reference pages (also in the Oracle Big Data Appliance Owner’s Guide) describe the HTTPS/Network Encryption command-line options.
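The Kerberos prerequisites stated in this chapter can be summarized as a small lookup table. The following Python sketch is illustrative only: the option names are hypothetical labels chosen for this example and are not bdacli or Configuration Utility parameters.

```python
# Whether each option requires Kerberos, as stated in this chapter.
# The keys are hypothetical labels, not real bdacli or Mammoth parameters.
REQUIRES_KERBEROS = {
    "web_interface_encryption": False,    # Cloudera Manager, Oozie, Hue
    "encrypt_hadoop_services": True,      # HDFS, MapReduce, YARN web UIs and shuffle
    "encrypt_hdfs_data_transport": True,  # DataNode-to-client and DataNode-to-DataNode
    "hdfs_transparent_encryption": True,  # data at rest (section 2.5)
}


def check_plan(enabled_options, kerberos_enabled):
    """Return, sorted, the requested options that need Kerberos when the
    cluster does not have Kerberos enabled."""
    unknown = set(enabled_options) - set(REQUIRES_KERBEROS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    return sorted(
        option
        for option in enabled_options
        if REQUIRES_KERBEROS[option] and not kerberos_enabled
    )


if __name__ == "__main__":
    plan = ["web_interface_encryption", "encrypt_hadoop_services"]
    print(check_plan(plan, kerberos_enabled=False))
```

An empty result means every requested option is compatible with the cluster's Kerberos setting. A non-empty result lists the options to defer until Kerberos is enabled.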
See Also:
Supporting User Access to Oracle Big Data Appliance for an overview of how Kerberos is used to secure CDH clusters.
About HDFS Transparent Encryption for information about Oracle Big Data Appliance security for Hadoop data at-rest.
Cloudera documentation at http://www.cloudera.com for more information about HTTPS communication in Cloudera Manager and network-level encryption in CDH.
2.6.1 Configuring Web Browsers to use Kerberos Authentication
If web interface encryption is enabled, each web browser accessing an HDFS, MapReduce, or YARN-encrypted web interface must be configured to authenticate with Kerberos. Note that this is not necessary for the Cloudera Manager, Oozie, and Hue web interfaces, which do not require Kerberos.
The following are the steps to configure Mozilla Firefox (Footnote 1), Microsoft Internet Explorer (Footnote 2), and Google Chrome (Footnote 3) for Kerberos authentication.
To configure Mozilla Firefox:
- Enter about:config in the Location Bar.
- In the Search box on the about:config page, enter: network.negotiate-auth.trusted-uris
- Under Preference Name, double-click network.negotiate-auth.trusted-uris.
- In the Enter string value dialog, enter the hostname or the domain name of the web server that is protected by Kerberos. Separate multiple domains and hostnames with a comma.
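Firefox also reads preferences from a user.js file in the profile directory, which can be convenient when many workstations need the same setting. The following Python sketch generates the corresponding line. The hostnames are placeholders, and distributing the file to each profile is left to whatever mechanism your site already uses.

```python
import json


def trusted_uris_pref(hosts):
    """Build a Firefox user.js line that sets
    network.negotiate-auth.trusted-uris to a comma-separated host list."""
    cleaned = [host.strip() for host in hosts if host.strip()]
    if not cleaned:
        raise ValueError("at least one hostname or domain is required")
    value = ",".join(cleaned)
    # json.dumps yields a double-quoted, escaped string literal.
    return f'user_pref("network.negotiate-auth.trusted-uris", {json.dumps(value)});'


if __name__ == "__main__":
    # Placeholder names; use the Kerberos-protected hosts or domains
    # of your own cluster.
    print(trusted_uris_pref(["bda1node01.example.com", ".example.com"]))
```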
To configure Microsoft Internet Explorer:
- Configure the Local Intranet Domain:
  - Open Microsoft Internet Explorer and click the Settings "gear" icon in the top-right corner. Select Internet options.
  - Select the Security tab.
  - Select the Local intranet zone and click Sites.
  - Make sure that the first two options, Include all local (intranet) sites not listed in other zones and Include all sites that bypass the proxy server, are checked.
  - Click Advanced on the Local intranet dialog box and, one at a time, add the names of the Kerberos-protected domains to the list of websites.
  - Click Close.
  - Click OK to save your configuration changes, then click OK again to exit the Internet Options panel.
- Configure Intranet Authentication for Microsoft Internet Explorer:
  - Click the Settings "gear" icon in the top-right corner. Select Internet Options.
  - Select the Security tab.
  - Select the Local Intranet zone and click the Custom level... button to open the Security Settings - Local Intranet Zone dialog box.
  - Scroll down to the User Authentication options and select Automatic logon only in Intranet zone.
  - Click OK to save your changes.
To configure Google Chrome:
If you are using Microsoft Windows, use the Control Panel to navigate to the Internet Options dialog box. The required configuration changes are the same as those described above for Microsoft Internet Explorer.
On Mac OS (Footnote 4) or on Linux, add the --auth-server-whitelist parameter to the google-chrome command. For example, to run Chrome from a Linux prompt, run the google-chrome command as follows:
google-chrome --auth-server-whitelist="hostname/domain"
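When several hosts or domains are involved, it can help to assemble the command programmatically so that quoting is handled consistently. The following Python sketch builds the argument list. The flag name is the one given in this section and is exposed as a parameter because it may differ between Chrome versions; the server patterns are placeholders.

```python
import shlex


def chrome_kerberos_command(servers, binary="google-chrome",
                            flag="--auth-server-whitelist"):
    """Return an argument list that starts Chrome with the given servers
    permitted for Kerberos (Negotiate) authentication."""
    cleaned = [server.strip() for server in servers if server.strip()]
    if not cleaned:
        raise ValueError("at least one hostname or domain is required")
    # The flag takes a single comma-separated value with no spaces
    # around the equals sign.
    return [binary, f"{flag}={','.join(cleaned)}"]


if __name__ == "__main__":
    # Placeholder pattern; substitute the Kerberos-protected hosts
    # of your own cluster.
    argv = chrome_kerberos_command(["*.example.com"])
    print(shlex.join(argv))  # shell-safe form of the command
```

Passing the list to subprocess.run(argv) starts the browser without involving a shell, which avoids quoting problems with wildcard patterns.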
Note:
On Microsoft Windows, the Windows user must be a user in the Kerberos realm and must possess a valid ticket. If these requirements are not met, an HTTP 403 is returned to the browser upon an attempt to access a Kerberos-secured web interface.
2.7 About Puppet Security
The puppet node service (puppetd) runs continuously as root on all servers. It listens on port 8139 for "kick" requests, which trigger it to request updates from the puppet master. It does not receive updates on this port.
The puppet master service (puppetmasterd) runs continuously as the puppet user on the first server of the primary Oracle Big Data Appliance rack. It listens on port 8140 for requests to push updates to puppet nodes.
The puppet nodes generate and send certificates to the puppet master to register initially during installation of the software. For updates to the software, the puppet master signals ("kicks") the puppet nodes, which then request all configuration changes from the puppet master node that they are registered with.
The puppet master sends updates only to puppet nodes that have known, valid certificates. Puppet nodes only accept updates from the puppet master host name they initially registered with. Because Oracle Big Data Appliance uses an internal network for communication within the rack, the puppet master host name resolves using /etc/hosts to an internal, private IP address.
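The last point can be checked by looking up the puppet master host name in /etc/hosts and testing whether the mapped address is private. The following Python sketch does this on hosts-file text. The host names and addresses in the sample are invented for illustration, and the check relies on Python's ipaddress module, whose notion of "private" is broader than the RFC 1918 ranges (it also includes loopback, for example).

```python
import ipaddress


def resolve_from_hosts(hosts_text, hostname):
    """Return the first address mapped to hostname in hosts-file text,
    or None if the name is not listed."""
    for line in hosts_text.splitlines():
        entry = line.split("#", 1)[0].split()  # drop comments, split fields
        if len(entry) >= 2 and hostname in entry[1:]:
            return entry[0]
    return None


def resolves_to_private_address(hosts_text, hostname):
    """Return True if hostname maps to a private address in the hosts text."""
    address = resolve_from_hosts(hosts_text, hostname)
    return address is not None and ipaddress.ip_address(address).is_private


# Hypothetical sample; on a server, read the text from /etc/hosts instead.
SAMPLE_HOSTS = """\
127.0.0.1      localhost
192.168.10.1   bda1node01-master   # invented internal name and address
8.8.8.8        publichost          # invented entry with a public address
"""

if __name__ == "__main__":
    for name in ("bda1node01-master", "publichost", "missinghost"):
        print(name, resolves_to_private_address(SAMPLE_HOSTS, name))
```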
2.8 Port Numbers Used on Oracle Big Data Appliance
The following table identifies the port numbers that might be used in addition to those used by CDH.
To view the ports used on a particular server:
- In Cloudera Manager, click the Hosts tab at the top of the page to display the Hosts page.
- In the Name column, click a server link to see its detail page.
- Scroll down to the Ports section.
See Also:
For the full list of CDH component port numbers, go to the Cloudera website at
https://www.cloudera.com/documentation/enterprise/6/6.1/topics/cdh_ports.html#cdh_ports
Table 2-2 Oracle Big Data Appliance Port Numbers
Service | Port |
---|---|
 | 30920 |
 | 3306 |
 | 20910 |
Oracle NoSQL Database administration | 5001 |
 | 5010 to 5020 |
Oracle NoSQL Database registration | 5000 |
 | 111 |
Puppet master service | 8140 |
Puppet node service | 8139 |
 | 668 |
 | 22 |
 | 6481 |
Key Management Server (when hosted on the appliance) | 16000 |
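As a complement to the Cloudera Manager view, a short script can test whether a port accepts TCP connections from a given client. The following Python sketch is a generic connectivity probe, not an Oracle-supplied tool. It covers only the services that this chapter names explicitly, and the host name in the example is a placeholder.

```python
import socket

# Ports for the services named in this chapter. The puppet master port
# comes from section 2.7; the others come from Table 2-2.
BDA_PORTS = {
    "Oracle NoSQL Database registration": 5000,
    "Oracle NoSQL Database administration": 5001,
    "Puppet node service": 8139,
    "Puppet master service": 8140,
    "Key Management Server": 16000,
}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def scan(host, ports=BDA_PORTS):
    """Map each service name to whether its port accepts TCP connections."""
    return {service: is_port_open(host, port) for service, port in ports.items()}


if __name__ == "__main__":
    # "bda1node01.example.com" is a placeholder host name.
    for service, reachable in scan("bda1node01.example.com").items():
        print(f"{service}: {'open' if reachable else 'closed or filtered'}")
```

A successful connection shows only that something is listening on the port, not which service it is. Run such probes only against systems you are authorized to test.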
2.9 Additional Guidance for Securing Clusters
Use the following resources to learn how to further strengthen cluster security.
Oracle Blogs
Footnote Legend
Footnote 1: Mozilla Firefox is a registered trademark of the Mozilla Foundation.
Footnote 2: Microsoft Internet Explorer is a registered trademark of Microsoft Corporation.
Footnote 3: Google Chrome is a registered trademark of Google Inc.
Footnote 4: Mac OS is a registered trademark of Apple, Inc.