Security Overview

Learn about Oracle Cloud SQL security.

Leveraging Hadoop Security

On Oracle Big Data Service clusters where advanced security is not enabled, queries executed using Cloud SQL run as the oracle user on the Hadoop cluster. In this default configuration, all Hadoop audits show that the oracle user accessed the files.

Cloud SQL provides a feature called Multi-User Authorization that enables it to impersonate the connected user when accessing data on the Hadoop cluster. With Multi-User Authorization, the oracle identity is no longer used to authorize data access. Instead, the identity of the actual connected user receives authorization. Additionally, Hadoop audits attribute file access to the connected user, rather than to oracle.

Users and applications can connect to Oracle Database in several ways, including:
  • As a Kerberos user
  • As a database user
  • As an LDAP user
  • As an application user

Multi-User Authorization allows the administrator to specify how this connected user should be derived. Alternatively, applications that manage their own users may use the Oracle Database client identifier to derive the currently connected user (and use that user's identity to authorize access to data on the Hadoop cluster). Cloud SQL provides a mapping that contains the rules for identifying the actual user.
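The idea of a rule-based mapping from a database session to a Hadoop identity can be sketched as follows. This is a conceptual illustration only, not Cloud SQL's actual API or rule format: the Session class, the RULES table, the database user names, and derive_connected_user are all hypothetical, and stripping the Kerberos realm is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Session:
    """Hypothetical snapshot of the identity attributes of a database session."""
    database_user: str
    client_identifier: Optional[str] = None
    kerberos_principal: Optional[str] = None


# Hypothetical rule table: for each database user, which session attribute
# supplies the identity that should be used on the Hadoop cluster.
RULES: Dict[str, str] = {
    "APP_POOL": "client_identifier",    # application manages its own users
    "ANALYST": "kerberos_principal",    # Kerberos-authenticated connection
    "REPORTING": "database_user",       # plain database user
}


def derive_connected_user(session: Session,
                          rules: Dict[str, str] = RULES) -> Optional[str]:
    """Return the identity to impersonate, or None if no rule yields one."""
    attribute = rules.get(session.database_user)
    if attribute is None:
        return None  # no rule for this database user
    value = getattr(session, attribute)
    if not value:
        return None  # rule matched, but the attribute is not set
    # Assumption for this sketch: drop the realm from "user@REALM".
    return value.split("@", 1)[0]
```

For example, a session opened by the shared APP_POOL account with the client identifier set to alice would resolve to alice, so authorization and audit entries on the cluster would refer to alice rather than to the shared account.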

Hadoop Authorization: File-Level Access and Apache Sentry

Access to source data depends on both the underlying access privileges on the source files and, in clusters using Cloudera Distribution including Hadoop, the Hive authorization rules defined by Apache Sentry. To populate Cloud SQL external tables, either the default oracle user or the actual connected user (when using Multi-User Authorization) must be authorized to read the source data or the Hive metadata.

Access control for data files in Hadoop closely follows the POSIX permissions model (the model used by Linux). Each file and directory has an associated owner and group. The file and directory permission bits determine who can access that file or directory.
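A minimal sketch of how POSIX-style permission bits decide read access is shown here. It assumes the basic owner/group/other model only; it ignores superusers and access control lists, and the can_read function and the example user and group names are hypothetical.

```python
import stat
from typing import Set


def can_read(mode: int, owner: str, group: str,
             user: str, user_groups: Set[str]) -> bool:
    """Decide read access from permission bits, POSIX style.

    Exactly one permission class applies: owner first, then group,
    then other. The classes are not combined.
    """
    if user == owner:
        return bool(mode & stat.S_IRUSR)   # owner read bit
    if group in user_groups:
        return bool(mode & stat.S_IRGRP)   # group read bit
    return bool(mode & stat.S_IROTH)       # other read bit


# A file with mode 640 (rw-r-----), owned by oracle, group finance.
MODE = 0o640
OWNER_OK = can_read(MODE, "oracle", "finance", "oracle", {"oinstall"})
GROUP_OK = can_read(MODE, "oracle", "finance", "alice", {"finance"})
OTHER_OK = can_read(MODE, "oracle", "finance", "bob", {"marketing"})
```

With these inputs, the owner and a member of the finance group can read the file, while a user who belongs only to the marketing group cannot.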

In clusters using Cloudera Distribution including Hadoop, Apache Sentry is a role-based authorization engine used for Hive metadata. Sentry roles are defined for different data access needs (for example, a finance role or a marketing role). Access to objects (a server, Hive database, table, or column) is granted to specific roles. Users can then view those data objects if a role granted to one of their groups has the appropriate privileges.
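The chain from user groups to roles to object privileges can be illustrated with a small model. This is a simplified sketch, not Sentry's implementation or policy syntax: the group, role, database, table, and column names are hypothetical, and it models only SELECT access at the database, table, and column levels.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# A scope is (database, table, column). None means "everything below
# this level", so ("sales", None, None) covers the whole database.
Scope = Tuple[str, Optional[str], Optional[str]]

# Hypothetical policy: groups are granted roles, roles are granted scopes.
GROUP_ROLES: Dict[str, Set[str]] = {
    "finance_grp": {"finance_role"},
    "marketing_grp": {"marketing_role"},
}
ROLE_SCOPES: Dict[str, Set[Scope]] = {
    "finance_role": {("sales", "orders", None)},        # whole table
    "marketing_role": {("sales", "orders", "region")},  # one column only
}


def covers(scope: Scope, database: str, table: str, column: str) -> bool:
    """Return True if a granted scope includes the requested column."""
    s_db, s_table, s_column = scope
    if s_db != database:
        return False
    if s_table is None:
        return True
    if s_table != table:
        return False
    return s_column is None or s_column == column


def can_select(user_groups: Iterable[str], database: str,
               table: str, column: str) -> bool:
    """Check whether any role reachable through the user's groups grants access."""
    roles: Set[str] = set()
    for group in user_groups:
        roles |= GROUP_ROLES.get(group, set())
    return any(covers(scope, database, table, column)
               for role in roles
               for scope in ROLE_SCOPES.get(role, set()))
```

In this model, a member of finance_grp can select any column of sales.orders, while a member of marketing_grp can select only the region column.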

In clusters using Cloudera Distribution including Hadoop, Cloud SQL supports Sentry in addition to file-level authorization. It processes the Sentry policy rules, down to the column level, when a user attempts to query Cloud SQL external tables. This means that authorization rules don't need to be replicated in Oracle Database. A user may have rights to select from an Oracle external table. However, Hadoop authorization only allows the user to see the data if that user has the appropriate Sentry roles and data access privileges.