4.4 Hadoop Authorization: File Level Access and Apache Sentry

The ability to access source data is based on both the underlying access privileges on the source files and Hive authorization rules defined by Apache Sentry. To access data in Oracle Big Data SQL external tables, either the default oracle user or the actual connected user (when using Multi-User Authorization) must be authorized to read the source data and/or Hive metadata.

Hadoop file permissions are very similar to POSIX file permissions. Each file and directory has an associated owner and group. In addition to file permissions, HDFS supports Access Control Lists (ACLs), which provide finer-grained authorization for specific users and groups. See Synchronizing HDFS ACLs and Sentry Permissions for information on how to synchronize Sentry privileges with HDFS ACLs for specific HDFS directories.
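The following Python sketch illustrates how owner, group, and other permission bits combine with named ACL entries when HDFS decides whether to grant access. It is a simplified model for illustration only, not HDFS code: the class `HdfsNode`, the function `is_allowed`, and all user and group names are hypothetical, and the evaluation order is an approximation of the one described in the Apache Hadoop HDFS Permissions Guide.

```python
from dataclasses import dataclass, field

# POSIX-style permission bits.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass
class HdfsNode:
    """Hypothetical stand-in for the permission metadata of a file or directory."""
    owner: str
    group: str
    mode: int                                          # for example 0o750
    named_users: dict = field(default_factory=dict)    # user name -> permission bits
    named_groups: dict = field(default_factory=dict)   # group name -> permission bits
    mask: int = 7                                      # ACL mask; 7 filters nothing


def is_allowed(node, user, groups, wanted):
    """Return True if `user`, a member of `groups`, holds all `wanted` bits."""
    owner_bits = (node.mode >> 6) & 7
    group_bits = (node.mode >> 3) & 7
    other_bits = node.mode & 7

    # 1. The owner is checked against the owner bits only.
    if user == node.owner:
        return (owner_bits & wanted) == wanted

    # 2. A named-user ACL entry applies next, filtered by the mask.
    if user in node.named_users:
        return (node.named_users[user] & node.mask & wanted) == wanted

    # 3. The owning group and named-group entries, each filtered by the mask.
    matched_group = False
    if node.group in groups:
        matched_group = True
        if (group_bits & node.mask & wanted) == wanted:
            return True
    for name, bits in node.named_groups.items():
        if name in groups:
            matched_group = True
            if (bits & node.mask & wanted) == wanted:
                return True

    # 4. A group matched but none granted access: deny without using "other".
    if matched_group:
        return False

    # 5. Otherwise fall back to the "other" bits.
    return (other_bits & wanted) == wanted
```

In this model, a directory with mode `0o750` owned by `hive:finance` is unreadable to the `oracle` user until a named-user ACL entry for `oracle` grants read access, which is the kind of situation ACLs are meant to address.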

Apache Sentry is a role-based authorization engine used for Hive metadata. Sentry roles are defined for different data access needs (for example, a finance role or a marketing role). Privileges on objects (a server, Hive database, table, or column) are granted to specific roles, and roles are in turn granted to groups. Users can then view those data objects if a group they belong to has been granted a role with the appropriate privileges.
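As a rough illustration of this role-based model, the Python sketch below resolves a user's groups to roles and then to SELECT privileges at the database, table, or column level. It is not the Sentry API: the dictionaries `GROUP_ROLES` and `ROLE_PRIVILEGES`, the functions `scope_covers` and `can_select`, and all role, group, and object names are hypothetical and exist only to show how a grant at a higher level covers the objects beneath it.

```python
# Hypothetical model of role-based SELECT privileges; not Sentry's API.
# A scope is (database, table, column). None at any level means the grant
# covers everything beneath the level above it.
ROLE_PRIVILEGES = {
    "finance_role": {("sales", "invoices", None)},          # whole table
    "marketing_role": {
        ("sales", "customers", "region"),                   # single columns
        ("sales", "customers", "segment"),
    },
    "admin_role": {(None, None, None)},                     # whole server
}

# Roles are granted to groups, never directly to users.
GROUP_ROLES = {
    "finance": {"finance_role"},
    "marketing": {"marketing_role"},
    "dba": {"admin_role"},
}


def scope_covers(scope, database, table, column):
    """Return True if a granted scope includes the requested column."""
    for granted, requested in zip(scope, (database, table, column)):
        if granted is None:
            return True       # grant at this level covers everything below
        if granted != requested:
            return False
    return True               # exact match down to the column


def can_select(user_groups, database, table, column):
    """Return True if any role held through the user's groups covers the column."""
    roles = set()
    for group in user_groups:
        roles |= GROUP_ROLES.get(group, set())
    return any(
        scope_covers(scope, database, table, column)
        for role in roles
        for scope in ROLE_PRIVILEGES.get(role, set())
    )
```

Here a member of the hypothetical `marketing` group can read `sales.customers.region` but not any other column of that table, while a member of `finance` can read every column of `sales.invoices`.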

Oracle Big Data SQL supports Sentry in addition to file-level authorization. It processes the Sentry policy rules, down to the column level, when a user attempts to query Oracle Big Data SQL external tables. This means that authorization rules do not need to be replicated in Oracle Database. A user may have rights to select from an Oracle external table, but Hadoop authorization allows that user to see the data only if the user also has the appropriate Sentry roles and data access privileges.
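The layering described in this section can be summarized in a short Python sketch. It is a conceptual model under stated assumptions, not Oracle Big Data SQL code: the names `AccessDecision`, `effective_hadoop_user`, and `authorize_query` are hypothetical, the order of the checks is an assumption, and the mapping from the connected database user to a Hadoop user under Multi-User Authorization is simplified to an identity mapping.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AccessDecision:
    hadoop_user: str
    allowed: bool
    reason: str


def effective_hadoop_user(connected_user: str, multi_user_authorization: bool) -> str:
    """Pick the identity that Hadoop authorization is evaluated against.

    Assumption: with Multi-User Authorization the connected user is used
    as-is; otherwise the default `oracle` user is used.
    """
    return connected_user if multi_user_authorization else "oracle"


def authorize_query(
    connected_user: str,
    multi_user_authorization: bool,
    oracle_select_granted: bool,
    sentry_allows: Callable[[str], bool],
    hdfs_allows: Callable[[str], bool],
) -> AccessDecision:
    """Allow the query only if every layer grants access (check order is assumed)."""
    hadoop_user = effective_hadoop_user(connected_user, multi_user_authorization)

    if not oracle_select_granted:
        return AccessDecision(hadoop_user, False, "no SELECT privilege in Oracle Database")
    if not sentry_allows(hadoop_user):
        return AccessDecision(hadoop_user, False, "denied by Sentry policy")
    if not hdfs_allows(hadoop_user):
        return AccessDecision(hadoop_user, False, "denied by HDFS permissions")
    return AccessDecision(hadoop_user, True, "authorized")
```

The point of the sketch is that an Oracle SELECT grant alone is never sufficient: the Hadoop-side checks are evaluated for the effective Hadoop user, so the rules live in one place and do not have to be duplicated in Oracle Database.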