4.4 Hadoop Authorization: File Level Access and Apache Sentry

The ability to access source data is based on both the underlying access privileges on the source files and Hive authorization rules defined by Apache Sentry. To populate Oracle Big Data SQL external tables, either the default oracle user or the actual connected user (when using Multi-User Authorization) must be authorized to read the source data and/or Hive metadata.

Access control for data files in Hadoop closely follows the POSIX permissions model (the model used by Linux). Each file and directory has an associated owner and group, and its permission bits determine who can access it.
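The owner, group, and permission-bit check described above can be sketched as follows. This is a minimal illustration, not Hadoop's implementation: the function name `can_read` and the sample users and groups are hypothetical, and the sketch ignores ACLs, the HDFS superuser, and the execute permission needed to traverse parent directories.

```python
import stat


def can_read(mode, owner, group, user, user_groups):
    """Return True if `user` may read a file with the given permission bits.

    Exactly one permission class applies, checked in this order:
    owner, then group, then other.
    """
    if user == owner:
        return bool(mode & stat.S_IRUSR)
    if group in user_groups:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


# Hypothetical file: mode 640 (rw-r-----), owner "hive", group "finance".
print(can_read(0o640, "hive", "finance", "oracle", {"finance"}))   # group member
print(can_read(0o640, "hive", "finance", "alice", {"marketing"}))  # falls under "other"
```

Because only one class applies, an owner whose owner bits lack read permission is denied even when the group bits would allow the read.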

Apache Sentry is a role-based authorization engine used for Hive metadata. Sentry roles are defined for different data access needs (for example, a finance role or a marketing role). Access to objects (a server, Hive database, table, or column) is granted to specific roles. Users can then view those data objects if a role with the appropriate rights has been granted to their group.
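The relationship between groups, roles, and objects can be sketched as a small in-memory model. This is an illustration of the concept only, not Sentry's API or storage format: the `Privilege` class, the two lookup tables, and all role, group, database, and table names are hypothetical, and server-level grants and actions other than SELECT are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Privilege:
    """A grant on a database, optionally narrowed to a table or a column."""
    action: str
    database: str
    table: Optional[str] = None    # None means every table in the database
    column: Optional[str] = None   # None means every column in the table

    def covers(self, action, database, table, column):
        return (
            self.action == action
            and self.database == database
            and self.table in (None, table)
            and self.column in (None, column)
        )


# Hypothetical policy: privileges are granted to roles, roles to groups.
ROLE_PRIVILEGES = {
    "finance_role": {Privilege("SELECT", "sales", "orders")},
    "marketing_role": {Privilege("SELECT", "sales", "customers", "region")},
}
GROUP_ROLES = {
    "finance": {"finance_role"},
    "marketing": {"marketing_role"},
}


def is_authorized(user_groups, action, database, table, column):
    """Return True if any role held by the user's groups covers the request."""
    roles = set()
    for group in user_groups:
        roles |= GROUP_ROLES.get(group, set())
    return any(
        privilege.covers(action, database, table, column)
        for role in roles
        for privilege in ROLE_PRIVILEGES.get(role, ())
    )
```

In this sketch a table-level grant covers every column of that table, while a column-level grant covers only the named column, which mirrors the object hierarchy described above.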

Oracle Big Data SQL supports Sentry in addition to supporting file-level authorization. It processes the Sentry policy rules, down to the column level, when a user attempts to query Oracle Big Data SQL external tables. This means that authorization rules do not need to be replicated in Oracle Database. A user may have rights to select from an Oracle external table; however, Hadoop authorization allows the user to see the data only if that user also has the appropriate Sentry roles and data access privileges.
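The layering described in this section can be summarized in a short sketch. It is a simplification under stated assumptions, not Oracle's implementation: both function names and their boolean parameters are hypothetical, and each flag stands in for a check that the database or Hadoop actually performs.

```python
def effective_hadoop_user(connected_user, multi_user_authorization):
    """Pick the identity that Hadoop authorizes.

    With Multi-User Authorization the connected user is checked;
    otherwise the default "oracle" user is checked.
    """
    return connected_user if multi_user_authorization else "oracle"


def can_see_data(has_oracle_select, sentry_allows, file_readable):
    """Data is visible only if every layer grants access.

    has_oracle_select: SELECT right on the Oracle external table
    sentry_allows:     Sentry roles cover the Hive objects queried
    file_readable:     file permissions allow reading the source files
    """
    return has_oracle_select and sentry_allows and file_readable
```

For example, a user with SELECT on the external table but no matching Sentry role gets `can_see_data(True, False, True)`, which evaluates to `False`.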