
Does Sentry control access to HDFS files for clients using the HDFS protocol?


The Apache Sentry docs describe Sentry as follows:

Sentry currently works out of the box with Apache Hive, Hive Metastore/HCatalog, Apache Solr, Impala and HDFS (limited to Hive table data).

The docs also include an image suggesting that applications accessing HDFS directly do not go through Sentry and instead rely on the file ACLs. Is my understanding correct?



Solution

  • Your understanding of the documentation is correct.

    For example, with Hive, the data for managed tables is stored under the HDFS path /user/hive/warehouse (by default), and when Sentry is enabled those files are owned by hive:hive (user:group). As a result, other users cannot access the files under these directories unless they are authorised through Sentry rules.

    Data that lives outside this default Hive path, typically the data for external tables and any other data in HDFS, can still be accessed normally, bypassing Sentry, because Sentry does not manage it by default.

    So, if you want to write data into one of these directories through a Hive query on a Sentry-enabled cluster, you need to grant the required privileges to the role assigned to the group that the user running the query belongs to.

    Hope this helps!

    More about Sentry rules can be found in the Sentry documentation.
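To make the split described in the answer concrete, here is a minimal, simplified model of which mechanism decides access for a client reading HDFS directly: paths under the Hive warehouse are governed by Sentry, and everything else falls back to ordinary HDFS permissions/ACLs. It assumes the only Sentry-managed prefix is the default /user/hive/warehouse; the constant SENTRY_MANAGED_PREFIXES and both function names are hypothetical illustrations, not part of any Sentry or Hadoop API.

```python
from pathlib import PurePosixPath

# Hypothetical assumption: Sentry manages only the default Hive warehouse
# prefix. A real cluster may be configured with different prefixes.
SENTRY_MANAGED_PREFIXES = [PurePosixPath("/user/hive/warehouse")]


def is_sentry_managed(hdfs_path: str) -> bool:
    """Return True if the path is a managed prefix itself or lies beneath one.

    Comparing whole path components (via .parents) avoids false matches
    such as /user/hive/warehouse2.
    """
    path = PurePosixPath(hdfs_path)
    return any(path == prefix or prefix in path.parents
               for prefix in SENTRY_MANAGED_PREFIXES)


def access_control_for(hdfs_path: str) -> str:
    """Describe which mechanism decides access for a direct HDFS client."""
    if is_sentry_managed(hdfs_path):
        return "Sentry rules (files owned by hive:hive)"
    return "plain HDFS permissions/ACLs (Sentry bypassed)"


if __name__ == "__main__":
    for p in ["/user/hive/warehouse/sales.db/orders",
              "/data/external/orders",
              "/user/hive/warehouse2/tmp"]:
        print(f"{p}: {access_control_for(p)}")
```

Matching on path components rather than raw string prefixes is the main design choice here; a plain `startswith` check would wrongly treat /user/hive/warehouse2 as Sentry-managed.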
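The grant described in the answer (privilege to a role, role to the user's group) can be sketched as a small helper that builds the Sentry SQL statements an administrator might run, for example in beeline. The role, group, database, and table names are hypothetical placeholders, and the helper only produces text; it does not connect to Hive or Sentry.

```python
import re

# Conservative identifier check so user input cannot inject extra SQL.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Hive-side Sentry privileges this sketch allows on a table.
_ALLOWED_PRIVILEGES = {"SELECT", "INSERT", "ALL"}


def sentry_grant_statements(role: str, group: str, database: str,
                            table: str, privilege: str = "INSERT") -> list[str]:
    """Build statements letting members of `group` act on `database.table`.

    All names are hypothetical examples; run the output as an admin user.
    """
    for name in (role, group, database, table):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
    priv = privilege.upper()
    if priv not in _ALLOWED_PRIVILEGES:
        raise ValueError(f"unsupported privilege: {privilege!r}")
    return [
        f"CREATE ROLE {role};",                      # role that holds the privilege
        f"GRANT ROLE {role} TO GROUP {group};",      # Sentry maps roles to groups
        f"GRANT {priv} ON TABLE {database}.{table} TO ROLE {role};",
    ]


if __name__ == "__main__":
    for stmt in sentry_grant_statements("etl_writer", "etl", "sales", "orders"):
        print(stmt)
```

Granting to a role and attaching the role to a group, rather than granting to an individual user, mirrors the role/group model the answer describes.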