

Introducing File and Folder ACLs for Azure Data Lake Store

Overview

We’re excited today to announce the availability of File and Folder ACLs for the Azure Data Lake Store. Many of you have been eagerly awaiting this feature because it is critical in securing your big data.

When we launched the preview of Data Lake Store in October 2015, filesystem security was controlled by a single ACL at the root of the store that applied to all files and folders underneath.

Starting today, ACLs can be set on any file or folder within the store, not just the root folder.

The Access Control Model used by Data Lake Store

We’ve emphasized that Azure Data Lake Store is compatible with WebHDFS. Now that ACLs are fully available, it’s important to understand the ACL model in WebHDFS/HDFS, because its ACLs are POSIX-style, not Windows-style. Before we dive into the details of the ACL model, here are the key points to remember.

  • POSIX-STYLE ACLs DO NOT ALLOW INHERITANCE. For those of you familiar with POSIX ACLs, this is not a surprise. For those coming from a Windows background, this is very important to keep in mind. For example, if Alice can read files in folder /foo, it does not mean that she can read files in /foo/bar. She must be granted explicit permission to /foo/bar. The POSIX ACL model is different in some other interesting ways, but this lack of inheritance is the most important thing to keep in mind. A short sketch illustrating this follows the list.
  • ADDING A NEW USER TO DATA LAKE ANALYTICS REQUIRES A FEW NEW STEPS. Fortunately, a portal wizard automates the most difficult steps for you.
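
To make the lack of inheritance concrete, here is a minimal sketch using the azure-datalake-store Python SDK. The store name, tenant/client credentials, and Alice’s AAD object ID are placeholders, and the ACL helper modify_acl_entries is assumed from current versions of that SDK; treat this as an illustration rather than official tooling.

```python
# Minimal sketch of POSIX-style (non-inherited) ACLs on Data Lake Store.
# All names, GUIDs, and credentials below are placeholders.
from azure.datalake.store import core, lib

token = lib.auth(tenant_id="<tenant-id>",
                 client_id="<client-id>",
                 client_secret="<client-secret>")
adl = core.AzureDLFileSystem(token, store_name="<store-name>")

alice = "<alice-aad-object-id>"  # AAD object ID (GUID), not a display name

# Granting Alice r-x on /foo does NOT give her access to /foo/bar.
adl.modify_acl_entries("/foo", acl_spec=f"user:{alice}:r-x")

# She needs an explicit entry on every folder (and file) she must reach.
adl.modify_acl_entries("/foo/bar", acl_spec=f"user:{alice}:r-x")
```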

The FULL DESCRIPTION of the Access Control model is here: https://azure.microsoft.com/en-us/documentation/articles/data-lake-store-access-control/

Adding a New Data Lake Analytics User

If you want a new user to run U-SQL jobs in ADLA, the overall steps are shown below:

  1. Assign the user to a role in the Azure Data Lake Analytics account (using Azure RBAC)
  2. *Optional* Assign the user to a role in the Azure Data Lake Store account (using Azure RBAC)
  3. Run the ADLA “Add User Wizard” for the user
  4. Give the user R-X access, recursively, on every folder (and its subfolders) from which U-SQL jobs must read data
  5. Give the user RWX access, recursively, on every folder (and its subfolders) to which U-SQL jobs must write data (a sketch of these two steps follows this list)
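
Steps 4 and 5 amount to walking each folder tree and adding an ACL entry at every level. Here is a hedged sketch, reusing the adl filesystem client from the earlier example; the grant_recursive helper and the /data/... paths are purely illustrative, and the ls/modify_acl_entries calls are assumed from current versions of the azure-datalake-store SDK.

```python
def grant_recursive(adl, path, acl_spec):
    """Apply an ACL entry to `path` and everything beneath it.

    `acl_spec` looks like "user:<object-id>:r-x" or "user:<object-id>:rwx".
    Assumes adl.ls(..., detail=True) returns dicts with 'name' and 'type',
    and that adl.modify_acl_entries(path, acl_spec) exists in this SDK version.
    """
    adl.modify_acl_entries(path, acl_spec=acl_spec)
    for entry in adl.ls(path, detail=True):
        if entry["type"] == "DIRECTORY":
            grant_recursive(adl, entry["name"], acl_spec)
        else:
            adl.modify_acl_entries(entry["name"], acl_spec=acl_spec)

new_user = "<user-aad-object-id>"  # placeholder AAD object ID

# Step 4: R-X wherever the user's U-SQL jobs must read data.
grant_recursive(adl, "/data/input", f"user:{new_user}:r-x")

# Step 5: RWX wherever the user's U-SQL jobs must write data.
grant_recursive(adl, "/data/output", f"user:{new_user}:rwx")
```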

Detailed Instructions can be found here: https://1drv.ms/w/s!AvdZLquGMt47gzohZ69Ob47k-P_y

Adding a New Data Lake Store User

Detailed Instructions can be found here: https://1drv.ms/w/s!AvdZLquGMt47gzyviEyNrAn8kAqS

Giving an HDInsight Cluster Access to Data Lake Store

Detailed Instructions can be found here: https://1drv.ms/w/s!AvdZLquGMt47gz3ks4YwQRMXGi3j

ProTip: Leverage the power of Active Directory Security groups

Repeating manual steps is both irritating and prone to error. It’s easier if you use Active Directory security groups.

First, give the needed permissions to the security group. After that, adding new users is simple: just add them to the security group. This dramatically simplifies maintaining and securing your Data Lake.
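
As a hedged sketch of that pattern, using the same Python SDK and the grant_recursive helper from the earlier sketch (the group object ID and paths are placeholders): grant the ACL entries once to the security group’s AAD object ID, then manage access purely through group membership in Azure AD.

```python
# Grant once to the security group's AAD object ID (placeholder GUID).
# New users then gain access simply by being added to the group in
# Azure AD; no further ACL changes on the store are needed.
data_readers_group = "<security-group-aad-object-id>"

grant_recursive(adl, "/data/input", f"group:{data_readers_group}:r-x")

# A default entry on the folder makes newly created children start with the
# same entry, which softens (but does not replace) the lack of inheritance.
adl.modify_acl_entries("/data/input",
                       acl_spec=f"default:group:{data_readers_group}:r-x")
```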

Comments

  • Anonymous
    December 10, 2016
    When creating my HDInsight Hadoop cluster, I am finding that granting my HDInsight Azure AD service principal access to a folder within my Azure Data Lake Store takes a long time (around 5 minutes), since there are 2,500 files. Any guidance on making this faster, or should I keep my file count (i.e. json, csv) lower but with larger files? I intend to work in a scenario where I reach terabytes of data through both large file sizes and a large quantity of files. Appreciate any guidance.