
Referencing ADL storage gen2 files from U-SQL


I have an ADL account set up with two storage accounts: the regular ADLS Gen1 storage, set as the default, and a blob storage account with "Hierarchical namespace" enabled, which is connected to ADLS using a storage key, if that matters (no managed identities at this point). The first one is unrelated to the question; the second is registered under the name testdlsg2. I see both in Data Explorer in the Azure portal.

I have a container called logs in that blob storage, and at the root of that container there are log files I want to process.

How do I reference those files in that particular storage and that particular container from U-SQL?

I've read the ADLS Gen2 URI documentation and came up with the following U-SQL:

@data =
    EXTRACT
        Timestamp long,
        // skip, skip, skip
        LogDate DateTime,
        LogOrder int
    FROM "abfss://logs@testdlsg2.dfs.core.windows.net/log_{LogDate:yyyy}{LogDate:MM}{LogDate:dd}_{LogOrder}.log.gz"
    USING Extractors.Text(delimiter: ' ', quoting: true, skipFirstNRows: 1);

// the rest is irrelevant

Unfortunately, when I submit that to ADL, the job fails with the following error:

CsEnumerateDirectoryWithPaging failed with error 0x83090A1A (The operation is not supported on the provided Url type). Cosmos Path: abfss://logs@testdlsg2.dfs.core.windows.net/

The query works fine locally when using local storage with relative paths.


Solution

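  • U-SQL does not work with Azure Data Lake Storage Gen2, and it is unlikely that it ever will. That is consistent with the "operation is not supported on the provided Url type" error raised for the abfss:// path. There is a feedback item requesting this support, which you should read: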

    https://feedback.azure.com/forums/327234-data-lake/suggestions/36445702-add-support-for-adls-gen2-to-adla

    As of 2020, consider starting new Azure analytics projects with Azure Databricks instead.
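
If the job is moved off U-SQL, the file set pattern and the Extractors.Text options from the question have to be reproduced by hand. The following is only a minimal Python sketch of that logic, using the standard library alone and no Azure or Spark API. The helper names parse_log_name and read_log_rows are hypothetical, and UTF-8 encoding and double-quote quoting are assumptions about the log files.

```python
import csv
import gzip
import io
import re
from datetime import date

# Mirrors the U-SQL file set pattern
# log_{LogDate:yyyy}{LogDate:MM}{LogDate:dd}_{LogOrder}.log.gz
LOG_NAME = re.compile(r"^log_(\d{4})(\d{2})(\d{2})_(\d+)\.log\.gz$")


def parse_log_name(name):
    """Return (LogDate, LogOrder) for a matching file name, else None.

    Hypothetical helper: stands in for the virtual columns that the
    U-SQL file set would have produced.
    """
    match = LOG_NAME.match(name)
    if match is None:
        return None
    year, month, day, order = (int(group) for group in match.groups())
    try:
        return date(year, month, day), order
    except ValueError:
        # Digits matched the pattern but do not form a real calendar date.
        return None


def read_log_rows(raw_bytes, skip_first_n_rows=1):
    """Decompress gzip bytes and split space-delimited, quoted rows.

    Roughly corresponds to
    Extractors.Text(delimiter: ' ', quoting: true, skipFirstNRows: 1).
    Assumes UTF-8 text and double quotes as the quote character.
    """
    text = gzip.decompress(raw_bytes).decode("utf-8")
    reader = csv.reader(io.StringIO(text), delimiter=" ", quotechar='"')
    return list(reader)[skip_first_n_rows:]
```

In a notebook, parse_log_name would be applied to the file names listed from the logs container, and read_log_rows to the bytes of each matching file; how those names and bytes are obtained depends on the platform and is deliberately left out here.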