sql-serverazureazure-sql-databaseexternal-tablessql-data-warehouse

Create External Table in Azure SQL Data warehouse to a wild card based file or folder path


I know we can create an External table in Azure SQL Data warehouse pointing to a LOCATION that is either a file path or a folder path. Can this file or folder path be based on a wild card pattern instead of an explicit path.

Here my file path is a location in Azure Data Lake Store.

-- Syntax for SQL Server

-- Create a new external table  
CREATE EXTERNAL TABLE [ database_name . [ schema_name ] . | schema_name. ] table_name   
    ( <column_definition> [ ,...n ] )  
    WITH (   
        **LOCATION = 'folder_or_filepath'**,  
        DATA_SOURCE = external_data_source_name,  
        FILE_FORMAT = external_file_format_name  
        [ , <reject_options> [ ,...n ] ]  
    )  
[;]  

Solution

  • Polybase / External Tables do not support wildcards at this time. Simply have one folder per external table you require. If you feel this is an important missing feature you can create a request and vote for it here:

    https://feedback.azure.com/forums/307516-sql-data-warehouse

    Bear in mind Polybase (in Azure SQL Data Warehouse) can now read files either in blob storage or in Azure Data Lake Storage (ADLS). Therefore as another workaround, Azure Data Lake Analytics (ADLA) and U-SQL support Polybase, so you could use U-SQL to move the files you want from blob store into your lake, eg

    // Move data from blob store to data lake
    // add filename and structure as one file
    DECLARE @inputFilepath string = "wasb://someContainer@someStorageAccount.blob.core.windows.net/someFilter/{filepath}.csv";
    DECLARE @outputFilepath string = "output/special folder/output.csv";
    
    
    @input =
        EXTRACT 
                ...     // your column list
                filepath string
        FROM @inputFilepath
        USING Extractors.Csv()
    
    
    @input =
        SELECT * FROM @input
        WHERE filename.Contains("yourFilter");
    
    
    // Export as csv
    OUTPUT @input
    TO @outputFilepath
    USING Outputters.Csv(quoting:false);
    
    
    // Now the data is in Data Lake which Polybase can also use as a source