azure-data-lakeu-sql

how to query differently-structured files?


Is it possible to execute a query against files that have different schemas?

I have 2 sets of files in the same directory. The second type has an extra field.

Type 1

id, first, last
1, liza, smith

Type 2

id, first, last, state
4, alex, gordon, CT

Desired Result

1, liza
4, alex

How do we query files with different schemas, but where you want the same output fields?

Here's what I have:

SELECT id, first                
FROM "/one 1300/{files}.csv"
USING Extractors.Csv();

@interestingRows = SELECT id, first FROM  @interestingRows;

OUTPUT @interestingRows
TO @uriPrefix + "/one 1300/output/output.csv"
USING Outputters.Csv();

Solution

  • How about importing the column as one using a delimiter you know does not exist in the data, then splitting it afterwards using the Split method of the string class? Something like this:

    @working =
        EXTRACT wholeRow  string
        FROM "/one 1300/{*}.csv"
        USING Extractors.Text(delimiter:'|');
    
    
    @working =
        SELECT 
            wholeRow.Split(',')[0] AS id,
            wholeRow.Split(',')[1] AS first,
            wholeRow.Split(',')[2] AS last
        FROM @working;
    
    
    OUTPUT @working
    TO "/output/output.csv"
    USING Outputters.Csv(quoting:false);