databricks, azure-databricks, delta-lake, azure-data-lake-gen2

Databricks Delta Table from Datalake with Partitioned Folders


I have data in an Azure Storage Account, in a container called Staging.

Using the SourceSystem/Dataset hierarchy, I also have a date-based folder per month. Inside each folder the data is stored in Delta format.

In Databricks I already have the container mounted. I want to create a Delta table through which I can see ALL the data under the printtransaction folder.

DROP TABLE IF EXISTS staging.default.ysoft_printtransaction;

CREATE TABLE staging.default.ysoft_printtransaction
USING delta
OPTIONS ('delta.checkpoint.writeStatsAsJson' 'false', 'delta.checkpoint.writeStatsAsStruct' 'true')
LOCATION 'abfss://staging@saccdssanpoc01.dfs.core.windows.net/ysoft/printtransaction/*/';

SELECT * FROM staging.default.ysoft_printtransaction;

This, however, does not work: reading from the table fails with an error.

Obviously I don't want to create a table per partitioned folder.

What are my options?


Solution

  • As per the information you provided, these two folders are separate Delta tables. I agree with @Alex Ott: because they are separate Delta tables, you need to read each one separately, and then you can combine them into a different table using a MERGE query.

    It is not possible to read the data from multiple Delta tables into a single table.


    Sample code to read each folder separately:

    CREATE TABLE ysoft_printtransaction1
    USING delta
    LOCATION 'abfss://containername@storageaccname.dfs.core.windows.net/sampledb/202113';
    
    CREATE TABLE ysoft_printtransaction2
    USING delta
    LOCATION 'abfss://containername@storageaccname.dfs.core.windows.net/sampledb/202110';
    

    To merge the tables, you can use the following command:

    MERGE INTO target USING source
      ON target.key = source.key
      WHEN NOT MATCHED THEN INSERT *
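
    Putting the pieces together, a minimal sketch could look like the following. Note that the combined table name and the `TransactionId` join key are assumptions for illustration — use whatever unique key your data actually has:

    ```sql
    -- Create an empty combined table with the same schema as one of the
    -- monthly tables (WHERE 1 = 0 copies the schema but no rows).
    CREATE TABLE ysoft_printtransaction_all
    USING delta
    AS SELECT * FROM ysoft_printtransaction1 WHERE 1 = 0;

    -- Merge each monthly table into the combined target.
    -- TransactionId is a hypothetical key column.
    MERGE INTO ysoft_printtransaction_all AS target
    USING ysoft_printtransaction1 AS source
      ON target.TransactionId = source.TransactionId
      WHEN NOT MATCHED THEN INSERT *;

    MERGE INTO ysoft_printtransaction_all AS target
    USING ysoft_printtransaction2 AS source
      ON target.TransactionId = source.TransactionId
      WHEN NOT MATCHED THEN INSERT *;
    ```

    After this, `SELECT * FROM ysoft_printtransaction_all` returns the rows from both monthly folders in one table.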