azure, azure-blob-storage, databricks, azure-databricks, databricks-sql

Is there a way to use the Databricks COPY INTO command to copy all files from an Azure Blob container into a Delta table?


I was using Databricks to read CSV files from an Azure Blob container and copy them into Databricks Delta tables with the COPY INTO command. Here is the code I was using:

f"""COPY INTO {schema_name}.{table_name}
                            FROM (SELECT {', '.join([f"_c{i}::{column[1]} {column[0]}" for i, column in enumerate(columns)])}
                                  FROM '{blob_url}/{table_name}.csv')
                            FILEFORMAT = CSV
                            FORMAT_OPTIONS ('sep' = '|',
                                            'header' = 'false');"""

Here columns is a list of (column_name, type) pairs, one for each column of the table.

Ex: [('ca_address_sk', 'integer'), ('ca_address_id', 'char(16)'), ('ca_gmt_offset', 'decimal(5,2)'), ('ca_location_type', 'char(20)')]
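For illustration, with that example the f-string expands into a statement like this one (my_schema and customer_address stand in for the actual schema and table names, and <blob_url> for the container URL):

COPY INTO my_schema.customer_address
FROM (SELECT _c0::integer ca_address_sk,
             _c1::char(16) ca_address_id,
             _c2::decimal(5,2) ca_gmt_offset,
             _c3::char(20) ca_location_type
      FROM '<blob_url>/customer_address.csv')
FILEFORMAT = CSV
FORMAT_OPTIONS ('sep' = '|', 'header' = 'false')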

Now, instead of copying a single CSV file into a Delta table, I would like to copy all the CSV files in a folder into one Delta table. That is, instead of COPY INTO table FROM file.csv, I want something like COPY INTO table FROM folder/*, since the folder contains all the CSV files I need, all with the same structure (in reality, they are partitions of a single file, like the one in the first example).

I have read the COPY INTO documentation but have not found anything like this. Also, running an ls on the blob and iterating over all the files in the folder is not an option, because it would add too much complexity to code that should stay as simple as possible.


Solution

  • I agree with @Chen Hirsh: if you give the path only up to the folder name, COPY INTO will read all the files in that folder.

    Here is a sample that reads multiple CSV files with COPY INTO. To pick up only .csv files, use a wildcard in the PATTERN option.

    %sql
    COPY INTO mytable1
    FROM 'abfss://inputdata@rakeshadls.dfs.core.windows.net/folder1'
    WITH (CREDENTIAL (AZURE_SAS_TOKEN = 'SAS'))
    FILEFORMAT = CSV
    PATTERN = '*.csv'
    FORMAT_OPTIONS ('mergeSchema' = 'true', 'header' = 'true', 'inferSchema' = 'true')
    COPY_OPTIONS ('force' = 'true', 'mergeSchema' = 'true')
    

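    Applied to the question's setup, the same folder approach keeps the per-column casts and simply points FROM at the directory. A minimal sketch, assuming the example columns from the question and placeholder schema, table, and container names:

    %sql
    COPY INTO my_schema.customer_address
    FROM (SELECT _c0::integer ca_address_sk,
                 _c1::char(16) ca_address_id,
                 _c2::decimal(5,2) ca_gmt_offset,
                 _c3::char(20) ca_location_type
          FROM 'abfss://<container>@<account>.dfs.core.windows.net/customer_address/')
    FILEFORMAT = CSV
    PATTERN = '*.csv'
    FORMAT_OPTIONS ('sep' = '|', 'header' = 'false')

    As a side note, 'force' = 'true' in the sample further above re-loads files that were already ingested; by default COPY INTO is idempotent and skips files it has loaded before.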