
Unable to rename an external location in Azure ADLS for existing tables


We have an ETL pipeline with landing, bronze, and silver layer tables. The data is stored in Azure ADLS containers as external tables. The pipeline runs on a daily basis, and data is stored in the desired location.

But we have now found that the external location directory name is incorrect, and it needs to be renamed without impacting the daily pipelines or downstream data access for these tables.

I found some references about ALTER EXTERNAL LOCATION, etc., but I am not sure whether this command needs to be run after renaming the folder in the ADLS container, or whether the folder should be renamed afterwards.

Also, I get the error "AnalysisException: /mnt/bronze/user_data is not a Delta table." when I try the command. Can someone please explain the steps to follow so that the external location can be renamed, and the change applied to all tables referencing the old location?

I am providing reproducible code for testing below.

  1. Create a directory in an ADLS container named "users_info".

  2. Create a Databricks notebook and create a mount point for this location, say "/mnt/bronze/users_info", for example:
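
    A minimal mount sketch, assuming ADLS Gen2 accessed via an Azure AD service principal; every <...> value below is a placeholder for your own storage account and credentials:

    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": "<application-id>",
        "fs.azure.account.oauth2.client.secret": "<client-secret>",
        "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
    }
    # Mount the "bronze" container path so it is reachable at /mnt/bronze/users_info
    dbutils.fs.mount(
        source="abfss://bronze@<storage-account>.dfs.core.windows.net/users_info",
        mount_point="/mnt/bronze/users_info",
        extra_configs=configs)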

  3. Create a database named user_master:
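
    %sql CREATE DATABASE IF NOT EXISTS user_master;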

  4. Create a table with following code:

    %sql CREATE TABLE user_master.user ( id INT, name STRING, age INT ) USING DELTA LOCATION '/mnt/bronze/users_info/user';

  5. Insert a few sample records:

    %sql
    INSERT INTO user_master.user VALUES (1, 'James', 25);
    INSERT INTO user_master.user VALUES (2, 'Rechard', 25);

  6. Verify its external location path by running the following command:

    %sql DESCRIBE EXTENDED user_master.user;

  7. Now try to rename the external location as follows:

    ALTER TABLE hive_metastore.user_master.user SET LOCATION '/mnt/bronze/user_data';

If I rename the folder in the ADLS container before running this command, I get a "table not found" error instead.


I would appreciate any help with renaming the ADLS container/folder path and updating the external location paths of the existing tables, without any impact.


Solution

  • I have tried the below and received the same error as you:

    ALTER TABLE hive_metastore.default.b02 SET LOCATION '/mnt/raw/new02/dds.csv';
    
    [DELTA_MISSING_DELTA_TABLE] `/mnt/raw/new02/dds.csv` is not a Delta table. SQLSTATE: 42P01
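
    Before recreating anything, you can confirm whether a given path actually holds a Delta table. A minimal check using the Delta Lake Python API, with the path from the error above:

    from delta.tables import DeltaTable

    # True only if the path contains a _delta_log directory, i.e. a Delta table
    DeltaTable.isDeltaTable(spark, "/mnt/raw/new02/dds.csv")  # False here: it is a CSV file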
    

    As the error message says, the created external table is not a Delta table. I have tried the below approach, creating a Delta table first:

    # Sample DataFrame matching the results shown below
    df = spark.createDataFrame([(1, "dilip"), (2, "raj"), (3, "Narayan")], ["ID", "Name"])

    # Path in ADLS where the Delta table data will be saved
    path = "/mnt/raw02/dilip/external_delta_table"

    # Write the DataFrame out in Delta format
    df.write.format("delta").save(path)

    # Create an external table pointing to this Delta location
    spark.sql(f"""
    CREATE TABLE hive_metastore.default.b02
    USING DELTA
    LOCATION '{path}'
    """)

    spark.sql("""
    INSERT INTO hive_metastore.default.b02
    VALUES (4, "jaya shankar")
    """)
    

    In the above code:

    • Specify the path in ADLS where you want to save the Delta table.
    • Write the DataFrame out in Delta format to create an external Delta table.
    • Create an external table pointing to this Delta table.

    Then, to rename the location:

    • Create a new directory with the desired name in ADLS.
    • Use the Databricks filesystem utilities (dbutils.fs.cp) to copy the data from the old directory to the new one.
    • After copying the data, update the Delta table's location to point to the new directory using the ALTER TABLE SQL command.

    Results:

    df = spark.read.format("delta").load("/mnt/raw02/dilip/external_delta_table")
    display(df)
    
    ID  Name
    1   dilip
    2   raj
    3   Narayan
    4   jaya shankar
    

    As you mentioned, you want to rename an external location in ADLS for existing tables. Below is the approach:

    # Copy the data from the old directory to the new, correctly named one
    dbutils.fs.cp("/mnt/raw02/dilip/external_delta_table", "/mnt/raw02/dilip/new_external_delta_table", recurse=True)

    # Repoint the table's metadata at the new directory
    spark.sql("""
    ALTER TABLE hive_metastore.default.b02
    SET LOCATION '/mnt/raw02/dilip/new_external_delta_table'
    """)
    

    Results:

    spark.sql("DESCRIBE DETAIL hive_metastore.default.b02").display(truncate=False)
    

    Once you have verified that the table reads correctly from the new location, you can use the below command to delete the old directory.

    # Remove the old directory only after confirming the new location works
    dbutils.fs.rm("/mnt/raw02/dilip/external_delta_table", recurse=True)
    

    You can also refer to ALTER EXTERNAL LOCATION, which applies when the path is managed as a Unity Catalog external location rather than a hive_metastore table over a mount point.
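
    A hedged sketch of that route, assuming a Unity Catalog external location named my_location already exists; the name and URL below are placeholders:

    -- Rename the external location object itself
    ALTER EXTERNAL LOCATION my_location RENAME TO user_data_location;

    -- Repoint it at the renamed directory; FORCE updates the URL even if tables depend on it
    ALTER EXTERNAL LOCATION user_data_location SET URL 'abfss://bronze@<storage-account>.dfs.core.windows.net/user_data' FORCE;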