I'd like to export data from tables within my Databricks Unity Catalog. I'd like to transform each of the tables into a single parquet file which I can download. I thought I could just write a table to a parquet file in my Unity Catalog Volume (whose files I can also see in Microsoft Azure Storage Explorer) so that I can download it easily. That did not work. So, what I tried were the following approaches:
1. spark.table(my_unity_catalog_table_path).repartition(1).write.format('parquet').mode('overwrite').save('/Volumes/my_volume_name/my_table')
   Databricks told me that I'm not allowed to write to a Volume like that.
2. spark.table(my_unity_catalog_table_path).toPandas().to_parquet('/Workspace/Users/myuser/my_table.parquet')
   This worked, but not for bigger tables, as I guess the Workspace has limits regarding file size.
3. spark.table(my_unity_catalog_table_path).toPandas().to_parquet('/Volumes/my_volume_name/my_table.parquet')
   That also didn't work out...
4. spark.table(my_unity_catalog_table_path).toPandas().to_parquet('/tmp/my_table.parquet')
   The idea was to move the file to the Volume afterwards using dbutils.fs.mv or shutil.move, but none of those options worked either.

So how can this be done?
If you're working with Azure Data Lake, you can try writing directly to it and then downloading the file from there using Storage Explorer. Try specifying the path as in the snippet below:
spark.table(my_unity_catalog_table_path).repartition(1).write.format('parquet').mode('overwrite').save("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-to-data>")
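
Even with repartition(1), Spark writes a directory containing a single part-*.parquet file (plus _SUCCESS markers), so you may want to copy that part file out to a single, nicely named parquet file for download. A minimal sketch, assuming the container, storage account, path, and table name below are placeholders and that your cluster already has access to the storage account (e.g. via a Unity Catalog external location or a service principal):

# Placeholder names; replace with your own table and storage details.
table_name = "my_catalog.my_schema.my_table"
target_dir = "abfss://my-container@mystorageaccount.dfs.core.windows.net/exports/my_table"

# Write the table as a single parquet part file inside target_dir.
(spark.table(table_name)
      .repartition(1)
      .write.format("parquet")
      .mode("overwrite")
      .save(target_dir))

# Spark produces a directory, so copy the single part file to a standalone
# .parquet file next to it, which is easy to grab from Storage Explorer.
part_file = [f.path for f in dbutils.fs.ls(target_dir) if f.name.startswith("part-")][0]
dbutils.fs.cp(part_file, target_dir + ".parquet")

The same copy step with dbutils.fs.cp should also work if you can write to a Volume path instead of abfss, since Volumes are just governed locations on the same cloud storage.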