I have the following folder structure on ADLS Gen2:
abfss://mycontainer@mystorageaccount.dfs.core.windows.net/original_data/
which contains the following folders:
abc1/<child_folder_main>
abc2/<child_folder_main>
abc_34/<child_folder_main>
xyf_11/<child_folder_main>
sjw93/<child_folder_main>
The issue is that the names of these first-level folders inside the original_data directory are not known in advance. Each one has to be determined at runtime from the name of its corresponding <child_folder_main>.
In short, I want to pass in the actual name of a child_folder_main and get back abc1, abc_34, xyf_11, or whatever its parent folder's name is.
I'm using dbutils.fs operations, but I don't know how to achieve this. Can someone please help?
You can use the following approach.
First, get the names of the level-1 folders under your original_data directory. Then, for each of them, check whether the path formed by that folder plus child_folder_main exists.
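The two steps can be sketched in plain Python, independent of Databricks. This is a minimal sketch that assumes the level-1 folders and their direct children have already been listed into a dictionary; the `tree` contents and the `find_parent` name are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional


def find_parent(tree: Dict[str, List[str]], child_folder_main: str) -> Optional[str]:
    """Return the level-1 folder that directly contains child_folder_main, or None."""
    # Step 1: iterate over the level-1 folder names (abc1, abc_34, ...)
    for parent, children in tree.items():
        # Step 2: check whether child_folder_main exists directly inside this folder
        if child_folder_main in children:
            return parent
    return None


# Hypothetical listing: level-1 folder -> names of its direct children
tree = {
    "abc1": ["sales_2023"],
    "abc_34": ["inventory"],
    "xyf_11": ["sample.csv"],
}
print(find_parent(tree, "sample.csv"))  # xyf_11
print(find_parent(tree, "missing"))     # None
```

The first match wins, so if the same child name appears under several level-1 folders, only the first one in listing order is returned.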
Use the code below:
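# Root folder that holds abc1, abc2, abc_34, ...
original_data_path = "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/original_data/"

def find_parent_folder(child_folder_main):
    # List the level-1 folders directly under original_data
    for directory in dbutils.fs.ls(original_data_path):
        # Drop the trailing "/" of the directory path and take the last segment as its name
        base_path = directory.path.rstrip("/")
        parent_folder = base_path.split("/")[-1]
        try:
            # dbutils.fs.ls raises an exception if the path does not exist
            dbutils.fs.ls(base_path + "/" + child_folder_main)
            return parent_folder
        except Exception:
            continue  # not under this folder; try the next one
    return None  # child_folder_main was not found under any level-1 folder

child_folder_main = "sample.csv"
parent_folder = find_parent_folder(child_folder_main)
print("Parent folder:", parent_folder)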
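An alternative to probing each candidate path inside try/except is to list each level-1 folder's contents and compare names. The sketch below takes the listing function as a parameter so it can be tried locally; in a notebook you would pass `dbutils.fs.ls`. It assumes, as an assumption to verify on your runtime, that `FileInfo.name` ends with "/" for directories. The `FileInfo` namedtuple and `fake_fs` dictionary are hypothetical stand-ins, not the real Databricks objects.

```python
from collections import namedtuple
from typing import Callable, Optional

# Hypothetical stand-in for the FileInfo objects returned by dbutils.fs.ls
FileInfo = namedtuple("FileInfo", ["path", "name"])


def find_parent_folder(root: str, child_folder_main: str,
                       ls: Callable[[str], list]) -> Optional[str]:
    for directory in ls(root):
        if not directory.name.endswith("/"):
            continue  # skip plain files at level 1; only folders can be parents
        # Compare child names instead of probing a path, so no exception handling is needed
        child_names = {entry.name.rstrip("/") for entry in ls(directory.path)}
        if child_folder_main in child_names:
            return directory.name.rstrip("/")
    return None


# Hypothetical in-memory file system used in place of dbutils.fs.ls
root = "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/original_data/"
fake_fs = {
    root: [FileInfo(root + "abc1/", "abc1/"), FileInfo(root + "xyf_11/", "xyf_11/")],
    root + "abc1/": [FileInfo(root + "abc1/other/", "other/")],
    root + "xyf_11/": [FileInfo(root + "xyf_11/sample.csv", "sample.csv")],
}
print(find_parent_folder(root, "sample.csv", fake_fs.__getitem__))  # xyf_11
```

In Databricks the call would be `find_parent_folder(original_data_path, child_folder_main, dbutils.fs.ls)`. Listing whole folders can be slower than a single probe when a level-1 folder holds many entries, so which variant is cheaper depends on your data.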
Here, I split each level-1 path on "/" and take the parent folder name by indexing. Then I append child_folder_main to that path and check whether it exists: if it does, the function returns the parent folder name; otherwise the result is None.
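To see what the indexing step does, here is the split applied to one level-1 path. The sample path assumes that dbutils.fs.ls reports directories with a trailing slash, which is why a plain split leaves an empty last element and the name sits at index -2. The `folder_name` helper is hypothetical; stripping the slash first makes the result independent of whether it is present.

```python
def folder_name(path: str) -> str:
    """Return the last segment of a path, ignoring a trailing slash."""
    return path.rstrip("/").split("/")[-1]


path = "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/original_data/abc_34/"
parts = path.split("/")
print(parts[-1] == "")         # True: the trailing "/" leaves an empty last element
print(parts[-2])               # abc_34
print(folder_name(path))       # abc_34
print(folder_name(path[:-1]))  # abc_34, same result without the trailing slash
```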