python, mssparkutils

List the contents of a directory with mssparkutils


I used the following code to list the files in a directory, but it returns the full path of each file instead of just the file name:

historical_logs_adls_path = (
    f"abfss://{staging_container_name}@{staging_account_name}.dfs.core.windows.net/"
    f"{staging_dirname}"
)

mssparkutils.fs.ls("/")
mssparkutils.fs.ls(historical_logs_adls_path)

I only need the list of file names.


Solution

  • Use the os.path.basename function, which keeps only the last component of each path: https://www.geeksforgeeks.org/python-os-path-basename-method/

    Try the following:

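    import os

    historical_logs_adls_path = (
        f"abfss://{staging_container_name}@{staging_account_name}.dfs.core.windows.net/"
        f"{staging_dirname}"
    )

    # Each entry returned by mssparkutils.fs.ls has a .path holding the full URI;
    # os.path.basename keeps only the part after the last "/".
    file_list = [os.path.basename(file.path) for file in mssparkutils.fs.ls(historical_logs_adls_path)]
    print(file_list)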
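The basename approach can be checked outside a Synapse notebook. The sketch below is an illustration, not the answer's exact code: it uses a hypothetical FakeFileInfo stand-in (only its path attribute matters) and made-up container, account and directory names. It uses posixpath so the result does not depend on the local OS; os.path.basename gives the same result for these forward-slash paths. The rstrip("/") is a defensive assumption, in case an entry (such as a directory) is reported with a trailing slash, which would otherwise produce an empty name.

```python
import posixpath
from collections import namedtuple

# Hypothetical stand-in for the objects returned by mssparkutils.fs.ls;
# only the .path attribute is used here.
FakeFileInfo = namedtuple("FakeFileInfo", ["path"])


def file_names(entries):
    """Return the last path component of each entry's .path.

    Trailing slashes are stripped first so that a path such as
    ".../archive/" yields "archive" instead of an empty string.
    """
    return [posixpath.basename(entry.path.rstrip("/")) for entry in entries]


# Sample listing with hypothetical container/account/directory names.
listing = [
    FakeFileInfo("abfss://logs@myaccount.dfs.core.windows.net/history/2023-01-01.json"),
    FakeFileInfo("abfss://logs@myaccount.dfs.core.windows.net/history/2023-01-02.json"),
    FakeFileInfo("abfss://logs@myaccount.dfs.core.windows.net/history/archive/"),
]

print(file_names(listing))
# ['2023-01-01.json', '2023-01-02.json', 'archive']
```

In a notebook you would pass the real listing instead, e.g. file_names(mssparkutils.fs.ls(historical_logs_adls_path)). The entries returned by mssparkutils.fs.ls also appear to carry a name attribute, which may make the basename call unnecessary; check this in your runtime before relying on it.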