python · azure · statsmodels · azure-data-lake-gen2

Saving a statsmodels model to ADLS blob storage


I currently have a model fit using the statsmodels OLS formula API, and I am trying to save this model to ADLS blob storage. '/mnt/outputs/' is a mount point I have created, and I am able to read and write other files from this directory.

import statsmodels.formula.api as smf
fit = smf.ols(formula=f"Pressure ~ {cat_vars_int} + Speed + dose_time:Speed + Speed:log_curr_speed_time", data=df_train).fit()

path = f'/mnt/outputs/Models/20240406_M2.pickle'
fit.save(path)

However, I am getting the error below when saving. I am trying to write a new file, not read an existing one, so I am not sure why I am getting this error. Any help would be great, thanks!

FileNotFoundError: [Errno 2] No such file or directory: '/mnt/outputs/Models/20240406_M2.pickle'

Solution

  • By default, the mount point lives under the DBFS root. Whenever you reference files without Spark (i.e., with local-file APIs such as open, os, or pickle), you need to prefix the path with /dbfs.

    So, save the file using a path like the one below.

    path = '/dbfs/mnt/outputs/Models/20240406_M2.pickle'
    fit.save(path)
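To make the save step more robust, the parent directories can be created explicitly before writing. Below is a minimal sketch; `save_to_dbfs` is a hypothetical helper (statsmodels' `save` pickles the results object, so plain `pickle` mirrors its behavior), and the `/dbfs/mnt/outputs` path is the mount from the question.

```python
import os
import pickle

def save_to_dbfs(obj, path):
    """Pickle any object (e.g. a fitted statsmodels results instance)
    to a DBFS-mounted path, creating missing parent directories first
    so the write cannot fail with FileNotFoundError."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path

# On Databricks, address the mount through the /dbfs prefix:
# save_to_dbfs(fit, "/dbfs/mnt/outputs/Models/20240406_M2.pickle")
```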
    

    and whenever you access the file via the Spark context, use the dbfs:/ scheme, like below.

    spark.read.csv("dbfs:/path_to_file")
    

    Listing files with dbutils:

    display(dbutils.fs.ls(mount_point))
    


    Listing files with the Python os module:

    os.listdir("/dbfs/"+mount_point)
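Both addressing schemes can be derived from the same mount point; a small sketch (the mount_point value is the one from the question):

```python
mount_point = "/mnt/outputs"  # mount from the question

# Spark APIs use the dbfs:/ URI scheme; local-file APIs (os, open,
# pickle, statsmodels' save) need the /dbfs POSIX prefix instead.
spark_path = "dbfs:" + mount_point   # "dbfs:/mnt/outputs"
local_path = "/dbfs" + mount_point   # "/dbfs/mnt/outputs"
```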
    


    Learn more about handling files in Databricks here.