python vaex

Vaex: The process cannot access the file because it is being used by another process


I am working on an application that uses Vaex to read data from a Feather file. We create virtual columns in a dataframe that hold Boolean values, which are used to filter rows of the dataset. Every time a new filter is made, it is cached to a file: we save the filter column with export_feather, drop the virtual column, and then join the dataframe with the cached file. Here is the relevant part of the code:

filename = f"filter__{filter_id}.feather"
df[[f"filter__{filter_id}"]].export_feather(
    str(export_path.joinpath(filename)).replace("\\", "/")
)

# Once the file is saved, drop the virtual column and join the cached selection
df.drop([f"filter__{filter_id}"], inplace=True)
df.join(vaex.open(export_path.joinpath(filename)), inplace=True)

The application also cleans up and deletes cached files. When we try to delete a file with

os.chmod(file, 0o777)
os.remove(file)

we get the error PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'data\\my-collection\\.my-app\\filters\\data.feather\\filter__1.feather'

If I remove the df.drop and df.join calls when creating the files, the error doesn't occur and the files are deleted. I tried looking at the Vaex source code to see what df.join does, but I'm rather new to Python and nothing jumped out at me. How are the file and dataset being handled, and why is the file handle not released? In this context, what process is using the file, and how can I close it so that I can delete the file?


Solution

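  • The answer to this question can be found in an issue on Vaex's GitHub page: https://github.com/vaexio/vaex/issues/2119

    In summary: call df.close() before deleting the file from disk. As I understand it, Vaex memory-maps the files it opens, so the joined dataframe keeps the cached file open, and Windows refuses to delete a file that is still in use. The full explanation of why this is needed is in the answer in the link above.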
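    The close-then-delete order can be sketched without Vaex installed. The example below is only an illustration of the mechanism, using the standard library: MappedFile is a hypothetical stand-in for a dataframe that keeps a file memory-mapped, and remove_cache_file is a hypothetical helper, neither is part of Vaex. The file contents are placeholder bytes, not a real Feather file.

```python
import mmap
import os
import tempfile
from pathlib import Path


class MappedFile:
    """Hypothetical stand-in for a dataframe that memory-maps a file."""

    def __init__(self, path):
        self._fh = open(path, "rb")
        # Length 0 maps the whole file; the mapping keeps the file "in use".
        self._map = mmap.mmap(self._fh.fileno(), 0, access=mmap.ACCESS_READ)

    def close(self):
        # Release the memory map and the underlying file handle.
        self._map.close()
        self._fh.close()


def remove_cache_file(owner, path):
    """Close `owner` (anything with a close() method), then delete `path`."""
    owner.close()    # release the handle first
    os.remove(path)  # on Windows this would fail if the file were still mapped


# Create a placeholder cache file (not real Feather data).
tmp_dir = Path(tempfile.mkdtemp())
cache_path = tmp_dir / "filter__1.feather"
cache_path.write_bytes(b"placeholder bytes")

mapped = MappedFile(cache_path)
remove_cache_file(mapped, cache_path)
deleted = not cache_path.exists()
```

    With Vaex, the equivalent order would be df.close() followed by os.remove(file). After closing, the dataframe should be treated as unusable, so reopen the source data if it is still needed.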