Tags: python, jupyter-notebook, pickle, dill

dill.load_session() causes `UnpicklingError: pickle data was truncated` even though dill.dump_session() pickled perfectly fine


I'm running a Jupyter notebook on a remote server through interactive nodes. After running many cells, I wanted to save the notebook's state, so I ran

import dill

filename = "<insert filename>"  # placeholder for the actual path
with open(filename, 'wb') as f:
    dill.dump_session(f)

This appeared to work fine, since

print("File exists:", os.path.exists(filename))  # Should return True
print("File size:", os.path.getsize(filename))  # Should return a non-zero value

returned True and a large number, respectively, after pickling. The .pkl file was 7.2 GB at that point.

Then I closed the notebook and ended the interactive node session. When I requested a new interactive node session and opened the notebook again through it, the .pkl file size was still 7.2 GB.

But then I ran

import dill
with open(filename, 'rb') as f:
    dill.load_session(f)

It raised `UnpicklingError: pickle data was truncated`, and when I checked the file afterwards, it was empty (0 B).

Does anybody know what is happening here?

For now, I've saved some of the important objects from my code, like a random forest classifier and a one-vs-rest logistic regression model, in their own .pkl files, but I'm afraid to unpickle them in case that erases their contents the way it did for the whole notebook session.


Solution

  • The most probable cause is that the 7.2 GB file was never fully written to disk before your interactive node session terminated. On the networked filesystems typical of clusters, the reported file size can come from a client-side cache, so the file can look complete before the data has actually reached the storage server, which would explain why it showed 7.2 GB and later turned out to be empty.
    Try explicitly flushing the file buffer and syncing to disk after dill.dump_session(f):

    import dill
    import os

    filename = "my_session.pkl"
    
    with open(filename, 'wb') as f:
        dill.dump_session(f)
        f.flush() # Ensure that any data stored in the buffer is written to the file 
        os.fsync(f.fileno()) # Ensure that the data is physically written to the disk
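
    (os.fsync forces the data out of the operating system's cache and, on a networked filesystem, is what actually pushes the bytes to the storage server, so expect it to take a while for a 7.2 GB file.)

  • A more defensive variant writes the session to a temporary file and only renames it over the real one after the write has fully completed, so a killed session can at worst leave you with the previous intact copy. Below is a minimal sketch of that idea; the temp-file naming is illustrative, and it assumes the temp file can live next to the target:

    import dill
    import os
    import tempfile

    filename = "my_session.pkl"

    # Create the temp file in the same directory so the final rename
    # stays on one filesystem (os.replace is atomic there).
    dir_name = os.path.dirname(os.path.abspath(filename))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, 'wb') as f:
            dill.dump_session(f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, filename)  # only now is the old file overwritten
    except BaseException:
        os.remove(tmp_path)  # discard the partial temp file
        raise

  • As for your separately saved models: opening a file with open(..., 'rb') is read-only, so unpickling cannot erase or modify it; whatever corrupted your session file almost certainly happened when it was written (or when the node was torn down), not when you tried to read it. For extra peace of mind you can copy each file first and run a quick heuristic truncation check before loading, since a complete pickle stream ends with the STOP opcode b'.'. The filename below is hypothetical:

    import dill
    import os
    import shutil

    model_file = "random_forest.pkl"  # hypothetical name for one of your model files

    # Keep an untouched backup copy before doing anything else.
    shutil.copy2(model_file, model_file + ".bak")

    # Heuristic check: a complete pickle stream ends with the STOP opcode b'.'
    looks_complete = False
    if os.path.getsize(model_file) > 0:
        with open(model_file, 'rb') as f:
            f.seek(-1, os.SEEK_END)  # jump to the last byte
            looks_complete = (f.read(1) == b'.')
    print("Looks complete:", looks_complete)

    if looks_complete:
        with open(model_file, 'rb') as f:
            model = dill.load(f)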