I'm running a Jupyter notebook on a remote server through interactive nodes. After running many cells, I wanted to save the notebook's state, so I ran:
import dill
filename = <insert filename>
with open(filename, 'wb') as f:
    dill.dump_session(f)
This appeared to work, since running

import os
print("File exists:", os.path.exists(filename))  # Should return True
print("File size:", os.path.getsize(filename))   # Should return a non-zero value

immediately after pickling printed True and a large number, respectively. The .pkl file was 7.2 GB at this point.
Then I closed the notebook and ended the interactive node session. Later, I requested a new interactive node session and reopened the notebook through it. The .pkl file size was still 7.2 GB.
But then I ran
import dill
with open(filename, 'rb') as f:
    dill.load_session(f)
This raised an UnpicklingError ("pickle data was truncated"), and when I checked afterwards, the .pkl file was empty (0 B).
Does anybody know what is happening here?
As of now, I've saved some of the important objects from my code, such as a random forest classifier and a one-vs-rest logistic regression model, in their own .pkl files, but I am afraid to unpickle them in case doing so erases their contents the way it did for the whole notebook session.
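To be on the safe side, my current plan is to test-load them from a throwaway copy first, roughly along these lines (the file and variable names here are just placeholders):

import shutil
import dill

shutil.copy("rf_model.pkl", "rf_model_copy.pkl")  # work on a copy, never the original
with open("rf_model_copy.pkl", 'rb') as f:
    rf_clf = dill.load(f)  # the original file is never opened for writing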
The most probable cause is that the 7.2 GB file was never fully written to disk before your interactive node session terminated: the data may still have been sitting in OS or filesystem buffers (a common issue on the networked filesystems used by compute clusters), so the size you saw never reflected data safely on disk. You could try explicitly flushing the file buffer after dill.dump_session(f):
import dill
import os
filename = "my_session.pkl"
with open(filename, 'wb') as f:
    dill.dump_session(f)
    f.flush()             # Ensure that any data stored in the buffer is written to the file
    os.fsync(f.fileno())  # Ensure that the data is physically written to the disk
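Note that opening a file with 'rb' can never truncate or modify it, so loading your model pickles will not erase them; the session file was most likely truncated before you ever tried to load it. As an extra sanity check before ending an interactive session, you can cheaply test whether the dump on disk looks complete: every pickle stream ends with the STOP opcode (a single b'.' byte), so a truncated file will usually fail this test. A minimal sketch, reusing filename from above (this is a heuristic, not a guarantee of integrity):

import os

def dump_looks_complete(path):
    # A complete pickle stream ends with the STOP opcode b'.';
    # an empty or truncated dump will usually fail this check.
    if os.path.getsize(path) == 0:
        return False
    with open(path, 'rb') as f:
        f.seek(-1, os.SEEK_END)  # jump to the last byte of the file
        return f.read(1) == b'.'

print("Dump looks complete:", dump_looks_complete(filename))

Also, dill.dump_session accepts a plain path (dill.dump_session(filename)), in which case dill opens and closes the file itself; but as far as I know it does not fsync for you, so the explicit flush/fsync pattern above is still the safer option on cluster filesystems.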