I'm brand new to GCP and the Deep Learning VM, and I set one up to train some deep learning models. While I was training in a Jupyter notebook on the VM, it crashed because it could not copy an input tensor from the GPU to the CPU. Specifically:
InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:GPU:0 to /job:localhost/replica:0/task:0/device:CPU:0 in order to run TensorDataset: Dst tensor is not initialized. [Op:TensorDataset]
After looking into it, this error seems to occur when there isn't enough memory on the GPU. I checked my memory, and my RAM was 95% full after the VM had been running for only about an hour since I initialized it. I have no idea how this happened. How can I free up this memory?
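To check how full system RAM is without a monitoring tool, /proc/meminfo can be read directly. This is a minimal sketch, assuming a Linux VM; parse_meminfo and percent_used are hypothetical helper names, and "used" here means MemTotal minus MemAvailable.

```python
"""Report how full system RAM is by reading /proc/meminfo (Linux only)."""
from __future__ import annotations

import os


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field: first numeric value} (kB for most fields)."""
    info: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def percent_used(info: dict[str, int]) -> float:
    """Percentage of RAM in use, treating MemAvailable as reclaimable."""
    total = info["MemTotal"]
    return 100.0 * (total - info["MemAvailable"]) / total


if __name__ == "__main__":
    # Assumption: a Linux guest, where /proc/meminfo exists.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print(f"RAM used: {percent_used(parse_meminfo(f.read())):.1f}%")
    else:
        print("/proc/meminfo not found (not a Linux system?)")
```

MemAvailable is used instead of MemFree because the kernel counts page cache as reclaimable, so MemFree alone tends to make RAM look fuller than it really is.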
I found out that the GPU memory was still allocated even after the Python script terminated. Run nvidia-smi to see whether a python process is taking up GPU memory, and if so, run pkill -9 python to kill every process whose name matches python (including any Jupyter kernels) and release the memory they hold.
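The check-then-kill step above can also be scripted. This is a sketch under stated assumptions, not part of the original fix: it assumes nvidia-smi is on PATH and relies on its --query-compute-apps CSV output; parse_compute_apps, python_pids and the --kill flag are hypothetical names. By default it only lists GPU processes; with --kill it sends SIGKILL (the signal pkill -9 sends) to the Python ones.

```python
"""List GPU compute processes via nvidia-smi; optionally kill the Python ones."""
from __future__ import annotations

import os
import shutil
import signal
import subprocess
import sys

QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text: str) -> list[tuple[int, str, int | None]]:
    """Turn CSV rows into (pid, process_name, used_memory in MiB or None)."""
    apps = []
    for line in csv_text.strip().splitlines():
        if line.count(",") < 2:
            continue  # skip anything that is not a data row
        pid, rest = line.split(",", 1)
        name, mem = rest.rsplit(",", 1)  # tolerate commas inside the name
        mem = mem.strip()
        apps.append((int(pid), name.strip(), int(mem) if mem.isdigit() else None))
    return apps


def python_pids(apps: list[tuple[int, str, int | None]]) -> list[int]:
    """PIDs whose executable name contains 'python' (roughly what pkill python matches)."""
    return [pid for pid, name, _ in apps if "python" in os.path.basename(name)]


def main() -> None:
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
        return
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    apps = parse_compute_apps(out)
    for pid, name, mem in apps:
        shown = f"{mem} MiB" if mem is not None else "N/A"
        print(f"{pid:>8}  {shown:>10}  {name}")
    if "--kill" in sys.argv[1:]:
        for pid in python_pids(apps):
            try:
                os.kill(pid, signal.SIGKILL)  # same signal as pkill -9
            except ProcessLookupError:
                pass  # process already exited


if __name__ == "__main__":
    main()
```

Running it without arguments is a dry run that shows which PIDs hold GPU memory. Unlike pkill -9 python, it only targets Python processes that nvidia-smi reports as using the GPU, so unrelated Python processes are left alone.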