python, dataset, google-colaboratory, unzip, unrar

Unzip failed to finish in Google Colab


I'm trying to train an autoencoder model, but I'm having difficulty extracting a large zip file and a rar file stored in Google Drive: a 3 GB zip file containing 500 directories of images and a 5 GB rar file containing 1.7 million images.

I tried running this code in Colab, and it finished extracting my 3 GB zip file after 6 hours.

!unzip -q drive/"My Drive"/"Colab Notebooks"/"Dataset"/"Dataset_Final_500"/syn_train_3.zip -d drive/"My Drive"/"Colab Notebooks"/"Dataset"/"Dataset_Final_500"/ 

But when I checked, it had only created 86 out of the 500 directories in my Google Drive. Why does this happen, and how do I continue without re-extracting everything all over again? Any ideas on extracting my 5 GB rar file to Google Drive?

Any help would be a blessing :)


Solution

  • As @BobSmith said, I moved my entire dataset to Google Colab's local disk first and extracted all of it using:

    !unzip -u -q /content/syn_train_3.zip
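
    To get the archive onto Colab's local disk in the first place, copy it from the mounted Drive. A minimal sketch (the shutil.copy approach and the exact paths are assumptions, adjust them to your own layout):

    from google.colab import drive
    import shutil

    # Mount Drive, then copy the archive to the local disk before extracting.
    drive.mount('/content/drive')
    src = '/content/drive/My Drive/Colab Notebooks/Dataset/Dataset_Final_500/syn_train_3.zip'
    shutil.copy(src, '/content/syn_train_3.zip')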
    

    and for the rar file, using unrar:

    !unrar e real_train_500_2.rar train_dir
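
    To check that the extraction really finished (instead of trusting what the Drive folder shows), counting the extracted entries on the local disk helps. A quick sketch, assuming the working directory is /content so the files land in /content/train_dir:

    import os

    # Count top-level entries in the local extraction directory (path is illustrative).
    print(len(os.listdir('/content/train_dir')), 'entries extracted')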
    

    The extraction proved to be faster this way. I then split the dataset into .npy files and saved those back to Drive.
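
    The .npy splitting step can look roughly like this (a sketch; the shard size, output path, and the assumption that every image has the same dimensions so they can be stacked are mine):

    import os
    import numpy as np
    from PIL import Image

    src_dir = '/content/train_dir'                                   # locally extracted images
    out_dir = '/content/drive/My Drive/Colab Notebooks/Dataset/npy'  # illustrative output path on Drive
    os.makedirs(out_dir, exist_ok=True)

    files = sorted(os.listdir(src_dir))
    batch_size = 10000  # assumed shard size
    for i in range(0, len(files), batch_size):
        # Load one batch of images and save it as a single .npy shard.
        batch = [np.array(Image.open(os.path.join(src_dir, f))) for f in files[i:i + batch_size]]
        np.save(os.path.join(out_dir, f'shard_{i // batch_size}.npy'), np.stack(batch))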

    I found that Google Colab accesses Google Drive through Google Drive File Stream, much like Backup and Sync on a desktop. It would be painful to wait for the dataset to sync between Colab and Drive.

    Be careful: don't let the files showing up under "/drive/My Drive" in Google Colab fool you into thinking they have already been saved to Google Drive; it takes time to sync!
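
    If you do write results to the mounted Drive, one way to push pending writes through before the runtime disconnects is the flush helper in google.colab:

    from google.colab import drive

    # Flush any pending writes to Drive and unmount, so nothing is lost when the runtime ends.
    drive.flush_and_unmount()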