Tags: google-cloud-platform, pytorch, dataset, tpu

How to train a PyTorch model with a custom dataset on a TPU?


FYI, I can only spare the initial $300 of free credit since I'm a student, so I need to minimize the trial-and-error phase.

I have a PyTorch-based model that currently runs on a local GPU with a ~100 GB dataset of frames in local storage. I'm looking for a guide that shows how to set up a machine to train and test my model on TPUs, with the dataset in my Google Drive (or any other recommended cloud storage).

The guides I found don't match this description: most of them run on a GPU, or on a TPU but only with a dataset that ships with a dataset library. I'd prefer not to waste time and budget trying to assemble a puzzle from those pieces.


Solution

  • First, to use TPUs on Google Cloud you have to use the PyTorch/XLA library, which is what enables TPU support in PyTorch.

    There are a couple of options: you can work in Colab or create an environment on GCP. I understand you may want to know what it's like to work in a "real environment" rather than in Colab, but there isn't much difference, and Colab is often used as the main environment for ML development.

    Also, keep in mind that a TPU instance plus a notebook VM on GCP will drain your $300 in a few days (or hours). A TPU v3 ready for PyTorch alone costs around $6k/month.

    On Colab:
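
    A minimal sketch of this setup step, assuming the Colab runtime is set to TPU and the torch_xla 1.x era this answer targets (the exact wheel version/URL is an assumption; pick the one matching your torch version), plus the Drive mount for the dataset:

        # Mount Google Drive so the frames dataset is visible to the Colab VM:
        from google.colab import drive
        drive.mount('/content/drive')

        # Install PyTorch/XLA (wheel version is an assumption; choose the wheel
        # from the tpu-pytorch bucket that matches your torch version):
        !pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.9-cp37-cp37m-linux_x86_64.whl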

    On GCP:

        import os

        # Replace 10.0.200.XX with your TPU node's internal IP from the GCP console:
        os.environ["XRT_TPU_CONFIG"] = "tpu_worker;0;10.0.200.XX:8470"
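
    Either way, once torch_xla is installed the training loop barely changes. A minimal sketch, assuming a custom dataset of preprocessed frame tensors (FrameDataset, MyModel, and paths are hypothetical stand-ins for your own code):

        import torch
        import torch_xla.core.xla_model as xm
        from torch.utils.data import Dataset, DataLoader

        # Hypothetical dataset reading frame tensors saved as .pt files,
        # e.g. under /content/drive after mounting Google Drive:
        class FrameDataset(Dataset):
            def __init__(self, paths):
                self.paths = paths

            def __len__(self):
                return len(self.paths)

            def __getitem__(self, idx):
                sample = torch.load(self.paths[idx])  # {"x": frame, "y": label}
                return sample["x"], sample["y"]

        device = xm.xla_device()          # the TPU core as a regular torch device
        model = MyModel().to(device)      # MyModel is your existing module
        loader = DataLoader(FrameDataset(paths), batch_size=32, shuffle=True)
        optimizer = torch.optim.Adam(model.parameters())
        loss_fn = torch.nn.CrossEntropyLoss()

        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            xm.optimizer_step(optimizer, barrier=True)  # XLA-aware optimizer step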
    
