I am running the following code on colab taken from the example here: https://huggingface.co/transformers/model_doc/albert.html#albertformaskedlm
import os
import torch
import torch_xla
import torch_xla.core.xla_model as xm
assert os.environ['COLAB_TPU_ADDR']
dev = xm.xla_device()
from transformers import AlbertTokenizer, AlbertForMaskedLM
import torch
tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
model = AlbertForMaskedLM.from_pretrained('albert-base-v2').to(dev)
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0) # Batch size 1
data = input_ids.to(dev)
outputs = model(data, masked_lm_labels=data)
loss, prediction_scores = outputs[:2]
I haven't done anything to the example code except move input_ids
and model
onto the TPU device using .to(dev)
. It seems everything is moved to the TPU no problem as when I input data
I get the following output: tensor([[ 2, 10975, 15, 51, 1952, 25, 10901, 3]], device='xla:1')
However when I run this code I get the following error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-5-f756487db8f7> in <module>()
1
----> 2 outputs = model(data, masked_lm_labels=data)
3 loss, prediction_scores = outputs[:2]
9 frames
/usr/local/lib/python3.6/dist-packages/transformers/modeling_albert.py in forward(self, hidden_states, attention_mask, head_mask)
277 attention_output = self.attention(hidden_states, attention_mask, head_mask)
278 ffn_output = self.ffn(attention_output[0])
--> 279 ffn_output = self.activation(ffn_output)
280 ffn_output = self.ffn_output(ffn_output)
281 hidden_states = self.full_layer_layer_norm(ffn_output + attention_output[0])
RuntimeError: Unknown device
Anyone know what's going on?
Solution is here: https://github.com/pytorch/xla/issues/1909
Before calling model.to(dev)
, you need to call xm.send_cpu_data_to_device(model, xm.xla_device())
:
model = AlbertForMaskedLM.from_pretrained('albert-base-v2')
model = xm.send_cpu_data_to_device(model, dev)
model = model.to(dev)
There are also some issues with getting the gelu activation function ALBERT uses to work on the TPU, so you need to use the following branch of transformers when working on TPU: https://github.com/huggingface/transformers/tree/fix-jit-tpu
See the following colab notebook (by https://github.com/jysohn23) for full solution: https://colab.research.google.com/gist/jysohn23/68d620cda395eab66289115169f43900/getting-started-with-pytorch-on-cloud-tpus.ipynb