amazon-ec2 pytorch torch roberta-language-model

Error when loading torch.hub.load('pytorch/fairseq', 'roberta.large.mnli') on AWS EC2


I'm trying to run some code using Torch (and the RoBERTa language model) on an AWS EC2 instance. The compilation seems to fail. Does anyone have a pointer on how to fix it?

Confirm that Torch is correctly installed

import torch
a = torch.rand(5,3)
print(a)

This returns: tensor([[0.7494, 0.5213, 0.8622],...

Attempt to load RoBERTa

roberta = torch.hub.load('pytorch/fairseq', 'roberta.large.mnli')
Using cache found in /home/ubuntu/.cache/torch/hub/pytorch_fairseq_master
/home/ubuntu/.local/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
fatal: not a git repository (or any of the parent directories): .git
running build_ext
/home/ubuntu/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py:352: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
skipping 'fairseq/data/data_utils_fast.cpp' Cython extension (up-to-date)
skipping 'fairseq/data/token_block_utils_fast.cpp' Cython extension (up-to-date)
building 'fairseq.libnat' extension
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/.local/lib/python3.8/site-packages/torch/include -I/home/ubuntu/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/ubuntu/.local/lib/python3.8/site-packages/torch/include/TH -I/home/ubuntu/.local/lib/python3.8/site-packages/torch/include/THC -I/usr/include/python3.8 -c fairseq/clib/libnat/edit_dist.cpp -o build/temp.linux-x86_64-3.8/fairseq/clib/libnat/edit_dist.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=libnat -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
In file included from /home/ubuntu/.local/lib/python3.8/site-packages/torch/include/ATen/Parallel.h:149,
                 from /home/ubuntu/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
                 from /home/ubuntu/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /home/ubuntu/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /home/ubuntu/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:12,
                 from /home/ubuntu/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:3,
                 from fairseq/clib/libnat/edit_dist.cpp:9:
/home/ubuntu/.local/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:84: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
   84 | #pragma omp parallel for if ((end - begin) >= grain_size)

After a long while, it ends with:

x86_64-linux-gnu-gcc: fatal error: Killed signal terminated program cc1plus
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
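The "Killed signal terminated program cc1plus" line means gcc's driver saw its cc1plus child process die from SIGKILL, and then exited with status 1 itself, which is why the build reports "exit status 1". On Linux, an unexpected SIGKILL during a large C++ compile very commonly comes from the kernel's out-of-memory killer, which fits the memory note in the solution below. The sketch here (POSIX only; the helper names were_sigkilled/run_and_report are hypothetical, not part of any library) shows how a SIGKILLed child looks from Python's subprocess module, roughly analogous to how gcc's driver detects that cc1plus was killed.

```python
import signal
import subprocess
import sys


def was_sigkilled(returncode: int) -> bool:
    """Return True if a subprocess return code means the child died from SIGKILL.

    Python's subprocess module reports "terminated by signal N" as -N (POSIX only).
    """
    return returncode == -signal.SIGKILL


def run_and_report(cmd) -> int:
    """Run cmd and print a hint if it was SIGKILLed (e.g. by the OOM killer)."""
    result = subprocess.run(cmd)
    if was_sigkilled(result.returncode):
        print(f"{cmd[0]} was killed with SIGKILL; check memory (dmesg may show an OOM kill)")
    return result.returncode


# Simulate what happened to cc1plus: a child process that receives SIGKILL.
# Here the child kills itself; on the EC2 instance the sender was presumably the kernel.
returncode = run_and_report(
    [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
)
```

Note that when gcc itself is run this way, its return code is 1, not -9: the SIGKILL hit cc1plus, and the driver only reports it in its error message.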

Solution

  • I got it to work by loading the pretrained model locally with fairseq's RobertaModel instead of through torch.hub:

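    from fairseq.models.roberta import RobertaModel

    # Load through the installed fairseq package instead of torch.hub, which tried to compile fairseq's C++ extensions
    roberta = RobertaModel.from_pretrained('roberta.large.mnli', checkpoint_file='model.pt', data_name_or_path='/home/ubuntu/deployedapp/roberta.large')
    roberta.eval()  # disable dropout for inference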

    Note that I had to use an XLarge EC2 instance to run this; otherwise the process was killed for lack of memory (which is also what the "Killed signal terminated program cc1plus" error above indicates).
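Since both the compile step and the model load were limited by memory on the smaller instance, it can help to check how much memory the kernel reports as available before loading. A minimal sketch, Linux only, assuming /proc/meminfo is readable; the helper names parse_meminfo and available_gib are hypothetical and not part of torch or fairseq.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value}.

    Sizes in /proc/meminfo are labelled "kB" but are KiB (1024 bytes);
    unitless fields such as HugePages_Total are plain counts.
    """
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value [kB]"
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def available_gib(path: str = "/proc/meminfo") -> float:
    """Return the kernel's MemAvailable estimate in GiB (Linux only)."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info["MemAvailable"] / (1024 ** 2)  # KiB -> GiB
```

On the instance, printing available_gib() right before torch.hub.load(...) or RobertaModel.from_pretrained(...) gives a quick sanity check. No specific memory figure for roberta.large.mnli is implied here; the XLarge instance is simply what worked in this case.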