I am trying to use tensorboardX to debug a PyTorch neural network that is running on an AWS p2.xlarge instance.
I followed this tutorial to open the port 6006.
The model is running and tensorboardX is writing its event file. It also prints the following warning, though I am not sure how relevant it is:
WARNING:root:tuple appears in op that does not forward tuples (VisitNode at /pytorch/torch/csrc/jit/passes/lower_tuples.cpp:117)
frame #0: std::function::operator()() const + 0x11 (0x7fbe3dd04441 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7fbe3dd03d7a in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: + 0xaf61f5 (0x7fbe3cdc41f5 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #3: + 0xaf6464 (0x7fbe3cdc4464 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #4: torch::jit::LowerAllTuples(std::shared_ptr&) + 0x13 (0x7fbe3cdc44a3 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #5: + 0x3f84b4 (0x7fbe7d2cb4b4 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x130cfc (0x7fbe7d003cfc in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #40: __libc_start_main + 0xf0 (0x7fbe8d69c830 in /lib/x86_64-linux-gnu/libc.so.6)
The problem is that I cannot reach the TensorBoard web interface from my browser. I take the following steps:
$ cd PATH_TO_FOLDER_CONTAINING_runs
$ source activate pytorch_p36
$ tensorboard --logdir=runs
The last command fails with the error message:
Segmentation fault (core dumped)
When I check the syslog (/var/log/syslog), I see the following:
Jun 26 09:06:40 ip-172-xx-xx-xxx kernel: [515315.598917] tensorboard[1446]: segfault at 0 ip (null) sp 00007ffd64c5f178 error 14 in python2.7[55d8673d1000+1000]
My googling skills were far from enough. How can I access TensorBoard through the browser while it is running on the AWS instance?
Please let me know if something is unclear or if some info is missing.
Even though the code has to run in the pytorch_p36 environment, TensorBoard itself has to be launched from a different environment, tensorflow_p27.
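The syslog line in the question mentions python2.7, so it can help to check which interpreter the `tensorboard` launcher on the current PATH uses in each environment. The following is only a minimal sketch, assuming the launcher is a script whose first line is a shebang; `launcher_interpreter` is a hypothetical helper name, not part of TensorBoard.

```python
import shutil


def launcher_interpreter(command="tensorboard", path=None):
    """Return the shebang of the first `command` found on PATH, or None."""
    # shutil.which searches the given path string, or os.environ["PATH"] if None.
    executable = shutil.which(command, path=path)
    if executable is None:
        return None  # the command is not available in this environment
    with open(executable, "rb") as handle:
        first_line = handle.readline().strip()
    if not first_line.startswith(b"#!"):
        return None  # a compiled binary or a script without a shebang
    # Drop the leading "#!" and return the interpreter part as text.
    return first_line[2:].strip().decode("utf-8", errors="replace")
```

Calling `launcher_interpreter()` once after activating pytorch_p36 and once after activating tensorflow_p27 shows whether the two environments resolve `tensorboard` to different launchers.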
The sequence of commands in the terminal should be:
$ cd PATH_TO_FOLDER_CONTAINING_runs
$ source activate tensorflow_p27
$ tensorboard --logdir=runs
TensorBoard then starts and the designated port (6006) opens.
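To confirm that TensorBoard is actually listening before trying the browser, a quick TCP check can help. This is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper, and 6006 is the port opened in the question.

```python
import socket


def port_is_open(host="localhost", port=6006, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection tries every address the host name resolves to.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Run on the instance itself, `port_is_open()` checks the local server. Run from your own machine with the instance's public address as `host`, it also exercises the inbound rule for port 6006. If the check succeeds on the instance but fails from outside, the firewall configuration is the more likely culprit than TensorBoard.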