I'm trying to set up TensorBoard to show profile information so I can debug a slow model.
I have followed the steps at https://www.tensorflow.org/guide/profiler
Running pip freeze | grep tensorboard
shows:
tensorboard==2.8.0
tensorboard-data-server==0.6.1
tensorboard-plugin-profile==2.8.0
tensorboard-plugin-wit==1.8.1
Running /sbin/ldconfig -N -v $(sed 's/:/ /g' <<< $LD_LIBRARY_PATH) | grep libcupti
shows:
/sbin/ldconfig.real: Path `/usr/local/cuda-11.6/targets/x86_64-linux/lib' given more than once
/sbin/ldconfig.real: Path `/usr/local/cuda-11.4/targets/x86_64-linux/lib' given more than once
/sbin/ldconfig.real: Can't stat /usr/local/lib/x86_64-linux-gnu: No such file or directory
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: Path `/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: Path `/usr/lib' given more than once
libcupti.so.11.6 -> libcupti.so.2022.1.1
libcupti.so.11.4 -> libcupti.so.2021.2.0
/sbin/ldconfig.real: /usr/lib/wsl/lib/libcuda.so.1 is not a symbolic link
libcupti.so.10.1 -> libcupti.so.10.1.208
/sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-2.31.so is the dynamic linker, ignoring
The instructions page does not say what output to expect, but I assumed the references to libcupti.so indicate success.
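To make that check less of a guess, the ldconfig output can be filtered down to just the libcupti entries. This is a minimal sketch, assuming the output has been captured as text; the helper name cupti_links is hypothetical, and finding an entry only shows that the library is on the search path, not that the profiler can actually load it:

```python
def cupti_links(ldconfig_output):
    """Return (soname, target) pairs for libcupti lines in `ldconfig -v` output."""
    links = []
    for line in ldconfig_output.splitlines():
        line = line.strip()
        # Library entries look like "libcupti.so.11.6 -> libcupti.so.2022.1.1";
        # warning lines start with the ldconfig path and are skipped.
        if line.startswith("libcupti") and " -> " in line:
            soname, target = line.split(" -> ", 1)
            links.append((soname, target))
    return links


# Sample lines copied from the output above.
sample = """\
/sbin/ldconfig.real: Path `/usr/lib' given more than once
libcupti.so.11.6 -> libcupti.so.2022.1.1
libcupti.so.11.4 -> libcupti.so.2021.2.0
"""
print(cupti_links(sample))
```

An empty list would suggest libcupti is not on any of the searched paths.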
I started TensorBoard with tensorboard --logdir logs/, which outputs:
2022-04-08 17:42:03.872466: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:922] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-04-08 17:42:03.947825: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:922] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-04-08 17:42:03.948334: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:922] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
NOTE: Using experimental fast data loading logic. To disable, pass
"--load_fast=false" and report issues on GitHub. More details:
https://github.com/tensorflow/tensorboard/issues/4784
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
However, the TensorBoard GUI does not display a Profile tab.
What am I missing?
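The missing piece was the TensorBoard callback in my training code. With profile_batch set and the callback passed to model.fit, Keras profiles the given batch range and writes the data under the log directory: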
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=os.path.join(self.run_folder.value, 'logs'),
    histogram_freq=1,  # write weight histograms every epoch
    profile_batch='100,120',  # profile batches 100 through 120
)
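One way to confirm the callback actually wrote profile data is to look for a plugins/profile folder under the log directory, since as far as I can tell the Profile tab only shows up once such data exists. This is a minimal sketch using only the standard library; the function name find_profile_runs is my own, and the plugins/profile/<run>/ layout is an assumption based on what the TF 2.x profiler wrote on my machine:

```python
from pathlib import Path


def find_profile_runs(logdir):
    """Return run directories found under any plugins/profile folder in logdir."""
    runs = []
    # Search recursively: the profile folder may sit directly under logdir
    # or under a subfolder such as train/, depending on the TF version.
    for profile_dir in Path(logdir).rglob("profile"):
        if profile_dir.is_dir() and profile_dir.parent.name == "plugins":
            runs.extend(sorted(p for p in profile_dir.iterdir() if p.is_dir()))
    return runs


if __name__ == "__main__":
    found = find_profile_runs("logs")  # same directory passed to --logdir
    if found:
        for run in found:
            print("profile data:", run)
    else:
        print("no profile data found under logs/")
```

If nothing is found, check that training ran for at least as many batches as the profile_batch range requires (here, past batch 100).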