pythontensorflowgccubuntu-16.04gcc9

Tensorflow skylake-avx512 compiled from source missing __cpu_model symbol


I am compiling tensorflow with skylake-avx512 from source as follows, my python is built like this:


git clone https://github.com/python/cpython.git && cd cpython && git checkout 2.7
CXX="/usr/bin/g++" CXXFLAGS="-O3 -mtune=skylake-avx512 -march=skylake-avx512" CFLAGS="-O3 -mtune=skylake-avx512 -march=skylake-avx512" ./configure  \
            --enable-optimizations  \
            --with-lto \
            --enable-unicode=ucs4  \
            --with-threads \
            --with-libs="-lbz2 -lreadline -lncurses -lhistory -lsqlite3 -lssl" \
            --enable-shared \
            --with-system-expat \
            --with-system-ffi   \
            --with-ensurepip=yes \
            --enable-unicode=ucs4 \
            --disable-ipv6
RUN cd /opt/cpython && make -j16
RUN cd /opt/cpython && make install

Tensorflow build command:

bazel build   --copt=-O3  --copt=-mtune=skylake-avx512 --copt=-march=skylake-avx512        //tensorflow/tools/pip_package:build_pip_package
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /mnt

Only option set is XLA JIT, everything else is set to "no". I am using the docker image for tensorflow v1.12.0-devel, and I am copiling tag v1.12.3.

For completeness:

WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.15.0 installed.
Please specify the location of python. [Default is /usr/local/bin/python]: 


Found possible Python library paths:
  /usr/local/lib/python2.7/site-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/site-packages]

Do you wish to build TensorFlow with Apache Ignite support? [Y/n]: n
No Apache Ignite support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: Y
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: n
No CUDA support will be enabled for TensorFlow.

Do you wish to download a fresh release of clang? (Experimental) [y/N]: n
Clang will not be downloaded.

Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: -O3 -mtune=skylake-avx512 -march=skylake-avx512


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
    --config=mkl            # Build with MKL support.
    --config=monolithic     # Config for mostly static monolithic build.
    --config=gdr            # Build with GDR support.
    --config=verbs          # Build with libverbs support.
    --config=ngraph         # Build with Intel nGraph support.
Configuration finished

I am copiling with gcc-9, g++-9, and ubuntu 16.04. I have fixed several isssues prior to this one, but I cannot figure out what I am missing here. Could someone please help me resolve this missing symbol?

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: /usr/local/lib/python2.7/site-packages/tensorflow/python/../libtensorflow_framework.so: undefined symbol: __cpu_model


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.```







Solution

  • I figured out the problem.

    The reason this happens is because I am building tensorflow in one container, getting the wheel file and installing tensorflow in a different container.

    Unless all the associated libraries to tensorflow are built the same way, i.e. including the right symbols AND versions of the symbols/libraries, in both the container tensorflow is built and the container where it will be used problems like these will happen. I built python and numpy and pandas in my other container, along with other libraries. After I built these libraries from source, of course with the same TAG version and with the same compilers flags and packages installed on the system, in the tensorflow container as well all my issues went away and tensorflow works fine.

    Curious thing...tensorflow used to take 80+ minutes to build, after compiling python and a few other things it now takes about 35 minutes to build. Pretty sweet.