c++serialization pytorch deserialization libtorch

How to export PyTorch model to file (Python) and load it (libtorch C++) using TorchScript?

I am struggling with the (de)serialization of PyTorch data. I would like to save my model to a PT(H) file after training it with PyTorch (using GPU). Next I would like to load that serialized model in C++ context (using libtorch). Currently I am just experimenting with basic export/import functionality to get the hang of it.

The code is provided below. I am getting the following error:

Error loading model
Unrecognized data format
Exception raised from load at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\serialization\import.cpp:449 (most recent call first):
00007FFBB1FFDA2200007FFBB1FFD9C0 c10.dll!c10::Error::Error [<unknown file> @ <unknown line number>]
00007FFBB1FFD43E00007FFBB1FFD3F0 c10.dll!c10::detail::torchCheckFail [<unknown file> @ <unknown line number>]
00007FFB4B87B54700007FFB4B87B4E0 torch_cpu.dll!torch::jit::load [<unknown file> @ <unknown line number>]
00007FFB4B87B42A00007FFB4B87B380 torch_cpu.dll!torch::jit::load [<unknown file> @ <unknown line number>]
00007FF6089A737A00007FF6089A7210 pytroch_load_model.exe!main [c:\users\USER\projects\cmake dx cuda pytorch\cmake_integration_examples\pytorch\src\pytroch_load_model.cpp @ 19]
00007FF6089D8A9400007FF6089D8A60 pytroch_load_model.exe!invoke_main [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 79]
00007FF6089D893E00007FF6089D8810 pytroch_load_model.exe!__scrt_common_main_seh [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288]
00007FF6089D87FE00007FF6089D87F0 pytroch_load_model.exe!__scrt_common_main [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 331]
00007FF6089D8B2900007FF6089D8B20 pytroch_load_model.exe!mainCRTStartup [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp @ 17]
00007FFBDF8C703400007FFBDF8C7020 KERNEL32.DLL!BaseThreadInitThunk [<unknown file> @ <unknown line number>]
00007FFBDFBA265100007FFBDFBA2630 ntdll.dll!RtlUserThreadStart [<unknown file> @ <unknown line number>]

Here is the code:

Python (PyTorch):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

class TestModel(nn.Module):
    def __init__(self):
        super(TestModel, self).__init__()
        self.x = 2

    def forward(self):
        return self.x

test_net = torch.jit.script(Net())
test_module = torch.jit.script(TestModel())
torch.jit.save(test_net, 'test_net.pt')
torch.jit.save(test_module, 'test_module.pt')

C++ (libtorch)

#include <torch/script.h>
#include <iostream>
#include <memory>

int main(int argc, const char* argv[]) {
    if (argc != 2) {
        std::cerr << "usage: example-app <path-to-exported-script-module>\n";
        return -1;
    }

    torch::jit::script::Module module;
    try {
        std::cout << "Trying to load model..." << std::endl;
        // Deserialize the ScriptModule from a file using torch::jit::load().
        module = torch::jit::load(argv[1]);
    }
    catch (const c10::Error& e) {
        std::cerr << "Loading failed" << std::endl;
        std::cerr << e.what() << std::endl;
        return -1;
    }

    std::cout << "Loading successful" << std::endl;
}

I am using the shared distribution of libtorch 1.12.1. I have tried with both the GPU and CPU version (release, not debug builds) on Windows 10. The TestModel is even taken straight from the Torch JIT documentation...

CMakeLists.txt

cmake_minimum_required (VERSION 3.12 FATAL_ERROR)

project(pytroch
  DESCRIPTION "CMake example for PyTorch (libtorch C++) integration"
  LANGUAGES CXX
)

set(CMAKE_CXX_STANDARD 14)

set(SRC_DIR "${CMAKE_CURRENT_SOURCE_DIR}/src")
set(CMAKE_PREFIX_PATH "${CMAKE_SOURCE_DIR}/deps/libtorch/1.12.1/release/cpu/share/cmake/Torch")
find_package(Torch REQUIRED)
if(TORCH_FOUND)
    message(STATUS "Found Torch")
else()
    message(CRITICAL_ERROR "Unable to find Torch")
endif(TORCH_FOUND)

add_executable(pytroch_load_model
    "${SRC_DIR}/pytroch_load_model.cpp"
)
target_include_directories(pytroch_load_model PUBLIC ${TORCH_INCLUDE_DIRS})
target_link_libraries(pytroch_load_model PRIVATE ${TORCH_LIBRARIES})
message("${TORCH_LIBRARIES}")
file(GLOB LIBTORCH_DLLS
  "${CMAKE_SOURCE_DIR}/deps/libtorch/1.12.1/release/cpu/lib/*.dll"
)
file(COPY
    ${LIBTORCH_DLLS}
    DESTINATION "${CMAKE_BINARY_DIR}/bin/"
)

The CMakeLists.txt above is part of a larger project. I am posting it here to demonstrate how I am linking against the libraries required to run my code.

Since the PT file has mostly non-readable characters inside (after all it is serialized) I cannot really check what is going on in there. I can see though that Net as well as cpu are present as words (one can only partially read such a file).

Solution

I have created an issue on PyTorch GitHub page. It appears that one cannot combine a release build of the libtorch library with a debug build of the software that links against it.

The issue is gone once I switch to a release build. I will check with the debug build at some point but currently the code I have that uses libtorch is very tiny so no need for extensive debugging.

I see two problems with this:

The developer is forced to use the huge (especially the CUDA version) debug build of libtorch
The developer may not want to use a debug build especially if they don't want to debug libtorch itself.