Tags: c++, pytorch, segmentation-fault, file-read, libtorch

C++: Loading a .bin file into a tensor causes a SegFault


I have a tensor in PyTorch and I am trying to port it to C++ LibTorch. I made an isolated example to demonstrate the problem.

The Python code to export the tensor:

import numpy as np
import torch

# Generate a range of values from 0 to 999999
values = torch.arange(1000000, dtype=torch.float32)

# Reshape the values into a 1000x1000 tensor
tensor = values.reshape(1000, 1000)


def export_to_binary(tensor, file_path):
    # Convert tensor to NumPy array
    arr = np.array(tensor)
    # Write array to binary file
    with open(file_path, 'wb') as f:
        arr.tofile(f)


export_to_binary(tensor, 'tensor.bin')
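For reference, arr.tofile writes only the raw array buffer: no header, no shape, values in the machine's native byte order and in row-major (C) order. For the 1000x1000 float32 array that is exactly 4,000,000 bytes, and element [i][j] sits at flat index i * 1000 + j. The sketch below reads such a file into a std::vector that owns its data; ReadRawFloats and At are hypothetical helpers, and it assumes the file is read on a machine with the same byte order as the one that wrote it.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads `count` raw float32 values (no header, native
// byte order, as written by NumPy's tofile) into a vector that owns them.
std::vector<float> ReadRawFloats(const std::string &path, std::size_t count)
{
  // Open positioned at the end so tellg() reports the file size in bytes.
  std::ifstream file(path, std::ios::binary | std::ios::ate);
  if (!file)
    throw std::runtime_error("Failed to open file: " + path);

  const std::streamsize expected =
      static_cast<std::streamsize>(count * sizeof(float));
  if (static_cast<std::streamoff>(file.tellg()) != expected)
    throw std::runtime_error("File size mismatch: " + path);
  file.seekg(0, std::ios::beg);

  std::vector<float> data(count);
  if (!file.read(reinterpret_cast<char *>(data.data()), expected))
    throw std::runtime_error("Short read: " + path);
  return data; // returned by value, so the caller owns the buffer
}

// Row-major (C-order) indexing: element [row][col] of a matrix with `cols`
// columns lives at flat index row * cols + col.
float At(const std::vector<float> &m, std::size_t cols, std::size_t row,
         std::size_t col)
{
  assert(col < cols && row * cols + col < m.size());
  return m[row * cols + col];
}
```

With the question's file, ReadRawFloats("tensor.bin", 1000 * 1000) followed by At(data, 1000, 1, 798) should give 1798.0f, because element [i][j] holds i * 1000 + j and all of these integers are exactly representable in float32.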

In C++ I have a Foo class with private members bar_ and baz_.

foo.h

#ifndef FOO_H
#define FOO_H

#include <torch/torch.h>

class Foo
{
public:
  Foo();

private:
  torch::Tensor bar_;
  torch::Tensor baz_;
};

#endif // FOO_H

In the definition of the constructor, I try to load the contents of the tensor.bin file and populate baz_ from it.

foo.cc

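#include "foo.h"

#include <fstream>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

#define MATRIX_SIZE 1000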

torch::Tensor LoadFromBinary(const std::string &file_path)
{
  // Open binary file
  std::ifstream file(file_path, std::ios::binary);
  if (!file)
  {
    throw std::runtime_error("Failed to open file: " + file_path);
  }

  // Determine file size
  file.seekg(0, std::ios::end);
  std::streampos file_size = file.tellg();
  file.seekg(0, std::ios::beg);

  // Check if file size matches the expected tensor size
  const std::size_t expected_size = MATRIX_SIZE * MATRIX_SIZE * sizeof(float);
  if (file_size != static_cast<std::streampos>(expected_size))
  {
    throw std::runtime_error("File size mismatch: " + file_path);
  }

  // Read file contents into vector
  std::vector<float> data(MATRIX_SIZE * MATRIX_SIZE);
  file.read(reinterpret_cast<char *>(data.data()), expected_size);

  // Convert vector to tensor
  torch::Tensor tensor = torch::from_blob(data.data(), {MATRIX_SIZE, MATRIX_SIZE});

  return tensor;
}

Foo::Foo()
{

  baz_ = torch::zeros({MATRIX_SIZE, MATRIX_SIZE});

  baz_ = LoadFromBinary("./tensor.bin");

  std::cout << "baz_ " << baz_[1][798] << std::endl; //SegFault
}

I run it through a simple gtest (just Foo foo;), but it fails with "Exception: SegFault". However, I found something interesting: if I load the same bin file into bar_ before loading it into baz_, then I can access baz_, but only baz_.

Foo::Foo()
{
  bar_ = torch::zeros({MATRIX_SIZE, MATRIX_SIZE});
  baz_ = torch::zeros({MATRIX_SIZE, MATRIX_SIZE});

  bar_ = LoadFromBinary("./tensor.bin");
  baz_ = LoadFromBinary("./tensor.bin");

  std::cout << "baz_ " << baz_[1][798] << std::endl;
}

baz_ returns the correct values, but accessing bar_ is still not possible; it gives a SegFault.


If I swap the order, the same thing happens: it looks like at least two loads are necessary, and only the member loaded second is ever accessible.


Solution

  • There is an issue with your use of torch::from_blob: this function creates a tensor that does not take ownership of the underlying data. In your example, as soon as LoadFromBinary returns, the local std::vector<float> is destroyed along with the data it holds, so your tensor points to freed memory. The odd case where it works for baz_ but not for bar_ is undefined behavior and cannot be relied on. I think the fix is to return tensor.clone(), which copies the data into new storage owned by the returned tensor.