c++ file-io casting

Get the binary contents of a file as a uint32_t vector (SPIR-V module)


In Vulkan, initializing a shader module requires the code as a pointer to an array of uint32_t (see this SO question).

Most online sources (like the Vulkan tutorial) simply read the file as binary data and reinterpret_cast the buffer to a uint32_t array. From what I have read, this is not safe in general because of alignment issues.

What is the proper way to read the binary data: read the file directly into a uint32_t array, or safely convert an existing uint8_t array to a uint32_t array?


Solution

  • Alignment is not an issue because the code reads the file into a std::vector. That container gets its memory from std::allocator, which in turn calls ::operator new (not new[]). The returned storage is suitably aligned for any object with fundamental alignment that fits in the allocation, and that covers every scalar type, including std::uint32_t. Nevertheless, it is a bit awkward to do those reinterpret_casts outside the file-reading function.

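    A potential issue is the endianness of the file versus that of the host machine. As I understand it from related bug reports, the API expects the words in host byte order, and the SPIR-V magic number at the start of the file makes it possible to check this: if the first word reads as the byte-swapped magic number, the whole file has to be swapped.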

    With that in mind, we can make an improved version.

    #include <spirv.hpp>
    // using spv::MagicNumber
    
    #include <cstdint>
    // using std::uint32_t
    #include <fstream>
    // using std::ifstream
    #include <stdexcept>
    // using std::runtime_error
    #include <string>
    // using std::string
    #include <vector>
    // using std::vector
    #include <version>
    // using __cpp_lib_byteswap
    
    #ifdef __cpp_lib_byteswap
    # include <bit>
    // using std::byteswap
    #elif defined(_MSC_VER)
    # include <stdlib.h>
    // using _byteswap_ulong
    #endif
    
    
    static inline std::uint32_t swapEndianness(std::uint32_t word) {
    # ifdef __cpp_lib_byteswap /* C++23 available */
        return std::byteswap(word);
    # elif defined(__GNUC__)
        return __builtin_bswap32(word);
    # elif defined(_MSC_VER)
        return _byteswap_ulong(word);
    # else
        /*
         * insert rant about the late addition of <bit> header
         * and remaining lack of hton / ntoh equivalents here
         */
        return ((word & 0xff) << 24)
             | ((word & 0xff00) << 8)
             | ((word & 0xff0000) >> 8)
             | ((word & 0xff000000) >> 24);
    # endif
    }
    
    std::vector<std::uint32_t> readFile(const std::string& filename) {
        std::ifstream file(filename, std::ios::ate | std::ios::binary);
        if (!file.is_open())
            throw std::runtime_error("failed to open file");
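        const std::streamoff fileSize = file.tellg();
        if (fileSize < 0) // tellg() returns -1 on failure
            throw std::runtime_error("failed to determine file size");
        if (fileSize == 0) // explicit check required as we access rtrn.front() later
            throw std::runtime_error("file appears empty");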
        if (fileSize % sizeof(std::uint32_t))
            throw std::runtime_error("file size not a multiple of word size");
        file.seekg(0);
        std::vector<std::uint32_t> rtrn(fileSize / sizeof(std::uint32_t));
        if (!file.read(reinterpret_cast<char*>(rtrn.data()), fileSize))
            throw std::runtime_error("file reading failed");
        if (rtrn.front() == swapEndianness(spv::MagicNumber))
            for (auto& word: rtrn)
                word = swapEndianness(word);
        else if(rtrn.front() != spv::MagicNumber)
            throw std::runtime_error("unrecognized file format");
        return rtrn;
    }
    
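The magic-number test inside readFile can also be tried without any file or Vulkan header. The following is only a minimal sketch: kSpirvMagic spells out the value of spv::MagicNumber (0x07230203) so that spirv.hpp is not needed, and swapBytes and normalizeWordOrder are hypothetical names that mirror the fallback branch of swapEndianness and the final if/else of readFile.

```cpp
#include <cstdint>
#include <vector>

// Value of spv::MagicNumber, written out so the sketch needs no SPIR-V header.
constexpr std::uint32_t kSpirvMagic = 0x07230203u;

// Portable byte swap, same logic as the fallback branch of swapEndianness.
constexpr std::uint32_t swapBytes(std::uint32_t word) {
    return ((word & 0x000000ffu) << 24)
         | ((word & 0x0000ff00u) << 8)
         | ((word & 0x00ff0000u) >> 8)
         | ((word & 0xff000000u) >> 24);
}

// Hypothetical helper: returns true if the words are usable (swapping them
// in place when the file has the opposite byte order), false if the first
// word is not the SPIR-V magic number in either byte order.
bool normalizeWordOrder(std::vector<std::uint32_t>& words) {
    if (words.empty())
        return false;
    if (words.front() == kSpirvMagic)
        return true;                       // already in host byte order
    if (words.front() != swapBytes(kSpirvMagic))
        return false;                      // not a SPIR-V module
    for (auto& word : words)
        word = swapBytes(word);            // opposite byte order: swap all words
    return true;
}
```

Returning a bool instead of throwing is just a choice for the sketch; readFile above reports the same condition with std::runtime_error.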

    Make sure to adjust the initialization of the VkShaderModuleCreateInfo accordingly.

    std::vector<std::uint32_t> code = readFile(...);
    VkShaderModuleCreateInfo createInfo{};
    createInfo.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
    createInfo.codeSize = code.size() * sizeof(std::uint32_t);
    createInfo.pCode = code.data();
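
The question also asks about the second route: converting bytes that are already in memory. One way to do that without any pointer cast is to copy them into word-sized storage with std::memcpy. This is a sketch under assumptions, not part of the code above: bytesToWords is a hypothetical helper name, and it assumes the bytes are held in a std::vector<std::uint8_t>.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copy raw bytes into a vector of 32-bit words.
// The destination is allocated as std::uint32_t, so it is correctly aligned
// no matter where the source bytes came from.
std::vector<std::uint32_t> bytesToWords(const std::vector<std::uint8_t>& bytes) {
    if (bytes.size() % sizeof(std::uint32_t) != 0)
        throw std::invalid_argument("byte count not a multiple of word size");
    std::vector<std::uint32_t> words(bytes.size() / sizeof(std::uint32_t));
    if (!bytes.empty())
        std::memcpy(words.data(), bytes.data(), bytes.size());
    return words;
}
```

The copy keeps the bytes in their original order, so the resulting words are interpreted in host byte order, exactly as if the file had been read into the uint32_t vector directly. Any endianness check on the magic number still has to happen afterwards.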