The /NATVIS linker option can be used to embed debug visualizers into a PDB.
Given a PDB, is there a way to recover all embedded debug visualizers? I'm looking for a first-party tool (like DUMPBIN) or, if none can do it, a solution based on a first-party API (like DIA).
As noted in a comment, the ability to store .natvis files in a PDB is implemented by reusing the infrastructure for embedding arbitrary source files in a PDB.
The exercise thus comes down to parsing the respective tables in a PDB and filtering relevant entries. Thankfully, the parsing is already done for us by the Debug Interface Access SDK (DIA SDK) that ships with Visual Studio¹. What's left is navigating the reference documentation to discover applicable building blocks.
The following steps solve the problem statement:
1. Instantiate an IDiaDataSource interface
2. Open an IDiaSession
3. Find the IDiaEnumInjectedSources table
4. Enumerate each IDiaInjectedSource row and extract relevant data
The DIA SDK ships with Visual Studio. Technically, that makes it a 3rd-party library, and the usual ordeal of setting things up is due. I covered the prerequisite steps here:
With the proposed changes applied, the following program should successfully compile and link:
#include "dia2.h"
#pragma comment(lib, "diaguids")
int main() { auto const clsid { CLSID_DiaSource }; }
IDiaDataSource
This should be as simple as following the official example. However, it is not. The following program fails with a REGDB_E_CLASSNOTREG error code:
#include "dia2.h"
#pragma comment(lib, "diaguids")
#include <objbase.h>
int main()
{
    ::CoInitialize(nullptr);
    IDiaDataSource* pSource;
    HRESULT hr = ::CoCreateInstance(CLSID_DiaSource,
                                    nullptr,
                                    CLSCTX_INPROC_SERVER,
                                    IID_IDiaDataSource,
                                    (void**)&pSource);
    if (FAILED(hr))
        throw hr;
}
There isn't anything inherently wrong with this code. It follows the standard pattern for in-proc COM server activation. The issue is that the COM server isn't registered (on my machine, anyway²). The documentation lists "msdia80.dll" (VS 2005); things apparently changed between then and "msdia140.dll" (VS 2015+), and what was right once is wrong now.
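Since the program above throws the raw HRESULT, it may help to see how such a value breaks down. The following is a minimal, portable sketch (not part of the original answer): `DecodedHresult`, `decode_hresult`, and `regdb_e_classnotreg` are hypothetical names introduced for illustration, and the constant is the value `winerror.h` assigns to REGDB_E_CLASSNOTREG.

```cpp
#include <cstdint>

// Hypothetical helper type: the three parts of an HRESULT this sketch looks at.
struct DecodedHresult
{
    bool failed;            // severity bit (bit 31); set for failure codes
    std::uint32_t facility; // facility field, masked like HRESULT_FACILITY()
    std::uint32_t code;     // low 16 bits, same as HRESULT_CODE()
};

// Splits an HRESULT (passed as an unsigned 32-bit value) into its parts.
constexpr DecodedHresult decode_hresult(std::uint32_t hr)
{
    return { (hr & 0x80000000u) != 0, (hr >> 16) & 0x1FFFu, hr & 0xFFFFu };
}

// Value of REGDB_E_CLASSNOTREG ("Class not registered") from winerror.h.
constexpr std::uint32_t regdb_e_classnotreg = 0x80040154u;
```

Decoding `regdb_e_classnotreg` yields a failure code in facility 4 (FACILITY_ITF) with code 0x154, which is what `FAILED(hr)` reacts to in the program above.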
I didn't spend a whole bunch of time trying to register the COM server or investigating the use of side-by-side assembly manifests, or fooling about with Activation Contexts to trick the COM infrastructure into discovering "msdia140.dll".
Either of the above may well be more correct, though I settled for using an undocumented export of "diaguids.lib" instead:
HRESULT NoRegCoCreate(const wchar_t *dllName,
                      REFCLSID rclsid,
                      REFIID riid,
                      void **ppv);
This looks like a (homebrew) version of registration-free COM, which is good enough for now. The following program successfully executes:
#include "dia2.h"
#include "diacreate.h"
#pragma comment(lib, "diaguids")
#include <objbase.h>
int main()
{
    ::CoInitialize(nullptr);
    IDiaDataSource* pSource;
    HRESULT hr = ::NoRegCoCreate(L"msdia140.dll",
                                 CLSID_DiaSource,
                                 IID_IDiaDataSource,
                                 (void**)&pSource);
    if (FAILED(hr))
        throw hr;
}
This loads "msdia140.dll" from "C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\msdia140.dll" on my system, so there may be additional dependencies which I didn't investigate.
IDiaSession
The IDiaSession
is at the center of the DIA SDK. It is the pivot point for all queries against the symbol store managed by the IDiaDataSource
. Initiating a session is as simple as calling IDiaDataSource::openSession()
:
/// @brief Initiates a DIA session for queries against a PDB.
///
/// @param pdb_file Fully qualified pathname to the PDB file.
///
/// @return Returns an `IDiaSession` smart pointer on success. Errors are
///         reported via C++ exceptions.
///
[[nodiscard]] wil::com_ptr<IDiaSession>
session_from_pdb(fs::path const& pdb_file)
{
    wil::com_ptr<IDiaDataSource> source {};
    THROW_IF_FAILED(::NoRegCoCreate(L"msdia140.dll", CLSID_DiaSource,
                                    IID_PPV_ARGS(&source)));
    THROW_IF_FAILED(source->loadDataFromPdb(pdb_file.c_str()));
    wil::com_ptr<IDiaSession> session {};
    THROW_IF_FAILED(source->openSession(&session));
    return session;
}
The code is using the Windows Implementation Libraries (WIL) for convenient resource management and error handling. A C++17 compiler is required due to the use of the filesystem library.
"Streams" in the PDB file format are represented as "tables" in the DIA SDK. IDiaSession::getEnumTables() returns an iterator over all tables, where IDiaEnumTables::Next() returns a generic IDiaTable interface for each entry. A call to QueryInterface() allows us to discover the specific table type. We are interested in the IDiaEnumInjectedSources table specifically, so that's what the code is requesting. Since there can be at most one such table³, we can return early once identified:
/// @brief Attempts to find the "injected sources" table.
///
/// @param session The `IDiaSession` to use for the query. The caller must
///                ensure that the pointer is valid for the duration of the
///                call. Ownership remains with the caller.
///
/// @return Returns an `IDiaEnumInjectedSources` interface if found, a
///         `std::nullopt` otherwise. Errors are reported via C++ exceptions.
///
[[nodiscard]] std::optional<wil::com_ptr<IDiaEnumInjectedSources>>
get_source_iterator(IDiaSession* session)
{
    THROW_HR_IF_NULL(E_POINTER, session);
    wil::com_ptr<IDiaEnumTables> tables {};
    THROW_IF_FAILED(session->getEnumTables(&tables));
    ULONG celt {};
    wil::com_ptr<IDiaTable> table {};
    while (tables->Next(1, &table, &celt) == S_OK && celt == 1)
    {
        // Check whether the table implements the `IDiaEnumInjectedSources`
        // interface
        auto const pdb_table = table.try_query<IDiaEnumInjectedSources>();
        if (pdb_table)
        {
            // Found the table, so let's stop looking and return it
            return pdb_table;
        }
    }
    return {};
}
With an IDiaEnumInjectedSources iterator, we can reuse the pattern above to discover an IDiaInjectedSource interface for each entry and extract the relevant information (file name and source code bytes):
/// @brief Stores source file name and corresponding data.
///
struct Source
{
    fs::path name;
    std::vector<unsigned char> data;
};
/// @brief Extracts source data for each entry in the "injected sources" table.
///
/// @param source_it The iterator over the "injected sources" table. The caller
///                  must ensure that the pointer is valid for the duration of
///                  the call. Ownership remains with the caller.
///
/// @return Returns a list of `Source` objects on success. Errors are reported
///         via C++ exceptions.
///
[[nodiscard]] std::list<Source> get_sources(IDiaEnumInjectedSources* source_it)
{
    THROW_HR_IF_NULL(E_POINTER, source_it);
    std::list<Source> src_list {};
    ULONG celt {};
    wil::com_ptr<IDiaInjectedSource> source {};
    while (source_it->Next(1, &source, &celt) == S_OK && celt == 1)
    {
        ULONGLONG length {};
        wil::unique_bstr filename {};
        if (source->get_length(&length) == S_OK
            && source->get_filename(&filename) == S_OK)
        {
            std::vector<unsigned char> data(length, {});
            DWORD bytes_written {};
            if (source->get_source(static_cast<DWORD>(data.size()),
                                   &bytes_written, data.data())
                    == S_OK
                && bytes_written == data.size())
            {
                src_list.emplace_back(
                    Source { filename.get(), std::move(data) });
            }
        }
    }
    return src_list;
}
This is rather straightforward. However, there are a few points worth mentioning:
- Injected source files can be compressed. IDiaInjectedSource::get_sourceCompression() returns a loosely specified value, where 0 means "no compression". Other values are possible but their meaning is specific to the tool responsible for generating the PDB. More work is required if you plan to interpret the source data.
- The IDiaInjectedSource interface also doesn't offer a way to identify the type of source it refers to. I dumped the IDiaPropertyStorage key/value pairs as well to make sure I wasn't overlooking something, but that didn't turn up anything useful either (at least for my test input). The file name extension thus serves as the only hint.
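To make these two caveats a bit more tangible, here is a minimal, portable sketch of how the extracted `name` and `data` could be vetted before being written out. Both helpers are hypothetical additions, not part of the DIA SDK or the program in this answer: `has_natvis_extension` compares the extension case-insensitively, and `looks_like_plain_xml` is a cheap heuristic that assumes an ASCII-compatible encoding such as UTF-8, so UTF-16 content would be rejected by it.

```cpp
#include <algorithm>
#include <cstddef>
#include <cwctype>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: case-insensitive test for a ".natvis" extension.
inline bool has_natvis_extension(fs::path const& name)
{
    std::wstring ext = name.extension().wstring();
    std::transform(ext.begin(), ext.end(), ext.begin(), [](wchar_t c) {
        return static_cast<wchar_t>(std::towlower(c));
    });
    return ext == L".natvis";
}

// Hypothetical helper: plausibility check that the bytes are plain
// (uncompressed) XML text. Assumes an ASCII-compatible encoding; this is a
// heuristic, not a substitute for checking the compression value.
inline bool looks_like_plain_xml(std::vector<unsigned char> const& data)
{
    std::size_t pos = 0;
    // Skip an optional UTF-8 byte order mark
    if (data.size() >= 3 && data[0] == 0xEF && data[1] == 0xBB
        && data[2] == 0xBF)
    {
        pos = 3;
    }
    // Skip leading whitespace
    while (pos < data.size()
           && (data[pos] == ' ' || data[pos] == '\t' || data[pos] == '\r'
               || data[pos] == '\n'))
    {
        ++pos;
    }
    // An XML document starts with '<' (declaration, comment, or root element)
    return pos < data.size() && data[pos] == '<';
}
```

An entry that passes the extension test but fails the content test is a candidate for the "compressed or otherwise encoded" case and would need closer inspection.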
With everything covered, it would be a waste not to pull it all together into a program. The following compiles to a command line utility that takes a PDB file and an output directory as parameters and dumps all .natvis files found in the PDB:
#include <Windows.h>
#include <combaseapi.h>
#include "dia2.h"
#include "diacreate.h"
#pragma comment(lib, "diaguids.lib")
#include <wil/com.h>
#include <wil/resource.h>
#include <wil/result.h>
#include <cstdlib>
#include <cwchar>
#include <filesystem>
#include <fstream>
#include <list>
#include <optional>
#include <utility>
#include <vector>
namespace fs = std::filesystem;
/// @brief Initiates a DIA session for queries against a PDB.
///
/// @param pdb_file Fully qualified pathname to the PDB file.
///
/// @return Returns an `IDiaSession` smart pointer on success. Errors are
///         reported via C++ exceptions.
///
[[nodiscard]] wil::com_ptr<IDiaSession>
session_from_pdb(fs::path const& pdb_file)
{
    wil::com_ptr<IDiaDataSource> source {};
    THROW_IF_FAILED(::NoRegCoCreate(L"msdia140.dll", CLSID_DiaSource,
                                    IID_PPV_ARGS(&source)));
    THROW_IF_FAILED(source->loadDataFromPdb(pdb_file.c_str()));
    wil::com_ptr<IDiaSession> session {};
    THROW_IF_FAILED(source->openSession(&session));
    return session;
}
/// @brief Attempts to find the "injected sources" table.
///
/// @param session The `IDiaSession` to use for the query. The caller must
///                ensure that the pointer is valid for the duration of the
///                call. Ownership remains with the caller.
///
/// @return Returns an `IDiaEnumInjectedSources` interface if found, a
///         `std::nullopt` otherwise. Errors are reported via C++ exceptions.
///
[[nodiscard]] std::optional<wil::com_ptr<IDiaEnumInjectedSources>>
get_source_iterator(IDiaSession* session)
{
    THROW_HR_IF_NULL(E_POINTER, session);
    wil::com_ptr<IDiaEnumTables> tables {};
    THROW_IF_FAILED(session->getEnumTables(&tables));
    ULONG celt {};
    wil::com_ptr<IDiaTable> table {};
    while (tables->Next(1, &table, &celt) == S_OK && celt == 1)
    {
        // Check whether the table implements the `IDiaEnumInjectedSources`
        // interface
        auto const pdb_table = table.try_query<IDiaEnumInjectedSources>();
        if (pdb_table)
        {
            // Found the table, so let's stop looking and return it
            return pdb_table;
        }
    }
    return {};
}
/// @brief Stores source file name and corresponding data.
///
struct Source
{
    fs::path name;
    std::vector<unsigned char> data;
};
/// @brief Extracts source data for each entry in the "injected sources" table.
///
/// @param source_it The iterator over the "injected sources" table. The caller
///                  must ensure that the pointer is valid for the duration of
///                  the call. Ownership remains with the caller.
///
/// @return Returns a list of `Source` objects on success. Errors are reported
///         via C++ exceptions.
///
[[nodiscard]] std::list<Source> get_sources(IDiaEnumInjectedSources* source_it)
{
    THROW_HR_IF_NULL(E_POINTER, source_it);
    std::list<Source> src_list {};
    ULONG celt {};
    wil::com_ptr<IDiaInjectedSource> source {};
    while (source_it->Next(1, &source, &celt) == S_OK && celt == 1)
    {
        ULONGLONG length {};
        wil::unique_bstr filename {};
        if (source->get_length(&length) == S_OK
            && source->get_filename(&filename) == S_OK)
        {
            std::vector<unsigned char> data(length, {});
            DWORD bytes_written {};
            if (source->get_source(static_cast<DWORD>(data.size()),
                                   &bytes_written, data.data())
                    == S_OK
                && bytes_written == data.size())
            {
                src_list.emplace_back(
                    Source { filename.get(), std::move(data) });
            }
        }
    }
    return src_list;
}
int wmain(int argc, wchar_t* argv[])
{
    if (argc != 3 || fs::path { argv[2] }.has_filename())
    {
        ::wprintf(L"Usage: DumpNatvis <pdb file> <out dir>\n");
        return EXIT_FAILURE;
    }
    // Make sure the output directory exists
    fs::path const out_dir { argv[2] };
    fs::create_directories(out_dir);
    THROW_IF_FAILED(::CoInitialize(nullptr));
    auto const session = session_from_pdb(argv[1]);
    auto const source_it = get_source_iterator(session.get());
    if (source_it)
    {
        auto const src_list = get_sources(source_it.value().get());
        for (auto const& src : src_list)
        {
            // Filter .natvis data
            if (src.name.extension() == L".natvis")
            {
                auto const path_name = out_dir / src.name.filename();
                auto file = std::ofstream(
                    path_name, std::ofstream::out | std::ofstream::trunc
                                   | std::ofstream::binary);
                file.write(reinterpret_cast<char const*>(src.data.data()),
                           src.data.size());
            }
        }
    }
}
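One detail of the argument handling in `wmain` is easy to miss: the `has_filename()` check means the output directory is only accepted when it ends with a directory separator. The following portable sketch isolates that check and the way output paths are composed. `is_accepted_out_dir` and `output_path` are hypothetical names introduced here for illustration; they mirror the expressions used in `wmain` rather than adding new behavior.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Mirrors the check in `wmain`: the argument passes only if its last path
// element is empty, i.e. the argument ends with a directory separator.
inline bool is_accepted_out_dir(fs::path const& arg)
{
    return !arg.has_filename();
}

// Mirrors `out_dir / src.name.filename()`: any directory components stored
// with the embedded source name are dropped before joining.
inline fs::path output_path(fs::path const& out_dir,
                            fs::path const& embedded_name)
{
    return out_dir / embedded_name.filename();
}
```

In practice this means passing `out/` rather than `out` on the command line. Note also that two embedded files sharing a file name but living in different directories would map to the same output path, with the later one overwriting the earlier.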
¹ It is included with the Desktop development with C++ workload in the Visual Studio Installer.
² And someone else's machine, too.
³ Based on a comment in the official example code. Hopefully this statement is (still) true.