windows winapi rust com-interface windows-rs

Convert bytes from "OpenSavePidlMRU" into "ITEMIDLIST"


I want to extract a path from the following registry key:

HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\ComDlg32\OpenSavePidlMRU

I didn't find any documentation, but it seems that the value contains an ID list that can be converted with the GetPathFromIDList() function, which accepts an ITEMIDLIST and a flag. If I'm right (and trust me, I'm not very confident), there is a way to convert bytes into an ITEMIDLIST, but which one?

This question comes from an answer to my earlier question "Binding a Pidl with function BindToObject?". I have read the documentation from this question: How to retrieve the last folder used by OpenFileDialog?.


Solution

  • Disclaimer: What the question is asking for is inherently unsafe. Indeed, it's so incredibly unsafe that you might as well just write it in C++ instead. This has the benefit that no one is going to assume, by accident, that the code is safe. Or sound. Or even correct.

    ⚠️The following is provided for educational purposes only. Do not use this code. Anywhere. Ever.⚠️


    To address the literal question of how to convert a sequence of bytes into an ITEMIDLIST first: no "conversion" is required; you just need to reinterpret the data. Assuming that you hold the data in a Vec<u8>, you can get a pointer to its first element and cast it to a pointer to the desired target type:

    let pidl = buffer.as_ptr() as *const ITEMIDLIST;
    

    While getting a pointer and reinterpreting it is safe, handing it off to an API call that will eventually dereference it requires more care. If we were to pass this pointer together with a buffer size, we could sit back and relax, knowing that the system can arrange to prevent out-of-bounds memory accesses.

    But we will be passing just a pointer, and the system relies on the pointed-to data being formatted in a particular way, so that it knows when to stop reading.

    <detour>

    Item ID lists are encoded similarly to C-style strings: a pointer to the first element, with an otherwise unused value set aside to act as a terminator. A client receiving either one will continue reading elements until it discovers a value that matches the terminator, marking the end of the sequence.

    A meaningful difference is that, while the individual elements in a C-style string are of fixed size, they are variably sized in case of an item ID list, where the length of each element is encoded in the element itself.

    For an ITEMIDLIST the element is of type SHITEMID, with a rough transliteration into C99 looking something like this:

    struct SHITEMID {
      uint16_t      cb;
      unsigned char abID[];
    };
    

    In other words: A two-byte size prefix followed by a binary blob of arbitrary length. This encoding allows clients to determine element boundaries without having to know what the binary data means. Since cb includes the two bytes of the size prefix, any value less than 2 is invalid, and an SHITEMID with value 0x0000 is used as the terminator.
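
To make the layout concrete, here is a minimal sketch that builds such a list from a few opaque blobs. The helper name `encode_id_list` is made up for this illustration, and the sketch assumes the size prefix is stored little-endian (as it is on the platforms Windows runs on). It has nothing to do with how the shell produces its lists; it only mirrors the byte layout described above.

```rust
/// Hypothetical helper: encodes each blob as a `SHITEMID`-style segment and
/// appends the terminator. Returns `None` if a blob does not fit the 16-bit
/// size prefix.
fn encode_id_list(blobs: &[&[u8]]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for blob in blobs {
        // `cb` counts the two prefix bytes as well as the payload
        let cb = blob.len().checked_add(2)?;
        if cb > u16::MAX as usize {
            return None;
        }
        // Assumption: little-endian size prefix
        out.extend_from_slice(&(cb as u16).to_le_bytes());
        out.extend_from_slice(blob);
    }
    // A segment with `cb == 0` terminates the list
    out.extend_from_slice(&[0, 0]);
    Some(out)
}

fn main() {
    let list = encode_id_list(&[b"ab".as_slice(), b"xyz".as_slice()])
        .expect("blobs fit the size prefix");
    println!("{:02x?}", list);
}
```

For the two blobs `ab` and `xyz`, the resulting buffer holds a prefix of 4, the two payload bytes, a prefix of 5, the three payload bytes, and the two zero bytes of the terminator: eleven bytes in total.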

    </detour>

    The crucial point is that unlike C-style strings, item ID lists have internal structure. Verifying the integrity of the internal structure requires knowing the overall size of the data, so we cannot delegate this to the system that only receives a pointer.

    Since we cannot trust the data, the following function ensures that the segments can be iterated, so that the system can determine whether the data is garbage without risking an out-of-bounds read in doing so:

    use std::{mem, ptr};
    
    /// Checks to see if `data` meets the structural requirements of an ID list
    ///
    /// This function does not attempt to interpret the binary data in each segment of the
    /// list. A return value of `true` does not imply that `data` refers to a *real* item ID
    /// list.
    ///
    fn has_pidl_structure(data: impl AsRef<[u8]>) -> bool {
        const TERMINATOR: [u8; 2] = [0, 0];
    
        let mut remainder = data.as_ref();
        while remainder != TERMINATOR {
            // `remainder` must be long enough to hold a `TERMINATOR`
            if remainder.len() < TERMINATOR.len() {
                return false;
            }
    
            // Read the `SHITEMID::cb` field
            let size_ptr = remainder.as_ptr() as *const u16;
            let size = unsafe { ptr::read_unaligned(size_ptr) } as usize;
    
            // `size` includes the size prefix, so it cannot be smaller than 2
            if size < mem::size_of::<u16>() {
                return false;
            }
    
            // `size` must be smaller than the remaining buffer size, so that
            // there is still room for the terminator after this segment
            if size >= remainder.len() {
                return false;
            }
    
            remainder = &remainder[size..];
        }
        true
    }
    

    This doesn't make the data any more trustworthy. It's still straight out of an arbitrary, untrusted data store. But at least the system can trust the structure behind the pointer.
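
The same structural walk can also be written without any `unsafe` by reading the size prefix through slice indexing. The following is a sketch under the same assumptions as the function above (little-endian prefix, nothing allowed after the terminator). The name `split_items` is invented for this example; it returns the payload of each segment instead of a plain `bool`.

```rust
/// Hypothetical helper: splits `data` into the payloads of its segments, or
/// returns `None` if the bytes do not form a well-structured, terminated list.
fn split_items(data: &[u8]) -> Option<Vec<&[u8]>> {
    let mut items = Vec::new();
    let mut rest = data;
    loop {
        // Every step needs at least the two bytes of a size prefix
        let prefix = rest.get(..2)?;
        let cb = u16::from_le_bytes([prefix[0], prefix[1]]) as usize;
        if cb == 0 {
            // Terminator: accept only if nothing follows it
            return if rest.len() == 2 { Some(items) } else { None };
        }
        // `cb` includes the prefix, so a value of 1 is never valid
        if cb < 2 {
            return None;
        }
        // `get` yields `None` if the segment would run past the buffer
        let segment = rest.get(..cb)?;
        items.push(&segment[2..]);
        rest = &rest[cb..];
    }
}

fn main() {
    let good = [4, 0, b'a', b'b', 5, 0, b'x', b'y', b'z', 0, 0];
    let truncated = [4, 0, b'a', b'b', 9, 0, b'x'];
    println!("{:?}", split_items(&good).map(|items| items.len()));
    println!("{:?}", split_items(&truncated).is_some());
}
```

Like the check above, this only validates structure. It says nothing about whether the payloads mean anything to the shell.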

    That said, any solution is better than this.


    Full sample code below.

    Cargo.toml:

    [package]
    name = "mru_pidl"
    version = "0.0.0"
    edition = "2021"
    
    [dependencies.windows]
    version = "0.46.0"
    features = [
        "Win32_Foundation",
        "Win32_System_Com",
        "Win32_System_Registry",
        "Win32_UI_Shell",
        "Win32_UI_Shell_Common",
    ]
    

    src/main.rs:

    use std::{mem, ptr, vec};
    
    use windows::{
        core::Result,
        w,
        Win32::{
            Foundation::E_UNEXPECTED,
            System::{
                Com::{CoInitialize, CoTaskMemFree},
                Registry::{
                    RegOpenKeyExW, RegQueryValueExW, HKEY, HKEY_CURRENT_USER, KEY_READ, REG_BINARY,
                    REG_VALUE_TYPE,
                },
            },
            UI::Shell::{Common::ITEMIDLIST, IShellItem, SHCreateItemFromIDList, SIGDN_FILESYSPATH},
        },
    };
    
    fn main() -> Result<()> {
        // Open the registry key (arbitrarily picking the `*` subkey)
        let mut key = mem::MaybeUninit::<HKEY>::uninit();
        unsafe {
            RegOpenKeyExW(
                HKEY_CURRENT_USER,
                w!(r#"SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\ComDlg32\OpenSavePidlMRU\*"#),
                Default::default(),
                KEY_READ,
                key.as_mut_ptr(),
            )
        }
        .ok()?;
        let key = unsafe { key.assume_init() };
    
        // Query for size and type of the `0` value
        let mut value_type = REG_VALUE_TYPE::default();
        let mut value_size = 0u32;
        unsafe {
            RegQueryValueExW(
                key,
                w!("0"),
                None,
                Some(&mut value_type),
                None,
                Some(&mut value_size),
            )
        }
        .ok()?;
        if value_size == 0 || value_type != REG_BINARY {
            return Err(E_UNEXPECTED.into());
        }
    
        let mut buffer = vec::Vec::<u8>::with_capacity(value_size as usize);
        unsafe {
            RegQueryValueExW(
                key,
                w!("0"),
                None,
                None,
                Some(buffer.as_mut_ptr()),
                Some(&mut value_size),
            )
        }
        .ok()?;
        // Adjust the buffer's valid range
        unsafe { buffer.set_len(value_size as usize) };
    
        if !has_pidl_structure(&buffer) {
            return Err(E_UNEXPECTED.into());
        }
    
        // Make sure COM is initialized for this thread
        unsafe { CoInitialize(None) }?;
    
        // "Blindly" cast the pointer into an opaque buffer (keeping fingers crossed)
        let pidl = buffer.as_ptr() as *const ITEMIDLIST;
        // ... and retrieve an `IShellItem` representing the item ID list
        let item: IShellItem = unsafe { SHCreateItemFromIDList(pidl) }?;
    
        // For debugging, let's get the display name
        let display_name_ptr = unsafe { item.GetDisplayName(SIGDN_FILESYSPATH) }?;
        let display_name = unsafe { display_name_ptr.to_string() }?;
        unsafe { CoTaskMemFree(Some(display_name_ptr.0 as *const _)) };
        println!("{:?}", &display_name);
    
        Ok(())
    }
    
    /// Checks to see if `data` meets the structural requirements of an ID list
    ///
    /// This function does not attempt to interpret the binary data in each segment of the
    /// list. A return value of `true` does not imply that `data` refers to a *real* item ID
    /// list.
    ///
    fn has_pidl_structure(data: impl AsRef<[u8]>) -> bool {
        const TERMINATOR: [u8; 2] = [0, 0];
    
        let mut remainder = data.as_ref();
        while remainder != TERMINATOR {
            // `remainder` must be long enough to hold a `TERMINATOR`
            if remainder.len() < TERMINATOR.len() {
                return false;
            }
    
            // Read the `SHITEMID::cb` field
            let size_ptr = remainder.as_ptr() as *const u16;
            let size = unsafe { ptr::read_unaligned(size_ptr) } as usize;
    
            // `size` includes the size prefix, so it cannot be smaller than 2
            if size < mem::size_of::<u16>() {
                return false;
            }
    
            // `size` must be smaller than the remaining buffer size, so that
            // there is still room for the terminator after this segment
            if size >= remainder.len() {
                return false;
            }
    
            remainder = &remainder[size..];
        }
        true
    }