`String` and `Vec<u8>` have the same memory layout in practice, although this is not guaranteed. `String` also has an `into_bytes` method returning a `Vec<u8>`.
Is there a sound way to convert from an `Arc<String>` to an `Arc<Vec<u8>>` without allocating any memory for a new `Arc`? We can assert that the `Arc`'s refcount is 1. I don't mind using `unsafe`. I also don't mind asserting that the two types have the same size, or falling back to allocating if they don't. Is this possible?
@eggyal's approach is unfortunately fundamentally unsound with Stacked Borrows. Here is a somewhat different approach that is sound (I hope):
use std::mem::{align_of, size_of};
use std::sync::Arc;

// We can check that size and alignment match at compile time, which saves us from
// performing these checks at runtime.
const _: () = {
    assert!(size_of::<String>() == size_of::<Vec<u8>>());
    assert!(align_of::<String>() == align_of::<Vec<u8>>());
};

pub fn convert(mut arc: Arc<String>) -> Arc<Vec<u8>> {
    // Panics unless we are the sole owner (no other `Arc` or `Weak` pointers).
    Arc::get_mut(&mut arc).unwrap();

    let raw_string = Arc::into_raw(arc).cast_mut();
    let raw_bytes = raw_string.cast::<Vec<u8>>();

    // SAFETY: `raw_string` points to a valid `String` that we exclusively own. Reading it
    // out leaves a bitwise copy behind, but that copy is never dropped as a `String`:
    // it is overwritten below, and if a panic occurred in between, `string` would be
    // dropped while the `Arc` allocation merely leaks, so there is no double drop.
    let string = unsafe { raw_string.read() };
    let bytes = string.into_bytes();

    // SAFETY:
    // - We are the only one pointing to this `Arc`, so no data race can occur.
    // - `String` and `Vec<u8>` have the same size and alignment, so the write is in bounds.
    // - We converted the `String` to a `Vec<u8>` using its own method, so the value is valid.
    unsafe { raw_bytes.write(bytes) };

    // SAFETY: The pointer came from `Arc::into_raw`, and `Arc::from_raw` permits a different
    // sized type as long as it has the same size and alignment, which we asserted above.
    // The slot now holds an initialized `Vec<u8>`, and since no references were involved
    // after `into_raw`, we have no aliasing problems.
    unsafe { Arc::from_raw(raw_bytes) }
}
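
The question also allows falling back to an allocation, so here is a minimal sketch of a non-panicking variant. The name `convert_or_clone` is hypothetical and not part of the answer above. It assumes the same compile-time layout check, and it copies the bytes into a fresh `Arc` whenever the input is shared.

```rust
use std::mem::{align_of, size_of};
use std::sync::Arc;

// Same compile-time layout check as in the answer above.
const _: () = {
    assert!(size_of::<String>() == size_of::<Vec<u8>>());
    assert!(align_of::<String>() == align_of::<Vec<u8>>());
};

/// Hypothetical variant of `convert`: reuses the allocation when the `Arc` is
/// uniquely owned, and otherwise allocates a new `Arc<Vec<u8>>`.
pub fn convert_or_clone(mut arc: Arc<String>) -> Arc<Vec<u8>> {
    if Arc::get_mut(&mut arc).is_none() {
        // Other `Arc` or `Weak` pointers exist: copy the bytes instead of panicking.
        return Arc::new(arc.as_bytes().to_vec());
    }

    let raw_string = Arc::into_raw(arc).cast_mut();
    let raw_bytes = raw_string.cast::<Vec<u8>>();

    // SAFETY: We are the sole owner, the layouts match (checked above), and the
    // slot is rewritten as a valid `Vec<u8>` before the `Arc` is rebuilt.
    unsafe {
        let bytes = raw_string.read().into_bytes();
        raw_bytes.write(bytes);
        Arc::from_raw(raw_bytes)
    }
}
```

With a uniquely owned `Arc<String>`, the returned `Arc<Vec<u8>>` should point at the same allocation as the input, which can be checked by comparing `Arc::as_ptr` before and after. With a shared one, the other handles keep their `String` untouched and the caller pays for one extra allocation.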