I have a code base that loads a huge binary data file into memory and, without additional allocations, maps it to the proper data types (u32, f64, f32...). The problem is that this file is packed, and a recent change in Rust 1.78 broke this.
I know that both C++ and Rust declare accessing unaligned data undefined behaviour, but given that I'm using unsafe, why should I not be able to do it?
fn main() {
    let arr: &[u8] = &[100, 110, 120, 130, 140, 150, 160, 170, 180, 190];
    let f1 = unsafe { std::slice::from_raw_parts(arr.as_ptr() as *const f32, 2) };
    let f2 = unsafe { std::slice::from_raw_parts(arr[1..].as_ptr() as *const f32, 2) };
    println!("F1: {}, {} F2: {}, {}", f1[0], f1[1], f2[0], f2[1]);
}
#include <cstdint>
#include <iostream>

int main() {
    uint8_t arr[] = {100, 110, 120, 130, 140, 150, 160, 170, 180, 190};
    float *f1 = (float*)arr;
    float *f2 = (float*)&arr[1];
    std::cerr << "f1:" << std::dec << f1[0] << ", " << f1[1] << " f2:" << f2[0] << ", " << f2[1] << std::endl;
}
The results are the same and the code works properly on both arm64 and x86_64 if I compile it with any version of GCC / Clang (C++) or Rust (up to 1.77).
If I update Rust to 1.78, the code fails with the error:
thread 'main' panicked at library/core/src/panicking.rs:156:5:
unsafe precondition(s) violated: slice::from_raw_parts requires the pointer to be aligned and non-null, and the total size of the slice not to exceed `isize::MAX`
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread caused non-unwinding panic. aborting.
Program terminated with signal: SIGSEGV
Is there a zero-copy way to avoid this? The source files are up to 1 GB in size, and copying the data into arrays is not an option since they should be parsed as fast as possible. It's really annoying to have a Rust version break something that worked (even in an undefined-behaviour scenario, which is anyway kept under control with unit tests on all the supported architectures).
unsafe is not a free ticket to cause UB. It only means that you, instead of the compiler, are responsible for not causing UB.
To be clear, even before 1.78.0 you were not allowed to do this. It is just that up to 1.77 the compiler, within its rights and as one possible outcome of the UB you invoked, compiled the program successfully (and maybe it did what you wanted, or maybe not, or maybe it depended on the environment variables and your position in the universe), and since 1.78.0 it produces another possible outcome of UB: panicking at runtime. Any other outcome is also possible, including formatting your hard drive.
References cannot be unaligned. This is a rule you cannot break, ever. If you need unaligned data, you will have to stay in raw-pointer land and use read_unaligned() to read the bytes. You don't have to copy the file: you can just move the pointer around and only read when you need the values.
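As a minimal sketch of the raw-pointer approach, assuming the file has already been loaded (or memory-mapped) as a byte slice: the helper name read_f32_at and its offset/index parameters are hypothetical, chosen only for illustration. It reads one value at a time in native byte order and never creates a reference to an unaligned f32.

```rust
use std::mem::size_of;

/// Hypothetical helper: reads the `index`-th f32 of a packed run of floats
/// that begins `offset` bytes into `bytes`. Returns None when out of bounds.
fn read_f32_at(bytes: &[u8], offset: usize, index: usize) -> Option<f32> {
    let start = offset.checked_add(index.checked_mul(size_of::<f32>())?)?;
    let end = start.checked_add(size_of::<f32>())?;
    if end > bytes.len() {
        return None;
    }
    // SAFETY: start..end was checked to lie inside `bytes`, and
    // read_unaligned places no alignment requirement on the pointer.
    let value = unsafe { bytes.as_ptr().add(start).cast::<f32>().read_unaligned() };
    Some(value)
}

fn main() {
    let arr: &[u8] = &[100, 110, 120, 130, 140, 150, 160, 170, 180, 190];
    // Same two views as in the question: one starting at byte 0, one at byte 1.
    let f1 = (read_f32_at(arr, 0, 0), read_f32_at(arr, 0, 1));
    let f2 = (read_f32_at(arr, 1, 0), read_f32_at(arr, 1, 1));
    println!("F1: {:?} F2: {:?}", f1, f2);
}
```

Only the four bytes of the requested value are loaded into a register, so nothing proportional to the file size is copied. A fully safe variant of the same idea would call f32::from_ne_bytes on a 4-byte subslice.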
An alternative is to use a #[repr(C, packed)] struct representing your data format, possibly with a library such as zerocopy.
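A rough sketch of the packed-struct idea using only the standard library (not zerocopy itself): the Record layout below, one tag byte followed by two f32 values, is an invented format for illustration, and view_record is a hypothetical helper. Because a packed struct has alignment 1, a reference to it is valid at any address, which is what makes the zero-copy view legal here.

```rust
use std::mem::size_of;

// Hypothetical on-disk layout, for illustration only:
// one tag byte followed by two native-endian f32 values, with no padding.
#[repr(C, packed)]
#[derive(Clone, Copy)]
struct Record {
    tag: u8,
    a: f32,
    b: f32,
}

/// Hypothetical helper: views the start of `bytes` as a Record without copying.
fn view_record(bytes: &[u8]) -> Option<&Record> {
    if bytes.len() < size_of::<Record>() {
        return None;
    }
    // SAFETY: Record is repr(C, packed), so its alignment is 1 and any address
    // is suitably aligned; the length was checked above; every bit pattern is
    // a valid u8 or f32; the returned reference borrows from `bytes`.
    Some(unsafe { &*bytes.as_ptr().cast::<Record>() })
}

fn main() {
    let arr: &[u8] = &[100, 110, 120, 130, 140, 150, 160, 170, 180, 190];
    let rec = view_record(arr).expect("buffer too short");
    // Copy the fields out by value; taking a reference to a packed field
    // (for example &rec.a) is rejected by the compiler.
    let (tag, a, b) = (rec.tag, rec.a, rec.b);
    println!("tag: {}, a: {}, b: {}", tag, a, b);
}
```

The compiler emits unaligned loads for the field reads, so this stays zero-copy for the buffer as a whole. A library such as zerocopy is meant to wrap this kind of cast behind checked, safe APIs so the unsafe block does not have to be written by hand.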