I have two Vec<arrow::record_batch::RecordBatch>
and I want to compare them.
I want to compare their schemas and their contents.
They might be partitioned into batches differently, which should not matter.
Ideally, I want to see the reason for a comparison failure, something like what the assert_eq!() macro prints.
I assume this is for unit tests. In that case, you can use assert_batches_eq!
from DataFusion. Example from their documentation:
let col: ArrayRef = Arc::new(Int32Array::from(vec![1, 2]));
let batch = RecordBatch::try_from_iter([("column", col)]).unwrap();
// Expected output is a vec of strings
let expected = vec![
    "+--------+",
    "| column |",
    "+--------+",
    "| 1      |",
    "| 2      |",
    "+--------+",
];
// compare the formatted output of the record batch with the expected output
assert_batches_eq!(expected, &[batch]);
DataFusion is a rather large dependency to add for such a simple thing, though. Fortunately, the implementation relies only on arrow's pretty_format_batches and not on anything in DataFusion, and it is small enough that you can copy it into your own code. Below is a version modified to accept the arguments in either order and to not require vec! or &:
macro_rules! assert_batches_eq {
    // Batches first, expected lines second.
    ($left:expr, [$($right:tt)*] $(,)*) => {{
        let expected_lines: Vec<&str> = [$($right)*].iter().copied().collect();
        // All batches are rendered as one table, so batch boundaries do not matter.
        let formatted = ::arrow::util::pretty::pretty_format_batches($left)
            .unwrap()
            .to_string();
        let actual_lines: Vec<&str> = formatted.trim().lines().collect();
        assert!(
            expected_lines == actual_lines,
            "record batches are not equal:\nexpected:\n{:#?}\nactual:\n{:#?}",
            expected_lines,
            actual_lines,
        );
    }};
    // Expected lines first: swap the arguments and reuse the arm above.
    // (No comma is required after $right; a trailing one is still accepted.)
    ([$($left:tt)*], $right:expr $(,)*) => {
        assert_batches_eq!($right, [$($left)*])
    };
}
I filed a feature request to have this added to the arrow
library as well.
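To see the either-order dispatch and the "partitioning does not matter" property in isolation, here is a minimal std-only sketch of the same pattern. It is not the arrow or DataFusion code: format_batches is a hypothetical stand-in for pretty_format_batches that handles only a single Int32-like column named "column", and assert_tables_eq! is a made-up name for illustration.

```rust
// Hypothetical stand-in for arrow's pretty_format_batches: renders every row
// of every batch as one single-column table, so batch boundaries vanish.
fn format_batches(batches: &[Vec<i32>]) -> String {
    let name = "column";
    let cells: Vec<String> = batches.iter().flatten().map(|v| v.to_string()).collect();
    // The column is as wide as its widest cell or its header, whichever is longer.
    let width = cells.iter().map(|c| c.len()).max().unwrap_or(0).max(name.len());
    let rule = format!("+{}+", "-".repeat(width + 2));
    let mut lines = vec![rule.clone(), format!("| {:<w$} |", name, w = width), rule.clone()];
    lines.extend(cells.iter().map(|c| format!("| {:<w$} |", c, w = width)));
    lines.push(rule);
    lines.join("\n")
}

// Same shape as the macro above, but built on the stand-in formatter.
macro_rules! assert_tables_eq {
    // Batches first, expected lines second.
    ($left:expr, [$($right:tt)*] $(,)*) => {{
        let expected: Vec<&str> = [$($right)*].iter().copied().collect();
        let formatted = format_batches($left);
        let actual: Vec<&str> = formatted.trim().lines().collect();
        assert!(
            expected == actual,
            "tables are not equal:\nexpected:\n{:#?}\nactual:\n{:#?}",
            expected,
            actual,
        );
    }};
    // Expected lines first: swap the arguments and reuse the arm above.
    ([$($left:tt)*], $right:expr $(,)*) => {
        assert_tables_eq!($right, [$($left)*])
    };
}
```

With these definitions, assert_tables_eq!(&[vec![1], vec![2]], [...]) and assert_tables_eq!([...], &[vec![1, 2]]) accept the same expected lines, because the comparison happens on the rendered table rather than on the individual batches.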