I have loaded a large arrow2 'Chunk' with many columns (arrays), and before writing it to a Parquet file I would like to sort its rows by a given column. Consider this code:
```rust
use arrow2::{array::*, compute::sort};
use arrow2::chunk::Chunk;

fn main() {
    // Two Int64 columns with four rows each.
    let mut col1: Int64Vec = Int64Vec::new();
    col1.push(Some(0));
    col1.push(Some(5));
    col1.push(Some(3));
    col1.push(Some(2));
    let mut col2: Int64Vec = Int64Vec::new();
    col2.push(Some(1));
    col2.push(Some(2));
    col2.push(Some(3));
    col2.push(Some(4));
    let mut chu = Chunk::new(vec![col1.into_arc(), col2.into_arc()]);
    // Does not compile: nothing tells it which column to sort by.
    chu.sort_by_key();
}
```
Obviously this fails, since 'sort_by_key' is never told which column to sort by, but I have not managed to use any of the '.sort_*' functions either. I would like to sort the rows of 'chu' by the first column.
I have tried writing the key-extraction closure for '.sort_by_key', but no luck. I also searched Google and asked Gemini about it, without success.
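TL;DR: use the 'lexsort' function from 'arrow2::compute::sort'. Whereas 'sort' sorts a single array, 'lexsort' takes a list of sort columns, computes one row permutation from them, and returns every column it was given reordered by that permutation, so the rows stay aligned.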
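At first glance the name suggests text ordering (upper vs. lower case, special characters and such), but that is not what it is about: "lex" stands for lexicographic order over the sort columns, i.e. rows are compared on the first column, ties are broken by the second, and so on.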
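To make the "lexicographic" part concrete, here is a standard-library-only illustration (no arrow2 involved; 'lexsort_indices' and 'gather' are hypothetical names, not arrow2 functions). It computes one row permutation by comparing the first key column and falling back to the next one only on ties, then reorders a column with that permutation, which is conceptually what 'lexsort' does for every column it receives.

```rust
use std::cmp::Ordering;

/// Hypothetical std-only illustration: returns the row order that sorts
/// the table by `keys[0]`, breaking ties with `keys[1]`, and so on.
fn lexsort_indices(keys: &[&[i64]]) -> Vec<usize> {
    let len = keys.first().map_or(0, |k| k.len());
    let mut indices: Vec<usize> = (0..len).collect();
    // Compare two rows column by column; the first non-equal column decides.
    // `sort_by` is stable, so fully tied rows keep their original order.
    indices.sort_by(|&a, &b| {
        keys.iter()
            .map(|col| col[a].cmp(&col[b]))
            .find(|ord| *ord != Ordering::Equal)
            .unwrap_or(Ordering::Equal)
    });
    indices
}

/// Reorders one column according to a permutation (a "take"/"gather").
fn gather(column: &[i64], indices: &[usize]) -> Vec<i64> {
    indices.iter().map(|&i| column[i]).collect()
}
```

With key columns [2, 1, 2, 1] and [9, 8, 3, 5], the rows with first key 1 come first and the second column only decides the order within each tie, giving the permutation [3, 1, 2, 0].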
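On another note, if you only need the columns written to a Parquet file, as I did, also look at the sorting-columns option inside the Parquet writer's 'WriterProperties'. Be aware, though, that this option declares the sort order in the row-group metadata; as far as I know the writer does not reorder the rows itself, so the data still has to be sorted (for example with 'lexsort') before it is written.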