
How to use Polars to read specific columns from a CSV file


I have a very large file generated by other tools, but I don't need all of it; a few of its columns are enough. With pandas in Python I can specify which columns to read, but I don't know how to do this in Rust.

Thanks.

I would like to reproduce this pandas call in Rust:

data = pd.read_csv(file, sep='\t', header=None, usecols=[0,1,5])

Solution

  • Assuming you want to use Polars (the polars crate), this is how you could achieve the same in Rust.

    I have added comments explaining what each step does.

    // Requires the `csv` feature of the `polars` crate.
    use polars::prelude::*;
    use std::sync::Arc;
    
    fn main() -> PolarsResult<()> {
        let path = "<PATH_TO_THE_FILE>";
        // Without a header row, Polars names the columns "column_1", "column_2", ... (one-based).
        // These three names correspond to pandas `usecols=[0, 1, 5]` (zero-based).
        let columns_to_select = ["column_1".into(), "column_2".into(), "column_6".into()];
    
        let df = CsvReadOptions::default()
            .with_has_header(false) // equivalent to `header=None` in pandas
            .map_parse_options(|parse_options| parse_options.with_separator(b'\t')) // tab separator, equivalent to `sep='\t'` in pandas
            .with_columns(Some(Arc::new(columns_to_select))) // select the columns by name
            .try_into_reader_with_file_path(Some(path.into()))? // specify the file path
            .finish()?; // read the selected columns into a DataFrame
        println!("{}", df);
        Ok(())
    }
    

    Alternatively, use the with_projection method to select columns by zero-based index, which maps more directly to pandas' usecols. For example, .with_projection(Some(Arc::new(vec![0, 1, 5]))) selects the first, second and sixth columns, like usecols=[0,1,5].
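    Because pandas' usecols takes zero-based indices while the names Polars generates for a headerless file are one-based, it is easy to be off by one when switching to with_columns. The sketch below uses a hypothetical helper, usecols_to_polars_names (not part of Polars), that only assumes Polars' default column_N naming and translates one into the other with the standard library alone.

```rust
/// Hypothetical helper (not part of Polars): converts zero-based column
/// indices, as used by pandas' `usecols`, into the default names Polars
/// assigns to the columns of a headerless CSV ("column_1", "column_2", ...).
fn usecols_to_polars_names(indices: &[usize]) -> Vec<String> {
    indices
        .iter()
        // Polars' generated names are one-based, so index 0 -> "column_1".
        .map(|i| format!("column_{}", i + 1))
        .collect()
}

fn main() {
    // Equivalent of pandas `usecols=[0, 1, 5]`.
    let names = usecols_to_polars_names(&[0, 1, 5]);
    println!("{:?}", names); // prints ["column_1", "column_2", "column_6"]
}
```

    Each resulting String should convert with .into() into whatever element type with_columns expects in your Polars version, so the output can replace a hand-written list of column names.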