
How to directly filter an unpivoted LazyDataFrame without having to collect?


I would like to filter an unpivoted LazyDataFrame using nodejs-polars without having to collect it in between (and thereby lose the LazyDataFrame).

Consider the following example CSV files:

1.csv:

asset_key
abc

2.csv:

id;asset_key_1;asset_key_2;asset_key_3
1;123;456;abc

First, I would like to unpivot 2.csv so that all asset keys end up in a single new column. Then I want to filter that column by the value in 1.csv ("abc"), so that the result after filtering is:

id variable value
1 asset_key_3 abc
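To make the intended result concrete, here is a minimal plain-TypeScript sketch of what the melt, dropNulls and isIn filter steps are meant to compute. It does not use nodejs-polars and is not a workaround for the lazy-mode error; the helper names (`melt`, `dropNulls`, `filterIsIn`) and the in-memory row representation are hypothetical, and the melted values are assumed to be strings, since integer and string columns end up sharing one `value` column after unpivoting.

```typescript
// Hypothetical plain-TypeScript model of the intended query (no nodejs-polars).
type Cell = string | number | null;
type WideRow = Record<string, Cell>;
type LongRow = { id: Cell; variable: string; value: Cell };

// Unpivot (melt): emit one long row per (input row, value column) pair.
function melt(rows: WideRow[], idVar: string, valueVars: string[]): LongRow[] {
  const out: LongRow[] = [];
  for (const row of rows) {
    for (const variable of valueVars) {
      out.push({ id: row[idVar] ?? null, variable, value: row[variable] ?? null });
    }
  }
  return out;
}

// Drop rows where any column is null ("variable" is never null here).
function dropNulls(rows: LongRow[]): LongRow[] {
  return rows.filter((r) => r.id !== null && r.value !== null);
}

// Keep rows whose "value" appears in the allowed list (the isIn step).
function filterIsIn(rows: LongRow[], allowed: Cell[]): LongRow[] {
  return rows.filter((r) => allowed.some((a) => a === r.value));
}

// Contents of 1.csv and 2.csv from the question (values assumed to be strings).
const assetKeys: Cell[] = ["abc"];
const wide: WideRow[] = [
  { id: 1, asset_key_1: "123", asset_key_2: "456", asset_key_3: "abc" },
];

const melted = melt(wide, "id", ["asset_key_1", "asset_key_2", "asset_key_3"]);
const result = filterIsIn(dropNulls(melted), assetKeys);
console.log(result);
```

Running it logs a single row with `id` 1, `variable` "asset_key_3" and `value` "abc", matching the expected table.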

Instead, I get an error:

"Error: Not found: value"

If I collect the LazyDataFrame into a DataFrame after melting and before filtering, it works. But I would like to know whether there is a way to do this without giving up the LazyDataFrame.

This is the code I use:

import * as pl from 'nodejs-polars';

// Both files are scanned lazily; nothing is read until collect() is called.
const df_1: pl.LazyDataFrame = pl.scanCSV('1.csv', { sep: ';' });
const df_2: pl.LazyDataFrame = pl.scanCSV('2.csv', { sep: ';' });

const isInFilter: pl.LazyDataFrame = df_1.select('asset_key');

// Unpivot the asset_key_* columns into "variable"/"value", then keep matching rows.
const df: pl.DataFrame = await df_2
  .melt('id', ['asset_key_1', 'asset_key_2', 'asset_key_3'])
  .dropNulls()
  .filter(pl.col('value').isIn(isInFilter['asset_key']))
  .collect();

Solution

  • This does look like a bug, not only in nodejs-polars but in polars itself. I opened an issue for you: https://github.com/pola-rs/polars/issues/4368