
How to identify differences in polars dataframe when assert_series_equal / assert_frame_equal fails?


I am using pl.testing.assert_frame_equal to compare two pl.DataFrames. The assertion fails. The traceback indicates that there are exact value mismatches in a certain column.

The column in question is of type bool and also contains null values. It has more than 20,000 rows, and I need to figure out exactly where the difference is.

I created a mask that is true wherever the actual dataframe differs from the expectation dataframe.

mask = actual != expectation

I then noticed that the mask contains only false and null values in every column.

mask.sum().sum_horizontal() gives 0.

That means this is apparently not a good way to identify the rows with differences.

In my large dataframe I expect a situation like the following:

import polars as pl
from polars.testing import assert_frame_equal

df1 = pl.DataFrame(
    {
        "group": ["A", "A", "A", "B", "B"], 
        "value": [True, False, None, False, None]
    }
)
df2 = pl.DataFrame(
    {
        "group": ["A", "A", "A", "B", "B"], 
        "value": [True, False, False, False, None]
    }
)

Performing assert_frame_equal(df1, df2) will correctly result in an AssertionError.

AssertionError: DataFrames are different (value mismatch for column 'value')
[left]:  [True, False, None, False, None]
[right]: [True, False, False, False, None]

The inequality test doesn't help to identify where the difference is, as it produces no true values.

df1 != df2

shape: (5, 2)
┌───────┬───────┐
│ group ┆ value │
│ ---   ┆ ---   │
│ bool  ┆ bool  │
╞═══════╪═══════╡
│ false ┆ false │
│ false ┆ false │
│ false ┆ null  │
│ false ┆ false │
│ false ┆ null  │
└───────┴───────┘

Solution

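  • If you look at the implementation of assert_frame_equal:

    It uses .ne_missing() to compare the values. Unlike !=, .ne_missing() treats null as an ordinary value, so null vs. false counts as a mismatch and null vs. null does not.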

    df1.select(df1[col].ne_missing(df2[col]) for col in df1.columns)
    
    shape: (5, 2)
    ┌───────┬───────┐
    │ group ┆ value │
    │ ---   ┆ ---   │
    │ bool  ┆ bool  │
    ╞═══════╪═══════╡
    │ false ┆ false │
    │ false ┆ false │
    │ false ┆ true  │
    │ false ┆ false │
    │ false ┆ false │
    └───────┴───────┘
    

    (The assertion also performs schema / dtype validation, etc.)