python pandas dataframe indexing warnings

Boolean Series key will be reindexed to match DataFrame index


Here is how I encountered the warning:

df.loc[a_list][df.a_col.isnull()]

a_list is an Int64Index containing row index labels, all of which belong to df.

The df.a_col.isnull() part is a condition I need for filtering.

If I execute the following commands individually, I do not get any warnings:

df.loc[a_list]
df[df.a_col.isnull()]

But if I chain them as df.loc[a_list][df.a_col.isnull()], I get the following warning (although a result is still returned):

Boolean Series key will be reindexed to match DataFrame index

What does this warning mean? Does it affect the result that is returned?


Solution

  • Your approach will work despite the warning, but it's best not to rely on implicit, unclear behavior.

    Solution 1: turn the selection of the indices in a_list into a boolean mask:

    df[df.index.isin(a_list) & df.a_col.isnull()]
    

    Solution 2: do it in two steps:

    df2 = df.loc[a_list]
    df2[df2.a_col.isnull()]
    

    Solution 3: if you want a one-liner, use query and rely on the fact that NaN is not equal to itself:

    df.loc[a_list].query('a_col != a_col')
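
As a quick sanity check, here is a minimal sketch showing that the three variants select the same rows. The sample DataFrame and the contents of a_list are made up for illustration and are not from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (assumption, not from the question).
df = pd.DataFrame(
    {"a_col": [1.0, np.nan, 3.0, np.nan, 5.0]},
    index=[10, 11, 12, 13, 14],
)
a_list = pd.Index([11, 12, 14])  # row labels, all present in df

# Solution 1: a single boolean mask over the full DataFrame.
sol1 = df[df.index.isin(a_list) & df.a_col.isnull()]

# Solution 2: select the rows first, then filter using the subset's own column.
df2 = df.loc[a_list]
sol2 = df2[df2.a_col.isnull()]

# Solution 3: NaN != NaN evaluates to True, so this keeps only the null rows.
sol3 = df.loc[a_list].query("a_col != a_col")

print(list(sol1.index), list(sol2.index), list(sol3.index))
```

Note that Solution 1 returns rows in the order of df, while Solutions 2 and 3 follow the order of a_list, so the row order can differ when more than one row matches.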
    

    The warning arises because the boolean Series df.a_col.isnull() has the length of df, while df.loc[a_list] has the length of a_list, which is shorter. As a result, some index labels in df.a_col.isnull() do not appear in df.loc[a_list].

    pandas reindexes the boolean Series to the index of the DataFrame being filtered. In effect, it takes from df.a_col.isnull() only the values whose labels are in a_list. This gives the expected result, but the behavior is implicit and could change in a future version, which is what the warning is about.
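
To make the alignment visible, the reindexing can be written out by hand. This is only a sketch of the idea using Series.reindex, not a copy of what pandas does internally, and the sample data is again hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (assumption, not from the question).
df = pd.DataFrame(
    {"a_col": [1.0, np.nan, 3.0, np.nan, 5.0]},
    index=[10, 11, 12, 13, 14],
)
a_list = pd.Index([11, 12, 14])

mask = df.a_col.isnull()   # one value per row of df
subset = df.loc[a_list]    # only the rows named in a_list

# Align the mask to the subset explicitly: keep only the labels in subset.index.
aligned = mask.reindex(subset.index)

# The mask and the frame now share the same index, so no implicit reindexing is needed.
explicit = subset[aligned]

print(len(mask), len(subset))
print(explicit)
```

Because aligned and subset share the same index, this form should not trigger the warning, and it states the intent directly.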