Create "Pass/Fail/Caution" status in a pandas DataFrame based on other columns without a "SettingWithCopy" warning


I wrote code where:

  1. All values above 3 are marked as "Fail".
  2. Values above 1 and up to 3 are marked as "Caution".
  3. All other values are marked as "Pass".

However, I get the following warning: "A value is trying to be set on a copy of a slice from a DataFrame", and I'm not sure how to avoid it.

Current code:

import pandas as pd
values = range(6)
df = pd.DataFrame({"Values":values, "Caution limit": [1]*len(values), "Fail limit": [3]*len(values)})
df["Status"] = "Pass"
df["Status"][df["Caution limit"] < df["Values"]] = "Caution"
df["Status"][df["Fail limit"] < df["Values"]] = "Fail"

Current Output:

   Values  Caution limit  Fail limit   Status
0       0              1           3     Pass
1       1              1           3     Pass
2       2              1           3  Caution
3       3              1           3  Caution
4       4              1           3     Fail
5       5              1           3     Fail

Warning message:

C:\.....py:5: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Status"][df["Caution limit"] < df["Values"]] = "Caution"
C:\.....py:6: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Status"][df["Fail limit"] < df["Values"]] = "Fail"

UPD: I found a way to avoid the SettingWithCopyWarning using a lambda function:

update_status = lambda value, caution, failed: ["Fail" if f<v else "Caution" if c<v else "Pass" for v,c,f in zip(value, caution, failed)]
df["Status"] = update_status(df["Values"],df["Caution limit"],df["Fail limit"])

However, this doesn't use pandas' own capabilities at all, and my main goal is to learn how to do this with pandas.


Solution

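  • Your code uses chained assignment: df["Status"] first returns the Status column as a separate Series, and the boolean-mask assignment is then applied to that Series rather than to df itself. Whether that intermediate Series is a view of df or a copy is not always obvious, so pandas warns you about it. In the pandas version you're using this is only a warning, not an error, and your output shows that df is still updated, so as long as you don't upgrade pandas your current code should keep working.

    More recent pandas 2.x releases add a further warning about a change coming in pandas 3.0: with Copy-on-Write as the default behavior, the intermediate Series always behaves as a copy, so chained assignment like yours will no longer update the original DataFrame and your code will stop working. The Copy-on-Write page of the pandas user guide covers this change in detail.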

    To avoid the warning and keep your code working after that change, set the values directly on the DataFrame with loc, passing the row mask and the column label in a single indexing call:

    df.loc[df["Caution limit"] < df["Values"], "Status"] = "Caution"
    df.loc[df["Fail limit"] < df["Values"], "Status"] = "Fail"
    

    Alternatively, use Series.mask, which replaces values wherever the condition is True, and assign the result back to the column:

    df["Status"] = df["Status"].mask(df["Caution limit"] < df["Values"], "Caution")
    df["Status"] = df["Status"].mask(df["Fail limit"] < df["Values"], "Fail")
    

    Both will output the following with no warnings:

       Values  Caution limit  Fail limit   Status
    0       0              1           3     Pass
    1       1              1           3     Pass
    2       2              1           3  Caution
    3       3              1           3  Caution
    4       4              1           3     Fail
    5       5              1           3     Fail
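Another vectorized option, not part of the answer above, is numpy.select, which builds the whole Status column in one step. It takes a list of boolean conditions and a matching list of labels, picks for each row the label of the first condition that is True, and falls back to default when none match. A minimal sketch, reusing the example data from the question:

```python
import numpy as np
import pandas as pd

values = range(6)
df = pd.DataFrame({"Values": values, "Caution limit": [1] * len(values), "Fail limit": [3] * len(values)})

# Conditions are evaluated in order and the first True one wins,
# so the stricter "Fail" test must come before the "Caution" test.
conditions = [
    df["Fail limit"] < df["Values"],
    df["Caution limit"] < df["Values"],
]
choices = ["Fail", "Caution"]

# Rows that match no condition get the default label.
df["Status"] = np.select(conditions, choices, default="Pass")
print(df)
```

Because numpy.select returns a complete NumPy array that is assigned to the column as a whole, there is no chained assignment involved, and the column does not need to be pre-filled with "Pass".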
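To check that the loc version really runs without warnings on your own pandas version, you can temporarily turn warnings into exceptions with Python's standard warnings module. This is a sketch for verification only; the filter is scoped to the with block, so the rest of the program keeps the normal warning behavior:

```python
import warnings

import pandas as pd

values = range(6)
df = pd.DataFrame({"Values": values, "Caution limit": [1] * len(values), "Fail limit": [3] * len(values)})
df["Status"] = "Pass"

# Inside this block every warning is raised as an exception, so a
# SettingWithCopyWarning or FutureWarning would stop the script here.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    df.loc[df["Caution limit"] < df["Values"], "Status"] = "Caution"
    df.loc[df["Fail limit"] < df["Values"], "Status"] = "Fail"

print(df["Status"].tolist())
```

If either loc statement emitted a warning, the with block would raise it instead of printing it, which makes the problem easy to spot.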