pandas dataframe conditional-operator

pandas dataframe select rows with ternary operator


My requirement is simple: I need to select the rows from a Pandas DataFrame where at least one of two columns is populated. The attributes contain integer foreign keys. This works:

df_ofComment   = df_obsFinding.loc[ (None if   df_obsFinding['Comments'] is None  else df_obsFinding['Comments'].apply(len) != 0 )     ]

This gives me the rows from df_obsFinding where there is data in Comments. Good.

This fails:

df_ofComment   = df_obsFinding.loc[ (None if   df_obsFinding['Rejection_Comments'] is None  else df_obsFinding['Rejection_Comments'].apply(len) != 0 )     ]

Tosses this error:

 TypeError: object of type 'NoneType' has no len()

I believe the data in 'Rejection_Comments' is dirtier than in 'Comments'.

Under debug in the Comments col I see: [], [1234] , [1234], [ 456] etc.... Looks to me like lists and empty lists.

Under debug in Rejection_Comments I see None and Empty Boxes.
Silly me I thought checking for None would handle this.

In the end I was looking for a statement like this:

df_ofComment   = df_obsFinding.loc[ (None if   df_obsFinding['Comments'] is None  else df_obsFinding['Comments'].apply(len) != 0 )    | 
                                    ( None if   df_obsFinding['Rejection_Comments'] is None  else   df_obsFinding['Rejection_Comments'].apply(len)!= 0 )  ]

Maybe I am not going about this in a "Pythonic" way.

Many thanks for your attention to this matter.

With kind regard.

KD


Solution

  • Why does the error occur?

    The issue occurs because the original approach uses:

    (
        None
        if df_obsFinding["Rejection_Comments"] is None
        else df_obsFinding["Rejection_Comments"].apply(len) != 0
    )
    

    However, the condition df_obsFinding['Rejection_Comments'] is None does not check each row individually. Instead, it evaluates whether the entire column object (a pandas Series) is None, which is never the case here. As a result, the code always proceeds to the else branch and calls .apply(len). This calls len on every element of the column, and when it encounters a None value, it raises:

    TypeError: object of type 'NoneType' has no len()
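
Both halves of this explanation can be seen in isolation. The following is a minimal sketch; the sample Series is made up for illustration and is not the asker's actual data.

```python
import pandas as pd

# Hypothetical sample column mixing None, an empty list, and a non-empty list.
rejection = pd.Series([None, [], [657]], name="Rejection_Comments")

# `is None` tests the Series object itself, not its elements,
# so this is False even though one element is None.
column_is_none = rejection is None

# Because the check above is False, the else branch would run .apply(len),
# which calls len(None) for the first element and raises TypeError.
try:
    rejection.apply(len)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(column_is_none)
print(error_message)
```

The first print shows that the column-level check never short-circuits, and the second shows the message of the exception raised by the element-level len call.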
    

    Correct Approach

    To fix this, we must check each element in the column individually using apply(lambda x: isinstance(x, list) and len(x) != 0).

    Solution
    df_ofComment = df_obsFinding.loc[
            (
                df_obsFinding["Comments"].apply(
                    lambda x: isinstance(x, list) and len(x) != 0
                )
            )
            | (
                df_obsFinding["Rejection_Comments"].apply(
                    lambda x: isinstance(x, list) and len(x) != 0
                )
            )
        ]
    
    How does this work?

    isinstance(x, list) ensures x is a list before len(x) is called. This avoids the TypeError caused by None values.
    len(x) != 0 filters out empty lists.
    The element-wise OR (|) selects rows where either Comments or Rejection_Comments contains a non-empty list.
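
The per-column masks and their combination can also be inspected separately. This is a small sketch under assumptions: the sample values are made up, and the name is_non_empty_list is a hypothetical label for the lambda used above.

```python
import pandas as pd

# Hypothetical name for the element-wise check from the solution.
is_non_empty_list = lambda x: isinstance(x, list) and len(x) != 0

# Made-up sample columns, aligned row by row.
comments = pd.Series([[], [1234], []])
rejection = pd.Series([None, [], [987]])

# Each apply returns a boolean Series with one value per row.
comments_mask = comments.apply(is_non_empty_list)
rejection_mask = rejection.apply(is_non_empty_list)

# | combines the two masks row by row.
combined_mask = comments_mask | rejection_mask

print(comments_mask.tolist())   # [False, True, False]
print(rejection_mask.tolist())  # [False, False, True]
print(combined_mask.tolist())   # [False, True, True]
```

Passing combined_mask to .loc then keeps only the rows where it is True.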


    Handling Other Data Types (e.g., Strings)

    If Comments or Rejection_Comments might contain strings, we should also check for str:

    df_ofComment = df_obsFinding.loc[
            (
                df_obsFinding["Comments"].apply(
                    lambda x: isinstance(x, (list, str)) and len(x) != 0
                )
            )
            | (
                df_obsFinding["Rejection_Comments"].apply(
                    lambda x: isinstance(x, (list, str)) and len(x) != 0
                )
            )
        ]
    

    Note: This ensures the solution works even if Comments or Rejection_Comments contain strings instead of lists.

    Example

    Input DataFrame

    
    import pandas as pd
    
    df_obsFinding = pd.DataFrame(
            data={
                "Post_Name": [
                    "First Post",
                    "Second Post",
                    "Third Post",
                    "Fourth Post",
                    "Fifth Post",
                ],
                "Comments": [[], [1234], [1234], [], []],
                "Rejection_Comments": [None, [], [657], "Needs Review", [987]],
            }
        )
    

    Data Preview

    Post_Name    Comments  Rejection_Comments
    First Post   []        None
    Second Post  [1234]    []
    Third Post   [1234]    [657]
    Fourth Post  []        Needs Review
    Fifth Post   []        [987]

    Filtered DataFrame (df_ofComment), using the (list, str) check

    Post_Name    Comments  Rejection_Comments
    Second Post  [1234]    []
    Third Post   [1234]    [657]
    Fourth Post  []        Needs Review
    Fifth Post   []        [987]
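
As a check, the following sketch rebuilds the example DataFrame and runs both variants of the filter. The helper names non_empty and select_rows are hypothetical, introduced here only to avoid repeating the lambda.

```python
import pandas as pd

df_obsFinding = pd.DataFrame(
    {
        "Post_Name": [
            "First Post",
            "Second Post",
            "Third Post",
            "Fourth Post",
            "Fifth Post",
        ],
        "Comments": [[], [1234], [1234], [], []],
        "Rejection_Comments": [None, [], [657], "Needs Review", [987]],
    }
)


def non_empty(types):
    """Build an element-wise check for a non-empty value of the given type(s)."""
    return lambda x: isinstance(x, types) and len(x) != 0


def select_rows(df, types):
    """Keep rows where either column holds a non-empty value of the given type(s)."""
    check = non_empty(types)
    mask = df["Comments"].apply(check) | df["Rejection_Comments"].apply(check)
    return df.loc[mask]


list_only = select_rows(df_obsFinding, list)
list_or_str = select_rows(df_obsFinding, (list, str))

print(list_only["Post_Name"].tolist())
print(list_or_str["Post_Name"].tolist())
```

On this data the list-only check returns Second Post, Third Post, and Fifth Post. Fourth Post appears only when str is also accepted, since its Rejection_Comments value is a string and its Comments value is an empty list.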