My requirement is simple: I need to select the rows from a Pandas DataFrame where either or both of two columns are populated. The columns contain integer foreign keys. This works:
df_ofComment = df_obsFinding.loc[ (None if df_obsFinding['Comments'] is None else df_obsFinding['Comments'].apply(len) != 0 ) ]
This gives me the rows from df_obsFinding where there is data in Comments. Good.
This fails:
df_ofComment = df_obsFinding.loc[ (None if df_obsFinding['Rejection_Comments'] is None else df_obsFinding['Rejection_Comments'].apply(len) != 0 ) ]
Tosses this error:
TypeError: object of type 'NoneType' has no len()
I believe the data in 'Rejection_Comments' is dirtier than in 'Comments'.
Under debug in the Comments col I see: [], [1234] , [1234], [ 456] etc.... Looks to me like lists and empty lists.
Under debug in Rejection_Comments I see None and Empty Boxes.
Silly me I thought checking for None would handle this.
In the end I was looking for a statement like this:
df_ofComment = df_obsFinding.loc[ (None if df_obsFinding['Comments'] is None else df_obsFinding['Comments'].apply(len) != 0 ) |
( None if df_obsFinding['Rejection_Comments'] is None else df_obsFinding['Rejection_Comments'].apply(len)!= 0 ) ]
Maybe I am not going about this in a "Python" way
Many thanks for your attention to this matter.
With kind regards.
KD
The issue occurs because the original approach uses:
None if df_obsFinding["Rejection_Comments"] is None else df_obsFinding["Rejection_Comments"].apply(len) != 0
However, the condition df_obsFinding['Rejection_Comments'] is None does not check each row individually. Instead, it evaluates whether the entire column object is None, which will never be the case. As a result, the code proceeds to the else branch and calls .apply(len). This iterates over the entire column, and when it encounters a None value, it raises:
TypeError: object of type 'NoneType' has no len()
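A minimal sketch of that behaviour, assuming a small hypothetical column (the variable names `rejection`, `column_is_none`, `apply_len_failed`, and `mask` are illustrative and not from the question). It shows that the identity check never looks at the elements, that `.apply(len)` fails on the first None, and that a per-element check does not:

```python
import pandas as pd

# Hypothetical sample column mixing None, an empty list, and a non-empty list.
rejection = pd.Series([None, [], [657]], name="Rejection_Comments")

# "is None" tests the Series object itself, not the values stored in it.
column_is_none = rejection is None

# Calling len on every element fails as soon as a None is reached.
try:
    rejection.apply(len)
    apply_len_failed = False
except TypeError:
    apply_len_failed = True

# A per-element check short-circuits before len() is ever called on None.
mask = rejection.apply(lambda x: isinstance(x, list) and len(x) != 0)

print(column_is_none, apply_len_failed, mask.tolist())
```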
To fix this, we must check each element in the column individually using apply(lambda x: isinstance(x, list) and len(x) != 0):
df_ofComment = df_obsFinding.loc[
    (
        df_obsFinding["Comments"].apply(
            lambda x: isinstance(x, list) and len(x) != 0
        )
    )
    | (
        df_obsFinding["Rejection_Comments"].apply(
            lambda x: isinstance(x, list) and len(x) != 0
        )
    )
]
✅ isinstance(x, list) ensures x is a list before calling len(x). This avoids the errors from None values.
✅ len(x) != 0 filters out empty lists.
✅ The logical OR (|) selects rows where either Comments or Rejection_Comments contains a non-empty list.
If Comments or Rejection_Comments might contain strings, we should also check for str:
df_ofComment = df_obsFinding.loc[
    (
        df_obsFinding["Comments"].apply(
            lambda x: isinstance(x, (list, str)) and len(x) != 0
        )
    )
    | (
        df_obsFinding["Rejection_Comments"].apply(
            lambda x: isinstance(x, (list, str)) and len(x) != 0
        )
    )
]
Note: This ensures the solution works even if Comments or Rejection_Comments contain strings instead of lists.
Input DataFrame
import pandas as pd

df_obsFinding = pd.DataFrame(
    data={
        "Post_Name": [
            "First Post",
            "Second Post",
            "Third Post",
            "Fourth Post",
            "Fifth Post",
        ],
        "Comments": [[], [1234], [1234], [], []],
        "Rejection_Comments": [None, [], [657], "Needs Review", [987]],
    }
)
Data Preview
Post_Name | Comments | Rejection_Comments |
---|---|---|
First Post | [] | None |
Second Post | [1234] | [] |
Third Post | [1234] | [657] |
Fourth Post | [] | Needs Review |
Fifth Post | [] | [987] |
Filtered DataFrame (df_ofComment)
Post_Name | Comments | Rejection_Comments |
---|---|---|
Second Post | [1234] | [] |
Third Post | [1234] | [657] |
Fourth Post | [] | Needs Review |
Fifth Post | [] | [987] |
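Since the same lambda is repeated for both columns, it could also be factored into a small helper. The following is only a sketch of one way to do that, not part of the original solution: the names `has_content`, `cols`, and `mask` are hypothetical, and the helper is applied column by column before the results are combined with `any(axis=1)`, which plays the role of the `|` operator.

```python
import pandas as pd


def has_content(value):
    """Return True for a non-empty list or string, False otherwise (including None)."""
    return isinstance(value, (list, str)) and len(value) != 0


# Same sample data as the input DataFrame above.
df_obsFinding = pd.DataFrame(
    {
        "Post_Name": ["First Post", "Second Post", "Third Post", "Fourth Post", "Fifth Post"],
        "Comments": [[], [1234], [1234], [], []],
        "Rejection_Comments": [None, [], [657], "Needs Review", [987]],
    }
)

cols = ["Comments", "Rejection_Comments"]

# Check every element of each column, then keep rows where any column has content.
mask = df_obsFinding[cols].apply(lambda col: col.map(has_content)).any(axis=1)
df_ofComment = df_obsFinding.loc[mask]

print(df_ofComment)
```

Adding a third column to the filter would then only require extending `cols`.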