I have a cuDF DataFrame like this:
import cudf
import numpy as np
df_a = cudf.DataFrame()
df_a['key'] = [0, 1, 2, 3, 4]
df_a['values'] = [1,2,np.nan,3,np.nan]
and I would like to replace every 2 with np.nan.
In a pandas DataFrame I would normally use
df_a[df_a==2]=np.nan
but with a cuDF DataFrame this fails with the error: cannot broadcast <class 'int'>
When I instead use
df_a[df_a['values']==2] =np.nan
I cannot make sense of the result.
Using
df_a.replace(2, np.NaN)
fails with the error: cannot convert float NaN to integer
The original DataFrame is very large, so I want to avoid loops. It may also contain different dtypes, meaning the 2s could also be stored as floats.
I can't find a good reference for this, but using None instead of np.nan seems to do the trick:
from cudf import DataFrame
from numpy import nan
df_a = DataFrame()
df_a['key'] = [0, 1, 2, 3, 4]
df_a['values'] = [1,2, nan,3,nan]
print(df_a)
#    key values
# 0    0      1
# 1    1      2
# 2    2   <NA>
# 3    3      3
# 4    4   <NA>
# mask every 2 (in both the key and values columns)
mask = df_a==2
df_a[mask] = None
print(df_a)
#    key values
# 0    0      1
# 1    1   <NA>
# 2 <NA>   <NA>
# 3    3      3
# 4    4   <NA>
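For comparison, here is a minimal CPU-side sketch of the same boolean-mask idea in pandas, using DataFrame.mask so the original frame is left untouched. The data is a made-up copy of the frame from the question, and the names is_two and result are only illustrative. It also shows that the comparison df == 2 matches a 2 whether it sits in an integer or a float column. cuDF documents a DataFrame.mask method as well, but I have not checked that it behaves identically on GPU, so treat that as an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the frame in the question.
df = pd.DataFrame({
    "key": [0, 1, 2, 3, 4],               # integer column
    "values": [1, 2, np.nan, 3, np.nan],  # float column (NaN forces float64)
})

# Element-wise comparison: True wherever a cell equals 2,
# regardless of whether the column is int or float.
is_two = df == 2

# DataFrame.mask returns a new frame in which the True cells are
# replaced by NaN; the integer column is upcast to float to hold it.
result = df.mask(is_two)

print(result)
```

Because mask returns a new object, df itself still holds the original values; assign the result back to the same name if an in-place style update is wanted.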