python r rpy2

NA_character_ not identified as NaN after importing it into Python with rpy2


I am using the following code inside an R magic cell:

%%R -o df

library(tibble)

df <- tibble(x = c("a", "b", NA))

However, when I run the following in a separate Python cell:

df.isna()

I get

       x
1  False
2  False
3  False

In fact, the imported dataframe is

               x
1              a
2              b
3  NA_character_

How can I convert NA_character_ to a Python NaN?

I have tried

df.replace('NA_character_', np.nan)

but with no success.


Solution

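  • As you set out in the comments, the R NA_character_ value is not converted to np.nan; it arrives in Python as an instance of a different type, rpy2.rinterface_lib.sexp.NACharacterType. This is presumably why df.replace('NA_character_', np.nan) has no effect: the cell holds an object that prints as NA_character_, not the string 'NA_character_'. In this case, the solution is simply to iterate over the column and convert values of this type to np.nan: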

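    import numpy as np
    # Import the submodule explicitly; a bare `import rpy2` does not necessarily load it
    import rpy2.rinterface_lib.sexp

    # Replace rpy2's NA_character_ object with NaN and leave every other value untouched
    df['x'] = df['x'].apply(
        lambda val: np.nan
        if isinstance(val, rpy2.rinterface_lib.sexp.NACharacterType)
        else val
    )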
    

    As for whether this is a bug, the changelog for release 3.3.0 states:

    The value nan in pandas Series with strings is now converted to R NA (issue #668).

    However, the converse does not appear to happen. I don't know whether that means it's a bug, a design decision or simply that this has not yet been implemented.
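
If more than one string column is affected, the same idea can be applied to the whole DataFrame. The following is only a minimal sketch: replace_sentinels is a hypothetical helper name, not part of rpy2 or pandas, and FakeNA is a stand-in class used so that the example runs without R installed.

```python
import numpy as np
import pandas as pd


def replace_sentinels(df, sentinel_types):
    """Hypothetical helper: return a copy of df in which every cell that is
    an instance of sentinel_types has been replaced by np.nan."""
    out = df.copy()
    for col in out.columns:
        # Only object columns can hold arbitrary Python objects
        if out[col].dtype == object:
            out[col] = out[col].apply(
                lambda val: np.nan if isinstance(val, sentinel_types) else val
            )
    return out


class FakeNA:
    """Stand-in for rpy2's NACharacterType (assumption, for the demo only)."""

    def __repr__(self):
        return "NA_character_"


demo = pd.DataFrame({"x": ["a", "b", FakeNA()], "n": [1, 2, 3]})
cleaned = replace_sentinels(demo, FakeNA)
na_mask = cleaned.isna()
```

With rpy2 available, passing rpy2.rinterface_lib.sexp.NACharacterType as sentinel_types instead of FakeNA should give the same effect on the imported df.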