python pandas numpy dataframe structured-array

Pandas dataframe.to_numpy() with specific dtypes


I have a dataframe with two columns:

  In[] df.head()

  Out[]      specific_death   months_survival
       0         False            179
       1         False            127
       2         False            67
       3         True             111
       4         False            118

The first column has booleans while the second has integers. If I convert the dataframe to a numpy ndarray with:

array_from_df = df.to_numpy()

I get an unstructured numpy.ndarray. Thus if I write:

array_from_df.dtype.fields 

The result is None. For my program to work I need a structured array whose first field is boolean (np.bool_) and whose second field is an integer (e.g. np.int64). As I see it there are two options, but I couldn't find a way to do either:

Option one

Transform directly from a Pandas.DataFrame to a structured numpy.ndarray with the correct dtypes.

Option two

Transform from Pandas.DataFrame to an unstructured numpy.ndarray and then transform that into a structured numpy.ndarray. I found another SO question about this problem, but I couldn't reproduce the answer with my code.


Solution

  • As both comments suggested:

    array_from_df = df.to_records() # pass index=False to omit the index column
    

    This returns a numpy.recarray with the correct data types, which can be checked with:

    array_from_df.dtype.fields
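
For reference, here is a minimal sketch of this approach. It assumes a small hand-built frame that mirrors the df.head() output in the question; the sample data and the names rec and rec32 are illustrative and not from the original post. It also shows the column_dtypes argument of to_records, which may help if a specific integer width is needed:

```python
import numpy as np
import pandas as pd

# Illustrative data copied from the df.head() output in the question.
df = pd.DataFrame(
    {
        "specific_death": [False, False, False, True, False],
        "months_survival": [179, 127, 67, 111, 118],
    }
)

# to_numpy() returns a plain array without named fields.
print(df.to_numpy().dtype.fields)

# index=False drops the index, so the two columns become the only fields.
rec = df.to_records(index=False)
print(rec.dtype.fields)

# Optionally request a specific dtype per column (int32 chosen as an example).
rec32 = df.to_records(index=False, column_dtypes={"months_survival": "int32"})
print(rec32.dtype)
```

Fields can then be accessed by name, for example rec["specific_death"] or rec["months_survival"].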