
numpy recarray append_fields: can't append numpy array of datetimes


I have a recarray containing various fields and I want to append an array of datetime objects on to it.

However, it seems like the append_fields function in numpy.lib.recfunctions won't let me add an array of objects.

Here's some example code:

import numpy as np
import datetime
import numpy.lib.recfunctions as recfun

dtype= np.dtype([('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4')])
obs = np.array([(0.1,10.0),(0.2,11.0),(0.3,12.0)], dtype=dtype)

dates = np.array([datetime.datetime(2001,1,1,0),
    datetime.datetime(2001,1,1,0),
    datetime.datetime(2001,1,1,0)])

# This doesn't work:
recfun.append_fields(obs,'obdate',dates,dtypes=np.object)

I keep getting the error TypeError: Cannot change data-type for object array.

It seems to only be an issue with np.object arrays as I can append other fields ok. Am I missing something?


Solution

    The problem

    In [143]: recfun.append_fields(obs,'test',np.array([None,[],1]))
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-143-5c3de23b09f7> in <module>()
    ----> 1 recfun.append_fields(obs,'test',np.array([None,[],1]))
    
    /usr/local/lib/python3.5/dist-packages/numpy/lib/recfunctions.py in append_fields(base, names, data, dtypes, fill_value, usemask, asrecarray)
        615     if dtypes is None:
        616         data = [np.array(a, copy=False, subok=True) for a in data]
    --> 617         data = [a.view([(name, a.dtype)]) for (name, a) in zip(names, data)]
        618     else:
        619         if not isinstance(dtypes, (tuple, list)):
    
    /usr/local/lib/python3.5/dist-packages/numpy/lib/recfunctions.py in <listcomp>(.0)
        615     if dtypes is None:
        616         data = [np.array(a, copy=False, subok=True) for a in data]
    --> 617         data = [a.view([(name, a.dtype)]) for (name, a) in zip(names, data)]
        618     else:
        619         if not isinstance(dtypes, (tuple, list)):
    
    /usr/local/lib/python3.5/dist-packages/numpy/core/_internal.py in _view_is_safe(oldtype, newtype)
        363 
        364     if newtype.hasobject or oldtype.hasobject:
    --> 365         raise TypeError("Cannot change data-type for object array.")
        366     return
        367 
    
    TypeError: Cannot change data-type for object array.
    

    So the failure is in the expression a.view([(name, a.dtype)]), which tries to reinterpret a as a structured array with a single field. That works for dtypes such as int and str, but fails for object. The check lives in numpy's core view handling (_view_is_safe in the traceback above), so it isn't likely to change.

    In [148]: x=np.arange(3)
    
    In [149]: x.view([('test', x.dtype)])
    Out[149]: 
    array([(0,), (1,), (2,)], 
          dtype=[('test', '<i4')])
    
    In [150]: x=np.array(['one','two'])
    
    In [151]: x.view([('test', x.dtype)])
    Out[151]: 
    array([('one',), ('two',)], 
          dtype=[('test', '<U3')])
    
    In [152]: x=np.array([[1],[1,2]])
    
    In [153]: x
    Out[153]: array([[1], [1, 2]], dtype=object)
    
    In [154]: x.view([('test', x.dtype)])
    ...
    TypeError: Cannot change data-type for object array.
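
    The same check can be reproduced outside append_fields. The sketch below wraps the view call in a small helper; view_as_single_field is a hypothetical name used only for illustration, not part of numpy.

```python
import datetime
import numpy as np

def view_as_single_field(arr, name):
    """Try the view that append_fields performs; return None if numpy refuses."""
    try:
        return arr.view([(name, arr.dtype)])
    except TypeError:
        # Raised for object arrays: "Cannot change data-type for object array."
        return None

ints = np.arange(3)
objs = np.array([datetime.datetime(2001, 1, 1)] * 2, dtype=object)

print(view_as_single_field(ints, 'test'))   # structured array with one field
print(view_as_single_field(objs, 'test'))   # None, because the view is rejected
```

    Checking arr.dtype.hasobject beforehand gives the same answer without triggering the exception.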
    

    recfunctions has to be imported separately (import numpy.lib.recfunctions), which suggests it is something of a backwater: not widely used and not under active development. I haven't examined the code in detail, but I suspect any fix inside append_fields would be a kludge.

    A fix

    Here's a way of adding a new field from scratch. It performs the same basic actions as append_fields:

    Define a new dtype from obs.dtype plus the new field's name and dtype:

    In [158]: obs.dtype.descr
    Out[158]: [('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4')]
    
    In [159]: obs.dtype.descr+[('TEST',object)]
    Out[159]: [('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4'), ('TEST', object)]
    
    In [160]: dt1  =np.dtype(obs.dtype.descr+[('TEST',object)])
    

    Make an empty target array, and fill it by copying data by field name:

    In [161]: newobs = np.empty(obs.shape, dtype=dt1)    
    In [162]: for n in obs.dtype.names:
         ...:     newobs[n]=obs[n]
    
    In [167]: dates
    Out[167]: 
    array([datetime.datetime(2001, 1, 1, 0, 0),
           datetime.datetime(2001, 1, 1, 0, 0),
           datetime.datetime(2001, 1, 1, 0, 0)], dtype=object)
    
    In [168]: newobs['TEST']=dates
    
    In [169]: newobs
    Out[169]: 
    array([( 0.1       ,  10., datetime.datetime(2001, 1, 1, 0, 0)),
           ( 0.2       ,  11., datetime.datetime(2001, 1, 1, 0, 0)),
           ( 0.30000001,  12., datetime.datetime(2001, 1, 1, 0, 0))], 
          dtype=[('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4'), ('TEST', 'O')])
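
    The steps above can be collected into one function. This is a minimal sketch, not a replacement for append_fields: append_object_field is a hypothetical helper name, and it assumes a simple packed dtype (for dtypes with padding or explicit offsets, dtype.descr contains extra unnamed entries and this approach would need adjusting).

```python
import datetime
import numpy as np

def append_object_field(base, name, values):
    """Return a copy of `base` with one extra object-dtype field."""
    # Extend the existing field list with the new (name, dtype) pair.
    new_dtype = np.dtype(base.dtype.descr + [(name, object)])
    out = np.empty(base.shape, dtype=new_dtype)
    # Copy the existing fields one by one, then fill in the new one.
    for field in base.dtype.names:
        out[field] = base[field]
    out[name] = values
    return out

obs = np.array([(0.1, 10.0), (0.2, 11.0), (0.3, 12.0)],
               dtype=[('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4')])
dates = np.array([datetime.datetime(2001, 1, 1, 0)] * 3, dtype=object)

newobs = append_object_field(obs, 'obdate', dates)
print(newobs.dtype.names)
```

    Because the result is built with np.empty and filled field by field, no view of the object array is ever taken, which is what sidesteps the TypeError.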
    

    datetime64 alternative

    With numpy's native datetime64 dtype, append_fields works:

    In [179]: dates64 = dates.astype('datetime64[D]')
    
    In [180]: recfun.append_fields(obs,'test',dates64,usemask=False)
    Out[180]: 
    array([( 0.1       ,  10., '2001-01-01'),
           ( 0.2       ,  11., '2001-01-01'), ( 0.30000001,  12., '2001-01-01')], 
          dtype=[('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4'), ('test', '<M8[D]')])
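
    As a self-contained version of the session above, the sketch below converts the datetime objects to datetime64 before appending. It uses second resolution ('datetime64[s]') rather than the day resolution shown above, on the assumption that the time of day should be kept; the field name 'obdate' is taken from the question.

```python
import datetime
import numpy as np
import numpy.lib.recfunctions as recfun

obs = np.array([(0.1, 10.0), (0.2, 11.0), (0.3, 12.0)],
               dtype=[('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4')])
dates = np.array([datetime.datetime(2001, 1, 1, 0)] * 3, dtype=object)

# Convert the Python datetime objects to numpy's native datetime type.
dates64 = dates.astype('datetime64[s]')

# usemask=False returns a plain structured array instead of a masked array.
result = recfun.append_fields(obs, 'obdate', dates64, usemask=False)

# Python datetime objects can be recovered from the field when needed.
as_python = result['obdate'].tolist()
print(result.dtype.names)
```

    The trade-off is that the field no longer holds arbitrary Python objects, so anything that cannot be represented as datetime64 still needs the manual approach.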
    

    Note that append_fields has some bells and whistles that my version above lacks: fill values, masked arrays, recarray output, etc.

    structured dates array

    I could also create a structured array from the dates:

    In [197]: sdates = np.array([(i,) for i in dates],dtype=[('test',object)])
    In [198]: sdates
    Out[198]: 
    array([(datetime.datetime(2001, 1, 1, 0, 0),),
           (datetime.datetime(2001, 1, 1, 0, 0),),
           (datetime.datetime(2001, 1, 1, 0, 0),)], 
          dtype=[('test', 'O')])
    

    There must be a function that merges fields of existing arrays, but I'm not finding it.

    previous work

    This felt familiar:

    https://github.com/numpy/numpy/issues/2346

    TypeError when appending fields to a structured array of size ONE

    Adding datetime field to recarray