
How to create a numpy dtype object array from a python list without copying data?


As the NumPy docs describe for the object dtype, an array with dtype=object does not store the objects themselves: each element is a pointer (reference) to a Python object, much like the slots of a Python list. The tobytes() method on such an array returns those raw pointers.
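A quick way to check this claim, assuming CPython (where `id()` returns an object's memory address) and that `np.uintp` has the same width as a pointer: the bytes returned by `tobytes()` decode to the same addresses that `id()` reports. The variable names are illustrative.

```python
import numpy as np

items = ['spam', 'eggs']
arr = np.array(items, dtype=object)

# Each slot of an object array holds one PyObject* pointer,
# so the item size equals the platform pointer size.
pointer_size_matches = arr.itemsize == np.dtype(np.uintp).itemsize

# Reinterpret the raw pointer bytes as unsigned integers.
# CPython-specific assumption: id(obj) is the object's address.
addresses = np.frombuffer(arr.tobytes(), dtype=np.uintp)
matches = [int(addr) == id(obj) for addr, obj in zip(addresses, items)]
print(pointer_size_matches, matches)
```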

I was wondering whether it's possible to create an ndarray from a Python list without making a copy on creation.

For example, creating an ndarray from a list with np.asarray and passing copy=False raises an exception:

import numpy as np

l = ['spam', 'eggs']
arr = np.asarray(l, dtype='object', copy=False) # raises ValueError
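A version-tolerant sketch of the same experiment. It assumes NumPy 2.x raises ValueError here (as the traceback in the answer shows) and that NumPy 1.x, whose asarray has no copy keyword, raises TypeError instead; both are caught.

```python
import numpy as np

l = ['spam', 'eggs']

# Without a copy argument, asarray builds a new object array (default copy=None).
arr = np.asarray(l, dtype=object)

# NumPy >= 2.0: copy=False means "never copy" and raises ValueError,
# because a list has no array buffer that could be reused.
# NumPy 1.x: asarray has no copy keyword, so the call raises TypeError.
try:
    np.asarray(l, dtype=object, copy=False)
    raised = False
except (ValueError, TypeError):
    raised = True

print(arr, raised)
```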

I don't know how NumPy stores the underlying data, but it seems like it should be very similar (if not identical) to a Python list.


Solution

  • Make a list of strings:

    In [1]: import numpy as np
    In [2]: alist = ['one', 'two', 'three']
    

    And an array from that:

    In [3]: arr = np.asarray(alist); arr
    Out[3]: array(['one', 'two', 'three'], dtype='<U5')
    

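    Without a dtype, NumPy infers a fixed-width unicode string dtype, '<U5': each element reserves room for 5 characters at 4 bytes each, so the buffer occupies 3*5*4 = 60 bytes.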
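The byte counts can be checked with `itemsize` and `nbytes`. For comparison, the object array's buffer holds only pointers (one pointer-sized slot per element), while the strings themselves live elsewhere as ordinary Python objects. `sarr` and `oarr` are illustrative names.

```python
import numpy as np

alist = ['one', 'two', 'three']

sarr = np.asarray(alist)                        # inferred fixed-width unicode dtype
print(sarr.dtype, sarr.itemsize, sarr.nbytes)   # <U5 20 60

oarr = np.asarray(alist, dtype=object)
# Only the pointers are stored in the object array's buffer.
print(oarr.itemsize == np.dtype(np.intp).itemsize)
print(oarr.nbytes == len(alist) * oarr.itemsize)
```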

    But with object dtype:

    In [4]: arr = np.asarray(alist, dtype=object); arr
    Out[4]: array(['one', 'two', 'three'], dtype=object)
    

    This is a shallow copy: the array is a new container, but its 3rd element is the same object as the 3rd element of the list, a Python string:

    In [5]: id(alist[2])
    Out[5]: 2008637825040
    
    In [6]: id(arr[2])
    Out[6]: 2008637825040
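The same identity check can be made for every element at once with `is`, which compares object identity just like comparing `id()` values:

```python
import numpy as np

alist = ['one', 'two', 'three']
arr = np.asarray(alist, dtype=object)

# The array is a new container, but every slot refers to the list's objects.
same = [a is b for a, b in zip(alist, arr)]
print(same)   # [True, True, True]
```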
    

    If the list contains a mutable object, such as a nested list of strings:

    In [7]: blist = ['one', 'two', 'three', ['a','b']]; barr=np.array(blist,object)
    
    In [8]: blist[3]
    Out[8]: ['a', 'b']
    
    In [9]: barr[3]
    Out[9]: ['a', 'b']
    

    Modifying that object through one container modifies it in the other:

    In [10]: barr[3].append('c');barr
    Out[10]: array(['one', 'two', 'three', list(['a', 'b', 'c'])], dtype=object)
    
    In [11]: blist
    Out[11]: ['one', 'two', 'three', ['a', 'b', 'c']]
    

    But replacing an element of the list with a new object does not change the array:

    In [13]: blist[1]=12.3; blist, barr
    Out[13]: 
    (['one', 12.3, 'three', ['a', 'b', 'c']],
     array(['one', 'two', 'three', list(['a', 'b', 'c'])], dtype=object))
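The reverse direction behaves the same way: assigning a new object into an array slot rebinds only that slot, so the list is untouched, while in-place mutation of a shared element is still visible through both. A minimal sketch reusing the `blist`/`barr` names from the session:

```python
import numpy as np

blist = ['one', 'two', 'three', ['a', 'b']]
barr = np.array(blist, dtype=object)   # 1-D object array of 4 references

barr[0] = 'ONE'           # rebinds the array's slot only
print(blist[0])           # one

barr[3].append('c')       # mutates the inner list shared by both containers
print(blist[3])           # ['a', 'b', 'c']
```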
    

    In many ways an object dtype array is like a shallow copy of the list, e.g. alist.copy(). But the methods differ: the list can append, the array can reshape, etc. In general you don't gain much by making an object dtype array. Some operations may be simpler to write, but they are rarely faster.
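A small illustration of the different methods: the array can be reshaped, and arithmetic operators are applied element by element by calling each object's own Python operator (here `str.__add__`), whereas only the list supports `append`. This sketch says nothing about speed; the names `grid` and `shouted` are illustrative.

```python
import numpy as np

alist = ['one', 'two', 'three', 'four']
arr = np.array(alist, dtype=object)

grid = arr.reshape(2, 2)      # lists have no reshape
print(grid.shape)             # (2, 2)

shouted = arr + '!'           # calls str.__add__ on each element
print(shouted.tolist())       # ['one!', 'two!', 'three!', 'four!']

alist.append('five')          # arrays have no append method
print(len(alist), arr.size)   # 5 4
```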

    PS

    The `copy=False` error message:

    In [28]: np.asarray(alist, dtype=object, copy=False)
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Cell In[28], line 1
    ----> 1 np.asarray(alist, dtype=object, copy=False)
    
    ValueError: Unable to avoid copy while creating an array as requested.
    If using `np.array(obj, copy=False)` replace it with `np.asarray(obj)` to allow a copy when needed (no behavior change in NumPy 1.x).
    For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword.
    

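    This tells us that in NumPy 2.x, copy=False means "never copy" and raises when a copy cannot be avoided. A list has no array buffer that NumPy could reuse, so building an array from it always needs a copy. The default copy=None is just as useful here: it copies only if needed. copy=True is more useful, forcing a copy (but for object dtype it is still a shallow copy). To get a deep copy with object dtype, I think we have to use something like copy.deepcopy, but I haven't fiddled with that in a long time.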
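A sketch of the copy levels using the standard library's `copy.deepcopy`. NumPy arrays implement `__deepcopy__`, and for object dtype it deep-copies each element; the names `shallow` and `deep` are illustrative.

```python
import copy
import numpy as np

blist = ['one', 'two', 'three', ['a', 'b']]

shallow = np.array(blist, dtype=object, copy=True)  # new buffer, same objects
deep = copy.deepcopy(shallow)                       # new buffer, new objects

print(shallow[3] is blist[3])   # True
print(deep[3] is blist[3])      # False

blist[3].append('c')
print(shallow[3], deep[3])      # ['a', 'b', 'c'] ['a', 'b']
```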