How to create a numpy dtype object array from a python list without copying data?

As the numpy docs describe for the object dtype, an array created with the object dtype simply holds references to python objects, much like a python list does. Calling tobytes() on such an array returns the raw pointers to those objects, not the objects' data.

I was wondering if it's possible to create an ndarray object from a python list without creating a copy on creation.

For example, creating an ndarray from a list while passing copy=False to np.asarray raises an exception:

import numpy as np

l = ['spam', 'eggs']
arr = np.asarray(l, dtype='object', copy=False) # raises ValueError

I don't know how numpy is storing the underlying data, but it seems like it should be very similar (if not identical) to a python list.

Solution

Make a list of strings:

In [1]: import numpy as np
In [2]: alist = ['one', 'two', 'three']

And an array from that:

In [3]: arr = np.asarray(alist); arr
Out[3]: array(['one', 'two', 'three'], dtype='<U5')

Without a dtype argument, the result has a numpy string dtype (occupying 3*5*4=60 bytes: 3 elements, up to 5 characters each, 4 bytes per character).
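The size arithmetic can be checked directly from the array's attributes. A minimal sketch, assuming the same three-string list as above:

```python
import numpy as np

alist = ['one', 'two', 'three']
arr = np.asarray(alist)

# A 'U5' element holds up to 5 characters at 4 bytes each,
# so each element takes 20 bytes and the whole array 3 * 20.
print(arr.dtype.kind, arr.itemsize, arr.nbytes)
```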

But with object dtype:

In [4]: arr = np.asarray(alist, dtype=object); arr
Out[4]: array(['one', 'two', 'three'], dtype=object)

This is a shallow copy: the 3rd element of the array is the same object as the 3rd element of the list, a python string:

In [5]: id(alist[2])
Out[5]: 2008637825040

In [6]: id(arr[2])
Out[6]: 2008637825040
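The same check can be written with the `is` operator instead of comparing id values by eye. A small sketch, again assuming the three-string list; it also shows that the array's own buffer only holds one pointer-sized slot per element:

```python
import numpy as np

alist = ['one', 'two', 'three']
arr = np.asarray(alist, dtype=object)

# Every array slot refers to the very same python object as the list slot.
same = all(a is b for a, b in zip(alist, arr))
print(same)

# The array buffer is new, but it stores only references:
# nbytes is the number of elements times the per-slot size.
print(arr.itemsize, arr.nbytes)
```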

If the list contains a mutable object, such as a list of strings:

In [7]: blist = ['one', 'two', 'three', ['a','b']]; barr=np.array(blist,object)

In [8]: blist[3]
Out[8]: ['a', 'b']

In [9]: barr[3]
Out[9]: ['a', 'b']

modifying that object through one modifies it in the other:

In [10]: barr[3].append('c');barr
Out[10]: array(['one', 'two', 'three', list(['a', 'b', 'c'])], dtype=object)

In [11]: blist
Out[11]: ['one', 'two', 'three', ['a', 'b', 'c']]

But replacing an element of the list with a new value does not change the array:

In [13]: blist[1]=12.3; blist, barr
Out[13]: 
(['one', 12.3, 'three', ['a', 'b', 'c']],
 array(['one', 'two', 'three', list(['a', 'b', 'c'])], dtype=object))

In many ways an object dtype array made from a list behaves like a shallow copy of that list, e.g. alist.copy(). But the methods are different: the list can append, the array can reshape, etc. In general you don't gain much by making an object dtype array. Some operations may be simpler to write, but rarely are they faster.

ps

The copy=False error message:

In [28]: np.asarray(alist, dtype=object, copy=False)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[28], line 1
----> 1 np.asarray(alist, dtype=object, copy=False)

ValueError: Unable to avoid copy while creating an array as requested.
If using `np.array(obj, copy=False)` replace it with `np.asarray(obj)` to allow a copy when needed (no behavior change in NumPy 1.x).
For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword.

is just telling us that the default copy=None is as good as we can get. It copies only if needed, and converting a list always needs a new array buffer, because a list is not an array that numpy can reuse. copy=True is more useful, since it forces a copy (but it is still a shallow copy). To get a deep copy with object dtype, I think we have to use something like copy.deepcopy - but I haven't fiddled with that in a long time.
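A quick sketch of the difference between a shallow and a deep copy of an object dtype array, assuming the list with a nested list used earlier. It relies on copy.deepcopy being applied to the array, which the numpy docs suggest for copying the elements of an object array:

```python
import copy
import numpy as np

blist = ['one', 'two', 'three', ['a', 'b']]
barr = np.array(blist, dtype=object)

shallow = barr.copy()        # new array, same element objects
deep = copy.deepcopy(barr)   # new array, elements copied recursively

blist[3].append('c')         # mutate the inner list shared with blist

print(shallow[3])  # still the same inner list, so it sees the append
print(deep[3])     # independent copy, unchanged
```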