As the numpy docs describe for the object dtype, arrays created with the object dtype are simply references to an underlying data store like a python list. The tobytes() method on such an object returns pointers to this data store.
I was wondering if it's possible to create an ndarray from a python list without making a copy at creation.
For example, creating an ndarray from a list while passing copy=False to np.asarray raises an exception:
import numpy as np
l = ['spam', 'eggs']
arr = np.asarray(l, dtype='object', copy=False) # raises ValueError
I don't know how numpy is storing the underlying data, but it seems like it should be very similar (if not identical) to a python list.
Make a list of strings:
In [1]: import numpy as np
In [2]: alist = ['one', 'two', 'three']
And an array from that:
In [3]: arr = np.asarray(alist); arr
Out[3]: array(['one', 'two', 'three'], dtype='<U5')
Without dtype, it is a numpy string dtype (occupying 3*5*4=60 bytes).
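The 60-byte figure can be checked from the array's own attributes. This is a minimal sketch that assumes the same three-string list as above; the three elements are each stored as 5 characters of 4 bytes.

```python
import numpy as np

alist = ['one', 'two', 'three']
arr = np.asarray(alist)

print(arr.dtype)      # fixed-width unicode, 5 characters per element
print(arr.itemsize)   # bytes per element: 5 characters * 4 bytes each
print(arr.nbytes)     # total bytes: 3 elements * itemsize
```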
But with object dtype:
In [4]: arr = np.asarray(alist, dtype=object); arr
Out[4]: array(['one', 'two', 'three'], dtype=object)
This is a shallow copy; the 3rd element of the array is the same object as the 3rd element of the list, a python string:
In [5]: id(alist[2])
Out[5]: 2008637825040
In [6]: id(arr[2])
Out[6]: 2008637825040
If the list contains a mutable object, such as a list of strings:
In [7]: blist = ['one', 'two', 'three', ['a','b']]; barr=np.array(blist,object)
In [8]: blist[3]
Out[8]: ['a', 'b']
In [9]: barr[3]
Out[9]: ['a', 'b']
modifying that object through one modifies it in the other:
In [10]: barr[3].append('c');barr
Out[10]: array(['one', 'two', 'three', list(['a', 'b', 'c'])], dtype=object)
In [11]: blist
Out[11]: ['one', 'two', 'three', ['a', 'b', 'c']]
But replacing an element of the list with a new value does not change the array.
In [13]: blist[1]=12.3; blist, barr
Out[13]:
(['one', 12.3, 'three', ['a', 'b', 'c']],
array(['one', 'two', 'three', list(['a', 'b', 'c'])], dtype=object))
In many ways an object dtype array is like a list, e.g. alist.copy(). But methods are different. The list can append, the array can reshape, etc. In general you don't gain much by making an object dtype array. Some operations may be simpler to write, but rarely are they faster.
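As a rough illustration of "simpler to write, but rarely faster", here is a small sketch. The four-string list is made up for the example. The point is that arithmetic on an object dtype array still calls the Python operator once per element, just like the list comprehension.

```python
import numpy as np

alist = ['one', 'two', 'three', 'four']
arr = np.array(alist, dtype=object)

# A list grows in place; an array has a fixed size but can be reshaped.
alist.append('five')
grid = arr.reshape(2, 2)

# On an object array, * applies the Python operator to each element,
# which is shorter to write than the comprehension below ...
doubled = arr * 2
# ... but both perform one Python-level multiplication per element.
doubled_list = [s * 2 for s in arr]

print(grid.shape)
print(doubled)
```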
The `copy=False` error message:
In [28]: np.asarray(alist, dtype=object, copy=False)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[28], line 1
----> 1 np.asarray(alist, dtype=object, copy=False)
ValueError: Unable to avoid copy while creating an array as requested.
If using `np.array(obj, copy=False)` replace it with `np.asarray(obj)` to allow a copy when needed (no behavior change in NumPy 1.x).
For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword.
This message is just telling us that the default copy=None is just as useful: it will copy only if needed. copy=True is more useful, forcing a copy (but it is still a shallow copy). To get a deep copy with object dtype, I think we have to use something like copy.deepcopy, but I haven't fiddled with that in a long time.
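A short sketch of the three cases, reusing the blist example from above. It assumes that copy.deepcopy on an object dtype array also copies the elements, which is what the NumPy documentation for np.copy suggests. The copy keyword is left out so the snippet does not depend on the NumPy 2 signature of np.asarray.

```python
import copy
import numpy as np

blist = ['one', 'two', 'three', ['a', 'b']]
barr = np.array(blist, dtype=object)

same = np.asarray(barr)        # already an ndarray: no copy is needed
shallow = np.array(barr)       # new array, same element objects
deep = copy.deepcopy(barr)     # new array, elements copied too

barr[3].append('c')            # mutate the shared inner list

print(same is barr)            # True
print(shallow[3])              # ['a', 'b', 'c']
print(deep[3])                 # ['a', 'b']
```

Only the deep copy is insulated from the append; the shallow copy still points at the same inner list as barr and blist.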