python numpy numpy-ndarray

ndarray a obtained from b.diagonal() has its values changed after b is modified


I'm a bit confused by the behavior of the code below and wonder if someone could shed some light on it. I have a matrix called mat, which is a numpy ndarray. I get its diagonal using mat.diagonal() and assign it to the variable diag. I then change all diagonal values of mat to 100. Now diag has all its values changed to 100 too, which seems to indicate that diag directly references elements in mat. Yet when I compare the memory address of the first element of diag with that of the first element of mat, they are different. What's the right way to look at this?

import numpy as np
import pandas as pd

mat_df = pd.DataFrame(data=[[1,2,3], [4,5,6], [7,8,9]])
print(mat_df)

mat = mat_df.values
diag = mat.diagonal()
print(diag)
diag_loc = np.diag_indices_from(mat)
mat[diag_loc] = 100
print(diag)

print(diag[0])
print(id(diag[0]))
print(mat[0][0])
print(id(mat[0][0]))

mat:

   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9

diag:

[1 5 9]

diag's values change due to mat's change:

[100 100 100]

the first value of diag:

100

and its address

139863357577488

the first value of mat:

100

and its address

139863376059664

Solution

  • You can't get the address with id. First, id doesn't return an address: the CPython implementation happens to use the object's memory address as the id, but that is an implementation detail, not a guarantee. Second, even then it would only be the address of the Python object, in your case the one wrapping the numpy.int64 value.

    That Python object is built just to wrap whatever the numpy functions return. Those functions are opaque to Python: it has no way of knowing when they are supposed to return the same value.

    A simple experiment will convince you that this id tells you nothing about where the data lives:

    id(diag[0])
    # 139729998312368
    id(diag[0])
    # 139730045496016
    

    See: even two consecutive, identical calls do not necessarily return the same id!

    diag[0] is a call to numpy's diag.__getitem__(0), whose result is wrapped in a new Python object each time. The same would be true of a call to any function f(0): there is no reason to suppose that identical calls return the very same object.

    So if you want to know where the actual int64 values are stored, you cannot ask Python (with its id function): not only is that not what id is for, but more importantly, Python doesn't know. Where the int64 values are stored is an internal matter of the numpy library, so you need to ask numpy.
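
The point about fresh wrapper objects can be checked directly with `is` instead of comparing id values. This is a minimal sketch, assuming a small integer array built with np.array rather than the DataFrame from the question; the names first and second are only illustrative.

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
diag = mat.diagonal()

# Each indexing operation builds a new NumPy scalar object.
first = diag[0]
second = diag[0]

# Both wrappers are alive at the same time, so they must be distinct objects,
# even though they were read from the same element of the same buffer.
same_object = first is second
same_value = bool(first == second)

print(same_object)
print(same_value)
```

Keeping both results in variables matters here: without that, the first wrapper may be freed before the second is created, and its id can happen to be reused.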

    The best way to do that, in my opinion, is the base attribute:

    diag.base
    #array([[100,   2,   3],
    #       [  4, 100,   6],
    #       [  7,   8, 100]])
    diag.base is mat.base
    # True
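
As a hedged complement to the base check: in the question, mat comes from a DataFrame and has a base of its own, which is why the comparison above is against mat.base. The sketch below assumes instead an array created with np.array, which owns its data, so the view's base is mat itself. np.shares_memory gives the same answer without reasoning about base, and .copy() is one way to get a diagonal that does not follow later changes to mat.

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # owns its data
diag = mat.diagonal()                # a view onto mat's buffer
diag_copy = mat.diagonal().copy()    # an independent array

mat[np.diag_indices_from(mat)] = 100

print(mat.base is None)                  # mat owns its buffer
print(diag.base is mat)                  # the view points back to mat
print(np.shares_memory(diag, mat))       # same underlying memory
print(np.shares_memory(diag_copy, mat))  # the copy has its own memory
print(diag, diag_copy)                   # the view follows mat, the copy does not
```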
    

    But if you insist on having an address of some sort, you can also look at ctypes.data:

    diag.ctypes.data
    # 61579664
    mat.ctypes.data
    # 61579664
    

    Or, for more complete information on which data the array views, and how:

    mat.__array_interface__
    # {'data': (61579664, False), 'strides': (8, 24), 'descr': [('', '<i8')], 'typestr': '<i8', 'shape': (3, 3), 'version': 3}
    diag.__array_interface__
    # {'data': (61579664, True), 'strides': (32,), 'descr': [('', '<i8')], 'typestr': '<i8', 'shape': (3,), 'version': 3}
    

    This shows that the two arrays use the same 'data' but read it with different 'strides' and 'shape'. (The second item of the 'data' tuple is the read-only flag, which is True for diag.)
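
To make the role of the strides concrete, here is a rough sketch that rebuilds the diagonal by hand. It assumes a freshly created int64 array, so the stride values differ from the output above, where mat came from a DataFrame, but the relationship is the same: stepping one row and one column at once moves along the diagonal. The names diag_step and manual_diag are illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.int64)
diag = mat.diagonal()

# Bytes to skip to move one row down and one column right.
row_step, col_step = mat.strides
diag_step = row_step + col_step

# Byte addresses of the diagonal elements inside mat's buffer.
addresses = [mat.ctypes.data + i * diag_step for i in range(mat.shape[0])]

# The same view built explicitly: same buffer, 1-D shape, diagonal stride.
manual_diag = as_strided(mat, shape=(mat.shape[0],),
                         strides=(diag_step,), writeable=False)

print(diag.strides == (diag_step,))
print(diag.ctypes.data == addresses[0])
print(manual_diag)
```

as_strided is used here only to illustrate the memory layout; for real code mat.diagonal() is the safer call, since as_strided does no bounds checking.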