
Does numpy reshape create a copy?


Is there a way to reshape a numpy array in place? My array is very big, so any unnecessary copies strain memory.

My current approach is like this:

train_x = train_x.reshape(n,32*32*3)

This doesn't quite solve the problem, since it creates a new array object and then binds the name train_x to it.

Normally this would be fine, since the original array would soon be garbage-collected.

The problem is that I have something like this:

train_x, train_y = train_set
train_x = train_x.reshape(n,32*32*3)

So even though train_x no longer refers to the original array, train_set still holds a reference to it.
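The situation can be reproduced with a small, hypothetical stand-in for train_set (the sizes and dtypes below are assumptions for illustration only). Checking with np.shares_memory suggests that, for a contiguous array, reshape returns a view, so the reference held by train_set keeps an extra array object alive rather than a second copy of the pixel data.

```python
import numpy as np

# Hypothetical stand-in for the question's data: n images of 32x32x3.
n = 4
train_set = (np.zeros((n, 32, 32, 3), dtype=np.float32),
             np.zeros(n, dtype=np.int64))

train_x, train_y = train_set
train_x = train_x.reshape(n, 32 * 32 * 3)

# train_set[0] is still the original 4-D array object...
print(train_set[0] is train_x)            # False
print(train_set[0].shape, train_x.shape)  # (4, 32, 32, 3) (4, 3072)

# ...but for a contiguous array, reshape returns a view: both objects
# share one data buffer, so the pixel data itself is not duplicated.
print(np.shares_memory(train_set[0], train_x))  # True
print(train_x.base is train_set[0])             # True
```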

I want a way to make every reference to the original array point to the new one. Is there a way?

Or maybe there is some other way of dealing with this?


Solution

  • In Python, keep in mind that several variables (names) can refer to the same object, such as a numpy array. Arrays can also have views: new array objects that share the original's data buffer. A copy has its own data buffer.

    In [438]: x = np.arange(12)
    In [439]: y = x                # same object
    In [440]: y.shape = (2,6)      # in-place shape change
    In [441]: y
    Out[441]: 
    array([[ 0,  1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10, 11]])
    In [442]: x
    Out[442]: 
    array([[ 0,  1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10, 11]])
    In [443]: y = y.reshape(3,4)        # y is a new view
    In [444]: y
    Out[444]: 
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])
    In [445]: x
    Out[445]: 
    array([[ 0,  1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10, 11]])
    

    y has a different shape, but it shares x's data buffer:

    In [446]: y += 1
    In [447]: y
    Out[447]: 
    array([[ 1,  2,  3,  4],
           [ 5,  6,  7,  8],
           [ 9, 10, 11, 12]])
    In [448]: x
    Out[448]: 
    array([[ 1,  2,  3,  4,  5,  6],
           [ 7,  8,  9, 10, 11, 12]])
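Applied to the question, the in-place shape assignment shown above might look like the sketch below (again with a hypothetical, small train_set; sizes are assumptions). Because train_x.shape = ... modifies the array object itself, train_set[0] sees the new shape too. The second half shows the limit of this approach: when the new shape can't be produced without copying (here a transposed, non-contiguous array), setting .shape raises AttributeError, while reshape quietly returns a copy with its own buffer.

```python
import numpy as np

# Hypothetical stand-in for the question's data (sizes made up).
n = 4
train_set = (np.zeros((n, 32, 32, 3), dtype=np.float32),
             np.zeros(n, dtype=np.int64))
train_x, train_y = train_set

# Assigning to .shape changes the existing array object in place, so every
# name that refers to it (including train_set[0]) sees the new shape.
train_x.shape = (n, 32 * 32 * 3)
print(train_set[0].shape)       # (4, 3072)
print(train_set[0] is train_x)  # True

# In-place reshaping only works when no data has to be copied. A transposed
# array is not C-contiguous, so flattening it in C order needs a copy.
t = np.arange(12).reshape(3, 4).T
try:
    t.shape = (12,)
except AttributeError as err:
    print("in-place reshape refused:", err)

# reshape() falls back to a copy with its own buffer in that case.
flat = t.reshape(12)
print(np.shares_memory(t, flat))  # False
```

Since reshape on a contiguous array already returns a view, rebuilding the tuple after reshaping (train_set = (train_x, train_y)) is another cheap option. NumPy's documentation for ndarray.shape discourages setting the attribute directly and recommends reshape, so the reshape-and-rebind route may be the more future-proof choice.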