I have two tensors, a and b, with sizes a.shape=(10000,10000,120) and b.shape=(10000,10000,120).
I'm trying to get a cost matrix between a and b, cost = torch.sum((a-b)**2, -1), where cost.shape=(10000,10000).
The problem is, when I try to do a-b or torch.sub(a, b, alpha=1), a "CUDA out of memory" error occurs.
I don't think it should cost that much memory. It works when the size of the tensors is small, like 2000. Iterating with a for loop is not efficient. How can I deal with this?
It does cost a lot (about 134 GB).
Let's do some calculations.
Assuming your data is of type torch.float32, a will occupy a memory size of:

32 bits (4 bytes) * 10000 * 10000 * 120 = 4.8E10 bytes ≈ 44.7 GB
So does b. When you do b-a, the result also has the same shape as a and thus occupies the same amount of memory, which means you need a total of 44.7 GB * 3 (≈ 134 GB) of memory for this operation. Is your available memory greater than 134 GB?
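You can double-check these numbers with a couple of lines of plain Python (a minimal sketch using the shapes from your question):

numel = 10000 * 10000 * 120                   # elements per tensor
bytes_per_elem = 4                            # torch.float32
print(numel * bytes_per_elem / 1024**3)       # ≈ 44.7 (GiB) for one tensor
print(3 * numel * bytes_per_elem / 1024**3)   # ≈ 134 (GiB) for a, b and a-b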
Possible solution:
If you will no longer use a or b afterwards, you can store the result in one of them to avoid allocating another 44.7 GB, like this:
torch.sub(a, b, out=a) # In this case, the result goes to `a`
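If the end goal is still the (10000, 10000) cost matrix, the same in-place idea can be carried through the rest of the computation. This is only a sketch, assuming a can be overwritten:

torch.sub(a, b, out=a)     # a now holds a - b, no extra 44.7 GB allocation
a.pow_(2)                  # square in place
cost = a.sum(dim=-1)       # cost.shape == (10000, 10000), only ~0.4 GB extra

Note that this only removes the third 44.7 GB allocation; a and b themselves still have to fit in memory.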