
torch.sub causes CUDA out of memory


I have two tensors, a and b, with sizes a.shape=(10000,10000,120) and b.shape=(10000,10000,120).

I'm trying to get a cost matrix between a and b, cost = torch.sum((a - b)**2, -1), where cost.shape=(10000,10000).

The problem is, when I try to do a - b or torch.sub(a, b, alpha=1), a "CUDA out of memory" error occurs.

I don't think it should cost that much memory. It works when the tensors are smaller, e.g. (2000, 2000, 120).

Iterating with a for loop is not efficient. How can I deal with this?
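
For reference, a minimal version of what I'm running (sizes reduced here so it fits in memory; at the full 10000 it fails):

    import torch

    # Reduced sizes for illustration; the real tensors are (10000, 10000, 120)
    N, M, D = 2000, 2000, 120

    a = torch.randn(N, M, D, device="cuda")
    b = torch.randn(N, M, D, device="cuda")

    # Squared-difference cost summed over the last dimension -> shape (N, M)
    cost = torch.sum((a - b) ** 2, -1)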


Solution

  • It does cost that much (about 134 GB).

    Let's do some calculations.

    Assuming your data is of type torch.float32, a will occupy a memory size of:

    4 bytes (32 bits) × 10000 × 10000 × 120 = 4.8 × 10^10 bytes ≈ 44.7 GB

    So does b. When you compute a - b, the result has the same shape as a and thus occupies the same amount of memory, so you need a total of 44.7 GB × 3 (≈ 134 GB) to do this operation.

    Is your available GPU memory larger than 134 GB?
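
    A quick back-of-the-envelope check of these figures in plain Python (just the arithmetic above, nothing hypothetical):

        # Memory footprint of one (10000, 10000, 120) float32 tensor
        elems = 10000 * 10000 * 120       # 1.2e10 elements
        bytes_per_elem = 4                # torch.float32 uses 4 bytes per element
        gib = elems * bytes_per_elem / 2**30
        print(gib)      # ~44.7  -> GiB for `a` alone
        print(3 * gib)  # ~134.1 -> GiB for a, b, and the result of a - b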

    Possible solution:

    If you will no longer use a or b afterwards, you can store the result in one of them to avoid allocating another 44.7 GB, like this:

    torch.sub(a, b, out=a)  # In this case, the result goes to `a`
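
    If a and b really are expendable, the same trick computes the whole cost matrix without any additional (10000, 10000, 120)-sized buffer by chaining in-place operations (a minimal sketch of the idea, not the only way to write it):

        a.sub_(b)         # a now holds a - b; no new 44.7 GB allocation
        a.pow_(2)         # a now holds (a - b) ** 2, still in place
        cost = a.sum(-1)  # shape (10000, 10000); only ~0.4 GB of new memory

    The only fresh allocation is the (10000, 10000) result itself.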