I ran the following benchmark code on a Google Colab CPU instance with high RAM enabled. Please point out any errors in the way I am benchmarking (if any), as well as why there is such a high performance boost with tinygrad.
import time
import numpy as np
import torch
from tinygrad import Tensor

# Set the size of the matrices
size = 10000
# Generate a random 10000x10000 matrix with NumPy
np_array = np.random.rand(size, size)
# Generate a random 10000x10000 matrix with PyTorch
torch_tensor = torch.rand(size, size)
# Generate a random 10000x10000 matrix with TinyGrad
tg_tensor = Tensor.rand(size, size)
# Benchmark NumPy
start_np = time.time()
np_result = np_array @ np_array # Matrix multiplication
np_time = time.time() - start_np
print(f"NumPy Time: {np_time:.6f} seconds")
# Benchmark PyTorch
start_torch = time.time()
torch_result = torch_tensor @ torch_tensor # Matrix multiplication
torch_time = time.time() - start_torch
print(f"PyTorch Time: {torch_time:.6f} seconds")
# Benchmark TinyGrad
start_tg = time.time()
tg_result = tg_tensor @ tg_tensor # Matrix multiplication
tg_time = time.time() - start_tg
print(f"TinyGrad Time: {tg_time:.6f} seconds")
These were the results; they stayed very similar across many runs:
tinygrad evaluates operations lazily: `tg_tensor @ tg_tensor` only builds a computation graph, so your timer is measuring graph construction, not the matrix multiplication itself. That is why the tinygrad number looks impossibly fast. Force the computation inside the timed region:

tg_result = (tg_tensor @ tg_tensor).realize()

or, if you also want the result copied back to a NumPy array:

tg_result = (tg_tensor @ tg_tensor).numpy()
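Beyond the laziness issue, single-shot wall-clock timing is noisy: the first run pays cache-warming and allocation costs, and `time.perf_counter()` is a better clock for short intervals than `time.time()`. Here is a minimal harness sketch, shown with NumPy only and a smaller matrix so it runs quickly; the commented tinygrad line is an assumption about how you would plug your workload in:

```python
import time
import numpy as np

def bench(fn, warmup=1, repeats=3):
    """Time fn(), discarding warm-up runs and keeping the best of several repeats."""
    for _ in range(warmup):
        fn()  # warm caches / lazily-initialized state before timing
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)  # best-of-N filters out scheduler noise

size = 1000  # smaller than the original 10000 so this sketch finishes fast
a = np.random.rand(size, size)

np_time = bench(lambda: a @ a)
print(f"NumPy Time: {np_time:.6f} seconds")

# For tinygrad, the timed callable must force evaluation, e.g.:
#   bench(lambda: (tg_tensor @ tg_tensor).realize())
```

With this structure all three libraries are timed the same way, and the tinygrad callable actually executes the multiplication rather than just recording it.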