pythonpython-3.xlistnumpy

How can I improve the efficiency of matrix multiplication in python?


I have written a code to do matrix multiplication of different range but it takes a lot of time to execute the code.

Code:

# Program to multiply two matrices using nested loops

import time

print("Enter the size of matrix A")
m = int(input())
n = int(input())
print("Enter the size of matrix A")
p = int(input())
q = int(input())
if(n==p):
     print('enter matrix A')
else:
    print("invalid entry")
    exit()
our_list1 = []
A = []
i = 0
int(i)

for i in range(m):
    for i in range(n):
            number = int(input('Please enter a element '))
            our_list1.append(number)
    A.append(our_list1)
    our_list1= []
print(A)
print('enter matrix B')
our_list1 = []
B = []

for i in range(p):
    for i in range(q):
            number = int(input('Please enter a element '))
            our_list1.append(number)
    B.append(our_list1)
    our_list1= []
print(B)
start_time = time.time()

#
our_list1 = []
R = []

for i in range(m):
    for i in range(q):
            number = 0
            our_list1.append(number)
    R.append(our_list1)
    our_list1= []
print(R)
for i in range(len(A)):

    # iterating over column by B
    for j in range(len(B[0])):

        # iterating by rows of B
        for k in range(len(B)):
            R[i][j] += A[i][k] * B[k][j]
print(R)
print("--- %s seconds ---" % (time.time() - start_time))

It takes more time to execute this method of matrix multiplication, how can I choose the efficient way of matrix multiplication of huge dimension range? So higher dimension array can be executed smoothly and quickly.

Sample output:

Matrix A[[3, 3, 3], [3, 3, 3], [3, 3, 3]]
Matrix B[[3, 3, 3], [3, 3, 3], [3, 3, 3]]
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
[[27, 27, 27], [27, 27, 27], [27, 27, 27]]
--- 0.00014400482177734375 seconds ---

it takes 0.00014400482177734375 seconds can I improve this run time when I do for higher dimension multiplication?


Solution

  • This timings in your comments have some significant drawbacks:

    1. print() is comparatively expensive and has nothing to do with the calculation. Including it in the timings could take up a big chunk of the overall time.

    2. Using wallclock (time.time()) is not a good way of getting stable timings; you get one run and anything could be happening on your system.

    This should give a better test case for comparison:

    import numpy as np
    
    def python_lists():
        A = [[3, 3, 3], [3, 3, 3], [3, 3, 3]]
        B = [[3, 3, 3], [3, 3, 3], [3, 3, 3]]
        our_list1 = []
        R = []
        
        for i in range(3):
            for i in range(3):
                    number = 0
                    our_list1.append(number)
            R.append(our_list1)
            our_list1= []
        
        for i in range(len(A)):
        
            # iterating over column by B
            for j in range(len(B[0])):
        
                # iterating by rows of B
                for k in range(len(B)):
                    R[i][j] += A[i][k] * B[k][j]
        
        
    def numpy_array():
        A = np.full((3, 3), 3)
        B = np.full((3, 3), 3)
        result = np.dot(A, B)
    

    And the timings:

    %timeit python_lists()
    15 µs ± 45.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    
    %timeit numpy_array()
    5.57 µs ± 44.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    

    So, NumPy is ~3 times faster for this example. But this would be more significant if you had bigger arrays.

    EDIT: And actually, you could argue that creating A and B inside the function is not helpful for timing the actual matrix multiplication, so if I instead create the lists/arrays first and pass them, the new timings are:

    %timeit python_lists(A, B)
    14.4 µs ± 98.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    
    %timeit numpy_array(A, B)
    1.2 µs ± 13.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    

    And, for the sake of completeness, for an array with shape (200, 200):

    %timeit python_lists()
    6.99 s ± 128 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    %timeit numpy_array()
    5.77 ms ± 43.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)