algorithmstrassen

Strassen matrix multiplication to store linear equations


One of the questions I've come across in my textbook is:

In Computer Graphics transformations are applied on many vertices on the screen. Translation, Rotations
and Scaling.
Assume you’re operating on a vertex with 3 values (X, Y, 1). X, Y being the X Y coordinates and 1 is always
constant
A Translation is done on X as X = X + X’ and on Y as Y = Y + Y’
X’ and Y’ being the values to translate by
A scaling is done on X as X = aX and on Y as Y = bY
a and b being the scaling factors
Propose the best way to store these linear equations and an optimal way to calculate them on each vertex

It is hinted that it involves matrix multiplication and Strassen. However, I'm not sure where to start here? It doesn't involve complex code and it states to come up with something simple to showcase my idea but all Strassen implementations I've come across are definitely large enough to not call complex. What should my thought process be here?

Would my matrix look like this? 3x3 for each equation or do I combine them all in one?

[ X X X']
[ Y Y Y']
[ 1 1 1 ]

Solution

  • What you're trying to find is a transformation matrix, which you can then use to transform some current (x, y) point into the next (nx, ny) point. In other words, we want

    start = Vec([x, y, 1])
    matrix = Matrix(...)
    
    next = start * matrix  // * is matrix multiplication
    

    Now, if your next is supposed to look something like Vec([a * x + x', b * y + y', 1]), we can work our way backwards to figure out the matrix. First, look at just the x component. We're going to effectively take the dot product of our start vector and the topmost row of our matrix, yielding a * x + x'.

    If we write it out more explicitly, we want a * x + 0 * y + x' * 1. Hopefully that makes it a bit more easy to see that the vector we want to dot start with is Vec([a, 0, x']). We can repeat this for the remaining two rows of the matrix, and obtain the following matrix:

    matrix = Matrix(
      [[a, 0, x'],
       [0, b, y'],
       [0, 0, 1]])
    

    Double check that this makes sense and seems reasonable to you. If we take our start vector and multiply it with this matrix, we'll get the translated vector next as Vec([a * x + x', b * y + y', 1]).

    Now for the real beauty of this- the matrix itself doesn't care at all about what our inputs are, its completely independent. So, we can repeatedly apply this matrix over and over again to step forward through more scaling and translations.

    next_next_next = start * matrix * matrix * matrix
    

    Knowing this, we can actually compute many steps ahead really quickly, using some mathematical tricks. Multiplying but the matrix n times is the same as multiplying by matrix raised to the nth power. And fortunately, we have an efficient method for computing a matrix to a power- its called exponentiation by squaring (actually applies to regular numbers as well, but here we're concerned with multiplying matrices, and the logic still applies). In a nutshell, rather than multiplying the number or matrix over and over again n times, we square it and multiply intermediate values by the original number / matrix at the right times, to very rapidly approach the desired power (in log(n) multiplications).

    This is almost certainly what your professor is wanting you to realize. You can simulate n translations / scalings / rotations (yes, there are rotation matrices as well) in log(n) time.

    Extra Mile

    What's even cooler is that using some more advanced linear algebra, you can actually do it even faster. You can diagonalize your matrix (meaning you rewrite your matrix as P * D * P^-1, that is, the product of a some matrix P with a matrix D where the only non-zero elements are along the main diagonal, multiplied by the inverse of P). You can then raise this diagonalized matrix to a power really quickly, because (P * D * P^-1) * (P * D * P^-1) simplifies to P * D * D * P^-1, and this generalizes to:

    M^N = (P * D * P^-1)^N = (P * D^N * P^-1)
    

    Since D only has non-zero elements along its diagonal, you can raise it to any power by just raising each individual element to that power, which is just the normal cost of integer multiplication, across as many elements as your matrix is wide/tall. This is stupidly fast, and then you just do a single matrix multiplication on either side to arrive at M^N, and then multiply your start vector with this, for your end result.