algorithmmatrixbit-manipulationpseudocodematrix-multiplication# Binary matrix multiplication bit twiddling hack

# Abstract

# Example

# Question

# P.S

# Edit

Hi, suppose you have two different independent 64-bit binary matrices `A`

and `T`

(`T`

is another matrix that is stored in transposed form, using the transposed version of matrix allows during multiplication to operate on `T`

's rows rather than columns which is super cool for binary arithmetic) and you want to multiply these matrices the only thing is that matrix multiplication result is truncated to 64-bits and if you yield to a value greater that `1`

in some specific matrix cell the resulting matrix cell will contain `1`

otherwise `0`

```
A T
00000001 01111101
01010100 01100101
10010111 00010100
10110000 00011000 <-- This matrix is transposed
11000100 00111110
10000011 10101111
11110101 11000100
10100000 01100010
```

Binary and traditional multiplication results:

```
Binary Traditional
11000100 11000100
11111111 32212121
11111111 32213421
11111111 21112211
11101111 22101231
11001111 11001311
11111111 54213432
11001111 11001211
```

How do you multiply these matrices in a way described above in most efficient matter?

I was trying to take advantage of binary `and`

(i.e. `&`

operator) instead of performing multiplication on separate bits, in that case I had to prepare data for multiplication:

```
ulong u;
u = T & 0xFF;
u = (u << 00) + (u << 08) + (u << 16) + (u << 24)
+ (u << 32) + (u << 40) + (u << 48) + (u << 56);
```

now by performing binary `and`

over two integers `A`

and `u`

it would yield to the following:

```
A u R C
00000001 01111101 00000001 1
01010100 01111101 01010100 3
10010111 01111101 00010101 3
10110000 01111101 00110000 2
11000100 01111101 01000100 2
10000011 01111101 00000001 1
11110101 01111101 01110101 5
10100000 01111101 00100000 1
```

In the example above `R`

contains result of multiplication of `A`

bits to `u`

bits and to obtain the final value we must `sum`

all bits in a row. Notice that column `C`

contains values equal to ones found in first column of resulting `Traditional`

matrix multiplication above. The problem is that during this step I have to operate on a separate bits which I think is sub-optimal approach, I've read through http://graphics.stanford.edu/~seander/bithacks.html looking for a way to do that on parallel but no luck, if anyone has any idea on how to "flatten" and "merge" the values located in `R`

column into resulting 64-bit matrix, I would appreciate if you drop me several lines,

Thank you,

With big thank you to David Eisenstat, the final algorithm would then look like:

```
var A = ...;
var T = ...; // T == transpose(t), t is original matrix, algorithm works with transposed matrix
var D = 0x8040201008040201UL;
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D);
```

The following piece of code:

```
public static void Main (string[] args){
ulong U;
var Random = new Xor128 ();
var timer = DateTime.Now;
var A = Random.As<IUniformRandom<UInt64>>().Evaluate();
var T = Random.As<IUniformRandom<UInt64>>().Evaluate();
var steps = 10000000;
for (var i = 0; i < steps; i++) {
ulong r = 0;
var d = 0x8040201008040201UL;
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d);
}
Console.WriteLine (DateTime.Now - timer);
var m1 = new Int32[8,8];
var m2 = new Int32[8,8];
var m3 = new Int32[8,8];
for (int row = 0; row < 8; row++) {
for (int col = 0; col < 8; col++) {
m1 [row, col] = Random.As<IUniformRandom<Int32>> ().Evaluate(0, 1);
m2 [row, col] = Random.As<IUniformRandom<Int32>> ().Evaluate(0, 1);
m3 [row, col] = Random.As<IUniformRandom<Int32>> ().Evaluate(0, 1);
}
}
timer = DateTime.Now;
for (int i = 0; i < steps; i++) {
for (int row = 0; row < 8; row++) {
for (int col = 0; col < 8; col++) {
var sum = 0;
for (int temp = 0; temp < 8; temp++) {
sum += m1 [row, temp] * m2 [temp, row];
}
m3 [row, col] = sum;
}
}
}
Console.WriteLine (DateTime.Now - timer);
}
```

Shows me the following results:

```
00:00:02.4035870
00:00:57.5147150
```

And that's a 23x performance improvement under Mac OS X / Mono, thanks everyone

Solution

I'm not sure about *most* efficient, but here's something to try. The following sequence of instructions computes the main diagonal of the product A * T'. Rotate both T and D by 8 bits and repeat for 7 more iterations.

```
// uint64_t A, T;
uint64_t D = UINT64_C(0x8040201008040201);
uint64_t P = A & T;
// test whether each byte is nonzero
P |= P >> 1;
P |= P >> 2;
P |= P >> 4;
P &= UINT64_C(0x0101010101010101);
// fill each nonzero byte with ones
P *= 255; // or P = (P << 8) - P;
// leave only the current diagonal
P &= D;
```

- Difference between back tracking and dynamic programming
- How can we optimize this algorithm from O(N ** 2) to better complexity in order to pass performance test?
- How do I predict the required size of a Base32 Decode output?
- Reversing AND Bitwise
- Why does my binary search need an extra comparison? log2(N)+1
- How to build a trie for finding exact phonetic matches, sorted globally by weight, and paginated? (Building off this example)
- What is the Time Complexity and Space Complexity of extending a string according to a rule?
- Skyscraper puzzle algorithm
- Check if all elements in a list are equal
- Bitwise Interval Arithmetic
- fast algorithm for drawing filled circles?
- How to find distance from the latitude and longitude of two locations?
- Determine if two rectangles overlap each other?
- Randomly Splitting a Graph according to Conditions
- Maximize distance while pushing crates
- Free a binary tree without recursion
- How can I estimate number of nodes in given tree structure?
- Explanation of class Definition for Binary Trees in leetcode
- Procedural Generation of 2D Rooms
- Is there an algorithm to find the closest element to X in an unsorted array in Ω(logN)?
- Advanced Java idiom for 2+ classes implementing a generic interface
- Is there any algorithm in c# to singularize - pluralize a word?
- Number of Common sub sequences of two strings
- Trying to solve problem 19 on Euler Project
- is a "non-decreasing" sequence "increasing"?
- Is it possible to get the original value of a number, after several multiplications **with overflow**?
- Algorithm to determine the highest and lowest possible finishing position of a team in a league
- Algorithm to calculate the number of divisors of a given number
- Rolling or sliding window iterator?
- best way to pick a random subset from a collection?