cembeddedtransposebitarraydspic

What is the fastest way to transpose the bits in an 8x8 block on bits?


I'm not sure the exact term for what I'm trying to do. I have an 8x8 block of bits stored in 8 bytes, each byte stores one row. When I'm finished, I'd like each byte to store one column.

For example, when I'm finished:

Byte0out = Byte0inBit0 + Bit0inByte1 + Bit0inByte2 + Bit0inByte3 + ...
Byte1out = Bit1inByte0 + Bit1inByte1 + Bit1inByte2 + Bit1inByte3 + ...

What is the easiest way to do this in C which performs well? This will run on a dsPIC microcontroller


Solution

  • This code is cribbed directly from "Hacker's Delight" - Figure 7-2 Transposing an 8x8-bit matrix, I take no credit for it:

    void transpose8(unsigned char A[8], int m, int n, 
                    unsigned char B[8]) {
       unsigned x, y, t; 
    
       // Load the array and pack it into x and y. 
    
       x = (A[0]<<24)   | (A[m]<<16)   | (A[2*m]<<8) | A[3*m]; 
       y = (A[4*m]<<24) | (A[5*m]<<16) | (A[6*m]<<8) | A[7*m]; 
    
       t = (x ^ (x >> 7)) & 0x00AA00AA;  x = x ^ t ^ (t << 7); 
       t = (y ^ (y >> 7)) & 0x00AA00AA;  y = y ^ t ^ (t << 7); 
    
       t = (x ^ (x >>14)) & 0x0000CCCC;  x = x ^ t ^ (t <<14); 
       t = (y ^ (y >>14)) & 0x0000CCCC;  y = y ^ t ^ (t <<14); 
    
       t = (x & 0xF0F0F0F0) | ((y >> 4) & 0x0F0F0F0F); 
       y = ((x << 4) & 0xF0F0F0F0) | (y & 0x0F0F0F0F); 
       x = t; 
    
       B[0]=x>>24;    B[n]=x>>16;    B[2*n]=x>>8;  B[3*n]=x; 
       B[4*n]=y>>24;  B[5*n]=y>>16;  B[6*n]=y>>8;  B[7*n]=y; 
    }
    

    I didn't check if this rotates in the direction you need, if not you might need to adjust the code.

    Also, keep in mind datatypes & sizes - int & unsigned (int) might not be 32 bits on your platform.

    BTW, I suspect the book (Hacker's Delight) is essential for the kind of work you're doing... check it out, lots of great stuff in there.