
Extract a bit sequence from a character


So I have an array of characters like {h,e,l,l,o,o}. First I need to translate it to its bit representation, so what I would have is this:

h = 01101000
e = 01100101
l = 01101100
l = 01101100
o = 01101111
o = 01101111

I need to divide all of these bits into groups of five and save them to an array. For example, the concatenation of all these characters would be

011010000110010101101100011011000110111101101111

Now I divide this into groups of five:

01101 00001 10010 10110 11000 11011 00011 01111 01101 111

The last group should be completed with zeros, so it would be 00111 instead. Note: each group of 5 bits will later be completed with a header so that it fills 8 bits.

I haven't figured out yet how to accomplish this. I can extract the bits of each character and print its binary representation as follows:

 for (int i = 7; i >= 0; --i)
 {
     printf("%c", (c & (1 << i)) ? '1' : '0');
 }

The problem is how to combine two characters. If I have the two characters 00000001 and 11111110 and divide them into groups of five, the first group gets the first 5 bits of the first character, and the second group gets the remaining 3 bits of the first character and the first 2 bits of the second one. How can I make this combination and save all these groups in an array?


Solution

  • Assuming that a byte is made of 8 bits (ATTENTION: the C standard only guarantees that CHAR_BIT is at least 8, not exactly 8), you have to loop over the string and use bit operations (shifts and masks) to get it done:

    This can be coded like this:

    #include <stdio.h>

    int main(void)
    {
        char s[] = "helloo";

        unsigned char last = 0;        // remaining bits from previous iteration, already in the high output part
        size_t j = 5;                  // number of high input bits to put in the low output part
        unsigned char output;
        for (const unsigned char *p = (const unsigned char *)s; *p; p++) {  // iterate on the string (unsigned: no sign extension)
            do {
                output = ((*p >> (8 - j)) | last) & 0x1f;  // last high bits set followed by j bits shifted to lower part; only 5 bits are kept
                printf("%02x ", (unsigned)output);
                j += 5;                                    // take next block
                last = (*p << (j % 8)) & 0x1f;             // keep the bits not output yet for next iteration
            } while (j <= 8);                              // loop if another block starts inside the current byte
            j -= 8;
        }
        if (j < 5)                                         // 1 to 4 trailing bits are still pending in last
            printf("%02x", (unsigned)last);
        printf("\n");
        return 0;
    }
    

    The displayed result for your example will be (in hexadecimal): 0d 01 12 16 18 1b 03 0f 0d 1c, which corresponds exactly to each of the 5-bit groups that you have listed. Note that this code adds zero padding on the right of the last block if it is shorter than 5 bits: here the last 3 bits 111 become 11100, i.e. 0x1C, instead of 00111, i.e. 0x07, as written in the question.

    You could easily adapt this code to store the output in a buffer instead of printing it. The only delicate part is precalculating the size of the output: n input bytes give 8n/5 groups rounded up, i.e. (8*n + 4) / 5 in integer arithmetic, plus 1 if you expect a terminator to be added.