cbinarybinaryfileshuffman-code

How can I read a binary file bit-by-bit?


I'm working on a project to compress/decompress a text file with the Huffman algorithm. I've successfully managed to compress a given text file into binary -- I've checked with the Visual Studio Hex Editor extension and it works correctly.

However, I don't really know how to go about implementing decompression. What I'd like to do is read the binary file bit-by-bit, traverse the corresponding Huffman tree, and write a character when a leaf is reached. However, I'm not sure how to go about reading a binary file bit-by-bit (I know that it can't be done directly as is: of course I had to do bit manipulation to implement compression in the first place).

I've tried using the fread() function to store the bits into a buffer, but frankly, I think that I'm misunderstanding how it works, and I'm not sure how much memory to give the buffer, nor how to retrieve the bits from the buffer, which is a char array.

Thanks in advance for any help.


Solution

  • Here is a simple read wrapper using an int buffer. It reads bits from the stream one at a time, starting from the least significant bit of the first byte:

    // Read the next bit from the stream.
    // `buffer` holds the remaining bits from the last byte read
    // initial value of `buffer` should be `0`.
    // Return the bit value or `EOF` at end of file
    int read_next_bit(FILE *stream, int *buffer)
    {
        int bits = *buffer;
        if (bits <= 1) {
            // either no bits left or EOF reached already
            if (bits == EOF) {
                return EOF;
            }
            // get the next byte from the stream
            bits = getc(stream);
            if (bits == EOF) {
                // if EOF, make it sticky and return it
                return *buffer = EOF;
            }
            // set the guard bit
            bits |= 0x100;
        }
        // shift the current bit out of the buffer
        *buffer = bits >> 1;
        // and return it.
        return bits & 1;
    }
    

    This function tries to minimize the number of memory reads, writes and the number of tests.

    Here is an alternative that uses a structure to encapsulate the bitstream with an array to bufferize the file contents.

    #include <stdio.h>
    #include <stdbool.h>
    
    typedef struct bitstream {
        FILE *stream;
        int pos, len;
        unsigned char buf[BUFSIZ];
    } bitstream;
    
    #define LSB_FIRST  1
    #define MSB_FIRST  2
    #define BIT_ORDER  LSB_FIRST  // define as appropriate
    
    // open a bitstream, return a boolean success indicator
    bool bitstream_open(bitstream *bt, const char *filename) {
        bt->len = bt->pos = 0;
        bt->stream = fopen(filename, "rb");
        return bt->stream != NULL;
    }
    
    // close a bitstream
    void bitstream_close(bitstream *bt) {
        if (bt->stream) {
            bt->len = bt->pos = 0;
            fclose(bt->stream);
            bt->stream = NULL;
        }
    }
    
    // read a bit from a bitstream:
    // return EOF at end of file
    //   otherwise return the next bit
    int bitstream_read(bitstream *bt) {
        int pos = bt->pos;
        if (pos >= bt->len) {
            bt->pos = pos = 0;
            bt->len = fread(bt->buf, sizeof bt->buf, bt->stream) * 8;
            if (bt->len == 0)
                return EOF;
        }
        bt->pos++;
    #if BIT_ORDER == LSB_FIRST
        return (buf[pos >> 3] >> (pos & 7)) & 1;
    #else
        return (buf[pos >> 3] >> (7 - (pos & 7))) & 1;
    #endif
    }