I'm working on a project to compress/decompress a text file with the Huffman algorithm. I've successfully managed to compress a given text file into binary -- I've checked with the Visual Studio Hex Editor extension and it works correctly.
However, I don't really know how to go about implementing decompression. What I'd like to do is read the binary file bit-by-bit, traverse the corresponding Huffman tree, and write a character when a leaf is reached. However, I'm not sure how to go about reading a binary file bit-by-bit (I know that it can't be done directly as is: of course I had to do bit manipulation to implement compression in the first place).
I've tried using the fread()
function to store the bits into a buffer, but frankly, I think that I'm misunderstanding how it works, and I'm not sure how much memory to give the buffer, nor how to retrieve the bits from the buffer, which is a char
array.
Thanks in advance for any help.
Here is a simple read wrapper using an int
buffer. It reads bits from the stream one at a time, starting from the least significant bit of the first byte:
// Read the next bit from the stream.
// `buffer` holds the remaining bits from the last byte read
// initial value of `buffer` should be `0`.
// Return the bit value or `EOF` at end of file
int read_next_bit(FILE *stream, int *buffer)
{
int bits = *buffer;
if (bits <= 1) {
// either no bits left or EOF reached already
if (bits == EOF) {
return EOF;
}
// get the next byte from the stream
bits = getc(stream);
if (bits == EOF) {
// if EOF, make it sticky and return it
return *buffer = EOF;
}
// set the guard bit
bits |= 0x100;
}
// shift the current bit out of the buffer
*buffer = bits >> 1;
// and return it.
return bits & 1;
}
This function tries to minimize the number of memory reads, writes and the number of tests.
Here is an alternative that uses a structure to encapsulate the bitstream with an array to bufferize the file contents.
#include <stdio.h>
#include <stdbool.h>
typedef struct bitstream {
FILE *stream;
int pos, len;
unsigned char buf[BUFSIZ];
} bitstream;
#define LSB_FIRST 1
#define MSB_FIRST 2
#define BIT_ORDER LSB_FIRST // define as appropriate
// open a bitstream, return a boolean success indicator
bool bitstream_open(bitstream *bt, const char *filename) {
bt->len = bt->pos = 0;
bt->stream = fopen(filename, "rb");
return bt->stream != NULL;
}
// close a bitstream
void bitstream_close(bitstream *bt) {
if (bt->stream) {
bt->len = bt->pos = 0;
fclose(bt->stream);
bt->stream = NULL;
}
}
// read a bit from a bitstream:
// return EOF at end of file
// otherwise return the next bit
int bitstream_read(bitstream *bt) {
int pos = bt->pos;
if (pos >= bt->len) {
bt->pos = pos = 0;
bt->len = fread(bt->buf, sizeof bt->buf, bt->stream) * 8;
if (bt->len == 0)
return EOF;
}
bt->pos++;
#if BIT_ORDER == LSB_FIRST
return (buf[pos >> 3] >> (pos & 7)) & 1;
#else
return (buf[pos >> 3] >> (7 - (pos & 7))) & 1;
#endif
}