cbz2

Find invalid bz2 files, preferably using C/C++


I have around 200,000 bz2 files, of which only one is valid. Each file is smaller than 200 bytes. I need to find the valid one, but running the command-line bzip2 utility on every file is taking too much time.

Is there a minimal check on the file's bytes that lets me detect an invalid bz2 file and skip further processing? I want to do this in C/C++, since it should be much faster than a shell script.


Solution

  • Got the solution. Per the bz2 format, the first 3 bytes must be 'BZh'. This check alone filtered out all but 19 files.