bash pipe tar archive md5sum

Get md5sum in pipe


I am creating archives of very large directories and splitting these archives into smaller parts as follows:

tar -cvf - target_dir | pigz > target_dir.tar.gz

md5sum target_dir.tar.gz > md5sum.txt

split -n 10 target_dir.tar.gz target_dir.tar.gz.part-

The problem with this approach is that I need roughly twice the size of the tar.gz file in disk space, which is problematic because some of the target directories are huge (TBs).

I could pipe the tar output into split to reduce the required disk space:

tar -cvf - target_dir | pigz | split -n 10 - target_dir.tar.gz.part-

But how would I calculate the md5sum of the tar.gz file before it goes into split?


Solution

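  • Use tee to duplicate the stream. With bash process substitution, >(...), md5sum runs as a separate process that reads one copy of the data through a temporary FIFO (or /dev/fd entry), while the other copy continues down the pipeline to split.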

    tar -cvf - target_dir |
        pigz |
        tee >(md5sum > md5sum.txt) |
        split -n 10 - target_dir.tar.gz.part-
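To check that the checksum taken inside the pipe really matches the reassembled archive, the idea can be tried on a small throwaway sample first. The following is only a sketch under several assumptions: the sample data, the FIFO name sum.fifo and the 64k part size are made up for illustration, gzip stands in for pigz, and GNU md5sum is assumed. It uses an explicit named FIFO instead of >(...), because bash does not necessarily wait for a process substitution to finish, whereas a background job can be waited for. It also uses a size-based split -b, since split may not be able to cut piped input into N equal chunks without knowing the total size in advance (behaviour here seems to depend on the coreutils version).

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory with hypothetical sample data
# (not the real target_dir).
work=$(mktemp -d)
cd "$work"
mkdir target_dir
head -c 200000 /dev/urandom > target_dir/sample.bin

# Named FIFO that plays the role of the >(...) process substitution.
mkfifo sum.fifo
md5sum < sum.fifo > md5sum.txt &

# gzip stands in for pigz; -b 64k is an arbitrary part size.
tar -cf - target_dir |
    gzip |
    tee sum.fifo |
    split -b 64k - target_dir.tar.gz.part-

# Wait for the background md5sum to finish writing md5sum.txt.
wait
rm sum.fifo

# Compare the checksum recorded in the pipe with that of the joined parts.
recorded=$(cut -d' ' -f1 md5sum.txt)
actual=$(cat target_dir.tar.gz.part-* | md5sum | cut -d' ' -f1)

if [ "$recorded" = "$actual" ]; then
    echo "checksums match"
else
    echo "checksum mismatch" >&2
    exit 1
fi
```

Because md5sum reads from a pipe, md5sum.txt records the file name as "-", so the parts are verified later by concatenating them into md5sum again (as in the last step) rather than with md5sum -c.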