amazon-s3 hash sha256 gnu-parallel sha512

Using GNU parallel to calculate hashes simultaneously


I need to calculate the SHA256 and SHA512 hashes of a very large file held in AWS S3. The processing must block until both computations have completed, and I have tried using GNU parallel to do that. I don't want to copy the file from S3 to local disk first; I just want to stream it. Here's the code I've got:

aws s3 cp s3://bucket-name/large-file - \
| parallel -j2 --halt now,fail=1 --pipe --keep-order --will-cite \
"sha256sum >/tmp/sha256.txt; sha512sum >/tmp/sha512.txt" 

This doesn't raise an error, but the resulting hashes are not correct (compared with what I get when running, for example, aws s3 cp s3://bucket-name/large-file - | sha256sum).

Ideally I want parallel to spawn 2 jobs, one per hash, and block until both have completed (which I believe is what parallel does by default). I also want to stream the file from S3 only once (not once for SHA256 and then again for SHA512).


Solution

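  • Mark's answer is the correct one. Without --tee, --pipe splits stdin into blocks and gives each job only part of the input, so no job ever hashes the whole file; with --tee, every job receives a full copy of the input. Here it is spelled out: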

    aws s3 cp s3://bucket-name/large-file - |
      parallel -j2 --pipe --tee {} \
      ::: "sha256sum >/tmp/sha256.txt" "sha512sum >/tmp/sha512.txt" 
    

    or:

    aws s3 cp s3://bucket-name/large-file - |
      parallel -j2 --pipe --tee 'sha{}sum > /tmp/sha{}.txt' ::: 256 512
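
To sanity-check the "read once, hash twice" idea without touching S3, here is a minimal sketch that uses plain tee and named pipes rather than GNU parallel (a different technique from the answer above, shown only for comparison). It assumes a POSIX shell with coreutils; `seq 1 200000` is a stand-in for `aws s3 cp s3://bucket-name/large-file -`, and the scratch directory and pipe names are made up for illustration.

```shell
#!/bin/sh
# Sketch: single-pass fan-out with tee and named pipes (not GNU parallel).
# `seq 1 200000` stands in for: aws s3 cp s3://bucket-name/large-file -
set -e

workdir=$(mktemp -d)              # hypothetical scratch directory
trap 'rm -rf "$workdir"' EXIT     # clean up pipes and results on exit

mkfifo "$workdir/in256" "$workdir/in512"

# Start both hashers in the background; each reads its own named pipe.
sha256sum < "$workdir/in256" > "$workdir/sha256.txt" &
sha512sum < "$workdir/in512" > "$workdir/sha512.txt" &

# Produce the stream once; tee copies it into both pipes.
seq 1 200000 | tee "$workdir/in256" > "$workdir/in512"

# Block until both background hashers have finished.
wait

# Cross-check against hashing the same stream directly.
streamed256=$(cut -d' ' -f1 "$workdir/sha256.txt")
streamed512=$(cut -d' ' -f1 "$workdir/sha512.txt")
direct256=$(seq 1 200000 | sha256sum | cut -d' ' -f1)
direct512=$(seq 1 200000 | sha512sum | cut -d' ' -f1)

[ "$streamed256" = "$direct256" ] && [ "$streamed512" = "$direct512" ] \
  && echo "hashes match"
```

The explicit `wait` is what gives the blocking behaviour asked for in the question; parallel handles that step itself, which is one reason the --tee form is shorter.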