I need to calculate the SHA-256 and SHA-512 hashes of a very large file stored in AWS S3. The processing must block until both computations have completed, and I have tried using GNU parallel to do that. I don't want to copy the file from S3 to local disk; I just want to stream it. Here's the code I've got:
aws s3 cp s3://bucket-name/large-file - \
| parallel -j2 --halt now,fail=1 --pipe --keep-order --will-cite \
"sha256sum >/tmp/sha256.txt; sha512sum >/tmp/sha512.txt"
This doesn't error, but the resulting hashes are not correct (compare with the output of, for example, aws s3 cp s3://bucket-name/large-file - | sha256sum).
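A likely explanation: --pipe on its own splits stdin into blocks (1 MB each by default) and hands each block to a separate job, so no job ever sees the whole file. Within each job, sha256sum reads its block to EOF, which leaves nothing for sha512sum, so it hashes empty input. The second effect can be reproduced locally without S3 or parallel. This is only a small illustration; the string 'hello' is arbitrary.

```shell
# Both commands inside the braces share one stdin: sha256sum reads it to EOF,
# so sha512sum is left hashing an empty stream.
shared=$(printf 'hello\n' | { sha256sum; sha512sum; })
echo "$shared"

# The second line above matches the SHA-512 of empty input:
empty512=$(printf '' | sha512sum)
echo "$empty512"
```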
Ideally I want parallel to spawn two jobs, one per hash, and block until both have completed (which I believe parallel does by default). I also want to stream the file from S3 only once, not once for SHA-256 and again for SHA-512.
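For comparison, the same "read once, hash twice, wait for both" behaviour can be sketched in plain bash with tee and two named pipes. This swaps in a different technique (no GNU parallel), and the helper name hash_both and its arguments are hypothetical.

```shell
# hash_both (hypothetical helper): read stdin once, feed it to sha256sum and
# sha512sum concurrently through named pipes, and return only after both finish.
hash_both() {
  local out256=$1 out512=$2 dir pid256 pid512 rc=0
  dir=$(mktemp -d) || return 1
  mkfifo "$dir/p256" "$dir/p512" || { rm -rf "$dir"; return 1; }

  # Start both readers first; each blocks until a writer opens its FIFO.
  sha256sum < "$dir/p256" > "$out256" & pid256=$!
  sha512sum < "$dir/p512" > "$out512" & pid512=$!

  # tee copies stdin into the first FIFO; its stdout feeds the second FIFO.
  tee "$dir/p256" > "$dir/p512" || rc=1

  # Block until both digests have been written, recording any failure.
  wait "$pid256" || rc=1
  wait "$pid512" || rc=1
  rm -rf "$dir"
  return "$rc"
}
```

Usage: aws s3 cp s3://bucket-name/large-file - | hash_both /tmp/sha256.txt /tmp/sha512.txt. Both hash processes advance at the pace of the slower one, since tee writes each chunk to both pipes before reading the next.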
Mark's answer is the correct one. The key is --tee, which sends the entire stream to every job instead of splitting it into blocks. Here it is spelled out:
aws s3 cp s3://bucket-name/large-file - |
parallel -j2 --pipe --tee {} \
::: "sha256sum >/tmp/sha256.txt" "sha512sum >/tmp/sha512.txt"
or:
aws s3 cp s3://bucket-name/large-file - |
parallel -j2 --pipe --tee 'sha{}sum > /tmp/sha{}.txt' ::: 256 512
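Before running against the real object, one way to check the --tee command is to feed it a local file and compare the results with hashing that file directly. This is a sketch: the helper name check_tee_hashes is hypothetical, and --will-cite (as in the question) only suppresses the citation notice.

```shell
# check_tee_hashes (hypothetical helper): run the answer's --tee command on a
# local file and compare its digests with sha256sum/sha512sum on the same file.
check_tee_hashes() {
  local input=$1 work bits expected actual rc=0
  work=$(mktemp -d) || return 1

  # Same command as above, reading a local file instead of `aws s3 cp ... -`.
  parallel --will-cite -j2 --pipe --tee "sha{}sum > $work/sha{}.txt" ::: 256 512 < "$input" || rc=1

  for bits in 256 512; do
    expected=$("sha${bits}sum" < "$input" | cut -d' ' -f1)
    actual=$(cut -d' ' -f1 "$work/sha${bits}.txt")
    if [ "$expected" = "$actual" ]; then
      echo "sha$bits OK"
    else
      echo "sha$bits MISMATCH" >&2
      rc=1
    fi
  done
  rm -rf "$work"
  return "$rc"
}
```

Usage: head -c 5000000 /dev/urandom > /tmp/test.bin && check_tee_hashes /tmp/test.bin. A test file larger than the 1 MB default block size is what exposes the splitting problem in the original command.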