So, I have the following situation:
A program that produces a large set of outputs (which must end up compressed), in blocks separated by a line containing a single dot:
line00
line01
...
line0N
.
line10
line11
...
line1M
.
...
I generate this content and compress it with:
./my_cmd | gzip -9 > output.gz
What I would like to do is, in pseudocode:
./my_cmd \
| csplit --prefix=foo - '/^\.$/+1' '{*}' \ # <-- this will just create files
| tar -cf ??? \ # <-- don't know how to feed those files to tar
| gzip -9 > output.tar.gz
Ideally, nothing uncompressed ever touches the hard drive.
In summary: my objective is a set of files, split at the delimiter, sitting on the hard drive in a compressed state, with no intermediate read-write steps.
If I can't do this with tar/gzip/csplit, then maybe something else?
Tar can handle the compression itself.
./my_cmd | csplit --prefix=foo - '/^\.$/+1' '{*}'          # writes foo00, foo01, ... split at each "." line
printf "%s\n" foo[0-9][0-9] | tar czf output.tar.gz -T -   # tar reads the file list from stdin
rm -f foo[0-9][0-9]                                        # clean up the temps
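If you want the temporaries contained and reliably removed even when a step fails, one option is to do the splitting inside a throwaway directory. A sketch, assuming bash and a mktemp that supports -d (the tmpdir variable is my own naming):
tmpdir=$(mktemp -d) || exit 1                                    # private scratch directory for the chunks
trap 'rm -rf "$tmpdir"' EXIT                                     # removed however the script exits
./my_cmd | ( cd "$tmpdir" && csplit --prefix=foo - '/^\.$/+1' '{*}' )
tar czf output.tar.gz -C "$tmpdir" .                             # archive the chunks with relative names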
If that's just not good enough and you REALLY need that -9 compression,
printf "%s\n" foo[0-9][0-9] |   # list the chunk files
tar cf - -T - |                 # read the list from stdin, stream the archive to stdout
gzip -9 > output.tar.gz
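With GNU tar you can also let tar run the compressor with your flags and drop the separate gzip stage. A sketch (passing arguments inside --use-compress-program needs a reasonably recent GNU tar):
printf "%s\n" foo[0-9][0-9] |
tar -cf output.tar.gz --use-compress-program='gzip -9' -T -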
Then you should be able to extract individual files from the archive and handle them one at a time.
tar xvOf output.tar.gz foo00 | wc -l
That lets you keep the archive compressed, but pull out chunks to work on without writing them to disk.
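One caveat: if the hard requirement is that nothing uncompressed ever lands on disk, the foo?? temporaries above are plain text. One way around that is to have awk pipe each chunk straight into its own gzip, so every chunk hits the disk already compressed. A minimal sketch (the chunk%02d.gz naming is my own, and this yields a set of separate .gz files rather than one tarball):
./my_cmd | awk '
    BEGIN { cmd = sprintf("gzip -9 > chunk%02d.gz", n) }   # compressor for chunk 0
    /^\.$/ {                                               # "." line: finish this chunk,
        close(cmd)                                         # flush and end its gzip,
        cmd = sprintf("gzip -9 > chunk%02d.gz", ++n)       # start a gzip for the next one,
        next                                               # and drop the delimiter itself
    }
    { print | cmd }                                        # body lines go to the current gzip
'
That matches the stated objective of "a set of files split at the delimiter on the hard drive in a zipped state"; the trade-off is many small .gz files instead of a single output.tar.gz.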