bash command-line gzip tar csplit

Pipe output to zipped tar after csplit


So, I have the following situation:

A program which produces a large (must be zipped) set of outputs, as follows:

line00
line01
...
line0N
.
line10
line11
...
line1M
.
...

I generate this content and zip it with:

./my_cmd | gzip -9 > output.gz

What I would like to do is, in pseudo code:

./my_cmd \
| csplit --prefix=foo - '/^\.$/+1' '{*}' \  # <-- this will just create files
| tar -cf ??? \                 # <-- don't know how to link the files to tar
| gzip -9 > output.tar.gz

Ideally, nothing unzipped ever gets on the hard drive.

In summary: my objective is to end up with a set of files on the hard drive, split at the delimiter and compressed, without any intermediate uncompressed read/write step.

If I can't do this with tar/gzip/csplit, then maybe something else?
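
For what it's worth, something along these lines would also be acceptable. It is just a sketch (the two-digit numbering scheme is made up): awk splits the stream on the delimiter and pipes each chunk straight into its own gzip, so nothing uncompressed ever touches the disk, at the cost of getting separate .gz files instead of one tar.gz:

./my_cmd | awk -v pre=foo '
    /^\.$/ {                                        # delimiter line
        if (cmd != "") { print | cmd; close(cmd) }  # keep the "." in the chunk, like csplit +1
        cmd = ""
        next
    }
    {
        if (cmd == "")                              # first line of a new chunk: open a gzip writer
            cmd = sprintf("gzip -9 > %s%02d.gz", pre, n++)   # sketch: two-digit numbering is arbitrary
        print | cmd                                 # stream the line into gzip
    }
    END { if (cmd != "") close(cmd) }
'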


Solution

  • Tar can handle the compression itself.

    ./my_cmd | csplit --prefix=foo - '/^\.$/+1' '{*}'   # writes the foo?? files
    
    printf "%s\n" foo[0-9][0-9] | tar czf output.tar.gz -T -
    rm -f foo[0-9][0-9]  # clean up the temps     
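
    If you're on GNU tar, a small variant (just a sketch) lets tar delete the
    temp files itself once they are in the archive, so the separate rm goes away:

    printf "%s\n" foo[0-9][0-9] | tar czf output.tar.gz --remove-files -T -   # GNU tar: removes each foo?? after adding it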
    

    If that's just not good enough, and you REALLY need that -9 compression,

    printf "%s\n" foo[0-9][0-9] |
        tar cf - -T -           |
        gzip -9 > output.tar.gz
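
    With a reasonably recent GNU tar you can also keep that as a single command by
    telling tar which compressor to run (a sketch; older versions only accept a bare
    program name here):

    printf "%s\n" foo[0-9][0-9] |
        tar --use-compress-program='gzip -9' -cf output.tar.gz -T -   # GNU tar; assumes a version that accepts compressor arguments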
    

    Then you should be able to extract individual files from the archive and handle them one at a time.

    tar xvOf output.tar.gz foo00 | wc -l
    

    That lets you keep the archive compressed, but pull out chunks to work on without writing them to disk.
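
    And if you want to step through every chunk that way, something like this sketch
    works (it re-reads the archive once per member, which is usually fine here):

    tar tzf output.tar.gz | while IFS= read -r member; do
        tar xzOf output.tar.gz "$member" | wc -l   # swap wc -l for the real per-chunk work
    done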