I know that the Linux split command can split a large file by size, producing output files with a numeric suffix:
split -b 1G -d filepath suffix
# result
suffix00 suffix01 ...
But I would like the total number of chunks to be included in each output filename. For example, if the file splits into five pieces, I would like the result to look like this:
suffix5-01 suffix5-02 suffix5-03 suffix5-04 suffix5-05
I could use another tool such as du to get the total file size and compute the chunk count in advance, but I don't know whether split measures size the same way du does, and that approach isn't elegant anyway.
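For reference, this is the kind of two-pass workaround I mean (a sketch only; filepath is a placeholder, and GNU stat and GNU split are assumed):
# Pass 1: compute the chunk count from the exact byte size.
size=$(stat -c %s filepath)              # size in bytes (GNU stat)
chunk=$((1 << 30))                       # 1 GiB, matching split -b 1G
total=$(( (size + chunk - 1) / chunk ))  # ceiling division = chunk count
# Pass 2: bake the total into the prefix; --numeric-suffixes=1 starts at 01.
split -b 1G --numeric-suffixes=1 filepath "suffix${total}-"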
Is there a cleaner, single-pass way to achieve this?
You can do that with GNU Parallel.
First, make a 10 MiB file to work with:
dd if=/dev/zero bs=10240 count=1024 > data.bin
Now split it into 1 MiB chunks, naming each chunk suffix{TOTALCHUNKS}-{CHUNKNUMBER}:
parallel --recend '' --plus --pipepart --block 1M cat \> suffix{##}-{#} :::: data.bin
Result:
-rw-r--r-- 1 mark staff 1048576 9 Aug 16:57 suffix10-1
-rw-r--r-- 1 mark staff 1048576 9 Aug 16:57 suffix10-2
-rw-r--r-- 1 mark staff 1048576 9 Aug 16:57 suffix10-3
-rw-r--r-- 1 mark staff 1048576 9 Aug 16:57 suffix10-4
-rw-r--r-- 1 mark staff 1048576 9 Aug 16:57 suffix10-5
-rw-r--r-- 1 mark staff 1048576 9 Aug 16:57 suffix10-6
-rw-r--r-- 1 mark staff 1048576 9 Aug 16:57 suffix10-7
-rw-r--r-- 1 mark staff 1048576 9 Aug 16:57 suffix10-8
-rw-r--r-- 1 mark staff 1048576 9 Aug 16:57 suffix10-9
-rw-r--r-- 1 mark staff 1048576 9 Aug 16:57 suffix10-10
Notes:
You need --recend '' to stop GNU Parallel from trying to split your file on linefeeds.
You need --plus so that {##} is set to the total number of jobs.
You need --pipepart to make it faster on seekable files; if your file is not seekable, use --pipe instead.
{##} means the total number of chunks.
{#} means the current chunk number.
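As a quick sanity check (a sketch; reassembled.bin is just a scratch name), you can reassemble the chunks and byte-compare them with the original. A plain glob would sort suffix10-10 before suffix10-2, so iterate by number instead:
# Concatenate the chunks in numeric order, then compare with the original.
for i in $(seq 1 10); do cat "suffix10-$i"; done > reassembled.bin
cmp data.bin reassembled.bin && echo OK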