I'm merging three files (ls -l):
-rw-rw-r-- 1 kacper kacper 1839510 sie 13 14:27 A.jpg
-rw-rw-r-- 1 kacper kacper 2014809 sie 13 14:27 B.jpg
-rw-rw-r-- 1 kacper kacper 1277047 sie 13 14:27 C.pdf
into one file (merged) in bash using:
cat A.jpg >> merged
echo $SEPARATOR >> merged
cat B.jpg >> merged
echo $SEPARATOR >> merged
cat C.pdf >> merged
where:
SEPARATOR=PO56WLH82SN1ZS5QH5EU9FOZVLBRLHAGHO3D5KOUSPMS6KYSFAYN2DBL
Next I'm splitting the merged file into three parts using:
csplit --suppress-matched merged --prefix="PART_" '/'$SEPARATOR'/' {*}
this produces PART_00, PART_01, PART_02 (ls -l):
-rw-rw-r-- 1 kacper kacper 1839398 sie 13 18:41 PART_00
-rw-rw-r-- 1 kacper kacper 2014507 sie 13 18:41 PART_01
-rw-rw-r-- 1 kacper kacper 1277047 sie 13 18:41 PART_02
PART_00 and PART_01 are JPG files and can be properly displayed. PART_02 is a PDF file and it can be opened and viewed. So, at first glance this looked to me like success.
The problem is that PART_00 (1839398 bytes) is slightly smaller than A.jpg (1839510 bytes). The same goes for PART_01 and B.jpg; only PART_02 has the same size as C.pdf. Checking the files byte by byte with cmp shows that each pair is identical up to the point where the shorter file ends.
Anyone know why this is the case? Advice would be greatly appreciated.
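The last lines in the files are not terminated by a newline character. As a result, when you append your separator to the merged file, it lands at the end of that last line rather than on a line of its own. csplit then matches this line and drops all of it, so the last few bytes of each file are dropped along with the separator. That accounts for the 112 bytes missing from PART_00 and the 302 bytes missing from PART_01. C.pdf is not followed by a separator, which is why PART_02 is intact.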
The --suppress-matched option for csplit suppresses the entire line on which the pattern matches, not just the matched text.
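A possible workaround, sketched here with GNU csplit and head assumed, is to put the separator on a line of its own by writing a newline before it, and then to strip that one extra newline from every part that was followed by a separator. The directory csplit_fix_demo and the files A.bin and B.bin are small hypothetical stand-ins for the real files:

```shell
#!/usr/bin/env bash
set -euo pipefail

SEPARATOR=PO56WLH82SN1ZS5QH5EU9FOZVLBRLHAGHO3D5KOUSPMS6KYSFAYN2DBL

# Hypothetical stand-ins for A.jpg and B.jpg; neither ends with a newline.
demo=csplit_fix_demo
mkdir -p "$demo"
printf 'line one\nno trailing newline' > "$demo/A.bin"
printf 'second file\ntail' > "$demo/B.bin"

# The leading \n guarantees the separator starts on its own line.
{
  cat "$demo/A.bin"
  printf '\n%s\n' "$SEPARATOR"
  cat "$demo/B.bin"
} > "$demo/merged"

# Anchor the pattern so only a line consisting solely of the separator matches.
csplit --quiet --suppress-matched --prefix="$demo/PART_" \
  "$demo/merged" "/^${SEPARATOR}\$/" '{*}'

# PART_00 still carries the newline written before the separator;
# remove exactly that one byte (negative counts for head -c are a GNU extension).
head -c -1 "$demo/PART_00" > "$demo/A.out"

# cmp exits non-zero (and set -e aborts) if the round trip changed anything.
cmp "$demo/A.bin" "$demo/A.out"
cmp "$demo/B.bin" "$demo/PART_01"
echo "round trip OK"
```

The last part needs no trimming because nothing was appended after it. This still relies on the separator never occurring as a complete line inside the binary data; recording the file sizes and splitting by byte offset would avoid that assumption.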