The aim of this question is to replace the /PageLabels code in a PDF file with another one. We have to do this because there is a bug in the program that prints the PDF (we can't change the program). Doing it by hand takes a lot of time (we process 50 PDF files per hour).
However, to be pragmatic, the example can be summarized as follows.
Old /PageLabels code: located in an original file called a.pdf.
We use grep to get the incorrect /PageLabels code:
grep -aPo '/PageLabels\K[^"]*>>]>>' a.pdf
<</Nums[0<</S/r/St 1>>6<</S/r/St 7>>10<</S/r/St 11>>12<</S/r/St 13>>14<</P(1-)/S/D/St 1>>20<</P(2-)/S/D/St 1>>28<</P(3-)/S/D/St 1>>80<</P(4-)/S/D/St 1>>116<</P(A-)/S/D/St 1>>132<</P(B-)/S/D/St 1>>134<</P(C-)/S/D/St 1>>138<</P(D-)/S/D/St 1>>148<</P(E-)/S/D/St 1>>168<</P(F-)/S/D/St 1>>176<</P(G-)/S/D/St 1>>182<</P(Glossary-)/S/D/St 1>>194<</P(Comments-)/S/D/St 1>>]>>
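For readers unfamiliar with the options, here is a minimal sketch of what that grep call does, run against a tiny stand-in line rather than a real PDF. The temporary file and its contents are made up for illustration, and the sketch assumes GNU grep built with PCRE support (the -P option).

```shell
#!/bin/bash
# Hypothetical stand-in for the relevant line of a PDF catalog (not a real PDF).
sample=$(mktemp)
printf '%s\n' '<</Type/Catalog/PageLabels<</Nums[0<</S/r/St 1>>6<</P(1-)/S/D/St 1>>]>>/Pages 2 0 R>>' > "$sample"

# -a : treat the (binary) file as text
# -P : Perl-compatible regex, needed for \K
# -o : print only the matched part of the line
# \K : discard everything matched so far ("/PageLabels") from the output
labels=$(grep -aPo '/PageLabels\K[^"]*>>]>>' "$sample")
echo "$labels"

rm -f "$sample"
```

The value printed is only the dictionary that follows /PageLabels, which is the text we then need to swap out.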
New /PageLabels code: We want to replace the old /PageLabels code with the following. It is the output of another script, which re-evaluates the PDF and produces the correct /PageLabels code (tested and verified manually).
<</Nums[0<</S/r/St 1>>12<</P(1-)/S/D/St 1>>17<</P(2-)/S/D/St 1>>32<</P(3-)/S/D/St 1>>98<</P(4-)/S/D/St 1>>130<</P(A-)/S/D/St 1>>153<</P(B-)/S/D/St 1>>154<</P(C-)/S/D/St 1>>158<</P(D-)/S/D/St 1>>187<</P(E-)/S/D/St 1>>230<</P(F-)/S/D/St 1>>242<</P(G-)/S/D/St 1>>247<</P(Glossary-)/S/D/St 1>>259<</P(Comments-)/S/D/St 1>>]>>
The result will be saved in another file called b.pdf.
We don't know how to write this using sed.
Any ideas would be greatly appreciated.
You should be using replace instead of sed or regex:
#!/bin/bash
old=$(grep -aPo '/PageLabels\K[^"]*>>]>>' a.pdf) ## Get old /PageLabels code
new=$(/tmp/get_correct_code.sh) ## Get new /PageLabels code
# replace treats both strings literally and reads standard input, so no cat is needed
replace "$old" "$new" < a.pdf > b.pdf
From the man page:
DESCRIPTION
The replace utility program changes strings in place in files or on the standard input.
Invoke replace in one of the following ways:
shell> replace from to [from to] ... -- file_name [file_name] ...
shell> replace from to [from to] ... < file_name
UPDATE: If you prefer to use sed, you could try it this way:
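#!/bin/bash
old=$(grep -aPo '/PageLabels\K[^"]*>>]>>' a.pdf) ## Get old /PageLabels code
new=$(/tmp/get_correct_code.sh) ## Get new /PageLabels code
# To replace $old with $new, first escape the characters sed treats specially
# in a pattern (such as [ and ]) plus the @ delimiter. Quote "$old" so the
# shell does not split or glob it.
eold=$(printf '%s\n' "$old" | sed 's/[][\\.*^$@]/\\&/g')
# Escape \, & and the @ delimiter in the replacement text as well
enew=$(printf '%s\n' "$new" | sed 's/[\\&@]/\\&/g')
# Then do the replacement using sed
sed "s@$eold@$enew@g" a.pdf > b.pdf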
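As a quick sanity check, the sed approach can be tried on a small text file before touching real PDFs. This is only a sketch under stated assumptions: the file names a.txt and b.txt, the sample line, and the shortened old/new strings are all made up for illustration. It escapes both the search text and the replacement text (eold and enew) before handing them to sed.

```shell
#!/bin/bash
# Hypothetical sample data standing in for a.pdf (not a real PDF).
workdir=$(mktemp -d)
printf '%s\n' 'obj<</PageLabels<</Nums[0<</S/r/St 1>>6<</P(1-)/S/D/St 1>>]>>/Pages 2 0 R>>' > "$workdir/a.txt"

# Shortened, made-up old and new label dictionaries.
old='<</Nums[0<</S/r/St 1>>6<</P(1-)/S/D/St 1>>]>>'
new='<</Nums[0<</S/r/St 1>>12<</P(1-)/S/D/St 1>>]>>'

# Escape basic-regex metacharacters and the @ delimiter in the search text.
eold=$(printf '%s\n' "$old" | sed 's/[][\\.*^$@]/\\&/g')
# Escape backslash, & and the @ delimiter in the replacement text.
enew=$(printf '%s\n' "$new" | sed 's/[\\&@]/\\&/g')

# Substitute and keep the result for inspection.
sed "s@$eold@$enew@g" "$workdir/a.txt" > "$workdir/b.txt"
result=$(cat "$workdir/b.txt")
echo "$result"

rm -rf "$workdir"
```

If the printed line shows the new dictionary in place of the old one and the rest of the line unchanged, the escaping is doing its job for this kind of input.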