I am working with a database of documents, each containing 5 to 20 pages of text.
I have three tasks:
I tried the PHP Cpdf library by Wayne Munro (pdf@ros.co.nz) and added a lot of regular expressions to it. I also added support for many PDF text formatting operators, such as Ts, TL, T*, Tc, Tw, and Tz.
I am almost done, but I cannot reach the glyphs of characters outside the Type 1 character table, and I have no idea how to get the 'kern' and 'hmtx' tables from the font file. How do I embed the glyphs?
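For the 'kern' and 'hmtx' part, one possible starting point is the table directory at the beginning of a TrueType/OpenType font file, which lists every table with its byte offset and length. The following is only a minimal sketch, assuming the whole font file has been read into a string (for example with file_get_contents) and that it is a single font rather than a collection. The helper names listSfntTables and glyphAdvanceWidth are hypothetical and are not part of the library above.

```php
<?php
// Sketch: read the sfnt table directory of a TrueType/OpenType font.
// listSfntTables() is a hypothetical helper name, not a library function.
function listSfntTables(string $data): array
{
    if (strlen($data) < 12) {
        throw new InvalidArgumentException('Font data too short for an sfnt header');
    }
    // Header starts with uint32 sfntVersion and uint16 numTables (big-endian).
    $header = unpack('Nversion/nnumTables', substr($data, 0, 6));
    $tables = [];
    for ($i = 0; $i < $header['numTables']; $i++) {
        // Table records start at byte 12 and are 16 bytes each.
        $record = substr($data, 12 + 16 * $i, 16);
        if (strlen($record) < 16) {
            throw new InvalidArgumentException('Truncated table directory');
        }
        // Each record: 4-byte tag, uint32 checksum, uint32 offset, uint32 length.
        $r = unpack('a4tag/Nchecksum/Noffset/Nlength', $record);
        $tables[$r['tag']] = ['offset' => $r['offset'], 'length' => $r['length']];
    }
    return $tables;
}

// Sketch: advance width of one glyph, in font units, from 'hhea' + 'hmtx'.
function glyphAdvanceWidth(string $data, array $tables, int $glyphId): int
{
    // numberOfHMetrics is a uint16 at byte 34 of the 'hhea' table.
    $n = unpack('n', substr($data, $tables['hhea']['offset'] + 34, 2))[1];
    // Glyphs past the last full record reuse the final advance width.
    $index = min($glyphId, $n - 1);
    // 'hmtx' holds $n records of (uint16 advanceWidth, int16 leftSideBearing).
    return unpack('n', substr($data, $tables['hmtx']['offset'] + 4 * $index, 2))[1];
}
```

Checking isset($tables['kern']) or isset($tables['GPOS']) on the result shows which kerning source a given font actually provides.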
I believe I can do the kerning with the PDF text-showing operator TJ, which takes per-glyph position adjustments:
[ (A) 120 (W) 120 (A) 95 (Y again) ] TJ
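An array like the one above could be generated from a pair-kerning lookup. This is a rough sketch, assuming a single-byte text encoding and a $kerning array whose values are already TJ adjustments: thousandths of a text-space unit, where a positive number pulls the next glyph closer, so a font kerning value of -120 on a 1000-unit em becomes 120. The function name buildTjArray and the array layout are hypothetical.

```php
<?php
// Sketch: emit a TJ operand array from text plus a pair-kerning table.
// buildTjArray() and the $kerning layout are hypothetical, not library API.
function buildTjArray(string $text, array $kerning): string
{
    $parts = [];
    $run = '';
    $len = strlen($text); // assumes one byte per character
    for ($i = 0; $i < $len; $i++) {
        $ch = $text[$i];
        // Escape the characters that are special inside a PDF literal string.
        $run .= ($ch === '(' || $ch === ')' || $ch === '\\') ? '\\' . $ch : $ch;
        $next = ($i + 1 < $len) ? $text[$i + 1] : '';
        $adjust = $kerning[$ch . $next] ?? 0;
        if ($adjust !== 0) {
            // Close the current string and put the adjustment after it.
            $parts[] = '(' . $run . ')';
            $parts[] = (string) $adjust;
            $run = '';
        }
    }
    if ($run !== '') {
        $parts[] = '(' . $run . ')';
    }
    return '[ ' . implode(' ', $parts) . ' ] TJ';
}

// Prints: [ (A) 120 (W) 120 (A) 95 (Y again) ] TJ
echo buildTjArray('AWAY again', ['AW' => 120, 'WA' => 120, 'AY' => 95]), "\n";
```

Pairs without a kerning entry stay in the same string, so the content stream grows only where an adjustment is needed.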
UPD#1: The MinionPro font has no 'kern' table. It has a 'GPOS' (Glyph Positioning) table instead, and I am pretty close to solving the problem. By the way, walking through a binary file in PHP is a nightmare :(
Manual kerning of text set in a small font size is a totally wrong strategy. Microsoft Excel's PDF text exporter does the same, and the result is not acceptable.