A Type 1 font /Differences encoding uses strings in its mapping of values; for example, the character 1 is encoded as 'one'. It is used for numbers and special characters only.
What is the standard way to use this encoding?
How should I decode a string from a PDF which uses such an encoding?
Link for the file: http://www.filedropper.com/open
Here's the /Differences array in your file (and honestly, you should have just posted this and not a link to a skeevy download page):
/Differences [
24 /breve/caron/circumflex/dotaccent/hungarumlaut/ogonek/ring/tilde
39 /quotesingle
96 /grave
128 /bullet/dagger/daggerdbl/ellipsis...
]
The way this works is that the font also has a base encoding associated with it (for example /MacRomanEncoding or /WinAnsiEncoding). In the case of a Type 1 font, there is an encoding built into the font. Then, given a copy of that encoding, you apply the differences to it. Starting from the number (your first is 24), you change entries 24-31 inclusive to /breve, /caron, /circumflex and so on. Each later number in the array (39, 96, 128) restarts the run at that code.
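As a rough illustration of that step, here is a minimal Python sketch. It assumes the /Differences array has already been parsed into a flat list of integers and name strings; `apply_differences` and the all-`.notdef` base encoding are made up for the example (a real base would come from the font or a named encoding), and only the names visible in the array above are used, since the original is truncated after /ellipsis.

```python
NOTDEF = ".notdef"

def apply_differences(base_encoding, differences):
    """Return a copy of base_encoding with a parsed /Differences list applied.

    An integer sets the current code; each name that follows is assigned
    to consecutive codes starting from it.
    """
    encoding = list(base_encoding)  # work on a copy, leave the base intact
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item
        else:
            if code is None:
                raise ValueError("glyph name appears before any code")
            encoding[code] = item
            code += 1
    return encoding

# Placeholder base encoding (assumption): every slot undefined.
base_encoding = [NOTDEF] * 256

# The array above, as far as it is shown (it is truncated after ellipsis).
differences = [
    24, "breve", "caron", "circumflex", "dotaccent",
        "hungarumlaut", "ogonek", "ring", "tilde",
    39, "quotesingle",
    96, "grave",
    128, "bullet", "dagger", "daggerdbl", "ellipsis",
]

encoding = apply_differences(base_encoding, differences)
print(encoding[26])  # circumflex
```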
In Type 1 fonts, there is a dictionary called /CharStrings, which is an association of the name of a glyph with the data/code that will render it. If, for example, you get a character with code 26, you look it up in your encoding array (which should be a 256-element array for Type 1 fonts) and, with the differences applied, you get the name /circumflex. You then look that up in the CharStrings dictionary, pull out the glyph data and render it. Any character that does not exist in the encoding should be set to /.notdef, which will then render a shape representing an undefined character (usually an empty box).
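The lookup chain (code, then glyph name, then glyph data) could look something like the sketch below. The `charstrings` dictionary and its byte strings are stand-ins invented for the example, not real Type 1 charstring data, and falling back to `.notdef` when a name is missing from the dictionary is an assumption of this sketch.

```python
NOTDEF = ".notdef"

def glyph_for_code(code, encoding, charstrings):
    """Map a character code to (glyph name, glyph data)."""
    # Codes outside the encoding array are treated as undefined.
    name = encoding[code] if 0 <= code < len(encoding) else NOTDEF
    # Assumption: a name with no CharStrings entry also renders as .notdef.
    if name not in charstrings:
        name = NOTDEF
    return name, charstrings[name]

# Hypothetical encoding with two of the differences applied.
encoding = [NOTDEF] * 256
encoding[26] = "circumflex"
encoding[39] = "quotesingle"

# Hypothetical CharStrings dictionary; the values are placeholders.
charstrings = {
    ".notdef": b"<empty box outline>",
    "circumflex": b"<circumflex outline>",
}

print(glyph_for_code(26, encoding, charstrings)[0])  # circumflex
print(glyph_for_code(65, encoding, charstrings)[0])  # .notdef
```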
Now, likely your problem is: how do I turn these glyph names into something more useful like, say, Unicode?
If you look in Annex D of the PDF specification, you'll see a set of tables that define the character sets for the standard Latin encodings. You would make a lookup table that maps Adobe standard glyph names to Unicode. Unfortunately, the tables in Annex D are incomplete. Fortunately, Adobe has a file that defines all of that for you here. There is a link in that file which is now dead, but most likely it was meant to go here.
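To make the last step concrete, here is a small sketch of such a lookup table. It assumes the mapping file uses a `name;hex code points` layout with `#` comment lines; the three sample entries are embedded by hand for illustration, and `parse_glyph_list` and `decode` are names made up for this example.

```python
# A tiny hand-written sample in the assumed "name;hex code points" layout.
SAMPLE_GLYPH_LIST = """\
# glyph name;Unicode code point(s) in hex
one;0031
circumflex;02C6
bullet;2022
"""

def parse_glyph_list(text):
    """Build a dict mapping glyph names to Unicode strings."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, hex_values = line.partition(";")
        # An entry may list several space-separated code points.
        table[name] = "".join(chr(int(h, 16)) for h in hex_values.split())
    return table

def decode(codes, encoding, name_to_unicode):
    """Decode character codes via glyph names; unknown names become U+FFFD."""
    return "".join(name_to_unicode.get(encoding[c], "\ufffd") for c in codes)

name_to_unicode = parse_glyph_list(SAMPLE_GLYPH_LIST)

# Hypothetical encoding array with a few entries filled in.
encoding = [".notdef"] * 256
encoding[49] = "one"         # the question's example: 1 is encoded as 'one'
encoding[26] = "circumflex"
encoding[128] = "bullet"

text = decode(b"\x31\x80", encoding, name_to_unicode)
print(text == "1\u2022")  # True
```

Names that are not in the table (including `.notdef`) come out as U+FFFD here; a real decoder may want a different policy for those.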