pdf character-encoding

Is my pdf file encoded in UTF-8?


I would like to find out whether a PDF file is encoded in UTF-8. How can I check which character encoding is used in a PDF file?


Solution

  • A PDF is a binary file, not a text file.

    A character encoding such as "UTF-8" only makes sense for text files (*.txt, *.html, *.xml, *.csv, ...).

    Thus, a PDF as a whole is never UTF-8 encoded.


    PDF 2.0 has since been released, and it allows UTF-8 encoded text. Even so, a PDF 2.0 file can contain UTF-8 encoded text only in certain objects; the file as a whole is still not UTF-8 encoded.
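
To illustrate the point above, here is a minimal Python sketch. It checks for the `%PDF-` header marker and then tries to decode the raw bytes as UTF-8. The helper names `looks_like_pdf` and `is_valid_utf8` and the `sample` bytes are made up for this illustration; `sample` is a hand-written stand-in that mimics the start of a PDF, not a complete or valid document.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the data starts with the PDF header marker."""
    return data.startswith(b"%PDF-")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the whole byte sequence decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical stand-in for the first bytes of a PDF: a header line,
# a comment line with bytes >= 0x80 (commonly written to mark the file
# as binary), and a stream object holding arbitrary bytes.
sample = (
    b"%PDF-1.7\n"
    b"%\xe2\xe3\xcf\xd3\n"
    b"1 0 obj\n<< /Length 4 >>\nstream\n"
    b"\x78\x9c\xff\xfe"
    b"\nendstream\nendobj\n"
)

print(looks_like_pdf(sample))  # the header marker is present
print(is_valid_utf8(sample))   # the bytes as a whole are not valid UTF-8
```

For a real file, read it in binary mode (for example `open(path, "rb").read()`, where `path` is your file) and pass the bytes to the same helpers. A typical PDF with compressed streams will most likely fail the UTF-8 check. The reverse does not hold: a small PDF that happens to decode as UTF-8 is still a binary format, not a UTF-8 text file.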