pythonpdfencodingpython-pdfkit

How to generate a PDF with non-ascii characters using from_string from python-pdfkit


I'm struggling to generate just a simple PDF with non-ascii characters using Python 3.5.2, python-pdfkit and wkhtmltox-0.12.2.

This is the easiest example I could write:

import pdfkit
html_content = u'<p>ö</p>'
pdfkit.from_string(html_content, 'out.pdf')

This is like the output document looks like: non-ascii character incorrectly shown in the PDF


Solution

  • I found out that I just needed to add a meta tag with charset attribute to my HTML code:

    import pdfkit
    
    html_content = """
    <!DOCTYPE html>
    <html>
    <head>
        <meta charset="utf-8">
    </head>
    <body>
        <p>&euro;</p>
        <p>áéíóúñö</p>
    <body>
    </html>
    """
    
    pdfkit.from_string(html_content, 'out.pdf')
    

    I actually spent quite some time following wrong solutions like the one suggested here. In case someone is interested, I wrote a short story on my blog. Sorry for the SPAM :)