python character-encoding latex pdflatex pylatex

How to write unicode characters to LaTeX document via pylatex


Assume a string myStr that contains three special characters.

myStr = "emdash —; delta Δ; thin space:  ;"

Further assume that we wish to write this string to a LaTeX document via pylatex.

If we write the string as is to a LaTeX document, errors occur during its compilation:

import pylatex
doc = pylatex.Document()
with doc.create(pylatex.Section('myStr -- not encoded')):
    doc.append(myStr)
doc.generate_pdf("myStr_notEncoded", clean_tex=False)

...
! Package inputenc Error: Unicode character Δ (U+0394)
(inputenc)                not set up for use with LaTeX.
...
! Package inputenc Error: Unicode character   (U+2009)
(inputenc)                not set up for use with LaTeX.
...
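
To see exactly which characters are involved, the non-ASCII characters in the string can be listed with their code points and Unicode names. This is a minimal sketch using only the standard library; the helper name list_non_ascii is hypothetical, and the sample assumes the thin space is U+2009, as the error log reports.

```python
import unicodedata

def list_non_ascii(text):
    """Return (char, code point, Unicode name) for each distinct non-ASCII character."""
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in seen]

# Same content as myStr, written with escapes so the thin space is visible.
sample = "emdash \u2014; delta \u0394; thin space: \u2009;"
for ch, code_point, name in list_non_ascii(sample):
    print(code_point, name)
```

Note that the log above only complains about the delta and the thin space, not about the em dash.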

If we first encode the string via pylatexenc, the special characters are either replaced by their respective LaTeX commands (em dash, delta) or encoded in a way that is unclear to me (thin space).

import pylatexenc
from pylatexenc import latexencode
myStr_latex = pylatexenc.latexencode.unicode_to_latex(myStr)
doc = pylatex.Document()
with doc.create(pylatex.Section('myStr')):
    doc.append(myStr_latex)
doc.generate_pdf("myStr", clean_tex=False)


How should I write the string to the LaTeX document so that the special characters are printed as the actual characters when compiling with pdflatex?

Edit 1:

I also tried changing the default encoding inside the LaTeX document for the unencoded pathway, but that results in a series of compilation errors as well.

doc.preamble.append(pylatex.NoEscape("\\usepackage[utf8]{inputenc}"))
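
A side note on this attempt: the errors in the unencoded run are raised by inputenc itself, so the package already seems to be loaded, and what is missing is a definition for the two specific code points. One possible direction, sketched here under assumptions, is to generate \DeclareUnicodeCharacter lines for them. The helper name declare_unicode_lines and the chosen replacements (\ensuremath{\Delta} and \,) are illustrative choices, not something taken from the question or from pylatex.

```python
def declare_unicode_lines(mapping):
    """Build one \\DeclareUnicodeCharacter line per (character -> LaTeX code) pair."""
    return [
        "\\DeclareUnicodeCharacter{%04X}{%s}" % (ord(ch), latex)
        for ch, latex in mapping.items()
    ]

# Hypothetical mapping for the two characters named in the error log.
replacements = {
    "\u0394": r"\ensuremath{\Delta}",  # Greek capital delta
    "\u2009": r"\,",                   # thin space
}
for line in declare_unicode_lines(replacements):
    print(line)
```

Each generated line could then be appended to doc.preamble wrapped in pylatex.NoEscape, in the same way as the \usepackage line above. I have not verified this against every font setup.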

Solution

  • You were close with your pylatexenc solution. When you encode LaTeX yourself, e.g. with pylatexenc.latexencode.unicode_to_latex(), you have to tell pylatex that the string should not be escaped a second time. To quote the pylatex documentation:

    Using regular LaTeX strings may not be as simple as is seems though, because by default almost all strings are escaped [...] there are cases where raw LaTeX strings should just be used directly in the document. This is why the NoEscape string type exists. This is just a subclass of str, but it will not be escaped.

    In other words, wrap the string in NoEscape to tell pylatex that it is already encoded as LaTeX and must not be escaped again:

    import pylatex
    from pylatexenc import latexencode
    myStr_latex = latexencode.unicode_to_latex(myStr)
    doc = pylatex.Document()
    with doc.create(pylatex.Section('myStr')):
        doc.append(pylatex.utils.NoEscape(myStr_latex))
    doc.generate_pdf("myStr", clean_tex=False)
    

    The generated pdf, properly encoded (emdash —; delta ∆; thin space: ;)
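
To illustrate why the extra escaping breaks an already-encoded string, here is a toy escaping function. It is a simplified stand-in written for this example, NOT pylatex's actual implementation; the name toy_escape and its three-entry table are hypothetical.

```python
# Simplified stand-in for an escaping step; NOT pylatex's actual implementation.
_ESCAPES = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
}

def toy_escape(text):
    """Escape backslashes and braces so they print literally in LaTeX."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)

already_latex = r"\textemdash{}"  # a hand-written LaTeX command
print(toy_escape(already_latex))
# prints: \textbackslash{}textemdash\{\}
```

After such a pass, the command is no longer a command: LaTeX typesets the backslash and braces as literal characters. Marking the string as NoEscape skips this step, so the commands reach pdflatex unchanged.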