python pdf unicode pymupdf

How to insert Unicode text into a PDF using PyMuPDF?


I'm trying to use the PyMuPDF library to insert Unicode text into a PDF file. I have the following code, based on the documentation example:

import pymupdf

doc = pymupdf.open()
page = doc.new_page()
p = pymupdf.Point(50, 72)

# String in Sinhala language
text = (
    "ශ්‍රී දළදා මාලිගාව යනු බුදුරජාණන් වහන්සේගේ වම් දන්තධාතූන් වහන්සේ වර්තමානයේ තැන්පත් කර ඇති මාළිගාවයි."
)

font = pymupdf.Font(fontfile="ISKPOTAB.TTF")  # font file of the default Windows Sinhala font
page.insert_font(fontbuffer=font.buffer)  # using the font buffer, since the name "Iskoola Pota Bold" produces an error
rc = page.insert_text(p, text, fontfile=font.buffer, fontsize=11, rotate=0)
print("%i lines printed on page %i." % (rc, page.number))

doc.save("text.pdf")

This code runs without any errors. However, the PDF file it produces contains only dots (".").

Am I missing something here, or does PyMuPDF simply not support Unicode insertion?


Solution

  • I can't read Sinhalese, so you may need some further adjustments, but this produces a PDF with your text in both Nirmala UI and Iskoola Pota:

    import fitz  # PyMuPDF
    
    doc = fitz.open()
    page = doc.new_page()
    
    # load and add the font, and name them for Fitz (PyMuPDF)
    page.insert_font(fontname='F0', fontfile=r'C:\Windows\Fonts\Nirmala.ttf')
    page.insert_font(fontname='F1', fontfile=r'C:\Temp\Fonts\iskpotab.ttf')
    
    sinhalese_text = "ශ්‍රී දළදා මාලිගාව යනු බුදුරජාණන් වහන්සේගේ වම් දන්තධාතූන් වහන්සේ වර්තමානයේ තැන්පත් කර ඇති මාළිගාවයි."
    
    # insert the text twice, using both fonts
    font_size = 24
    page.insert_text((50, 100), sinhalese_text, fontname='F0', fontsize=font_size)
    page.insert_text((50, 150), sinhalese_text, fontname='F1', fontsize=font_size)
    
    doc.save("text.pdf")
    doc.close()
    

    Note: as far as I can tell, the text reads "The Temple of the Tooth Relic is the temple where the left tooth relic of the Buddha is currently enshrined."
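
When judging whether a font is suitable for a string like this, it can help to see exactly which code points the text contains. The following is a minimal sketch using only the standard library `unicodedata` module; it does not touch PyMuPDF at all. The helper name `describe_codepoints` is hypothetical, and the sample is the first word of the text written with escapes so that invisible characters survive copy and paste.

```python
import unicodedata


def describe_codepoints(text):
    """Return (code point, Unicode name) pairs for each distinct
    non-whitespace character, in order of first appearance."""
    seen = []
    for ch in text:
        if ch.isspace() or ch in seen:
            continue
        seen.append(ch)
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in seen]


def outside_range(text, low, high):
    """Return the distinct non-whitespace characters whose code points
    fall outside the inclusive range [low, high]."""
    result = []
    for ch in text:
        if ch.isspace() or ch in result:
            continue
        if not (low <= ord(ch) <= high):
            result.append(ch)
    return result


# First word of the Sinhala sample, written with escapes (hypothetical test input).
first_word = "\u0DC1\u0DCA\u200D\u0DBB\u0DD3"

for code_point, name in describe_codepoints(first_word):
    print(code_point, name)

# The Sinhala block spans U+0D80 to U+0DFF; anything else needs separate attention.
extras = outside_range(first_word, 0x0D80, 0x0DFF)
print([f"U+{ord(ch):04X}" for ch in extras])
```

For this word the only character outside the Sinhala block is U+200D (ZERO WIDTH JOINER), which is invisible and easy to lose when retyping the string. Its presence suggests the text relies on joined letter forms, which may be one of the "further adjustments" mentioned above; that is an assumption, not something the answer confirms.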