Tags: python, html, encoding, dominate, yattag

Python HTML: accent mark issues with yattag and dominate


I hope I can find some answers here. I'm trying to write HTML with Python 3. I have tried the yattag and dominate modules, but both give me the same problem: when I write the generated HTML to a file, the page doesn't display accented letters. Instead, a question mark inside a small black diamond is shown (see the image at the bottom).

My code looks like this:

Using dominate:

import dominate
import dominate.tags as tg
#an example doc
_html = tg.html(lang='es')
_head = _html.add(tg.head())
_body = _html.add(tg.body())
with _head:
    tg.meta(charset="UTF-8") #this line seems to be the problem
with _body:
    tg.p("Benjamín")
print(_html)
#when I print to console, the accent mark in the letter 'í' is there but...
#when I write the file, the weird character is displayed 

with open("document.html", 'w') as file:
    file.write(_html.render())

The same thing happens with yattag:

from yattag import Doc
#another example doc
doc, tag, text = Doc().tagtext()
with tag("html", lang="es"):  # attributes are passed as keyword arguments
    with tag("head"):
        doc.stag("meta", charset="UTF-8") #this line seems to be the problem
    with tag("body"):
        text("Benjamín")
#when I print to console, the accent mark in the letter 'í' is there but...
#when I write the file, the weird character is displayed 
with open("document2.html", 'w') as file:
    file.write(doc.getvalue())

When I change or remove the charset, the problem goes away in both cases. I normally use those last two lines (with open(...) and file.write(...)) to write simple documents, as I guess everyone does, and I have never had trouble with accent marks that way. The problem seems to be in how these modules handle the charset when the page is displayed, but I'm not sure. Do you know any way to get around this? Hope you're doing well, and thank you.

[Image: the replacement symbol displayed in place of the accented letter]
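The symptom can most likely be reproduced without either library, because render() and getvalue() both return an ordinary str and the bytes on disk are produced by open(). The sketch below assumes the platform default encoding is cp1252 (common on Windows; locale.getpreferredencoding(False) reports the actual default on a given machine) and then decodes those bytes the way a browser would after reading <meta charset="UTF-8">.

```python
import locale

# Encoding that open() uses when no encoding argument is given.
# It depends on the platform and locale, so it may differ from cp1252.
print("default encoding for open():", locale.getpreferredencoding(False))

text = "Benjamín"

# Assumption: the default is cp1252, as on many Windows systems.
raw = text.encode("cp1252")
print(raw)  # b'Benjam\xedn'  ('í' became the single byte 0xED)

# The browser honours <meta charset="UTF-8">. 0xED followed by 'n' is not
# valid UTF-8, so the decoder substitutes U+FFFD, the replacement character.
as_utf8 = raw.decode("utf-8", errors="replace")
print(ascii(as_utf8))  # 'Benjam\ufffdn'

# Without the meta tag the browser may fall back to a legacy encoding such
# as windows-1252, which happens to match these bytes and shows the text.
as_cp1252 = raw.decode("cp1252")
print(as_cp1252)  # Benjamín
```

This is why removing the charset only appeared to fix things: the file was never UTF-8 in the first place.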


Solution

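  • Pass the encoding parameter when you open the file. Without it, open() writes the string using the platform's default encoding (on Windows this is often cp1252), so 'í' is stored as a byte that is not valid UTF-8. The browser, following <meta charset="UTF-8">, decodes the file as UTF-8 and shows the replacement character instead. That is also why removing the charset seemed to help: the browser then guesses a legacy encoding that happens to match. Neither yattag nor dominate is at fault, since render() and getvalue() return ordinary strings.

    with open("document2.html", 'w', encoding='utf-8') as file:
        file.write(doc.getvalue())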

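    Pro tip: you can choose what happens when a character cannot be encoded by using the errors parameter. UTF-8 can encode practically any character, so this mostly matters with narrower encodings; keep in mind that errors='ignore' silently drops such characters instead of reporting them:

    with open("document2.html", 'w', encoding='utf-8', errors='ignore') as file:
        file.write(doc.getvalue())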
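Putting the fix together, here is a minimal, standard-library-only sketch. The hypothetical html_doc string stands in for the output of _html.render() or doc.getvalue(). The sketch writes it with an explicit UTF-8 encoding, reads the raw bytes back to check that 'í' was stored as the UTF-8 pair C3 AD, and, for comparison, shows what two errors handlers do when the target encoding (ASCII here, chosen only for illustration) cannot represent 'í'.

```python
import os
import tempfile

# Hypothetical stand-in for _html.render() or doc.getvalue(): any str works.
html_doc = (
    '<html lang="es"><head><meta charset="UTF-8"></head>'
    "<body><p>Benjamín</p></body></html>"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "document.html")

    # The fix: pass an encoding that matches the <meta charset> declaration.
    with open(path, "w", encoding="utf-8") as file:
        file.write(html_doc)

    # Read the raw bytes back: 'í' should be the two-byte sequence C3 AD.
    with open(path, "rb") as file:
        data = file.read()

print(b"Benjam\xc3\xadn" in data)  # True

# errors= only matters when a character cannot be encoded, so ASCII is used
# here instead of UTF-8 to make the difference visible.
dropped = "Benjamín".encode("ascii", errors="ignore")  # silently loses 'í'
escaped = "Benjamín".encode("ascii", errors="xmlcharrefreplace")  # HTML entity
print(dropped)  # b'Benjamn'
print(escaped)  # b'Benjam&#237;n'
```

Of the two handlers, 'xmlcharrefreplace' is arguably the safer one for HTML output, because the browser renders &#237; as 'í' whatever encoding it assumes, while 'ignore' loses data without warning.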