I have a translatable string in one of my Jinja2 templates:
Project can’t end sooner than it starts
(Note the UTF-8 apostrophe in “can’t”.)
When I extract messages and update my translation files, both the template (.pot) and translation (.po) files have the following msgid:
msgid "Project canât end sooner than it starts"
It seems Babel “translated” the UTF-8 characters as if they were in some kind of 8-bit character set.
My babel.cfg is a really short one:
[python: **.py]
[jinja2: **/templates/**.html]
extensions=jinja2.ext.autoescape,jinja2.ext.with_,webassets.ext.jinja2.AssetsExtension
Is there a way to make Babel notice that the template is already in UTF-8, and not to translate it from whatever charset it assumes? I can’t see any related option in the help output of pybabel extract --help.
I use Python 3 exclusively, for the record.
Turns out it is supported out of the box; it just seems to be undocumented. All I had to do was change the configuration:
[python: **.py]
[jinja2: **/templates/**.html]
encoding=utf-8
extensions=jinja2.ext.autoescape,jinja2.ext.with_,webassets.ext.jinja2.AssetsExtension
The encoding=utf-8 line did the trick: all HTML files are now read as UTF-8.
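For illustration, the garbling from the question can be reproduced with nothing but the standard library. This is only a sketch of the general mechanism: it assumes the UTF-8 bytes were decoded as Latin-1, which matches the msgid shown above, but I have not checked which charset the extractor actually fell back to. The variable names are mine.

```python
# Illustration only: what happens when UTF-8 bytes are read with a
# single-byte charset. Latin-1 is an assumption, not confirmed Babel behaviour.
original = "Project can’t end sooner than it starts"

# The template on disk: the apostrophe (U+2019) becomes three bytes, E2 80 99.
raw = original.encode("utf-8")

# Wrong charset: every byte turns into its own character, so the apostrophe
# becomes "â" followed by two invisible control characters.
garbled = raw.decode("latin-1")

# Right charset: the three bytes are read back as one apostrophe.
restored = raw.decode("utf-8")

print(repr(garbled))
print(restored == original)
```

The two control characters are usually not displayed, which would explain why only the “â” is visible in the extracted msgid.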