python json encoding pydantic cyrillic

Incorrect encoding of Cyrillic symbols in Pydantic json method (Python)


A simple example of the problem is given below:

from pydantic import BaseModel

class City(BaseModel):
    name: str

city = City(name="Город")
print(city)  # name='Город'
print(city.json())  # {"name": "\u0413\u043e\u0440\u043e\u0434"}

My system info:

The problem persists with every chcp setting (console encoding) I tried: 866, 1251, and 65001. If I write the json() output to a txt file, the result is the same: \u0413\u043e\u0440\u043e\u0434. I would really appreciate help fixing the root problem. I want this code to output plain JSON with readable Cyrillic characters.

I've tried:


Solution

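  • By default, Python's json module keeps all JSON output within ASCII. ASCII contains no Cyrillic characters, so each one is written as a \uXXXX escape. The escaped output is still valid JSON and decodes back to the original text, so this is a formatting choice rather than an encoding error.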

    You can turn off this setting with ensure_ascii=False:

    print(city.json(ensure_ascii=False))
    

    Output:

    {"name": "Город"}
    

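    Note that the output now contains non-ASCII characters, so whatever reads it must decode it with the same encoding that was used to write it. A reader that assumes a different code page may fail or show garbled text.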

    If you want to output this string in code page 866 instead of UTF-8, you may need to encode it from Python's str type into bytes:

    city.json(ensure_ascii=False).encode('cp866')
    

    Note that cp866 stands for Code Page 866.
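
As a minimal sketch of what ensure_ascii changes, the following uses only the standard json module (no Pydantic), on the assumption that json() passes the argument through to json.dumps. The file name city.json and the temp-directory location are made up for illustration; the point is that the encoding is stated explicitly when writing and reading, so the result does not depend on chcp or the system locale.

```python
import json
import tempfile
from pathlib import Path

data = {"name": "Город"}

# Default behaviour: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(data)

# ensure_ascii=False keeps the Cyrillic characters as they are.
readable = json.dumps(data, ensure_ascii=False)

# Both strings are valid JSON and decode to the same object.
assert json.loads(escaped) == json.loads(readable) == data

# Write the readable form with an explicit encoding instead of relying
# on the platform default (hypothetical file name, for illustration only).
out_path = Path(tempfile.gettempdir()) / "city.json"
out_path.write_text(readable, encoding="utf-8")

# Read it back with the same encoding.
restored = json.loads(out_path.read_text(encoding="utf-8"))
print(restored["name"])
```

If the file really must be in code page 866, the same pattern should work with encoding="cp866" in both the write and the read call, as long as every character in the data exists in that code page.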