mojibake

What causes the ГѓВ pattern in this Mojibake?


Google ГѓВ (UTF-8: D0 93 D1 93 D0 92) and you'll see a few examples of what seems to be Mojibake. A specific example is ö becoming ГѓВ¶.

What kind of encodings did the original ö go through to become ГѓВ¶? How would I figure this out?


Solution

  • When searching for ГѓВ¶, you can hit a website about home decor with a post whose title is garbled in the same ГѓВ style. Throwing that title into an online mojibake decoder/fixer gives us the string äèçàéí èíòåðüåðîâ, which at first still looks like garbage, but the mojibake decoder also gives us a list of steps:

    [Image: Mojibake Decoder's list of steps]

    We can follow these steps backwards with ö to see if we get the original ГѓВ¶:

    1. Encode "ö" into UTF-8: C3 B6
    2. Decode C3 B6 as Latin-1: "Ã¶"
    3. Encode "Ã¶" into UTF-8: C3 83 C2 B6
    4. Decode C3 83 C2 B6 as Windows-1251: "ГѓВ¶"
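The same four steps can be sketched in Python. This is a minimal illustration, not the decoder's own code: the function name `garble` is made up for this example, and `latin-1` and `cp1251` are Python's codec names for Latin-1 and Windows-1251.

```python
def garble(text: str) -> str:
    """Apply the four steps above (hypothetical helper, for illustration)."""
    utf8_once = text.encode("utf-8")          # step 1: "ö" -> C3 B6
    as_latin1 = utf8_once.decode("latin-1")   # step 2: C3 B6 -> "Ã¶"
    utf8_twice = as_latin1.encode("utf-8")    # step 3: "Ã¶" -> C3 83 C2 B6
    return utf8_twice.decode("cp1251")        # step 4: C3 83 C2 B6 -> "ГѓВ¶"


print(garble("ö"))  # ГѓВ¶
```

Printing `"ö".encode("utf-8").hex()` and the intermediate values is a quick way to compare each stage against the byte values listed in the steps.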

    So, the ГѓВ pattern (bytes C3 83 C2, read as Windows-1251) is caused specifically by characters in the C3 80-C3 BF range of UTF-8, that is, Unicode code points U+00C0-U+00FF.
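One way to check that range claim is to run every code point from U+00C0 to U+00FF through the same pipeline and look at the prefix. The sketch below assumes Python's `cp1251` codec as the Windows-1251 implementation; that codec leaves byte 0x98 undefined, so `errors="replace"` is used to keep the loop from failing on Ø (U+00D8). The names `garble` and `PREFIX` are illustrative only.

```python
PREFIX = "\u0413\u0453\u0412"  # the three characters "ГѓВ"


def garble(text: str) -> str:
    """UTF-8 -> Latin-1 -> UTF-8 -> Windows-1251, as in the steps above."""
    double_encoded = text.encode("utf-8").decode("latin-1").encode("utf-8")
    # cp1251 has no character for byte 0x98, so substitute U+FFFD there.
    return double_encoded.decode("cp1251", errors="replace")


# Every character in U+00C0..U+00FF should come out starting with "ГѓВ".
all_prefixed = all(
    garble(chr(cp)).startswith(PREFIX) for cp in range(0xC0, 0x100)
)
print(all_prefixed)  # True

# Characters outside that range get a different prefix, or none at all.
for sample in ("\u00bf", "\u0101", "z"):
    print(repr(sample), "->", repr(garble(sample)))
```

ASCII passes through unchanged because all three encodings agree on it, which is why only the accented letters in a text turn into ГѓВ sequences.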

    Here is a CyberChef for the forwards conversion and another for the backwards conversion.
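For reference, the backwards conversion can also be sketched in Python by undoing each step in reverse order. This assumes the text went through exactly the path described above; `ungarble` is a hypothetical name, and the function raises a `UnicodeError` if the input does not fit that path.

```python
def ungarble(mojibake: str) -> str:
    """Reverse the four steps: Windows-1251 -> UTF-8 -> Latin-1 -> UTF-8."""
    raw = mojibake.encode("cp1251")        # "ГѓВ¶" -> C3 83 C2 B6
    once_fixed = raw.decode("utf-8")       # C3 83 C2 B6 -> "Ã¶"
    original_bytes = once_fixed.encode("latin-1")  # "Ã¶" -> C3 B6
    return original_bytes.decode("utf-8")  # C3 B6 -> "ö"


print(ungarble("ГѓВ¶"))  # ö
```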