character-encoding elixir shift-jis

How can I encode/decode Shift-JIS in Elixir?


Given text in Shift-JIS encoding, how can I decode it into Elixir's native UTF-8 encoding, and vice versa?


Solution

  • The Codepagex library supports this. You just need to figure out what it calls SHIFT_JIS.

    Codepagex uses the mappings available from unicode.org. There is one for Shift-JIS, but it is marked as OBSOLETE, so Codepagex does not include it. However, Microsoft's CP932 mapping is available. CP932 is Microsoft's variant of Shift-JIS with vendor extensions, so it is effectively SHIFT_JIS and you can use that.
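
    One way to look up the exact name is to search Codepagex's encoding list from iex. This is a minimal sketch, assuming Codepagex.encoding_list/1 accepts :all to list every known mapping, including ones that are not yet enabled in config; the "932" filter is just an illustration:

    ```elixir
    # List every encoding name Codepagex knows about (assumption: :all includes
    # mappings that have not been enabled in config) and keep those mentioning "932".
    matches =
      Codepagex.encoding_list(:all)
      |> Enum.filter(&String.contains?(&1, "932"))
    ```

    If the CP932 mapping shows up in the result, that string is the name to put in the config and to pass as the encoding argument.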

    Config

    It's not enabled by default, so you need to enable it in your config (and recompile with mix deps.compile codepagex --force if necessary):

    config :codepagex, :encodings, [
      "VENDORS/MICSFT/WINDOWS/CP932"
    ]
    

    Encode/Decode

    iex(1)> shift_jis = "VENDORS/MICSFT/WINDOWS/CP932"
    "VENDORS/MICSFT/WINDOWS/CP932"
    iex(2)> test = Codepagex.from_string!("テスト", shift_jis)
    <<131, 101, 131, 88, 131, 103>>
    iex(3)> Codepagex.to_string!(test, shift_jis)
    "テスト"
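
    The bang functions above raise on failure. If you would rather handle bad input yourself, the non-bang Codepagex.from_string/2 and Codepagex.to_string/2 return tagged tuples instead. The sketch below wraps them in a small helper; the ShiftJis module and its encode/decode function names are hypothetical, not part of Codepagex, and it assumes CP932 has been enabled in config as described above:

    ```elixir
    defmodule ShiftJis do
      # Hypothetical helper module; not part of Codepagex.
      @encoding "VENDORS/MICSFT/WINDOWS/CP932"

      # UTF-8 string -> Shift-JIS (CP932) bytes.
      # Returns {:ok, binary} or {:error, reason}.
      def encode(string) when is_binary(string) do
        Codepagex.from_string(string, @encoding)
      end

      # Shift-JIS (CP932) bytes -> UTF-8 string.
      # Returns {:ok, string} or {:error, reason}.
      def decode(bytes) when is_binary(bytes) do
        Codepagex.to_string(bytes, @encoding)
      end
    end

    # Round trip: encode to CP932, then decode back to UTF-8.
    result =
      case ShiftJis.encode("テスト") do
        {:ok, bytes} -> ShiftJis.decode(bytes)
        {:error, reason} -> {:error, reason}
      end
    ```

    Keeping the encoding name in one module attribute means the long "VENDORS/..." string only has to be spelled correctly once.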
    

    Example repo

    I made an example repo where you can see it in action.