pythonunicodenormalizationunicode-normalization

How can I convert all Japanese hiragana to katakana characters in Python?


From hiragana and katakana charts, it looks like it should be possible to "normalize" japanese text into hiragana or katakana. It's pretty straight-forward to build a table and implement a dictionary/regex table for search/replace. Does anyone know where the work's already been done?


Solution

  • Why would you want to do this though? Katakana is traditionally used for words borrowed from other languages, while hiragana is used for the Japanese native language. By normalizing the japanese text to one form or another you could actually be hindering the reading of it (at least to me it would be harder since I am loosing context by having it normalized).

    But in answer to your question, this seems to do what your asking: JCONV