
Converting Unicode (UTF-8) to Shift_JIS


I have a Japanese client and have generated a large flat file (1.2 million rows) of data to send to them.

The file is UTF-8 encoded, which can store and display all the Japanese characters. The client, however, wants to receive it in Shift_JIS, an encoding designed specifically for Japanese text.

  1. From the Wikipedia page I can work out the conversion logic.
  2. Online converters such as Motobit's let you convert between encodings.

My problem is that the file is quite large, and I will have to repeat this for several hundred more files. Pasting the data into an online converter's text box won't scale to that size and isn't fast enough.

Does anyone know of a free desktop application, or perhaps a Ruby library, that I could use to convert encodings? Any other suggestions?

Thanks!


Solution

  • What you want is probably nkf, the Network Kanji Filter.

    You can convert a file from UTF-8 to Shift_JIS like this:

    % nkf -s file-utf8.txt > file-sjis.txt
    

    Manual page:
    http://linuxcommand.org/man_pages/nkf1.html

    Wikipedia:
    http://en.wikipedia.org/wiki/Network_Kanji_Filter

    You can install nkf with your package manager (yum on Red Hat/Fedora, MacPorts, or Homebrew, respectively):

    % sudo yum install nkf 
    % sudo port install nkf
    % brew install nkf   
    

    Hope this helps.