csv, utf-8, mfc, windows-ce, shift-jis

Write a CSV file in Shift-JIS (MFC VC++, Windows Embedded - WinCE)


As the title says, I have been trying to write data that the user enters into a CEdit control to a file.

My test application runs on a handheld terminal running Windows CE. I enter test data (Japanese text in Romaji, Hiragana, Katakana and Kanji, mixed with ordinary English alphanumeric data), which is first displayed in a CListCtrl. The characters display properly in the application's UI on the handheld screen. I then read the data back from the list control and write it to a CSV text file. The data I read back from the control is correct, but once it is written to the CSV, the file is unreadable and shows strange symbols and garbage characters.

I searched for this and found a similar question on Stack Overflow: UTF-8, CString and CFile? (C++, MFC)

I tried some of their suggestions and finally ended up with a proper UTF-8 CSV file.

The write-to-csv-file code goes like this:

CStdioFile cCsvFile;
// typeBinary so the explicit "\r\n" below is written as-is
// (text mode, the CStdioFile default, translates line endings)
if (cCsvFile.Open(cFileName, CFile::modeCreate | CFile::modeWrite | CFile::typeBinary))
{
    const unsigned char BOM[3] = { 0xEF, 0xBB, 0xBF };  // UTF-8 BOM
    cCsvFile.Write(BOM, 3);                              // Write the BOM first

    for (int i = 0; i < M_cDataList.GetItemCount(); i++)
    {
        // Build one row: three quoted fields separated by commas
        CString cDataStr = _T("\"") + M_cDataList.GetItemText(i, 0) + _T("\",");
        cDataStr += _T("\"") + M_cDataList.GetItemText(i, 1) + _T("\",");
        cDataStr += _T("\"") + M_cDataList.GetItemText(i, 2) + _T("\"\r\n");
        CT2CA outputString(cDataStr, CP_UTF8);           // Convert the row to UTF-8 bytes
        cCsvFile.Write(outputString, (UINT)::strlen(outputString));
    }
    cCsvFile.Close();
}
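
One thing the row-building above does not cover is a field that itself contains a double quote: by the usual CSV convention the quote has to be doubled, otherwise the row is split in the wrong place when it is read back. The following is only a minimal sketch in standard C++, independent of the output encoding. `QuoteCsvField` and `MakeCsvRow` are hypothetical helper names, not MFC functions, and `std::wstring` stands in for `CString` purely for illustration.

```cpp
#include <string>

// Hypothetical helper (not part of MFC): wrap one CSV field in double quotes,
// doubling any quote characters the field already contains.
std::wstring QuoteCsvField(const std::wstring& field)
{
    std::wstring out(1, L'"');
    for (std::wstring::size_type i = 0; i < field.size(); ++i)
    {
        if (field[i] == L'"')
            out += L'"';          // an embedded quote is written as two quotes
        out += field[i];
    }
    out += L'"';
    return out;
}

// Hypothetical helper: build one CRLF-terminated row from three raw fields.
std::wstring MakeCsvRow(const std::wstring& a,
                        const std::wstring& b,
                        const std::wstring& c)
{
    return QuoteCsvField(a) + L"," + QuoteCsvField(b) + L"," +
           QuoteCsvField(c) + L"\r\n";
}
```

The resulting row would then be converted to the target code page and written exactly as each `cDataStr` is in the loop.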

This works for UTF-8. For my use case, however, I need the CSV file to be encoded as Shift-JIS instead. Which BOM should I use for Shift-JIS, and what changes should I make to the file-writing code above?

Thank you for any suggestions and help.


Solution

  • The code page for Shift-JIS on Windows is apparently 932. Use WideCharToMultiByte and MultiByteToWideChar for the conversion (the ATL conversion classes CW2A and CA2W wrap them). For example:

    CStringW source = L"日本語ABC平仮名ABCひらがなABC片仮名ABCカタカナABC漢字ABC①";
    CStringA destination = CW2A(source, 932);
    CStringW convertBack = CA2W(destination, 932);
    
    //Testing:
    ASSERT(source == convertBack);
    AfxMessageBox(convertBack);
    

    As far as I can tell there is no BOM for Shift-JIS, so in your code you would drop the BOM write and pass 932 instead of CP_UTF8 to CT2CA. Perhaps you just want to work with UTF-16 instead. For example:

    CStdioFile file;
    file.Open(L"utf16.txt", CFile::modeCreate | CFile::modeWrite| CFile::typeUnicode);
    
    BYTE bom[2] = { 0xFF, 0xFE };  
    file.Write(bom, 2);
    CString str = L"日本語";
    file.WriteString(str);
    file.Close();
    

    P.S. According to this page, there are some differences between code page 932 and Shift-JIS, although I couldn't reproduce any errors.
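
To make the P.S. a little more concrete: as far as I know, one such difference is that code page 932 adds NEC and IBM vendor extension characters that are not in the JIS X 0208 set plain Shift-JIS is built on. The ① at the end of the test string above is one of them (bytes 0x87 0x40 in code page 932). The sketch below, in standard C++, scans an already-converted byte string for such characters. `UsesCp932Extensions` is a hypothetical helper name, and the lead-byte ranges it checks (0x87, 0xED-0xEE, 0xFA-0xFC) are my understanding of the extension rows, so treat them as an assumption to verify against the code page table.

```cpp
#include <cstddef>
#include <string>

// True if byte b starts a two-byte character in Shift-JIS / code page 932.
static bool IsSjisLeadByte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Hypothetical helper: returns true if the byte string contains a two-byte
// character whose lead byte falls in the assumed vendor extension rows
// (0x87, 0xED-0xEE, 0xFA-0xFC).
bool UsesCp932Extensions(const std::string& bytes)
{
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (!IsSjisLeadByte(b))
            continue;             // ASCII or half-width katakana: single byte
        if (b == 0x87 || b == 0xED || b == 0xEE || b >= 0xFA)
            return true;          // lead byte of an extension row
        ++i;                      // skip the trail byte of an ordinary character
    }
    return false;
}
```

The scan walks lead and trail bytes rather than searching for raw byte values, because a trail byte can coincide with one of the extension lead bytes (日 is 0x93 0xFA, for example). If the file has to be read by software that expects strict Shift-JIS, a check like this could be run on each converted row before writing it.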