Tags: vb.net, encoding, utf-8, character-encoding, codepage-437

Converting a char from CP437 to UTF-8 always yields the same character code, and therefore not the same character


Problem

I'm trying to convert a character and/or byte array from the CP437 encoding to UTF-8 (Encoding.UTF8). No matter what I try, the conversion always yields the same character code, but since the two encodings map different characters to those codes, the resulting character is not the same.

As an example, I'm trying to convert the character with char code 3 from CP437 (a heart: ♥) to UTF-8, and I want it to remain the same character. However, the converted result still uses char code 3, which is the control character ETX (see UTF-8's codepage layout for a list of characters).


My attempts

Here are some of my attempts:

(General code)

Public Shared ReadOnly CP437 As Encoding = Encoding.GetEncoding("IBM437")
Public Shared ReadOnly BytesToConvert As Byte() = New Byte() {3, 4, 5} 'Characters: ♥, ♦, ♣.

Public Sub DebugEncodedArray(ByVal Bytes As Byte(), ByVal Encoding As Encoding)
    Dim ResultingString As String = Encoding.GetString(Bytes)
    MessageBox.Show( _
            String.Format("Encoding: {1}{0}" & _
                          "String: ""{2}""{0}" & _
                          "Bytes: {{{3}}}{0}", _
                          Environment.NewLine, _
                          Encoding.EncodingName, _
                          ResultingString, _
                          String.Join(", ", Bytes)), _
        "Debug", MessageBoxButtons.OK, MessageBoxIcon.Information _
    )
End Sub

Using Encoding.Convert():

Dim ConvertedBytes As Byte() = Encoding.Convert(CP437, Encoding.UTF8, BytesToConvert)
DebugEncodedArray(ConvertedBytes, Encoding.UTF8)


Using a StreamWriter, writing to a MemoryStream with a specific encoding:

Using MStream As New MemoryStream(16)
    Using Writer As New StreamWriter(MStream, CP437)
        Writer.Write(CP437.GetChars(BytesToConvert))
    End Using

    Dim UTF8Bytes As Byte() = Encoding.Convert(CP437, Encoding.UTF8, MStream.ToArray())
    DebugEncodedArray(UTF8Bytes, Encoding.UTF8)
End Using


Writing to a file, then reading it back and converting the bytes (not optimal for what I need this code for):

File.WriteAllText("C:\Users\Vincent\Desktop\test.txt", CP437.GetString(BytesToConvert), CP437)

Dim FileBytes As Byte() = File.ReadAllBytes("C:\Users\Vincent\Desktop\test.txt")
Dim UTF8Bytes As Byte() = Encoding.Convert(CP437, Encoding.UTF8, FileBytes)

DebugEncodedArray(UTF8Bytes, Encoding.UTF8)


Results

All the above attempts give the same result:

[image: UTF-8 result]

and also if I pass CP437 to DebugEncodedArray() instead of Encoding.UTF8:

[image: CP437 result]


Expected result

The result I am expecting is:

Dim UTF8Bytes As Byte() = Encoding.UTF8.GetBytes("♥♦♣")
DebugEncodedArray(UTF8Bytes, Encoding.UTF8)

[image: Expected UTF-8 result]

Any clues on what I'm doing wrong?


Solution

  • The low range of CP437 is context-dependent. I think you have shown that for bytes 1-31 and 127 you will need a simple lookup table, because .NET interprets them as control codes rather than as graphical characters. For example, 0x0A (◙ in CP437) is decoded as \n, not as the Unicode code point for that graphic.
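
A minimal sketch of such a lookup, assuming the .NET Framework (where "IBM437" is available without registering an extra encoding provider) and a source file saved in a Unicode encoding so the glyph literals survive. The names Cp437Graphics, ToUnicodeString and ToUtf8Bytes are made up for illustration; they are not part of .NET. Bytes 1-31 and 127 are replaced with the glyphs CP437 displays for them, and everything else is left to the built-in decoder:

```vbnet
Imports System.Text

Public Module Cp437Graphics
    ' Glyphs that CP437 displays for bytes 1..31 (index 0 corresponds to byte 1).
    Private Const LowGlyphs As String = "☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"
    ' Glyph that CP437 displays for byte 127.
    Private Const HouseGlyph As Char = "⌂"c

    ' Decodes CP437 bytes, treating 1..31 and 127 as graphics instead of control codes.
    Public Function ToUnicodeString(ByVal Bytes As Byte()) As String
        Dim CP437 As Encoding = Encoding.GetEncoding("IBM437")
        ' CP437 is a single-byte encoding, so Chars(i) corresponds to Bytes(i).
        Dim Chars As Char() = CP437.GetChars(Bytes)

        For i As Integer = 0 To Bytes.Length - 1
            Dim Code As Integer = Bytes(i)
            If Code >= 1 AndAlso Code <= 31 Then
                Chars(i) = LowGlyphs(Code - 1)
            ElseIf Code = 127 Then
                Chars(i) = HouseGlyph
            End If
        Next

        Return New String(Chars)
    End Function

    ' Convenience wrapper: CP437 bytes in, UTF-8 bytes out.
    Public Function ToUtf8Bytes(ByVal Bytes As Byte()) As Byte()
        Return Encoding.UTF8.GetBytes(ToUnicodeString(Bytes))
    End Function
End Module
```

With the bytes from the question, Cp437Graphics.ToUnicodeString(New Byte() {3, 4, 5}) should give "♥♦♣", and ToUtf8Bytes should give the same nine bytes as Encoding.UTF8.GetBytes("♥♦♣"). Note that this sketch always picks the graphical interpretation, so real line breaks (bytes 10 and 13) would also become ◙ and ♪; if the input mixes text and graphics, those bytes would need to be excluded from the lookup.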