Given this piece of code:
Private Declare Auto Function GetPrivateProfileSection Lib "kernel32" _
(ByVal lpAppName As String, _
ByVal lpszReturnBuffer As Byte(), _
ByVal nSize As Integer, ByVal lpFileName As String) As Integer
Public Class IniClassReader
Public Function readWholeSection(iniFile As String, section As String) As String()
Dim buffer As Byte() = New Byte(SECTIONLENGTH) {}
GetPrivateProfileSection(section, buffer, SECTIONLENGTH, iniFile)
Dim sectionContent As String = Encoding.Default.GetString(buffer)
' Skipped code embedded in the function below, not the point of the question
Return processSectionContent(sectionContent)
End Function
End Class
I figured out that buffer contains a sequence of bytes interspersed with NULL characters (\0). Hence, the sectionContent value is shown by the debugger's variable watch as 'e n t r i e 1 = v a l u e 1 e n t r i e 2 = v a l u e 2'. Each key/value pair is followed by two NULL characters instead of the expected single one.
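Dumping the raw bytes right after the call (this line is only for inspection, not part of the method) shows the pattern, something like:
Console.WriteLine(BitConverter.ToString(buffer, 0, 16))
' Prints 65-00-6E-00-74-00-72-00-69-00-65-00-31-00-3D-00 for "entrie1=",
' i.e. every character is followed by a 00 byte.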
I don't see why each character is stored as a two-byte value. Replacing Default with UTF8 gives the same result. I tried with an INI file encoded in UTF-8 and in Windows-1252 (so-called "ANSI" by Microsoft).
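A tiny illustration of why both encodings behave identically (the byte values below are made up to match what I see in the buffer):
' Illustration only: bytes like those in the buffer ("en" plus the interleaved zeros).
Dim sample As Byte() = {&H65, &H0, &H6E, &H0}
Console.WriteLine(System.Text.Encoding.Default.GetString(sample).Length)  ' 4 characters, two of them NUL
Console.WriteLine(System.Text.Encoding.UTF8.GetString(sample).Length)     ' 4 characters as well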
I know how to get rid of those extra bytes:
Dim sectionContent As String = Encoding.Default.GetString(buffer)
sectionContent = sectionContent.Replace(Chr(0) & Chr(0), vbNewLine).Replace(Chr(0), "")
But I want to understand what is going on here so I can apply the proper solution instead of a sloppy hack that only works in some cases.
The bytes are UTF-16 encoded text. It looks like null character padding because all of your text consists of characters whose encodings fit in the low byte.
The Windows API exposes both an "A" and a "W" version of the function: the "A" version works with narrow (ANSI) strings, the "W" version with wide strings. Because your declaration uses Declare Auto, the runtime binds to the wide entry point (GetPrivateProfileSectionW) on the Windows NT family (thus all Windows versions since XP), since UCS-2/UTF-16 is the "native" Windows character encoding. The function therefore fills your byte buffer with two bytes per character.
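So the clean fix is to decode the buffer with the encoding that matches the entry point you actually called, instead of stripping bytes afterwards. Here is a minimal sketch, assuming the Auto (wide) declaration from your question; the class name, the SECTIONLENGTH value and the splitting step are illustrative, not a drop-in for your code:
Imports System.Text

Public Class IniSectionSketch
    ' Same declaration as in the question: Auto binds to GetPrivateProfileSectionW
    ' on NT-based Windows, so the buffer is filled with UTF-16 text.
    Private Declare Auto Function GetPrivateProfileSection Lib "kernel32" _
        (ByVal lpAppName As String, ByVal lpszReturnBuffer As Byte(), _
         ByVal nSize As Integer, ByVal lpFileName As String) As Integer

    Private Const SECTIONLENGTH As Integer = 32768  ' assumed buffer size in bytes

    Public Function ReadWholeSection(ByVal iniFile As String, ByVal section As String) As String()
        Dim buffer(SECTIONLENGTH - 1) As Byte
        ' For the W entry point, nSize and the return value are counted in characters, not bytes.
        Dim chars As Integer = GetPrivateProfileSection(section, buffer, buffer.Length \ 2, iniFile)
        ' Decode exactly what the API wrote, as UTF-16 (two bytes per character).
        Dim sectionContent As String = Encoding.Unicode.GetString(buffer, 0, chars * 2)
        ' Entries are separated by single NUL characters; the empty trailing entry is dropped.
        Return sectionContent.Split(New Char() {ControlChars.NullChar}, _
                                    StringSplitOptions.RemoveEmptyEntries)
    End Function
End Class
If you would rather keep Encoding.Default, the other option is to force the narrow entry point with Declare Ansi instead of Declare Auto; then the buffer really does hold one byte per character in the system code page, and the entries are separated by single zero bytes.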