I would like to get the UTF-8 Code of a character, have attempted to use streams but it doesn't seem to work:
Example: פ should give 16#D7A4, according to https://en.wikipedia.org/wiki/Pe_(Semitic_letter)#Character_encodings
Const adTypeBinary = 1
Dim adoStr, bytesthroughado
Set adoStr = CreateObject("Adodb.Stream")
adoStr.Charset = "utf-8"
adoStr.Open
adoStr.WriteText labelString
adoStr.Position = 0
adoStr.Type = adTypeBinary
adoStr.Position = 3
bytesthroughado = adoStr.Read
Msgbox(LenB(bytesthroughado)) 'gives 2
adoStr.Close
Set adoStr = Nothing
MsgBox(bytesthroughado) ' gives K
Note: AscW gives Unicode - not UTF-8
The bytesthroughado
is a value of byte()
subtype (see 1st output line) so you need to handle it in an appropriate way:
Option Explicit
Dim ss, xx, ii, jj, char, labelString
labelString = "ařЖפ€"
ss = ""
For ii=1 To Len( labelString)
char = Mid( labelString, ii, 1)
xx = BytesThroughAdo( char)
If ss = "" Then ss = VarType(xx) & " " & TypeName( xx) & vbNewLine
ss = ss & char & vbTab
For jj=1 To LenB( xx)
ss = ss & Hex( AscB( MidB( xx, jj, 1))) & " "
Next
ss = ss & vbNewLine
Next
Wscript.Echo ss
Function BytesThroughAdo( labelChar)
Const adTypeBinary = 1 'Indicates binary data.
Const adTypeText = 2 'Default. Indicates text data.
Dim adoStream
Set adoStream = CreateObject( "Adodb.Stream")
adoStream.Charset = "utf-8"
adoStream.Open
adoStream.WriteText labelChar
adoStream.Position = 0
adoStream.Type = adTypeBinary
adoStream.Position = 3
BytesThroughAdo = adoStream.Read
adoStream.Close
Set adoStream = Nothing
End Function
Output:
cscript D:\bat\SO\61368074q.vbs
8209 Byte() a 61 ř C5 99 Ж D0 96 פ D7 A4 € E2 82 AC
I used characters ařЖפ€
to demonstrate the functionality of your UTF-8 encoder (the alts8.ps1
PowerShell script comes from another project):
alts8.ps1 "ařЖפ€"
Ch Unicode Dec CP IME UTF-8 ? IME 0405/cs-CZ; CP852; ANSI 1250 a U+0061 97 …97… 0x61 a Latin Small Letter A ř U+0159 345 …89… 0xC599 Å� Latin Small Letter R With Caron Ж U+0416 1046 …22… 0xD096 Ð� Cyrillic Capital Letter Zhe פ U+05E4 1508 …228… 0xD7A4 פ Hebrew Letter Pe € U+20AC 8364 …172… 0xE282AC â�¬ Euro Sign