Convert weird characters from lsusb output to ST-Link hla_serial in Ruby

I currently use ST-Link boards that have a 'weird' iSerial as reported by lsusb. An example:

lsusb -v -d 0483:3748 | grep iSerial
iSerial                 3 4ÿkVK607 C
iSerial                 3 4ÿkVK60'7 C

I already have the corresponding hla_serial that is used in the openocd config files. It looks like "\x34\x3f\x6b\x06\x56\x4b\x36\x30\x27\x37\x20\x43" or, without the \x prefixes, "343f6b06564b363027372043".

I want to be able to convert between those two formats, but it's not so easy. I know that the following list of hla_serial values (ref), given to me by an external team, corresponds to the following weird strings from lsusb (dec); at least openocd works with them.

ref = ["343f6b06564b363027372043","343f6b06564b363011372043","343f6906564b363043372043","343f6c06564b363029372043","343f6b06564b363014362043","343f6d06564b363016372043","343f7106564b363016362043"]
dec = ["4ÿqVK606 C","4ÿmVK607 C","4ÿkVK606 C" ,"4ÿlVK60)7 C" ,"4ÿiVK60C7 C" ,"4ÿkVK607 C"  ,"4ÿkVK60'7 C"]

I've tried lots of pack/unpack combinations, but I was not able to find the correct encoding/decoding. The closest thing was:

> "4ÿkVK60'7 C".unpack('H*')
=> ["34c3bf6b564b363027372043"]

So when trying to decode all iSerial:

> ref = "343f6b06564b363027372043"
=> "343f6b06564b363027372043"
> ["4ÿqVK606 C","4ÿmVK607 C","4ÿkVK606 C" ,"4ÿlVK60)7 C" ,"4ÿiVK60C7 C" ,"4ÿkVK607 C"  ,"4ÿkVK60'7 C"].each {|t| $stdout << "ref:#{ref}\ndec:#{t.unpack('H*').first}\n\n"}
ref:343f6b06564b363027372043
dec:34c3bf71564b3630362043

ref:343f6b06564b363027372043
dec:34c3bf6d564b3630372043

ref:343f6b06564b363027372043
dec:34c3bf6b564b3630362043

ref:343f6b06564b363027372043
dec:34c3bf6c564b363029372043

ref:343f6b06564b363027372043
dec:34c3bf69564b363043372043

ref:343f6b06564b363027372043
dec:34c3bf6b564b3630372043

ref:343f6b06564b363027372043
dec:34c3bf6b564b363027372043

The last one on this list may be the closest, but it does not match the ref. Note also that some of the decoded strings do not have the correct length.

I also tried capturing the output of the shell command directly in Ruby and post-processing it:

> require 'open3'
=> true
> o,e,s = Open3.capture3('lsusb -v -d 0483:3748 | grep iSerial')
> dec = o.split("\n").map {|l| l.sub(/.*iSerial.* 3 /,'').unpack('H*').first}
34c3bf7106564b363016362043
34c3bf6d06564b363016372043
34c3bf6b06564b363014362043
34c3bf6c06564b363029372043
34c3bf6906564b363043372043
34c3bf6b06564b363011372043
34c3bf6b06564b363027372043

This is better: at least they all have the same length (26 characters instead of 24 for the ref). What is strange is that all of them are really close to the strings in ref, and if we remove two characters like this:

dec: 34c3bf6b06564b363027372043
ref: 34_3_f6b06564b363027372043

then I get exactly the same list as in the ref table.

Now I just want to know the reason for those 2 extra characters. Should I decode differently?

Solution

What you need is "USB ECN: UNICODE UTF-16LE for String Descriptors" from 2005. It turns out that the original spec just said "unicode string".

chuckle

The ECN specifies that USB string descriptors should officially contain UTF-16LE strings, because that's what most programmers had been doing. So if you do any decoding at all, it should be UTF-16LE, and it should be applied to the data you get from the USB subsystem.
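To make the layout concrete, here is a minimal Ruby sketch of what a string descriptor looks like on the wire: a two-byte header (bLength, then bDescriptorType = 0x03 for STRING) followed by the UTF-16LE code units. The helper name `string_descriptor` and the three-character sample text are made up for illustration.

```ruby
# Hypothetical helper: build the raw bytes of a USB string descriptor.
def string_descriptor(text)
  # Payload: the text as UTF-16LE, viewed as plain bytes.
  payload = text.encode(Encoding::UTF_16LE).b
  # Header: bLength (total size including the header), bDescriptorType (0x03 = STRING).
  [payload.bytesize + 2, 0x03].pack('CC') + payload
end

desc = string_descriptor("4\u00FFk")
puts desc.unpack1('H*')  # 08033400ff006b00
```

Every character below U+0100 ends up as its code point followed by a zero byte, which is why such a serial looks like "byte, 00, byte, 00, ..." in a raw dump.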

You must also realize that what you see in your console isn't UTF-16; it's whatever your console has decided to render. The bytes you get on stdout might be UTF-16, but once you print them and copy-paste them, god only knows what you get. Your text editor is probably going to paste UTF-8, and I don't even want to think about what happens there.
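A quick way to see this from Ruby is to compare the UTF-8 bytes of the pasted text with its code points. This is only a sketch using the sample string from the question (with ÿ written as \u00FF so the snippet does not depend on the editor's encoding); the variable names are illustrative.

```ruby
pasted = "4\u00FFkVK60'7 C"  # the text as it was pasted into the question

# unpack('H*') dumps the UTF-8 bytes: ÿ (U+00FF) takes two bytes, c3 bf.
utf8_hex = pasted.unpack1('H*')

# The code points give one value per character instead.
codepoint_hex = pasted.codepoints.map { |cp| format('%02x', cp) }.join

puts utf8_hex       # 34c3bf6b564b363027372043
puts codepoint_hex  # 34ff6b564b363027372043
puts "#{pasted.bytesize} bytes, #{pasted.length} characters"  # 12 bytes, 11 characters
```

So the "c3bf" in the unpack output is just how UTF-8 spells a single character, not two separate bytes of the serial number.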

Your question is about Ruby, but I'll be using python3 for the examples, since it has the convenient bytes type and better unicode string stuff. The same idea carries over.

It looks like the devs at ST decided to use the unicode code points between 0 and 255 to encode the bytes of their serial number. And what I mean by 'decided' is that they just did it that way and we're stuck with it now ;)

So that hex-encoded serial number you see, 343f6b06564b363027372043, is actually b"\x34\x00\x3F\x00\x6B\x00\x06\x00\x56\x00\x4B\x00\x36\x00\x30\x00\x27\x00\x37\x00\x20\x00\x43\x00" as stored in the USB string descriptor. You can see that it's a valid UTF-16 little-endian encoded string.
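As a sanity check, the byte string above can be decoded in Ruby and turned back into the hex serial. This is a sketch that takes the bytes exactly as quoted above; `raw`, `text` and `hex` are just local names.

```ruby
# The descriptor payload as quoted above, as a binary string.
raw = "\x34\x00\x3F\x00\x6B\x00\x06\x00\x56\x00\x4B\x00" \
      "\x36\x00\x30\x00\x27\x00\x37\x00\x20\x00\x43\x00".b

# Reinterpret the bytes as UTF-16LE (no transcoding, just a new label).
text = raw.force_encoding(Encoding::UTF_16LE)
puts text.valid_encoding?  # true

# One code point per character, each printed as one hex byte.
hex = text.codepoints.map { |cp| format('%02x', cp) }.join
puts hex  # 343f6b06564b363027372043
```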

In python3, when you ask pyusb for the serial number, you get an actual 'unicode string', i.e. the sequence of code points that the UTF-16 USB string represents. All you really need are the code points, though, so here's an example using python3:

import usb.core

devices = usb.core.find(idVendor=0x0483, find_all=True)
for dev in devices:
    # encode each character's code point as a single byte and concatenate them
    serial = b''.join([ord(uchar).to_bytes(1, byteorder='little') for uchar in dev.serial_number])

    # print the serial number as a hex string
    print(serial.hex())

Or you could use pyswd, which does that for us:

import swd

for dev in swd.stlink.usb.StlinkUsb._find_all_devices():
    print(dev.serial_no)
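
For the Ruby side of the question, the same code point idea can be applied to the text that lsusb prints. The sketch below rests on one assumption inferred only from the sample data in the question: wherever lsusb shows ÿ (c3 bf in UTF-8), the ref strings have 3f, which is ASCII '?', so non-ASCII characters appear to have been replaced by '?' by whatever produced the ref list. The helper name `hla_serial` and its keyword argument are hypothetical.

```ruby
# Hypothetical helper: one hex byte per code point of the iSerial text.
# Assumption (from the question's data): non-ASCII characters show up
# as '?' (0x3f) in the hla_serial values.
def hla_serial(iserial, replace_non_ascii: true)
  iserial.codepoints.map do |cp|
    cp = 0x3f if replace_non_ascii && cp > 0x7f
    raise ArgumentError, format('U+%04X does not fit in one byte', cp) if cp > 0xff
    format('%02x', cp)
  end.join
end

# One line of lsusb output, including the 0x06 control character.
line = "  iSerial                 3 4\u00FFk\u0006VK60'7 C"
iserial = line.sub(/\A\s*iSerial\s+\d+ /, '')

puts hla_serial(iserial)                            # 343f6b06564b363027372043
puts hla_serial(iserial, replace_non_ascii: false)  # 34ff6b06564b363027372043
```

The first form matches the ref entry from the question. The second is the plain "code point as byte" value, which is what the python3 loop above computes.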