I would like to create a binary blob from a string, in the same way as when reading a binary blob from a file into a buffer using a .NET file stream. Then I would like to read 2 bytes from a particular offset in the blob.
I create a file like this:
echo "AAAABBBB" > .\zzblob.txt
$bytes = "AAAABBBB`r`n"
$aa = [system.bitconverter]::touint16($bytes, 0)
# FAIL!
# Checking the type:
$bytes.GetType() | select Name, BaseType | ft -HideTableHeaders
# String System.Object
Now, doing the same using a stream buffer, we get something else.
$fp = ".\zzblob.txt"
$bf = (new-object byte[](256))
$sp = New-Object System.IO.FileStream($fp, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read)
$sp.Length
$sp.Read($bf, 0, 256)
$sp.close()
$aa = [system.bitconverter]::touint16($bf, 2) # ..AA
d2h $aa
# 0x4141 ## OK!
# Checking type:
$bf.GetType() | select Name, BaseType | ft -HideTableHeaders
# Byte[] System.Array
How can I convert a string from String System.Object to Byte[] System.Array?
You cannot convert an arbitrary .NET string to bytes without choosing a specific character encoding to apply to it.
The reason is that different character encodings use different byte representations of characters, notably with respect to the number of bytes required to encode a single character, which can even vary from character to character, as is the case with UTF-8.
Whoever must interpret the resulting byte array as a string again must then use the same encoding for decoding.
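To make the encoding dependence concrete, here is a minimal sketch that counts the bytes three sample characters occupy in UTF-8 and in UTF-16LE. The variable names are illustrative, and the characters are built from code points so the result does not depend on how the script file itself is encoded.

```powershell
# Code points for 'A' (U+0041), 'é' (U+00E9) and '€' (U+20AC).
$byteCounts = foreach ($codePoint in 0x41, 0xE9, 0x20AC) {
    $char = [string] [char] $codePoint
    [pscustomobject] @{
        CodePoint  = 'U+{0:X4}' -f $codePoint
        Utf8Bytes  = [Text.Encoding]::UTF8.GetBytes($char).Length     # variable width
        Utf16Bytes = [Text.Encoding]::Unicode.GetBytes($char).Length  # UTF-16LE
    }
}
$byteCounts
```

UTF-8 needs 1, 2 and 3 bytes for these characters respectively, whereas UTF-16LE needs 2 bytes for each of them.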
If all the characters in a given string happen to fall into the 8-bit subrange of Unicode code points, i.e. the 256 characters occupying the code points from 0x0 to 0xFF (255) (in Unicode terms: U+0000 to U+00FF), you can use a shortcut, assuming that you want to use the Unicode code points as byte values:
Use [byte[]] [char[]] $string (or [byte[]] $string.ToCharArray()), as also shown in js2010's answer:
$string = 'AAAABBBB'
# Convert TO a byte array.
$byteArray = [byte[]] [char[]] $string
# OR:
# $byteArray = [byte[]] $string.ToCharArray()
# Convert back FROM a byte array.
[string]::new($byteArray) # [char[]] cast optional
# OR, more PowerShell-idiomatically, but less efficiently:
# -join [char[]] $byteArray
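With the byte array in hand, the original goal of reading 2 bytes at a given offset could look like the following sketch. It reuses the [BitConverter]::ToUInt16() call from the question and formats the result with the -f operator instead of the question's d2h helper, which is not defined here.

```powershell
$string    = 'AAAABBBB'
$byteArray = [byte[]] [char[]] $string

# Read an unsigned 16-bit integer starting at byte offset 2 (bytes 0x41 0x41).
$value = [BitConverter]::ToUInt16($byteArray, 2)
'0x{0:X4}' -f $value   # -> 0x4141

# At offset 3 the two bytes differ (0x41 0x42), so the result depends on the
# machine's byte order; [BitConverter]::IsLittleEndian reports which one applies.
'0x{0:X4}' -f [BitConverter]::ToUInt16($byteArray, 3)
```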
Caveat: Any character outside that range, i.e. one with a code point of U+0100 (256) or above, e.g. € (EURO SIGN, U+20AC), breaks this approach, because its code point is by definition too large to fit into a [byte] instance:
# -> ERROR:
# Cannot convert value "€" to type "System.Byte".
# Error: "Value was either too large or too small for an unsigned byte."
[byte[]] [char[]] '€'
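If the input is not known in advance, one option is to check it before casting. The following is only a sketch: Test-IsLatin1Range is a hypothetical helper name, not a built-in command, and it assumes a regex check against the U+0000 to U+00FF range is acceptable.

```powershell
function Test-IsLatin1Range {
    param([Parameter(Mandatory)] [AllowEmptyString()] [string] $Text)
    # True if no character lies outside U+0000..U+00FF.
    # -cnotmatch is case-sensitive, which avoids case-folding surprises in the range.
    $Text -cnotmatch '[^\u0000-\u00FF]'
}

$euro = [string] [char] 0x20AC   # EURO SIGN, built from its code point

Test-IsLatin1Range 'AAAABBBB'    # -> True: safe to cast to [byte[]]
Test-IsLatin1Range $euro         # -> False: the cast would throw
```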
The cast-based approach is tantamount to choosing the fixed-width, single-byte ISO-8859-1 character encoding for the byte representation, because the 8-bit subrange of Unicode coincides with this encoding.
That is, the equivalents of the cast-based conversions above are (note that in PowerShell (Core) 7 you can more simply use [Text.Encoding]::Latin1 in lieu of [Text.Encoding]::GetEncoding(28591)):
$string = 'AAAABBBB'
# Convert TO a byte array.
$byteArray = [Text.Encoding]::GetEncoding(28591).GetBytes($string)
# Equivalent of:
# $byteArray = [byte[]] [char[]] $string
# Convert back FROM a byte array.
[Text.Encoding]::GetEncoding(28591).GetString($byteArray)
# Equivalent of:
# [string]::new($byteArray)
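As a quick sanity check of that equivalence, the two conversions can be compared side by side. This sketch uses [BitConverter]::ToString() only to render the arrays as hex; the variable names are illustrative.

```powershell
$string      = 'AAAABBBB'
$viaCast     = [byte[]] [char[]] $string
$viaEncoding = [Text.Encoding]::GetEncoding(28591).GetBytes($string)

# [BitConverter]::ToString() renders a byte array as hyphen-separated hex pairs.
[BitConverter]::ToString($viaCast)       # -> 41-41-41-41-42-42-42-42
[BitConverter]::ToString($viaEncoding)   # -> 41-41-41-41-42-42-42-42
```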
As for writing the byte representations to a file:
If you have an in-memory byte representation, it is safest to write to and read from files as bytes rather than via a character encoding:
$string = 'AAAABBBB'
$byteArray = [byte[]] [char[]] $string
# NOTE: Sadly, the syntax for requesting byte processing differs
# between Windows PowerShell and PowerShell 7
# (-Encoding Byte vs. -AsByteStream), so we construct an
# edition-specific hashtable to be used for splatting below.
$encodingArg = if ($IsCoreClr) { @{ AsByteStream = $true } }
else { @{ Encoding = 'Byte' } }
# WRITE the byte array to a file.
Set-Content blob.txt @encodingArg -Value $byteArray
# READ the byte array from a file, as such.
# Note: -Raw -ReadCount 0 reads the entire file *at once* into
# a [byte[]] array.
$byteArrayFromFile =
Get-Content blob.txt @encodingArg -Raw -ReadCount 0
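A variation that sidesteps the edition-specific parameter is to call the .NET file APIs, which behave the same in both editions. This is a sketch only: the file name is made up for illustration, and a full path is used because .NET methods resolve relative paths against the process's working directory, which may differ from PowerShell's current location.

```powershell
# Hypothetical scratch file in the temp directory.
$path      = Join-Path ([IO.Path]::GetTempPath()) 'zzblob.bin'
$byteArray = [byte[]] [char[]] 'AAAABBBB'

# WRITE the raw bytes, then READ them back as a [byte[]] array.
[IO.File]::WriteAllBytes($path, $byteArray)
$fromFile = [IO.File]::ReadAllBytes($path)

# Read 2 bytes at offset 2, as in the question.
'0x{0:X4}' -f [BitConverter]::ToUInt16($fromFile, 2)   # -> 0x4141

Remove-Item -LiteralPath $path
```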
Alternatively, in PowerShell (Core) 7, you can use -Encoding Latin1 [1] with Set-Content and Get-Content to directly write and read 8-bit-Unicode range strings, but that doesn't work in Windows PowerShell, where you'd have to use .NET APIs directly.
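One way the direct .NET route might look in either edition is sketched below, using [IO.File]::WriteAllText() and [IO.File]::ReadAllText() with an explicit ISO-8859-1 encoding object. The file name is hypothetical, and the sample string appends é (U+00E9) to show a non-ASCII character from the 8-bit range.

```powershell
$latin1 = [Text.Encoding]::GetEncoding(28591)                      # ISO-8859-1
$path   = Join-Path ([IO.Path]::GetTempPath()) 'zzlatin1.txt'     # hypothetical scratch file
$string = 'AAAABBBB' + [char] 0xE9                                 # 9 characters

# WRITE the string as ISO-8859-1: one byte per character, no BOM.
[IO.File]::WriteAllText($path, $string, $latin1)
$fileLength = (Get-Item -LiteralPath $path).Length                 # -> 9

# READ it back with the same encoding.
$roundTripped = [IO.File]::ReadAllText($path, $latin1)
$roundTripped -ceq $string                                         # -> True

Remove-Item -LiteralPath $path
```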
[1] The ISO-8859-1 encoding that -Encoding Latin1 refers to is closely related to, but not identical to, Windows-1252, so using the latter, which -Encoding Default may refer to in Windows PowerShell, depending on the system locale (e.g. on US-English and Western European machines), is not an option.