Importing a database dump using cmd works properly:
tar -xf dump.zip
mysql.exe --user=Username --password=Password Database < dump.sql
When I try something similar in PowerShell:
Expand-Archive -Path dump.zip
Get-Content dump.sql | mysql.exe --user=Username --password=Password Database
the file gets imported, but UTF-8 characters in the dump file's data are not imported as such (resulting in '??' in the data). I'm using Get-Content here because the < redirection operator is not supported in PowerShell. If I open the dump.sql file in an editor (Notepad++), it appears to be UTF-8-encoded.
What would be the correct way to import the dump file?
The dump file, created with MariaDB 10.6.18, contains the proper settings. In the preamble:
/*!40101 SET NAMES utf8mb4 */;
Before each table:
/*!40101 SET character_set_client = utf8 */;
The import target is also MariaDB 10.6.18. The results of tar -xf and Expand-Archive are identical.
Preface:
As TessellatingHeckler's answer hints at, per the docs, the following may work in your specific, mysql.exe-based case, because it makes mysql.exe itself open and read the file, so PowerShell never re-encodes the data:
mysql.exe --user=Username --password=Password Database -e 'source dump.sql'
That said, the solution below addresses your question generically, irrespective of the specific external program you're targeting.
The PowerShell-native, but slow solution: Ensure that PowerShell sends the .NET strings it has read from the file UTF-8-encoded to your external program via its object-based pipeline, by setting the $OutputEncoding preference variable accordingly:
# Tell PowerShell to send text (strings) with UTF-8 encoding
# to external programs.
# NOTE: No longer necessary in PowerShell 7+, where UTF-8 is the default.
$OutputEncoding = [Text.UTF8Encoding]::new()
# Use -Raw to read the entire file at once, which improves performance.
# Omit it if the file is too large to fit into memory.
Get-Content -Raw dump.sql | mysql.exe --user=Username --password=Password Database
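Note that Get-Content has a decoding default of its own to watch out for: in Windows PowerShell, a file without a BOM is read using the system's active ANSI code page, so if your dump.sql is UTF-8 without a BOM you may also need to request UTF-8 explicitly on the reading side (a hedged variant of the above):
# Windows PowerShell: read the (BOM-less) file explicitly as UTF-8;
# PowerShell 7+ already defaults to UTF-8 when reading files.
Get-Content -Raw -Encoding UTF8 dump.sql | mysql.exe --user=Username --password=Password Database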
Bewilderingly, in Windows PowerShell (the legacy, ships-with-Windows, Windows-only edition of PowerShell whose latest and last version is 5.1), $OutputEncoding defaults to ASCII(!), which means that any non-ASCII characters are "lossily" transliterated to verbatim ? characters.
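You can observe this lossy behavior with any external program that echoes its input; here is a minimal demonstration using the stock findstr.exe utility (the . regex matches any line):
# Windows PowerShell 5.1: due to the default ASCII $OutputEncoding,
# the é is already transliterated to '?' by the time findstr.exe sees it.
'café' | findstr.exe .
# -> caf?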
PowerShell (Core) 7 now more sensibly defaults to UTF-8; while that in itself is commendable, it is at odds with how PowerShell decodes output from external programs, which on Windows still defaults to the legacy system locale's OEM code page; see GitHub issue #7233 for a discussion.
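As an aside (not needed for this import scenario, where only the input sent to mysql.exe matters): to make PowerShell correctly decode UTF-8 output from external programs, it is the separate [Console]::OutputEncoding setting that matters:
# Complements $OutputEncoding: governs how output *from* external
# programs is decoded, rather than what is sent *to* them.
[Console]::OutputEncoding = [Text.UTF8Encoding]::new()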
The reason that this solution is slow is twofold:
Because the file's raw bytes are not being passed through, each line read by Get-Content is (a) first decoded into a .NET string in memory and (b) thereafter re-encoded when passed on to an external program (based on $OutputEncoding).
By default, Get-Content reads a file line by line and decorates each in-memory string with (rarely used) ETS (Extended Type System) properties, which is costly both in terms of memory consumption and runtime. Adding -Raw reads the entire file at once, i.e. it returns only a single, (typically) multiline string; that string too carries the decoration, but since it is applied only once, the cost is negligible. However, if your input file is unusually large, use of -Raw may not be an option. A -NoExtendedMember switch that would suppress this decoration has been proposed, but no one has stepped up to implement it yet.
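To gauge how much of a difference -Raw makes for your dump file, you could time both reading variants (a quick sketch using the built-in Measure-Command cmdlet; Out-Null simply discards the output):
# Compare line-by-line reading with -Raw (whole-file) reading.
Measure-Command { Get-Content dump.sql | Out-Null }
Measure-Command { Get-Content -Raw dump.sql | Out-Null }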
The faster, but non-obvious solution: Delegate to cmd.exe via a cmd.exe /c CLI call, to take advantage of cmd.exe's support for raw byte data in its pipelines and input redirections.
cmd /c 'mysql.exe --user=Username --password=Password Database < dump.sql'
Note that the command must be passed as a single, quoted string, so that the < redirection is interpreted by cmd.exe rather than by PowerShell (which would report an error).
For background information, including about an important enhancement in PowerShell v7.4+ that allows the normally object-based pipeline to situationally act as a raw byte conduit, as in cmd.exe, see this answer.
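Building on that enhancement: in PowerShell 7.4+, output piped from one external (native) program to another is passed through as raw bytes, so the following sketch may work there too (an assumption based on that behavior; cmd.exe's internal type command simply emits the file's bytes):
# PowerShell 7.4+ only: native-to-native piping is a raw byte stream,
# so the file's UTF-8 bytes reach mysql.exe unmodified.
cmd /c type dump.sql | mysql.exe --user=Username --password=Password Database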