I have several .csv files in a folder with the second column empty and I would like to fill it with the data present in as many csv files with the same name+column3.
Example:
firstfile.csv
header1,translation,source
first,,"third"
one,,three
firstfile_column3.txt (name of the file+_column3.txt)
source
second, fifth
two
firstfileoutput.csv (source became translation)
header1,translation,source
first,"second, fifth","third"
one,two,three
I was able to get it to work, but the accents and Asian characters are wrong. I am using Powershell 5 on Windows and should modify the output so that it is read as BOM-less UTF-8. Trying to edit the last few strings still can't solve it.
param(
$SourceDir = $PWD,
$OutDir = $PWD,
$OutFileSuffix = "output" # Define the suffix for the output file.
)
# Get all primary CSV files in the source directory.
$csvFiles = Get-ChildItem -Path $SourceDir -Recurse -Filter "*.csv"
foreach ($csvFile in $csvFiles) {
# Construct the name for the corresponding _column3 file.
$column3FileName = "{0}_column3.txt" -f $csvFile.BaseName
$column3FilePath = Join-Path -Path $SourceDir -ChildPath $column3FileName
# Check if the _column3 file exists.
if (Test-Path $column3FilePath) {
# Import the primary CSV file and the corresponding _column3 file.
$primaryCsv = Import-Csv -Path $csvFile.FullName
$column3Data = Get-Content $column3FilePath
# Assuming the first line in the _column3 file is a header and we skip it.
$column3Values = $column3Data | Select-Object -Skip 1
# Update the second column (translation) in the primary CSV with data from the _column3 file.
for ($i = 0; $i -lt $primaryCsv.Count; $i++) {
$primaryCsv[$i].translation = $column3Values[$i]
}
# Construct the output file path.
$outputFilePath = Join-Path -Path $csvFile.DirectoryName -ChildPath ("{0}{1}.csv" -f $csvFile.BaseName, $OutFileSuffix)
# Export the updated CSV data to a new file.
$primaryCsv | Export-Csv -Path $outputFilePath -NoTypeInformation -Encoding UTF8
}
else {
Write-Warning "Corresponding column3 file not found for $($csvFile.Name)"
}
}
To ensure consistent, BOM-less UTF-8 handling in Windows PowerShell:
On reading:
Use -Encoding utf8
to ensure that BOM-less UTF-8 files are read as such:
Import-Csv -Encoding utf8 $csvFile.FullName
Get-Content -Encoding utf8 $column3FilePath
On writing:
Avoid -Encoding utf8
, because it creates files with BOM.
A non-obvious workaround for your Export-Csv
call is needed, which relies on the fact that New-Item
does create BOM-less UTF-8 files, by default and invariably:
$null = New-Item -Force $outputFilePath -Value (
$primaryCsv | ConvertTo-Csv -NoTypeInformation | Out-String
)
Note that none of these things are necessary in PowerShell (Core) 7+, which consistently defaults to (BOM-less) UTF-8, across all built-in cmdlets (as well as when reading source code from files).
Unfortunately, Windows PowerShell and its file-reading cmdlets exhibit inconsistent behavior with respect to what character encoding to assume in the absence of a BOM.
In your code Import-Csv
happens to assume UTF-8, ...
... while Get-Content
assumes ANSI, i.e. the system's active legacy ANSI code page, as does the PowerShell engine itself when it reads source code.
Similarly, there's no consistence when writing to files: Set-Content
uses ANSI encoding, whereas Out-File
(and its virtual aliases, >
and >>
)
By contrast, PowerShell (Core) 7+ and its cmdlets - both reading and writing ones - now consistently default to (BOM-less) UTF-8
The upshot is:
In Windows PowerShell, to be safe, always use -Encoding utf8
if you know you're dealing with UTF-8-encoded files - whether they do or do not have a BOM (in the former case, using -Encoding utf8
is technically redundant, but causes no harm).
The same applies if you want to write to files using UTF-8, but note that in Windows PowerShell -Encoding utf8
invariably creates UTF-8 files with a BOM:
See this answer for plain-text workarounds (the New-Item
workaround is shown above).
By contrast, PowerShell (Core) 7+ creates BOM-less UTF-8 files both by default and with -Encoding utf8
; in case you do want a BOM there, use -Encoding utf8BOM