Tags: powershell, encoding, utf-8, powershell-5.0, byte-order-mark

Modify the output of this script so that it is read as BOM-less UTF-8


I have several .csv files in a folder whose second column is empty. I would like to fill that column with the data from a matching text file for each one, named after the CSV file plus the suffix _column3.txt.

Example:

firstfile.csv

header1,translation,source
first,,"third"
one,,three

firstfile_column3.txt (base name of the CSV file + _column3.txt)

source
second, fifth
two

firstfileoutput.csv (the values under "source" in the .txt file fill the translation column)

header1,translation,source
first,"second, fifth","third"
one,two,three

I was able to get it to work, but accented and Asian characters come out garbled. I am using PowerShell 5 on Windows and need the output to be written as BOM-less UTF-8 so that it is read correctly. Editing the last few lines of the script has not solved it.

param(
    $SourceDir = $PWD,
    $OutDir = $PWD,
    $OutFileSuffix = "output" # Define the suffix for the output file.
)

# Get all primary CSV files in the source directory.
$csvFiles = Get-ChildItem -Path $SourceDir -Recurse -Filter "*.csv"

foreach ($csvFile in $csvFiles) {
    # Construct the name for the corresponding _column3 file.
    $column3FileName = "{0}_column3.txt" -f $csvFile.BaseName
    $column3FilePath = Join-Path -Path $SourceDir -ChildPath $column3FileName
    
    # Check if the _column3 file exists.
    if (Test-Path $column3FilePath) {
        # Import the primary CSV file and the corresponding _column3 file.
        $primaryCsv = Import-Csv -Path $csvFile.FullName
        $column3Data = Get-Content $column3FilePath
        
        # Assuming the first line in the _column3 file is a header and we skip it.
        $column3Values = $column3Data | Select-Object -Skip 1

        # Update the second column (translation) in the primary CSV with data from the _column3 file.
        for ($i = 0; $i -lt $primaryCsv.Count; $i++) {
            $primaryCsv[$i].translation = $column3Values[$i]
        }

        # Construct the output file path.
        $outputFilePath = Join-Path -Path $csvFile.DirectoryName -ChildPath ("{0}{1}.csv" -f $csvFile.BaseName, $OutFileSuffix)

        # Export the updated CSV data to a new file.
        $primaryCsv | Export-Csv -Path $outputFilePath -NoTypeInformation -Encoding UTF8
    }
    else {
        Write-Warning "Corresponding column3 file not found for $($csvFile.Name)"
    }
}

Solution

  • To ensure consistent, BOM-less UTF-8 handling in Windows PowerShell:

    Note that none of these things are necessary in PowerShell (Core) 7+, which consistently defaults to (BOM-less) UTF-8 across all built-in cmdlets (as well as when reading source code from files).
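
As a minimal sketch of one way to get there in Windows PowerShell 5.x (not necessarily the exact steps the answer lists): read both inputs with an explicit -Encoding UTF8, since Get-Content otherwise treats BOM-less files as ANSI, and write the result through .NET with a BOM-less UTF8Encoding instead of Export-Csv -Encoding UTF8, which prepends a BOM in that version. The demo folder, the file names and the variable names below are assumptions made for illustration only.

```powershell
# Hypothetical demo folder in the temp directory (an assumption of this sketch).
$dir = Join-Path ([IO.Path]::GetTempPath()) 'bomless-demo'
$null = New-Item -ItemType Directory -Path $dir -Force
$csvPath = Join-Path $dir 'firstfile.csv'
$txtPath = Join-Path $dir 'firstfile_column3.txt'
$outPath = Join-Path $dir 'firstfileoutput.csv'

# $false means "do not emit a BOM".
$utf8NoBom = New-Object System.Text.UTF8Encoding $false

# Non-ASCII sample text, built from code points so that this script itself
# stays pure ASCII (Windows PowerShell reads BOM-less script files as ANSI).
$sample = "caf$([char]0xE9), $([char]0x65E5)$([char]0x672C)"

# Create BOM-less UTF-8 sample inputs.
$csvLines = [string[]] @('header1,translation,source', 'first,,"third"', 'one,,three')
$txtLines = [string[]] @('source', $sample, 'two')
[IO.File]::WriteAllLines($csvPath, $csvLines, $utf8NoBom)
[IO.File]::WriteAllLines($txtPath, $txtLines, $utf8NoBom)

# Read both inputs explicitly as UTF-8; skip the header line of the text file.
$rows = @(Import-Csv -LiteralPath $csvPath -Encoding UTF8)
$values = @(Get-Content -LiteralPath $txtPath -Encoding UTF8 | Select-Object -Skip 1)

# Fill the translation column row by row.
for ($i = 0; $i -lt $rows.Count; $i++) {
    $rows[$i].translation = $values[$i]
}

# Convert to CSV text in memory, then let .NET write it without a BOM.
# .NET methods need a full path: they do not follow PowerShell's current location.
$lines = [string[]] ($rows | ConvertTo-Csv -NoTypeInformation)
[IO.File]::WriteAllLines($outPath, $lines, $utf8NoBom)

# Check the first three bytes for the UTF-8 BOM (EF BB BF).
$bytes = [IO.File]::ReadAllBytes($outPath)
$hasBom = ($bytes.Length -ge 3) -and ($bytes[0] -eq 0xEF) -and ($bytes[1] -eq 0xBB) -and ($bytes[2] -eq 0xBF)
"BOM present: $hasBom"
```

In the script from the question, the same idea would mean adding -Encoding UTF8 to the Import-Csv and Get-Content calls and replacing the final Export-Csv line with the ConvertTo-Csv / WriteAllLines pair, passing $outputFilePath as the (full) target path.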


    Background information:

    The upshot is: