powershellcsvsplitselect-string

Search for and combine two fields in a csv with powershell


I want to search for a string in every file in the current directory, combine the results, and output a text file.

The string that I'm looking for is "Start l". This will return two results, "Start latitude" and "Start longitude". I want to combine this data into a single line for each file.

Here's an example:

select-string -path .\CC2019012_20220601_125156.txt -Pattern 'Start l'

Which yields:

CC2019012_20220601_125156.txt:10:% Start latitude,39.0285453

CC2019012_20220601_125156.txt:11:% Start longitude,-74.4891558

What I want to create is:

CC2019012_20220601_125156,39.0285453,-74.4891558

Here's how I do it under Linux:

for file in *.txt
do
        lat=$(grep -i "start l" $file | dos2unix | sed -e 's/^..*,//' | sed -e 'N;s/\n/,/')
        print $file,$lat
done >| locations.sam

How do I achieve the same results in powershell?

ETA: I've found something that works... Maybe someone can improve on what I've come up with?

foreach ($file in gci *.txt)
{
$base = (Get-Item "$file" ).Basename
$var =  select-string -path $file -Pattern 'Start l' | %{$_ -replace "..*,",""}
$join = $var -join ","
echo "$base,$join" | out-file -Append .\locations.sam
}

Example CSV file:

% Device,CC2206006
% File name,CC2206006_20220712_103027
% Cast time (UTC),2022-07-12 10:30:27
% Cast time (local),2022-07-12 10:30:27
% Sample type,Cast
% Cast data,Down
% Location source,GPS
% Default latitude,32
% Default altitude,0
% Start latitude,34.0449595
% Start longitude,-73.4006081
% Start altitude,-30.834999084472656
% Start GPS horizontal error(Meter),32.823001861572266
% Start GPS vertical error(Meter),641.02001953125
% Start GPS number of satellites,3
% End latitude,34.0461822
% End longitude,-73.4019048
% End altitude,-33.549999237060547
% End GPS horizontal error(Meter),214.10699462890625
% End GPS vertical error(Meter),463.32901000976562
% End GPS number of satellites,3
% Cast duration (Seconds),84.2
% Samples per second,5
% Electronics calibration date,0001-01-01
% Conductivity calibration date,2022-02-10
% Temperature calibration date,2022-02-09
% Pressure calibration date,2022-02-08
%
Pressure (Decibar),Depth (Meter),Temperature (Celsius),Conductivity (MicroSiemens per Centimeter),Specific conductance (MicroSiemens per Centimeter),Salinity (Practical Salinity Scale),Sound velocity (Meters per Second),Density (Kilograms per Cubic Meter)
0.15,0.14997177615318641,22.042546929715112,43719.05456967745,46467.5675777045,30.112056249182295,1521.6050401503587,1020.4990404995208
21.75,21.697195002693061,11.70683765411377,37383.1953125,50921.296550740095,32.706304507268229,1493.4325554024592,1024.9609682732307
22.119968011090421,22.065484920621859,11.69420678760779,37442.975677761278,51020.282158553651,32.775522366290218,1493.4778352736607,1025.0187132969579

Solution

  • You could also use the very fast switch for this to get the values you want using regex, collect these in a variable and when done write it all out to the locations.sam file in one go like this:

    # go through the text files, filter out the wanted values 
    # and collect formatted lines in variable $data
    $data = foreach ($file in (Get-ChildItem -Filter '*.txt')) {
        $values = switch -Regex -File $file.FullName {
            '% Start l(at|ong)itude' { ($_ -split ',')[-1]}
        }
        # output a formatted line
        '{0},{1}' -f $file.Basename, ($values -join ',')
    }
    
    # now append the data to the output file
    # using PassThru, it will also display in the console
    $data | Add-Content -Path .\locations.sam -PassThru