batch-filefindstr

find and replace text within a csv using partial and wildcard up to the first delimiter


I have a csv generated from a SQL output and I am trying to replace partial strings of the csv with a generic string. I have tried FART, (always makes me laugh), FINDSTR and POWERSHELL but Idont think my skills are enough and Google searching is quite hard because of the caveats I stipulate.

The txt file is like this (sample data).

course_id,user_id,role,status
2122-DAC00002,123456,sometxt,active
2122-DAC00002,13456,sometxt,active
2122-DAC00010/1,987654,sometxt,active
2122-DAC00010,55669988,sometxt,active
2122-DAC00010/2,112233,sometxt,active
2122-DAC00010,852349,sometxt,active

The headers can be ignored, the first part is the part I need changing en-masse so search for 2122-* until the first , (the 2122-* may be slightly different character lengths but will always stop at the , delimiter, and then replace all the first iterations of 2122-* with 2122-GCE.

So the final output would be :

course_id,user_id,role,status
2122-GCE,123456,sometxt,active
2122-GCE,13456,sometxt,active
2122-GCE,987654,sometxt,active
2122-GCE,55669988,sometxt,active
2122-GCE,112233,sometxt,active
2122-GCE,852349,sometxt,active

I need to automate this, so within a .bat file, or a .ps1 would be good.

Hope this makes sense?

[EDIT /]

Apologies, missed my code attempts off.

My findstr attempt:

findstr /V /i /R '2122-.*' '2122-GCE' "E:\path to file\file1.csv" > "E:\path to file\output3.csv"

findstr output:

course_id,user_id,role,status
2122-GCENAC00025,123456,sometxt,active
2122-GCENAC00025,568974,sometxt,active
2122-GCENAC00025,223366,sometxt,active
2122-GCENAC00025,987654,sometxt,active

As you can see above, its prefixed and not replaced.

My FART attempt:

E:\path to\fart "E:\path to file\file1.csv" 2122-N* 2122-GCE
E:\path to\fart "E:\path to file\output3.csv" 2122-D? 2122-GCE

My PS1 attempt was in an ISE and I closed without saving.

edit, I had a ps window still open:

((Get-Content -path E:\path to file\file1.csv -Raw) -replace '2122-*','2122-GCE') | Set-Content -Path E:\path to file\file2.csv

Some iterations of the replace command: -replace '[^2122]*'

type file1.csv | ForEach-Object { $_ -replace "2122-*", "2122-GCE" } | Set-Content file2.csv

Solution

  • It looks like the first data value always exists with a non-empty value and not starting with ; and must be always replaced by the same value and the second data column contains on all data rows always a value and for that reason there is never ,, after first data value in a data row.

    There can be used the following commented batch file under these conditions:

    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    set "SourceFile=E:\path to file\file1.csv"
    set "OutputFile=E:\path to file\file2.csv"
    if not exist "%SourceFile%" exit /B
    
    rem Read just the header row from source CSV file and
    rem create the output CSV file with just this line.
    for /F "usebackq delims=" %%I in ("%SourceFile%") do (
        >"%OutputFile%" echo(%%I
        goto DataRows
    )
    
    rem Process just all data rows in source file by skipping the header row
    rem with splitting each line into two substrings using the comma as string
    rem delimiter with first substring assigned to loop variable I and not used
    rem further and everything after first series of commas assigned to the
    rem next but one loop variable J according to the ASCII table. The command
    rem ECHO is used to output the first fixed data value and a comma and the
    rem other data values of each data row. This output is appended to just
    rem created output CSV file.
    
    :DataRows
    (for /F "usebackq skip=1 tokens=1* delims=," %%I in ("%SourceFile%") do echo 2122-GCE,%%J)>>"%OutputFile%"
    
    endlocal
    

    To understand the commands used and how they work, open a command prompt window, execute there the following commands, and read the displayed help pages for each command, entirely and carefully.