batch-file, windows-scripting

Batch script to read multiple files and count commas on each line


I am a newbie to batch scripting and I am trying to achieve the following: loop through multiple files, count the number of commas on each line, then remove the extra commas if there are more than 10. I can get as far as the count, but I am stuck there. All fields are required. No field contains a carriage return. The extra comma will only ever occur in the field after the 9th comma.

Example of the data in a CSV file:

Row 1 (good data):

123,235252,6376,test1,08/11/2022,2,0,1,EA,Required text, pencil ,pen

Row 2 (bad data):

456,235252,6376,test2,08/11/2022,2,0,1,EA,Required,text, pencil ,pen

In row 2, the field "Required text" contains an extra comma ("Required,text"), which should be removed so that the row looks like row 1.

So the logic I would like is: if a row has 10 commas, go to the next line; if it has more than 10 commas, remove the extra comma(s) after the 9th comma, since extra commas only ever occur in that field. Please note that I cannot put double quotes around the field.

@echo on
setlocal enableextensions enabledelayedexpansion

pause


set "inputFile=test.csv"
set "searchChar=,"

set count16=16
pause
for /f "delims=" %%a in ('
 findstr /n "^" "%inputFile%"
 ') do for /f "delims=:" %%b in ("%%~a") do (
    set "line=%%a"

    pause
    for /f %%c in ('
     cmd /u /v /e /q /c"(echo(!line:*:=!)"^|find /c "%searchChar%"
     ') do (
        rem Inside a block, read count with delayed expansion, not percent expansion
        set count=%%c
        echo  %%c
        echo here
        echo !count!
        echo  %count16%
        echo %%c line %%b has %%c characters
        if %count16% equ !count! (echo   ***hit)
    )
    pause
)
pause
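For comparison, the rule described in the question can be sketched in Python, which is easier to test than cmd. This illustrates the logic only and is not a batch solution. Assumptions: EXPECTED_COMMAS is set to 11 because the good sample row (Row 1) contains 11 commas, even though the text says 10; each surplus comma is turned into a space so that "Required,text" becomes "Required text" as in Row 1; and the hypothetical helper fix_folder reads the files as UTF-8 and writes a .out copy next to each .csv.

```python
from pathlib import Path

# Assumption: a good row has 11 commas (12 fields), as in the sample Row 1.
# The question text says 10, so adjust this to the real layout.
EXPECTED_COMMAS = 11
# The extra commas only ever appear in the field that follows the 9th comma.
FIELD_AFTER_COMMA = 9


def fix_line(line, expected=EXPECTED_COMMAS, after=FIELD_AFTER_COMMA):
    """Turn the surplus commas in the field after the `after`-th comma into spaces."""
    extra = line.count(",") - expected
    if extra <= 0:
        return line  # right number of commas: go to the next line
    # Split off the first `after` fields; the last piece starts at the field
    # that may contain the extra commas.
    parts = line.split(",", after)
    # Replace only the first `extra` commas of that piece.
    parts[-1] = parts[-1].replace(",", " ", extra)
    return ",".join(parts)


def fix_folder(folder="."):
    """Hypothetical helper: write a corrected copy of every *.csv as <name>.out."""
    for path in Path(folder).glob("*.csv"):
        fixed = []
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            new = fix_line(line)
            if new != line:
                print(f"{path.name}: line {number} had {line.count(',')} commas")
            fixed.append(new)
        path.with_suffix(".out").write_text("\n".join(fixed) + "\n", encoding="utf-8")
```

Splitting with at most 9 splits keeps everything after the 9th comma in one piece, so only commas in that trailing part can be changed, and replace(",", " ", extra) touches only the first extra commas there, leaving the separators of the last two fields intact.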

Solution

  • Your question is very confusing. You have not clearly explained the details. More importantly, you have not posted an example of the input data and the desired output in the question; that would make up for the missing details. So we can only guess what you want...

    I think your problem would be easier to explain if you focused on the columns of the input and output data. Are you interested in the commas, or in the columns?

    This is my attempt at a solution. I used the example input file posted by Compo.

    @echo off
    setlocal EnableDelayedExpansion
    
    rem Process all files with .csv extension in current folder
    for %%F in (*.csv) do (
    
    ECHO/
    ECHO Input: "%%F"
    TYPE "%%F"
    
       rem Each file has comma-separated columns: there may be 12 columns or more
       rem Keep columns 1-9 the same. After them, generate 3 more columns:
       rem the last and second-to-last columns are kept as they are;
       rem the third-to-last column contains all the remaining columns, separated by spaces
    
       (for /F "usebackq tokens=1-9* delims=," %%a in ("%%F") do (
    
          set "restAfter9=%%j"
          set "last="
          set "lastBut1="
          set "lastBut2="
          for %%A in ("!restAfter9:,=" "!") do (
             set "lastBut2=!lastBut2! !lastBut1!"
             set "lastBut1=!last!"
             set "last=%%~A"
          )
          echo %%a,%%b,%%c,%%d,%%e,%%f,%%g,%%h,%%i,!lastBut2:~3!,!lastBut1!,!last!
    
       )) > "%%~NF.out"
    
    ECHO Output: "%%~NF.out"
    TYPE "%%~NF.out"
    
    )
    

    Output example:

    Input: "test1.csv"
    123,235252,6376,test1,08/11/2022,2,0,1,EA,Required text, pencil ,pen
    456,235252,6376,test2,08/11/2022,2,0,1,EA,Required,text, pencil ,pen
    789,235252,6376,test3,08/11/2022,2,0,1,EA,Re,qu,ir,ed,te,xt, pencil ,pen
    012,235252,6376,test4,08/11/2022,2,0,1,,Required,text, pencil ,pen
    789,235252,6376,test5,08/11/2022,2,0,1,,Re,qu,,,te,xt, pencil ,pen
    Output: "test1.out"
    123,235252,6376,test1,08/11/2022,2,0,1,EA,Required text, pencil ,pen
    456,235252,6376,test2,08/11/2022,2,0,1,EA,Required text, pencil ,pen
    789,235252,6376,test3,08/11/2022,2,0,1,EA,Re qu ir ed te xt, pencil ,pen
    012,235252,6376,test4,08/11/2022,2,0,1,Required,text, pencil ,pen
    789,235252,6376,test5,08/11/2022,2,0,1,Re,qu   te xt, pencil ,pen
    
    Input: "test2.csv"
    396,32124191,6376,CD1,08/11/2022,1,0,1,EA,Required Books,08/22/2022,12/10/2022,$60 basic supplies,37246613bA0,11800118,Required Books
    396,32124191,6376,CD2,08/11/2022,2,0,1,EA,Required Supplies,08/22/2022,12/10/2022,up to $60.00 basic supplies with comma,37246613bA1,11800118,Required Supplies
    396,32124191,6376,CD3,08/11/2022,2,0,1,EA,Required Supplies,08/22/2022,12/10/2022,up to $60.00 basic supplies with comma,37246613bA2,11800118,Required Supplies
    Output: "test2.out"
    396,32124191,6376,CD1,08/11/2022,1,0,1,EA,Required Books 08/22/2022 12/10/2022 $60 basic supplies 37246613bA0,11800118,Required Books 
    396,32124191,6376,CD2,08/11/2022,2,0,1,EA,Required Supplies 08/22/2022 12/10/2022 up to $60.00 basic supplies with comma 37246613bA1,11800118,Required Supplies
    396,32124191,6376,CD3,08/11/2022,2,0,1,EA,Required Supplies 08/22/2022 12/10/2022 up to $60.00 basic supplies with comma 37246613bA2,11800118,Required Supplies
    

    EDIT: I added a new, simpler solution.

    @echo off
    setlocal EnableDelayedExpansion
    
    rem General method to keep the first N columns the same
    rem and group additional fields in column N+1
    
    rem Define the number of "same" and "total" columns:
    set /A "same=12, last=17"
    
    rem Process all files with .csv extension in current folder
    for %%F in (*.csv) do (
    
    ECHO/
    ECHO Input: "%%F"
    TYPE "%%F"
    
       rem Process all lines of current file
       (for /F "usebackq delims=" %%a in ("%%F") do (
    
          set "line=%%a"
          set "head="
          set "tail="
          set "i=0"
    
          rem Split current line in comma-separated fields
          for %%b in ("!line:,=" "!") do (
             set /A i+=1
             if !i! leq %same% (         rem Accumulate field in "head" columns
                set "head=!head!%%~b,"
             ) else if !i! leq %last% (  rem Accumulate field in "tail" columns
                set "tail=!tail!%%~b,"
             ) else (  rem Combine one field from beginning of "tail" and accumulate last field
                for /F "tokens=1* delims=," %%x in ("!tail!") do set "tail=%%x %%y%%~b,"
             )
          )
    
          echo !head!!tail:~0,-1!
    
       )) > "%%~NF.out"
    
    ECHO Output: "%%~NF.out"
    TYPE "%%~NF.out"
    
    )
    

    Output example:

    Input: "test1.csv"
    field 1, field 2, field 3, field 4, field 5, field 6, field 7, field 8, field 9, field 10, field 11, field 12, field 13, field 14, field 15, field 16, field 17
    field 1, field 2, field 3, field 4, field 5, field 6, field 7, field 8, field 9, field 10, field 11, field 12, field 13, field 13a, field 13b, field 14, field 15, field 16, field 17 
    field 1, field 2, field 3, field 4, field 5, field 6, field 7, field 8, field 9, field 10, field 11, field 12, field 13, field 13a, field 13b, field 13c, field 14, field 15, field 16, field 17
    Output: "test1.out"
    field 1, field 2, field 3, field 4, field 5, field 6, field 7, field 8, field 9, field 10, field 11, field 12, field 13, field 14, field 15, field 16, field 17
    field 1, field 2, field 3, field 4, field 5, field 6, field 7, field 8, field 9, field 10, field 11, field 12, field 13  field 13a  field 13b, field 14, field 15, field 16, field 17 
    field 1, field 2, field 3, field 4, field 5, field 6, field 7, field 8, field 9, field 10, field 11, field 12, field 13  field 13a  field 13b  field 13c, field 14, field 15, field 16, field 17
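Both batch solutions above apply one column rule: keep the first N fields, make the row exactly as long as the expected number of columns, and merge the surplus fields into column N+1, joined with a space. As a language-neutral sketch (Python, with a hypothetical function name regroup), the first solution corresponds to same=9, last=12 and the second to same=12, last=17; rows that are not longer than last are returned unchanged. One difference to keep in mind: for /F treats consecutive commas as a single delimiter, whereas str.split(",") keeps empty fields, so a row like test4 keeps its empty ninth column here.

```python
def regroup(line: str, same: int, last: int) -> str:
    """Keep fields 1..same as-is and shorten the row to `last` fields by
    merging the surplus fields into field same+1 (joined with a space)."""
    fields = line.split(",")  # empty fields are kept, unlike for /F
    surplus = len(fields) - last
    if surplus <= 0:
        return line  # not longer than expected: leave the row alone
    cut = same + surplus + 1  # end of the run of fields to merge
    merged = " ".join(fields[same:cut])
    return ",".join(fields[:same] + [merged] + fields[cut:])
```

Because the split keeps the leading space of each field, merging " field 13" and " field 13a" yields the same double spaces as the batch output above.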