
Multiple brotli processes at the same time


I currently have to compress several thousand files (~40-80 MB each) with brotli and get them ready for an S3 bucket. From what I've researched so far, brotli can't multithread the compression, so brotli.exe uses only ~10% of the CPU. How can I iterate through the files in a folder and spawn multiple brotli.exe processes to work at the same time (8-10 processes should fill the CPU)? Windows batch, PowerShell or VBS: I can try any suggestion.

At the moment, I'm running this batch file:

for /R %%f in (*.) do (
"brotli" -Z "--output=E:\output\brotli\%%~nf" "%%f"
)

Solution

  • Main batch:

    @ECHO OFF
    SETLOCAL
    
    :: set limit to #jobs
    
    SET /a limit=8
    
    :: establish a subdirectory in %temp%
    
    SET "control=%temp%\brotlicontrol"
    MD "%control%" 2>NUL
    
    :: Dummy for testing
    
    for %%f IN (fred anna george bill betty carl celia daphne john kelly ian zoe brian
                tracey susan colin jane selina valerie david stephen) DO (
    rem for /R %%f in (*.) do (
     CALL :wait
     START /min "brotli %%~nf" q75403766_2 "%%f"
    )
    
    GOTO :EOF
    
    :wait
    SET /a running=0
    FOR /f %%y IN ('DIR /a-d /b "%control%\*.flg" 2^>nul ^|FIND /c ".flg" ') DO SET /a running=%%y
    IF %running% geq %limit% (timeout /t 1 >nul & GOTO wait)
    GOTO :eof
    

    The script above is the main batch. For each file it starts this subsidiary batch, q75403766_2.bat:

    @echo off
    setlocal
    ECHO.>"%control%\%~n1.flg"
    REM "brotli" -Z "--output=E:\output\brotli\%~n1" %1
    :: Dummy - variable timeout 5-20 seconds
    SET /a exectime=(%RANDOM% %% 16) + 5
    timeout /t %exectime% >nul
    del "%control%\%~n1.flg"
    EXIT
    

    For testing, I had %%f iterate through a list of names. To process your real files, remove that test list and restore your original for /R line, which I remmed out.

    Before each start, the loop calls the :wait routine, which counts the .flg files in the temporary control directory and sets running to that value.
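
    The counting step can also be run on its own, for example from a second console to see how many jobs are currently flagged. This is a minimal sketch that assumes the same control directory as the main batch (%temp%\brotlicontrol); it is not part of the scripts above.

    ```batch
    @ECHO OFF
    SETLOCAL
    :: Same control directory as the main batch (assumed unchanged)
    SET "control=%temp%\brotlicontrol"
    SET /a running=0
    :: DIR lists the flag files one per line; FIND /c counts the lines containing ".flg".
    :: If there are no flag files, DIR's error message is discarded and FIND reports 0.
    FOR /f %%y IN ('DIR /a-d /b "%control%\*.flg" 2^>nul ^| FIND /c ".flg"') DO SET /a running=%%y
    ECHO %running% job(s) currently flagged as running
    ```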

    If the number running is greater than or equal to (geq) the limit established in the initialisation, the routine waits 1 second and tries again. Otherwise the :wait routine returns, and the subsidiary batch q75403766_2 is started minimised (/min) with the window title brotli nameoffile. It's important that the first quoted parameter to start is present, because start uses it as the title of the new window. You can use "" for an empty title, but you should not omit the title string.
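
    To illustrate why the title matters, here is a small sketch. The program path and file name are hypothetical and only there to show how start treats quoted arguments.

    ```batch
    @ECHO OFF
    :: Hypothetical program path and argument, for illustration only.
    :: The empty "" is consumed as the window title, so the quoted path
    :: that follows is treated as the program to run.
    START "" /min "C:\Program Files\SomeTool\sometool.exe" "input file.txt"

    :: If the title is left out, START takes the first quoted string as the
    :: title instead, so the line below would NOT run sometool.exe:
    :: START /min "C:\Program Files\SomeTool\sometool.exe" "input file.txt"
    ```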

    The sub-process started (q75403766_2) first creates a .flg file in the control directory, named after the file being processed. It then runs the brotli job (remmed out here too; I added a few lines that create a variable timeout to simulate the brotli processing time), deletes the control file and exits.
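
    For real use, the subsidiary batch might look like the sketch below, with the dummy timeout removed and the brotli line from the question enabled. It assumes the file is saved as q75403766_2.bat where the main batch can find it, that brotli is on the PATH, and that E:\output\brotli exists, as in the question.

    ```batch
    @echo off
    setlocal
    :: Mark this job as running (%control% is inherited from the main batch)
    ECHO.>"%control%\%~n1.flg"
    :: Real work: the same brotli command line as in the question.
    :: %1 arrives already quoted, because the main batch passes "%%f".
    "brotli" -Z "--output=E:\output\brotli\%~n1" %1
    :: Job finished: remove the flag so the main batch can start another job
    del "%control%\%~n1.flg"
    EXIT
    ```

    Because the flag is written by the child rather than by the main batch, the main loop may occasionally start one job more than the limit before the count catches up. Creating the flag in the main batch just before the START line would likely avoid that.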

    The carets before the redirector and the pipe in the for /f command tell cmd that they are to be applied to the command being executed, not to the for itself. 2^>nul says "redirect error messages (file not found) to nowhere (i.e. discard them)".
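
    A standalone sketch of the same escaping, using a made-up file pattern that is assumed not to match anything, so that DIR produces only an error message:

    ```batch
    @ECHO OFF
    :: Inside the quoted command of FOR /f, > and | must be escaped with ^
    :: so that they are passed on to the command that FOR /f runs.
    :: FIND /c /v "" counts the lines that DIR writes to standard output.
    FOR /f %%y IN ('DIR /b "%temp%\no_such_file_*.xyz" 2^>nul ^| FIND /c /v ""') DO ECHO Matching files: %%y
    ```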