batch-file

Move duplicate files to a subfolder using a .bat file


When I run the following batch it creates a folder "duplicates" and it shows the duplicate (by size) files within the folder the script is currently in. I'm new in BATCH and I can't make it move the duplicate files to the "duplicates" folder.

@echo off

setlocal EnableDelayedExpansion

if not exist duplicates mkdir duplicates

for /R %%a in (*.*) do (
    set "size[%%~Za]=!size[%%~Za]!,%%~Fa"
)

for /F "tokens=2,3* delims=[]=," %%a in ('set size[') do (
    if "%%c" neq "" echo %%b,%%c
)

pause

I've tried if "%%c" neq "" copy %%b duplicates/ rm %%b,copy %%c /duplicates rm %%b but it's not working. Any help is appreciated. Thanks!


Solution

  • Please open a command prompt window, run help and look at the output, which is an incomplete list of Windows commands. The Windows command to move a file (or folder) is move. Run help move or move /? to display the help of this command. There is also the A-Z index of Windows CMD commands.

    The provided code is not good for the following reasons:

    1. It does not work for files whose fully qualified file name contains one or more ! because delayed variable expansion is enabled.
    2. It does not work for files whose fully qualified file name contains one or more = or , or [ or ] because these characters are used as string delimiters in the second FOR loop.
    3. It does not consider the maximum command line length limitation of 8191 characters (8192 characters with the string-terminating null character). The entire argument string of the command SET, consisting of the first ", the environment variable name beginning with size[ and ending with ], the equal sign, the comma-separated list of absolute file names and the last ", cannot be longer than 8191 characters.
    4. If the subfolder duplicates already exists, for example from a previous batch file execution, the files in this subfolder are processed once again instead of being ignored.
    5. The batch file itself is not ignored if it is in the current directory or one of its subdirectories.
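
    The first problem can be reproduced with a few lines. This is only a minimal sketch: the file name Hello World!.txt is hypothetical and does not need to exist. With delayed expansion enabled, the exclamation mark should be missing in the output.

    ```batch
    @echo off
    rem Sketch only: the file name below is hypothetical and no file is accessed.
    setlocal EnableExtensions DisableDelayedExpansion
    set "FileName=Hello World!.txt"
    for %%I in ("%FileName%") do echo Delayed expansion disabled: %%~nxI
    setlocal EnableDelayedExpansion
    rem The exclamation mark is removed while cmd.exe processes the next line.
    for %%I in ("%FileName%") do echo Delayed expansion enabled:  %%~nxI
    endlocal
    endlocal
    ```

    A name changed like this no longer refers to the existing file, so any command using it, including MOVE, would fail for that file.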

    The comma separated list of file names with identical file size is difficult to process further.

    It is unclear which of the files with identical file size should be moved into the subfolder duplicates in the current working directory, which of course can be different from the directory containing the batch file.

    Solution 1: Moving only the second, third, … file of same file size

    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    mkdir "duplicates" 2>nul
    if not exist "duplicates\" exit /B 1
    for %%I in ("duplicates") do set "DuplicatesFolder=%%~fI"
    set "DuplicatesFolder=%DuplicatesFolder:\=\\%\\"
    set "BatchFileName=%~f0"
    set "BatchFileName=%BatchFileName:\=\\%"
    for /F "delims==" %%I in ('set # 2^>nul') do set "%%I="
    for /F "delims=" %%G in ('dir * /A-D-L /B /ON /S 2^>nul ^| %SystemRoot%\System32\findstr.exe /B /I /V /C:"%DuplicatesFolder%" /C:"%BatchFileName%"') do if defined #%%~zG (move /Y "%%G" "duplicates\") else set "#%%~zG=1"
    rd "duplicates" 2>nul
    endlocal
    pause
    

    The batch file defines with the first two command lines the required execution environment.

    Next, the subdirectory duplicates is created in the current working directory, and the batch file checks whether that subdirectory really exists. If it does not, for example because the account used has no permission to create a subdirectory in the current working directory, the batch file exits silently with exit code 1 to indicate an error.

    Next, the fully qualified folder name of the subdirectory duplicates is determined using a simple FOR loop. If the folder existed before the batch file was executed, its name could also be Duplicates or DUPLICATES.

    The directory separator on Windows is \ and not / as on Linux/Mac as described by the Microsoft documentation about Naming Files, Paths, and Namespaces.

    Each backslash in the fully qualified folder name of the subdirectory duplicates is then replaced with two backslashes, and two more backslashes are appended. That is necessary because FINDSTR interprets a backslash as an escape character, even in a literal search string, unless it is preceded by one more backslash.
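
    The effect of the backslash substitution can be checked separately. The following lines are a sketch under the assumption of a hypothetical folder C:\Temp\duplicates, which does not need to exist because FINDSTR only filters the two echoed lines.

    ```batch
    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    rem Hypothetical folder path used only for illustration.
    set "Folder=C:\Temp\duplicates"
    rem Double every backslash and append one more escaped backslash.
    set "Escaped=%Folder:\=\\%\\"
    echo Search string passed to FINDSTR: %Escaped%
    rem /B matches at the beginning of a line, /I ignores case, /V outputs non-matching lines.
    (echo C:\Temp\duplicates2\file.txt& echo C:\Temp\duplicates\file.txt) | %SystemRoot%\System32\findstr.exe /B /I /V /C:"%Escaped%"
    endlocal
    ```

    Only the line with duplicates2 should remain in the output. The appended backslash is the reason why a folder whose name merely begins with duplicates is not filtered out by mistake.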

    The batch file must be also ignored if it is in the current directory or one of its subdirectories. The fully qualified batch file name is determined for that reason and each backslash in batch file name is replaced by two backslashes.

    The first for /F loop makes sure that no environment variable whose name begins with # is defined by chance in the current environment.
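
    This cleanup loop can be tried on its own. The sketch below defines a hypothetical leftover variable #1024 and removes it again. It relies on SET exiting with a non-zero exit code if no variable matches the given prefix.

    ```batch
    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    rem Hypothetical leftover variable for illustration.
    set "#1024=1"
    echo Before the cleanup:
    set #
    rem The equal sign is the only delimiter, so the loop variable holds just the variable name.
    for /F "delims==" %%I in ('set # 2^>nul') do set "%%I="
    echo After the cleanup:
    set # 2>nul || echo No environment variable beginning with # is defined.
    endlocal
    ```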

    The second for /F starts one more command process in the background with:

    C:\WINDOWS\system32\cmd.exe /c dir * /A-D-L /B /ON /S 2>nul | C:\WINDOWS\System32\findstr.exe /B /I /V /C:"%DuplicatesFolder%" /C:"%BatchFileName%"
    

    %DuplicatesFolder% is replaced already by the fully qualified folder name of the subdirectory duplicates in the current working directory ending with \ and each backslash escaped with one more backslash.

    %BatchFileName% is replaced already by the fully qualified file name of the batch file with each backslash escaped with one more backslash.

    That command line outputs all file names with full path found in the current directory and all its subdirectories except those in the subdirectory duplicates and the batch file name filtered out by FINDSTR. See also: How to use OR operator with command FINDSTR from a Windows command prompt? Directories and links to other files and folders are excluded by DIR.

    Read the Microsoft documentation about Using command redirection operators for an explanation of 2>nul and |. The redirection operators > and | must be escaped with caret character ^ on FOR command line to be interpreted as literal characters when Windows command interpreter processes this command line before executing command FOR which executes the embedded dir command line with findstr for filtering out the files in subdirectory duplicates and the currently processed batch file.

    The output list of file names is captured by the cmd.exe instance processing the batch file and is processed line by line by FOR after the cmd.exe started in the background has finished executing the entire command line. File names with full path can contain one or more spaces, which are interpreted as string delimiters by default. The option delims= specifies an empty list of string delimiters so that each file name is assigned completely to the loop variable G.
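
    The difference made by delims= can be seen with a string instead of a command output. This is a small sketch with the hypothetical file name C:\Temp\My File.txt, which does not need to exist.

    ```batch
    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    rem Default delimiters are space and horizontal tab: only the first token is assigned.
    for /F %%G in ("C:\Temp\My File.txt") do echo Default delimiters:   %%G
    rem Empty delimiter list: the entire line is assigned to the loop variable.
    for /F "delims=" %%G in ("C:\Temp\My File.txt") do echo Empty delimiter list: %%G
    endlocal
    ```

    The first line of the output should end after My, while the second one should contain the complete file name.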

    The IF condition checks whether an environment variable whose name is # followed by the current file size is already defined. If such an environment variable exists, the current file has the same file size as a file processed before, and this duplicate according to file size is moved into the subdirectory duplicates of the current working directory. Otherwise, the current file is the first file with that file size, and the environment variable with # and the file size as its name is defined with the string value 1.
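
    Before moving anything, the size check can be tested with a dry run. The following sketch is a simplified assumption and not the batch file above: it scans only the current directory (no /S), excludes nothing, and prints the file names instead of moving the files.

    ```batch
    @echo off
    rem Dry run sketch: prints the files that would be moved instead of moving them.
    setlocal EnableExtensions DisableDelayedExpansion
    rem Remove variables beginning with # which could exist by chance.
    for /F "delims==" %%I in ('set # 2^>nul') do set "%%I="
    rem The first file of each size defines #size, every further file of that size is reported.
    for /F "delims=" %%G in ('dir * /A-D-L /B /ON 2^>nul') do if defined #%%~zG (echo Would move: %%G) else set "#%%~zG=1"
    endlocal
    ```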

    Please note that the command MOVE displays despite using option /Y a prompt if there is already in the folder duplicates a file with same name as the file to move and which has the read-only attribute set. The user must confirm in this case if the existing read-only destination file should be replaced by the file to move. The read-only attribute of a file to move does not prevent moving the file.

    The command RD deletes the directory duplicates if it is empty at the end of the batch file execution because no two files with the same file size were found.

    The last but one line restores the initial execution environment. Read this answer for details about the commands SETLOCAL and ENDLOCAL.

    Solution 2: Moving all files with identical file size

    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    mkdir "duplicates" 2>nul
    if not exist "duplicates\" exit /B 1
    for %%I in ("duplicates") do set "DuplicatesFolder=%%~fI"
    set "DuplicatesFolder=%DuplicatesFolder:\=\\%\\"
    set "BatchFileName=%~f0"
    set "BatchFileName=%BatchFileName:\=\\%"
    for /F "delims==" %%I in ('set # 2^>nul') do set "%%I="
    for /F "delims=" %%G in ('dir * /A-D-L /B /ON /S 2^>nul ^| %SystemRoot%\System32\findstr.exe /B /I /V /C:"%DuplicatesFolder%" /C:"%BatchFileName%"') do if defined #%%~zG (
        if defined #%%~zG# (
            setlocal EnableDelayedExpansion
            move /Y "!#%%~zG#!" "duplicates\"
            endlocal
            set "#%%~zG#="
        )
        move /Y "%%G" "duplicates\"
    ) else set "#%%~zG=1" & set "#%%~zG#=%%G" 
    rd "duplicates" 2>nul
    endlocal
    pause
    

    This second batch file is a variant of the first one. The first file with the same file size as another file is also moved to the subdirectory duplicates in the current working directory. In the end, the current working directory and all its subdirectories except duplicates contain only files that are unique according to file size, and perhaps remaining empty subdirectories and links.

    For each file size, two environment variables are defined first: one with a name consisting of # and the file size and with the string value 1, and a second one with a name consisting of #, the file size and a trailing #, whose value is the fully qualified file name of the first file with that file size.

    If one more file with the same file size is found, the batch file checks first whether the environment variable holding the file name of the first file with that file size is still defined. In that case, that first file is moved now to the subdirectory duplicates using temporarily enabled delayed variable expansion, and this environment variable is removed from the list of environment variables. Then the current duplicate according to file size is moved as well.

    Please note that it is possible that a file cannot be moved because it is currently opened in an application. It is also possible that the subdirectory duplicates already contains a file with the same name which cannot be replaced by the file to move because the destination file is currently opened in an application. Errors on moving a file could be handled better with additional command lines if that is necessary in typical use cases of this batch file.
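
    One possible way to report such errors is the conditional operator ||, which runs the command after it only if MOVE exits with a non-zero exit code. This is a minimal sketch and not part of the batch files above. The file name example.txt is hypothetical, and if it does not exist, the error message should be printed.

    ```batch
    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    rem Hypothetical file name used only for illustration.
    set "FileToMove=example.txt"
    mkdir "duplicates" 2>nul
    if not exist "duplicates\" exit /B 1
    rem Suppress the output of MOVE and write a custom message to STDERR on failure.
    move /Y "%FileToMove%" "duplicates\" >nul 2>&1 || echo ERROR: Could not move "%FileToMove%" 1>&2
    rem Remove the folder again if it is still empty.
    rd "duplicates" 2>nul
    endlocal
    ```

    In the batch files above, the same || part could be appended to each move command line with %%G as the file name.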

    To understand the commands used and how they work, open a command prompt window, execute there the following commands, and read the displayed help pages for each command, entirely and carefully.