I am trying to rename several files (thousands) with the following algorithm in Python: There is a list of word-patterns. Each entry starts on a new line and contains text data of one or more words. For example:
r'House',
r'Big Tree',
r'Word',
Next, the script takes the names of the files in the folder and compares each pattern (Text Data String) and looks for matches in the file name. If the script finds one or more matches, it removes everything except them (and the extension) from the file name. And the remaining matches are renamed according to the pattern with which the comparison was made.
The search ignores case, special characters and different delimiters like '-', '_', ',' (of which there can be more than one). When searching for several words ('Big Tree'), all possible combinations like:
__big_tree_
; big,tree_
etc.
Here are the original file names for an example:
__big_tree_and_5_more_house_and_1_more_drawn_by_a_n_name__f95c144ac5d5cf0a0baa24d6593cc699.jpg
Table-Home-Word-7325313.jpeg
kiwi 40055619 virtual YouTube,Search EN, 100+ bookmarks 115480616_p0.png
The output should be:
Big Tree House.jpeg
Word.jpeg
kiwi 40055619 virtual YouTube,Search EN, 100+ bookmarks 115480616_p0.png
Since the last file did not have a name match, it was not renamed
I am new to Python and can't figure out how to implement a similar task.
All the solutions found on the internet mostly work with a single pattern, whereas I will have thousands of them. Ideally, I am trying to implement a variant in which the text database will be stored separately and separately a universal regular expression that allows to search for matches.
Thank you all in advance.
After much trial and error, the best I could do was the following code: Unfortunately it is not fully functional and due to moving files to a temporary folder it duplicates files when restarted
import os
import re
import shutil
# Patterns that will be checked for matches in file names
patterns = [
r'House',
r'Bit Tree',
r'Word',
r'Word2',
r'Home',
]
# Filename filtering
def filter_filename(filename):
matches = []
for pattern in patterns:
match = re.search(pattern, filename)
if match:
matches.append(match.group())
return ' '.join(matches)
# File Renaming sending them to a temporary directory
def rename_files(directory):
temp_dir = os.path.join(directory, 'temp')
os.makedirs(temp_dir, exist_ok=True)
for filename in os.listdir(directory):
old_filepath = os.path.join(directory, filename)
if os.path.isfile(old_filepath):
base_filename, file_extension = os.path.splitext(filename)
new_filename = filter_filename(base_filename)
if new_filename:
new_filename = new_filename
new_filepath = os.path.join(temp_dir, new_filename)
# Adding a one for uniqueness of the name
count = 1
while os.path.exists(new_filepath):
if count < 9999:
new_filename_with_count = f"{new_filename} {count}{file_extension}"
new_filepath = os.path.join(temp_dir, new_filename_with_count)
count += 1
else:
break
shutil.move(old_filepath, new_filepath)
# Returning from a temporary folder
for filename in os.listdir(temp_dir):
temp_filepath = os.path.join(temp_dir, filename)
new_filepath = os.path.join(directory, filename)
shutil.move(temp_filepath, new_filepath)
os.rmdir(temp_dir)
# Directory for processing
directory = './.'
rename_files(directory)
Here's a simpler solution, which leaves out some of the extra work you're doing (like creating a target directory, temp files, etc.) and which focuses on the following:
-
, _
, ,
or whitespace (of any length > 0), and be case-insensitiveThe script doesn't currently actually move the files, but just prints what it would be moving, so you can verify it works.
import re
import os
import shutil
def move_matching_files(source, destination, key_phrases, log=True, move=False):
# replace whitespace in the keywords with regex pattern
# `patterns` will be the patterns for the key_phrases
patterns = [r'[_\-,\s]+'.join(key_phrase.split()) for key_phrase in key_phrases]
for file_name in os.listdir(source):
for pattern in patterns:
# whole pattern: anything followed by a key phrase, followed by anything
if re.match(f'.*{pattern}.*', file_name, re.IGNORECASE):
# when matched, a move needs to be executed, break from the search
break
else:
# if no phrase matches, don't move, continue with the next file_name
continue
source_fn = os.path.join(source, file_name)
destination_fn = os.path.join(destination, file_name)
if log:
print(f'Moving: {source_fn} to {destination_fn}')
if move:
shutil.move(source_fn, destination_fn)
move_matching_files('.', 'temp', ['house', 'big tree'])
If you don't want it to print, and do want it to move, you'd call it like:
move_matching_files('.', 'temp', ['house', 'big tree'], log=False, move=True)