When I search and replace text in a single file and write out just that one file, it behaves as expected: it creates a new file in the "output" subdirectory containing the existing text "This", with "AndThat" added on the next line. However, when I iterate through all the files in a directory, the new text appears twice, and I don't understand why. Here is the code:
```python
import os
import shutil
import pathlib


def replace_text_in_multiple_files(input_path, output_path):
    search_text = "This"
    new_text = "This\nAndThat"
    shutil.rmtree(output_path)
    os.mkdir(output_path)
    for subdir, dirs, files in os.walk(input_path):
        for file in files:
            input_file_path = subdir + os.sep + file
            output_file_path = output_path + os.sep + file
            if input_file_path.endswith(".txt"):
                s = pathlib.Path(input_file_path).read_text()
                s = s.replace(search_text, new_text)
                with open(output_file_path, "w") as f:
                    f.write(s)


def replace_text_in_a_single_files(input_file_path, output_file_path):
    search_text = "This"
    new_text = "This\nAndThat"
    s = pathlib.Path(input_file_path).read_text()
    s = s.replace(search_text, new_text)
    with open(output_file_path, "w") as f:
        f.write(s)


replace_text_in_multiple_files("D:\\Test\\", "D:\\Test\\output\\")
#replace_text_in_a_single_files("D:\\Test\\File1.txt", "D:\\Test\\output\\File1.txt")
```
In the directory 'D:\Test' I have 3 text files. Each of the text files contains the following text:
```
This
is
a
test
```
If I run replace_text_in_a_single_files, it opens File1.txt, searches for the text, replaces it with the same text plus "AndThat", and writes the result to a new file in the output subdirectory, which gives:
```
This
AndThat
is
a
test
```
However, when I run replace_text_in_multiple_files, which does the same thing to a set of files instead of just one, each new file gets the replacement text twice:
```
This
AndThat
AndThat
is
a
test
```
So, it's like it's executing the replacement code twice. But why? And why only when it's iterating?
I was expecting that it would just produce the following text in each of the files.
```
This
AndThat
is
a
test
```
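A quick check in the interpreter suggests what is happening: applying the same replacement a second time to text that has already been processed reproduces the doubled output exactly, because the replacement string still contains "This".

```python
# Apply the same replacement twice to the original file contents.
search_text = "This"
new_text = "This\nAndThat"

original = "This\nis\na\ntest"
once = original.replace(search_text, new_text)   # expected result
twice = once.replace(search_text, new_text)      # "This" is found again

print(once)
print("---")
print(twice)
```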
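Because your output directory is inside your input directory, os.walk iterates over your own output files as well as the input files. Adding a print() inside the loop (and running against ./Test) shows which files get processed: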
```python
import os
import shutil
import pathlib


def replace_text_in_multiple_files(input_path, output_path):
    search_text = "This"
    new_text = "This\nAndThat"
    shutil.rmtree(output_path)
    os.mkdir(output_path)
    for subdir, dirs, files in os.walk(input_path):
        for file in files:
            print(subdir, file)
            input_file_path = subdir + os.sep + file
            output_file_path = output_path + os.sep + file
            if input_file_path.endswith(".txt"):
                s = pathlib.Path(input_file_path).read_text()
                s = s.replace(search_text, new_text)
                with open(output_file_path, "w") as f:
                    f.write(s)


def replace_text_in_a_single_files(input_file_path, output_file_path):
    search_text = "This"
    new_text = "This\nAndThat"
    s = pathlib.Path(input_file_path).read_text()
    s = s.replace(search_text, new_text)
    with open(output_file_path, "w") as f:
        f.write(s)


replace_text_in_multiple_files("./Test", "./Test/output/")
```
```
./Test File3.txt
./Test File2.txt
./Test File1.txt
./Test/output File3.txt
./Test/output File2.txt
./Test/output File1.txt
```
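Your script writes each output file as soon as it sees the corresponding file in the input folder. Because the output folder was created inside the input folder before the walk started, os.walk then descends into it, finds files with the same names, and processes those too. Those files already contain "This\nAndThat", so replacing "This" again produces "This\nAndThat\nAndThat", and the result overwrites the output file. The single-file function never reads its own output, which is why it only happens when iterating.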
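One way to fix this, sketched below on the assumption that you still want to recurse into other subdirectories, is to remove the output directory from `dirs` so `os.walk` never descends into it. With the default `topdown=True`, modifying `dirs` in place controls which subdirectories are visited. The `ignore_errors=True` and `os.makedirs` changes are additions not in the original code; they only let the function run when the output folder doesn't exist yet.

```python
import os
import pathlib
import shutil


def replace_text_in_multiple_files(input_path, output_path):
    search_text = "This"
    new_text = "This\nAndThat"

    # Start from an empty output directory; ignore_errors avoids a crash
    # on the first run, when the directory does not exist yet.
    shutil.rmtree(output_path, ignore_errors=True)
    os.makedirs(output_path)

    # Normalize so "D:\\Test\\output\\" and "D:\\Test\\output" compare equal.
    output_abs = os.path.normcase(os.path.abspath(output_path))

    for subdir, dirs, files in os.walk(input_path):
        # topdown=True is the default, so pruning `dirs` in place stops
        # os.walk from descending into the output directory.
        dirs[:] = [
            d for d in dirs
            if os.path.normcase(os.path.abspath(os.path.join(subdir, d))) != output_abs
        ]
        for file in files:
            if not file.endswith(".txt"):
                continue
            input_file_path = os.path.join(subdir, file)
            output_file_path = os.path.join(output_path, file)
            s = pathlib.Path(input_file_path).read_text()
            s = s.replace(search_text, new_text)
            with open(output_file_path, "w") as f:
                f.write(s)
```

It is called the same way as before, e.g. `replace_text_in_multiple_files("D:\\Test\\", "D:\\Test\\output\\")`. If you don't need to recurse into subdirectories at all, iterating over `pathlib.Path(input_path).glob("*.txt")` (which only matches files directly in that folder) avoids the problem as well, as does writing the output to a folder outside the input tree.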