I have text in an input file.
I need to start a new line every time I find the string 'NODATACODE' and write the result to an output file.
The desired output is below, with a new line added every time the string 'NODATACODE' is found:
I tried the following code to perform the task:
```python
with open('InputFile.txt', 'r') as file:
    data = file.read()
nodata_index = data.find('NODATACODE')
if nodata_index != -1:
    data_to_write = data[nodata_index:]
    #Code to add a new line
    data_to_write = str(data_to_write.split('\n'))
    with open('Output.txt', 'w') as file:
        file.write(data_to_write)
else:
    print("'NODATACODE' not found in the file.")
```
I don't get any error messages, but the output is wrong. My incorrect output is below.
Please let me know what I need to amend in my code.
Thanks a lot in advance.
The issue in your code arises from how you're handling the string `data_to_write`. Splitting it with `data_to_write.split('\n')` only breaks the text at the newlines it already contains; it never inserts a new line before 'NODATACODE'. It also gives you a list of strings, and when you convert that list back to a string with `str(...)`, Python produces the list's representation, with square brackets, quotes and commas, which is not what you want.
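A quick way to see the problem is to run the same two steps on a short, made-up string (the real input file isn't shown in the question, so the sample text here is an assumption):

```python
# Made-up sample text; the asker's real input is not shown.
data_to_write = "NODATACODE 1 2\nNODATACODE 3 4"

parts = data_to_write.split('\n')
print(parts)            # a plain list of two strings
print(str(parts))       # the list's repr: brackets, quotes and commas end up in the text

# Joining with '\n' turns the list back into ordinary text instead,
# but it only restores the newlines that were already there.
print('\n'.join(parts))
```

So `str(list)` is what puts the brackets and commas into `Output.txt`, and splitting on `'\n'` by itself does nothing to add breaks before 'NODATACODE'.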
The idea is to find every occurrence of 'NODATACODE' that is not at the start of the text and replace it with a newline followed by 'NODATACODE'. A single regular expression can do this: it matches 'NODATACODE' only where it is not preceded by the start of the string. Here's how you can modify your code:
```python
import re

# Read the file content first
with open('InputFile.txt', 'r') as file:
    data = file.read()

# Find the index where 'NODATACODE' first occurs
nodata_index = data.find('NODATACODE')

# Check if 'NODATACODE' is found
if nodata_index != -1:
    # Extract the text from the first 'NODATACODE' to the end
    data_to_write = data[nodata_index:]
    # Put a newline before every later 'NODATACODE'
    data_to_write = re.sub(r'(?<!^)NODATACODE', r'\nNODATACODE', data_to_write)
    # Write the modified data to the output file
    with open('Output.txt', 'w') as file:
        file.write(data_to_write)
else:
    print("'NODATACODE' not found in the file.")
```
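In this script:
- The `re.sub()` function performs the substitution.
- `r'(?<!^)NODATACODE'` is the pattern. Its negative lookbehind assertion `(?<!^)` ensures that 'NODATACODE' is not at the start of the string (`^` denotes the start of the string in a regex).
- `r'\nNODATACODE'` is the replacement string, which adds a newline before 'NODATACODE'.

The result is a newline before every 'NODATACODE' except the first one, which sits at the very beginning of `data_to_write` because the text is sliced from `nodata_index`. Any text before the first 'NODATACODE' is not written to the output file.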
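Because the script is tied to `InputFile.txt` and `Output.txt`, it can help to wrap the same steps in a function that works on a string, so the behaviour can be checked without touching any files. The names `add_breaks` and `add_breaks_no_regex` and the sample text are hypothetical, for illustration only. The second function is a regex-free alternative (not part of the answer above) that gives the same result:

```python
import re


def add_breaks(data):
    """Hypothetical helper: the answer's steps applied to a string.

    Returns the text from the first 'NODATACODE' onward, with a newline
    before every later 'NODATACODE', or None if the marker is absent.
    """
    nodata_index = data.find('NODATACODE')
    if nodata_index == -1:
        return None
    data_to_write = data[nodata_index:]
    # (?<!^) rejects a match only at position 0, i.e. the first occurrence
    return re.sub(r'(?<!^)NODATACODE', r'\nNODATACODE', data_to_write)


def add_breaks_no_regex(data):
    """Hypothetical regex-free variant: prefix every occurrence with a
    newline, then drop the one newline placed before the first occurrence."""
    nodata_index = data.find('NODATACODE')
    if nodata_index == -1:
        return None
    data_to_write = data[nodata_index:]
    # [1:] is safe only because data_to_write is known to start with 'NODATACODE'
    return data_to_write.replace('NODATACODE', '\nNODATACODE')[1:]


# Made-up sample; the asker's real input file is not shown.
sample = "header NODATACODE 1 2 NODATACODE 3 4 NODATACODE 5"
print(repr(add_breaks(sample)))
print(add_breaks(sample) == add_breaks_no_regex(sample))
```

Two details worth knowing: the space in front of each later 'NODATACODE' stays at the end of the previous line, and if the input already has a newline right before a 'NODATACODE', both versions leave a blank line there.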