I'm using something similar to this as input.txt
header
040525 $$$$$ 9999 12345
random stuff
040525 $$$$$ 8888 12345
040525 $$$$$ 7777 12345
random stuff
040525 $$$$$ 6666 12345
footer
Due to the way this input is being pre-processed, I cannot correctly use pd.read_csv. I must first create a list from the input; Then, create a DataFrame from the list.
data_list = []
with open('input.txt', 'r') as data:
for line in data:
data_list.append(line.strip().split())
df = pd.DataFrame(data_list)
I only want to append lines that contain '$$$' in the second column. Desired output would be:
0 1 2 3
0 40525 $$$$$ 9999 12345
1 40525 $$$$$ 8888 12345
2 40525 $$$$$ 7777 12345
3 40525 $$$$$ 6666 12345
data_list = []
with open('input.txt', 'r') as data:
for line in data:
#create split_row to check
split_row = line.strip().split()
#check if the second substring in split_row starts with "*****"
if split_row[1].startswith("*****"):
data_list.append(split_row)
df = pd.DataFrame(data_list)