I have exhausted online searches trying to find out how to do this.
I have a tab-delimited file, searchfile.txt, with two columns and more than 200 rows. Sample here:
A(H1N1)/SWINE/COTES-DARMOR/388/2009 X? 4.28144245
A(H1N2)/SWINE/SCOTLAND/410440/1994 X? 7.25878836
A(H1)/SWINE/ENGLAND/117316/1986 X? 3.305392038
A(H1)/SWINE/ENGLAND/438207/1994 X? 7.66078717
I have another file keywords.txt with some keywords that partially match the IDs in searchfile.txt:
ENGLAND/117316
DARMOR/388
438207
I want to extract all lines from searchfile.txt that contain any of the keywords in keywords.txt.
Using solutions from other similar questions, I tried:
grep -F -f keywords.txt searchfile.txt > selected.txt
grep -f keywords.txt searchfile.txt
awk 'FNR==NR {a[$0];next} ($NF in a)' keywords.txt searchfile.txt > result.txt
I also got part of the way there with this Python script:
infile = r"/path/to/searchfile.txt"
results = []
to_keep = ["ENGLAND/117316",
           "DARMOR/388",
           "438207"]
with open(infile) as f:
    f = f.readlines()
for line in f:
    for phrase in to_keep:
        if phrase in line:
            results.append(line)
            break
print(results)
And it outputs this in the terminal window:
[
'A(H1N1)/SWINE/COTES-DARMOR/388/2009 X?\t4.28144245\n',
'A(H1)/SWINE/ENGLAND/117316/1986 X?\t3.305392038\n',
'A(H1)/SWINE/ENGLAND/438207/1994 X?\t7.66078717\n'
]
Is there a way to
a) modify this script to read the keywords from a file like keywords.txt and write the matching lines to another file? (My Python skills are not up to that)
OR
b) use grep, awk, sed... to do this
I think the problem is that my keywords are not whole separate words and have to partially match what's in the searchfile.txt.
Grateful for any help! Thanks.
This is fairly straightforward in Python. Assuming you have keywords.txt and input.txt (searchfile.txt in your case) and want to write the matching lines to output.txt:
# 1
with open('keywords.txt', 'r') as k:
    keywords = k.read().splitlines()
# 2
with open('input.txt') as f, open('output.txt', 'w') as o:
    for line in f:
        if any(key in line for key in keywords):
            o.write(line)
This reads in the keywords file and stores each of its lines in a list (#1). We then open the input and output files, loop through the input file line by line, and write a line to the output file if it contains any of the keywords (#2).
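One thing to watch for with this approach: a blank line in keywords.txt becomes an empty string, and an empty string is "in" every line, so everything would be written out. Trailing spaces on a keyword have the opposite effect and stop it from matching. Below is a minimal sketch of the same idea wrapped in functions that guard against both. The names load_keywords and filter_lines are made up for illustration, and the file names are whatever you pass in.

```python
def load_keywords(path):
    """Return the non-empty keywords from a file, one per line."""
    with open(path) as k:
        # strip() removes leading/trailing whitespace; blank lines are skipped
        # because an empty string would match every line.
        return [kw.strip() for kw in k.read().splitlines() if kw.strip()]


def filter_lines(keywords_path, input_path, output_path):
    """Copy each line of input_path containing any keyword to output_path.

    Returns the number of lines written.
    """
    keywords = load_keywords(keywords_path)
    matched = 0
    with open(input_path) as src, open(output_path, "w") as dst:
        for line in src:
            # Plain substring test, so partial IDs such as "438207" match.
            if any(kw in line for kw in keywords):
                dst.write(line)
                matched += 1
    return matched
```

With the files from the question, the call would be filter_lines("keywords.txt", "searchfile.txt", "selected.txt"), and the return value tells you how many lines were kept.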