Searching for a specific phrase pattern within lines (Python)

I have made certain rules that I need to search for in a file. These rules are essentially phrases with an unknown number of words within. For example,

mutant...causes(...)GS

This is a phrase that I want to search for in my file. The ... means a few words must appear in that gap, and (...) means there may or may not be words in that gap. GS is a fixed string whose value I know in advance.

I made these rules by going through many such files; a match tells me that a particular file is one I am looking for.

The problem is that a gap can contain any (small) number of words, and a line break can even fall inside a gap. Hence, I cannot use exact string matching.

Some example texts -

!Series_summary "To better understand how the expression of a *mutant gene that causes ALS* can perturb the normal phenotype of astrocytes, and to identify genes that may

Here the GS is ALS (defined) and the starred text should be found as a positive match for the rule mutant...causes(...)GS

!Series_overall_design "The analysis includes 9 samples of genomic DNA from isolated splenic CD11c+ dendritic cells (>95% pure) per group. The two groups are neonates born to mothers with *induced allergy to ovalbumin*, and normal control neonates. All neonates are genetically and environmentally identical, and allergen-naive."

Here the GS is ovalbumin (defined) and the starred text should be found as a positive match for the rule induced...to GS

I am a beginner at programming in Python, so any help would be great!

Solution

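The following should get you started. It reads in your file and displays every piece of text matched by a Python regular expression, which will help you check that it is matching all of the correct text. The re.S flag lets . match newlines as well, so a gap may span a line break. GS in the pattern stands for your known string: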

import re

with open('input.txt', 'r') as f_input:
    data = f_input.read()
    print(re.findall(r'(mutant\s.*?\scauses.*?GS)', data, re.S))
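
Since GS is a string you already know, one way to plug it into the pattern is sketched below. The helper name build_pattern is hypothetical (it is not from the question), and the sample text is the first example from the question with the asterisks removed. re.escape is used on the assumption that the GS value may contain characters that are special in a regular expression.

```python
import re

def build_pattern(gs):
    # Hypothetical helper: splice the known GS value into the pattern.
    # re.escape keeps characters such as '+' or '(' in GS from being
    # interpreted as regex syntax.
    return re.compile(r'mutant\s.*?\scauses.*?' + re.escape(gs), re.S)

text = ('!Series_summary "To better understand how the expression of a '
        'mutant gene that causes ALS can perturb the normal phenotype of astrocytes')

matches = build_pattern('ALS').findall(text)
print(matches)
```

Note that .*? places no limit on the size of the gap, so with re.S this pattern can also match when mutant and the GS value are many lines apart.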

To check only whether at least one match is present, change findall to search:

import re

with open('input.txt', 'r') as f_input:
    data = f_input.read()
    if re.search(r'(mutant\s.*?\scauses.*?GS)', data, re.S):
        print('found')

To carry this out on many such files, you could adapt it as follows:

import re
import glob

for filename in glob.glob('*.*'):
    with open(filename, 'r') as f_input:
        data = f_input.read()
        if re.search(r'mutant\s.*?\scauses.*?GS', data, re.S):
            print("'{}' matches".format(filename))
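
If you have many rules, it may be easier to translate the rule notation from the question into a regular expression automatically. The following is only a rough sketch under some assumptions: the function name rule_to_regex and the max_gap limit of five words are my own choices, ... is read as "one to max_gap words", (...) as "zero to max_gap words", and the word GS inside a rule is treated as a placeholder for the known string. Because the gaps are built from \s+, they can span line breaks without re.S.

```python
import re

def rule_to_regex(rule, gs, max_gap=5):
    """Translate a rule such as 'mutant...causes(...)GS' into a compiled regex.

    max_gap is an assumed upper bound on the number of words in a gap.
    """
    # Split on the gap markers and keep them; '(...)' is tried before '...'.
    tokens = re.split(r'(\(\.\.\.\)|\.\.\.)', rule)
    parts = []
    for token in tokens:
        if token == '...':
            # Required gap: between 1 and max_gap whitespace-separated words.
            parts.append(r'\s+(?:\S+\s+){1,%d}' % max_gap)
        elif token == '(...)':
            # Optional gap: between 0 and max_gap words.
            parts.append(r'\s+(?:\S+\s+){0,%d}' % max_gap)
        elif token.strip():
            # Fixed text: escape each word, substituting the GS placeholder.
            words = [re.escape(gs) if w == 'GS' else re.escape(w)
                     for w in token.split()]
            parts.append(r'\s+'.join(words))
    return re.compile(''.join(parts))

text1 = 'the expression of a mutant gene that causes ALS can perturb'
text2 = 'mothers with induced allergy to\novalbumin, and normal control neonates'

m1 = rule_to_regex('mutant...causes(...)GS', 'ALS').search(text1)
m2 = rule_to_regex('induced...to GS', 'ovalbumin').search(text2)
print(m1.group(0))
print(m2.group(0))
```

Bounding the gap keeps a rule from matching when its fixed words are far apart in the file; adjust max_gap to whatever "a few words" means for your data.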