Search pattern to include square brackets

I am trying to search for exact words in a file. I read the file by lines and loop through the lines to find the exact words. As the in keyword is not suitable for finding exact words, I am using a regex pattern.

import re

def findWord(w):
    return re.compile(r'\b({0})\b'.format(w), flags=re.IGNORECASE).search

The problem with this function is that it doesn't recognize square brackets, as in [xyz].

For example

findWord('data_var_cod[0]')('Cod_Byte1 = DATA_VAR_COD[0]')

returns None whereas

findWord('data_var_cod')('Cod_Byte1 = DATA_VAR_COD')

returns <_sre.SRE_Match object at 0x0000000015622288>

Can anybody please help me to tweak the regex pattern?

Solution

This happens because the regex engine interprets the square brackets as a character class, which is part of regex syntax. To get rid of this problem you need to escape the regex metacharacters in your word. You can use the re.escape function:

def findWord(w):
    return re.compile(r'\b({0})\b'.format(re.escape(w)), flags=re.IGNORECASE).search
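To make the difference concrete, here is a minimal sketch of what escaping changes. It leaves the word boundaries out on purpose so that only the effect of re.escape is visible; the variable names are illustrative.

```python
import re

word = 'data_var_cod[0]'

# Unescaped, "[0]" is a character class that matches the single character "0".
unescaped = re.compile(word, flags=re.IGNORECASE)
print(unescaped.search('DATA_VAR_COD0') is not None)    # True
print(unescaped.search('DATA_VAR_COD[0]') is not None)  # False

# re.escape backslash-escapes the brackets so they are matched literally.
escaped = re.compile(re.escape(word), flags=re.IGNORECASE)
print(re.escape(word))                                  # data_var_cod\[0\]
print(escaped.search('DATA_VAR_COD[0]').group(0))       # DATA_VAR_COD[0]
```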

Also, as a more Pythonic way to get all matches, you can use re.findall(), which returns a list of matches, or re.finditer(), which returns an iterator of match objects.
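As a rough illustration of those two functions, the sketch below collects every occurrence of an escaped word in a sample string. The sample text and variable names are made up for the example, and the pattern has no capture group, so findall returns the full matches.

```python
import re

text = 'x = DATA_VAR_COD[0]; y = data_var_cod[0] + 1'
pattern = re.compile(re.escape('data_var_cod[0]'), flags=re.IGNORECASE)

# findall returns the matched strings as a list.
matches = pattern.findall(text)
print(matches)  # ['DATA_VAR_COD[0]', 'data_var_cod[0]']

# finditer yields match objects, which also carry positions.
spans = [m.span() for m in pattern.finditer(text)]
print(spans)    # [(4, 19), (25, 40)]
```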

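However, this approach is still incomplete. A word boundary \b only matches between a word character (a letter, digit, or underscore) and a non-word character or the edge of the string, so the pattern fails whenever the search term starts or ends with a non-word character such as a square bracket: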

>>> ss = 'hello string [processing] in python.'
>>> re.compile(r'\b({0})\b'.format(re.escape('[processing]')), flags=re.IGNORECASE).search(ss)
>>>
>>> re.compile(r'({})'.format(re.escape('[processing]')), flags=re.IGNORECASE).search(ss).group(0)
'[processing]'

So I suggest removing the word boundaries if your search terms contain non-word characters.

As a more general approach, you can use the following regex, which uses a non-capturing group and a positive lookahead to match words that are preceded by a space or the start of the string and followed by a space, a period, or the end of the string:

r'(?: |^)({})(?=[. ]|$)'
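A minimal sketch of how this pattern could be plugged into the original helper follows. The name find_word_delimited is hypothetical, and the pattern string ends right after the lookahead, since a literal space placed there would also have to be matched.

```python
import re

def find_word_delimited(word):
    """Hypothetical helper: search for `word` when it is preceded by a space
    or the start of the string and followed by a space, a period, or the end."""
    pattern = r'(?: |^)({})(?=[. ]|$)'.format(re.escape(word))
    return re.compile(pattern, flags=re.IGNORECASE).search

m = find_word_delimited('data_var_cod[0]')('Cod_Byte1 = DATA_VAR_COD[0]')
print(m.group(1))  # DATA_VAR_COD[0]

# The bare name is not followed by a space, a period, or the end, so no match.
print(find_word_delimited('data_var_cod')('Cod_Byte1 = DATA_VAR_COD[0]'))  # None

m2 = find_word_delimited('[processing]')('hello string [processing] in python.')
print(m2.group(1))  # [processing]
```

Use group(1) to get the word itself: the leading (?: |^) consumes the preceding space, so group(0) may start with a space.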