python regex unit-testing false-positive

Unit Testing regex to check for False Positives


I have a regular expression that I match against a single input string to extract certain information. When it matches, only the captured group is returned. My function removes any empty strings from the returned list and returns just the captured string. This works well in unit tests for true positives.

Now I want to check for false positives with the same expression, but I can't figure out how to demonstrate that in a unit test. I have a few test strings in a file that the regex should not match, and it doesn't, so my code works. But when I try to show that in a test case, i.e. check that an empty string is returned, I can't.

Essentially, if no match is found, the function should return an empty string. This is my code:

match = re.findall(combined, narration)
result = list(filter(None, match[0]))
if match:
    return result[0]
else:
    result[0] = ""
    return result[0]

The first branch works fine for matched strings and returns a single string. In the second branch, I want to return an empty string so that I can use .assertEqual in the test case to check that the string is unmatched. Instead, the function raises "IndexError: list index out of range".

Can anybody tell me if there's a better way to check for an unmatched string with Regex and Unit Tests?

Edit 1: Adding Expected Input and Output as requested

Input 1 - BRN CLG-CI IQ PAID ROHIT SINGH

Output 1 - ROHIT SINGH

Input 2 - BRN-TO CASH SELF

Output 2 - '' (empty string)


Solution

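  • The error comes from indexing match[0] before checking whether match is empty: re.findall returns an empty list when nothing matches, so match[0] raises an IndexError. It seems you can use re.findall, check its output first, and, if there is a match, filter out the empty matches and print the first one. Otherwise, print an empty string.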

    See this Python demo:

    import re

    # One capturing group, so re.findall returns a list of strings.
    combined = r'^BRN.*?(?:paid|to)\s(?![A-Za-z\s]*\bself\b)([A-Za-z\s]+)'
    narrations = ['BRN CLG-CI IQ PAID ROHIT SINGH', 'BRN-TO CASH SELF']

    for narration in narrations:
        print('-------', narration, sep='\n')
        match = re.findall(combined, narration, flags=re.I)
        if match:  # check for a match before indexing
            result = list(filter(None, match))  # drop empty strings
            print(result[0])
        else:
            print('')  # no match: print an empty string
    

    yielding the following (the second narration does not match, so an empty line is printed after it):

    -------
    BRN CLG-CI IQ PAID ROHIT SINGH
    ROHIT SINGH
    -------
    BRN-TO CASH SELF
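
To show the false-positive check in an actual unit test, the same logic can be wrapped in a function that returns the string instead of printing it. The following is only a minimal sketch: the helper name `extract_name` and the test class name are hypothetical, and it assumes the pattern has a single capturing group (as in the demo above), so that re.findall returns a list of strings rather than tuples.

```python
import re
import unittest

# Pattern copied from the demo above; it has exactly one capturing group.
COMBINED = r'^BRN.*?(?:paid|to)\s(?![A-Za-z\s]*\bself\b)([A-Za-z\s]+)'


def extract_name(narration):
    """Hypothetical helper: return the first non-empty capture, or '' if none."""
    matches = re.findall(COMBINED, narration, flags=re.I)
    # Filter first and index only if something is left, so an unmatched
    # string can never trigger an IndexError.
    non_empty = [m for m in matches if m]
    return non_empty[0] if non_empty else ''


class ExtractNameTests(unittest.TestCase):
    def test_true_positive(self):
        self.assertEqual(
            extract_name('BRN CLG-CI IQ PAID ROHIT SINGH'), 'ROHIT SINGH')

    def test_false_positive_returns_empty_string(self):
        # The regex must not match here, so the helper returns ''.
        self.assertEqual(extract_name('BRN-TO CASH SELF'), '')
```

Saved in a test module, this could be run with `python -m unittest`. If the real pattern has several groups, re.findall returns tuples and the filtering step would need to be applied to the first tuple instead.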