
Regular Expression with Two Names: One With Middle Initial and One Without


I'm attempting to identify the names in this string, using regex (https://regex101.com).

Example text:

Elon R. Musk (245)436-7956 Jeff Bezos (235)231-3432

What I've tried so far only seems to work for names without a middle initial:

([A-Z]{1}[a-z]+) ([A-Z]{1}[a-z]+)

Note: Phone Numbers are random keystrokes. Please don't try calling them.

Here's an example of Python code using the re module:

import re
strr = 'Elon R. Musk (245)436-7956 Jeff Bezos (235)231-3432'

def gimmethenamesdammit(strr):
    regex = re.compile("([A-Z]{1}[a-z]+) ([A-Z]{1}[a-z]+)")
    print(regex.findall(strr))

gimmethenamesdammit(strr)

To sum things up, please modify the regular expression above so that it matches both names, Elon R. Musk and Jeff Bezos.

Desired Python output when running gimmethenamesdammit(strr):

gimmethenamesdammit(strr)

[('Elon', 'R.', 'Musk'), ('Jeff', 'Bezos')]

Solution

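  • The following regular expression matches both names. The optional non-capturing group (?:\s[A-Z]\.)? allows for a middle initial followed by a period:

    import re
    
    strr = 'Elon R. Musk (245)436-7956 Jeff Bezos (235)231-3432'
    
    # First name, optional middle initial such as "R.", then last name
    regex = r"[A-Z][a-z]+(?:\s[A-Z]\.)?\s[A-Z][a-z]+"
    
    POCs = re.findall(regex, strr)
    
    print(f"{POCs[0]}, {POCs[-1]}")  # Elon R. Musk, Jeff Bezos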
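
If the list-of-tuples output from the question is what you need, one possible approach is to put a capturing group around each part of the name and drop the empty middle-initial slot that re.findall returns when there is no initial. This is only a sketch: the names NAME_RE and find_names are made up for illustration, and it assumes names consist of a capitalized first name, an optional single-letter initial with a period, and a capitalized last name.

```python
import re

# Hypothetical pattern: first name, optional middle initial ("R."), last name.
# The middle initial sits in its own capturing group inside an optional
# non-capturing group, so findall yields '' for it when it is absent.
NAME_RE = re.compile(r"([A-Z][a-z]+)(?:\s([A-Z]\.))?\s([A-Z][a-z]+)")


def find_names(text):
    """Return one tuple per name, omitting the empty middle-initial slot."""
    return [
        tuple(part for part in groups if part)
        for groups in NAME_RE.findall(text)
    ]


strr = 'Elon R. Musk (245)436-7956 Jeff Bezos (235)231-3432'
print(find_names(strr))  # [('Elon', 'R.', 'Musk'), ('Jeff', 'Bezos')]
```

Names that fall outside the stated assumption (hyphenated surnames, prefixes such as "McDonald", or initials without a period) would need a broader pattern.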