python regex python-re

How to use regex to extract a set of particular substrings?


I want to extract every substring of a string that contains all five vowels. For example, given this code:

import re
text = "thisisabeautifulsequencofwords"
pattern = r"(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)[ \w]*?"
match = re.findall(pattern, text, re.DOTALL)
print(match)

I want to get output like the following:

['thisisabeautifulsequenco','thisisabeautifulsequencof','sisabeautifulsequenco'....]

How to do that?


Solution

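  • The code below solves this with Python's re module. Because re.findall returns only non-overlapping matches, a single pattern cannot list every overlapping substring, so the regex is used only to find each start position that has at least five characters after it. A plain loop then checks every substring from that position for all five vowels.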

    import re
    
    def get_substrings_with_all_vowels_regex(text):
        pattern = r'(?=(.{5,}))'  # Positive lookahead for substrings of length >=5
        matches = []
    
        # Iterate over all possible starting positions
        for match in re.finditer(pattern, text):
            start = match.start()
            # Try all possible substrings starting from 'start' position
            for end in range(start + 5, len(text) + 1):
                substring = text[start:end]
                if all(vowel in substring for vowel in 'aeiou'):
                    matches.append(substring)
        return matches
    
    text = "thisisabeautifulsequencofwords"
    result = get_substrings_with_all_vowels_regex(text)
    print(result)