pythoniteration

Python: detecting a letter pattern without iterating through all possible combinations


Apologies for the possibly not-very-useful title; I couldn't figure out how to summarise this problem into one sentence.

I'm trying to count how many "units" long a word is in Python 3.10. One "unit" is (C for consonant and V for vowel) either CV or VC or C or V (the latter two only being used when no pair can be made). For example, "piece" would be three units (pi-e-ce or pi-ec-e), "queue" would be four units (qu-e-u-e), and "lampshade" would be six units (la-m-p-s-ha-de).

What I'm struggling with is how exactly I would detect these units without iterating through every combination of every vowel and consonant for each pair of letters. It'd be hugely inefficient to do this with iteration, but I can't think of anything better with my current knowledge of Python. What would be an efficient way of solving this problem?

As an extra (optional) problem, what if digraphs are introduced, like "gh" and "th"? This would make words like "thigh" (four units, t-hi-g-h) into only two units (thi-gh), but also complicates the working-out.


Solution

  • Just convert everything to 'V' or 'C', respectively, in a list. Then check the first 2 letters in a loop. If it's 'CV' or 'VC' pop the list, and no matter what, pop it again. Count the iterations it takes to exhaust the list.

    def units(word) -> int:
        word = ["V" if c in "aeiou" else "C" for c in word]
        cnt = 0
    
        while word:
            if ''.join(word[:2]) in ('CV', 'VC'):
                word.pop(0)
            word.pop(0)
            cnt += 1
              
        return cnt
          
          
    for word in ('queue', 'piece', 'lampshade'):
        print(word, units(word))