regex lookbehind regex-lookarounds

Regular Expression - Match all but first letter in each word in sentence


I've almost got the answer here, but I'm missing something and I hope someone here can help me out.

I need a regular expression that will match all but the first letter in each word in a sentence. Then I need to replace the matched letters with the correct number of asterisks. For example, if I have the following sentence:

There is an enormous apple tree in my backyard.

I need to get this result:

T**** i* a* e******* a**** t*** i* m* b*******.

I have managed to come up with an expression that almost does that:

(?<=(\b[A-Za-z]))([a-z]+)

Using the example sentence above, that expression gives me:

T* i* a* e* a* t* i* m* b*.

How do I get the right number of asterisks?

Thank you.


Solution

  • Try this:

    \B[a-z]
    

    \B is the opposite of \b: it matches wherever there is no word boundary, for example between two letters. So \B[a-z] matches any lowercase letter that directly follows another word character.

    Your regex matches the whole tail of the word, [a-z]+, and replaces it with a single asterisk. You should replace the letters one by one instead. To make your approach work, match a single letter and check that it has a word behind it (which is a little pointless, since you might as well check for a single preceding letter with (?<=[A-Za-z])[a-z]):

    (?<=\b[A-Za-z]+)[a-z]
    

    (Note that the last regex uses a variable-length lookbehind, which most regex flavors do not implement.)
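
The question does not say which regex flavor is in use, so the following is only a sketch in Python's `re` module, chosen here as an assumption. It compares the original pattern with the two single-letter patterns above. The function names are made up for illustration. As far as I know, Python's `re` is one of the flavors that rejects the variable-length lookbehind, so that pattern is only probed inside a `try` block rather than used.

```python
import re

SENTENCE = "There is an enormous apple tree in my backyard."


def mask_original(text: str) -> str:
    # The pattern from the question: each match is the whole tail of a word,
    # so every tail is replaced by exactly one asterisk.
    return re.sub(r"(?<=(\b[A-Za-z]))([a-z]+)", "*", text)


def mask_non_boundary(text: str) -> str:
    # \B[a-z]: one lowercase letter that is not at a word boundary.
    # Each letter is its own match, so each one becomes one asterisk.
    return re.sub(r"\B[a-z]", "*", text)


def mask_lookbehind(text: str) -> str:
    # Fixed-width lookbehind: one lowercase letter preceded by any letter.
    return re.sub(r"(?<=[A-Za-z])[a-z]", "*", text)


def supports_variable_lookbehind() -> bool:
    # Probe whether this regex engine accepts a variable-length lookbehind.
    try:
        re.compile(r"(?<=\b[A-Za-z]+)[a-z]")
    except re.error:
        return False
    return True


if __name__ == "__main__":
    print(mask_original(SENTENCE))      # T* i* a* e* a* t* i* m* b*.
    print(mask_non_boundary(SENTENCE))  # T**** i* a* e******* a**** t*** i* m* b*******.
    print(mask_lookbehind(SENTENCE))    # same as the line above
    print(supports_variable_lookbehind())
```

One difference worth keeping in mind: \B[a-z] also masks a lowercase letter that follows a digit or an underscore (both count as word characters), while the lookbehind version only masks letters that follow another letter.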