java regex perl pcre

How can I match the middle character in a string with regex?


In a string of odd length, how could you match (or capture) the middle character?

Is this possible with PCRE, plain Perl or Java regex flavors?

With .NET regex, you could solve this easily with balancing groups (which would make a good example). By plain Perl regex I mean not using code constructs such as (??{ ... }), which let you run arbitrary code and therefore do anything.

The string can have any odd length.

For example, in the string 12345 you would want to get the 3, the character at the center of the string.

This is a question about the possibilities of modern regex flavors and not about the best algorithm to do that in some other way.


Solution

  • With PCRE and Perl (but not Java as written, since java.util.regex has no (?(1)...) conditional construct) you could use:

    ^(?:.(?=.*?(?(1)(?=.\1$))(.\1?$)))*(.)
    

    which would capture the middle character of an odd-length string in the second capturing group.

    Explained:

    ^ # beginning of the string
    (?: # loop
      . # match a single character
      (?=
        # look ahead lazily towards the end of the string
        .*?
        # if group 1 has already captured the end of the string (false on the first iteration)
        (?(1)
          # make sure we do not go past the correct position
          (?= .\1$ )
        )
        # capture one more character plus the previous \1 at the end of the string, so \1 grows by one character every iteration
        ( .\1?$ )
      )
    )* # repeat
    # the middle character follows, capture it
    (.)
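
For Java, the same loop can be sketched without the conditional. This is an adaptation, not the pattern given above: it assumes the first iteration can be recognised with the lookbehind (?<=^.), and that on later iterations a plain lookahead (?=.\1$) can stand in for the (?(1)...) check. The class, constant and method names (MiddleChar, MIDDLE, middle) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MiddleChar {
    // Hypothetical Java adaptation of the PCRE/Perl pattern: the conditional
    // (?(1)...) is replaced by an alternation between "first iteration" and
    // "later iterations".
    private static final Pattern MIDDLE = Pattern.compile(
        "^(?:."                  // loop: consume one character from the front
        + "(?="
        +   "(?:(?<=^.).*"       // first iteration (exactly one char consumed): run to the last character
        +   "|.*?(?=.\\1$))"     // later iterations: stop where one char plus the old \1 ends the string
        +   "(.\\1?$)"           // group 1: the end of the string, one character longer each iteration
        + ")"
        + ")*"                   // repeat
        + "(.)");                // group 2: the character that follows the loop

    // Returns the captured character as a String, or null if nothing matched.
    static String middle(String s) {
        Matcher m = MIDDLE.matcher(s);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(middle("12345")); // 3
        System.out.println(middle("abcba"));
    }
}
```

As with the original pattern, group 2 holds the result. The sketch relies on Java treating a backreference to a group that has not yet participated as a failed match, which is why \1 stays optional inside group 1.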