regex italic

Regex for matching predefined rules for italic text formatting


I'm trying to write a regex for matching user input that will be turned into italic format using markdown.

In the string I need to find the following pattern: an asterisk followed by a non-whitespace character, and ending with a non-whitespace character followed by an asterisk.

So basically: substring *substring substring substring* substring should spit out *substring substring substring*.

So far I have only come up with /\*(?:(?!\*).)+\*/, which matches everything between two asterisks but doesn't take into account whether the substring between the asterisks starts or ends with whitespace, which it shouldn't.
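To make the problem concrete, here is a minimal sketch of how that attempt behaves. It assumes Python's `re` module purely for illustration (the question does not name a language), and the sample strings are made up:

```python
import re

# The pattern from the question: an asterisk, one or more characters
# that are not an asterisk, then a closing asterisk.
ATTEMPT = re.compile(r"\*(?:(?!\*).)+\*")

# Hypothetical inputs: one valid, one with leading whitespace inside
# the asterisks, one with trailing whitespace inside the asterisks.
samples = ["a *b c d* e", "a * b c d* e", "a *b c d * e"]

# findall returns the full match because the pattern has no capturing group.
attempt_matches = {s: ATTEMPT.findall(s) for s in samples}

for sample, found in attempt_matches.items():
    print(repr(sample), "->", found)
# 'a *b c d* e' -> ['*b c d*']
# 'a * b c d* e' -> ['* b c d*']
# 'a *b c d * e' -> ['*b c d *']
```

The last two results are the unwanted ones: the text inside the asterisks starts or ends with a space, yet it still matches.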

Thank you for your input! :)


Solution

  • Use

    \*(?![*\s])(?:[^*]*[^*\s])?\*
    

    See regex proof.

    EXPLANATION

    --------------------------------------------------------------------------------
      \*                       '*'
    --------------------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
    --------------------------------------------------------------------------------
        [*\s]                    any character of: '*', whitespace (\n,
                                 \r, \t, \f, and " ")
    --------------------------------------------------------------------------------
      )                        end of look-ahead
    --------------------------------------------------------------------------------
      (?:                      group, but do not capture (optional
                               (matching the most amount possible)):
    --------------------------------------------------------------------------------
        [^*]*                    any character except: '*' (0 or more
                                 times (matching the most amount
                                 possible))
    --------------------------------------------------------------------------------
        [^*\s]                   any character except: '*', whitespace
                                 (\n, \r, \t, \f, and " ")
    --------------------------------------------------------------------------------
      )?                       end of grouping
    --------------------------------------------------------------------------------
      \*                       '*'
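As a quick check of the pattern above, the following sketch runs it against a few inputs. It assumes Python's `re` module, and the helper name `find_italic_spans` and the sample strings are hypothetical, chosen only for illustration:

```python
import re

# The pattern from the answer: '*', not followed by '*' or whitespace,
# then any run of non-asterisk characters ending in a non-whitespace
# character, then the closing '*'.
ITALIC = re.compile(r"\*(?![*\s])(?:[^*]*[^*\s])?\*")


def find_italic_spans(text):
    """Return every asterisk-delimited span that the pattern accepts."""
    return ITALIC.findall(text)


print(find_italic_spans("substring *substring substring substring* substring"))
# ['*substring substring substring*']
print(find_italic_spans("a * b c d* e"))  # leading whitespace inside
# []
print(find_italic_spans("a *b c d * e"))  # trailing whitespace inside
# []
print(find_italic_spans("a *b* c"))       # single character inside
# ['*b*']
```

The look-ahead rejects an opening asterisk that is followed by whitespace, and the final `[^*\s]` forces the last character before the closing asterisk to be non-whitespace.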