markdown, react-markdown

Regex space and special character selector


I'm using a rich text editor and converting its HTML output to Markdown. If I type **text**, it works. But if I type **text ** (with a space after the word), my app doesn't recognise the Markdown. So I want to match a space followed by * or _ with a regex. I'm going to remove the space before the * or _ and put it after the word.

In short, I need this:

  1. **text ** -> **text**
  2. _text _ -> _text_
  3. *text * -> *text*

And of course, if you know how to solve this with react-markdown itself, that would be even better.

I am using React with react-markdown.


Solution

  • If your regex engine supports look arounds:

    Search: "(?<=\w) (?=[_*])"
    Replace: ""
    


    (?<=\w) is a look behind that asserts the previous character is a word character.
    (?=[_*]) is a look ahead that asserts the following character is an underscore or asterisk.
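Since the question is about a React app, here is a minimal JavaScript sketch of the look-around replacement. The function name `removeSpaceBeforeMarker` is made up for illustration, and the `g` flag is added on the assumption that every occurrence in the string should be fixed.

```javascript
// Remove a single space that sits between a word character and a * or _.
// (?<=\w) and (?=[_*]) are zero-width, so only the space itself is replaced.
function removeSpaceBeforeMarker(markdown) {
  return markdown.replace(/(?<=\w) (?=[_*])/g, "");
}

console.log(removeSpaceBeforeMarker("**text **")); // **text**
console.log(removeSpaceBeforeMarker("_text _"));   // _text_
console.log(removeSpaceBeforeMarker("*text *"));   // *text*
```

Note that the pattern cannot tell a closing marker from an opening one: `some *bold*` becomes `some*bold*`, because the space before the opening asterisk also follows a word character.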

    If look arounds are not available, capture the surrounding characters then restore them:

    Search: "(\w) ([_*])"
    Replace: "$1$2"
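The capture-group variant can be sketched in JavaScript the same way. The name `removeSpaceWithGroups` is hypothetical; the pattern and replacement string are the ones given above, with the `g` flag added as an assumption.

```javascript
// Same fix without look-arounds: capture the characters on both sides of
// the space, then write them back without the space in between.
function removeSpaceWithGroups(markdown) {
  return markdown.replace(/(\w) ([_*])/g, "$1$2");
}

console.log(removeSpaceWithGroups("**text **")); // **text**
console.log(removeSpaceWithGroups("_text _"));   // _text_
console.log(removeSpaceWithGroups("*text *"));   // *text*
```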
    

    In your comment you have changed the nature of the question: rather than deleting the space, you now want to swap the asterisk and the space, only where the word has a balanced number of asterisks preceding it, and also to insert a space where there isn't one. There is no general regex solution for an arbitrary number of asterisks surrounding a word. However, if you limit the number to 2, this will work:

    First repair double asterisks:

    Search: "(\*\*\w+) \*\*"
    Replace: "$1** "
    


    Then the single asterisks:

    Search: "(\*\w+) \*"
    Replace: "$1* "
    


    Then fix the missing space involving 2 asterisks:

    Search: "(?<!\*)(\*\*\w+)\*\*(?=\*)"
    Replace: "$1** "
    


    Finally fix the missing spaces for single asterisks:

    Search: "(?<!\*)(\*\w+)\*(?=\*)"
    Replace: "$1* "
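The four passes can be chained in order, as in the rough JavaScript sketch below. The function name `swapSpaceAndAsterisks` is made up for illustration. The sketch also assumes that the single-asterisk repair pass uses `(\*\w+) \*` with replacement `$1* `, by analogy with the final pass.

```javascript
// Apply the four search/replace passes in the order described above.
function swapSpaceAndAsterisks(markdown) {
  return markdown
    // 1. "**word **" -> "**word** "
    .replace(/(\*\*\w+) \*\*/g, "$1** ")
    // 2. "*word *" -> "*word* " (assumed pattern, see lead-in)
    .replace(/(\*\w+) \*/g, "$1* ")
    // 3. "**word**" directly followed by another "*": insert a space
    .replace(/(?<!\*)(\*\*\w+)\*\*(?=\*)/g, "$1** ")
    // 4. "*word*" directly followed by another "*": insert a space
    .replace(/(?<!\*)(\*\w+)\*(?=\*)/g, "$1* ");
}

console.log(JSON.stringify(swapSpaceAndAsterisks("**text **"))); // "**text** "
console.log(JSON.stringify(swapSpaceAndAsterisks("*text *")));   // "*text* "
console.log(JSON.stringify(swapSpaceAndAsterisks("*a**b*")));    // "*a* *b*"
```

Because `\w+` matches a single word only, multi-word spans such as `**two words **` are left untouched by these passes.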
    

    But even then you will probably find there are cases of input that will not give the desired result.

    Your comment's more complicated requirements are best served by parsing the input byte by byte, and correcting it using a syntax model.
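As a very rough illustration of that idea, the sketch below scans the input as runs of `*` or `_` and the text between them, keeps a stack of open markers, and moves trailing whitespace outside a closing marker. Everything here is an assumption made for illustration: the name `moveSpaceOutsideClosers`, the rule that an opener is a marker run directly followed by a non-space character, and the rule that a closer is a run identical to the most recent opener. It is not a Markdown parser and ignores escapes, code spans and underscores inside words.

```javascript
// Move whitespace that precedes a closing marker to just after it.
function moveSpaceOutsideClosers(markdown) {
  // Split into marker runs ("*", "**", "_", ...) and the text between them.
  const tokens = markdown.split(/(\*+|_+)/).filter((t) => t !== "");
  const open = []; // stack of marker runs that are currently open
  const out = [];  // output pieces, joined at the end

  tokens.forEach((token, i) => {
    const isMarker = token[0] === "*" || token[0] === "_";
    if (!isMarker) {
      out.push(token);
      return;
    }
    if (open.length > 0 && open[open.length - 1] === token) {
      // Closing marker: pull trailing whitespace off the previous piece.
      open.pop();
      const prev = out[out.length - 1];
      const trailing = prev.match(/\s+$/);
      if (trailing) {
        out[out.length - 1] = prev.slice(0, -trailing[0].length);
        out.push(token, trailing[0]);
      } else {
        out.push(token);
      }
    } else {
      // Treat as an opener only when followed directly by non-space text,
      // so list bullets such as "* item" are left alone.
      const next = tokens[i + 1];
      if (next !== undefined && /^\S/.test(next)) open.push(token);
      out.push(token);
    }
  });
  return out.join("");
}

console.log(JSON.stringify(moveSpaceOutsideClosers("**text **"))); // "**text** "
console.log(JSON.stringify(moveSpaceOutsideClosers("a *b* c")));   // "a *b* c"
```

Unlike the regex passes, tracking open markers lets this version leave the space before an opening marker in place.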