python regex string python-re punctuation

Match all punctuation not surrounded by alphanumeric characters?


I am trying to write a regular expression that removes all non-alphanumeric characters from a string, except for those that are surrounded by alphanumeric characters.

Consider the following three examples.

1. it's -> it's

2. its. -> its

3. It's a: beautiful day? I'm =sure it is. The coca-cola (is frozen right?
   -> It's a beautiful day I'm sure it is The coca-cola is frozen right

I am using Python's re module, and I can match the opposite of what I am looking for (punctuation that is surrounded by letters) with the following expression.

(?<=[a-zA-Z])[^a-zA-Z ](?=[a-zA-Z])
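For illustration, here is a minimal sketch of what that expression picks out on the third example. The names `inner`, `sentence`, and `kept` are hypothetical and only serve the demonstration.

```python
import re

# The expression above: a non-letter, non-space character
# with a letter directly on each side.
inner = re.compile(r"(?<=[a-zA-Z])[^a-zA-Z ](?=[a-zA-Z])")

sentence = "It's a: beautiful day? I'm =sure it is. The coca-cola (is frozen right?"

# These are the characters that should be kept, not removed.
kept = inner.findall(sentence)
print(kept)  # ["'", "'", '-']
```

So the expression finds exactly the apostrophes and the hyphen that should survive, which is the complement of the set to remove.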

Any ideas?


Solution

  • Use

    [^a-zA-Z\s](?!(?<=[a-zA-Z].)[a-zA-Z])
    

    EXPLANATION

    PATTERN                       DETAILS
    [^a-zA-Z\s]                   Match one character that is neither a letter nor whitespace
    (?!(?<=[a-zA-Z].)[a-zA-Z])    Negative lookahead: reject the match if that character is both preceded and followed by a letter
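As a rough sketch of how the pattern could be applied with `re.sub`, the snippet below runs it over the three examples from the question. The helper name `strip_outer_punctuation` is hypothetical, chosen only for this illustration.

```python
import re

# Pattern from the answer: a non-letter, non-whitespace character that is
# NOT both preceded and followed by a letter.
PATTERN = re.compile(r"[^a-zA-Z\s](?!(?<=[a-zA-Z].)[a-zA-Z])")


def strip_outer_punctuation(text: str) -> str:
    """Remove punctuation unless it sits directly between two letters."""
    return PATTERN.sub("", text)


examples = [
    "it's",
    "its.",
    "It's a: beautiful day? I'm =sure it is. The coca-cola (is frozen right?",
]
for s in examples:
    print(strip_outer_punctuation(s))
# it's
# its
# It's a beautiful day I'm sure it is The coca-cola is frozen right
```

Note that the character classes list only letters, so digits are treated like punctuation here. If digits should count as alphanumeric, widening each class to `[a-zA-Z0-9]` is one possible adjustment.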