
Remove timestamps in parentheses from text in Python


I'd like to remove all the timestamps in parentheses from the sample text below.

Input:

Agent: Can I help you? ( 3s ) Customer: Thank you( 40s ) Customer: I have a question about X. ( 8m 1s ) Agent: I can help here. Log in this website (remember to use your new password) ( 11m 31s )

Expected Output:

Agent: Can I help you? Customer: Thank you Customer: I have a question about X. Agent: I can help here. Log in this website (remember to use your new password)

I tried re.sub(r'\(.*?\)', '', data), but it did not work because it removes everything in parentheses. I want to keep parenthesized content that is not a timestamp; for instance, I'd like to keep "(remember to use your new password)" in the output.

Still new to regex so hope I can get some guidance here. Thank you!


Solution

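  • \(\s(\d{1,2}[smh]\s)+\)

    This matches an opening parenthesis, one whitespace character, one or more groups of one or two digits followed by s, m, or h and a whitespace character, and then a closing parenthesis.

    FYI: . matches any character except a line terminator, so \(.*?\) matches any parenthesized text, not only timestamps.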
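
    A minimal sketch of how the pattern above could be used with re.sub. Substituting with the pattern exactly as written leaves the space before each timestamp in place, so the result contains double spaces and a trailing space. The variant below is an assumption on my part, not the pattern from the answer: it adds a leading \s* to consume that space and relaxes the inner \s to \s*. The names TIMESTAMP and strip_timestamps are made up for illustration.

```python
import re

# Hypothetical variant of the answer's pattern (not the original):
#   \s*                     whitespace before the timestamp, if any
#   \(\s*                   opening parenthesis, optional whitespace
#   (?:\d{1,2}[smh]\s*)+    one or more tokens such as "8m" or "31s"
#   \)                      closing parenthesis
TIMESTAMP = re.compile(r'\s*\(\s*(?:\d{1,2}[smh]\s*)+\)')


def strip_timestamps(text):
    """Remove parenthesized timestamps; other parentheses are left alone."""
    return TIMESTAMP.sub('', text)


data = (
    "Agent: Can I help you? ( 3s ) Customer: Thank you( 40s ) "
    "Customer: I have a question about X. ( 8m 1s ) "
    "Agent: I can help here. Log in this website "
    "(remember to use your new password) ( 11m 31s )"
)

print(strip_timestamps(data))
```

    On the sample input this keeps "(remember to use your new password)" because the text after the opening parenthesis does not start with a digit-and-unit token. It would not match timestamps with three or more digits or with units other than s, m, and h; adjust \d{1,2} and [smh] if your data has those.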