r, tidyverse, stringr

Get strings before special character except apostrophe using stringr::word


Character string:

text="Il n’a pas réussi à répondre à votre plainte/question et ne vous a pas orienté ailleurs"

I'd like to extract the part before "/". I tried

library(stringr)
word(text, sep = "[[:punct:]]")
# [1] "Il n"

The problem is that I have many strings, and the special character differs from one string to the next. In the example above it is "/", but it could be ",", ";", "(", etc. So I am looking for a general solution.


Solution

  • Your code treats any punctuation character as a separator, and the apostrophe (’) qualifies as punctuation, so the string is split inside "n’a". To restrict the set of separators, either hard-code the alternatives as a character class (e.g. [/,;(]) or use a negative lookahead to match all punctuation except the apostrophe.

    Here’s the solution with the negative lookahead that excludes the apostrophe. You can add other characters to the lookahead as required; however, note that there’s no way to distinguish between the apostrophe and the (English) single closing quotation mark, at least not at the character level, because both are the same character (’).

    word(text, sep = '(?!’)[[:punct:]]')
    # [1] "Il n’a pas réussi à répondre à votre plainte"
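For comparison, the other option mentioned above (a hard-coded character class) could look roughly like the sketch below. The separator set [/,;(] and the vector `examples` are illustrative assumptions, not taken from the question's real data; `str_trim()` is added only to drop the space left behind when the separator follows a space.

```r
library(stringr)

text <- "Il n’a pas réussi à répondre à votre plainte/question et ne vous a pas orienté ailleurs"

# Split only on an explicit set of separators: / , ; (
first_part <- word(text, 1, sep = "[/,;(]")
first_part
# [1] "Il n’a pas réussi à répondre à votre plainte"

# Hypothetical strings, each using a different separator
examples <- c("plainte/question", "oui, merci", "l’adresse (ancienne)")

# word() is vectorised; trim the trailing space left before "("
parts <- str_trim(word(examples, 1, sep = "[/,;(]"))
parts
# [1] "plainte"   "oui"       "l’adresse"
```

The character class is easier to read when the separators are known in advance; the negative lookahead is the better fit when any punctuation may occur and only a few characters must be protected.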