
Elegant regular expression to match all punctuation except "'" in Emacs Lisp?


I want to match all punctuation except "'", as in "I'm". For example, in the sentence below:

I'm a student, but I'm also working. 
 ^not match  ^match ^not           ^match

I can use "[[:punct:]]+" to match all punctuation, but I'm having a hard time excluding "'" from the pattern.

Of course, I could enumerate the characters with something like the following, but that is tedious, especially once Chinese punctuation has to be covered as well: "[,.?!]"

Please suggest a more elegant solution.

Thanks in advance,

Yu


Solution

  • Thanks to Bart for his answer and to all of you for your comments. Inspired by Bart's answer, I checked and found that Emacs still does not seem to support look-ahead. In the same spirit, though, I coded the following:

    (defun string-match-but-exclude (regexp string exclusion &optional start)
      "Return index of start of first match for REGEXP in STRING, or nil,
    skipping any match that contains a character listed in EXCLUSION.
    EXCLUSION is a string of characters; it is wrapped in brackets to
    form a character alternative.
    Matching ignores case if `case-fold-search' is non-nil.
    If START is non-nil, start the search at that index in STRING.
    For index of first char beyond the match, do (match-end 0).
    `match-end' and `match-beginning' also give indices of substrings
    matched by parenthesis constructs in the pattern.

    You can use the function `match-string' to extract the substrings
    matched by the parenthesis constructions in REGEXP."
      (let ((pos (or start 0))
            (excluded (concat "[" exclusion "]"))
            (result nil))
        (while (and (not result)
                    (<= pos (length string))
                    (string-match regexp string pos))
          (let ((beg (match-beginning 0))
                (end (match-end 0))
                ;; Keep the match data so it can be restored below.
                (data (match-data)))
            (if (string-match excluded (substring string beg end))
                ;; Excluded: retry after this match, moving at least one char.
                (setq pos (max end (1+ beg)))
              ;; Accepted: restore the match data clobbered by the exclusion test.
              (set-match-data data)
              (setq result beg))))
        result))
    

    So the equivalent of the look-ahead expression (?!')[[:punct:]], applied to string, would be

    (string-match-but-exclude "[[:punct:]]" string "'")
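
    As a rough check of the idea on the sample sentence from the question, here is a minimal, self-contained sketch that lists the start index of every accepted match. The name my-match-positions-excluding is hypothetical (it is not an Emacs built-in), and the sketch assumes that exclusion contains no characters that are special inside brackets, such as "]" or "^".

    ```elisp
    ;; Hypothetical helper, not part of Emacs: collect the start index of
    ;; every match of REGEXP in STRING whose text contains none of the
    ;; characters listed in EXCLUSION.
    (defun my-match-positions-excluding (regexp string exclusion)
      "Return the start indices of matches for REGEXP in STRING.
    Matches containing any character from the string EXCLUSION are skipped."
      (let ((pos 0)
            (positions nil))
        (while (and (<= pos (length string))
                    (string-match regexp string pos))
          (let ((beg (match-beginning 0))
                (end (match-end 0)))
            ;; `string-match-p' leaves the match data untouched.
            (unless (string-match-p (concat "[" exclusion "]")
                                    (substring string beg end))
              (push beg positions))
            ;; Continue after this match, moving at least one character
            ;; so that an empty match cannot loop forever.
            (setq pos (max end (1+ beg)))))
        (nreverse positions)))

    (my-match-positions-excluding
     "[[:punct:]]" "I'm a student, but I'm also working." "'")
    ;; => (13 35), i.e. the "," and the final "."
    ```

    The two apostrophes at indices 1 and 20 are matched by "[[:punct:]]" but filtered out, which is the result the look-ahead would give.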
    

    This does the job, but it is not as elegant. It should be a minor addition to Emacs to support this natively.
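
    When the text lives in a buffer rather than a string, the negative look-ahead can be approximated by testing each candidate match with looking-at. The following is only a sketch under assumptions: my-re-search-forward-unless is a hypothetical name, and regexp is assumed never to match the empty string.

    ```elisp
    ;; Hypothetical helper, not part of Emacs: emulate (?!LOOKAHEAD)REGEXP
    ;; in the current buffer.  Assumes REGEXP never matches the empty string.
    (defun my-re-search-forward-unless (regexp lookahead)
      "Search forward for REGEXP, skipping matches that begin with LOOKAHEAD.
    Return the position after the accepted match, or nil if there is none."
      (let ((found nil))
        (while (and (not found)
                    (re-search-forward regexp nil t))
          (if (save-excursion
                (goto-char (match-beginning 0))
                ;; Preserve the match data of the outer search.
                (save-match-data (looking-at lookahead)))
              ;; Rejected: resume one character after the rejected start,
              ;; as a regexp engine with look-ahead would.
              (goto-char (1+ (match-beginning 0)))
            (setq found (point))))
        found))

    (with-temp-buffer
      (insert "I'm a student, but I'm also working.")
      (goto-char (point-min))
      (when (my-re-search-forward-unless "[[:punct:]]" "'")
        (match-string 0)))
    ;; => ","
    ```

    Because the match data of the accepted search is preserved, match-beginning, match-end and match-string can be used afterwards as with a plain re-search-forward.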

    Emacs does support character classes now.

    Thanks again.

    Yu