I have a string variable somewhere in elisp code, and want to extract some parts of it into other variables using a regular expression with groupings. That's something that you can write in 1-2 lines in any language:
my ($user, $domain) = $email =~ m/^(.+)@(.+)$/;
How do I write the same in elisp?
The GNU Emacs Lisp Reference Manual is your friend. See also http://emacswiki.org/emacs/ElispCookbook (though at the time of writing, the latter did not yet contain an example of this particular technique).
(save-match-data ; is usually a good idea
  (and (string-match "\\`\\([^@\n]+\\)@\\([^@\n]+\\)\\'" email)
       (setq user (match-string 1 email)
             domain (match-string 2 email))))
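If you would rather not assign to free variables, the same match can be wrapped in a small function that returns both parts. This is only a sketch: the name my-split-email is made up for illustration and is not part of Emacs.

```elisp
;; Hypothetical helper (not an Emacs built-in).
(defun my-split-email (email)
  "Split EMAIL at its single @ sign.
Return a cons cell (USER . DOMAIN), or nil if EMAIL does not match."
  (save-match-data
    (when (string-match "\\`\\([^@\n]+\\)@\\([^@\n]+\\)\\'" email)
      (cons (match-string 1 email)
            (match-string 2 email)))))

;; Usage: destructure the result, close to the Perl one-liner.
(pcase-let ((`(,user . ,domain) (my-split-email "alice@example.com")))
  (message "user=%s domain=%s" user domain))
```

Returning nil on failure lets the caller decide what to do with a string that does not look like an address.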
Since several commenters asked about this, here is a breakdown of this particular regex:
\` and \' (corresponding to Perl's \A and \z) are anchors which match the beginning and end of the input string.
In a multi-line string, ^ would match the beginning of every line. \` only matches the beginning of the first line. Similarly, $ would match the end of every line, whereas \' only matches the end of the last line.
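A quick way to see the difference between the line anchors and the string anchors is to run both against a two-line string. The sample text is made up for illustration.

```elisp
(let ((text "first line\nsecond line"))
  (list
   ;; ^ matches at the start of any line: finds "second" at index 11.
   (string-match "^second" text)
   ;; \` matches only at the very start of the string: no match.
   (string-match "\\`second" text)
   ;; $ matches before the newline too: finds the first "line" at index 6.
   (string-match "line$" text)
   ;; \' matches only at the very end: finds the last "line" at index 18.
   (string-match "line\\'" text)))
;; => (11 nil 6 18)
```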
[^@\n] matches a single character which isn't @ or newline. The OP's Perl regex should also have taken care to only match a single occurrence of @ for correctness as well as efficiency (allowing a regex to match a string multiple ways can lead to catastrophic backtracking).
We also exclude newline to prevent the match from straddling line breaks.
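To illustrate why the negated character class matters, here is how the two patterns behave on a string with two @ signs. The test string is invented for the example.

```elisp
(let ((bad "a@b@c"))
  (list
   ;; .+ happily swallows an extra @, so group 1 becomes "a@b".
   (and (string-match "\\`\\(.+\\)@\\(.+\\)\\'" bad)
        (match-string 1 bad))
   ;; [^@\n]+ cannot cross an @, so the whole match fails.
   (string-match "\\`\\([^@\n]+\\)@\\([^@\n]+\\)\\'" bad)))
;; => ("a@b" nil)
```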
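\( and \) create capturing groups. The matched text can be pulled out by match-string; the groups are numbered from the left, counting the opening parentheses. When the match was made against a string rather than a buffer, pass that same string as the second argument to match-string.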
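The numbering rule is easiest to see with a nested group. The following sketch adds a third group inside the domain group, purely for illustration, to pick out the last label of the domain.

```elisp
(let ((email "alice@example.com"))
  ;; Opening parentheses, left to right:
  ;;   1 = user, 2 = whole domain, 3 = last label (nested inside group 2).
  (when (string-match
         "\\`\\([^@\n]+\\)@\\([^@\n.]+\\.\\([^@\n.]+\\)\\)\\'" email)
    (list (match-string 1 email)
          (match-string 2 email)
          (match-string 3 email))))
;; => ("alice" "example.com" "com")
```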
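save-match-data prevents this code from overwriting the match data of whichever function calls it. The match data is global state, so a caller that is still in the middle of using match-string would otherwise get the wrong result. You generally want to prevent that sort of side effect in any reusable code.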
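The effect is easy to reproduce. In this sketch, both helper functions are hypothetical and differ only in whether they wrap their own match in save-match-data.

```elisp
;; Hypothetical helpers for illustration only.
(defun my-clobbering-check (s)
  "Return non-nil if S contains a digit.  Overwrites the match data."
  (string-match "[0-9]" s))

(defun my-polite-check (s)
  "Like `my-clobbering-check', but leave the caller's match data alone."
  (save-match-data
    (string-match "[0-9]" s)))

(let ((email "alice@example.com")
      (re "\\`\\([^@\n]+\\)@"))
  (list
   ;; The polite helper leaves group 1 of our match intact.
   (progn (string-match re email)
          (my-polite-check "abc123")
          (match-string 1 email))
   ;; The clobbering helper replaces the match data; group 1 is gone.
   (progn (string-match re email)
          (my-clobbering-check "abc123")
          (match-string 1 email))))
;; => ("alice" nil)
```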
I use single backslashes here, but of course, inside an elisp string, these need to be doubled.
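If the doubled backslashes get hard to read, one alternative (not used in the answer above, so treat it as a side note) is the rx macro that ships with Emacs, which builds the regexp string from a Lisp form. A minimal sketch of the same pattern:

```elisp
;; bos/eos are the rx names for \` and \'; `group' is \( ... \).
(let ((re (rx bos
              (group (+ (not (any "@\n"))))
              "@"
              (group (+ (not (any "@\n"))))
              eos))
      (email "alice@example.com"))
  (when (string-match re email)
    (list (match-string 1 email)
          (match-string 2 email))))
;; => ("alice" "example.com")
```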
Strictly speaking, there are many more characters which aren't permitted in email addresses. On the other hand, many beginners restrict way too far, and prohibit characters which are actually allowed in email addresses... and then publish their brain stains on the Internet and tell us "this will 100% work!!"; but I digress. You should take care to allow ., -, +, and * in particular. The full RFC 5321 spec cannot easily be captured by a regular expression alone, but in practice, many of the esoteric corner cases are disallowed by lots of broken software anyway; so depending on your exact use case, you will probably be fine with something like [^@<>'\"\s ]
.
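As a rough sketch of such a permissive pattern, the version below spells out the excluded characters explicitly: @, angle brackets, both quote characters, and space, tab, newline and carriage return. That exact set is my assumption about what "something like" should cover, and the names my-email-regexp and my-email-parts are hypothetical.

```elisp
;; Hypothetical names; the excluded character set is an assumption.
(defconst my-email-regexp
  (let ((chunk "[^@<>'\" \t\n\r]+"))
    (concat "\\`\\(" chunk "\\)@\\(" chunk "\\)\\'"))
  "Permissive regexp with groups for the user and domain parts.")

(defun my-email-parts (email)
  "Return (USER DOMAIN) if EMAIL matches `my-email-regexp', else nil."
  (save-match-data
    (when (string-match my-email-regexp email)
      (list (match-string 1 email)
            (match-string 2 email)))))

(my-email-parts "first.last+tag@sub-domain.example.org")
;; => ("first.last+tag" "sub-domain.example.org")
(my-email-parts "two words@example.com")
;; => nil
```

This deliberately accepts ., -, + and * and makes no attempt at full RFC validation.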