I'm trying to get the From and Cc email addresses from a forwarded email whose body looks like this:
$body = '-------
Begin forwarded message:
From: Sarah Johnson <blabla@gmail.com>
Subject: email subject
Date: February 22, 2013 3:48:12 AM
To: Email Recipient <thatwouldbe@yayyy.com>
Cc: Ralph Johnson <johnson@gmail.com>
Hi,
hello, thank you and goodbye!
blabla@gmail.com'
Now, when I do the following:
$body = strtolower($body);
$pattern = '#from: \D*\S([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\S#';
if (preg_match($pattern, $body, $arr_matches)) {
echo htmlentities($arr_matches[0]);
die();
}
I correctly get:
from: sarah johnson <blabla@gmail.com>
Now, why doesn't the cc work? I do something very similar, only changing from to cc:
$body = strtolower($body);
$pattern = '#cc: \D*\S([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\S#';
if (preg_match($pattern, $body, $arr_matches)) {
echo htmlentities($arr_matches[0]);
die();
}
and I get:
cc: ralph johnson <johnson@gmail.com> hi, hello, thank you and goodbye! blabla@gmail.com
If I remove the email from the original body footer (removing blabla@gmail.com) then I correctly get:
cc: ralph johnson <johnson@gmail.com>
It looks like that trailing email is affecting the regular expression. But how, and why doesn't it affect the from case? How can I fix this?
The problem is that \D* matches too much: it also matches newline characters. I would be more restrictive here. Why do you use \D (not a digit) at all?
With, for example, [^@]* instead, it works:
cc: [^@]*\S([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})\S
See it here on Regexr.
This way, you can be sure that the first part cannot match beyond the first "@", so the match stays on the email address that follows the header.
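The [^@]* suggestion can be tried out with a short PHP sketch like the one below. The helper name find_header is made up for illustration and is not part of the question. One deliberate change from the posted pattern: the character class is written as [\w.-], with the hyphen last, because as far as I know newer PHP versions built on PCRE2 may reject a hyphen placed between \w and another character.

```php
<?php
// Sample body from the question, lowercased as in the original code.
$body = strtolower('-------
Begin forwarded message:
From: Sarah Johnson <blabla@gmail.com>
Subject: email subject
Date: February 22, 2013 3:48:12 AM
To: Email Recipient <thatwouldbe@yayyy.com>
Cc: Ralph Johnson <johnson@gmail.com>
Hi,
hello, thank you and goodbye!
blabla@gmail.com');

// Hypothetical helper: returns the "header: name <address>" match, or null.
function find_header(string $header, string $body): ?string
{
    // [^@]* cannot run past the first "@" that follows the header name.
    // The hyphen sits last in [\w.-] so it is read as a literal character.
    $pattern = '#' . preg_quote($header, '#')
        . ': [^@]*\S([\w.-]+)@((?:\w+\.)+)([a-zA-Z]{2,4})\S#';

    return preg_match($pattern, $body, $m) ? $m[0] : null;
}

echo find_header('from', $body), "\n"; // from: sarah johnson <blabla@gmail.com>
echo find_header('cc', $body), "\n";   // cc: ralph johnson <johnson@gmail.com>
```

Note that [^@]* can still cross newlines, so this relies on the header line itself containing an address.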
This \D is also the reason it works in the first case, "From": there are digits in the "Date" row, so \D* cannot match past that row.
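To see where \D* actually stops in each case, one can capture it on its own. This is only a diagnostic sketch using the sample body from the question; the variable names are mine.

```php
<?php
// Sample body from the question, lowercased as in the original code.
$body = strtolower('-------
Begin forwarded message:
From: Sarah Johnson <blabla@gmail.com>
Subject: email subject
Date: February 22, 2013 3:48:12 AM
To: Email Recipient <thatwouldbe@yayyy.com>
Cc: Ralph Johnson <johnson@gmail.com>
Hi,
hello, thank you and goodbye!
blabla@gmail.com');

// Capture only what the greedy \D* consumes after each header name.
preg_match('#from: (\D*)#', $body, $from);
preg_match('#cc: (\D*)#', $body, $cc);

// After "from: ", the run ends right before the "22" in the date line.
var_dump(substr($from[1], -15) === 'date: february ');

// After "cc: ", no digit follows, so the run reaches the end of the body
// and crosses three newlines on the way.
var_dump(substr($cc[1], -16) === 'blabla@gmail.com');
var_dump(substr_count($cc[1], "\n"));
```

In the full pattern the regex engine then backtracks from the end of that run to the last place an address can match, which is why the footer address gets picked up in the cc case.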