I was using str_replace to rewrite URLs to PDFs from https://example.com/documents/en/whatever.PDF to https://example.com/documents/es/whatever_SPANISH.pdf
This is what I was using:
if($_COOKIE['googtrans'] == "/en/es") { // Check the Google Translate cookie
    $text = str_replace('/documents/en/', '/documents/es/', $text);
    $text = str_replace('.pdf', '_SPANISH.pdf', $text);
}
The problem is, if the page links to a PDF on another site (not my own), for example https://othersite.example.com/whatever.pdf, it becomes https://othersite.example.com/whatever_SPANISH.pdf, which isn't valid on other people's sites. I want to ignore offsite links and only change URLs on my site.
So what I would like to do is look for the string https://example.com/documents/en/whateverfilename.pdf, pull the file name out, and change it to https://example.com/documents/es/whateverfilename_SPANISH.pdf (switching the en to es and also appending _SPANISH to the end of the PDF filename).
How can I do this? I have tried various preg_replace calls but can't get the syntax right.
You could do the replacement in one go using a regex with two capture groups whose values are reused in the replacement.
\b(https?://\S*?/documents/)en(/\S*)\.pdf\b
Or match the domain name:
\b(https?://example\.com/documents/)en(/\S*)\.pdf\b
The pattern matches:
\b : a word boundary
(https?://\S*?/documents/) : capture group 1, match the protocol and then optional non-whitespace characters until the first occurrence of /documents/
en : match literally
(/\S*) : capture group 2, match / followed by optional non-whitespace characters
\.pdf\b : match .pdf followed by a word boundary
In the replacement, use the two capture groups, denoted by $1 and $2:
$1es$2_SPANISH.pdf
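To see what each group actually holds, here is a minimal sketch using preg_match with the first pattern. The variable names ($url, $m) are only for illustration and are not part of the original code.

```php
<?php
// First pattern from above, using ~ as the delimiter so the slashes need no escaping.
$regex = '~\b(https?://\S*?/documents/)en(/\S*)\.pdf\b~';
$url   = 'https://example.com/documents/en/whateverfilename.pdf';

// $m[0] is the whole match, $m[1] and $m[2] are the two capture groups.
if (preg_match($regex, $url, $m)) {
    echo $m[1], PHP_EOL; // https://example.com/documents/
    echo $m[2], PHP_EOL; // /whateverfilename
}
```

Group 2 keeps the leading slash, which is why the replacement can put es directly between $1 and $2.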
Example:
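// ~ is used as the delimiter so the slashes in the URL need no escaping.
$regex = '~\b(https?://\S*?/documents/)en(/\S*)\.pdf\b~';
$text = "https://example.com/documents/en/whateverfilename.pdf";
// Single quotes make it explicit that $1 and $2 are group references, not PHP variables.
$result = preg_replace($regex, '$1es$2_SPANISH.pdf', $text);
echo $result;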
Output
https://example.com/documents/es/whateverfilename_SPANISH.pdf
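To tie this back to the question, here is a rough sketch that combines the cookie check with the domain-specific pattern, so offsite PDF links are left alone. The function name localize_pdf_links and its $host parameter are hypothetical, and the sample $text is made up for the demonstration.

```php
<?php
// Hypothetical helper: rewrites only PDF links under /documents/en/ on the given host.
function localize_pdf_links(string $text, string $host = 'example.com'): string
{
    // preg_quote escapes the dots in the host name (and the ~ delimiter, if present).
    $pattern = '~\b(https?://' . preg_quote($host, '~') . '/documents/)en(/\S*)\.pdf\b~';
    return preg_replace($pattern, '$1es$2_SPANISH.pdf', $text);
}

$text = 'Ours: https://example.com/documents/en/whatever.pdf '
      . 'Theirs: https://othersite.example.com/whatever.pdf';

// The cookie may be missing, so fall back to an empty string instead of raising a warning.
if (($_COOKIE['googtrans'] ?? '') === '/en/es') {
    $text = localize_pdf_links($text);
}
echo $text;
```

With the cookie set, only the first URL should change; the othersite.example.com link does not match because the pattern requires example.com directly after the protocol.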
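If you want to match the same number of forward slashes as in your example, you can use a negated character class [^\s/] to exclude whitespace characters and forward slashes:
\b(https?://[^\s/]+/documents/)en/([^\s/]+)\.pdf\b
Note that in this pattern the slash after en is outside capture group 2, so it has to be written in the replacement: $1es/$2_SPANISH.pdf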
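A short sketch of how the stricter pattern behaves. The nested path below is a hypothetical example, not taken from the question. Since group 2 no longer contains the slash in this variant, the replacement used here is $1es/$2_SPANISH.pdf.

```php
<?php
// Stricter pattern: exactly one path segment is allowed after /documents/en/.
$strict = '~\b(https?://[^\s/]+/documents/)en/([^\s/]+)\.pdf\b~';
// The slash sits outside group 2 in this pattern, so it is added back explicitly.
$replacement = '$1es/$2_SPANISH.pdf';

$flat   = 'https://example.com/documents/en/report.pdf';
$nested = 'https://example.com/documents/en/2023/report.pdf'; // hypothetical nested path

echo preg_replace($strict, $replacement, $flat), PHP_EOL;
// https://example.com/documents/es/report_SPANISH.pdf
echo preg_replace($strict, $replacement, $nested), PHP_EOL;
// https://example.com/documents/en/2023/report.pdf (unchanged, extra path segment)
```

One caveat, assuming $text is HTML: both \S and [^\s/] also match quotes and angle brackets, so if the link text itself ends in .pdf the match can run past the closing quote of the href. Excluding those characters as well, for example with [^\s/"'<>]+, may be safer in that case.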