preg_replace for a specific domain name


I was using str_replace to rewrite PDF URLs from https://example.com/documents/en/whatever.PDF to https://example.com/documents/es/whatever_SPANISH.pdf.

This is what I was using:

    if($_COOKIE['googtrans'] == "/en/es") { //Check the google translate cookie
            $text = str_replace('/documents/en/', '/documents/es/', $text);
            $text = str_replace('.pdf', '_SPANISH.pdf', $text);
    }

The problem is that if the page links to a PDF on another site (not my own website), for example https://othersite.example.com/whatever.pdf, the link becomes https://othersite.example.com/whatever_SPANISH.pdf, which isn't valid on other people's sites. I want to ignore offsite links and only change URLs on my site.

So what I would like to do is look for the string https://example.com/documents/en/whateverfilename.pdf, pull out the file name, and change the URL to https://example.com/documents/es/whateverfilename_SPANISH.pdf (switching the en to es and also appending _SPANISH to the end of the PDF filename).

How can I do this? I have tried various preg_replace calls but can't get the syntax right.


Solution

  • You could do the replacement in one go using a regex with two capture groups, and reuse both groups in the replacement. This version matches any host that has a /documents/en/ path:

    \b(https?://\S*?/documents/)en(/\S*)\.pdf\b
    

    Or, to change only links on your own site, match the domain name explicitly:

    \b(https?://example\.com/documents/)en(/\S*)\.pdf\b
    

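    The pattern matches:

      • \b a word boundary
      • (https?://example\.com/documents/) capture group 1: the protocol, the domain name, and /documents/ (in the first pattern, \S*? matches as few non-whitespace characters as possible in place of the domain)
      • en the literal language code
      • (/\S*) capture group 2: a / followed by optional non-whitespace characters
      • \.pdf\b a literal .pdf followed by a word boundary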

    In the replacement use the 2 capture groups denoted by $1 and $2:

    $1es$2_SPANISH.pdf
    

    Example:

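    // Group 1: everything up to and including /documents/; group 2: the path without .pdf
    $regex = '~\b(https?://\S*?/documents/)en(/\S*)\.pdf\b~';
    $text = 'https://example.com/documents/en/whateverfilename.pdf';
    
    // Single quotes make it explicit that $1 and $2 are backreferences, not PHP variables
    $result = preg_replace($regex, '$1es$2_SPANISH.pdf', $text);
    
    echo $result;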
    

    Output

    https://example.com/documents/es/whateverfilename_SPANISH.pdf
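
To tie this back to the question, here is a minimal sketch that applies the domain-specific pattern only when the Google Translate cookie is set, and leaves offsite PDF links alone. The function name spanishPdfLinks and its $host parameter are hypothetical, introduced only for illustration. The cookie value is simulated when it is missing so that the snippet also runs from the command line.

```php
<?php
// Hypothetical helper: the name and the $host parameter are assumptions for this sketch.
function spanishPdfLinks(string $text, string $host = 'example.com'): string
{
    // preg_quote escapes the dots in the host name (and the ~ delimiter, if present).
    $pattern = '~\b(https?://' . preg_quote($host, '~') . '/documents/)en(/\S*)\.pdf\b~';

    // preg_replace returns null on error; fall back to the unchanged text in that case.
    return preg_replace($pattern, '$1es$2_SPANISH.pdf', $text) ?? $text;
}

$text = 'Own: https://example.com/documents/en/report.pdf '
      . 'Offsite: https://othersite.example.com/whatever.pdf';

// Assumption: simulate the cookie when it is not set (for example when run from the CLI).
$cookie = $_COOKIE['googtrans'] ?? '/en/es';

// Only rewrite links when the visitor has switched the page to Spanish.
if ($cookie === '/en/es') {
    $text = spanishPdfLinks($text);
}

echo $text, PHP_EOL;
```

With the simulated cookie, this prints the own-site link as https://example.com/documents/es/report_SPANISH.pdf and leaves https://othersite.example.com/whatever.pdf unchanged, because the pattern requires example.com directly after the protocol.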
    

    If you want to match the same number of forward slashes as in your example, you can use a negated character class [^\s/], which excludes whitespace characters and forward slashes:

    \b(https?://[^\s/]+/documents/)en/([^\s/]+)\.pdf\b
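
One detail to watch with this stricter pattern: the slash after en now sits outside capture group 2, so the replacement string has to put it back. The sketch below assumes that adjusted replacement, $1es/$2_SPANISH.pdf, and uses a made-up nested path (archive/) to show which URLs are left alone.

```php
<?php
// Strict variant: exactly one path segment is allowed after /documents/en/.
$pattern = '~\b(https?://[^\s/]+/documents/)en/([^\s/]+)\.pdf\b~';

// Group 2 no longer contains the leading slash, so the replacement adds it back.
$replacement = '$1es/$2_SPANISH.pdf';

$direct = 'https://example.com/documents/en/whateverfilename.pdf';
// Hypothetical nested path, used only to show what the strict pattern skips.
$nested = 'https://example.com/documents/en/archive/whateverfilename.pdf';

$directResult = preg_replace($pattern, $replacement, $direct);
$nestedResult = preg_replace($pattern, $replacement, $nested);

echo $directResult, PHP_EOL; // rewritten to /documents/es/whateverfilename_SPANISH.pdf
echo $nestedResult, PHP_EOL; // unchanged: the extra path segment prevents a match
```

Like the first pattern, this one accepts any host, so replace [^\s/]+ in group 1 with example\.com if only your own site should be rewritten.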
    
