php apache nginx preforking

Traffic loss due to redirection or server resources


I am not 100% sure this question belongs here or in ServerFault, but I will post it where the community wants me to. I have a site that handles thousands of redirects per day. Many of our customers have complained that they are losing some of their traffic. Upon close examination of the code, we deduced that the traffic is in fact being lost when we redirect:

header("Location: https://samedomain/differentpath/");
exit;

100% of the traffic hits the page that runs this code, but only about 60% of the visits reach https://samedomain/differentpath/.

At first we thought it was a server issue, so we tried many different combinations of Apache and mpm_prefork settings, to no avail. This is our current mpm_prefork configuration:

<IfModule mpm_prefork_module>
        StartServers              5
        MinSpareServers           5
        MaxSpareServers          10
        MaxRequestWorkers      1786
        MaxConnectionsPerChild    0
</IfModule>

After some benchmarking with a lot of traffic, I noticed that when I send traffic directly from my own IP, 100% of the redirects go through. This is great! I may be mistaken, but it gives me a little more confidence that the server can handle the load (I've been testing with 2k visits in about 2-3 minutes). Then I modified my tests to emulate real traffic more closely by sending the requests through a list of random proxies. This behaved like my real traffic: I was again losing 40% of the visits, all of them at the header redirect. I am 100% confident nothing else is being sent in the response before this header. We don't serve content; we only redirect.

Just to make sure I wasn't having issues with response content going out before the header, I also tried redirecting with meta tags, JavaScript redirects, and a Refresh header, all with the same results.

As a final desperate attempt, we fired up an Nginx server with PHP FastCGI. I was not surprised to find the same issue.

Server resources don't explain it either: I watched the processes while sending heavy traffic, and usage peaked at 12% CPU and 4% RAM. Here are the details of the server:

Ubuntu 18.04.3 (LTS) x64, 6 vCPUs, 16 GB RAM / 320 GB disk

I should mention that our DB server is separate from this one, and it has the same specs. We also fired up a couple of servers and added a Load Balancer and we got (surprise) the same result.

So my question is: knowing that the header redirect is not failing because of content (in some of my tests the page contained only the redirect and it still failed), is there any other possible reason why this would happen? Could this be an Apache issue, a PHP issue, or a combination of both?

PS: I also looked at all error logs, and they were clean.

Edit: Something else that I thought was a problem was the traffic source. Maybe the proxies were being blocked by my provider, but if that was the case the initial visits wouldn't hit the page either, correct? Just another bit of info that I think could be useful.


Solution

  • A redirect is only an instruction to the client visiting your website; you cannot force that client to actually request the new URL, so you will never get a 100% redirection rate. (Note that header("Location: ...") in PHP sends a 302 by default, not a 301, unless you set the status code yourself.) That said, standard behavior for most human-operated web clients (that is, web browsers) is to follow the instruction and go to the destination given in your Location header. What you are most likely seeing is ordinary bot traffic. Bots, good or bad, behave very differently when they hit a redirect: sometimes they follow it, but more often they do not. You're seeing about 40% not follow, which fits the commonly cited estimates of bot traffic on the Internet (roughly 40-60%).

    You may want to analyze the IP addresses and user agents in your logs to determine whether they belong to known bots and what percentage of your traffic they make up. You could also try a package that helps identify the traffic and log it for testing, for example https://github.com/JayBizzle/Crawler-Detect. Just understand that no detection routine will be 100% accurate.

    If you are afraid that human visitors are not getting properly redirected, you could add content to the page that contains your redirect code and provide a button or link that will help them reach the page you are trying to redirect to.
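The log analysis suggested above could look roughly like the sketch below. It assumes an Apache "combined" access log, treats each IP plus user agent pair as one client (an approximation that breaks down behind NAT, proxies, or a load balancer that hides the client IP), and uses a deliberately crude keyword check in place of a real detection library. The source path `/entry/` and the helper names are hypothetical; only `/differentpath/` comes from the question.

```php
<?php
// Hypothetical paths: substitute the real redirecting page and its target.
const SOURCE_PATH = '/entry/';
const TARGET_PATH = '/differentpath/';

/** Parse one combined-format access log line; returns null if it does not match. */
function parseLogLine(string $line): ?array
{
    $pattern = '/^(\S+) \S+ \S+ \[[^\]]+\] "(?:\S+) (\S+)[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"/';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    return ['ip' => $m[1], 'path' => $m[2], 'status' => (int) $m[3], 'agent' => $m[4]];
}

/** Crude heuristic only: a few common automation markers, or a missing user agent. */
function looksLikeBot(string $agent): bool
{
    return $agent === '' || $agent === '-'
        || preg_match('/bot|crawl|spider|curl|wget|python|headless/i', $agent) === 1;
}

/**
 * Group requests by IP + user agent and count how many clients that requested
 * SOURCE_PATH also requested TARGET_PATH, split into bot-like and other agents.
 */
function followRates(iterable $lines): array
{
    $clients = [];
    foreach ($lines as $line) {
        $hit = parseLogLine($line);
        if ($hit === null) {
            continue; // skip lines that are not in the expected format
        }
        $key  = $hit['ip'] . '|' . $hit['agent'];
        $path = explode('?', $hit['path'], 2)[0]; // ignore the query string
        $clients[$key]['bot'] = looksLikeBot($hit['agent']);
        if ($path === SOURCE_PATH) {
            $clients[$key]['source'] = true;
        }
        if ($path === TARGET_PATH) {
            $clients[$key]['target'] = true;
        }
    }

    $stats = [
        'bot'   => ['redirected' => 0, 'followed' => 0],
        'other' => ['redirected' => 0, 'followed' => 0],
    ];
    foreach ($clients as $client) {
        if (empty($client['source'])) {
            continue; // never hit the redirecting page
        }
        $group = $client['bot'] ? 'bot' : 'other';
        $stats[$group]['redirected']++;
        if (!empty($client['target'])) {
            $stats[$group]['followed']++;
        }
    }
    return $stats;
}

// Usage (the log location is an assumption):
// print_r(followRates(file('/var/log/apache2/access.log', FILE_IGNORE_NEW_LINES)));
```

If the "other" group follows the redirect at close to 100% while the "bot" group rarely does, that would support the bot-traffic explanation rather than a server problem.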
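The fallback link mentioned in the last paragraph could be added along these lines. This is a minimal sketch, not the asker's actual code: `buildRedirect` and `sendRedirect` are hypothetical helper names, and the markup is only a placeholder. The response still carries the Location header, so browsers redirect as before, while a client that ignores the header receives a short page with a clickable link.

```php
<?php
/**
 * Build a redirect response that also carries a small HTML body with a
 * clickable link, for clients that do not act on the Location header.
 */
function buildRedirect(string $url): array
{
    // Escape the URL before placing it inside an HTML attribute.
    $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    return [
        'status'  => 302, // the status PHP uses by default with a Location header
        'headers' => [
            'Location: ' . $url,
            'Content-Type: text/html; charset=utf-8',
        ],
        'body'    => '<!doctype html><title>Redirecting</title>'
                   . '<p>If you are not redirected automatically, '
                   . '<a href="' . $safe . '">continue here</a>.</p>',
    ];
}

/** Send the response built above; must be called before any other output. */
function sendRedirect(string $url): void
{
    $response = buildRedirect($url);
    http_response_code($response['status']);
    foreach ($response['headers'] as $header) {
        header($header);
    }
    echo $response['body'];
}

// Usage, replacing the bare header() call from the question:
// sendRedirect('https://samedomain/differentpath/');
// exit;
```

Keeping the response construction separate from the sending step makes it possible to check the headers and body without making an HTTP request.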