apache, comet, long-polling, reverse-ajax, php

Is php scalable with reverse ajax long polling?


I am working on a website that displays some data from a database that changes frequently (the status of a queue and a chat conversation). My current setup is Apache/PHP/MySQL. Naturally I would like to avoid polling the server every x seconds, since this does not scale well. I would like to do reverse ajax long polling; however, I've read that Apache does not work well with this since it quickly runs out of worker threads. There are many other web servers out there that get around this problem: nginx, tornado, etc. However, my problem is that PHP is the ONLY server-side scripting language I know. Also, I've already written some PHP scripts, so I'd like to keep them if I can. I am OK with switching servers so long as I can still use PHP.

But after doing some more research, I've read that PHP (PHP-FPM?) also uses a process for every request made, which means that if I have hundreds/thousands of open connections, there will be hundreds/thousands of processes, which will be a problem as well.

Can I conclude that there's no good, scalable way to build long polling websites using PHP? Should I abandon PHP and learn another server-side scripting language? I can continue developing long polling using my current setup (Apache/PHP) for now, but I don't want the choice of scripting language to limit the scalability of my system when I deploy. So what should I do? I am not very experienced with web programming, so if any gurus out there can give me some pointers I'd appreciate it! Thank you!


Solution

  • PHP run in php-fpm mode will still have limitations, especially if your code is eating a lot of memory. You won't be able to run thousands of parallel processes without enough available memory. But it usually performs faster than mod_php, and at least the HTTP requests that do not need PHP are handled by the webserver alone, and if that webserver is nginx you'll be able to handle a lot more HTTP requests in parallel.

    With php-fpm you will also have a queue of waiting requests, which may be useful during a temporary traffic spike, as at least the requests are queued, not rejected.

    Now, long polling operations are OK with nginx (or others, that's just an example), but not with PHP. PHP is not built to be a long-running server: each request ties up a whole process for as long as it stays open, so it's really not the right choice for a KeepAlive thing. But "Divide ut regnes" (divide and rule). Your long polling tasks could run next to your PHP application, but without going through your PHP application.

    As an example, look at the jappix project, which is a PHP project. You need to set up an XMPP server somewhere (like ejabberd), plus a BOSH server with nginx as a proxy on port 80 to that BOSH server (so you have the XMPP chat protocol on port 80, via nginx and ejabberd, and nothing on the PHP side for that). The problem is then to connect your application's authentication, identification, and so on, and this will have to be done by extending the XMPP server configuration (so that it uses the same LDAP server as your PHP app, for example).

    Your second long polling problem is the status of a queue. You may find some XMPP extensions for that, maybe. Or you may perform regular ajax queries on the queue. One useful technique to avoid a large number of ajax requests on your PHP application is to reschedule the next ajax check from the ajax callback of the current check, based on the Fibonacci numbers (it's an example). So the first time the next ajax call will be scheduled 1 minute later, the next time 2 minutes, then 3m, 5m, 8m, 13m, 21m, 34m, 55m, 89m, 144m, etc. The idea is that it may be important to check for new incoming messages 1 minute after a page load. As the user is still reading the same page (or drinking a coffee, talking to a friend, going on holiday without switching off their computer, etc.), we can delay the next checks more and more. It's a way of assuming the user is not really active. Note that you could detect user activity by other means and alter the rescheduling.
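
The Fibonacci rescheduling described above could be sketched on the client side roughly as follows. This is only a minimal illustration, not code from jappix or any library: the `BackoffPoller` class, its method names, the one-hour cap (`maxMs`) and the `reset()` hook for user activity are all assumptions made for the example.

```typescript
// Hypothetical helper: reschedules a check with Fibonacci-growing delays
// (1, 2, 3, 5, 8, 13, ... units), as in the answer above.
class BackoffPoller {
  private current = 1; // delay multiplier for the next check
  private next = 2;    // multiplier for the check after that
  private timer: ReturnType<typeof setTimeout> | undefined = undefined;
  private running = false;
  private readonly check: () => Promise<void>;
  private readonly unitMs: number;
  private readonly maxMs: number;

  // unitMs: length of one "step" (one minute by default).
  // maxMs: assumed upper bound so the delay does not grow forever.
  constructor(check: () => Promise<void>, unitMs: number = 60000, maxMs: number = 3600000) {
    this.check = check;
    this.unitMs = unitMs;
    this.maxMs = maxMs;
  }

  // Returns the delay for the upcoming check and advances the sequence.
  nextDelayMs(): number {
    const delay = Math.min(this.current * this.unitMs, this.maxMs);
    [this.current, this.next] = [this.next, this.current + this.next];
    return delay;
  }

  // Call on detected user activity; affects the next check that gets scheduled.
  reset(): void {
    this.current = 1;
    this.next = 2;
  }

  start(): void {
    if (this.running) return;
    this.running = true;
    this.schedule();
  }

  stop(): void {
    this.running = false;
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
  }

  // Runs the check after the current delay, then reschedules from its callback.
  private schedule(): void {
    this.timer = setTimeout(() => {
      this.check()
        .catch(() => undefined) // a failed check should not stop the polling
        .then(() => {
          if (this.running) this.schedule();
        });
    }, this.nextDelayMs());
  }
}
```

In a page, something like `new BackoffPoller(() => fetch("/queue-status.php").then(() => undefined)).start()` would begin polling (the URL is made up), and calling `reset()` from a click or keypress handler would bring the interval back to one minute once the user shows signs of life.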