Tags: javascript, php, .htaccess, bots, browser-detection

How to ban crawlers, bots, and fake users, and allow only a specific browser?


I'm trying to build a good web traffic filter. My goal is to ban all bots, crawlers, spiders, and non-real users, and allow only one specific browser.

I have done some tests in PHP and others in JavaScript, but I don't feel it is done well. I would like the opinion of an expert. I think a combination of PHP + JavaScript + robots.txt + .htaccess could do it.

I know that the user agent can be faked, but I would like to know if there is a better way to detect it. For example, I would like to allow users only if they are using Mozilla Firefox (regardless of version).

All other browsers should go on an exclusion list of sorts; this is like a filter.

What is the best way to do this? In short: detect the browser so that only Firefox is allowed, and block all fake users, robots, spiders, crawlers and other crap.


Solution

  • Ok then, let me try to provide some ideas here.

    You should use a combination of techniques:

    1. robots.txt will keep the legit crawlers out (a minimal example follows this list);
    2. Use some JavaScript validation on the client side to keep most crawlers out, since these will rarely have the ability to run JavaScript;
    3. On your server side, use a user agent service to identify and filter user agents as well (see the PHP sketch below);
    4. Track IP addresses so that you can do one-off bans on "known offenders" (e.g. via .htaccess, shown below);
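
    For #1, a minimal robots.txt asking every crawler to stay away could look like the snippet below. Keep in mind it is purely advisory: only well-behaved bots honor it.

        # robots.txt -- advisory only; honored by legit crawlers (Googlebot, Bingbot, ...)
        User-agent: *
        Disallow: /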
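
    For #3, here is a rough first-pass filter in PHP. The bot token list is a tiny illustrative sample (a real user agent service maintains far larger, updated lists), and since the header is client-supplied it is trivially faked, so treat this as a coarse filter only:

        <?php
        // Coarse user-agent filter: reject known bot tokens, allow only Firefox.
        $ua = $_SERVER['HTTP_USER_AGENT'] ?? '';

        // A few common crawler signatures (illustrative; real services track thousands).
        $botTokens = ['bot', 'crawl', 'spider', 'slurp', 'curl', 'wget', 'python-requests'];

        foreach ($botTokens as $token) {
            if (stripos($ua, $token) !== false) {
                http_response_code(403);
                exit('Forbidden');
            }
        }

        // Allow only Firefox, regardless of version.
        if (!preg_match('/\bFirefox\/\d+/', $ua)) {
            http_response_code(403);
            exit('Forbidden');
        }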
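
    For #4, one-off IP bans fit naturally in .htaccess. The addresses below are placeholders, and the Require syntax assumes Apache 2.4+:

        # .htaccess -- ban specific offending IPs (Apache 2.4+ syntax)
        <RequireAll>
            Require all granted
            Require not ip 203.0.113.45
            Require not ip 198.51.100.0/24
        </RequireAll>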

    To expand a little more on #2: your landing page could use JavaScript to drop a cookie with a "known" value that can be mapped back to the originator. One example is to take the user agent and IP address and compute a hash. This can still be faked, but most offenders will simply decide to ignore your site rather than put effort into bypassing your protection measures.
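
    One way to wire that up is sketched below as a single PHP page that both serves the cookie-setting JavaScript and validates the cookie on the next request. The cookie name, secret, and hash recipe here are illustrative assumptions, not a fixed protocol, and as noted above a determined client can still replay the value:

        <?php
        // JS-cookie challenge: HMAC of user agent + IP, keyed with a server-side
        // secret. Clients that never run our JavaScript never obtain the cookie.
        $secret   = 'replace-with-a-long-random-secret';  // assumption: kept server-side
        $expected = hash_hmac(
            'sha256',
            ($_SERVER['HTTP_USER_AGENT'] ?? '') . '|' . ($_SERVER['REMOTE_ADDR'] ?? ''),
            $secret
        );

        if (($_COOKIE['js_check'] ?? '') === $expected) {
            // Cookie matches: this client executed our JavaScript at least once.
            echo 'Welcome.';
            exit;
        }

        // No valid cookie yet: serve a tiny page whose JavaScript drops the token
        // and reloads. Note: browsers with cookies disabled will loop here.
        ?>
        <!DOCTYPE html>
        <html>
        <head><title>Checking your browser...</title></head>
        <body>
        <script>
          document.cookie = "js_check=<?php echo $expected; ?>; path=/";
          location.reload();
        </script>
        <noscript>Please enable JavaScript to view this site.</noscript>
        </body>
        </html>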

    Hope this helps.