ruby-on-rails capybara poltergeist

How does a site tell a human user from a bot?


I am practicing with Rails and the following question came up: how does a site determine whether a visitor is a person or a bot?

I am using Ruby on Rails with Capybara and Poltergeist.

There is the following code:

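require 'capybara/poltergeist'

# No extra driver options are passed to Poltergeist here.
options = {}

# Register Poltergeist (PhantomJS) as a Capybara driver.
Capybara.register_driver :poltergeist do |app|
  Capybara::Poltergeist::Driver.new(app, options)
end

session = Capybara::Session.new(:poltergeist)
# The real User-Agent string has been removed from this snippet.
session.driver.headers = { 'User-Agent' => '' }

session.visit 'https://gumtree.com'
# Save the current page to disk and open it in the default browser.
session.save_and_open_page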

I set my own browser's value in the User-Agent field and run the code. If I open the URL in a regular browser, the page is displayed correctly. If I run the code, it saves a blank page from the location the site redirects to.

I have cleared cookies, and both requests come from the same IP address. What other signals could differ between the two?


Solution

  • There are many ways for a site to determine that you are using an automation tool. The two easiest in this case are:

    1. Poltergeist loads some JS into every page, which is easily detectable.
    2. Poltergeist doesn't support a lot of newer CSS/JS, so the site could be feature-testing the browser, seeing that it looks like a 7-year-old version of Safari, and finding that suspicious enough to assume it's a bot.

    Beyond that, there are many more methods, and it would take a complete analysis of the page's JS to see exactly what the site is doing. Gumtree is very aggressive about detecting bots in order to prevent people from violating its terms of use, and bypassing that is well beyond the scope of a Stack Overflow answer.
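
To make the two signals above more concrete, here is a minimal sketch of how a site might combine them on its side. It is only an illustration, not what Gumtree actually does. The report format, the method names, and both constant lists are hypothetical. The global names (`__poltergeist`, `_phantom`, `callPhantom`) are assumptions about what Poltergeist and PhantomJS are commonly reported to expose on `window`, and the feature list is an arbitrary example of things a current browser would be expected to support.

```ruby
# Hypothetical: globals an injected automation script might leave on `window`.
# These names are assumptions, not a confirmed list of what any site checks.
SUSPICIOUS_GLOBALS = %w[__poltergeist _phantom callPhantom].freeze

# Hypothetical: features a site might expect every current browser to have.
EXPECTED_FEATURES = %w[fetch Promise CSS.supports].freeze

# `report` stands in for data a site's own client-side script could send back:
#   { globals: [...names found on window...], features: [...names that passed...] }
def bot_signals(report)
  globals  = report.fetch(:globals, [])
  features = report.fetch(:features, [])
  {
    injected_globals: SUSPICIOUS_GLOBALS & globals,   # signal 1: injected JS
    missing_features: EXPECTED_FEATURES - features    # signal 2: outdated engine
  }
end

# Flags the visitor if either signal is present.
def looks_automated?(report)
  signals = bot_signals(report)
  signals[:injected_globals].any? || signals[:missing_features].any?
end
```

For example, a report such as `{ globals: %w[document _phantom], features: [] }` would be flagged on both counts, while a report listing all three expected features and no suspicious globals would not. Changing the User-Agent header affects neither check, which is consistent with the behaviour described in the question.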