I am trying to scrape booking.com as an exercise to learn Mechanize, but I am stuck on one issue. I want to get a hotel's prices through Mechanize using the following code:
require 'mechanize'
require 'date'
hotel_name = "Hilton New York"
date = Date.today
day_after_date = date + 1
agent = Mechanize.new
homepage = agent.get("http://www.booking.com")
# Fill out the main form on the booking.com homepage
main_form = homepage.form_with(name: 'frm')
main_form.ss = hotel_name
main_form.checkin_monthday = date.day.to_s
main_form.checkin_year_month = "#{date.year}-#{date.month}"
main_form.checkout_monthday = day_after_date.day.to_s
main_form.checkout_year_month = "#{day_after_date.year}-#{day_after_date.month}"
main_form[''] = 1 # 1 adult, 0 children
homepage.save('1-homepage.html') # For debugging purposes
# Choose the hotel from the list that comes up
hotel_selection_page = agent.submit main_form
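# Escape the name so it is matched literally, and take the first matching link
hotel_link = hotel_selection_page.links.find { |link| link.text =~ /#{Regexp.escape(hotel_name)}/i }
raise "No link matching #{hotel_name.inspect} found" if hotel_link.nil?
hotel_page = hotel_link.click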
# For debugging purposes
hotel_selection_page.save('2-hotels-list.html')
hotel_page.save('3-hotel-page.html')
If you follow the pages through your web browser, you will see that, after submitting the form on the homepage and choosing the hotel on the next page, you see the room prices for the selected date.
Through Mechanize, though, the saved 3-hotel-page.html page does not show the prices.
I have been at this for a while, and I can't seem to solve it. I thought the problem was the JavaScript that booking.com is using, but even after turning off JavaScript on my web browser, I was able to get the correct behavior.
Any thoughts on this?
Edit: I just realized that when the form is sent through the web browser, the hotel links on the second page (where you choose the hotel) have a sid parameter (for example, sid=ba232d9d340c66ae73f1ded22b80a0da), but when I send the form through Mechanize, I don't get the sid parameter. What could be the reason?
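To compare what the browser and Mechanize receive, it can help to pull the sid value out of each link's href. The following is only a rough sketch: `sid_from_href` is a hypothetical helper name, it uses nothing but Ruby's standard `uri` library, and it assumes the query string may be separated by either `&` or `;`.

```ruby
require 'uri'

# Hypothetical helper: returns the value of the `sid` query parameter
# found in an href, or nil if the href has no such parameter
# (or cannot be parsed as a URI).
def sid_from_href(href)
  query = URI.parse(href.to_s).query
  return nil if query.nil?

  # Accept both "&" and ";" as separators between key=value pairs.
  query.split(/[&;]/).each do |pair|
    key, value = pair.split('=', 2)
    return value if key == 'sid'
  end
  nil
rescue URI::InvalidURIError
  nil
end
```

With Mechanize, the same check could be run over `hotel_selection_page.links.map(&:href)` to see whether any link carries a sid at all.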
Adding the following line right after creating the agent (before the first request) to change the user agent worked in the end:
agent.user_agent_alias = 'Mac Safari'