I am trying to scrape Amazon to compile a list of prices, as part of a bigger project relating to statistics. However, I am stumped. Could anyone review my code and tell me where I went wrong?
#!/usr/bin/python
# -*- coding: utf-8 -*-
import mechanize
from bs4 import BeautifulSoup
URL_00 = "http://www.amazon.co.uk/Call-Duty-Black-Ops-PS3/dp/B007WPF7FE/ref=sr_1_2?ie=UTF8&qid=1352117194&sr=8-2"
bro = mechanize.Browser()
resp = bro.open(URL_00)
html = resp.get_data()
soup_00 = BeautifulSoup(html)
price = soup_00.find('b', {'class':'priceLarge'})
print price  # this should print, at the very least, the matching tag and the text it encloses
According to the screenshot, what I wrote above should work, shouldn't it?
Well, all I get in the printout is "[]". If I change the line before last to this:
price = soup_00.find('b', {'class':'priceLarge'}).contents[0].string
or
price = soup_00.find('b', {'class':'priceLarge'}).text
I get a "NoneType" error.
I am quite confused as to why this is happening. Chrome reports the page encoding as UTF-8, which my script matches in line #2. I have also tried changing it to ISO (as per the inner HTML of the page), but this makes zero difference, so I am positive encoding is not the issue here.
Also, I don't know if this is relevant at all, but my system locale on Linux being UTF-8 should not cause a problem, should it?
There's no need to do this, as Amazon provide an API:
https://affiliate-program.amazon.co.uk/gp/advertising/api/detail/main.html
The Product Advertising API helps you advertise Amazon products using product search and look up capability, product information and features such as Customer Reviews, Similar Products, Wish Lists and New and Used listings.
More detail in the related question "Amazon API library for Python?"
I'm using the API, and it is so much easier and more reliable than scraping the data from the webpage, even with BS. You will also get access to a list of prices for new, second-hand, etc., and not just the "headline" price.
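If you do stay with scraping, note that the "NoneType" error in the question is what you get when find() matches nothing: it returns None, and calling .text or .contents on None fails. A likely cause, though this is only an assumption, is that the HTML served to the script differs from what the browser shows. Below is a minimal sketch of guarding against that case. It is written for Python 3, the extract_price helper and the two HTML snippets are hypothetical stand-ins for the downloaded page, and the 'priceLarge' class is taken from the question rather than verified against the live site.

```python
from bs4 import BeautifulSoup


def extract_price(html):
    """Return the text of <b class="priceLarge">, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("b", {"class": "priceLarge"})
    if tag is None:
        # find() returns None when nothing matches, so calling .text on the
        # result directly would raise AttributeError on a 'NoneType' object.
        return None
    return tag.get_text(strip=True)


# Hypothetical snippets standing in for the HTML the script actually received.
with_price = '<div><b class="priceLarge">£12.99</b></div>'
without_price = "<div><p>No price markup here</p></div>"

print(extract_price(with_price))
print(extract_price(without_price))
```

If the helper returns None for the real page, saving the fetched HTML to a file and searching it for "priceLarge" should show whether the markup the script received matches what the browser displays.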