I am trying to scrape a betting site. However, when I inspect the retrieved data in the Scrapy shell, I get nothing back.
The XPath to the element I need is //*[@id="yui_3_5_0_1_1562259076537_31330"], and this is what I get when I run it in the shell:
In [18]: response.xpath('//*[@id="yui_3_5_0_1_1562259076537_31330"]')
Out[18]: []
The output is [], but I expected a result from which I could extract the href.
When I use Chrome's "Inspect" tool while the site is still loading, this id is outlined in purple. Does this mean that the site is using JavaScript? And if so, is that why Scrapy does not find the element and returns []?
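One way to test this suspicion is to check whether the id is present at all in the HTML that Scrapy downloads, before any JavaScript runs. The id also looks like one generated at runtime by the YUI library (the long number resembles a millisecond timestamp), so it may well change on every page load. Below is a minimal sketch using only the Python standard library; the names `IdCollector`, `id_in_raw_html`, and `sample_html` are made up for illustration, and the sample markup is not the real page.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def id_in_raw_html(raw_html, element_id):
    """Return True if element_id appears as an id attribute in raw_html."""
    collector = IdCollector()
    collector.feed(raw_html)
    collector.close()
    return element_id in collector.ids


# Hypothetical sample of what a server might send before JavaScript runs.
sample_html = (
    '<html><body><div id="content">'
    '<script src="app.js"></script>'
    "</div></body></html>"
)

print(id_in_raw_html(sample_html, "content"))  # True
print(id_in_raw_html(sample_html, "yui_3_5_0_1_1562259076537_31330"))  # False
```

In the Scrapy shell the same check could be run as id_in_raw_html(response.text, "yui_3_5_0_1_1562259076537_31330"). If it returns False, the element is probably added by JavaScript after the initial page load, which would explain the empty result.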
This is the items.py file:
import scrapy


class LifeMatchsItem(scrapy.Item):
    Event = scrapy.Field()  # Name of event
    Match = scrapy.Field()  # Team1 vs Team2
    Date = scrapy.Field()   # Date of match
This is my spider code:
import scrapy
from LifeMatchesProject.items import LifeMatchsItem


class LifeMatchesSpider(scrapy.Spider):
    name = 'life_matches'
    start_urls = ['http://www.betfair.com/sport/home#sscpl=ro/']
    custom_settings = {'FEED_EXPORT_ENCODING': 'utf-8'}

    def parse(self, response):
        for event in response.xpath('//div[contains(@class,"events-title")]'):
            for element in event.xpath('./following-sibling::ul[1]/li'):
                item = LifeMatchsItem()
                item['Event'] = event.xpath('./a/@title').get()
                item['Match'] = element.xpath('.//div[contains(@class,"event-name-info")]/a/@data-event').get()
                item['Date'] = element.xpath('normalize-space(.//div[contains(@class,"event-name-info")]/a//span[@class="date"]/text())').get()
                yield item
And this is the result