python web-scraping python-requests lxml

Scraping with XPath using requests and lxml prints an element instead of its value


I keep running into an issue when I scrape data with lxml using XPath. I want to scrape the Dow price, but when I print the result in Python it shows <Element span at 0x448d6c0>. I understand that this refers to a location in memory, but I just want the price. How can I print the price instead of its memory address?

from lxml import html
import requests

page = requests.get('https://markets.businessinsider.com/index/realtime-chart/dow_jones')
content = html.fromstring(page.content)

#This will create a list of prices:
prices = content.xpath('//*[@id="site"]/div/div[3]/div/div[3]/div[2]/div/table/tbody/tr[1]/th[1]/div/div/div/span')

#This will create a list of volume:


print(prices)

Solution

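  • xpath() returns a list of Element objects, not generators. What you see when you print them is each element's default representation, which includes its memory address. To get the content, read the element's .text attribute (it is an attribute, not a function call).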

    Additionally, I would highly recommend changing your XPath. It is an absolute, position-based path, so it is likely to break whenever the page layout changes. Selecting by id and class is more robust:

    prices = content.xpath("//div[@id='site']//div[@class='price']//span[@class='push-data ']")
    prices_holder = [i.text for i in prices]
    prices_holder
     ['25,389.06',
     '25,374.60',
     '7,251.60',
     '2,813.60',
     '22,674.50',
     '12,738.80',
     '3,500.58',
     '1.1669',
     '111.7250',
     '1.3119',
     '1,219.58',
     '15.43',
     '6,162.55',
     '67.55']
    

    Also of note, this only gives you the values present in the HTML at the moment the page was fetched. If you want the prices as they update, you would likely need a browser-automation tool such as Selenium.
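To see the difference between the elements and their text without hitting the network, here is a minimal sketch. The SAMPLE markup is a made-up stand-in (the real page's structure may well differ), and to_number is a hypothetical helper that is not part of lxml.

```python
from lxml import html

# Hypothetical markup standing in for the real page; an assumption for illustration only.
SAMPLE = """
<html><body>
  <div id="site">
    <div class="price"><span class="push-data ">25,389.06</span></div>
    <div class="price"><span class="push-data ">1.1669</span></div>
  </div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# xpath() returns a list of Element objects; printing it shows their
# default representation (tag name plus memory address), not their content.
spans = tree.xpath("//div[@id='site']//div[@class='price']//span")
print(spans)

# Read the .text attribute of each element to get the string inside the tag.
texts = [span.text for span in spans]
print(texts)  # ['25,389.06', '1.1669']

# Alternatively, let XPath return the text nodes directly with text().
direct = tree.xpath("//div[@id='site']//div[@class='price']//span/text()")


def to_number(raw):
    """Hypothetical helper: strip thousands separators and convert to float."""
    return float(raw.replace(",", ""))


values = [to_number(t) for t in texts]
print(values)
```

The text() variant saves a list comprehension when you only need the strings. Keeping the elements around is handy if you also want attributes or child nodes.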