
How to use lxml for web scraping?


I want to write a Python script that fetches my current reputation on Stack Overflow: https://stackoverflow.com/users/14483205/raunanza?tab=profile

This is the code I have written.

from lxml import html
import requests

page = requests.get('https://stackoverflow.com/users/14483205/raunanza?tab=profile')
tree = html.fromstring(page.content)

What should I do next to fetch my reputation? (I can't work out how to use XPath, even
after googling it.)


Solution

  • Your code needs one more step: an XPath query that selects the element holding the reputation. Here is the modified code:

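    from lxml import html  # the module name is lowercase
    import requests
    
    page = requests.get('https://stackoverflow.com/users/14483205/raunanza?tab=profile')
    tree = html.fromstring(page.content)
    # xpath() returns a list of matches; text() selects the element's text nodes
    reputation = tree.xpath('//*[@id="avatar-card"]/div[2]/div/div[1]/text()')
    print(reputation)  # a list of text nodes; the value was 3 at the time of writing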
    

    You can copy an element's XPath from Chrome DevTools: right-click the element on the page, choose Inspect, then right-click the highlighted node and choose Copy > Copy XPath.

    To learn more about XPath, see https://www.w3schools.com/xml/xpath_examples.asp
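
To make the XPath step concrete without hitting the network, here is a minimal sketch that runs the same kind of query against an inline HTML string. The markup in SAMPLE, the class names, and the first_text helper are all hypothetical and only loosely mimic a profile card; the real Stack Overflow page is structured differently and may change at any time. It compares a positional path, like the one Chrome copies, with an attribute-based path.

```python
from lxml import html

# Hypothetical markup for illustration only; not the real page structure.
SAMPLE = """
<html><body>
  <div id="avatar-card">
    <div class="name">raunanza</div>
    <div class="stats">
      <div class="reputation"> 3 </div>
      <div class="label">reputation</div>
    </div>
  </div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# Positional path, similar in style to what "Copy XPath" produces.
by_position = tree.xpath('//*[@id="avatar-card"]/div[2]/div[1]/text()')

# Attribute-based path; it does not depend on the element's position.
by_class = tree.xpath('//div[@class="reputation"]/text()')


def first_text(nodes):
    """Return the first text node with whitespace stripped, or None if no match."""
    return nodes[0].strip() if nodes else None


print(first_text(by_position))
print(first_text(by_class))
print(first_text(tree.xpath('//div[@class="missing"]/text()')))
```

Because xpath() always returns a list, checking for an empty result before indexing avoids an IndexError when the page layout changes and the path no longer matches.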