
Python XPath: get data from an HTML table


I am trying to fetch data from an HTML table. I want to get all the data in the table, but for some reason I can't even get the title to display. Can someone give me some pointers as to what I'm doing wrong? Thanks.

    from lxml import html
    import requests

    page = requests.get("https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1710000501")
    tree = html.fromstring(page.content)

    title = tree.xpath('//*[@id="1_1"]/text()')
    print("title", title)

Solution

  • After some testing, it turns out you need to pass a cookie value in the headers of your request; otherwise you won't get the page. Code:

    from lxml import html
    import requests

    url = 'https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1710000501'
    # Cookie value from the original answer; a fresh one may be needed.
    headers = {'Cookie': 'TS011c6724=01bc1e93397eb3e6d45954baff82f1dc5a53f5c7c9d6e15b0e5924fa1271e6172d10ebdde1926759324799c768ddd4eb7c4fa9c487'}
    r = requests.get(url, headers=headers)
    tree = html.fromstring(r.content)

    # Column header cell with id "1_1" (the title the question asks about).
    print(tree.xpath('//th[@id="1_1"]')[0].text)

    # First data cell of every row whose header cell contains "years".
    for elm in tree.xpath('//tr[./th[contains(.,"years")]]/td[1]'):
        print(elm.text)
    

    Output (Canada, then the population estimates for both sexes in 2015):

    Canada 
    1,928,878
    1,969,492
    1,895,463
    2,092,961
    2,395,623
    ...
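If you want to check the XPath expressions without hitting the site, the sketch below runs the same two queries against a small inline table. The `SAMPLE` markup is a hypothetical stand-in (the real page's structure may differ), and the helper names `header_text` and `first_column_by_label` are made up for illustration. It also guards against an empty result, which is what you get when the request comes back without the table.

```python
from lxml import html

# Hypothetical stand-in for the fetched page; the real markup may differ.
# The first-column values are the first two from the output above; the
# second column is a placeholder.
SAMPLE = """
<html><body>
<table>
  <thead>
    <tr><th id="1_1">Canada</th></tr>
  </thead>
  <tbody>
    <tr><th>0 to 4 years</th><td>1,928,878</td><td>(other)</td></tr>
    <tr><th>5 to 9 years</th><td>1,969,492</td><td>(other)</td></tr>
    <tr><th>Median age</th><td>(other)</td><td>(other)</td></tr>
  </tbody>
</table>
</body></html>
"""


def header_text(tree, header_id):
    """Return the text of the <th> with the given id, or None if absent."""
    # lxml lets you pass XPath variables ($hid) as keyword arguments.
    cells = tree.xpath('//th[@id=$hid]', hid=header_id)
    if not cells:
        return None  # e.g. the page came back without the table
    return cells[0].text_content().strip()


def first_column_by_label(tree, keyword="years"):
    """Map each row label containing `keyword` to its first data cell."""
    rows = {}
    for row in tree.xpath('//tr[./th[contains(., $kw)]]', kw=keyword):
        label = row.xpath('string(./th[1])').strip()
        cells = row.xpath('./td[1]')
        if cells:
            rows[label] = cells[0].text_content().strip()
    return rows


tree = html.fromstring(SAMPLE)
print(header_text(tree, "1_1"))
for label, value in first_column_by_label(tree).items():
    print(label, value)
```

`text_content()` is used instead of `.text` because `.text` only returns the text before an element's first child, so it can come back empty or `None` when a cell wraps its value in another tag.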