
Lxml is returning an empty list


I am working with lxml to try to get the top 10 hits currently on Spotify (https://spotifycharts.com/regional). When I run the program, it returns an empty list [] instead of ['song 1', 'song 2', ...].

    import requests
    import lxml.html

    html = requests.get("https://spotifycharts.com/regional")
    doc = lxml.html.fromstring(html.content)

    songs = doc.xpath('//div[@id="content"]')[0]
    titles = songs.xpath('.//div[@class="chart-table-track"]/text()')
    print(titles)

I'm not sure whether it's an XPath problem. When I looked for another id on the page, there wasn't one. The element with id "content" is what contains the text I need, and the same goes for "chart-table-track". I'm not sure whether my syntax is wrong, so any help would be appreciated.

Thanks,


Solution

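  • The track titles sit in td cells with class "chart-table-track" (the title itself is inside a strong element), not in div elements, so //div[@class="chart-table-track"] matches nothing and you get []. You can try the following to get the top ten hits (rank and name) from that page. I used BeautifulSoup (with lxml as its underlying parser) instead of calling lxml directly to parse the content.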

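    import requests
    from bs4 import BeautifulSoup

    html = requests.get("https://spotifycharts.com/regional")
    # BeautifulSoup parses the page, using lxml as the underlying parser
    doc = BeautifulSoup(html.content, "lxml")
    # Skip the header row (index 0) and take the next ten rows
    for items in doc.select('table.chart-table tr')[1:11]:
        rank = items.select_one("td.chart-table-position").get_text(strip=True)
        # The title is in a <strong> inside the <td>, not in a <div>
        name = items.select_one("td.chart-table-track > strong").get_text(strip=True)
        print(rank, name)
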

Output:

    1 Blinding Lights
    2 The Box
    3 Dance Monkey
    4 Don't Start Now
    5 Roses - Imanbek Remix
    6 In Your Eyes
    7 death bed (coffee for your head) (feat. beabadoobee)
    8 Say So
    9 Intentions (feat. Quavo)
    10 Falling
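Since the question asked about lxml, here is a sketch of the same extraction using lxml and XPath directly. It runs against a small hypothetical HTML snippet whose structure is assumed from the selectors in the answer above (a table with class "chart-table", rows containing td cells with classes "chart-table-position" and "chart-table-track", title inside strong). The names SAMPLE_HTML and top_tracks are illustrative only, and whether the live page still uses this markup is an assumption.

```python
import lxml.html

# Hypothetical markup, assumed from the answer's CSS selectors; not the live page.
SAMPLE_HTML = """
<html><body>
<table class="chart-table">
  <thead><tr><th>Position</th><th>Track</th></tr></thead>
  <tbody>
    <tr>
      <td class="chart-table-position">1</td>
      <td class="chart-table-track"><strong>Song A</strong><span>by Artist A</span></td>
    </tr>
    <tr>
      <td class="chart-table-position">2</td>
      <td class="chart-table-track"><strong>Song B</strong><span>by Artist B</span></td>
    </tr>
  </tbody>
</table>
</body></html>
"""


def top_tracks(html_text, limit=10):
    """Return up to `limit` (rank, title) pairs from chart-table markup (hypothetical helper)."""
    doc = lxml.html.fromstring(html_text)
    # Keep only rows that contain a track <td>; this skips the header row of <th> cells.
    rows = doc.xpath('//table[contains(@class, "chart-table")]'
                     '//tr[td[contains(@class, "chart-table-track")]]')
    results = []
    for row in rows[:limit]:
        # string(...) returns the concatenated text of the first matching node.
        rank = row.xpath('string(td[contains(@class, "chart-table-position")])').strip()
        title = row.xpath('string(td[contains(@class, "chart-table-track")]/strong)').strip()
        results.append((rank, title))
    return results


if __name__ == "__main__":
    doc = lxml.html.fromstring(SAMPLE_HTML)
    # The question's XPath finds nothing because the cells are <td>, not <div>.
    print(doc.xpath('//div[@class="chart-table-track"]/text()'))
    for rank, title in top_tracks(SAMPLE_HTML):
        print(rank, title)
```

To try it on the real page you would pass requests.get("https://spotifycharts.com/regional").content to top_tracks instead of SAMPLE_HTML. contains(@class, ...) is used rather than an exact @class match so the XPath still works if the cells carry extra classes.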