I want to scrape the daily top 200 songs from the Spotify Charts website. I am trying to parse the page's HTML to get each song's artist, name, and stream count, but the following code returns nothing. How can I get this information with this approach?
for a in soup.find("div",{"class":"Container-c1ixcy-0 krZEp encore-base-set"}):
    for b in a.findAll("main",{"class":"Main-tbtyrr-0 flXzSu"}):
        for c in b.findAll("div",{"class":"Content-sc-1n5ckz4-0 jyvkLv"}):
            for d in c.findAll("div",{"class":"TableContainer__Container-sc-86p3fa-0 fRKUEz"}):
                print(d)
For example, this is the song list I want to scrape: https://charts.spotify.com/charts/view/regional-tr-daily/2022-09-14
In the example link you provided, there aren't 200 songs, but only 50. The following is one way to get those songs:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import time as t
import pandas as pd
from bs4 import BeautifulSoup
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("window-size=1920,1080")
webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
url = 'https://charts.spotify.com/charts/view/regional-tr-daily/2022-09-14'
browser.get(url)
wait = WebDriverWait(browser, 5)
try:
    wait.until(EC.element_to_be_clickable((By.ID, "onetrust-accept-btn-handler"))).click()
    print("accepted cookies")
except Exception as e:
    print('no cookie button')
header_to_be_removed = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'header[data-testid="charts-header"]')))
browser.execute_script("""
    var element = arguments[0];
    element.parentNode.removeChild(element);
""", header_to_be_removed)
## keep clicking the "load more" button until it no longer appears
while True:
    try:
        show_more_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//div[@data-testid="load-more-entries"]//button')))
        ## reading this property scrolls the button into view
        show_more_button.location_once_scrolled_into_view
        t.sleep(5)
        show_more_button.click()
        print('clicked to show more')
        t.sleep(3)
    except TimeoutException:
        print('all done')
        break
songs = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'li[data-testid="charts-entry-item"]')))
print('we have', len(songs), 'songs')
song_list = []
for song in songs:
    song.location_once_scrolled_into_view
    t.sleep(1)
    title = song.find_element(By.CSS_SELECTOR, 'p[class^="Type__TypeElement-"]')
    artist = song.find_element(By.CSS_SELECTOR, 'span[data-testid="artists-names"]')
    song_list.append((artist.text, title.text))
df = pd.DataFrame(song_list, columns = ['Artist', 'Title']) ## column order matches the (artist, title) tuples
print(df)
This prints the following to the terminal:
no cookie button
clicked to show more
clicked to show more
clicked to show more
clicked to show more
all done
we have 50 songs
| | Artist | Title |
|---|---|---|
| 0 | Bizarrap, | Quevedo: Bzrp Music Sessions, Vol. 52 |
| 1 | Harry Styles | As It Was |
| 2 | Bad Bunny, | Me Porto Bonito |
| 3 | Bad Bunny | Tití Me Preguntó |
| 4 | Manuel Turizo | La Bachata |
| 5 | ROSALÍA | DESPECHÁ |
| 6 | BLACKPINK | Pink Venom |
| 7 | David Guetta, | I'm Good (Blue) |
| 8 | OneRepublic | I Ain't Worried |
| 9 | Bad Bunny | Efecto |
| 10 | Chris Brown | Under The Influence |
| 11 | Steve Lacy | Bad Habit |
| 12 | Bad Bunny, | Ojitos Lindos |
| 13 | Kate Bush | Running Up That Hill (A Deal With God) - 2018 Remaster |
| 14 | Joji | Glimpse of Us |
| 15 | Nicki Minaj | Super Freaky Girl |
| 16 | Bad Bunny | Moscow Mule |
| 17 | Rosa Linn | SNAP |
| 18 | Glass Animals | Heat Waves |
| 19 | KAROL G | PROVENZA |
| 20 | Charlie Puth, | Left and Right (Feat. Jung Kook of BTS) |
| 21 | Harry Styles | Late Night Talking |
| 22 | The Kid LAROI, | STAY (with Justin Bieber) |
| 23 | Tom Odell | Another Love |
| 24 | Central Cee | Doja |
| 25 | Stephen Sanchez | Until I Found You |
| 26 | Bad Bunny | Neverita |
| 27 | Post Malone, | I Like You (A Happier Song) (with Doja Cat) |
| 28 | Lizzo | About Damn Time |
| 29 | Nicky Youre, | Sunroof |
| 30 | Elton John, | Hold Me Closer |
| 31 | Luar La L | Caile |
| 32 | KAROL G, | GATÚBELA |
| 33 | The Weeknd | Die For You |
| 34 | Bad Bunny, | Tarot |
| 35 | James Hype, | Ferrari |
| 36 | Imagine Dragons | Bones |
| 37 | Elton John, | Cold Heart - PNAU Remix |
| 38 | The Neighbourhood | Sweater Weather |
| 39 | Ghost | Mary On A Cross |
| 40 | Shakira, | Te Felicito |
| 41 | Justin Bieber | Ghost |
| 42 | Bad Bunny, | Party |
| 43 | Drake, | Jimmy Cooks (feat. 21 Savage) |
| 44 | Doja Cat | Vegas (From the Original Motion Picture Soundtrack ELVIS) |
| 45 | Camila Cabello, | Bam Bam (feat. Ed Sheeran) |
| 46 | Rauw Alejandro, | LOKERA |
| 47 | Rels B | cómo dormiste? |
| 48 | The Weeknd | Blinding Lights |
| 49 | Arctic Monkeys | 505 |
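The repeated "clicked to show more" lines come from a loop that keeps clicking the button until the wait times out. The following is a minimal sketch of that same pattern with a click cap added, so a button that never disappears cannot loop forever. The names `click_until_exhausted`, `find_button`, and `fake_find_button` are hypothetical and only for illustration; they are not part of Selenium.

```python
from typing import Callable, Optional


def click_until_exhausted(
    find_button: Callable[[], Optional[Callable[[], None]]],
    max_clicks: int = 20,
) -> int:
    """Click a 'load more' control until it disappears or the cap is hit.

    find_button is a hypothetical callable: it returns a click function
    while the button is available and None once it is gone (for example,
    after a wait has timed out).
    """
    clicks = 0
    while clicks < max_clicks:
        click = find_button()
        if click is None:
            break  # no button left, so all entries are loaded
        click()
        clicks += 1
    return clicks


# Stand-in for a page that offers the button four times (assumption for the demo)
remaining = [4]


def fake_find_button() -> Optional[Callable[[], None]]:
    if remaining[0] == 0:
        return None

    def click() -> None:
        remaining[0] -= 1

    return click


print(click_until_exhausted(fake_find_button))  # 4
```

With Selenium, `find_button` could wrap the `wait.until(...)` call and return `None` on `TimeoutException`.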
Of course, you can also get other information, such as the chart ranking or the full list of artists when a song has more than one.
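As a rough sketch of that post-processing: the helpers below join several artist names into one string (dropping the trailing commas visible in the output above) and prefix each row with a rank. `clean_artists` and `add_rank` are hypothetical helpers, and two things are assumed rather than verified: that each artist name can be read as a separate string from the page, and that the list order matches the chart position.

```python
from typing import Iterable, List, Tuple


def clean_artists(raw_names: Iterable[str]) -> str:
    """Join artist names, dropping stray commas and blank entries."""
    names = [name.strip().rstrip(",").strip() for name in raw_names]
    return ", ".join(name for name in names if name)


def add_rank(rows: Iterable[Tuple[str, str]]) -> List[Tuple[int, str, str]]:
    """Prefix each (artist, title) row with a 1-based position.

    Assumes the rows are already in chart order.
    """
    return [(rank, artist, title) for rank, (artist, title) in enumerate(rows, start=1)]


rows = [
    (clean_artists(["Bizarrap,", "Quevedo"]), "Quevedo: Bzrp Music Sessions, Vol. 52"),
    (clean_artists(["Harry Styles"]), "As It Was"),
]
for row in add_rank(rows):
    print(row)
```

The ranked tuples can be passed to `pd.DataFrame` with three column names instead of two.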
The Selenium Chrome/chromedriver setup shown here is for Linux. To adapt it to your own setup, keep the imports and everything after the browser is defined, and change only the lines that create the browser.
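One way to make the driver path less Linux-specific is to pick the binary name from the operating system. This is only a sketch: `chromedriver_path` is a hypothetical helper, and it assumes the binary was saved under a `chromedriver` folder with its default file name. Recent Selenium releases can also locate a driver on their own, in which case no path is needed at all.

```python
import platform
from pathlib import Path


def chromedriver_path(base_dir: str = "chromedriver", system: str = "") -> str:
    """Return the driver binary path for the given (or current) OS.

    Assumes the default file names: 'chromedriver.exe' on Windows,
    'chromedriver' everywhere else.
    """
    system = system or platform.system()
    filename = "chromedriver.exe" if system == "Windows" else "chromedriver"
    return str(Path(base_dir) / filename)


print(chromedriver_path())
```

The result could then be passed to `Service(...)` in place of the hard-coded string.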
Pandas documentation: https://pandas.pydata.org/pandas-docs/stable/index.html
Selenium documentation: https://www.selenium.dev/documentation/