Using the Genius API, I obtain the URL of a song's lyrics page. I now want to scrape that page using beautifulsoup4; however, I run into an error. Here is the code:
from bs4 import BeautifulSoup
import requests

def scrap_song_url(url):
    page = requests.get(url)
    html = BeautifulSoup(page.text, 'html.parser')
    lyrics = html.find('div', class_='lyrics').get_text()
    return lyrics
Here, I am looking at the HTML for the lyrics page. For the sake of example, look at this specific URL: https://genius.com/Acceptance-permanent-lyrics. Spelunking through the HTML, it appears that the lyrics are contained in a div with class 'lyrics'.
However, trying to find this using html.find returns None, and consequently .get_text() raises an AttributeError. I presume this means that, for some reason, the HTML tag (or whatever you call it, I don't really know HTML) is not being found. How can I acquire the lyrics from the div with class 'lyrics' from a given song lyrics URL?
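The failure described above can be reproduced without any network access. The following is a minimal sketch, not a fix for Genius specifically: `extract_lyrics` is a hypothetical helper, and both HTML strings are made-up sample markup rather than real Genius pages. It shows that `find` returns `None` when no matching div is present, and how checking for that avoids the `AttributeError`. One possible reason for the missing div, stated here as an assumption rather than a confirmed diagnosis, is that the HTML returned to `requests` differs from what the browser's inspector shows.

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract_lyrics(html_text: str) -> Optional[str]:
    """Return the text of the first <div class="lyrics">, or None if absent."""
    soup = BeautifulSoup(html_text, 'html.parser')
    lyrics_div = soup.find('div', class_='lyrics')
    if lyrics_div is None:
        # find() matched nothing; calling .get_text() on None would raise
        # AttributeError, so report the miss to the caller instead.
        return None
    return lyrics_div.get_text().strip()


# Hypothetical sample markup, not real Genius HTML.
with_lyrics = '<html><body><div class="lyrics">First line\nSecond line</div></body></html>'
without_lyrics = '<html><body><div class="other">No lyrics here</div></body></html>'

print(extract_lyrics(with_lyrics))
print(extract_lyrics(without_lyrics))
```

Printing `page.text` (or saving it to a file) and searching it for the expected class name is a quick way to check whether the div is actually present in what `requests` received.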
There is a Genius API Python wrapper called lyricsgenius. First, get an access token from Genius.
Installing is easy with pip:
pip install lyricsgenius
According to its documentation, collecting lyrics looks much easier:
from lyricsgenius import Genius
genius = Genius('<token>')  # replace <token> with your Genius access token
artist = genius.search_artist('Kowalsky meg a Vega')
artist.save_lyrics()