
Scraping song lyrics with beautifulsoup


Using the Genius API, I obtain the URL of a song's lyrics page. I now want to scrape that page with beautifulsoup4; however, I run into an error. Here is the code:

from bs4 import BeautifulSoup
import requests

def scrap_song_url(url):
    page = requests.get(url)
    html = BeautifulSoup(page.text, 'html.parser')
    lyrics = html.find('div', class_='lyrics').get_text()

    return lyrics

Here, I am looking at the HTML for the lyrics page. As an example, take this specific URL: https://genius.com/Acceptance-permanent-lyrics. Spelunking through the HTML, it appears that the lyrics are contained in a div with class 'lyrics'.

However, html.find returns None for this div, and consequently .get_text() throws an error. I presume this means that, for some reason, the tag (or whatever you call it, I don't really know HTML) is not being found. How can I get the lyrics from the div with class 'lyrics' for a given song lyrics URL?


Solution

  • There is a Python wrapper for the Genius API called lyricsgenius. First, get an access token from Genius:

    1. Sign up to Genius
    2. Visit https://docs.genius.com
    3. Find the "Authorization: Bearer" value and copy it

    Installing is easy with pip:

    pip install lyricsgenius
    

    From its documentation, collecting lyrics looks much easier:

    from lyricsgenius import Genius
    
    genius = Genius('<token>')  # replace <token> with your Genius access token
    artist = genius.search_artist('Kowalsky meg a Vega')
    artist.save_lyrics()
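
If you would rather keep scraping the page yourself, the error in the question can at least be avoided by checking what find returns before calling .get_text() on it. Below is a minimal sketch, not a definitive fix. The helper name extract_lyrics is made up for illustration. It first tries the div with class 'lyrics' from the question, then falls back to divs carrying a data-lyrics-container="true" attribute; that fallback selector is an assumption about the markup the site may serve instead, so check it against the HTML you actually receive.

```python
from bs4 import BeautifulSoup


def extract_lyrics(html_text):
    """Return the lyrics text found in html_text, or None if no container matches."""
    soup = BeautifulSoup(html_text, "html.parser")

    # Markup described in the question: <div class="lyrics">...</div>
    legacy = soup.find("div", class_="lyrics")
    if legacy is not None:
        return legacy.get_text(separator="\n").strip()

    # ASSUMPTION: alternative markup with one or more
    # <div data-lyrics-container="true"> blocks; verify against the real page.
    containers = soup.select('div[data-lyrics-container="true"]')
    if containers:
        return "\n".join(c.get_text(separator="\n").strip() for c in containers)

    # Nothing matched: return None instead of raising AttributeError.
    return None
```

With this helper, the function from the question could call extract_lyrics(page.text) and handle a None result explicitly, for example by printing part of page.text to see which markup was actually returned.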