python spotify recommendation-engine

How do I fix the KeyError: 'artists' in this Python Spotify recommendation code?


I'm trying to make my recommendation list using this dataset.

I want to group songs by artists_upd, id, and artists, in that order. The original is this, and I want mine to look like this, but the merge fails with KeyError: 'artists' and I can't understand why.

import pandas as pd
import numpy as np
import json
import re 
import sys
import itertools

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from spotipy.oauth2 import SpotifyOAuth
import spotipy.util as util

import warnings
warnings.filterwarnings("ignore")
%matplotlib inline

from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

spotify_df = pd.read_csv('/content/drive/MyDrive/tracksong/tracks.csv')
spotify_df.head()

data_w_genre = pd.read_csv('/content/drive/MyDrive/tracksong/artists.csv')
data_w_genre.head()

spotify_df['artists_upd_v1'] = spotify_df['artists'].apply(lambda x: re.findall(r"'([^']*)'", x))
spotify_df['artists_upd_v1'].values[0][0]

spotify_df[spotify_df['artists_upd_v1'].apply(lambda x: not x)].head(5)

spotify_df['artists_upd_v2'] = spotify_df['artists'].apply(lambda x: re.findall('\"(.*?)\"',x))
spotify_df['artists_upd'] = np.where(spotify_df['artists_upd_v1'].apply(lambda x: not x), spotify_df['artists_upd_v2'], spotify_df['artists_upd_v1'] )


spotify_df['artists_song'] = spotify_df.apply(lambda row: row['artists_upd'][0]+str(row['name']),axis = 1)
#original code is -> spotify_df['artists_song'] = spotify_df.apply(lambda row: row['artists_upd'][0]+row['name'],axis = 1)

spotify_df.sort_values(['artists_song','release_date'], ascending = False, inplace = True)

spotify_df[spotify_df['name']=='Adore You']

spotify_df.drop_duplicates('artists_song',inplace = True)

spotify_df[spotify_df['name']=='Adore You']

artists_exploded = spotify_df[['artists_upd','id']].explode('artists_upd')

artists_exploded_enriched = artists_exploded.merge(data_w_genre, how = 'left', left_on = 'artists_upd',right_on = 'artists')

KeyError: 'artists'

Here is my code (last line) and here is the original code (line 25).


Solution

  • The issue is that you are attempting to merge on the artists column of data_w_genre (with the parameter right_on = 'artists'), but data_w_genre does not have a column with that name.

     artists_exploded_enriched = artists_exploded.merge(data_w_genre, how = 'left', left_on = 'artists_upd',right_on = 'artists')
    

    In the original code, the csv file from which data_w_genre is imported has an artists column:

     data_w_genre.dtypes
    

    In Cell 9 of the original notebook, this outputs:

    artists              object
    acousticness        float64
    danceability        float64
    duration_ms         float64
    energy              float64
    instrumentalness    float64
    liveness            float64
    loudness            float64
    speechiness         float64
    tempo               float64
    valence             float64
    popularity          float64
    key                   int64
    mode                  int64
    count                 int64
    genres               object
    dtype: object
    

    But the same line in Cell 9 of your code outputs:

    id             object
    followers     float64
    genres         object
    name           object
    popularity    float64
    dtype: object
    

    Note that artists is missing. Without an artists column in data_w_genre, you won't be able to reproduce the line from the original code. I would look into where the original author sourced 'data_w_genre.csv' and try to obtain or create a file with the same format.
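
A possible workaround, offered as a sketch rather than a confirmed fix: the dtypes output above shows that your artists.csv has a name column. Assuming (the original notebook does not confirm this) that name holds the artist's name in the same spelling as the entries of artists_upd, you could check the columns first and merge on name instead. The rows below are made-up toy data, and the helper merge_artist_genres and its artist_col parameter are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Toy stand-ins for the two frames in the question (all values are made up).
artists_exploded = pd.DataFrame({
    "artists_upd": ["Harry Styles", "Unknown Artist"],
    "id": ["track_1", "track_2"],
})
data_w_genre = pd.DataFrame({
    "id": ["artist_1"],
    "followers": [100.0],
    "genres": ["['pop']"],
    "name": ["Harry Styles"],
    "popularity": [90.0],
})


def merge_artist_genres(exploded, artist_table, artist_col="name"):
    """Left-merge exploded track rows onto an artist table.

    Hypothetical helper: it checks that the join column exists so the
    failure message lists the columns that are actually available.
    """
    if artist_col not in artist_table.columns:
        raise KeyError(
            f"{artist_col!r} not found; available columns: "
            f"{list(artist_table.columns)}"
        )
    # Both frames have an 'id' column (track id vs. artist id), so keep the
    # track id as 'id' and rename the artist id to 'id_artist'.
    return exploded.merge(
        artist_table,
        how="left",
        left_on="artists_upd",
        right_on=artist_col,
        suffixes=("", "_artist"),
    )


artists_exploded_enriched = merge_artist_genres(artists_exploded, data_w_genre)
print(artists_exploded_enriched[["artists_upd", "id", "genres"]])
```

Artists with no match in the artist table end up with missing genres, as in the second toy row. Note also that your artists.csv lacks the audio-feature columns (acousticness, danceability, and so on) that the original data_w_genre has, so any later step that relies on those columns would still need a file in the original format.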