How do I fix KeyError: 'artists' in this Python Spotify recommendation code?

I'm trying to make my recommendation list using this dataset.

I want to group songs by artists_upd, id and artists, in that order. The original one is this and I want to make it like this, but the merge fails with KeyError: 'artists' and I can't understand why.

import pandas as pd
import numpy as np
import json
import re 
import sys
import itertools

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from spotipy.oauth2 import SpotifyOAuth
import spotipy.util as util

import warnings
warnings.filterwarnings("ignore")
%matplotlib inline

from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

spotify_df = pd.read_csv('/content/drive/MyDrive/tracksong/tracks.csv')
spotify_df.head()

data_w_genre = pd.read_csv('/content/drive/MyDrive/tracksong/artists.csv')
data_w_genre.head()

spotify_df['artists_upd_v1'] = spotify_df['artists'].apply(lambda x: re.findall(r"'([^']*)'", x))
spotify_df['artists_upd_v1'].values[0][0]

spotify_df[spotify_df['artists_upd_v1'].apply(lambda x: not x)].head(5)

spotify_df['artists_upd_v2'] = spotify_df['artists'].apply(lambda x: re.findall('\"(.*?)\"',x))
spotify_df['artists_upd'] = np.where(spotify_df['artists_upd_v1'].apply(lambda x: not x), spotify_df['artists_upd_v2'], spotify_df['artists_upd_v1'] )


spotify_df['artists_song'] = spotify_df.apply(lambda row: row['artists_upd'][0]+str(row['name']),axis = 1)
#original code is -> spotify_df['artists_song'] = spotify_df.apply(lambda row: row['artists_upd'][0]+row['name'],axis = 1)

spotify_df.sort_values(['artists_song','release_date'], ascending = False, inplace = True)

spotify_df[spotify_df['name']=='Adore You']

spotify_df.drop_duplicates('artists_song',inplace = True)

spotify_df[spotify_df['name']=='Adore You']

artists_exploded = spotify_df[['artists_upd','id']].explode('artists_upd')

artists_exploded_enriched = artists_exploded.merge(data_w_genre, how = 'left', left_on = 'artists_upd',right_on = 'artists')

KeyError: 'artists'

Here is my code (last line) and here is the original code (line 25).

Solution

The issue is that you are attempting to merge on the artists column of data_w_genre (with the parameter right_on = 'artists'), but data_w_genre does not have a column with that name.

 artists_exploded_enriched = artists_exploded.merge(data_w_genre, how = 'left', left_on = 'artists_upd',right_on = 'artists')
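To see the failure in isolation, here is a minimal sketch with made-up data (the frame contents are hypothetical, not taken from your files). The right-hand frame has a name column but no artists column, so merge raises the same KeyError:

```python
import pandas as pd

# Hypothetical stand-ins for artists_exploded and data_w_genre.
left = pd.DataFrame({"artists_upd": ["Artist A", "Artist B"], "id": ["t1", "t2"]})
right = pd.DataFrame({"name": ["Artist A", "Artist B"], "genres": ["['pop']", "['rock']"]})

try:
    # right_on names a column that does not exist in `right`.
    left.merge(right, how="left", left_on="artists_upd", right_on="artists")
    error_message = None
except KeyError as exc:
    error_message = str(exc)

print(error_message)
```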

In the original code, the CSV file that data_w_genre is read from has an artists column:

 data_w_genre.dtypes

in Cell 9 outputs:

artists              object
acousticness        float64
danceability        float64
duration_ms         float64
energy              float64
instrumentalness    float64
liveness            float64
loudness            float64
speechiness         float64
tempo               float64
valence             float64
popularity          float64
key                   int64
mode                  int64
count                 int64
genres               object
dtype: object

But the same line in Cell 9 in your code outputs:

id             object
followers     float64
genres         object
name           object
popularity    float64
dtype: object

Note that artists is missing. Without an artists column in data_w_genre, the merge from the original code cannot work as written, because right_on must name a column that exists in the right-hand frame. I would look into where the original author sourced 'data_w_genre.csv' and try to obtain or create a file with the same format.
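If you cannot get the original file, one possible workaround is sketched below. It assumes that the name column in your artists.csv holds the artist name and that its id column is an artist id. I have not verified either against your file, so check a few rows first. The data is made up, and the name artist_id is just an illustrative choice. Renaming name to artists lets the original merge line run unchanged, and renaming id avoids a clash with the track id in artists_exploded:

```python
import pandas as pd

# Made-up stand-ins shaped like the two frames in the question.
artists_exploded = pd.DataFrame(
    {"artists_upd": ["Artist A", "Artist B", "Artist C"], "id": ["t1", "t1", "t2"]}
)
data_w_genre = pd.DataFrame(
    {
        "id": ["a1", "a2"],
        "followers": [10.0, 20.0],
        "genres": ["['pop']", "['rock']"],
        "name": ["Artist A", "Artist B"],
        "popularity": [50.0, 60.0],
    }
)

# Assumption: 'name' is the artist name and 'id' is the artist id.
renamed = data_w_genre.rename(columns={"name": "artists", "id": "artist_id"})

# Same merge as in the original code; it now finds an 'artists' column.
artists_exploded_enriched = artists_exploded.merge(
    renamed, how="left", left_on="artists_upd", right_on="artists"
)

# Artists with no match in data_w_genre get NaN in the merged columns.
print(artists_exploded_enriched[["artists_upd", "id", "genres"]])
```

Even if this works, your file still lacks the audio-feature columns (acousticness, danceability, and so on) that the original data_w_genre has, so any later step that relies on them would need a different source.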