I'm trying to build my own recommendation list using this dataset.
I want to group songs by artists_upd, id and artists, in that order. The original one is this and I want to make it like this, but my version fails with KeyError: 'artists' and I can't understand why.
import pandas as pd
import numpy as np
import json
import re
import sys
import itertools
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from spotipy.oauth2 import SpotifyOAuth
import spotipy.util as util
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
spotify_df = pd.read_csv('/content/drive/MyDrive/tracksong/tracks.csv')
spotify_df.head()
data_w_genre = pd.read_csv('/content/drive/MyDrive/tracksong/artists.csv')
data_w_genre.head()
spotify_df['artists_upd_v1'] = spotify_df['artists'].apply(lambda x: re.findall(r"'([^']*)'", x))
spotify_df['artists_upd_v1'].values[0][0]
spotify_df[spotify_df['artists_upd_v1'].apply(lambda x: not x)].head(5)
spotify_df['artists_upd_v2'] = spotify_df['artists'].apply(lambda x: re.findall('\"(.*?)\"',x))
spotify_df['artists_upd'] = np.where(spotify_df['artists_upd_v1'].apply(lambda x: not x), spotify_df['artists_upd_v2'], spotify_df['artists_upd_v1'] )
spotify_df['artists_song'] = spotify_df.apply(lambda row: row['artists_upd'][0]+str(row['name']),axis = 1)
#original code is -> spotify_df['artists_song'] = spotify_df.apply(lambda row: row['artists_upd'][0]+row['name'],axis = 1)
spotify_df.sort_values(['artists_song','release_date'], ascending = False, inplace = True)
spotify_df[spotify_df['name']=='Adore You']
spotify_df.drop_duplicates('artists_song',inplace = True)
spotify_df[spotify_df['name']=='Adore You']
artists_exploded = spotify_df[['artists_upd','id']].explode('artists_upd')
artists_exploded_enriched = artists_exploded.merge(data_w_genre, how = 'left', left_on = 'artists_upd',right_on = 'artists')
KeyError: 'artists'
Here is my code (last line) and here is the original code (line 25).
The issue is that you are attempting to merge on the artists column of data_w_genre (with the parameter right_on = 'artists'), but data_w_genre does not have a column with that name.
artists_exploded_enriched = artists_exploded.merge(data_w_genre, how = 'left', left_on = 'artists_upd',right_on = 'artists')
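To see this behaviour in isolation, here is a minimal sketch with two toy frames. The values are made up and only the column names mirror the ones in your question. Printing the columns of the right-hand frame before merging is usually the quickest check.

```python
import pandas as pd

# Toy stand-ins for the frames in the question (all values are invented).
artists_exploded = pd.DataFrame({"artists_upd": ["A", "B"], "id": ["t1", "t2"]})
data_w_genre = pd.DataFrame({"id": ["a1"], "name": ["A"], "genres": ["['pop']"]})

# Inspect the right-hand frame first: there is no 'artists' column here.
print(data_w_genre.columns.tolist())

merge_error = None
try:
    artists_exploded.merge(
        data_w_genre, how="left", left_on="artists_upd", right_on="artists"
    )
except KeyError as err:
    # pandas cannot find the right_on key among the right frame's columns.
    merge_error = err
    print("KeyError:", err)
```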
In the original code, the CSV file from which data_w_genre is imported has an artists column. There, data_w_genre.dtypes in Cell 9 outputs:
artists object
acousticness float64
danceability float64
duration_ms float64
energy float64
instrumentalness float64
liveness float64
loudness float64
speechiness float64
tempo float64
valence float64
popularity float64
key int64
mode int64
count int64
genres object
dtype: object
But the same line in Cell 9 in your code outputs:
id object
followers float64
genres object
name object
popularity float64
dtype: object
Note that artists is missing. Without an artists column in data_w_genre, you won't be able to reproduce the line from the original code. I would look into where the original author sourced 'data_w_genre.csv' and try to obtain or create a file with the same format.
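If you cannot get the original file, one possible workaround is to merge on the name column instead. This is only a sketch under an assumption I can't verify from your post: that name in your artists.csv holds the artist's name in the same spelling as artists_upd. The helper merge_artist_genres and the toy values below are hypothetical. Also note that your file lacks the audio-feature columns (acousticness, danceability, ...) of the original, so later cells that rely on them may still fail.

```python
import pandas as pd

# Toy frames with the column names shown in the question (values are invented).
artists_exploded = pd.DataFrame(
    {"artists_upd": ["Artist A", "Artist B"], "id": ["t1", "t2"]}
)
data_w_genre = pd.DataFrame(
    {
        "id": ["a1"],
        "followers": [10.0],
        "genres": ["['pop']"],
        "name": ["Artist A"],
        "popularity": [50.0],
    }
)


def merge_artist_genres(tracks, artists):
    """Left-merge tracks with artist metadata (hypothetical helper).

    Uses 'artists' as the key when the column exists, otherwise falls back
    to 'name' (ASSUMPTION: 'name' contains the artist name).
    """
    key = "artists" if "artists" in artists.columns else "name"
    # Both frames have an 'id' column; keep the track id as 'id' and
    # rename the artist id to 'id_artist' to avoid id_x / id_y.
    return tracks.merge(
        artists, how="left", left_on="artists_upd", right_on=key,
        suffixes=("", "_artist"),
    )


enriched = merge_artist_genres(artists_exploded, data_w_genre)
print(enriched[["artists_upd", "id", "genres"]])
```

Artists without a match keep NaN in the merged columns because of how = 'left', and if several artists share the same name the merge will produce one row per match, so it is worth checking the row count afterwards.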