Detailed: I'm using Python to make a small app that scrapes a top 100 song list and creates a Spotify playlist from it. I'm bottlenecked by the fact that the Spotify API only lets you search for one song at a time (to get its internal Spotify ID).
Short: I tried multithreading with mixed results.
For reference, this is what __search_song does (not entirely relevant):

    def __search_song(self, song: str):
        result = self.sp.search(song + " NOT Karaoke", limit=1, type="track")
        try:
            sid = result["tracks"]["items"][0]["uri"]
        except IndexError:
            pass
        else:
            self.song_list.append(sid)

Initial implementation:

    def __populate_playlist(self, song_list: list, pid: str):
        for song in song_list:
            self.__search_song(song)
        self.sp.playlist_add_items(pid, self.song_list)

This was normal execution, "one after another". It worked fine, but it was slow, and it made the window hang because the loop blocked the UI (Tkinter needs to refresh constantly).
Multithreading using threading and queue:

    q = queue.Queue()

    def __worker():
        while True:
            item = q.get()
            q.task_done()

    threading.Thread(target=__worker, daemon=True).start()

    def __populate_playlist(self, song_list: list, pid: str):
        for song in song_list:
            q.put(self.__search_song(song))
        q.join()
        self.sp.playlist_add_items(pid, self.song_list)

This worked, but it was only marginally faster than the original. It did fix the issue with the program appearing to not respond, but it was not fast enough.
I then tried to drop the queue and implement unordered threading:

    def __populate_playlist(self, song_list: list, pid: str):
        # multithreading support: one thread per song
        threads = []
        for song in song_list:
            t = threading.Thread(target=self.__search_song, args=(song, ))
            threads.append(t)
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        self.sp.playlist_add_items(pid, self.song_list)

This was very fast: a reduction from 23 to 8 seconds. Obviously it has the unintended consequence that the playlist is shuffled, so it's not a real top 100 anymore.
My question is simple: is there an issue with my implementation of a queue, or does using a queue inherently add this much overhead? This is the first time I've ever implemented multithreading in an application, so I might be missing something.
To reiterate the use case: I don't really care which search finishes first, as long as the order is maintained. I thought about storing the initial order of the list and using a dictionary to hold each song's position and Spotify ID, but I'm still thinking about the actual implementation of that.
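On the queue question: in the queue version above, `q.put(self.__search_song(song))` calls the search in the main thread before anything is put on the queue, so the worker only receives the return value (`None`) and the searches still run one after another. The sketch below shows one way a worker-based version could look, with results stored by index so the original order survives. It is only an illustration: `fake_search`, `search_all` and `num_workers` are hypothetical names, and `fake_search` stands in for the real Spotify lookup.

```python
import queue
import threading


def fake_search(song: str):
    # Hypothetical stand-in for the Spotify lookup; returns a made-up URI.
    return "spotify:track:" + song.lower().replace(" ", "-")


def search_all(songs, search=fake_search, num_workers=8):
    q = queue.Queue()
    results = [None] * len(songs)  # slot i holds the result for songs[i]

    def worker():
        while True:
            index, song = q.get()
            try:
                # The search runs here, inside the worker thread.
                results[index] = search(song)
            except Exception:
                # Leave the slot as None if the lookup fails.
                pass
            finally:
                q.task_done()

    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()

    # Queue the arguments, not the result of calling the search.
    for index, song in enumerate(songs):
        q.put((index, song))

    q.join()  # block until every queued item has been processed
    # Drop songs that were not found, keeping the original order.
    return [uri for uri in results if uri is not None]
```

Because every worker writes to its own slot, it does not matter which search finishes first. The daemon workers stay blocked on `q.get()` after the call returns, which is acceptable for a sketch but worth tidying up in a long-running app.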
As mentioned, it's very difficult to guarantee an order if you want asynchronous calls. But a simple implementation of mapping the ID to the name of the song would be:

    def __search_song(self, song: str):
        result = self.sp.search(song + " NOT Karaoke", limit=1, type="track")
        try:
            sid = result["tracks"]["items"][0]["uri"]
        except IndexError:
            pass
        else:
            self.song_list.append(sid)
            self.song_to_sid[song] = sid

This assumes that the dict song_to_sid is instantiated in your class.
If you then iterate over your original list of songs (provided it is in order), you can append each mapped sid to build an ordered playlist.
After you have run the __populate_playlist function you can do:
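
    top_hundred_playlist = []
    for song in song_list:  # the original, ordered list of song names
        if song in self.song_to_sid:  # skip songs the search did not find
            top_hundred_playlist.append(self.song_to_sid[song])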
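
As an alternative to the dict mapping, and a different technique from the one above, the standard library's `concurrent.futures.ThreadPoolExecutor.map` runs calls concurrently but returns the results in the order of the inputs. A minimal sketch, assuming the lookup can safely be called from several threads (as the threaded version in the question already does); `fake_search`, `ordered_uris` and `max_workers` are hypothetical names, and `fake_search` stands in for the real Spotify search:

```python
from concurrent.futures import ThreadPoolExecutor


def fake_search(song: str):
    # Hypothetical stand-in for the Spotify lookup; returns None when nothing is found.
    if not song:
        return None
    return "spotify:track:" + song.lower().replace(" ", "-")


def ordered_uris(songs, search=fake_search, max_workers=8):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() runs the calls concurrently but yields results in input order.
        results = list(pool.map(search, songs))
    # Drop songs that were not found, keeping the original order.
    return [uri for uri in results if uri is not None]
```

In the class from the question, the search method would need to return the URI (or `None`) instead of appending to `self.song_list`, and the returned list would then be passed to `playlist_add_items`.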