I have an application where I use zmq with asyncio to communicate with the clients, who have the ability to download a video with youtube-dl to the server. I tried adding await to youtube_dl's download function, but it gave me an error since it was not a coroutine. My code currently looks like this:
import asyncio
import youtube_dl


async def networking_stuff():
    download = True
    while True:
        if download:
            print("Received a request for download")
            await youtube_to_mp3("https://www.youtube.com/watch?v=u9WgtlgGAgs")
            download = False
        print("Working..")
        await asyncio.sleep(2)


async def youtube_to_mp3(url):
    ydl_opts = {
        'format': 'bestaudio/best',
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }]
    }
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        ydl.download([url])


loop = asyncio.get_event_loop()
loop.create_task(networking_stuff())
loop.run_forever()
which gives the following output:
Received a request for download
[youtube] u9WgtlgGAgs: Downloading webpage
[youtube] u9WgtlgGAgs: Downloading video info webpage
[youtube] u9WgtlgGAgs: Extracting video information
[youtube] u9WgtlgGAgs: Downloading MPD manifest
[download] Destination: The Cardigans - My Favourite Game “Stone Version”-u9WgtlgGAgs.webm
[download] 100% of 4.20MiB in 00:03
[ffmpeg] Destination: The Cardigans - My Favourite Game “Stone Version”-u9WgtlgGAgs.mp3
Deleting original file The Cardigans - My Favourite Game “Stone Version”-u9WgtlgGAgs.webm (pass -k to keep)
Working..
Working..
....
Working..
Working..
whereas I would expect the Working.. message to be printed in between youtube-dl's messages as well. Am I missing something here, or is this impossible with async/await? Is ffmpeg blocking? If so, can I run the download in async without converting to mp3, or is using threads the only way?
You are correct that you cannot simply make any function asynchronous.
Your question assumes that youtube-dl requires ffmpeg to work. That is not entirely true: it can download individual streams by its own means. AFAIK ffmpeg is used only for muxing these streams (video + audio + maybe subtitles) into one file.
If you do use ffmpeg, there is not much to gain from a performance point of view: if it is invoked via subprocess (the most likely case), at least one full-blown process is spawned to do the work. Interaction with subprocesses can also be done in a non-blocking way (see https://docs.python.org/3/library/asyncio-subprocess.html), but if your code spawns a process for each task, it will not scale well in either case.
Otherwise, it might be possible (and make some sense) to fork youtube-dl and make changes so that all network operations are based on asyncio. This is probably quite a lot of refactoring, but it should be doable.
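As a rough illustration of the non-blocking subprocess interaction mentioned above, here is a minimal sketch using asyncio.create_subprocess_exec. The child process is a Python one-liner standing in for ffmpeg (an assumption, so the example runs without ffmpeg installed), and run_child and ticker are hypothetical helper names.

```python
import asyncio
import sys


async def run_child(*cmd):
    """Spawn a child process and wait for it without blocking the event loop."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, _stderr = await proc.communicate()
    return proc.returncode, stdout.decode().strip()


async def ticker(log, count, delay):
    """Stand-in for the networking loop: records a message, then yields."""
    for _ in range(count):
        log.append("Working..")
        await asyncio.sleep(delay)


async def main():
    log = []
    # Stand-in for an ffmpeg invocation: a child that sleeps, then prints.
    child = [sys.executable, "-c", "import time; time.sleep(0.3); print('done')"]
    (code, out), _ = await asyncio.gather(run_child(*child), ticker(log, 3, 0.05))
    return code, out, log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The ticker keeps running while the child process does its work, because waiting on the process is an await point. This does not remove the cost of spawning one process per task.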
Regarding your code:
First, the function youtube_to_mp3 is not asynchronous at all, because it has no code path that could execute an await … expression. The meaning of the code would not change at all if you removed the async keyword from the function definition and the await from await youtube_to_mp3("… .
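To make this first point concrete, here is a small sketch in which time.sleep stands in for the blocking ydl.download call (an assumption, so that it runs without youtube-dl). The names blocking_job and ticker are hypothetical. Even though both coroutines are started together, the one without an await point holds the event loop until it returns.

```python
import asyncio
import time


async def blocking_job(log):
    # Declared async, but contains no await: time.sleep blocks the whole
    # event loop, much like a synchronous download call would.
    time.sleep(0.3)
    log.append("job done")


async def ticker(log):
    # A cooperative coroutine: it yields to the loop on every iteration.
    for _ in range(3):
        log.append("Working..")
        await asyncio.sleep(0.05)


async def main():
    log = []
    await asyncio.gather(blocking_job(log), ticker(log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Here "job done" is recorded before any "Working.." message, which matches the output in the question: the ticker cannot run until the blocking function has finished.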
Second, even if it were asynchronous, you are not using it in a way that would allow "parallel" execution. The await keyword means exactly that: the control flow in this task will continue only after the awaited coroutine finishes. If you need to run multiple coroutines in "parallel", you must not await them directly one by one. There are several ways to run coroutines concurrently. For example, if all the tasks are known at the same moment (which doesn't look like your case), you may use https://docs.python.org/3/library/asyncio-task.html#asyncio.gather and await the resulting "combined" awaitable. Otherwise, use a fire-and-forget approach (loop.create_task).
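A minimal sketch of the fire-and-forget approach follows. It assumes a genuinely asynchronous download coroutine, here faked with asyncio.sleep under the hypothetical name fake_download; youtube-dl's real download method is synchronous, so this shows only the scheduling pattern, not a working downloader.

```python
import asyncio


async def fake_download(url, log):
    # Hypothetical stand-in for a download that really awaits on network I/O.
    log.append(f"start {url}")
    await asyncio.sleep(0.12)
    log.append(f"finish {url}")


async def networking_stuff(log):
    # Schedule the download without awaiting it, and keep a reference so the
    # task is not garbage-collected and can be awaited later.
    task = asyncio.create_task(fake_download("video-1", log))
    for _ in range(4):
        log.append("Working..")
        await asyncio.sleep(0.05)
    await task  # make sure the download finished before returning
    return log


if __name__ == "__main__":
    print(asyncio.run(networking_stuff([])))
```

Because fake_download yields to the event loop while it waits, "Working.." messages appear between its start and finish entries. asyncio.create_task is the newer spelling of loop.create_task for code already running inside the loop.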