youtube-api youtube-data-api

How do some sites download YouTube captions?


This is somewhat of a duplicate of Does YouTube API forbid to download video captions if you are not it's owner? and Get YouTube captions, which both basically say that it's not possible to download captions via the YouTube API unless you are the owner or third-party contributions are enabled. However, my question is: how do sites like http://downsub.com/ or http://www.lilsubs.com/ have access to all captions?

In other words, when I access the YouTube API myself (even with the youtubepartner and youtube.force-ssl scopes), I can only download the captions of some videos. The rest fail with:

403: The permissions associated with the request are not sufficient to download the caption track. The request might not be properly authorized, or the video order might not have enabled third-party contributions for this caption.

Yet when I try those same videos on these other sites, it works fine. I'm assuming they are using the YouTube API to access the captions, but what special sauce are they using? Some special partner key? A different API version? Are they just scraping from the videos themselves or something?


Solution

  • Send a GET request to:

    http://video.google.com/timedtext?lang={LANG}&v={VIDEOID}
    

    Example for the video from your comment: http://video.google.com/timedtext?lang=ko&v=0db1_qWZjRA

    Let's look at another example of yours, i.e. https://www.youtube.com/watch?v=7068mw-6lmI (and I agree with the point about differentiation in your comment).

    There are multiple subtitle tracks available for this video.

    Their track names are passed as the subtitle name parameter (e.g., name=English).

    lang stands for the language code, optionally with a region (as in es-MX). In your example: https://www.youtube.com/api/timedtext?lang=es-MX&v=7068mw-6lmI&name=Spanish

    If a subtitle track is available, it can be translated into another language using the tlang parameter:

    https://www.youtube.com/api/timedtext?lang=en&v=7068mw-6lmI&name=English&tlang=lv
    https://www.youtube.com/api/timedtext?lang=ko&v=7068mw-6lmI&name=Korean&tlang=lv
    

    This would be my bid for what these sites are using, i.e. translation of an available subtitle track (you can confirm this by giving one of their sites a video that has no subtitle track).

    As for asr (automatic speech recognition) tracks, a signature seems to always be needed, but as long as one of the subtitle tracks is available, you can use that one for translation. E.g., for the example from your comment on the original post:

    https://www.youtube.com/api/timedtext?lang=en&v=vx6NCUyg1NE&tlang=lv
    

    Looks like the last example is special, with both of its subtitle tracks being asr (checked with Chrome -> Inspect -> Network); therefore you need to omit the subtitle name parameter. Unfortunately, this difference is not visible in the settings wheel of the YouTube video.
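
The URLs above can be assembled programmatically. Below is a minimal Python sketch, assuming the timedtext endpoint still accepts the lang, v, name and tlang parameters exactly as shown in this answer; the endpoint is undocumented, so that may not hold. The helper names build_timedtext_url and fetch_timedtext are hypothetical and exist only for this illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the example URLs in this answer (undocumented, may change).
TIMEDTEXT_ENDPOINT = "https://www.youtube.com/api/timedtext"


def build_timedtext_url(video_id, lang, name=None, tlang=None):
    """Build a timedtext URL for one subtitle track.

    name  -- subtitle track name (e.g. "English"); leave it out for tracks
             that have no name, such as the asr case described above
    tlang -- optional target language to translate the track into
    """
    params = {"lang": lang, "v": video_id}
    if name:
        params["name"] = name
    if tlang:
        params["tlang"] = tlang
    return f"{TIMEDTEXT_ENDPOINT}?{urlencode(params)}"


def fetch_timedtext(url, timeout=10):
    """Send the GET request and return the raw response body as text.

    Assumption: the body decodes as UTF-8. What comes back (if anything)
    depends on the track actually being available for that video.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For instance, build_timedtext_url("7068mw-6lmI", "en", name="English", tlang="lv") reproduces the first translation URL shown above, and passing that result to fetch_timedtext would issue the actual request.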