python xml pytube

XML to SRT conversion not working after installing pytube


I have installed pytube to extract captions from some YouTube videos. Both of the following snippets give me the XML captions.

from pytube import YouTube
yt = YouTube('https://www.youtube.com/watch?v=4ZQQofkz9eE')
caption = yt.captions['a.en']
print(caption.xml_captions)

and also, as mentioned in the docs:

yt = YouTube('http://youtube.com/watch?v=2lAe1cqCOXo')
caption = yt.captions.get_by_language_code('en')
caption.xml_captions

In both cases I get the XML output, but when I use

print(caption.generate_srt_captions())

I get the following error. How can I extract the captions in SRT format?

KeyError
~/anaconda3/envs/myenv/lib/python3.6/site-packages/pytube/captions.py in generate_srt_captions(self)
49         recompiles them into the "SubRip Subtitle" format.
50         """
51         return self.xml_caption_to_srt(self.xml_captions)
52 
53     @staticmethod

~/anaconda3/envs/myenv/lib/python3.6/site-packages/pytube/captions.py in xml_caption_to_srt(self, xml_captions)
81             except KeyError:
82                 duration = 0.0
83             start = float(child.attrib["start"])
84             end = start + duration
85             sequence_number = i + 1  # convert from 0-indexed to 1.

KeyError: 'start'

Solution

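  • This is a bug in the library itself. Judging by the changes needed, the caption XML now nests its entries as p elements under body and names the timing attributes t and d, while the parser still looks for start and dur on the root's direct children. Everything below was done in pytube 11.01. In the captions.py file, on line 76, replace: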

    for i, child in enumerate(list(root)):
    

    with:

    for i, child in enumerate(list(root.findall('body/p'))):
    

    Then on line 83, replace:

    duration = float(child.attrib["dur"])
    

    with:

    duration = float(child.attrib["d"])
    

    Then on line 86, replace:

    start = float(child.attrib["start"])
    

    with:

    start = float(child.attrib["t"])
    

    If the output then shows only the sequence numbers and timestamps but no subtitle text, replace line 77:

    text = child.text or ""
    

    with:

    text = ''.join(child.itertext()).strip()
    if not text:
        continue
    

    This worked for me with Python 3.9 and pytube 11.01. Good luck!
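
If you would rather not edit the installed library, the same changes can be sketched as a standalone converter that takes the XML string and returns SRT text. This is only a minimal sketch, not part of pytube: the names xml_captions_to_srt and ms_to_srt_time are made up for illustration, and it assumes that the t and d attributes hold milliseconds, which is worth checking against your own XML output. If that assumption holds, the patched library lines above may also need their values divided by 1000 to give correct timestamps.

```python
import xml.etree.ElementTree as ElementTree
from html import unescape


def ms_to_srt_time(ms: int) -> str:
    """Format a millisecond count as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def xml_captions_to_srt(xml_captions: str) -> str:
    """Convert caption XML with <body><p t=".." d=".."> entries to SRT text.

    Hypothetical helper, not a pytube function.
    """
    root = ElementTree.fromstring(xml_captions)
    segments = []
    for child in root.findall("body/p"):
        # itertext() also collects text held in nested elements.
        text = "".join(child.itertext()).strip()
        if not text:
            continue  # skip entries that carry no subtitle text
        text = unescape(" ".join(text.split()))  # collapse newlines and runs of spaces
        start = int(float(child.attrib["t"]))            # assumed to be milliseconds
        duration = int(float(child.attrib.get("d", 0)))  # assumed ms; 0 if missing
        segments.append((start, start + duration, text))

    blocks = []
    # Number only the entries that were kept, starting from 1.
    for number, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{number}\n{ms_to_srt_time(start)} --> {ms_to_srt_time(end)}\n{text}\n"
        )
    return "\n".join(blocks).strip()
```

With the caption object from the question, this would be called as print(xml_captions_to_srt(caption.xml_captions)). Working in integer milliseconds avoids the float rounding that can creep in when seconds are split into whole and fractional parts.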