Tags: python, web-scraping, beautifulsoup, request, youtube

Requests and BeautifulSoup to get video length from YouTube


I am trying to get the video length from a YouTube URL. Inspecting the page in a web browser shows the duration in a span element with the class ytp-time-duration.

Then I use requests and BeautifulSoup to get it:

import requests
from bs4 import BeautifulSoup

url = "https://www.youtube.com/watch?v=ANYyoutubeLINK"

response = requests.get(url)
response.raise_for_status()

soup = BeautifulSoup(response.text, 'html.parser')

duration_span = soup.find_all('span', class_='ytp-time-duration')

print(duration_span)

Neither "soup.find_all" nor "soup.find" finds the element. What went wrong?


Solution

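  • The element you are searching for doesn't exist in the response. requests only downloads the initial HTML and does not run JavaScript, and the player's duration element is added by JavaScript after the page loads, so without JS rendering you will not get the information you are seeking. Use Selenium in headless mode to render the page and you will get the time. You can read it directly from the WebDriver or pass the rendered page source to BeautifulSoup.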

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from bs4 import BeautifulSoup

    chrome_options = Options()
    chrome_options.add_argument("--start-maximized")
    chrome_options.add_argument("--headless")  # Run in headless mode (no GUI)

    driver = webdriver.Chrome(options=chrome_options)

    URL = "https://www.youtube.com/watch?v=ANYyoutubeLINK"

    try:
        driver.get(URL)

        # Wait for the player to add the duration element
        # (the 10-second timeout is an arbitrary choice)
        duration = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "ytp-time-duration"))
        )

        # Get the time directly from the WebDriver
        print(f"From webdriver: {duration.text}")

        # Get the time using BeautifulSoup on the rendered page source
        soup = BeautifulSoup(driver.page_source, "html.parser")
        duration_span = soup.find("span", class_="ytp-time-duration")
        print(f"From beautifulsoup: {duration_span.text}")
    finally:
        # Quit the WebDriver even if one of the lookups above fails
        driver.quit()
    

    Output:

    From webdriver: 1:43
    From beautifulsoup: 1:43
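
If the length is needed as a number rather than as display text, the string printed above can be converted to seconds. This is only a minimal sketch: it assumes the player text is clock-style, such as 1:43 or 1:02:03, and the helper name duration_to_seconds is hypothetical, not part of Selenium or BeautifulSoup.

```python
def duration_to_seconds(text: str) -> int:
    """Convert a clock-style string such as "1:43" or "1:02:03" to seconds.

    Hypothetical helper; assumes colon-separated numeric fields
    (seconds, minutes:seconds, or hours:minutes:seconds).
    """
    parts = text.strip().split(":")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected duration format: {text!r}")

    seconds = 0
    for part in parts:
        # Each field to the left is worth 60 times the one to its right
        seconds = seconds * 60 + int(part)
    return seconds


print(duration_to_seconds("1:43"))  # 103
```

With the code above, duration_to_seconds(duration.text) or duration_to_seconds(duration_span.text) would give the length in seconds. An empty or otherwise unexpected string raises ValueError instead of silently returning 0, which makes it easier to notice when the element was found but had no readable text.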