python selenium google-colaboratory tiktok

Selenium Python: unable to scroll down on TikTok while fetching videos


I am trying to use Selenium with Python to open a TikTok user page and scroll down to load all of the user's videos. I can open the URL and get the page source, including the data for the videos that load initially. But after I scroll down, sleep for a while, and get the page source again, it is the same as before: the same videos, and nothing new is loaded.

from selenium import webdriver
from selenium.webdriver.common.by import By
import re
import json
from bs4 import BeautifulSoup
import time

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# open it, go to a website, and get results

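# chromedriver is found on PATH; recent Selenium 4 releases no longer accept the driver path as a positional argument
wd = webdriver.Chrome(options=options)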
wd.get("https://www.tiktok.com/@tiktok")
time.sleep(20)
#wd.implicitly_wait(10)
#print(wd.page_source)

SCROLL_PAUSE_TIME = 20

# Get scroll height
last_height = wd.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = wd.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

print(wd.page_source)

I also tried this code to scroll down:

wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(10)
print(wd.page_source)

But again, nothing new is loaded in the page source. I am using Google Colab. Any help?

Installation

# install chromium, its driver, and selenium
!apt-get update
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
!pip install selenium

Solution

  • I had the same problem, although I am using Playwright with playwright-stealth rather than Selenium.

    The problem is that TikTok detects that the browser is headless and throttles it. At least, that was my problem.

    Adding the browser flag "--headless=new" fixed it. This argument switches Chromium to its newer, recently released headless mode, which is much harder to detect. Just make sure you use a recent version of Chromium.
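
For Selenium, the flag from the answer above could be applied roughly as in the sketch below. It is untested against TikTok, and carrying the fix over from Playwright to Selenium is an assumption. The helper names `build_chrome_args`, `scroll_to_end`, and `fetch_page_source` are hypothetical, and `max_rounds` is an added safety cap that the question's loop does not have.

```python
import time


def build_chrome_args(new_headless=True):
    """Return Chrome flags; "--headless=new" selects the newer headless mode."""
    headless = "--headless=new" if new_headless else "--headless"
    return [headless, "--no-sandbox", "--disable-dev-shm-usage"]


def scroll_to_end(driver, pause=20.0, max_rounds=50):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scrolls performed. max_rounds is a safety cap
    (an assumption, not part of the original loop).
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load more videos
        rounds += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # height unchanged: nothing new was loaded
        last_height = new_height
    return rounds


def fetch_page_source(url, initial_wait=20.0, pause=20.0):
    """Open url in headless Chrome, scroll to the end, return the page source."""
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in build_chrome_args():
        options.add_argument(arg)

    wd = webdriver.Chrome(options=options)
    try:
        wd.get(url)
        time.sleep(initial_wait)  # wait for the first batch of videos
        scroll_to_end(wd, pause=pause)
        return wd.page_source
    finally:
        wd.quit()
```

In a Colab cell this would be called as `print(fetch_page_source("https://www.tiktok.com/@tiktok"))`. If the page height still does not change, the installed Chromium may be too old to support the new headless mode.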