I'm trying to scrape a website for the first time, using Beautiful Soup and/or Selenium to locate the CSRF token required for login. When I print the page's HTML, it doesn't seem to contain all the information. For instance, when I open the page and use the browser's "inspect" tool, I can find the element with the token value, but it is missing from the printed output. The element looks like this:
<input type="hidden" name="_token" value="6eLvVeLX0s0VPqPgb1YUXp3gJ3qNXje5gFcts4ii" autocomplete="off">
It appears that Beautiful Soup and Selenium sometimes don't pick up all the content on the page. Is that what is happening here, and how should I write my code to output the CSRF token value?
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
options = webdriver.ChromeOptions()
# add headless Chrome option
options.add_argument("--headless=new")
# set up Chrome in headless mode
driver = webdriver.Chrome(options=options)
# visit your target site
driver.get("https://stockinvest.us/login?sref=ta-lock")
# output the full-page HTML
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")
print(soup.prettify())
# csrf_token = soup.find("data-cf-beacon", {"crossorigin": "anonymous"})["value"]
csrf_token=driver.find_element(By.TAG_NAME, 'data-cf-beacon')
print(csrf_token)
# release the resources allocated by Selenium and shut down the browser
driver.quit()
You don't really need BeautifulSoup in this case. You can just use Selenium.
You need to navigate to the URL, wait until the element is present in the DOM, and then read and print its value attribute.
The code below works.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
url = 'https://stockinvest.us/login?sref=ta-lock'
driver = webdriver.Chrome()
driver.maximize_window()
driver.get(url)
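# wait up to 10 seconds for the hidden input to be present in the DOM
wait = WebDriverWait(driver, 10)
token_input = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "input[name='_token']")))
# read the token from the element's value attribute
token = token_input.get_attribute("value")
print(token)
# shut down the browser
driver.quit()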
Output
m8ahvogPL3pXYwKCiG13tzzyiCX2jVQzbNccM2Cq
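If you would still rather parse the HTML yourself (for example, from driver.page_source taken after the wait has succeeded), here is a minimal sketch using only Python's standard html.parser. The names TokenFinder, extract_csrf_token and sample_html are made up for illustration, and the sample markup is a stand-in, not the real page. With Beautiful Soup the equivalent lookup should be soup.find("input", {"name": "_token"})["value"].

```python
from html.parser import HTMLParser


class TokenFinder(HTMLParser):
    """Collects the value of the first <input name="_token"> it sees."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "input" and self.token is None:
            attributes = dict(attrs)
            if attributes.get("name") == "_token":
                self.token = attributes.get("value")


def extract_csrf_token(html):
    """Return the _token value found in html, or None if it is absent."""
    finder = TokenFinder()
    finder.feed(html)
    return finder.token


# Hypothetical sample markup; in practice pass driver.page_source here,
# taken after WebDriverWait has confirmed the input exists.
sample_html = '<form><input type="hidden" name="_token" value="abc123" autocomplete="off"></form>'
print(extract_csrf_token(sample_html))
```

Returning None when the input is missing makes it easy to tell that the page source was captured before the element existed in the DOM.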