from bs4 import BeautifulSoup
import requests
import random
id_url = "https://codeforces.com/profile/akash77"
id_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36"
}
id_page = requests.get(id_url, headers=id_headers)
id_soup = BeautifulSoup(id_page.content, 'html.parser')
id_soup = id_soup.find('svg')
print(id_soup)
I'm getting None as the output for this. If I parse the <div> element in which this <svg> tag is contained, the contents of the <div> element are not printed either. find() works for all HTML tags except the SVG tag.
The page is rendered dynamically with JavaScript, so the HTML that requests downloads does not contain the <svg> yet. You will need Selenium to get the rendered page.
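To see why find('svg') comes back empty, here is a minimal sketch. It assumes the static HTML looks roughly like the made-up STATIC_HTML string below (the real page markup may differ), and it uses the standard-library parser so it runs without network access. The container is present, but no <svg> tag exists until a browser runs the script.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the static HTML that requests receives:
# the container exists, but a script fills it in only inside a browser.
STATIC_HTML = """
<div id="userActivityGraph"></div>
<script>/* draws the graph into the div at runtime */</script>
"""


class TagCollector(HTMLParser):
    """Record the name of every start tag the parser encounters."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(STATIC_HTML)

# No parser can find the svg here, because it is not in the markup.
has_svg = "svg" in collector.tags
print(collector.tags)
print(has_svg)
```

Any parser, BeautifulSoup included, only sees what the server sent, which is why the <div> looks empty in the question.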
First, install the libraries:
pip install selenium
pip install webdriver-manager
Then you can use Selenium to load the fully rendered page:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
# Download a matching chromedriver and start Chrome
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s)
driver.maximize_window()
driver.get('https://codeforces.com/profile/akash77')
# find_elements returns a list of matching WebElements
elements = driver.find_elements(By.XPATH, '//*[@id="userActivityGraph"]')
elements is a list of Selenium WebElement objects, so we need to get the HTML out of each one:
svg = [element.get_attribute('innerHTML') for element in elements]
This gives you the <svg> and all the elements inside it, as a list of HTML strings.
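Once you have the inner HTML as a string, you can parse it like any other markup. The sketch below is only an illustration: SAMPLE_INNER_HTML is a made-up stand-in for one string from the svg list above, and the <rect> elements and their fill attributes are assumptions, not the real markup of the profile page.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for one string from the `svg` list above;
# the real markup on the profile page may use different tags and attributes.
SAMPLE_INNER_HTML = (
    '<svg width="40" height="20">'
    '<rect x="0" y="0" width="10" height="10" fill="#ebedf0"></rect>'
    '<rect x="12" y="0" width="10" height="10" fill="#91da9e"></rect>'
    '</svg>'
)


class RectCollector(HTMLParser):
    """Collect the attributes of every <rect> element as a dict."""

    def __init__(self):
        super().__init__()
        self.rects = []

    def handle_starttag(self, tag, attrs):
        if tag == "rect":
            self.rects.append(dict(attrs))


rect_collector = RectCollector()
rect_collector.feed(SAMPLE_INNER_HTML)

# Pull one attribute out of each collected element.
fills = [rect["fill"] for rect in rect_collector.rects]
print(fills)
```

You could equally pass the same string to BeautifulSoup, as in the question, now that the <svg> is actually present in the markup.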
Sometimes you need to run the browser in headless mode (without opening a Chrome window). For that, pass a headless option to the driver:
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument('--headless')
# then pass options to the driver
driver = webdriver.Chrome(service=s, options=options)