As soon as I try to scrape a website, the browser opens in a separate instance but crashes immediately. The code and the error are below.
Code:
import selenium.webdriver as webdriver
from selenium.webdriver.chrome.service import Service
import time

def scrape_website(website):
    print("Launching the browser!")
    option=webdriver.ChromeOptions()
    driver=webdriver.Chrome()
    try:
        driver.get(website)
        print("The page is loaded now...")
        html=driver.page_source
        time.sleep(10)
        return html
    finally:
        driver.quit()
The error:
InvalidArgumentException: Message: invalid argument
  (Session info: chrome=131.0.6778.205)
Stacktrace:
    GetHandleVerifier [0x00007FF616866CC5+28821]
    (No symbol) [0x00007FF6167D3850]
    (No symbol) [0x00007FF6166755B9]
    (No symbol) [0x00007FF616663051]
    (No symbol) [0x00007FF6166612FD]
    (No symbol) [0x00007FF616661B3C]
    (No symbol) [0x00007FF61667885A]
    (No symbol) [0x00007FF6167101FE]
    (No symbol) [0x00007FF6166EF2FA]
    (No symbol) [0x00007FF61670F412]
    (No symbol) [0x00007FF6166EF0A3]
    (No symbol) [0x00007FF6166BA778]
    (No symbol) [0x00007FF6166BB8E1]
    GetHandleVerifier [0x00007FF616B9FCCD+3408029]
    GetHandleVerifier [0x00007FF616BB743F+3504143]
    GetHandleVerifier [0x00007FF616BAB61D+3455469]
    GetHandleVerifier [0x00007FF61692BDCB+835995]
    (No symbol) [0x00007FF6167DEB6F]
    (No symbol) [0x00007FF6167DA824]
    (No symbol) [0x00007FF6167DA9BD]
    (No symbol) [0x00007FF6167CA1A9]
    BaseThreadInitThunk [0x00007FF85F087374+20]
    RtlUserThreadStart [0x00007FF86057CC91+33]
I am using Streamlit for the frontend of the application. That code is below:
import streamlit as st # type: ignore
from scrape import scrape_website

st.title("College Website Scraper")
url=st.text_input("Enter the Website Address:")
if st.button("Scrape Site"):
    st.write("Scraping this Website")
    result=scrape_website(url)
    print(result)
The URL passed to driver.get() must include the scheme, e.g. https://.
The InvalidArgumentException you're seeing is raised because the URL you entered has no scheme.
You can use urlparse from urllib.parse to inspect the components of a URL, including its scheme.
Ignoring Streamlit (it isn't relevant to the question), here's an example of how you could check that an input URL contains a scheme:
import selenium.webdriver as webdriver
from selenium.webdriver import ChromeOptions
from urllib.parse import urlparse

def scrape_website(website):
    options = ChromeOptions()
    options.add_argument("--headless=true")
    # Pass options by keyword; the context manager quits the driver on exit
    with webdriver.Chrome(options=options) as driver:
        driver.get(website)
        return driver.page_source

# Keep prompting until the user enters an empty line
while url := input("Enter url to scrape: "):
    p = urlparse(url)
    if not p.scheme:
        print("Scheme missing from url")
    else:
        html = scrape_website(url)
        print("HTML fragment:", html[:80])
Example:
Enter url to scrape: www.google.com
Scheme missing from url
Enter url to scrape: https://www.google.com
HTML fragment: <html itemscope="" itemtype="http://schema.org/WebPage" lang="en-GB"><head><meta
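If you would rather repair the input than reject it, one option is to add a default scheme before calling driver.get(). The following is only a minimal sketch: normalize_url and its default_scheme parameter are hypothetical names, not part of Selenium or the code above, and it assumes that https is an acceptable default for bare host names.

```python
from urllib.parse import urlparse


def normalize_url(raw, default_scheme="https"):
    """Return a URL with an http(s) scheme, or raise ValueError.

    Hypothetical helper for illustration; not part of Selenium.
    """
    candidate = raw.strip()
    if not candidate:
        raise ValueError("empty URL")
    # Bare host names such as "www.google.com" contain no "://",
    # so prepend the assumed default scheme.
    if "://" not in candidate:
        candidate = f"{default_scheme}://{candidate}"
    parts = urlparse(candidate)
    # Only accept http/https URLs that actually name a host.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a usable http(s) URL: {raw!r}")
    return candidate
```

In the Streamlit app you could then call scrape_website(normalize_url(url)) and catch ValueError to show a message (for example with st.error) instead of letting the browser session fail. Checking for "://" rather than relying on p.scheme alone also covers inputs like localhost:8000, which urlparse may otherwise interpret as having the scheme "localhost".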