To capture headers (the Selenium library does not support this), I decided to use the Selenium Wire library. I found the following snippet, which explains how to use Selenium Wire with the Tor Browser: https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/snippets/60 . However, when I run the code from that page I get a connection error: "Error connecting to SOCKS5 proxy 127.0.0.1:9150: [WinError 10061]". I also can't get header capture working by following the Selenium Wire documentation: https://github.com/wkeeling/selenium-wire . The documentation says it should follow this pattern:
def interceptor(request):
    del request.headers['Referer']  # Remember to delete the header first
    request.headers['Referer'] = 'some_referer'  # Spoof the referer

driver.request_interceptor = interceptor
driver.get(...)
# All requests will now use 'some_referer' for the referer
However, it does not explain what request is, or why the function is assigned by reference (interceptor) rather than called (interceptor()).
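On the last point, the following is a minimal sketch of the callback pattern involved, not Selenium Wire's actual implementation. FakeRequest and FakeDriver are hypothetical stand-ins invented for illustration. The idea is that the driver stores the function and calls it later, once per outgoing request, passing in the request object it has built. Writing interceptor() would instead call the function immediately, with no request to pass, and fail.

```python
class FakeRequest:
    """Hypothetical stand-in for the request object passed to the interceptor."""

    def __init__(self, url, headers):
        self.url = url
        self.headers = dict(headers)


class FakeDriver:
    """Hypothetical stand-in for a driver that stores a callback."""

    def __init__(self):
        self.request_interceptor = None

    def get(self, url):
        # The driver builds the request object itself...
        request = FakeRequest(url, {'Referer': 'original_referer'})
        # ...and only now calls the stored function, passing the request in.
        if self.request_interceptor is not None:
            self.request_interceptor(request)
        return request


def interceptor(request):
    del request.headers['Referer']
    request.headers['Referer'] = 'some_referer'


driver = FakeDriver()
# No parentheses: the function object is stored, not called.
driver.request_interceptor = interceptor
sent = driver.get('https://example.org/')
print(sent.headers['Referer'])  # prints: some_referer
```

So request is simply the parameter name for whatever object the driver hands to the function at call time. The name itself is arbitrary.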
As for the proxy settings from the example: for them to work, the Tor Browser must already be running. Selenium Wire forwards traffic to the SOCKS proxy on 127.0.0.1:9150, and until Tor is running nothing is listening on that port, so the connection is refused (this is what WinError 10061 means). In the following code, the script opens the Tor Browser itself. For capturing headers, follow the Selenium Wire documentation exactly. Below is a working script that captures headers:
import os
import time
from seleniumwire import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary


def firefoxdriver(my_url):
    """Preparing of the Tor browser for the work."""
    # The location of the Tor Browser bundle
    # for my laptop.
    # tbb_dir = r'C:\Users\Oliver\Desktop\Tor Browser'
    # for my mainframe.
    tbb_dir = r'C:\Users\olive\OneDrive\Pulpit\Tor Browser'

    # Set the Tor Browser binary and profile.
    tb_binary = tbb_dir + r'\Browser\firefox.exe'
    tb_profile = tbb_dir + r'\Browser\TorBrowser\Data\Browser\profile.default'
    binary = FirefoxBinary(tb_binary)
    profile = FirefoxProfile(tb_profile)

    # Open Tor Browser to allow to work on the proxy.
    torexe = os.popen(tb_binary)

    # Disable Tor Launcher to prevent it connecting the Tor Browser to
    # Tor directly.
    os.environ['TOR_SKIP_LAUNCH'] = '1'
    os.environ['TOR_TRANSPROXY'] = '1'

    # Disable HTTP Strict Transport Security (HSTS) in order to have
    # seleniumwire between the browser and Tor.
    profile.set_preference("security.cert_pinning.enforcement_level", 0)
    profile.set_preference("network.stricttransportsecurity.preloadlist", False)

    # Tell Tor Button it is OK to use seleniumwire
    profile.set_preference("extensions.torbutton.local_tor_check", False)
    profile.set_preference("extensions.torbutton.use_nontor_proxy", True)

    # Enable JavaScript at all, otherwise JS stays disabled regardless
    # of the Tor Browser's security slider value.
    profile.set_preference("browser.startup.homepage_override.mstone", "68.8.0")

    # Configure seleniumwire to upstream traffic to Tor running on
    # port 9150.
    # It is possible to increase/decrease the timeout if you are trying
    # to load a page that requires a lot of requests. It is in
    # seconds.
    options = {
        'proxy': {
            'http': 'socks5h://127.0.0.1:9150',
            'https': 'socks5h://127.0.0.1:9150',
            'connection_timeout': 20
        }
    }

    driver = webdriver.Firefox(firefox_profile=profile,
                               firefox_binary=binary,
                               seleniumwire_options=options)
    return driver


def interceptor(request):
    """
    Adding the headers to the browser - create a request interceptor.
    """
    del request.headers['User-Agent']
    request.headers['User-Agent'] = ('Mozilla/5.0 (Windows NT 10.0;rv:102.0)'+
                                     ' Gecko/20100101 Firefox/102.0')
    del request.headers['Accept']
    request.headers['Accept'] = ('text/html,application/xhtml+xml,application'+
                                 '/xml;q=0.9,image/avif,image/webp,*/*;q=0.8')
    del request.headers['Accept-Language']
    request.headers['Accept-Language'] = 'en-US,en;q=0.5'


# Variable with the URL of the website.
my_url = 'https://httpbin.org/headers'
# Preparing of the Tor browser for the work.
driver = firefoxdriver(my_url)
# Adding the headers to the browser - set the interceptor on the
# driver.
driver.request_interceptor = interceptor
# Loads the website code as the Selenium object.
driver.get(my_url)
# Access requests via the `requests` attribute.
for request in driver.requests:
    if request.response:
        print(
            request.url,
            request.response.status_code,
            request.response.headers['Content-Type'],
            request.headers
        )
time.sleep(15)
driver.quit()
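
The "[WinError 10061]" error from the question is a refused connection: nothing was listening on 127.0.0.1:9150 when the proxy was contacted. The script above relies on the Tor Browser having opened that port by the time the driver starts. As a hedged sketch (the helper name wait_for_port and its default timings are assumptions, not part of Selenium Wire or Tor), the port can be polled with the standard library before the driver is created:

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made before `timeout`
    seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Try to open (and immediately close) a TCP connection.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

One way to use it would be to call wait_for_port('127.0.0.1', 9150) after the Tor Browser has been launched and to raise an error if it returns False, so a missing proxy is reported clearly instead of surfacing later as a SOCKS5 connection error. Note that an open port only shows that something is listening. It does not prove that Tor has finished building a circuit.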