I am writing a web scraper using Python and BeautifulSoup.
It was not long before my IP got blocked. I now need to rotate my IP so that I can connect to the website and scrape the required data.
I mostly followed tutorials and git repo documentation line by line, so I am not 100% certain that I am doing the right things.
I have set the torrc file to:
# This file was generated by Tor; if you edit it, comments will not be preserved
# The old torrc file was renamed to torrc.orig.1, and Tor will ignore it
ClientOnionAuthDir /Users/user/Library/Application Support/TorBrowser-Data/Tor/onion-auth
DataDirectory /Users/user/Library/Application Support/TorBrowser-Data/Tor
GeoIPFile /Applications/Tor Browser.app/Contents/Resources/TorBrowser/Tor/geoip
GeoIPv6File /Applications/Tor Browser.app/Contents/Resources/TorBrowser/Tor/geoip6
ControlPort 9051
HashedControlPassword my_hashed_password
CookieAuthentication 1
I got my_hashed_password by running tor --hash-password my_password.
I went on to create a config file in the directory where privoxy is installed, with the following content:
forward-socks5 / 127.0.0.1:9050 .
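A minimal sketch of how a Python client could be pointed at privoxy, which in turn forwards to Tor's SOCKS port via the line above. It assumes privoxy listens on 127.0.0.1:8118 (the address the test script further down uses); make_proxy_opener and fetch are hypothetical helper names. Both schemes are listed because, as far as I understand urllib, a ProxyHandler built from an explicit dict only proxies the schemes named in it, so https requests would otherwise go out directly.

```python
from urllib.request import ProxyHandler, Request, build_opener

# Assumption: privoxy's listen address, as used by the test script in this post.
PRIVOXY_ADDRESS = '127.0.0.1:8118'


def make_proxy_opener(proxy_address=PRIVOXY_ADDRESS):
    """Build an opener that routes both http and https requests via the proxy."""
    proxies = {
        'http': 'http://' + proxy_address,
        'https': 'http://' + proxy_address,
    }
    # Returning the opener (instead of install_opener) avoids changing
    # the behaviour of every other urlopen() call in the process.
    return build_opener(ProxyHandler(proxies))


def fetch(opener, url, headers=None, timeout=30):
    """Fetch url through the opener and return the body as stripped text."""
    request = Request(url, headers=headers or {})
    with opener.open(request, timeout=timeout) as response:
        return response.read().decode('utf-8').strip()
```

For example, fetch(make_proxy_opener(), 'http://icanhazip.com/') should return the exit IP as seen by that site, provided Tor and privoxy are both running.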
Every time I change something in these two files I run a short script to restart the services and call privoxy to check everything is ok:
brew services restart tor
brew services restart privoxy
privoxy
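Restarting the services does not by itself show that they are listening. A small sketch that checks this, assuming the port numbers that appear in this post (9050 for Tor's SOCKS port, 9051 for the control port, 8118 for privoxy); is_port_open and check_services are hypothetical helper names:

```python
import socket


def is_port_open(port, host='127.0.0.1', timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" as well as timeouts.
        return False


def check_services(ports):
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(port) for name, port in ports.items()}


# Assumption: port numbers as used in this post; adjust if your setup differs.
EXPECTED_PORTS = {
    'tor SOCKS': 9050,
    'tor control': 9051,
    'privoxy': 8118,
}

for name, is_open in check_services(EXPECTED_PORTS).items():
    print('{:<12} {}'.format(name, 'listening' if is_open else 'NOT listening'))
```

If the control port shows as not listening, any attempt to connect to it will be refused regardless of which password is used.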
When I run a test script:
import time
from urllib.request import ProxyHandler, build_opener, install_opener, Request, urlopen

from stem import Signal
from stem.control import Controller


class TorHandler:
    def __init__(self):
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.73.11 (KHTML, like Gecko) Version/7.0.1 Safari/537.73.11'}

    def open_url(self, url):
        # communicate with TOR via a local proxy (privoxy)
        def _set_url_proxy():
            proxy_support = ProxyHandler({'http': '127.0.0.1:8118'})
            opener = build_opener(proxy_support)
            install_opener(opener)

        _set_url_proxy()
        request = Request(url, None, self.headers)
        return urlopen(request).read().decode('utf-8')

    @staticmethod
    def renew_connection():
        __TOR_password__ = 'my_password'
        __TOR_hashed_password__ = 'my_hashed_password'
        with Controller.from_port(port=9051) as controller:
            controller.authenticate(password=__TOR_password__)
            controller.signal(Signal.NEWNYM)
            controller.close()


if __name__ == '__main__':
    wait_time = 2
    number_of_ip_rotations = 3
    tor_handler = TorHandler()
    ip = tor_handler.open_url('http://icanhazip.com/')
    print('My first IP: {}'.format(ip))

    # Cycle through the specified number of IP addresses via TOR
    for i in range(0, number_of_ip_rotations):
        old_ip = ip
        seconds = 0
        tor_handler.renew_connection()

        # Loop until the 'new' IP address is different than the 'old' IP address,
        # It may take the TOR network some time to effect a different IP address
        while ip == old_ip:
            time.sleep(wait_time)
            seconds += wait_time
            print('{} seconds elapsed awaiting a different IP address.'.format(seconds))
            ip = tor_handler.open_url('http://icanhazip.com/')
        print('My new IP: {}'.format(ip))
Note: I have tried authenticating with both __TOR_password__ and __TOR_hashed_password__.
I receive the following output:
"/Users/code/venv/bin/python" "/Users/code/proxy_rotation.py"
My first IP: 185.220.101.16
Traceback (most recent call last):
  File "/Users/code/venv/lib/python3.8/site-packages/stem/socket.py", line 535, in _make_socket
    control_socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 61] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/code/proxy_rotation.py", line 48, in <module>
    tor_handler.renew_connection()
  File "/Users/code/proxy_rotation.py", line 29, in renew_connection
    with Controller.from_port(port=9051) as controller:
  File "/Users/code/venv/lib/python3.8/site-packages/stem/control.py", line 1033, in from_port
    control_port = stem.socket.ControlPort(address, port)
  File "/Users/code/venv/lib/python3.8/site-packages/stem/socket.py", line 503, in __init__
    self.connect()
  File "/Users/code/venv/lib/python3.8/site-packages/stem/socket.py", line 172, in connect
    self._socket = self._make_socket()
  File "/Users/code/venv/lib/python3.8/site-packages/stem/socket.py", line 538, in _make_socket
    raise stem.SocketError(exc)
stem.SocketError: [Errno 61] Connection refused

Process finished with exit code 1
I would appreciate some assistance with this.
Thank you
It looks like Tor was not running.
When I manually start it as an application, I can connect and request a website.
I'll have to look at how to start Tor automatically, without having to launch it from my Applications folder.
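One possible way to start Tor from the script itself, sketched with only the standard library. It assumes a tor binary is on the PATH (for example the Homebrew one restarted above); the torrc path in the usage comment is a placeholder, and wait_for_port and start_and_wait are hypothetical helper names. Since stem is already in use, its stem.process.launch_tor_with_config function may be a better fit than this hand-rolled version.

```python
import socket
import subprocess
import time


def wait_for_port(port, host='127.0.0.1', timeout=30.0, interval=0.5):
    """Poll until something accepts TCP connections on host:port, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet; wait a little and try again.
            time.sleep(interval)
    return False


def start_and_wait(cmd, port, timeout=30.0):
    """Start cmd as a child process and wait until it listens on port.

    Returns the Popen handle; raises RuntimeError if the port never opens.
    """
    process = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if not wait_for_port(port, timeout=timeout):
        process.terminate()
        raise RuntimeError(
            'nothing is listening on port {} after {}s'.format(port, timeout))
    return process


# Hypothetical usage; the torrc path is a placeholder, not taken from the post:
# tor_process = start_and_wait(['tor', '-f', '/path/to/torrc'], port=9051)
# ... run the scraper, calling renew_connection() as needed ...
# tor_process.terminate()
```

Waiting on the control port before calling Controller.from_port avoids the race where the script tries to connect while Tor is still starting up.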