I'm new here!
And I'm stuck, unfortunately.
I need a single proxy, reachable at IP:port, that I can enter in a browser and use for scraping behind the Cloudflare firewall. Internally, it should relay (tunnel) each request unchanged to another proxy chosen at random from a list, picking a new one each time the page is reloaded. The proxies on the list do not need login handling, so it should be quite simple to implement. Unfortunately, I can't get it to work in my preferred language, PHP, and I'm an absolute beginner in Python. I wrote the following code (partly assembled from snippets). It starts without errors, and I can connect to the local Python proxy (according to the wget -v output), but after that nothing happens except a timeout. I tried it on Windows (firewall disabled) and on Debian. The external proxies tested OK too.
Can someone please help me out with this? Am I missing anything obvious as a complete Python beginner? Or does anyone already know of a simple tunneling proxy script that does what I need (relaying to rotating proxies)? Once the connection works, I would also like to add a MySQL query that fetches a random proxy from a database, so the script should be customizable.
Or is that the completely wrong approach?
Thank You!
import socket
import socketserver
import select
import itertools

# proxylist
PROXY_LIST = ['xxx.xxx.xxx.220:3128', 'xxx.xxx.xxx.22:3128', 'xxx.xxx.xxx.105:3128']
proxy_pool = itertools.cycle(PROXY_LIST)

class ThreadingTCPServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    pass

class HTTPRequestHandler(socketserver.BaseRequestHandler):
    def handle(self):
        data = self.request.recv(4096).strip()
        proxy = next(proxy_pool)
        hostname, port = proxy.split(':')
        remote = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        remote.connect(("xxx.xxx.xxx.22", 3128)) # for testing a static proxy, not from above list
        remote.send(data)
        inputs = [self.request, remote]
        while True:
            readable, writable, exceptional = select.select(inputs, [], inputs, 60)
            if not readable and not writable and not exceptional:
                break
            for s in readable:
                if s is remote:
                    out = self.request
                else:
                    out = remote
                data = s.recv(4096)
                if data:
                    out.send(data)
            for s in exceptional:
                inputs.remove(s)
                s.close()
        remote.close()
        self.request.close()

if __name__ == "__main__":
    with ThreadingTCPServer(('localhost', 3128), HTTPRequestHandler) as server:
        server.serve_forever()
wget -v google.de -e use_proxy=yes -e http_proxy=127.0.0.1:3128
It connects to the proxy, but the response times out. The moment I kill the script, I get the following message:
Exception occurred during processing of request from ('127.0.0.1', 56714)
Traceback (most recent call last):
  File "xxx\Python\Python311\Lib\socketserver.py", line 691, in process_request_thread
    self.finish_request(request, client_address)
  File "xxx\Python\Python311\Lib\socketserver.py", line 361, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "xxx\Python\Python311\Lib\socketserver.py", line 755, in __init__
    self.handle()
  File "xxx\Test\test.py", line 36, in handle
    data = s.recv(4096)
           ^^^^^^^^^^^^
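A likely reason for the timeout is the .strip() call on the first chunk read from the client. The header block of an HTTP request ends with an empty line (the bytes \r\n\r\n), and strip() removes exactly those trailing bytes, so the upstream proxy keeps waiting for the rest of the headers. Here is a small sketch of the effect, using a made-up request and a hypothetical headers_complete() helper that is not part of any library:

```python
# A made-up HTTP request, roughly as a client sends it to a proxy.
request = b"GET http://example.com/ HTTP/1.1\r\nHost: example.com\r\n\r\n"


def headers_complete(data: bytes) -> bool:
    """Return True if data contains the blank line that ends the headers."""
    return b"\r\n\r\n" in data


print(headers_complete(request))          # True
print(headers_complete(request.strip()))  # False: the terminator is gone
```

Forwarding the bytes exactly as they were received avoids the problem.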
I think we can simplify your code a bit by using the selectors module:
import itertools
import selectors
import socket
import socketserver

# proxylist
PROXY_LIST = ['172.23.0.2:8888', '172.23.0.3:8888', '172.23.0.4:8888']
proxy_pool = itertools.cycle(PROXY_LIST)


class ThreadingTCPServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    pass


class HTTPRequestHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Pick the next upstream proxy for this client connection.
        proxy = next(proxy_pool)
        hostname, port = proxy.split(':')
        remote = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        remote.connect((hostname, int(port)))

        # Watch both sockets for incoming data.
        selector = selectors.DefaultSelector()
        selector.register(self.request, selectors.EVENT_READ)
        selector.register(remote, selectors.EVENT_READ)

        with self.request, remote, selector:
            while True:
                events = selector.select()
                for key, _ in events:
                    infd = key.fileobj
                    outfd = remote if infd is self.request else self.request
                    data = infd.recv(1024)
                    if not data:
                        # One side closed the connection: stop relaying.
                        return
                    # Forward the bytes unchanged; sendall() retries partial sends.
                    outfd.sendall(data)


if __name__ == "__main__":
    with ThreadingTCPServer(('localhost', 3128), HTTPRequestHandler) as server:
        server.serve_forever()
Using this code, I can run
curl -x localhost:3128 example.com
and it successfully fetches the remote URL, cycling through the list of proxies for each new connection.
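The question asks for a randomly chosen proxy rather than a round-robin one. A minimal sketch of that could look like the following. It assumes the list entries keep the host:port form used above, and pick_proxy() is a hypothetical helper name, not part of any library:

```python
import random

# Same placeholder addresses as in the code above.
PROXY_LIST = ['172.23.0.2:8888', '172.23.0.3:8888', '172.23.0.4:8888']


def pick_proxy(proxies=PROXY_LIST):
    """Return (hostname, port) for one randomly chosen 'host:port' entry.

    Hypothetical helper: the body could later be swapped for a database lookup.
    """
    hostname, port = random.choice(proxies).rsplit(':', 1)
    return hostname, int(port)


if __name__ == "__main__":
    print(pick_proxy())
```

In handle(), the lines that call next(proxy_pool) and proxy.split(':') would then become hostname, port = pick_proxy(), followed by remote.connect((hostname, port)). For the MySQL case, the function body could run a query ending in ORDER BY RAND() LIMIT 1 through whichever MySQL driver you use. The table and column names are up to you.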