I'm trying to use urllib2 over a proxy to scrape a web page that isn't directly available (it's running on the remote server's local network and isn't externally accessible). The proxy I'd prefer is an SSH SOCKS proxy (the kind you get from running ssh -D 9090 server), both because I already have SSH access to that server and because it's fairly secure.
I've had a poke around with paramiko, but everything I find points to running an SSH connection out over SOCKS, which is the opposite of what I'm actually trying to accomplish here.
I have seen the Transport class, but it only does dumb port forwarding and doesn't provide a nice OpenSSH-style SOCKS proxy interface that I could latch onto with SocksiPy (et al.).
Net::SSH::Socks for Ruby is exactly what I'm looking for in the wrong language. Is there anything available in Python that provides a proxy over SSH?
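For reference, if the SOCKS endpoint already existed (say from running ssh -D 9090 server in a separate process), latching urllib2 onto it with SocksiPy would look roughly like this; it's producing that endpoint from within Python that I can't find. The 10.0.0.5 address is just a stand-in for the internal host:

import socket
import socks  # SocksiPy
import urllib2

# Route every new socket through the local SOCKS proxy opened by `ssh -D 9090 server`
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 9090)
socket.socket = socks.socksocket

# Connections now originate from the remote server's network; using an IP avoids
# local DNS lookups of names that only resolve on the remote side
html = urllib2.urlopen('http://10.0.0.5/page').read()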
I have a workaround that works for scraping. Instead of trying to tunnel a socket over the SSH connection, I'm using the remote shell to pull the data out:
from bs4 import BeautifulSoup
import paramiko

url = 'http://10.0.0.5/page'  # the internal URL to scrape (placeholder)

ssh = paramiko.SSHClient()
ssh.load_system_host_keys()
ssh.connect('example.com', username='Oli', look_for_keys=True, timeout=5)

# Run wget on the remote host and parse whatever it writes to stdout
stdin, stdout, stderr = ssh.exec_command('/usr/bin/wget -qO- "%s"' % url)
soup = BeautifulSoup(stdout.read(), 'html.parser')
ssh.close()
This isn't what I was looking for to begin with (and I'd still very much like to see a way of getting a SOCKS-style socket over SSH from Python), but there is some elegance in its simplicity.
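Another avenue I've sketched but not fully battle-tested: paramiko's 'direct-tcpip' channels are socket-like, so for a single internal host you can graft one onto httplib and skip SOCKS entirely. Again, 10.0.0.5 stands in for the internal host:

import httplib
import paramiko

class TunnelledHTTPConnection(httplib.HTTPConnection):
    """HTTPConnection that opens its socket through an existing SSH transport."""
    def __init__(self, transport, host, port=80):
        httplib.HTTPConnection.__init__(self, host, port)
        self.transport = transport

    def connect(self):
        # A direct-tcpip channel behaves enough like a socket for httplib to use
        self.sock = self.transport.open_channel(
            'direct-tcpip', (self.host, self.port), ('127.0.0.1', 0))

ssh = paramiko.SSHClient()
ssh.load_system_host_keys()
ssh.connect('example.com', username='Oli', look_for_keys=True, timeout=5)

conn = TunnelledHTTPConnection(ssh.get_transport(), '10.0.0.5')
conn.request('GET', '/page')
html = conn.getresponse().read()
conn.close()
ssh.close()

The obvious limitation is that each connection only reaches one host:port through the tunnel, which is why a proper SOCKS interface on the Python side would still be nicer.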