proxy privoxy

Privoxy does not work with traffic from iptables


I have privoxy configured and working on port 8118. I can forward HTTP and HTTPS traffic when I set the http_proxy and https_proxy variables to point to the proxy. Examples:

https_proxy=http://127.0.0.1:8118 curl -vvv https://www.google.com
http_proxy=http://127.0.0.1:8118 curl -vvv http://www.google.com

Note that I still use http:// for the HTTPS proxy. Privoxy somehow forwards the request.

However, I need to forward it transparently, because I am using Node.js and I do not want to change the application code to support the proxy. On Windows this is easily done with Proxifier, but that application is proprietary and does not work on WSL or GNU/Linux. On WSL/Linux I tried to use iptables to forward packets to the privoxy port:

sudo iptables -t nat -N CUSTOM_PROXY

# Ignore LANs and some other reserved addresses.
sudo iptables -t nat -A CUSTOM_PROXY -d 0.0.0.0/8 -j RETURN
sudo iptables -t nat -A CUSTOM_PROXY -d 10.0.0.0/8 -j RETURN
sudo iptables -t nat -A CUSTOM_PROXY -d 127.0.0.0/8 -j RETURN
sudo iptables -t nat -A CUSTOM_PROXY -d 169.254.0.0/16 -j RETURN
sudo iptables -t nat -A CUSTOM_PROXY -d 172.16.0.0/12 -j RETURN
sudo iptables -t nat -A CUSTOM_PROXY -d 192.168.0.0/16 -j RETURN
sudo iptables -t nat -A CUSTOM_PROXY -d 224.0.0.0/4 -j RETURN
sudo iptables -t nat -A CUSTOM_PROXY -d 240.0.0.0/4 -j RETURN

# Everything else is redirected to the privoxy port
sudo iptables -t nat -A CUSTOM_PROXY -p tcp -j REDIRECT --to-ports 8118

# Then I tried to forward the ports I need to the chain:
sudo iptables -t nat -A OUTPUT -p tcp --dport 80 -j CUSTOM_PROXY
sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j CUSTOM_PROXY

sudo iptables -t nat -A OUTPUT -p tcp --dport 443 -j CUSTOM_PROXY
sudo iptables -t nat -A PREROUTING -p tcp --dport 443 -j CUSTOM_PROXY

# other ports here...

After activating those rules, the HTTP and HTTPS calls stop working:

shell> curl -vvv http://www.google.com
*   Trying 142.250.74.36:80...
* TCP_NODELAY set
* Connected to www.google.com (142.250.74.36) port 80 (#0)
> GET / HTTP/1.1
> Host: www.google.com
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 400 Invalid header received from client
< Content-Type: text/plain
< Connection: close
< 
Invalid header received from client.
* Closing connection 0

shell> curl -vvv https://www.google.com
*   Trying 142.250.74.36:443...
* TCP_NODELAY set
* Connected to www.google.com (142.250.74.36) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3

The HTTP request fails with a 400 response and the HTTPS request hangs during the TLS handshake. Privoxy also supports socks5.

I do not understand how these forwardings happen. Could someone help me to find what I am doing wrong?

A couple of additional comments/questions that might be helpful:

My privoxy setup uses all the default values, except the following forwarding configuration:

forward  /  .
forward-socks5  .something.net  127.0.0.1:12345 .

I do not believe that this privoxy configuration really matters, because anything I send through the proxy variables manually works. The problem lies between the iptables rules and privoxy.

Any help is appreciated. Thanks in advance!


Solution

  • I'm not entirely familiar with how Privoxy works, but I do know how intercepting proxies work on Linux.
    HTTP proxies and intercepting proxies work in very different ways. An HTTP proxy gets the destination from the first line of the request, which contains the domain name (e.g. it should be GET http://www.google.com/ HTTP/1.1). For HTTPS, the client sends an HTTP CONNECT request with the domain:port to connect to (e.g. CONNECT www.google.com:443 HTTP/1.1).
    An intercepting proxy gets the original destination address from the kernel by calling getsockopt() with specific parameters (on Linux, the SO_ORIGINAL_DST option). It has no knowledge of the higher-level protocol.
    In general, redirecting a request to an HTTP proxy with iptables does not work because of these differences. That said, Privoxy has a configuration option, accept-intercepted-requests, that makes it read the target from the Host: HTTP header when set to 1. With that option enabled, it should be able to handle HTTP requests redirected using iptables. As the documentation says, this is not supported for HTTPS. For that you will need some additional software that can forward intercepted connections to an HTTP proxy, probably much like Proxifier does. I know moproxy can do that. It is probably not the only one, but I don't know of others.
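
The getsockopt() lookup mentioned above can be sketched in Python as follows. This is only an illustration of what an intercepting proxy does on Linux for IPv4, not code taken from Privoxy or moproxy. The function names are hypothetical, and the constant is defined by hand on the assumption that it matches SO_ORIGINAL_DST in <linux/netfilter_ipv4.h>.

```python
import socket
import struct

# Value of SO_ORIGINAL_DST in <linux/netfilter_ipv4.h>. Python's socket
# module does not export it, so it is defined by hand here (assumption).
SO_ORIGINAL_DST = 80


def parse_sockaddr_in(raw):
    """Decode a C struct sockaddr_in into an (ip, port) pair."""
    # Layout: 2-byte address family (skipped), then a 2-byte port and a
    # 4-byte IPv4 address in network byte order, then 8 bytes of padding.
    port, packed_ip = struct.unpack_from("!2xH4s", raw)
    return socket.inet_ntoa(packed_ip), port


def original_destination(conn):
    """Return the destination a REDIRECTed TCP connection was aimed at.

    `conn` is a socket returned by accept() on the port that the iptables
    REDIRECT rule points to. Linux and IPv4 only.
    """
    raw = conn.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(raw)
```

An ordinary HTTP proxy never makes this call, which is why it has to find the destination inside the request itself.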

    Additional questions:

    • Why does setting the variables work, while going through iptables doesn't?

    Because HTTP proxies and transparent/intercepting proxies work in different ways. When the variables are present, curl (and other clients) change how they send the request, but with iptables they don't (as they don't know a proxy is in use).
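
To make the difference concrete, here is a small sketch of how a proxy could work out the destination from the request it receives. It is a simplified illustration, not Privoxy's actual parser. The function name and the use_host_header flag are hypothetical, with the flag standing in for the behaviour described for accept-intercepted-requests.

```python
from urllib.parse import urlsplit


def proxy_destination(request, use_host_header=False):
    """Return (host, port) as a proxy would derive it, or None if it cannot."""
    lines = request.split("\r\n")
    method, target, _version = lines[0].split(" ", 2)

    # HTTPS through a proxy: the client asks for a tunnel to host:port.
    if method == "CONNECT":
        host, _, port = target.rpartition(":")
        return host, int(port)

    # Proxy-aware client: the request line carries an absolute URL.
    url = urlsplit(target)
    if url.scheme and url.hostname:
        return url.hostname, url.port or 80

    # Intercepted request: the request line only has a path, so the Host
    # header is the only hint left (IPv6 literals are ignored in this sketch).
    if use_host_header:
        for line in lines[1:]:
            name, _, value = line.partition(":")
            if name.strip().lower() == "host":
                host, _, port = value.strip().partition(":")
                return host, int(port) if port else 80
    return None
```

With the request that curl sends when http_proxy is set (GET http://www.google.com/ HTTP/1.1) the destination is in the first line. With the request from the failing trace (GET / HTTP/1.1) it is not, and an intercepted TLS connection contains no readable request line at all.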

    • Why do I have to set the https_proxy to point to http://...? Is that privoxy-related?

    The http:// in your environment variable describes how you connect to the proxy, so it is not specific to Privoxy. (It could actually be https:// if your proxy had a TLS certificate, and it would work with HTTP requests too, though they would be encrypted only between you and the proxy, not between the proxy and the remote server.)
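
The same split between "how to reach the proxy" and "what to fetch through it" shows up in Python's standard http.client module. The sketch below assumes a plain http:// proxy such as the Privoxy instance on 127.0.0.1:8118 from the question. The helper name is hypothetical.

```python
import http.client
from urllib.parse import urlsplit


def https_via_http_proxy(proxy_url, target_host, target_port=443):
    """Prepare an HTTPS connection that is tunnelled through an HTTP proxy."""
    proxy = urlsplit(proxy_url)
    if proxy.scheme != "http":
        raise ValueError("this sketch only handles a plain http:// proxy URL")

    # The TCP connection goes to the proxy in clear text. http.client then
    # sends "CONNECT target_host:target_port" and starts TLS with the target
    # only after the proxy has opened the tunnel.
    conn = http.client.HTTPSConnection(proxy.hostname, proxy.port or 80)
    conn.set_tunnel(target_host, target_port)
    return conn


# Nothing is sent until a request is made, for example:
# conn = https_via_http_proxy("http://127.0.0.1:8118", "www.google.com")
# conn.request("GET", "/")
# print(conn.getresponse().status)
```

The TLS session is between the client and www.google.com, so the proxy only relays encrypted bytes once the tunnel is open.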

    • Regarding the comment "HTTPS is not supposed to be used with transparent proxies": So how does Proxifier work on Windows? If that is not a transparent proxy, is there another term for it? Why can't we have something similar on WSL/Linux?

    I don't know the specifics of how transparent proxies work on Windows. Assuming it works much like on Linux, Proxifier probably redirects the connection to a local port, gets the actual destination, wraps the data in a form an HTTP/SOCKS proxy understands, and sends that to the proxy you told it to use. The capture part may be a bit different (creating a new network interface, asking Windows to send the traffic there, reconstructing the stream from raw TCP packets, then wrapping the data). This is often called a transparent proxy because the client doesn't know a proxy is used; some call it an intercepting proxy instead. And we can have something similar on Linux: moproxy is an example (probably not the only one).
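
As a rough idea of the "wrap the data" step, the sketch below opens a CONNECT tunnel through an HTTP proxy for a destination that has already been recovered from the kernel. It is a minimal illustration under stated assumptions, not how Proxifier or moproxy are actually implemented, and all function names are hypothetical.

```python
import socket


def connect_request(dest_ip, dest_port):
    """Build the CONNECT request for an intercepted connection."""
    target = f"{dest_ip}:{dest_port}"
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")


def tunnel_established(response_head):
    """Return True if the proxy answered the CONNECT with a 2xx status."""
    status_line = response_head.split(b"\r\n", 1)[0]
    parts = status_line.split(None, 2)
    return len(parts) >= 2 and parts[1].startswith(b"2")


def open_tunnel(proxy_host, proxy_port, dest_ip, dest_port):
    """Connect to the HTTP proxy and ask it for a tunnel to the destination."""
    upstream = socket.create_connection((proxy_host, proxy_port))
    upstream.sendall(connect_request(dest_ip, dest_port))

    # Read the proxy's reply up to the blank line that ends its headers.
    # This assumes the remote side sends nothing before the client does,
    # which holds for TLS because the client speaks first.
    head = b""
    while b"\r\n\r\n" not in head:
        chunk = upstream.recv(4096)
        if not chunk:
            break
        head += chunk

    if not tunnel_established(head):
        upstream.close()
        raise ConnectionError("proxy refused the CONNECT request")
    return upstream
```

After open_tunnel() returns, a forwarder would copy bytes in both directions between the intercepted client socket and the returned socket. Note that the CONNECT target here is an IP address rather than a hostname, so domain-based forwarding rules in the upstream proxy would presumably not match it.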