I'm working on a Java (Android) project. The main idea is to capture URLs and block access to dangerous pages. To obtain the URLs that the user accesses, I'm using the pcap4j library as follows:
IpV4Packet ipV4Packet = packet.get(IpV4Packet.class);
Inet4Address srcAddr = ipV4Packet.getHeader().getSrcAddr();
System.out.println(srcAddr);
So if I access the URL (https://es.wikipedia.org/wiki/Google), the code prints the domain name, something like (wikipedia.org), but what I really need is the full URL that generated the request. How can I get the complete URL (https://es.wikipedia.org/wiki/Google)?
I think this is really difficult.
To get the URL, you need to read the request line of the HTTP request. But Pcap4J does not currently support HTTP, so you need to write your own packet classes to dissect HTTP packets.
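As a rough illustration of what such a dissector has to do, here is a minimal sketch in plain Java. It assumes you already have the bytes of one complete, unencrypted HTTP/1.x request (for example from something like packet.get(TcpPacket.class).getPayload().getRawData(), after any reassembly). HttpUrlExtractor and extractUrl are hypothetical names for this example, not part of Pcap4J.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class HttpUrlExtractor {

    // Hypothetical helper: rebuilds the URL from the raw bytes of one complete,
    // unencrypted HTTP/1.x request (request line plus headers).
    public static Optional<String> extractUrl(byte[] tcpPayload) {
        String text = new String(tcpPayload, StandardCharsets.ISO_8859_1);
        int headerEnd = text.indexOf("\r\n\r\n");
        if (headerEnd < 0) {
            return Optional.empty(); // header block is not complete yet
        }
        String[] lines = text.substring(0, headerEnd).split("\r\n");

        // Request line looks like: METHOD SP request-target SP HTTP-version
        String[] requestLine = lines[0].split(" ");
        if (requestLine.length != 3 || !requestLine[2].startsWith("HTTP/")) {
            return Optional.empty(); // not an HTTP request
        }
        String target = requestLine[1];

        // Absolute-form targets (sent to proxies) already contain the full URL.
        if (target.startsWith("http://")) {
            return Optional.of(target);
        }
        if (!target.startsWith("/")) {
            return Optional.empty(); // e.g. "CONNECT host:port" or "*"
        }

        // Origin-form target: combine it with the Host header.
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0 && lines[i].substring(0, colon).trim().equalsIgnoreCase("Host")) {
                String host = lines[i].substring(colon + 1).trim();
                return Optional.of("http://" + host + target);
            }
        }
        return Optional.empty(); // no Host header found
    }

    public static void main(String[] args) {
        String request = "GET /wiki/Google HTTP/1.1\r\n"
                + "Host: es.wikipedia.org\r\n"
                + "User-Agent: demo\r\n\r\n";
        byte[] payload = request.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(extractUrl(payload).orElse("(no URL found)"));
        // prints: http://es.wikipedia.org/wiki/Google
    }
}
```

The scheme is hard-coded to http because this only works on plaintext traffic. A payload that does not contain the whole header block simply yields an empty result.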
Also, HTTP runs on top of TCP, which usually splits upper-layer messages across several segments. You need to reassemble the HTTP messages before you can dissect them.
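A minimal sketch of the reassembly idea, for one direction of one connection: buffer segments by sequence number and append them once they are contiguous. TcpStreamReassembler is a hypothetical class for illustration. It assumes you feed it the sequence number and payload of each captured segment (Pcap4J's TCP header exposes the sequence number, if I remember correctly via getSequenceNumberAsLong()), and it ignores sequence number wrap-around and connection teardown.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;

public class TcpStreamReassembler {

    // Out-of-order segments waiting for the gap before them to be filled.
    private final TreeMap<Long, byte[]> pending = new TreeMap<>();
    private final ByteArrayOutputStream stream = new ByteArrayOutputStream();
    private long nextSeq; // sequence number of the next byte we expect

    // initialSeq: sequence number of the first payload byte (ISN + 1 after the SYN).
    public TcpStreamReassembler(long initialSeq) {
        this.nextSeq = initialSeq;
    }

    public void addSegment(long seq, byte[] payload) {
        if (payload.length == 0) {
            return; // pure ACKs carry no data
        }
        pending.putIfAbsent(seq, payload);

        // Append every buffered segment that is now contiguous with the stream.
        while (!pending.isEmpty()) {
            long first = pending.firstKey();
            if (first > nextSeq) {
                break; // gap: an earlier segment has not been seen yet
            }
            byte[] data = pending.remove(first);
            long skip = nextSeq - first; // bytes already delivered (retransmission)
            if (skip < data.length) {
                stream.write(data, (int) skip, data.length - (int) skip);
                nextSeq = first + data.length;
            }
        }
    }

    // Bytes reassembled so far, in stream order.
    public byte[] assembled() {
        return stream.toByteArray();
    }

    public static void main(String[] args) {
        TcpStreamReassembler r = new TcpStreamReassembler(1000);
        r.addSegment(1005, "World".getBytes(StandardCharsets.US_ASCII)); // arrives early
        r.addSegment(1000, "Hello".getBytes(StandardCharsets.US_ASCII)); // fills the gap
        r.addSegment(1000, "Hello".getBytes(StandardCharsets.US_ASCII)); // retransmission
        System.out.println(new String(r.assembled(), StandardCharsets.US_ASCII));
        // prints: HelloWorld
    }
}
```

In a real capture you would keep one such object per connection and direction, keyed by source and destination address and port, and hand the assembled bytes to the HTTP dissector once a full header block is available.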
On top of that, with HTTPS the HTTP messages are encrypted and split into records by the TLS layer. In this case, you need to reassemble and decrypt the HTTP messages to get the URL. To decrypt TLS traffic, you need the private key of the HTTP server. But even if you have the key, you can't decrypt the traffic when a Diffie-Hellman key exchange is used in the TLS session.