I'm building a tool that analyzes emails to determine whether they're phishing. I'd like to check whether any of the links in an email redirect, and if so, how many times and to where. I'm currently using the requests library for this, and to get a link's redirect history you have to call .get(). Is that safe to do on potentially malicious URLs? If not, is there a way to get the redirect information without putting my machine at risk?
You could send a HEAD request with allow_redirects=True. A HEAD request asks the server for headers only, so you can follow the redirect chain without ever downloading a response body:
>>> import requests
>>> url = "http://stackoverflow.com/q/57298432/7954504"
>>> resp = requests.request(
... "HEAD",
... url,
... allow_redirects=True
... )
>>> resp.history
[<Response [301]>, <Response [302]>]
>>> [i.url for i in resp.history]
['http://stackoverflow.com/q/57298432/7954504', 'https://stackoverflow.com/q/57298432/7954504']
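resp.url on the final response gives you the destination the chain ends at. For a tool like yours you'd probably also want a timeout so a slow or hostile server can't stall the analyzer; here's a rough sketch of how I'd wrap it up (the function name and timeout value are just my choices):

import requests

def redirect_chain(url, timeout=5.0):
    """Return every URL in the redirect chain, final destination last."""
    # HEAD + allow_redirects follows the chain without downloading bodies;
    # the timeout keeps an unresponsive server from hanging the caller.
    resp = requests.head(url, allow_redirects=True, timeout=timeout)
    return [r.url for r in resp.history] + [resp.url]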
Not saying this is a cure-all. Something else to consider is adding some heuristics on the URL itself, in the spirit of "you know a crappy-looking URL when you see one." (I like yarl for analyzing URLs.) For instance:
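(A rough sketch; the example URL and the specific red-flag checks here are made up for illustration:)

>>> from yarl import URL
>>> u = URL("http://paypal.com.account-verify.example.ru/login")
>>> u.scheme  # plain http rather than https
'http'
>>> u.host
'paypal.com.account-verify.example.ru'
>>> u.host.count(".")  # a pile of nested subdomains
4
>>> "paypal" in u.host and not u.host.endswith("paypal.com")  # brand name buried in a lookalike host
True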
...and so on.