I am experimenting with filter_input and filter_var, and I am currently trying to sanitize URLs with FILTER_SANITIZE_URL.
The test program gets its input from a GET variable that contains a URL (e.g. foo.com/bar.php?a=http://www.domain.se). It works fine as long as I don't use Swedish domain names. For example, foo.com/bar.php?a=http://www.äta.se gets sanitized so that a = http://www.ta.se, which obviously isn't the same.
Domain names with special characters are technically not transferred with their non-ASCII characters (like the ä in your case); they are Punycode-encoded. The calling program should encode its URLs accordingly.
See:
http://en.wikipedia.org/wiki/Internationalized_domain_name
http://en.wikipedia.org/wiki/Punycode
Example:
http://www.äta.se becomes http://www.xn--ta-uia.se
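If you cannot control the caller, one option is to convert the host to Punycode yourself before sanitizing. The following is only a rough sketch: url_host_to_ascii is a hypothetical helper name, it assumes the intl extension is loaded (idn_to_ascii comes from it), that the input is UTF-8, and that the URL has the simple form scheme://host/... without a user:password@ part.

```php
<?php
// Hypothetical helper (not a PHP built-in): rewrites the host of a URL to its
// ASCII (Punycode) form. Assumes ext-intl is available and the URL is UTF-8.
function url_host_to_ascii(string $url): ?string
{
    // Split into "scheme://", host, and everything after the host.
    // Simplification: URLs containing userinfo (user:pass@host) are not handled.
    if (!preg_match('~^([a-z][a-z0-9+.-]*://)([^/?#:]+)(.*)$~iu', $url, $m)) {
        return null;
    }

    // idn_to_ascii() returns false when the host cannot be converted.
    $asciiHost = idn_to_ascii($m[2], IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($asciiHost === false) {
        return null;
    }

    return $m[1] . $asciiHost . $m[3];
}

$raw = 'http://www.äta.se';

// Sanitizing the raw URL drops the non-ASCII bytes of "ä".
echo filter_var($raw, FILTER_SANITIZE_URL), "\n";   // http://www.ta.se

// Converting the host first leaves nothing for the filter to strip.
$converted = url_host_to_ascii($raw);
if ($converted !== null) {
    echo filter_var($converted, FILTER_SANITIZE_URL), "\n"; // http://www.xn--ta-uia.se
}
```

In your test program, $raw would be the value of $_GET['a']. URLs that are already plain ASCII should pass through the helper unchanged, so it can be applied to every input before the filter.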