php subdomain wildcard-subdomain http-host

PHP HTTP_HOST subdomain extraction when the subdomain can be a wildcard and contain more than one '.'


I'm trying to extract the subdomain from the HTTP_HOST value. However, I've stumbled into a problem: if the subdomain has more than one dot in it, it fails to match properly. Given that this script has to run on multiple different domains, that the subdomain could contain any number of dots, and that the TLD could be either 1 or 2 parts (of any length), is there a practical way of correctly matching the subdomain, domain and TLD in all situations?

So for example take the following HTTP_HOST values and what is required to be matched.

I am presuming that the only way to accomplish this would be to load a list of TLDs, which, although possible, I don't really want to do, as this runs at the start of a script and shouldn't really require heavy lifting like that.

Below is the current code.

define('HOST', isset($_SERVER['HTTP_HOST']) === true ? $_SERVER['HTTP_HOST'] : (isset($_SERVER['SERVER_ADDR']) === true ? $_SERVER['SERVER_ADDR'] : $_SERVER['SERVER_NAME']));
$domain_parts = explode('.', HOST); 
$domain_parts_count = count($domain_parts);
if($domain_parts_count > 1)
{   
    $sub_parts = array_splice($domain_parts, 0, $domain_parts_count-3);
    define('SUBDOMAIN', implode('.', $sub_parts));
    unset($sub_parts);
}
else
{
    define('SUBDOMAIN', '');
}
define('DOMAIN', implode('.', $domain_parts));
var_dump($domain_parts, SUBDOMAIN, DOMAIN);exit;

Just a thought: could mod_rewrite append the subdomain as a GET param?


Solution

  • First of all, I would explode the value on a slash (and use the first index of the resulting array), just to be sure that the string ends with the TLD.

    Then I would cut it with preg_replace. This regexp matches the domain + TLD regardless of the TLD type. Beware, however, that it has a problem with 2- and 3-letter domain names, which it can mistake for the first part of a two-part TLD. But it should give a push in the right direction...

    [a-zA-Z0-9]+\.(([a-zA-Z]{2,6})|([a-zA-Z]{2,3}\.[a-zA-Z]{2,3}))$
    

    Edit: as pointed out, .museum is also possible, so I edited the first pattern in the TLD part.

    And of course TLDs like .uk could behave differently than co.uk. Ugh, it's not that easy...
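
The two steps described in this answer could be sketched roughly as follows. This is only an illustration, not a complete solution: the function name split_host is hypothetical, and stripping a ":port" suffix and lowercasing the host are extra assumptions that the answer does not mention. It uses preg_match with PREG_OFFSET_CAPTURE rather than preg_replace, so that the domain and the subdomain can both be read from a single match.

```php
<?php
// Hypothetical helper (name and return shape are assumptions, not from the post).
// Returns [subdomain, domain + TLD] using the pattern given in the answer.
function split_host(string $host): array
{
    // Step 1: keep only the part before the first slash, so the string ends with the TLD.
    $host = explode('/', $host)[0];

    // Assumption: HTTP_HOST may carry a ":port" suffix; strip it and normalise case.
    $host = strtolower(preg_replace('/:\d+$/', '', $host));

    // Step 2: the answer's pattern for "domain + TLD", anchored at the end of the host.
    $pattern = '/[a-zA-Z0-9]+\.(([a-zA-Z]{2,6})|([a-zA-Z]{2,3}\.[a-zA-Z]{2,3}))$/';

    if (!preg_match($pattern, $host, $m, PREG_OFFSET_CAPTURE)) {
        // No recognisable domain + TLD (e.g. "localhost"): treat the whole host as the domain.
        return ['', $host];
    }

    $domain = $m[0][0];                                      // matched domain + TLD
    $subdomain = rtrim(substr($host, 0, $m[0][1]), '.');     // everything before the match

    return [$subdomain, $domain];
}
```

With this sketch, split_host('a.b.example.co.uk') gives ['a.b', 'example.co.uk'] and split_host('www.example.com') gives ['www', 'example.com']. The caveat about short domain names shows up with split_host('sub.ab.com'), which returns ['', 'sub.ab.com'] because "ab.com" is read as a two-part TLD. Only a real list of TLDs would resolve that case reliably.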