Recently, I've begun writing my own PHP OpenID consumer class in order to better understand OpenID. As a guide, I've been referencing the [LightOpenID Class][1]. For the most part, I understand the code and how OpenID works. My confusion comes when looking at the author's `discover` function:
```php
function discover($url)
{
    if(!$url) throw new ErrorException('No identity supplied.');
    # We save the original url in case of Yadis discovery failure.
    # It can happen when we'll be led to an XRDS document
    # which does not have any OpenID2 services.
    $originalUrl = $url;
    # A flag to disable yadis discovery in case of failure in headers.
    $yadis = true;
    # We'll jump a maximum of 5 times, to avoid endless redirections.
    for($i = 0; $i < 5; $i ++) {
        if($yadis) {
            $headers = explode("\n",$this->request($url, 'HEAD'));
            $next = false;
            foreach($headers as $header) {
                if(preg_match('#X-XRDS-Location\s*:\s*(.*)#', $header, $m)) {
                    $url = $this->build_url(parse_url($url), parse_url(trim($m[1])));
                    $next = true;
                }
                if(preg_match('#Content-Type\s*:\s*application/xrds\+xml#i', $header)) {
                    # Found an XRDS document, now let's find the server, and optionally delegate.
                    $content = $this->request($url, 'GET');
                    # OpenID 2
                    # We ignore it for MyOpenID, as it breaks sreg if using OpenID 2.0
                    $ns = preg_quote('http://specs.openid.net/auth/2.0/');
                    if (preg_match('#<Service.*?>(.*)<Type>\s*'.$ns.'(.*?)\s*</Type>(.*)</Service>#s', $content, $m)
                        && !preg_match('/myopenid\.com/i', $this->identity)) {
                        $content = $m[1] . $m[3];
                        if($m[2] == 'server') $this->identifier_select = true;
                        # preg_match() returns 0 or 1, so its result must not overwrite
                        # $content: the lookups below still need the XML text.
                        preg_match('#<URI>(.*)</URI>#', $content, $server);
                        preg_match('#<LocalID>(.*)</LocalID>#', $content, $delegate);
                        if(empty($server)) {
                            return false;
                        }
                        # Does the server advertise support for either AX or SREG?
                        $this->ax = preg_match('#<Type>http://openid.net/srv/ax/1.0</Type>#', $content);
                        $this->sreg = preg_match('#<Type>http://openid.net/sreg/1.0</Type>#', $content);
                        $server = $server[1];
                        if(isset($delegate[1])) $this->identity = $delegate[1];
                        $this->version = 2;
                        $this->server = $server;
                        return $server;
                    }
                    # OpenID 1.1
                    $ns = preg_quote('http://openid.net/signon/1.1');
                    if(preg_match('#<Service.*?>(.*)<Type>\s*'.$ns.'\s*</Type>(.*)</Service>#s', $content, $m)) {
                        $content = $m[1] . $m[2];
                        # Same as above: keep $content intact for the SREG check.
                        preg_match('#<URI>(.*)</URI>#', $content, $server);
                        preg_match('#<.*?Delegate>(.*)</.*?Delegate>#', $content, $delegate);
                        if(empty($server)) {
                            return false;
                        }
                        # AX can be used only with OpenID 2.0, so checking only SREG
                        $this->sreg = preg_match('#<Type>http://openid.net/sreg/1.0</Type>#', $content);
                        $server = $server[1];
                        if(isset($delegate[1])) $this->identity = $delegate[1];
                        $this->version = 1;
                        $this->server = $server;
                        return $server;
                    }
                    $next = true;
                    $yadis = false;
                    $url = $originalUrl;
                    $content = null;
                    break;
                }
            }
            if($next) continue;
            # There is no relevant information in headers, so we search the body.
            $content = $this->request($url, 'GET');
            if($location = $this->htmlTag($content, 'meta', 'http-equiv', 'X-XRDS-Location', 'value')) {
                $url = $this->build_url(parse_url($url), parse_url($location));
                continue;
            }
        }
        if(!$content) $content = $this->request($url, 'GET');
        # At this point, the YADIS Discovery has failed, so we'll switch
        # to openid2 HTML discovery, then fallback to openid 1.1 discovery.
        $server = $this->htmlTag($content, 'link', 'rel', 'openid2.provider', 'href');
        $delegate = $this->htmlTag($content, 'link', 'rel', 'openid2.local_id', 'href');
        $this->version = 2;
        # Another hack for myopenid.com...
        if(preg_match('/myopenid\.com/i', $server)) {
            $server = null;
        }
        if(!$server) {
            # The same with openid 1.1
            $server = $this->htmlTag($content, 'link', 'rel', 'openid.server', 'href');
            $delegate = $this->htmlTag($content, 'link', 'rel', 'openid.delegate', 'href');
            $this->version = 1;
        }
        if($server) {
            # We found an OP endpoint (OpenID 2, or 1.1 as a fallback)
            if($delegate) {
                # We have also found an OP-Local ID.
                $this->identity = $delegate;
            }
            $this->server = $server;
            return $server;
        }
        throw new ErrorException('No servers found!');
    }
    throw new ErrorException('Endless redirection!');
}
```
[1]: http://gitorious.org/lightopenid
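Okay, here's the logic as I understand it (basically):

1. Check whether `$url` sends you a valid XRDS file that you then parse to figure out the OpenID provider's endpoint.
2. If it doesn't, fetch the page and search the HTML for a `<link rel="openid2.provider">` (or, failing that, `<link rel="openid.server">`) tag and use its `href` as the endpoint.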
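The Yadis half of that logic boils down to one decision made from the response headers. Here is a minimal sketch of just that decision, written as a hypothetical standalone helper `yadisNextStep()` (not part of LightOpenID) that takes the raw header block of a `HEAD` response. It is a simplification under stated assumptions: it checks whether the response is itself an XRDS document before following an `X-XRDS-Location` pointer, and it matches both header names case-insensitively, since HTTP header names are case-insensitive.

```php
<?php
// Hypothetical helper, not part of LightOpenID: decide what the Yadis step
// should do next, given the raw header block of a HEAD response.
function yadisNextStep(string $rawHeaders): array
{
    $lines = preg_split('/\r?\n/', $rawHeaders);

    // 1. The response itself is an XRDS document: GET it and parse it.
    foreach ($lines as $line) {
        if (preg_match('#^Content-Type\s*:\s*application/xrds\+xml#i', $line)) {
            return ['action' => 'parse-xrds'];
        }
    }

    // 2. The response points at an XRDS document elsewhere: follow it.
    foreach ($lines as $line) {
        if (preg_match('#^X-XRDS-Location\s*:\s*(\S+)#i', $line, $m)) {
            return ['action' => 'follow', 'url' => $m[1]];
        }
    }

    // 3. Nothing useful in the headers: GET the body, look for a
    //    <meta http-equiv="X-XRDS-Location"> tag, then try HTML discovery.
    return ['action' => 'fetch-body'];
}
```

A `fetch-body` result corresponds to the point in `discover()` where it gives up on headers and requests the page body.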
What. The. Heck.
I mean, seriously? You essentially screen-scrape the response and hope you find a link with the appropriate attribute value?
Now, don't get me wrong, this class works like a charm and it's awesome. I'm just failing to grok the two separate methods used to discover the endpoint: XRDS (Yadis) and HTML.
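The XRDS side does not have to be regex-based either. The following is a sketch, assuming PHP's bundled DOM extension and a hypothetical helper name `parseXrds()`, that reads the OpenID 2.0 service out of an XRDS document with `DOMXPath`. It looks for an OP Identifier service (type `http://specs.openid.net/auth/2.0/server`, the case where LightOpenID sets `identifier_select`) before a Claimed Identifier service (type `http://specs.openid.net/auth/2.0/signon`), and for brevity it ignores the `priority` attribute on `<Service>` elements.

```php
<?php
// Hypothetical helper, not part of LightOpenID: find the OpenID 2.0 endpoint
// in an XRDS document with DOMXPath instead of regular expressions.
// Simplification: Service "priority" attributes are ignored.
function parseXrds(string $xml): ?array
{
    $doc = new DOMDocument();
    // loadXML() rejects an empty string, so guard against it first;
    // @ silences the warnings libxml emits for malformed input.
    if (trim($xml) === '' || !@$doc->loadXML($xml)) {
        return null;
    }
    $xp = new DOMXPath($doc);
    // Single quotes matter: "$xrd" in double quotes would be interpolated.
    $xp->registerNamespace('xrd', 'xri://$xrd*($v*2.0)');

    // OP Identifier services first, then Claimed Identifier services.
    $types = [
        'http://specs.openid.net/auth/2.0/server' => true,  // identifier_select
        'http://specs.openid.net/auth/2.0/signon' => false,
    ];
    foreach ($types as $type => $identifierSelect) {
        // normalize-space() tolerates whitespace around the <Type> text.
        $query = sprintf('//xrd:Service[xrd:Type[normalize-space(.) = "%s"]]', $type);
        foreach ($xp->query($query) as $service) {
            $uri = trim($xp->evaluate('string(xrd:URI)', $service));
            if ($uri === '') {
                continue; // a service without an endpoint URI is useless here
            }
            $localId = trim($xp->evaluate('string(xrd:LocalID)', $service));
            return [
                'server' => $uri,
                'localId' => $localId === '' ? null : $localId,
                'identifierSelect' => $identifierSelect,
            ];
        }
    }
    return null;
}
```

A `null` result plays the same role as the `$yadis = false` branch in `discover()`: no usable OpenID 2.0 service, so fall back to the next discovery method.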
My Questions
Thanks again for your time, folks. I apologize if I sound a little flabbergasted, but I was genuinely stunned at the methodology once I began to understand what measures were being taken to find the endpoint.
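The specification is your friend.

But to answer your question: the OpenID Authentication 2.0 specification has the relying party try Yadis (XRDS) discovery on a URL first, and fall back to HTML-based discovery only when no XRDS document or no OpenID service element is found. And yes, HTML discovery is about searching for a `<link>` in the response body. That's why it's called HTML discovery.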
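If the regex scraping is what feels wrong, the `<link>` lookup can be done with a real HTML parser instead. Here is a minimal sketch, assuming PHP's DOM extension and a hypothetical `htmlDiscovery()` helper, that checks the OpenID 2.0 `rel` values (`openid2.provider`, `openid2.local_id`) first and then the OpenID 1.1 ones (`openid.server`, `openid.delegate`), mirroring the order in `discover()`. For brevity it searches the whole document rather than only `<head>`.

```php
<?php
// Hypothetical helper: HTML-based discovery using DOMDocument to read
// <link rel="..." href="..."> tags instead of regular expressions.
function htmlDiscovery(string $html): ?array
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml's warnings silently.
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    // Map each space-separated rel token to its href (first one wins).
    $links = [];
    foreach ($doc->getElementsByTagName('link') as $link) {
        $rels = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        foreach ($rels as $rel) {
            if ($rel !== '' && !isset($links[$rel])) {
                $links[$rel] = trim($link->getAttribute('href'));
            }
        }
    }

    // OpenID 2.0 tags first, then the OpenID 1.1 names.
    if (isset($links['openid2.provider'])) {
        return ['version' => 2, 'server' => $links['openid2.provider'],
                'localId' => $links['openid2.local_id'] ?? null];
    }
    if (isset($links['openid.server'])) {
        return ['version' => 1, 'server' => $links['openid.server'],
                'localId' => $links['openid.delegate'] ?? null];
    }
    return null;
}
```

Splitting `rel` on whitespace matters because a page can advertise both protocol versions in one tag, e.g. `rel="openid.server openid2.provider"`.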