php openid lightopenid

OpenID Discovery Methods - Yadis VS HTML


Recently, I've begun writing my own PHP OpenID consumer class in order to better understand OpenID. As a guide, I've been referencing the [LightOpenID class][1]. For the most part, I understand the code and how OpenID works. My confusion comes when looking at the author's discover() function:

function discover($url)
{
    if(!$url) throw new ErrorException('No identity supplied.');
    # We save the original url in case of Yadis discovery failure.
    # It can happen when we're led to an XRDS document
    # which does not have any OpenID2 services.
    $originalUrl = $url;

    # A flag to disable yadis discovery in case of failure in headers.
    $yadis = true;

    # We'll jump a maximum of 5 times, to avoid endless redirections.
    for($i = 0; $i < 5; $i ++) {
        if($yadis) {
            $headers = explode("\n",$this->request($url, 'HEAD'));

            $next = false;
            foreach($headers as $header) {
                # HTTP header names are case-insensitive, hence the "i" modifier.
                if(preg_match('#X-XRDS-Location\s*:\s*(.*)#i', $header, $m)) {
                    $url = $this->build_url(parse_url($url), parse_url(trim($m[1])));
                    $next = true;
                }

                if(preg_match('#Content-Type\s*:\s*application/xrds\+xml#i', $header)) {
                    # Found an XRDS document, now let's find the server, and optionally delegate.
                    $content = $this->request($url, 'GET');

                    # OpenID 2
                    # We ignore it for MyOpenID, as it breaks sreg if using OpenID 2.0
                    $ns = preg_quote('http://specs.openid.net/auth/2.0/');
                    if (preg_match('#<Service.*?>(.*)<Type>\s*'.$ns.'(.*?)\s*</Type>(.*)</Service>#s', $content, $m)
                        && !preg_match('/myopenid\.com/i', $this->identity)) {
                        $content = $m[1] . $m[3];
                        if($m[2] == 'server') $this->identifier_select = true;

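                        # Do not assign the result back to $content: preg_match() returns
                        # an int, which would break every later match against $content.
                        preg_match('#<URI>(.*)</URI>#', $content, $server);
                        preg_match('#<LocalID>(.*)</LocalID>#', $content, $delegate);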
                        if(empty($server)) {
                            return false;
                        }
                        # Does the server advertise support for either AX or SREG?
                        $this->ax   = preg_match('#<Type>http://openid.net/srv/ax/1.0</Type>#', $content);
                        $this->sreg = preg_match('#<Type>http://openid.net/sreg/1.0</Type>#', $content);

                        $server = $server[1];
                        if(isset($delegate[1])) $this->identity = $delegate[1];
                        $this->version = 2;

                        $this->server = $server;
                        return $server;
                    }

                    # OpenID 1.1
                    $ns = preg_quote('http://openid.net/signon/1.1');
                    if(preg_match('#<Service.*?>(.*)<Type>\s*'.$ns.'\s*</Type>(.*)</Service>#s', $content, $m)) {
                        $content = $m[1] . $m[2];

                        # As above, keep $content intact for the SREG check below.
                        preg_match('#<URI>(.*)</URI>#', $content, $server);
                        preg_match('#<.*?Delegate>(.*)</.*?Delegate>#', $content, $delegate);
                        if(empty($server)) {
                            return false;
                        }
                        # AX can be used only with OpenID 2.0, so checking only SREG
                        $this->sreg = preg_match('#<Type>http://openid.net/sreg/1.0</Type>#', $content);

                        $server = $server[1];
                        if(isset($delegate[1])) $this->identity = $delegate[1];
                        $this->version = 1;

                        $this->server = $server;
                        return $server;
                    }

                    $next = true;
                    $yadis = false;
                    $url = $originalUrl;
                    $content = null;
                    break;
                }
            }
            if($next) continue;

            # There is no relevant information in the headers, so we search the body.
            $content = $this->request($url, 'GET');
            # The Yadis <meta> tag carries the XRDS URL in its "content" attribute.
            if($location = $this->htmlTag($content, 'meta', 'http-equiv', 'X-XRDS-Location', 'content')) {
                $url = $this->build_url(parse_url($url), parse_url($location));
                continue;
            }
        }

        if(!$content) $content = $this->request($url, 'GET');

        # At this point, the YADIS Discovery has failed, so we'll switch
        # to openid2 HTML discovery, then fallback to openid 1.1 discovery.
        $server   = $this->htmlTag($content, 'link', 'rel', 'openid2.provider', 'href');
        $delegate = $this->htmlTag($content, 'link', 'rel', 'openid2.local_id', 'href');
        $this->version = 2;

        # Another hack for myopenid.com...
        if(preg_match('/myopenid\.com/i', $server)) {
            $server = null;
        }

        if(!$server) {
            # The same with openid 1.1
            $server   = $this->htmlTag($content, 'link', 'rel', 'openid.server', 'href');
            $delegate = $this->htmlTag($content, 'link', 'rel', 'openid.delegate', 'href');
            $this->version = 1;
        }

        if($server) {
            # We found an OpenID2 OP Endpoint
            if($delegate) {
                # We have also found an OP-Local ID.
                $this->identity = $delegate;
            }
            $this->server = $server;
            return $server;
        }

        throw new ErrorException('No servers found!');
    }
    throw new ErrorException('Endless redirection!');
}


  [1]: http://gitorious.org/lightopenid

Okay, here's the logic as I understand it (basically):

  1. Check whether the $url leads to a valid XRDS document, then parse it to find the OpenID provider's endpoint.
    • From my understanding, this is called the Yadis discovery method.
  2. If no XRDS document is found, check the body of the response for an HTML <link> tag that contains the URL of the endpoint.

What. The. Heck.

I mean seriously? Essentially screen scrape the response and hope you find a link with the appropriate attribute value?

Now, don't get me wrong, this class works like a charm and it's awesome. I'm just failing to grok the two separate methods used to discover the endpoint: XRDS (yadis) and HTML.

My Questions

  1. Are those the only two methods used in the discovery process?
  2. Is one only used in version 1.1 of OpenID and the other in version 2?
  3. Is it critical to support both methods?
  4. The site I've encountered the HTML method on is Yahoo. Are they nuts?

Thanks again for your time, folks. I apologize if I sound a little flabbergasted, but I was genuinely stunned at the methodology once I began to understand what measures were being taken to find the endpoint.


Solution

  • The specification is your friend.

    But to answer your questions:

    1. Yes. Those are the only two methods defined by the OpenID specifications (at least for URLs; there is a third method for XRIs).
    2. No, both can be used with both versions of the protocol. Read the function carefully, and you'll see that it supports both methods for both versions.
    3. Yes, if you want your library to work with every provider and user. Some users paste the HTML tags into their own sites so that the site's URL can be used as an OpenID.
    4. Some providers even use both methods at once, to maintain compatibility with consumers that don't implement Yadis discovery (which isn't part of OpenID 1.1, but can be used with it). So yes, that makes sense.
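
To make the Yadis side a bit more concrete, here is a minimal sketch of reading an OpenID 2.0 endpoint out of an XRDS document with SimpleXML instead of regular expressions. It is not LightOpenID's code: the function name `findOpenId2Service`, the returned array keys, and the example.com URLs are made up for illustration, and the sketch assumes the SimpleXML extension is available. It also ignores service priorities, which a real consumer should honour.

```php
<?php
// Namespace used by <XRD> and its children in an XRDS document.
const XRD_NS = 'xri://$xrd*($v*2.0)';
const OPENID2_NS = 'http://specs.openid.net/auth/2.0/';

// Hypothetical helper: returns the first OpenID 2.0 service found, or null.
function findOpenId2Service(string $xrds): ?array
{
    if (trim($xrds) === '') {
        return null;
    }

    // Collect XML parse errors quietly instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xrds);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($xml === false) {
        return null;
    }

    $xml->registerXPathNamespace('xrd', XRD_NS);
    $services = $xml->xpath('//xrd:Service') ?: [];

    foreach ($services as $service) {
        $children = $service->children(XRD_NS);
        foreach ($children->Type as $typeNode) {
            $type = trim((string) $typeNode);
            // "server" means the user entered the provider's own URL,
            // "signon" means the user entered a claimed identifier.
            if ($type !== OPENID2_NS . 'server' && $type !== OPENID2_NS . 'signon') {
                continue;
            }
            $uri = trim((string) $children->URI);
            if ($uri === '') {
                continue 2; // A service without a URI is useless; try the next one.
            }
            $localId = trim((string) $children->LocalID);
            return [
                'server'            => $uri,
                'local_id'          => $localId !== '' ? $localId : null,
                'identifier_select' => $type === OPENID2_NS . 'server',
            ];
        }
    }
    return null;
}

// Example document with invented URLs.
$xrds = <<<'XML'
<xrds:XRDS xmlns:xrds="xri://$xrds" xmlns="xri://$xrd*($v*2.0)">
  <XRD>
    <Service priority="0">
      <Type>http://specs.openid.net/auth/2.0/signon</Type>
      <URI>https://op.example.com/endpoint</URI>
      <LocalID>https://alice.op.example.com/</LocalID>
    </Service>
  </XRD>
</xrds:XRDS>
XML;

$result = findOpenId2Service($xrds);
```

If `$result` is null, a consumer would fall back to HTML discovery on the original URL, which is what the function in the question does.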

    And yes, HTML discovery is about searching for a <link> in the response body. That's why it's called HTML discovery.
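
If the regex-based scraping bothers you, the same lookup can be done with a real HTML parser. The following is only a sketch under a few assumptions: it needs PHP's DOM extension, the helper name `findLinkHref` is hypothetical (it is not LightOpenID's `htmlTag`), and the markup and URLs are invented for the example.

```php
<?php
// Hypothetical helper: returns the href of the first <link> whose rel
// attribute lists $rel, or null if no such tag is found.
function findLinkHref(string $html, string $rel): ?string
{
    if (trim($html) === '') {
        return null; // DOMDocument::loadHTML() does not accept empty input.
    }

    $doc = new DOMDocument();
    // Real pages are often not valid HTML, so collect parser errors quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('link') as $link) {
        // rel may list several space-separated values.
        $rels = preg_split('/\s+/', strtolower($link->getAttribute('rel')), -1, PREG_SPLIT_NO_EMPTY);
        $href = trim($link->getAttribute('href'));
        if ($href !== '' && in_array(strtolower($rel), $rels, true)) {
            return $href;
        }
    }
    return null;
}

// A page that advertises both OpenID 2.0 and OpenID 1.1 link relations.
$html = <<<'HTML'
<html><head>
<link rel="openid2.provider openid.server" href="https://op.example.com/endpoint">
<link rel="openid2.local_id openid.delegate" href="https://alice.op.example.com/">
</head><body>My home page</body></html>
HTML;

// Prefer the OpenID 2.0 relations, then fall back to the 1.1 ones.
$server  = findLinkHref($html, 'openid2.provider') ?? findLinkHref($html, 'openid.server');
$localId = findLinkHref($html, 'openid2.local_id') ?? findLinkHref($html, 'openid.delegate');
```

Either way, the consumer is still reading a `<link>` tag out of the page; the parser only makes the lookup less fragile than a regular expression.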