php, simplexml, google-weather-api

simplexml_load_string() fails with a parser error


I'm trying to load and parse a Google Weather API response (the Chinese-language response).

Here is the API call.

// This code fails with the following error
$xml = simplexml_load_file('http://www.google.com/ig/api?weather=11791&hl=zh-CN');

( ! ) Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 1: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xB6 0xE0 0xD4 0xC6 in C:\htdocs\weather.php on line 11

Why does loading this response fail?

How do I encode/decode the response so that simplexml loads it properly?

Edit: Here is the code and output.

<?php
$googleData = file_get_contents('http://www.google.com/ig/api?weather=11102&hl=zh-CN');
$xml = simplexml_load_string($googleData);

( ! ) Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 1: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xB6 0xE0 0xD4 0xC6 in C:\htdocs\test4.php on line 3

Call Stack
#  Time    Memory  Function                                Location
1  0.0020  314264  {main}( )                               ..\test4.php:0
2  0.1535  317520  simplexml_load_string ( string(1364) )  ..\test4.php:3

( ! ) Warning: simplexml_load_string() [function.simplexml-load-string]: t_system data="SI"/>

( ! ) Warning: simplexml_load_string() [function.simplexml-load-string]: ^ in C:\htdocs\test4.php on line 3


Solution

  • The problem is that SimpleXML never looks at the HTTP headers to determine the document's character encoding. It simply assumes UTF-8 (the XML default when the document declares nothing else), even though Google's server advertises the response as

    Content-Type: text/xml; charset=GB2312
    

    You can write a function that inspects that header through the little-known magic variable $http_response_header, which PHP fills in after an HTTP request made with file_get_contents(), and converts the response accordingly. Something like this:

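    function sxe($url)
    {
        // Fetch the raw bytes; the HTTP wrapper fills $http_response_header in this scope.
        $xml = file_get_contents($url);
        if ($xml === false)
        {
            // Request failed, so there is nothing to convert or parse.
            return false;
        }

        foreach ($http_response_header as $header)
        {
            // Capture the charset token of the Content-Type header, whatever the media type.
            if (preg_match('#^Content-Type:.*;\s*charset="?([^\s;"]+)#i', $header, $m))
            {
                switch (strtolower($m[1]))
                {
                    case 'utf-8':
                        // already what the parser expects
                        break;

                    default:
                        // covers iso-8859-1, gb2312 and anything else iconv knows
                        $xml = iconv($m[1], 'utf-8', $xml);
                }
                break;
            }
        }

        // The string is now UTF-8 (or was left untouched if no charset was advertised).
        return simplexml_load_string($xml);
    }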
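
To see the effect without hitting the network, here is a minimal sketch that takes the four bytes quoted in the warning and wraps them in a tiny made-up XML element. The `<condition>` element is only an illustration, not the real API response, and the snippet assumes the iconv, mbstring and SimpleXML extensions are available.

```php
<?php
// The four bytes quoted in the warning, wrapped in a hypothetical one-element document.
$bytes = "\xB6\xE0\xD4\xC6";
$raw   = '<condition data="' . $bytes . '"/>';

// 0xB6 is a UTF-8 continuation byte, so it cannot start a character:
// this is what the parser means by "not proper UTF-8".
$isUtf8 = mb_check_encoding($bytes, 'UTF-8');

// Read as GB2312 (the charset the server advertised) and re-encode as UTF-8.
$utf8 = iconv('GB2312', 'UTF-8', $raw);

// The converted string now parses cleanly.
$xml  = simplexml_load_string($utf8);
$text = (string) $xml['data'];

// Two double-byte GB2312 characters become two UTF-8 characters.
$chars = mb_strlen($text, 'UTF-8');
```

This is the same conversion the `default` branch of `sxe()` performs. The only difference is that here the source charset is hard-coded instead of being read from the Content-Type header.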