Tags: php, associative-array, string-parsing

Parsing URL query string


I'm trying to extract data from the URLs of the anchors on a webpage, like this:

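require 'simple_html_dom.php';
$html = file_get_html('http://www.example.com');
foreach ($html->find('a') as $element)
{
    $href = $element->href;
    // Parse into an array; the one-argument form of parse_str()
    // was deprecated in PHP 7.2 and removed in PHP 8.0.
    parse_str($href, $params);
    echo $params['name'] ?? '';
}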

The problem is that this doesn't work, and I can't see why. All the URLs have the following form:

name=James&surname=Smith&id=2311245
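For reference, parse_str() can write the parsed pairs into an associative array instead of creating local variables (the one-argument, variable-creating form was deprecated in PHP 7.2 and removed in PHP 8.0). A minimal sketch on the literal query string above:

```php
<?php
// Parse the literal query string into an associative array.
$query = 'name=James&surname=Smith&id=2311245';
parse_str($query, $params);

// Every value comes back as a string, including the numeric id:
// $params is ['name' => 'James', 'surname' => 'Smith', 'id' => '2311245']
echo $params['name'], "\n"; // James
```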

The strange thing is that if I execute

echo $href;

I get the expected output. However, that string won't parse, and strlen() reports its length as 43. If I pass the literal 'name=James&surname=Smith&id=2311245' to parse_str() instead, it works fine. What could be the problem?
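One way to narrow this down is to compare the lengths directly and check the raw string for HTML entities, which a browser would display as a plain &. The sketch below hard-codes a hypothetical $href value (the query string with each & written as &amp;) as a stand-in for what the scraper returns; that value is an assumption, not output captured from the real page.

```php
<?php
// The literal query string that parses correctly.
$literal = 'name=James&surname=Smith&id=2311245';

// Hypothetical stand-in for $element->href: the same string with each
// & encoded as the HTML entity &amp; (an assumption for illustration).
$href = 'name=James&amp;surname=Smith&amp;id=2311245';

echo strlen($literal), "\n"; // 35
echo strlen($href), "\n";    // 43: each &amp; adds 4 characters

// Look for encoded ampersands that a browser would render as "&".
if (strpos($href, '&amp;') !== false) {
    echo "href contains encoded ampersands\n";
}
```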


Solution

  • My guess is that your target page is one of the rare pages that correctly encodes & as &amp; in its links, for example:

    <a href="somepage.php?name=James&amp;surname=Smith&amp;id=3211245">
    

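    That would explain both symptoms. If you view the output of echo $href in a browser, each &amp; is rendered as a plain &, so the string looks correct, but strlen() counts the raw characters: the plain query string is 35 characters long, and the two &amp; entities add four characters each, giving 43. parse_str() then splits only on the literal & characters, so you end up with keys such as amp;surname instead of surname.

    To parse the string, first unescape the &amp; entities. A simple str_replace('&amp;', '&', $href) works, or you can use html_entity_decode() to decode every entity at once.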
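A minimal sketch of the fix, assuming $href holds an entity-encoded query string like the one described above. It contrasts parsing the raw value with parsing it after html_entity_decode(); str_replace('&amp;', '&', $href) would work equally well for this input. The helper name parse_href_query() is hypothetical.

```php
<?php
/**
 * Hypothetical helper: decode HTML entities in an href attribute value,
 * then parse it as a query string into an associative array.
 */
function parse_href_query(string $href): array
{
    // Turn "&amp;" back into "&" (and decode any other entities).
    $decoded = html_entity_decode($href, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    parse_str($decoded, $params);
    return $params;
}

// Assumed example value, as returned for an entity-encoded link.
$href = 'name=James&amp;surname=Smith&amp;id=2311245';

// Parsing the raw value: the entity text leaks into the keys.
parse_str($href, $raw);
// $raw is ['name' => 'James', 'amp;surname' => 'Smith', 'amp;id' => '2311245']

// Parsing after decoding gives the expected keys.
$params = parse_href_query($href);
echo $params['surname'], "\n"; // Smith
```

Inside the question's loop this would become $params = parse_href_query($element->href);. If the href also carries a path such as somepage.php?, you would likely need to extract the query part first, for example with parse_url($decoded, PHP_URL_QUERY), before calling parse_str().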