php html web-scraping simple-html-dom

Can't get all links in a div


I'm trying to get all the links from this page. I have almost achieved this with the following code:

public function getLinks()
{
    $html = file_get_html("http://it.soccerway.com/national/italy/serie-a/20152016/regular-season/r31554/");

    foreach($html->find("div.block_competition_left_tree-wrapper") as $div)
    {
        foreach ($div->find('a') as $li)
        {
            echo $li->href . "<br>";
        }
    }

}

This is the result:

/national/italy/serie-a/c13/
/national/italy/serie-a/20152016/s11663/
/national/italy/serie-b/c14/
/national/italy/serie-c1/c53/
/national/italy/serie-c2/c358/
/national/italy/serie-d/c659/
/national/italy/coppa-italia/c135/
/national/italy/super-cup/c171/
/national/italy/coppa-italia-serie-c/c684/
/national/italy/campionato-nazionale-primavera/c952/
/national/italy/coppa-italia-primavera/c1070/
/national/italy/super-coppa-primavera/c1171/
/national/italy/dante-berretti/c1092/
/national/italy/serie-a-women/c293/
/national/italy/serie-a2/c457/
/national/italy/coppa-italia-women/c852/
/national/italy/super-cup-women/c851/
/national/italy/club-friendlies/

The problem is that I need to scrape only the links inside the nested <li> items. As you can see in the HTML, the items have different classes: expanded, odd, even. Essentially, I don't want the links of the elements displayed as Serie A, Serie B, etc., but the links nested inside them. In particular, the result should look something like this:

/national/italy/serie-a/20152016/s11663/
/national/italy/serie-b/20152016/regular-season/r31798/
/national/italy/serie-c1/20152016/girone-c/r31861/

If you look at the first result above, only /national/italy/serie-a/20152016/s11663/ matches my final example. This is because, in the HTML page, the Serie A item has the class expanded, so the code sees its nested link. How can I fix my code to achieve this?


Solution

  • I hope I have understood you correctly. You need to get all the links as you did, then open each link and collect the links inside the <li> that has the class expanded.

    An example:

    public function getLinks()
    {
        $base = "http://it.soccerway.com";
        $html = file_get_html($base . "/national/italy/serie-a/20152016/regular-season/r31554/");

        foreach ($html->find("div.block_competition_left_tree-wrapper") as $div)
        {
            // Get all competition links from the tree
            foreach ($div->find('a') as $li)
            {
                // href already starts with "/", so don't add a second slash
                $openLink = file_get_html($base . $li->href);

                // file_get_html() returns false when the page can't be loaded
                if (!$openLink) {
                    continue;
                }

                foreach ($openLink->find("div.block_competition_left_tree-wrapper") as $divOfNewLink)
                {
                    foreach ($divOfNewLink->find('li') as $liOfNewDiv)
                    {
                        // Only the <li> with the class expanded holds the nested links
                        if (preg_match("/expanded/i", $liOfNewDiv->class))
                        {
                            foreach ($liOfNewDiv->find('a') as $link)
                            {
                                echo $link->href . "<br>";
                            }
                        }
                    }
                }
            }
        }
    }
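
The filtering step can also be tried offline, without fetching anything. The following is only a minimal sketch: it swaps simple_html_dom for PHP's built-in DOMDocument and DOMXPath so that it runs on its own, and the sample markup is an assumption about the shape of the tree (nested links at li.expanded > ul > li > a), not a copy of the real page. The function name nestedLinks is hypothetical.

```php
<?php
// Hypothetical sample markup: a guess at the structure of the competition
// tree, NOT taken from the real page.
$sampleHtml =
    '<div class="block_competition_left_tree-wrapper"><ul>'
  . '<li class="expanded odd">'
  .   '<a href="/national/italy/serie-a/c13/">Serie A</a>'
  .   '<ul><li><a href="/national/italy/serie-a/20152016/s11663/">2015/2016</a></li></ul>'
  . '</li>'
  . '<li class="even"><a href="/national/italy/serie-b/c14/">Serie B</a></li>'
  . '</ul></div>';

/**
 * Return the hrefs of links nested one list level below an expanded <li>,
 * skipping the competition link itself (hypothetical helper).
 */
function nestedLinks(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match "expanded" as a whole class token, then step into the nested list.
    $query = '//div[contains(@class, "block_competition_left_tree-wrapper")]'
           . '//li[contains(concat(" ", normalize-space(@class), " "), " expanded ")]'
           . '/ul/li/a/@href';

    $links = [];
    foreach ($xpath->query($query) as $attr) {
        $links[] = $attr->nodeValue;
    }
    return $links;
}

$links = nestedLinks($sampleHtml);
print_r($links);
```

If the real markup nests the links differently, only the XPath expression should need adjusting. The same function could be applied to the HTML of each page opened in the loop above.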