php laravel domcrawler

DomCrawler: get the JavaScript onclick attribute of a div


I want to loop over each div.my-node-div and, for each one, get the link that is set in that div's JavaScript onclick attribute.

I have this structure:

<div id="container">
<div class="my-node-div" onclick="window.location='https://www.website1.com'">
<h1>Title One</h1>
</div>

<div class="my-node-div" onclick="window.location='https://www.website2.com'">
<h1>Title Two</h1>
</div>

<div class="my-node-div" onclick="window.location='https://www.website3.com'">
<h1>Title Three</h1>
</div>
</div>

So I would write something like this:

    $html    = $client->request('GET', $url_of_website);
    $crawler = new Crawler($html);
    $crawler->filter('div#container > div.my-node-div')->each(
        function (Crawler $node, $index) use ($refer) {
            // Get the text
            $H1 = $node->filter('h1')->text();
            // How could I get the window.location= website?
            $LINK = ?
        }
    );

How can I get the JavaScript link that is in my div?


Solution

  • To get an attribute from a node, use the extract[1] method on the $node.

    $crawler = new Crawler($html);
    $links = $crawler->filter('div#container > div.my-node-div')
        ->each(function(Crawler $node) {
            return $node->extract(['onclick']);
        });
    

    Now $links will contain one array per node, each holding the value of that node's onclick attribute.

    array (
      0 => 
      array (
        0 => 'window.location=\'https://www.website1.com\'',
      ),
      1 => 
      array (
        0 => 'window.location=\'https://www.website2.com\'',
      ),
      2 => 
      array (
        0 => 'window.location=\'https://www.website3.com\'',
      ),
    )
    

    Then you'll have to parse the link out of each string; the question "Extract URLs from text in PHP" has some ideas.
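
As one possible way to do that parsing step, here is a minimal sketch using preg_match. It assumes the onclick value always has the form window.location='...' (optionally window.location.href, with single or double quotes) and PHP 7.4 or later. The helper name extractOnclickUrl is hypothetical and is not part of DomCrawler, and the $links array is hard-coded to mirror the output shown above.

```php
<?php

/**
 * Pull the URL out of an onclick value such as
 * "window.location='https://www.website1.com'".
 * Hypothetical helper for illustration; returns null when nothing matches.
 */
function extractOnclickUrl(string $onclick): ?string
{
    // window.location (optionally .href), "=", then a quoted string.
    // Group 1 captures the opening quote, \1 requires the same closing quote,
    // group 2 captures the URL between them.
    $pattern = '/window\.location(?:\.href)?\s*=\s*([\'"])(.*?)\1/';

    return preg_match($pattern, $onclick, $m) === 1 ? $m[2] : null;
}

// Same shape as the $links array above: one single-element array per node.
$links = [
    ["window.location='https://www.website1.com'"],
    ["window.location='https://www.website2.com'"],
    ["window.location='https://www.website3.com'"],
];

// Flatten to a plain list of URLs (null for any row that does not match).
$urls = array_map(
    fn (array $row) => extractOnclickUrl($row[0] ?? ''),
    $links
);

print_r($urls);
```

With the sample data, $urls ends up holding the three website URLs in order. If the real pages build the redirect differently (for example with location.assign(...)), the pattern would need to be adjusted.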

    1. Accessing Node Values