php goutte

Goutte: extract data from every node


Hi, I want to extract the data from every node of the table below, but I don't know how to do it. I would really appreciate some guidance.

<table>
    <tr>
        <td>item1</td>
        <td>item2</td>
    </tr>
    <tr>
        <td>item3</td>
        <td>item4</td>
    </tr>
</table>

And here is my PHP code:

$client = new Client();
    $crawler = $client->request('GET', 'https://www.socom');

    $crawler->filter('.tr')->each(function ($node) {
        print $node->filter('.td')->text()."\n";
    });

Solution

  • You're on the right track. The problem is that .tr and .td are class selectors: they match elements that have class="tr" or class="td", and your HTML has none, so the filter matches nothing and the callback never runs.
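
    The difference between the two selectors can be checked without any network request. The following is a minimal sketch, assuming symfony/dom-crawler and symfony/css-selector are installed through Composer (Goutte depends on both). It parses the table from the question as an inline string; the $html, $byClass and $byTag variables are only for illustration.

    ```php
    <?php

    require __DIR__ . '/vendor/autoload.php';

    use Symfony\Component\DomCrawler\Crawler;

    // The table from the question, inlined so no HTTP request is needed.
    $html = '<table>'
          . '<tr><td>item1</td><td>item2</td></tr>'
          . '<tr><td>item3</td><td>item4</td></tr>'
          . '</table>';

    $crawler = new Crawler($html);

    // '.tr' is a class selector: it only matches elements with class="tr".
    $byClass = $crawler->filter('.tr')->count();

    // 'tr' is a type selector: it matches the <tr> elements themselves.
    $byTag = $crawler->filter('tr')->count();

    echo "class selector: $byClass, tag selector: $byTag\n";
    ```

    With this table the class selector should match nothing, while the tag selector should match both rows.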

    You can access each of your tr elements and get the text inside it this way:

    $crawler->filter('tr')->each(function($node) {
      print_r($node->text());
    });
    

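    Note that each() passes every matched element to the callback as a node, and text() returns that node's content as a string, so echo works here just as well as print_r. The selector is plain tr, without the dot, so it matches the element by tag name.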

    You can also do this, which is probably closer to what you wanted:

    $crawler->filter('tr')->each(function($node) {
      $node->filter('td')->each(function($nested_node) {
        echo $nested_node->text() . "\n";
      });
    });
    

    This gets every tr, then for each tr gets its td elements, and finally prints the text inside each td.
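
    If you would rather keep the values than print them, each() also returns an array of whatever the callback returns, so the nested loops can build a rows-by-cells array. This is only a sketch under the same assumptions as the rest of the answer: it uses Symfony's Crawler class directly (the kind of object Goutte's request() returns) on an inline copy of the table from the question, and the $html and $rows names are illustrative.

    ```php
    <?php

    require __DIR__ . '/vendor/autoload.php';

    use Symfony\Component\DomCrawler\Crawler;

    // Inline copy of the table from the question; with Goutte you would
    // use the crawler returned by $client->request() instead.
    $html = '<table>'
          . '<tr><td>item1</td><td>item2</td></tr>'
          . '<tr><td>item3</td><td>item4</td></tr>'
          . '</table>';

    $crawler = new Crawler($html);

    // Outer each(): one entry per <tr>.
    // Inner each(): one string per <td> inside that row.
    $rows = $crawler->filter('tr')->each(function (Crawler $row) {
        return $row->filter('td')->each(function (Crawler $cell) {
            return $cell->text();
        });
    });

    print_r($rows);
    ```

    Here $rows should end up with one inner array per table row, each holding the text of that row's cells.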

    And that's it. Here is the complete code:

    <?php
    
    require __DIR__ . '/vendor/autoload.php';
    
    use Goutte\Client;
    
    $client = new Client();
    
    $crawler = $client->request('GET', 'your_url');
    
    $crawler->filter('tr')->each(function($node) {
      print_r($node->text());
    });
    
    $crawler->filter('tr')->each(function($node) {
      $node->filter('td')->each(function($nested_node) {
        echo $nested_node->text() . "\n";
      });
    });
    

    Hope it helps.