laravel web-scraping goutte guzzle

How to force my app to use Goutte instead of Symfony?


I'm trying to scrape a webpage using Laravel, Goutte, and Guzzle. I'm passing an instance of Guzzle into Goutte, but my web server keeps trying to use Symfony\Contracts\HttpClient\HttpClientInterface. Here's the exact error I'm getting:

Argument 1 passed to Symfony\Component\BrowserKit\HttpBrowser::__construct() must be an instance of Symfony\Contracts\HttpClient\HttpClientInterface or null, instance of GuzzleHttp\Client given, called in /opt/bitnami/apache/htdocs/app/Http/Controllers/ScrapeController.php on line 52

Line 52 refers to this line: $goutteClient = new Client($guzzleClient);

Here's my class. How can I force it to use Goutte instead of Symfony?

Changing the line to $goutteClient = new \Goutte\Client($guzzleClient); does not fix it.

<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;
use Goutte\Client;
use GuzzleHttp\Cookie;
use GuzzleHttp\Client as GuzzleClient;

class ScrapeController extends Controller
{
    public function index()
    {
        return view('index');
    }
    public function scrape() {
        $url = 'www.domain.com';
        $domain = 'www.domain.com';


        $cookieJar = new \GuzzleHttp\Cookie\CookieJar(true);

        // get the cookie from www.domain.com
        $cookieJar->setCookie(new \GuzzleHttp\Cookie\SetCookie([
            'Domain'  => 'www.domain.com',
            'Name'    => '_name_session',
            'Value'   => 'value',
            'Discard' => true
        ]));
        $guzzleClient = new \GuzzleHttp\Client([
            'timeout' => 900,
            'verify' => false,
            'cookies' => $cookieJar
        ]);
        $goutteClient = new Client($guzzleClient);

        $crawler = $goutteClient->request('GET', $url);
        $crawler->filter('table')->filter('tr')->each(function ($node) {
            dump($node->text());
        });
    }
}

Solution

  • Here's a fun little observation: Goutte\Client is now simply a thin extension of Symfony\Component\BrowserKit\HttpBrowser, so its constructor expects a Symfony HttpClientInterface rather than a Guzzle client. Based on that, you can modify your scrape function to be something like:

    use Symfony\Component\BrowserKit\Cookie;
    use Symfony\Component\BrowserKit\CookieJar;
    use Symfony\Component\BrowserKit\HttpBrowser;
    use Symfony\Component\HttpClient\HttpClient;
    
    ...
    
    public function scrape() {
      $url = 'http://www.example.com/';
      $domain = 'www.example.com';
    
      $jar = new CookieJar();
      $jar->set(new Cookie('_name_session', 'value', null, null, $domain));
      $client = HttpClient::create([
        'timeout' => 900,
        'verify_peer' => false
      ]);
      $browser = new HttpBrowser($client, null, $jar);
    
      $crawler = $browser->request('GET', $url);
      $crawler->filter('div')->filter('h1')->each(function ($node) {
        dump($node->text());
      });
    }
    

    In the "require" section of your composer.json you'll need entries similar to the following:

    "symfony/browser-kit": "^5.3",
    "symfony/css-selector": "^5.3",
    "symfony/http-client": "^5.3"
    

    but fabpot/goutte requires all of them anyway, so no libraries will be downloaded beyond what you already have.
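
If you would rather keep the Goutte\Client class name, the same idea should carry over, since Client inherits HttpBrowser's constructor. Below is a minimal sketch, assuming Goutte 4.x; makeBrowser() is a hypothetical helper name used only for illustration, and the cookie name and value are the placeholders from the question.

```php
<?php
// Standalone sketch; inside Laravel the Composer autoloader is already loaded.
require __DIR__ . '/vendor/autoload.php';

use Goutte\Client;
use Symfony\Component\BrowserKit\Cookie;
use Symfony\Component\BrowserKit\CookieJar;
use Symfony\Component\HttpClient\HttpClient;

// makeBrowser() is a hypothetical helper, not part of Goutte or Symfony.
function makeBrowser(string $domain): Client
{
    // BrowserKit cookie jar replaces the Guzzle CookieJar from the question.
    $jar = new CookieJar();
    $jar->set(new Cookie('_name_session', 'value', null, null, $domain));

    // Symfony HttpClient replaces the Guzzle client; 'verify_peer' => false
    // mirrors Guzzle's 'verify' => false and disables TLS peer verification.
    $httpClient = HttpClient::create([
        'timeout' => 900,
        'verify_peer' => false,
    ]);

    // Goutte\Client takes the same arguments as HttpBrowser:
    // HTTP client, history (null for the default), cookie jar.
    return new Client($httpClient, null, $jar);
}
```

Calling makeBrowser('www.example.com')->request('GET', 'http://www.example.com/') should then return a crawler you can filter exactly as in the scrape function above.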