web-scraping raku cro

Fetching the content of a URL gets stuck when using Cro or HTTP::UserAgent


I want to get the content of https://translate.google.cn, but Cro::HTTP::Client and HTTP::UserAgent just get stuck, while WWW gets the content; I don't know why. If I change $url to https://raku.org, all three modules work fine:

my $url = "https://translate.google.cn";
use Cro::HTTP::Client;
my $resp = await Cro::HTTP::Client.new(
    headers => [
        User-agent => 'Cro'
    ]
).get($url);
say await $resp.body-text();



use HTTP::UserAgent;
my $ua = HTTP::UserAgent.new;
$ua.timeout = 30;
my $response = $ua.get($url);

if $response.is-success {
    say $response.content;
} else {
    die $response.status-line;
}

use WWW;
say get($url)

Did I miss something? Thanks for any suggestions.


Solution

  • For me, HTTP::UserAgent works and Cro::HTTP::Client gets stuck. If you wish to debug things further, both modules have a debug option:

    raku -MHTTP::UserAgent -e 'my $ua = HTTP::UserAgent.new(:debug); say $ua.get("https://translate.google.cn").content'

    CRO_TRACE=1 raku -MCro::HTTP::Client -e 'my $ua = Cro::HTTP::Client.new(); say $ua.get("https://translate.google.cn").result.body-text.result'

    WWW also works for me. It is surprising that it works for you, since it is backed by HTTP::UserAgent (which does not work for you). Here is its get method, to show you how it uses HTTP::UserAgent (a short usage sketch follows the listing):

    sub get ($url, *%headers) is export(:DEFAULT, :extras) {
        CATCH { .fail }
        %headers<User-Agent> //= 'Rakudo WWW';
        with HTTP::UserAgent.new.get: $url, |%headers {
            .is-success or fail .&err;
            .decoded-content
        }
    }
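
    Since get() forwards any named arguments as request headers (only defaulting User-Agent to 'Rakudo WWW'), one cheap experiment is to send a browser-like User-Agent. Whether translate.google.cn actually treats unknown clients differently is an assumption on my part; the sketch below only shows how to pass the header:

    # A minimal sketch, not a verified fix: the UA string is illustrative.
    use WWW;

    my $url = "https://translate.google.cn";

    # Named arguments to get() become request headers (see its source above);
    # get() returns a Failure on error, so `or die` fails fast.
    my $content = get $url, User-Agent => 'Mozilla/5.0 (X11; Linux x86_64)'
        or die "Request to $url failed";

    say $content;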