perl, web-scraping

Perl: Problems with WWW::Mechanize and a form


I am trying to write a script that navigates a soccer website to the player of my choice and scrapes their info. The scraping part works when I hard-code an individual player's page, but implementing the navigation is giving me problems. The website in question is http://www.soccerbase.com.

I have to fill in the form at the top of the page with the player's name, then submit it to run the search. I have tried two different approaches (one of them is commented out below) based on info I found online, but to no avail. I am an absolute novice when it comes to Perl, so any help would be greatly appreciated! Thanks in advance. Here is my code:

#!/usr/bin/perl
use strict;

require WWW::Mechanize;
require HTML::TokeParser;

my $player = 'Luis Antonio Valencia';
#die "Must provide a player's name" unless $player ne 1;

my $agent = WWW::Mechanize->new();
$agent->get('http://www.soccerbase.com/players/home.sd');
$agent->form_name('headSearch');
$agent->set_fields('searchTeamField', $player);
$agent->click_button(name=>"Search");

#$agent->submit_form(
#       form_number => 1,
#       fields    => {   => 'Luis Antonio Valencia', }    
#   );

my $stream = HTML::TokeParser->new(\$agent->{content});
my $player_name;

$stream->get_tag("strong");
$player_name = $stream->get_trimmed_text("/strong");

print "\n", "Player Name: ", $player_name, "\n";

Solution

  • It's a bit tricky because the form action plays switcheroo with JavaScript, but HTML::Form handles that perfectly fine once you set the action yourself:

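    #!/usr/bin/env perl
    use strict;
    use warnings;
    use WWW::Mechanize qw();
    use URI qw();

    my $player = 'Luis Antonio Valencia';
    my $agent = WWW::Mechanize->new;
    $agent->get('http://www.soccerbase.com/players/home.sd');

    # Select the search form; this also makes it the current form,
    # so submit_form below operates on it.
    my $form = $agent->form_id('headSearch');
    {
        # The page swaps the form action with JavaScript, which Mechanize
        # does not run, so point the form at the search page by hand.
        # HTML::Form requires an absolute URI here.
        my $search_uri = $agent->uri;
        $search_uri->path('/players/search.sd');
        $form->action($search_uri);
    }
    $agent->submit_form(
        fields => {
            search => $player,
            type   => 'player',
        }
    );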
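
    If you want to check what such an override does before hitting the site, here is a minimal offline sketch. It assumes a simplified stand-in for the page: the HTML below, the www.example.com base URI and the select options are made up for illustration, and the real form on soccerbase.com may look different. It parses the form with HTML::Form, dumps it, replaces the action, and builds the request without sending it:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::Form ();

# Hypothetical markup standing in for the real page (an assumption; the
# actual soccerbase.com form may look different).
my $html = <<'HTML';
<form id="headSearch" action="#" method="get">
  <input type="text" name="search">
  <select name="type">
    <option value="team">Team</option>
    <option value="player">Player</option>
  </select>
</form>
HTML

# Parse against a placeholder base URI; in scalar context parse()
# returns the first form found in the document.
my $form = HTML::Form->parse($html, 'http://www.example.com/players/home.sd');

# Show the form as parsed, including its placeholder action.
print $form->dump;

# Override the action with an absolute URI, then fill in the fields.
$form->action('http://www.example.com/players/search.sd');
$form->value(search => 'Luis Antonio Valencia');
$form->value(type   => 'player');

# Build the request that would be sent, without actually sending it.
my $request = $form->make_request;
print $request->method, ' ', $request->uri, "\n";
```

    The dump is also handy against the live page: calling dump on the form object returned by form_id lists the input names the form really has, which is how you can confirm field names such as search and type instead of guessing them.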