How can I extract URL tags and link text from HTML in Perl?

I have a page that contains the following HTML:

<a href="http://www.trial.com" title="yellow">Trial</a>
<a href="http://www.trial1.com" title="red">Trial2</a>

How can I get the anchor text, URL and title?

I want to have this output:

Trial, http://www.trial.com, yellow
Trial2, http://www.trial1.com, red

I have tried using WWW::Mechanize, as also explained here, but I do not know how to get the title that way. Do you have any ideas?

Solution

This is the simple version, based on your question: it assumes a page that looks like yours (so no obscure HTML that could trip up the parsing) and produces the desired output.

This might be what you are looking for:

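use strict;
use warnings;

use WWW::Mechanize;

my $mech = WWW::Mechanize->new;

# Load the saved page; an http:// URL works the same way
$mech->get('file:page.html');

# Only <a> tags; links() would also return <frame>, <iframe>, <area>, etc.
foreach my $link ($mech->find_all_links(tag => 'a')) {
    my $text  = $link->text           // '';   # anchor text
    my $url   = $link->url;                    # href exactly as written in the page
    my $title = $link->attrs->{title} // '';   # title attribute; may be missing

    print "$text, $url, $title\n";
}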
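Since TIMTOWTDI: if you already have the HTML in a string and don't need WWW::Mechanize to fetch it, HTML::TokeParser (from the HTML-Parser distribution, normally installed alongside WWW::Mechanize) can pull out the same three fields. This is a minimal sketch, assuming the markup from the question is held in `$html`; the `@rows` array is just an illustrative name.

```perl
use strict;
use warnings;

use HTML::TokeParser;

# The sample markup from the question, kept in a string for this sketch
my $html = <<'HTML';
<a href="http://www.trial.com" title="yellow">Trial</a>
<a href="http://www.trial1.com" title="red">Trial2</a>
HTML

my $parser = HTML::TokeParser->new(\$html);

my @rows;
# get_tag('a') returns [$tag, \%attr, \@attrseq, $text] for each <a> start tag
while (my $tag = $parser->get_tag('a')) {
    my $attr  = $tag->[1];                          # attribute hashref (keys lowercased)
    my $url   = $attr->{href}  // '';
    my $title = $attr->{title} // '';
    my $text  = $parser->get_trimmed_text('/a');    # text up to the closing </a>
    push @rows, "$text, $url, $title";
}

print "$_\n" for @rows;
```

Unlike `$mech->links`, this only looks at `<a>` tags and does no URL resolution, so relative hrefs come back exactly as written in the page.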

Happy coding, TIMTOWTDI