
How can I extract URL tags and link text from HTML in Perl?


I have a page which contains this:

<a href="http://www.trial.com" title="yellow">Trial</a>
<a href="http://www.trial1.com" title="red">Trial2</a>

How can I get the anchor text, URL and title?

I want to have this output:

Trial, http://www.trial.com, yellow
Trial2, http://www.trial1.com, red

I have tried to use WWW::Mechanize, as explained here, but I do not know how to get the title that way. Do you have any ideas?


Solution

  • The simple version, based on your question

    This might be what you are looking for:

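    use strict;
    use warnings;
    
    use WWW::Mechanize;
    
    my $mech = WWW::Mechanize->new;
    
    # Load a local copy of the page; an http:// URL works the same way.
    $mech->get('file:page.html');
    
    # Only <a> tags: $mech->links would also return <frame>, <area>, etc.
    foreach my $link ($mech->find_all_links(tag => 'a')) {
        my $text  = $link->text // '';             # anchor text
        my $url   = $link->url;                    # href as written; url_abs gives an absolute URL
        my $title = $link->attrs->{title} // '';   # undef when there is no title attribute
    
        print "$text, $url, $title\n";
    }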
    

    Happy coding, TIMTOWTDI
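
If you already have the HTML in a string and don't need to fetch anything, another way (TIMTOWTDI) is to walk the <a> tags yourself with HTML::TokeParser, which comes with the HTML-Parser distribution that WWW::Mechanize itself builds on. This is a minimal sketch under a few assumptions: `extract_links` is a hypothetical helper name, the `//` operator needs Perl 5.10 or later, anchors without an href (such as `<a name="...">`) are skipped, and a missing title becomes an empty string.

```perl
use strict;
use warnings;

use HTML::TokeParser;

# Hypothetical helper: return one [text, href, title] array ref per <a href> in $html.
sub extract_links {
    my ($html) = @_;

    # A reference to a scalar makes HTML::TokeParser parse the string itself.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser: $!";

    my @links;
    while (my $tag = $parser->get_tag('a')) {      # next <a ...> start tag
        my $attr = $tag->[1];                      # hash ref of its attributes
        next unless defined $attr->{href};         # skip <a name="..."> anchors

        # All text up to the closing </a>, whitespace collapsed and trimmed.
        my $text = $parser->get_trimmed_text('/a');

        push @links, [ $text, $attr->{href}, $attr->{title} // '' ];
    }
    return @links;
}

# The sample markup from the question.
my $html = <<'HTML';
<a href="http://www.trial.com" title="yellow">Trial</a>
<a href="http://www.trial1.com" title="red">Trial2</a>
HTML

for my $link (extract_links($html)) {
    print join(', ', @$link), "\n";
}
```

Run against the sample markup from the question, it should print the two lines in the requested format. Tags nested inside a link, such as <b>, are dropped and only their text is kept.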