I have an HTML file with several tables (all tables have the same number of columns and the same column names). The tables are separated by other HTML tags.
For each row in each table, I would like to change the value of cell 1 and cell 3.
This is what I have so far (thanks to @depesz):
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use open qw( :std :utf8 );
use HTML::TreeBuilder;

my $input_file_name = shift;
my $tree = HTML::TreeBuilder->new();
$tree->parse_file( $input_file_name ) or die "Cannot open or parse $input_file_name\n";
$tree->elementify();

my @tables = $tree->find_by_tag_name( 'table' );
for my $table (@tables) {
    foreach my $row ($table->find_by_tag_name('tr')) {
        # Look for cells in the current row, not in the whole table
        foreach my $column ($row->find_by_tag_name('td')) {
            # How do I change the text of the 1st and 3rd columns to "removed"?
        }
    }
}
print $tree->as_HTML();
exit;
```
It works great for iterating through all the rows in the HTML file. I'm just not sure how to do the last part: changing the text in columns 1 and 3.
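The HTML::TreeBuilder::XPath module allows much more convenient access to the HTML nodes in the document.
Take a look at this program, for example. It seems to do what you need: it writes `name` followed by the row number into the first and third cells of every row. Change the string passed to `push_content` to `"removed"` if that is the text you want.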
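```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new_from_file('anon.html');

for my $table ($tree->findnodes('//table')) {
    my $row = 0;
    # './/tr' is relative to the current table; '//tr' would match every row in the document
    for my $tr ($table->findnodes('.//tr')) {
        $row++;
        # Select only the 1st and 3rd <td> cells of this row
        for my $td ($tr->findnodes('td[position() = 1 or position() = 3]')) {
            $td->delete_content;              # drop the cell's existing text and children
            $td->push_content("name$row");    # insert the replacement text
        }
    }
}

print $tree->as_HTML('<>&', ' ');
```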
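If you would rather stay with plain HTML::TreeBuilder, as in the question, the same edit can probably be done without XPath by collecting each row's `<td>` children and replacing the first and third by index. This is a minimal sketch under some assumptions: the sample HTML is made up, it is parsed from a string with `new_from_content` instead of a file so that it runs on its own, rows made only of `<th>` header cells are skipped, and the replacement text is the `"removed"` string from the question's comment.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample input: two tables separated by other markup.
my $html = <<'HTML';
<html><body>
<table>
<tr><th>A</th><th>B</th><th>C</th></tr>
<tr><td>a1</td><td>b1</td><td>c1</td></tr>
<tr><td>a2</td><td>b2</td><td>c2</td></tr>
</table>
<p>Some text between the tables</p>
<table>
<tr><th>A</th><th>B</th><th>C</th></tr>
<tr><td>a3</td><td>b3</td><td>c3</td></tr>
</table>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

for my $table ($tree->find_by_tag_name('table')) {
    # Rows at or below this table only
    for my $row ($table->find_by_tag_name('tr')) {
        # Direct <td> children of the row; plain strings (text nodes) are skipped
        my @cells = grep { ref($_) && $_->tag eq 'td' } $row->content_list;

        # Columns 1 and 3 are indexes 0 and 2; header rows have no <td>, so they are skipped
        for my $i (0, 2) {
            next unless $cells[$i];
            $cells[$i]->delete_content;
            $cells[$i]->push_content('removed');
        }
    }
}

my $out = $tree->as_HTML('<>&', ' ');
$tree->delete;    # free the tree once the HTML string has been produced
print $out;
```

`find_by_tag_name` searches only at or below the element it is called on, which keeps each loop scoped to its table or row, and `content_list` returns only direct children, so cells belonging to a nested table would not shift the column indexes.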