
How can I add entity declarations via XML::Twig programmatically?


For the life of me I cannot understand the XML::Twig documentation for entity handling.

I've got some XML I'm generating with HTML::Tidy. The call is as follows:

my $tidy = HTML::Tidy->new({
    'indent'          => 1,
    'break-before-br' => 1,
    'output-xhtml'    => 0,
    'output-xml'      => 1,
    'char-encoding'   => 'raw',
});

my $str = "foo   bar";
my $xml = $tidy->clean("<xml>$str</xml>");

which produces:

<html>
  <head>
    <meta content="tidyp for Linux (v1.02), see www.w3.org" name="generator" />
    <title></title>
  </head>
  <body>foo &nbsp; bar</body>
</html>

I want to run this through XML::Twig to do some transformations, but XML::Twig (understandably) barfs at the undeclared &nbsp;:

my $twig = XML::Twig->new(
  twig_handlers => {... handlers ...}
);

$twig->parse($xml);

The $twig->parse line barfs on the &nbsp;, and I can't figure out how to declare the &nbsp; entity programmatically. I tried things like:

my $entity = XML::Twig::Entity->new("nbsp", "&#160;");
$twig->entity_list->add($entity);
$twig->parse($xml);

... but no joy.

Please help =)


Solution

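Declare the entity in an internal DTD subset and prepend that DOCTYPE to the document before parsing:

use strict;
use warnings;
use XML::Twig;

# XML declaration plus a DOCTYPE whose internal subset declares &nbsp;
my $doctype = '<?xml version="1.0" encoding="utf-8"?><!DOCTYPE html [<!ENTITY nbsp "&#160;">]>';
my $xml = '<html><head><meta content="tidyp for Linux (v1.02), see www.w3.org" name="generator" /><title></title></head><body>foo &nbsp; bar</body></html>';

my $xTwig = XML::Twig->new();

# safe_parse wraps the parse in an eval; on failure it returns false and sets $@
$xTwig->safe_parse($doctype . $xml) or die "Failure to parse XML : $@";

print $xTwig->sprint();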
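
If tidy emits more than one named entity, the same idea can be wrapped in a small helper that builds the DOCTYPE from a hash. This is only a sketch: doctype_with_entities is a hypothetical name, not part of XML::Twig, and it assumes the entity values contain no double quotes and that the input has no XML declaration or DOCTYPE of its own. The likely reason the entity_list attempt in the question has no effect is that the underlying parser only accepts references to entities declared in the document's own DTD, so the declaration has to be part of the text being parsed.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not an XML::Twig API): build a DOCTYPE whose
# internal subset declares every entity in %entities.
# Assumes the values contain no double quotes.
sub doctype_with_entities {
    my ($root, %entities) = @_;
    my $decls = join '',
        map { qq{<!ENTITY $_ "$entities{$_}">} } sort keys %entities;
    return "<!DOCTYPE $root [$decls]>";
}

my $xml     = '<html><body>foo &nbsp; bar &copy;</body></html>';
my $doctype = doctype_with_entities('html', nbsp => '&#160;', copy => '&#169;');

my $twig = XML::Twig->new();

# Prepend the generated DOCTYPE so both entities are declared before use
$twig->safe_parse($doctype . $xml) or die "Failure to parse XML: $@";

print $twig->sprint();
```

The root element name passed to the helper has to match the document's actual root (html for the tidy output above).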