I would like to write a command-line script that combines multiple <author>
tags in an Atom feed into one. For example, an entry like:
<entry>
    <id>someid</id>
    <published>somedate</published>
    <title>Title</title>
    <summary>Summary</summary>
    <author>
        <name>Author One</name>
    </author>
    <author>
        <name>Author Two</name>
    </author>
    <author>
        <name>Author Three</name>
    </author>
</entry>
should become:
<entry>
    <id>someid</id>
    <published>somedate</published>
    <title>Title</title>
    <summary>Summary</summary>
    <author>
        <name>Author One, Author Two, Author Three</name>
    </author>
</entry>
I think I could do it myself using Perl and regexes, but since parsing XML with regexes is not a good idea, I would be thankful for a more elegant solution that uses a proper XML parser.
Ted has the right idea, but their answer does a few things in a more complicated way than necessary and does not account for the properties of the Atom format (e.g. its use of namespaces).
use strict;
use warnings;
use XML::LibXML qw( );
use XML::LibXML::XPathContext qw( );

my $atom_ns = 'http://www.w3.org/2005/Atom';
my $xpc = XML::LibXML::XPathContext->new();
$xpc->registerNs(a => $atom_ns);

# See XML::LibXML::Parser for more ways to create the document object.
my $doc = XML::LibXML->load_xml( location => 'atom.xml' );

for my $entry_node ($xpc->findnodes('/a:feed/a:entry', $doc)) {
    # Collect each author's name and detach the original <author> element.
    my @author_names;
    for my $author_node ($xpc->findnodes('a:author', $entry_node)) {
        push @author_names, $xpc->findvalue('a:name', $author_node);
        $author_node->unbindNode();
    }
    next if !@author_names;    # Leave entries without authors untouched.

    # Create the combined <author> in the Atom namespace, like the ones it replaces.
    my $author_node = $doc->createElementNS($atom_ns, 'author');
    my $name_node   = $doc->createElementNS($atom_ns, 'name');
    $name_node->appendText(join(', ', @author_names));
    $author_node->appendChild($name_node);
    $entry_node->appendChild($author_node);
}

$doc->toFile('atom.new.xml');
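To see why the registerNs call in the script above matters, here is a minimal sketch using a made-up one-entry feed held in a string (the sample document and the variable names are assumptions for illustration only). In XPath 1.0 an unprefixed name always means "no namespace", so a plain /feed/entry finds nothing in an Atom document, while the prefixed query does:

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical sample feed; real feeds carry more elements.
my $xml = <<'XML';
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><author><name>Author One</name></author></entry>
</feed>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

# Unprefixed names match only elements in no namespace, so this list is empty.
my @plain = $doc->findnodes('/feed/entry');

# Bind a prefix of our choosing ("a") to the Atom namespace URI and query with it.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs( a => 'http://www.w3.org/2005/Atom' );
my @atom = $xpc->findnodes('/a:feed/a:entry');

printf "without prefix: %d, with prefix: %d\n", scalar @plain, scalar @atom;
```

The prefix is local to the XPath context; it does not have to match any prefix used in the feed itself, only the namespace URI has to be the same.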