python php perl atom-feed

Atom feed: script to combine multiple <author> items into one?


I would like to write a command-line script that combines multiple <author> tags in an Atom feed entry into one. For example, an entry like:

<entry>
    <id>someid</id>
    <published>somedate</published>
    <title>Title</title>
    <summary>Summary</summary>
    <author>
      <name>Author One</name>
    </author>
    <author>
      <name>Author Two</name>
    </author>
    <author>
      <name>Author Three</name>
    </author>
  </entry>

should become:

<entry>
    <id>someid</id>
    <published>somedate</published>
    <title>Title</title>
    <summary>Summary</summary>
    <author>
      <name>Author One, Author Two, Author Three</name>
    </author>
  </entry>

I think I could do it myself with Perl and regexes, but since parsing XML with regexes is not a good idea, I would be grateful for a more elegant solution that uses a proper XML parser.


Solution

  • Ted has the right idea, but their answer does a few things in a more complicated way than necessary, and it overlooks some properties of the Atom format (e.g. its use of namespaces).

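    use strict;
    use warnings;
    
    use XML::LibXML               qw( );
    use XML::LibXML::XPathContext qw( );
    
    my $atom_ns = 'http://www.w3.org/2005/Atom';
    
    my $xpc = XML::LibXML::XPathContext->new();
    $xpc->registerNs(a => $atom_ns);
    
    # See XML::LibXML::Parser for more ways to create the document object.
    my $doc = XML::LibXML->load_xml( location => 'atom.xml' );
    
    for my $entry_node ($xpc->findnodes('/a:feed/a:entry', $doc)) {
       # Collect the name of every author, detaching each <author> as we go.
       my @author_names;
       for my $author_node ($xpc->findnodes('a:author', $entry_node)) {
          push @author_names, $xpc->findvalue('a:name', $author_node);
          $author_node->unbindNode();
       }
    
       # Leave entries without any <author> untouched.
       next if !@author_names;
    
       # Create the replacement elements in the Atom namespace so that they
       # match the rest of the feed.
       my $author_node = $doc->createElementNS($atom_ns, 'author');
       my $name_node   = $doc->createElementNS($atom_ns, 'name');
       $name_node->appendText(join(', ', @author_names));
       $author_node->appendChild($name_node);
       $entry_node->appendChild($author_node);
    }
    
    $doc->toFile('atom.new.xml');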
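
Since the question is also tagged python, here is a rough equivalent using only the standard library's xml.etree.ElementTree. Treat it as a sketch rather than a drop-in tool: the function name merge_authors is made up for this example, and it assumes the feed uses the standard Atom namespace. Unlike the Perl version, it reuses the first <author> element instead of appending a new one, so the merged element stays where the first author was, and any other children of that first author (such as <uri> or <email>) are kept as they are.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Serialize Atom elements with a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", ATOM_NS)


def merge_authors(root):
    """Collapse the <author> children of every Atom <entry> into one (hypothetical helper)."""
    entry_tag = f"{{{ATOM_NS}}}entry"
    author_tag = f"{{{ATOM_NS}}}author"
    name_tag = f"{{{ATOM_NS}}}name"

    # Materialize the list first so the tree is not modified while iterating it.
    for entry in list(root.iter(entry_tag)):
        authors = entry.findall(author_tag)
        if len(authors) < 2:
            continue  # zero or one author: nothing to merge

        # Gather the non-empty author names in document order.
        names = [a.findtext(name_tag, default="").strip() for a in authors]
        names = [n for n in names if n]

        # Reuse the first <author> so the merged element keeps its position.
        first = authors[0]
        name_el = first.find(name_tag)
        if name_el is None:
            name_el = ET.SubElement(first, name_tag)
        name_el.text = ", ".join(names)

        # Drop the remaining <author> elements.
        for extra in authors[1:]:
            entry.remove(extra)
    return root
```

To use it on a file, parse the feed with ET.parse("atom.xml"), pass tree.getroot() to merge_authors, and save the result with tree.write("atom.new.xml", encoding="utf-8", xml_declaration=True).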