I'm trying to remove all the comments from an XML export of a Blogger blog. This means finding and deleting each <entry> that has a <blogger:type>COMMENT</blogger:type> child node. I don't want to delete any <entry> that has a <blogger:type>POST</blogger:type> child node, as those contain the post content.
However, the following command deletes only the <blogger:type>COMMENT</blogger:type> node, not the enclosing <entry>:
xmlstarlet edit -d "//blogger:type[text()='COMMENT']" input.xml > output.xml
How can I remove each entire <entry> node that contains <blogger:type>COMMENT</blogger:type>?
input.xml:
<?xml version='1.0' encoding='utf-8'?>
<feed xmlns='http://www.w3.org/2005/Atom' xmlns:blogger='http://schemas.google.com/blogger/2018'>
<id>tag:blogger.com,1999:blog-18962386</id>
<title>ImportTestv1</title>
<!-- Don't remove this node: -->
<entry>
<id>tag:blogger.com,1999:blog-18962386.post-468340</id>
<blogger:type>POST</blogger:type>
<blogger:status>LIVE</blogger:status>
<author>
<name>Author</name>
<uri></uri>
<blogger:type>BLOGGER</blogger:type>
</author>
<title></title>
<content type='html'>Test Post 3</content>
<blogger:metaDescription/>
<blogger:created>2025-09-05T14:02:59.354Z</blogger:created>
<published>2025-09-04T23:09:00Z</published>
<updated>2025-09-05T14:02:59.354Z</updated>
<blogger:location/>
<category scheme='tag:blogger.com,1999:blog-17477683' term='News'/>
<blogger:filename>/2025/09/post.html</blogger:filename>
<link/>
<enclosure/>
<blogger:trashed/>
</entry>
<!-- Remove this entire <entry> node that contains <blogger:type>COMMENT</blogger:type>: -->
<entry>
<id>tag:blogger.com,1999:blog-18962386.post-3978997</id>
<blogger:parent>tag:blogger.com,1999:blog-18962386.post-468340</blogger:parent>
<blogger:inReplyTo/>
<blogger:type>COMMENT</blogger:type>
<blogger:status>LIVE</blogger:status>
<author>
<name>Author</name>
<blogger:type>BLOGGER</blogger:type>
</author>
<content type='html'>Test Comment2 Test Comment2 Test Comment2</content>
<blogger:created>2025-09-14T14:42:53.755Z</blogger:created>
<published>2025-09-14T14:42:53.755Z</published>
<updated>2025-09-14T14:42:53.755Z</updated>
<blogger:trashed/>
</entry>
</feed>
Try the following:
xmlstarlet edit -N "atom=http://www.w3.org/2005/Atom" -d "//atom:entry[blogger:type/text()='COMMENT']" input.xml > output.xml
The above appeared to work for me: it removed the second of the two entry elements from your sample XML document.
The first change I've made is to make the XPath expression select the entry element itself, because it is the entry element that we want to delete, rather than just the blogger:type element inside it. The predicate [blogger:type/text()='COMMENT'] restricts the match to entry elements that have a blogger:type child whose text is COMMENT, so the blogger:type element nested inside author does not affect the result.
The second change I've made is to map the prefix atom to the Atom namespace that the entry element is in, and to use that prefix with the entry element name in the XPath expression. In XPath 1.0, an unprefixed name in an expression always refers to an element in no namespace, even if the source document declares a default namespace (i.e. has an xmlns="..." attribute), so a plain //entry would match nothing here. The -N option to xmlstarlet edit allows you to map an additional prefix to a namespace URI.
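To see the namespace behaviour outside xmlstarlet, here is a minimal sketch using Python's standard xml.etree.ElementTree module instead. It is not part of the original answer: SAMPLE is a trimmed-down stand-in for your input.xml, and remove_comment_entries is a hypothetical helper name chosen for illustration. It shows that an unprefixed entry matches nothing, while a prefix mapped to the Atom namespace matches both entries, and then it removes the comment entry.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map, playing the same role as xmlstarlet's -N option.
# The prefixes are arbitrary; only the URIs must match the document.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "blogger": "http://schemas.google.com/blogger/2018",
}

# Trimmed-down stand-in for input.xml (assumption: only the elements
# needed to demonstrate the match are kept).
SAMPLE = """<feed xmlns='http://www.w3.org/2005/Atom' xmlns:blogger='http://schemas.google.com/blogger/2018'>
  <entry><id>post-1</id><blogger:type>POST</blogger:type><author><blogger:type>BLOGGER</blogger:type></author></entry>
  <entry><id>comment-1</id><blogger:type>COMMENT</blogger:type><author><blogger:type>BLOGGER</blogger:type></author></entry>
</feed>"""


def remove_comment_entries(feed):
    """Delete each direct-child atom:entry whose blogger:type child is COMMENT."""
    removed = 0
    # findall returns a list, so removing children while looping is safe.
    for entry in feed.findall("atom:entry", NS):
        # Only direct children are checked, so author/blogger:type is ignored.
        if entry.findtext("blogger:type", namespaces=NS) == "COMMENT":
            feed.remove(entry)
            removed += 1
    return removed


feed = ET.fromstring(SAMPLE)

# An unprefixed name means "no namespace", so this finds nothing.
unprefixed_matches = len(feed.findall("entry"))
# With the prefix mapped to the Atom namespace, both entries are found.
prefixed_matches = len(feed.findall("atom:entry", NS))

removed = remove_comment_entries(feed)
remaining_ids = [e.findtext("atom:id", namespaces=NS) for e in feed.findall("atom:entry", NS)]

print(unprefixed_matches, prefixed_matches, removed, remaining_ids)
```

This only illustrates the matching logic; for the real export, the xmlstarlet command above remains the simpler tool.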