xml, xslt, xml-parsing, xml-namespaces, saxon

Remove an XML node with Saxon using an XSLT 3.0 stylesheet


What's a simple example of removing a node from an XML file using Saxon 12.9 and an XSLT 3.0 stylesheet?

I've got an XML export from Blogger, and I'm puzzled about how to remove just the COMMENT entries while retaining the POST entries.

Below is the input.xml file, which contains COMMENT entries I want to remove, but also POST entries that I want to retain in output.xml:

input.xml:

<?xml version='1.0' encoding='utf-8'?>
<feed xmlns='http://www.w3.org/2005/Atom' xmlns:blogger='http://schemas.google.com/blogger/2018'>
  <id>tag:blogger.com,1999:blog-17477</id>
  <title>Test Blog</title>

  <entry>
    <id>tag:blogger.com,1999:blog-17477.post-3947073770</id>
    <blogger:parent>tag:blogger.com,1999:blog-17477.post-23573</blogger:parent>
    <blogger:inReplyTo/>
    <blogger:type>COMMENT</blogger:type>
    <blogger:status>LIVE</blogger:status>
    <author>
      <name>Name Name</name>
      <blogger:type>ANONYMOUS</blogger:type>
    </author>
    <content type='html'>A comment.....</content>
    <blogger:created>2024-06-10T10:32:13.389Z</blogger:created>
    <published>2024-06-10T10:32:13.389Z</published>
    <updated>2024-06-10T10:32:13.389Z</updated>
    <blogger:trashed/>
  </entry>

  <entry>
    <id>tag:blogger.com,1999:blog-17477.post-670855911</id>
    <blogger:type>POST</blogger:type>
    <blogger:status>LIVE</blogger:status>
    <author>
      <name>Author</name>
      <uri></uri>
      <blogger:type>BLOGGER</blogger:type>
    </author>
    <title>Title Title</title>
    <content type='html'>Content Content Content Content Content</content>
    <blogger:metaDescription/>
    <blogger:created>2011-01-05T16:33:59.731Z</blogger:created>
    <published>2011-01-06T12:32:00.001Z</published>
    <updated>2011-01-06T12:32:00.138Z</updated>
    <blogger:location/>
    <category scheme='tag:blogger.com,1999:blog-17477683' term='News'/>
    <blogger:filename>/2011/01/post.html</blogger:filename>
    <link/>
    <enclosure/>
    <blogger:trashed/>
  </entry>

  <entry>
    <id>tag:blogger.com,1999:blog-17477.post-4539665487</id>
    <blogger:parent>tag:blogger.com,1999:blog-17477.post-8659501057</blogger:parent>
    <blogger:inReplyTo/>
    <blogger:type>COMMENT</blogger:type>
    <blogger:status>LIVE</blogger:status>
    <author>
      <name>Author 2</name>
      <blogger:type>BLOGGER</blogger:type>
    </author>
    <content type='html'>My comment</content>
    <blogger:created>2009-11-30T20:09:49.055Z</blogger:created>
    <published>2009-11-30T20:09:49.055Z</published>
    <updated>2009-11-30T20:09:49.055Z</updated>
    <blogger:trashed/>
  </entry>
</feed>

Desired output.xml:

<?xml version='1.0' encoding='utf-8'?>
<feed xmlns='http://www.w3.org/2005/Atom' xmlns:blogger='http://schemas.google.com/blogger/2018'>
  <id>tag:blogger.com,1999:blog-17477</id>
  <title>Test Blog</title>

  <entry>
    <id>tag:blogger.com,1999:blog-17477.post-670855911</id>
    <blogger:type>POST</blogger:type>
    <blogger:status>LIVE</blogger:status>
    <author>
      <name>Author</name>
      <uri></uri>
      <blogger:type>BLOGGER</blogger:type>
    </author>
    <title>Title Title</title>
    <content type='html'>Content Content Content Content Content</content>
    <blogger:metaDescription/>
    <blogger:created>2011-01-05T16:33:59.731Z</blogger:created>
    <published>2011-01-06T12:32:00.001Z</published>
    <updated>2011-01-06T12:32:00.138Z</updated>
    <blogger:location/>
    <category scheme='tag:blogger.com,1999:blog-17477683' term='News'/>
    <blogger:filename>/2011/01/post.html</blogger:filename>
    <link/>
    <enclosure/>
    <blogger:trashed/>
  </entry>
</feed>

Below is the skeleton of a stylesheet.xsl, borrowed from Martin Honnen's answer to an earlier question of mine, "Use xmlstarlet and XPath to find/replace HTML entities in an XML node".

sample stylesheet.xsl:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xpath-default-namespace="http://www.w3.org/2005/Atom"
  exclude-result-prefixes="#all"
  expand-text="yes">

<!-- how to remove only COMMENT entries and leave POST entries? -->

</xsl:stylesheet>

How do I designate only the COMMENT nodes to be removed in stylesheet.xsl?


Solution

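  • You can use the identity transformation (shallow-copy mode) together with an empty template that matches the COMMENT entries: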

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="3.0"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xpath-default-namespace="http://www.w3.org/2005/Atom"
      xmlns:blogger='http://schemas.google.com/blogger/2018'
      exclude-result-prefixes="#all">
    
    <xsl:mode on-no-match="shallow-copy"/>
    
    <xsl:template match="entry[blogger:type = 'COMMENT']"/>
    
    </xsl:stylesheet>
    

    You might want to add <xsl:output indent="yes"/> and <xsl:strip-space elements="*"/> as children of xsl:stylesheet. Otherwise the shallow-copy identity transformation also copies the whitespace text nodes that surrounded the removed entries, leaving blank lines in the output that you probably don't want.
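
    Putting the pieces together, a complete stylesheet.xsl with both optional declarations might look like the sketch below. It only combines the stylesheet above with xsl:output and xsl:strip-space; the comments are added for explanation, and the unused xs namespace declaration is dropped.

    ```xml
    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="3.0"
      xpath-default-namespace="http://www.w3.org/2005/Atom"
      xmlns:blogger="http://schemas.google.com/blogger/2018"
      exclude-result-prefixes="#all">

      <!-- Re-indent the result and drop whitespace-only text nodes from the input -->
      <xsl:output indent="yes"/>
      <xsl:strip-space elements="*"/>

      <!-- Identity transformation: copy every node that no other template matches -->
      <xsl:mode on-no-match="shallow-copy"/>

      <!-- Empty template: an Atom entry whose direct blogger:type child is COMMENT
           produces no output, so it is removed. The nested author/blogger:type
           is not a direct child of entry and is not tested here. -->
      <xsl:template match="entry[blogger:type = 'COMMENT']"/>

    </xsl:stylesheet>
    ```

    Assuming the Saxon-HE jar sits in the current directory under the name saxon-he-12.9.jar (adjust the name and path to your installation), it can be run as: java -jar saxon-he-12.9.jar -s:input.xml -xsl:stylesheet.xsl -o:output.xml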

    Example online fiddle.