I am trying to transform RSS 2 coming from WordPress into XHTML 1.0 Strict (using a cronjob and xsltproc); however, WordPress inserts an img into the CDATA at the end of the summary element. The img has a border attribute, which is invalid in XHTML 1.0 Strict. Because it's CDATA, I assume that means I can't match it with my XSLT. I can say for certain that the img is always the last thing before the CDATA ends. I'd prefer to strip the border attribute and keep the image, but I'd rather get rid of the element entirely than have invalid markup.
Is it possible to match inside CDATA using XSLT, perhaps using a string expression? If so, is that the right way to go here, or is there a better solution to be had?
Remember what CDATA means: "character data". Putting something in CDATA means: this might look like markup, but I don't want you to treat it as markup. So if that thing inside the CDATA looks like an img element, the CDATA is there to tell you not to be fooled - it's not an element at all. Having said that, you can of course process the text in the way you process any other character string, including passing it to an XML parser to be turned into a tree of nodes.
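
As a rough illustration of both points (the CDATA content arrives as plain text, and that text can be edited as a string and then handed to a parser), here is a minimal Python sketch of a preprocessing step that could run before xsltproc. It is not an XSLT solution. The sample feed item, the URL, and the helper names strip_border and parse_fragment are made up for the example, and the regular expression assumes the simple, quoted-attribute markup described in the question rather than arbitrary HTML.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical feed item: the img sits inside CDATA, last thing in <summary>.
RSS_ITEM = """<item>
  <title>Example post</title>
  <summary><![CDATA[<p>Some text.</p><img src="http://example.com/pixel.gif" border="0" alt="" />]]></summary>
</item>"""

# Matches a quoted border attribute together with its leading whitespace.
BORDER_ATTR = re.compile(r"""\s+border\s*=\s*("[^"]*"|'[^']*')""")


def strip_border(fragment):
    """Remove border attributes from img tags in markup held as a string."""
    def clean_tag(match):
        return BORDER_ATTR.sub("", match.group(0))
    return re.sub(r"<img\b[^>]*>", clean_tag, fragment)


def parse_fragment(fragment):
    """Parse the text as XML by wrapping it in a single root element.

    This only works when the fragment is well-formed XML.
    """
    return ET.fromstring("<div>" + fragment + "</div>")


item = ET.fromstring(RSS_ITEM)
summary = item.find("summary")

# The parser exposes the CDATA content as text: summary has no child elements.
print(len(summary))

cleaned = strip_border(summary.text)
print(cleaned)

# Once cleaned, the string can be turned into a real tree of nodes.
tree = parse_fragment(cleaned)
print(tree.find("img").attrib)
```

Within XSLT 1.0 itself, the same idea would have to be expressed with string functions on the text node, since the stylesheet never sees an img element there either.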