In OpenXmlPowerTools, empty elements are always written as self-closing tags, e.g.
<div />
or
<td />
This is not valid HTML. How can I prevent it? Or do I have to run a regex over the result to replace them? :-(
But it seems this is not a problem in the PowerTools themselves; LINQ to XML does this:
var tc = new XElement(W.tc);
results in
<tc xmlns="http://schemas.openxmlformats.org/wordprocessingml/2006/main" />
So is there a way to prevent this?
In case somebody else has this problem, here is what I will use. With the help of ChatGPT (and a lot of tries), a regex is now my solution. You can call the function like this:
string fixedHtml = FixSelfClosingTags(htmlContent,"div|a|td");
The second argument is just the tag names, separated by pipes.
// Requires: using System.Text.RegularExpressions;
public static string FixSelfClosingTags(string html, string tags)
{
    // ({tags}): captures the tag name. The pipe-separated list (e.g. div|a|td)
    //           is inserted into the pattern as-is and acts as a regex alternation.
    // (\s[^>]*)?: optionally captures the whitespace and attributes after the tag name.
    // \/>: the self-closing end of the tag.
    string pattern = $@"<({tags})(\s[^>]*)?\/>";

    return Regex.Replace(html, pattern, m =>
    {
        string tagName = m.Groups[1].Value;
        // TrimEnd() drops the whitespace in front of "/>". Without attributes this
        // leaves an empty string, so "<td />" becomes "<td></td>" with no extra space.
        string attributes = m.Groups[2].Value.TrimEnd();
        return $"<{tagName}{attributes}></{tagName}>";
    });
}
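For anyone who would rather avoid the regex: LINQ to XML only writes the self-closing form when an element has no content at all (XElement.IsEmpty is true). An element whose content is an empty string is written with separate start and end tags. Below is a minimal sketch of that idea, assuming you still have the converter result as an XElement before turning it into a string. HtmlTagFixer and ExpandEmptyElements are hypothetical names of mine, not part of OpenXmlPowerTools, and the list of void elements (which may stay self-closing) is my own selection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of OpenXmlPowerTools.
public static class HtmlTagFixer
{
    // HTML void elements: these have no end tag, so they are left untouched.
    private static readonly HashSet<string> VoidElements =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "area", "base", "br", "col", "embed", "hr", "img",
            "input", "link", "meta", "source", "track", "wbr"
        };

    public static void ExpandEmptyElements(XElement root)
    {
        // ToList() takes a snapshot of the matches before the tree is modified.
        var empties = root.DescendantsAndSelf()
            .Where(e => e.IsEmpty && !VoidElements.Contains(e.Name.LocalName))
            .ToList();

        foreach (var e in empties)
        {
            // Empty-string content makes LINQ to XML write <tag></tag>
            // instead of <tag />. Attributes are not affected.
            e.Value = string.Empty;
        }
    }
}
```

Calling HtmlTagFixer.ExpandEmptyElements(html) before html.ToString(SaveOptions.DisableFormatting) should then give <td></td> for an empty cell, while <br /> stays as it is. I have not tried this against every kind of document the converter can produce, so treat it as a starting point.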