linq-to-xml openxml-powertools

Self-closing tags in OpenXmlPowerTools


In OpenXmlPowerTools, every empty element is written as a self-closing tag, e.g.

<div />

or

 <td />

This is not valid HTML. How can I prevent it? Or do I have to run a regex over the result to replace them? :-(

It seems this is not a problem specific to the PowerTools, though. LINQ to XML itself does this:

var tc = new XElement(W.tc);

results in

<tc xmlns="http://schemas.openxmlformats.org/wordprocessingml/2006/main" />

So is there a way to prevent this?


Solution

  • In case somebody else has this problem, here is what I will use. With the help of ChatGPT (and a lot of tries), a regex is now my solution. You can call the function like this:

    string fixedHtml = FixSelfClosingTags(htmlContent,"div|a|td");
    

    The second argument is a pipe-separated list of tag names.

    // Requires: using System.Text.RegularExpressions;
    public static string FixSelfClosingTags(string html, string tags)
    {
        // ({tags}):    captures the tag name. tags is a pipe-separated list such as "div|a|td".
        //              It is inserted into the pattern unescaped, so pass plain tag names only.
        // (\s[^>]*)?:  optionally captures the attributes, including the leading space.
        //              Without attributes this group is empty or whitespace only.
        string pattern = $@"<({tags})(\s[^>]*)?\/>";

        return Regex.Replace(html, pattern, m =>
        {
            string tagName = m.Groups[1].Value;

            // Trim trailing whitespace so that "<td />" becomes "<td></td>"
            // and "<td class="x" />" becomes "<td class="x"></td>".
            string attributes = m.Groups[2].Value.TrimEnd();

            return $"<{tagName}{attributes}></{tagName}>";
        });
    }
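
As an alternative to post-processing the string, the same effect can probably be achieved on the LINQ to XML tree before it is serialized. An XElement with no content at all is written as `<tag />`, while one whose content is an empty string is written as `<tag></tag>`. The following is only a sketch under assumptions: it presumes you can get hold of the XElement tree before it is turned into a string, and the names `EmptyElementExpander`, `ExpandEmptyElements`, and the `VoidElements` list are made up for illustration. Void elements such as `br` or `img` are skipped, because they must not have a closing tag in HTML.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of OpenXmlPowerTools.
public static class EmptyElementExpander
{
    // HTML void elements: these never take a closing tag, so they are left alone.
    private static readonly HashSet<string> VoidElements =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "area", "base", "br", "col", "embed", "hr", "img",
            "input", "link", "meta", "source", "track", "wbr"
        };

    public static void ExpandEmptyElements(XElement root)
    {
        // Collect first, then modify, so the tree is not changed while it is being enumerated.
        List<XElement> empties = root.DescendantsAndSelf()
            .Where(e => e.IsEmpty && !VoidElements.Contains(e.Name.LocalName))
            .ToList();

        foreach (XElement e in empties)
        {
            // Empty-string content makes IsEmpty false, so the element
            // is serialized as a start/end tag pair instead of <tag />.
            e.Value = string.Empty;
        }
    }
}
```

Called on a tree parsed from `<div><td /><br /></div>`, this should leave `<br />` untouched and serialize the `td` as `<td></td>`. Compared with the regex, it does not need a list of tag names and cannot be confused by a `>` inside an attribute value, but it only helps if the tree is available before serialization.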