I would like to replace instances of "\n", "\t", and "    " (four spaces) using RegEx, but preserve all whitespace inside of an HTML comment block. Unfortunately, the comment can contain anything, including other HTML tags, so I have to match "<!--" and "-->" specifically. Furthermore, there may be multiple comments, with whitespace to match in between them. I can use multiple regular expressions if needed, but I cannot modify the HTML content aside from the replacement.
Here is some sample code to experiment with:
<div>
    <p>Sample text!</p>
    <!--
        <img src="test.jpg" alt="This is an image!" width="500" height="600">
    -->
</div>
<div>
    <p>Sample text!</p>
    <!--
        <img src="test.jpg" alt="This is an image!" width="500" height="600">
    -->
</div>
<div>
    <p>Sample text!</p>
    <!--
        <img src="test.jpg" alt="This is an image!" width="500" height="600">
    -->
</div>
In this instance, all sets of four spaces should be matched except for the ones in each comment (lines 4, 5, 10, 11, 16, 17).
I have already split up my expressions into one for each type of whitespace, and I have been experimenting with spaces. The closest I have gotten is this:
/(?<!<!--.*?(?<!-->.*?))    (?!(?!.*?<!--).*?-->)/gs
which matches four-space runs outside the first and last comment blocks, but it also matches the ones inside the middle comment block, which is incorrect. However, I suspect it could be accomplished by modifying something in the second half:
/    (?!(?!.*?<!--).*?-->)/gs
Any suggestions? Is this even possible?
UPDATE: In this situation I am not trying to match opening tags; rather, I want the whitespace outside of a specific element (and in this case, the comment block does not have the same syntax as other elements anyway). My ultimate goal here was to use it for a heavily customized instance of TinyMCE in which I wanted to prevent whitespace from being clobbered using the protect attribute. It takes a list of regular expressions and does its own replace with its own <!--mce:protected %0A--> type comment.
After I posted this I then realized that I could just protect the entire comment separately because it would not show up in the editor regardless...
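A minimal sketch of that idea, assuming protect accepts an array of regular expressions as the update describes. The name `protectPatterns` and the `#editor` selector are hypothetical, and the `tinymce.init` call is left as a comment because it needs the editor loaded on the page.

```javascript
// Hypothetical pattern list: one regex that spans a whole comment,
// including any newlines, tabs, and spaces inside it.
const protectPatterns = [/<!--[\s\S]*?-->/g];

// Assumed usage (requires TinyMCE on the page, so it is not run here):
// tinymce.init({ selector: '#editor', protect: protectPatterns });

// Quick check that each comment is matched whole, whitespace included.
const sample = '<p>a</p>\n<!--\n\t<img src="test.jpg">\n-->\n<p>b</p><!-- c -->';
const protectedBlocks = sample.match(protectPatterns[0]);
console.log(protectedBlocks.length); // 2
```

`[\s\S]` is used instead of `.` so the pattern does not depend on the `s` flag being available.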
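Instead of lookarounds, you could match the comments first and keep them as they are, and use an alternation to match each run of four spaces that should be removed.
For example:
/(<!--.*?-->)| {4}/gs
and replace with $1. When the four-space branch matches, group 1 is empty, so the spaces are dropped; when a comment matches, it is written back unchanged.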
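To make the mechanism concrete, here is a small sketch on a shortened input with the whitespace written out explicitly. The variable names and the input string are made up for illustration.

```javascript
// Alternation: group 1 captures a whole comment; the second branch matches
// exactly four spaces and leaves group 1 unset.
const pattern = /(<!--.*?-->)| {4}/gs;

const input = '<div>\n    <p>Hi</p>\n    <!--\n        keep me\n    -->\n</div>';

// In String.prototype.replace, "$1" expands to an empty string when group 1
// did not take part in the match, so four-space runs outside comments vanish
// while comments are written back unchanged.
const result = input.replace(pattern, '$1');

console.log(JSON.stringify(result));
// "<div>\n<p>Hi</p>\n<!--\n        keep me\n    -->\n</div>"
```

The four spaces before `<!--` are removed because they sit outside the comment; the ones before `-->` are kept because they are part of the captured comment.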
See the test case
const text = `<div>
    <p>Sample text!</p>
    <!--
        <img src="test.jpg" alt="This is an image!" width="500" height="600">
    -->
</div>
<div>
    <p>Sample text!</p>
    <!--
        <img src="test.jpg" alt="This is an image!" width="500" height="600">
    -->
</div>
<div>
    <p>Sample text!</p>
    <!--
        <img src="test.jpg" alt="This is an image!" width="500" height="600">
    -->
</div>`;
// Comments are captured in group 1 and written back; each run of four spaces is removed.
const output = text.replace(/(<!--.*?-->)| {4}/gs, '$1');
console.log(output);
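The same idea can probably be extended to all three kinds of whitespace from the question in a single pass. The following is a sketch only: the helper name `stripOutsideComments` and its `replacement` parameter are hypothetical, and it assumes the matched whitespace should be replaced with one fixed string (empty by default).

```javascript
// Hypothetical helper: replaces newlines, tabs, and four-space runs that lie
// outside HTML comments; comments are copied through untouched.
function stripOutsideComments(html, replacement = '') {
  return html.replace(/(<!--.*?-->)|\n|\t| {4}/gs, (match, comment) =>
    // `comment` is undefined when one of the whitespace branches matched.
    comment !== undefined ? comment : replacement
  );
}

const html = '<div>\n\t<p>Sample text!</p>\n    <!--\n\t<img src="test.jpg">\n    -->\n</div>';
const stripped = stripOutsideComments(html);
console.log(JSON.stringify(stripped));
// "<div><p>Sample text!</p><!--\n\t<img src=\"test.jpg\">\n    --></div>"
```

A callback is used instead of '$1' so that a custom replacement string containing `$` is inserted literally rather than interpreted as a replacement pattern.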