I would like to split a string containing HTML snippets into its individual characters and its HTML elements.
let str = 'wel<span class="t">come</span> to all'
console.log("split attempt " +JSON.stringify(str.split(/(<.*?>)|/)));
giving me:
split attempt ["w",null,"e",null,"l","<span class=\"t\">","c",null,"o",null,"m",null,"e","</span>"," ",null,"t",null,"o",null," ",null,"a",null,"l",null,"l"]
By filtering out the null values, I get what I want:
split attempt ["w","e","l","<span class=\"t\">","c","o","m","e","</span>"," ","t","o"," ","a","l","l"]
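For reference, the filtering step could look like the sketch below. It assumes filter(Boolean) is an acceptable way to drop the unwanted entries; the variable names raw and parts are only illustrative. Note that the entries printed as null are actually undefined in the array, since JSON.stringify serializes undefined array entries as null.

```javascript
const str = 'wel<span class="t">come</span> to all';

// split() puts undefined in the capture-group slot whenever the empty
// alternative matched; JSON.stringify prints those entries as null.
const raw = str.split(/(<.*?>)|/);

// filter(Boolean) drops every falsy entry: undefined and empty strings.
const parts = raw.filter(Boolean);

console.log(JSON.stringify(parts));
```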
But is there a way, within the regular expression itself, to keep specific sequences (like short HTML tags) intact and split the rest character by character?
It seems you want to split into individual characters, except for <...> tags, which you want to treat as atomic.
To avoid the empty strings and nulls, I'd suggest using match instead of split:
const str = 'wel<span class="t">come</span> to all';
console.log(str.match(/<.*?>|./sg));
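Wrapped in a small function, the same idea might look like the sketch below. The name splitKeepingTags is hypothetical, and the fallback to an empty array is an added assumption, needed because match returns null when nothing matches (for example, on an empty string).

```javascript
// Hypothetical helper: split into single characters, keeping each <...> tag whole.
function splitKeepingTags(input) {
  // <.*?> lazily matches one tag; . matches any other single character.
  // The s flag lets . match newlines as well; g collects all matches.
  // match() returns null when there is no match, so fall back to [].
  return input.match(/<.*?>|./gs) || [];
}

const str = 'wel<span class="t">come</span> to all';
console.log(JSON.stringify(splitKeepingTags(str)));
// ["w","e","l","<span class=\"t\">","c","o","m","e","</span>"," ","t","o"," ","a","l","l"]
```

Without the u flag, . matches one UTF-16 code unit, so characters outside the Basic Multilingual Plane (many emoji, for instance) would be split in two; adding u to the flags makes . match whole code points.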
Disclaimer: for any more complex HTML parsing you should use an HTML parser. Think of HTML comments, CDATA blocks, <script> tags with code that has tags in string literals, etc.
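To illustrate one such case, here is a small sketch of how the regex approach goes wrong on an HTML comment that contains a > character. The tokenize name and the sample input are made up for the demonstration.

```javascript
// Same pattern as above, wrapped in an illustrative helper.
const tokenize = (input) => input.match(/<.*?>|./gs) || [];

// The comment contains a ">" before its real end, so the lazy <.*?>
// stops too early and the rest of the comment is split into characters.
const parts = tokenize('<!-- a > b -->x');
console.log(JSON.stringify(parts));
// ["<!-- a >"," ","b"," ","-","-",">","x"]
```

An HTML parser would treat the whole comment as a single node, which is why it is the safer choice for anything beyond simple snippets.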