javascript regex split

How can I split a string into characters and short HTML elements in JavaScript?


I would like to split a string containing HTML snippets into individual characters, while keeping each HTML tag as a single element. My attempt:

let str = 'wel<span class="t">come</span> to all';
console.log("split attempt " + JSON.stringify(str.split(/(<.*?>)|/)));

giving me:

split attempt ["w",null,"e",null,"l","<span class=\"t\">","c",null,"o",null,"m",null,"e","</span>"," ",null,"t",null,"o",null," ",null,"a",null,"l",null,"l"]

By filtering out the nulls, I get what I want:

split attempt ["w","e","l","<span class=\"t\">","c","o","m","e","</span>"," ","t","o"," ","a","l","l"]
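For reference, the filtering step could look roughly like the sketch below. The variable name parts is only illustrative. The nulls in the JSON output are undefined entries that split adds for the capture group whenever the empty alternative matched; the check for empty strings is an extra assumption to cover inputs with adjacent tags, which this sample string does not have.

```javascript
const str = 'wel<span class="t">come</span> to all';

// split() inserts the capture group into the result at every split point.
// When the empty alternative matched, the group is undefined (shown as null by JSON.stringify).
const parts = str
  .split(/(<.*?>)|/)
  .filter(part => part !== undefined && part !== '');

console.log("split attempt " + JSON.stringify(parts));
```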

But is there a way to have the regular expression itself keep specific sequences (like short HTML tags) intact and split the rest character by character, without the extra filtering step?


Solution

  • It seems you want to split into individual characters, except for <...> tags, which you want to treat as atomic.

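    To avoid the empty strings and nulls, I'd suggest using match instead of split. The result then contains only the matched tokens: a whole tag where <.*?> matches, and a single character everywhere else. The s flag lets . match line breaks too: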

    const str = 'wel<span class="t">come</span> to all';
    console.log(str.match(/<.*?>|./sg));
     

    Disclaimer: for any more complex HTML parsing you should use an HTML parser. Think of HTML comments, CDATA blocks, <script> tags with code that has tags in string literals, etc.
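For the simple case in the question, the match approach can be wrapped in a small helper. This is only a sketch under a few assumptions that are not part of the answer above: tokenize is a hypothetical name, the negated class [^>]* is swapped in for the lazy .*?, the u flag is added so that characters outside the Basic Multilingual Plane (such as emoji) are not split into surrogate halves, and "|| []" covers the case where match returns null for an empty string.

```javascript
// Hypothetical helper: split a string into single characters,
// keeping each <...> tag as one token.
function tokenize(input) {
  // <[^>]*>  a complete tag, from "<" up to the next ">"
  // .        otherwise any single character (s: including line breaks, u: whole code points)
  return input.match(/<[^>]*>|./gsu) || [];
}

console.log(tokenize('wel<span class="t">come</span> to all'));
console.log(tokenize('a<b')); // an unclosed "<" falls back to a plain character
```

A "<" without a closing ">" is treated as an ordinary character here, which may or may not be what you want; the disclaimer about real HTML parsing still applies.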