I'm trying to extract the value of every `font-family:"..."` declaration in an HTML snippet.
Example input:
`<body lang=EN-ZA style='tab-interval:36.0pt;word-wrap:break-word'>
<!--StartFragment--><span style='font-size:14.0pt;line-height:107%;
font-family:"Comic Sans MS";:"Times New Roman";
:"Times New Roman";mso-font-kerning:0pt;mso-ligatures:none;
mso-ansi-language:EN-ZA;mso-fareast-language:EN-ZA;mso-bidi-language:AR-SA'>
Test Font 1
</span><span
style='font-size:14.0pt;line-height:107%;font-family:"Boucherie Block";
:"Times New Roman";:"Times New Roman";
mso-font-kerning:0pt;mso-ligatures:none;mso-ansi-language:EN-ZA;mso-fareast-language:
EN-ZA;mso-bidi-language:AR-SA'>Test Font 2 </span><!--EndFragment-->
</body>`
Required output:
Comic Sans MS
Boucherie Block
I've tried using the following regex:
var tmpStr = targetText.match('font-family:"(.*)";');
But this includes the font after the semicolon (Times New Roman), which I'm not interested in, and it doesn't contain the other font family, which is supposed to be Boucherie Block. Any tips would be appreciated. If there's another way to get the required output without using regex, I'm open to that. The main thing is to get both fonts out of the string.
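I would use a negated character class, `[^"]*`, in place of `.*` so the capture stops at the closing quote, and pass a regex literal with the `g` flag to `matchAll` instead of passing a string to `match`, so that every occurrence is returned: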
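To see why the character class matters, here is a small sketch on a one-line sample. The `sample` string is a shortened, hypothetical stand-in for the real input, not the full snippet from the question.

```javascript
// Hypothetical one-line sample, shortened from the input in the question.
const sample = 'font-family:"Comic Sans MS";:"Times New Roman";';

// Greedy: .* grabs as much as it can, so it runs to the LAST '";' on the line.
const greedy = sample.match(/font-family:"(.*)";/)[1];

// Negated class: [^"]* cannot cross a double quote, so it stops at the first one.
const bounded = sample.match(/font-family:"([^"]*)";/)[1];

console.log(greedy);  // Comic Sans MS";:"Times New Roman
console.log(bounded); // Comic Sans MS
```
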
const targetText = `<body lang=EN-ZA style='tab-interval:36.0pt;word-wrap:break-word'>
<!--StartFragment--><span style='font-size:14.0pt;line-height:107%;
font-family:"Comic Sans MS";:"Times New Roman";
:"Times New Roman";mso-font-kerning:0pt;mso-ligatures:none;
mso-ansi-language:EN-ZA;mso-fareast-language:EN-ZA;mso-bidi-language:AR-SA'>
Test Font 1
</span><span
style='font-size:14.0pt;line-height:107%;font-family:"Boucherie Block";
:"Times New Roman";:"Times New Roman";
mso-font-kerning:0pt;mso-ligatures:none;mso-ansi-language:EN-ZA;mso-fareast-language:
EN-ZA;mso-bidi-language:AR-SA'>Test Font 2 </span><!--EndFragment-->
</body>`
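// matchAll requires the g flag; [^"]* stops at the first closing quote
const tmpStr = targetText.matchAll(/font-family:"([^"]*)";/g);
// keep only capture group 1 (the font name) from each match
console.log([...tmpStr].map(e => e[1])); // ["Comic Sans MS", "Boucherie Block"]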
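Since the question also asks about a way without regex, here is a minimal sketch using plain `indexOf` and `slice`. The helper name `extractFontFamilies` and the shortened `html` sample are my own assumptions for illustration. It assumes the value is always wrapped in double quotes, as in the input above.

```javascript
// Hypothetical helper: collects every value that follows font-family:" up to the next double quote.
function extractFontFamilies(html) {
  const marker = 'font-family:"';
  const fonts = [];
  let from = 0;
  while (true) {
    const start = html.indexOf(marker, from);
    if (start === -1) break;                 // no more declarations
    const valueStart = start + marker.length;
    const valueEnd = html.indexOf('"', valueStart);
    if (valueEnd === -1) break;              // unterminated value, stop scanning
    fonts.push(html.slice(valueStart, valueEnd));
    from = valueEnd + 1;                     // continue after the closing quote
  }
  return fonts;
}

// Shortened sample (an assumption) with the same shape as the input in the question.
const html = `<span style='font-family:"Comic Sans MS";:"Times New Roman";'>Test Font 1</span>
<span style='font-family:"Boucherie Block";:"Times New Roman";'>Test Font 2</span>`;

console.log(extractFontFamilies(html)); // [ 'Comic Sans MS', 'Boucherie Block' ]
```

Like the regex, this looks for the literal text `font-family:"`, so it would also pick up properties whose names merely end in `font-family` (for example a `mso-fareast-font-family` declaration, if the input contained one). If that matters for your data, check the character before the match.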