My web extension fails to start a file download when the filename contains a pair of emojis, and reports an invalid filename error. This seems to be a Unicode surrogate pair issue that only appears when multiple emojis are used. Here is an example of an offending filename:
```html
<a href="https://www.example.com/filestream.xyz"
   download="The New World Order Presentation 👨‍🌾🇳🇱.pdf"
   target="_blank">Download File</a>
```
As the Chrome DevTools Elements screenshot shows, the farmer emoji (https://emojipedia.org/man-farmer/) seems to be a combination of multiple code points, and that is what makes the filename invalid. When the markup is pasted here, as above, the emojis render correctly as a farmer and a flag, but in the DevTools DOM view they look different. Inspecting the filename above in DevTools shows the issue.
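To see the separate code points that DevTools displays, you can spread the string (which iterates by code point, not by UTF-16 code unit) and print each code point in hex. This is a minimal sketch: `codePoints` is a hypothetical helper, and the string is written with `\u{...}` escapes so the invisible joiner is explicit.

```javascript
// Hypothetical helper: list each Unicode code point of a string as "U+XXXX".
// Spreading a string yields code points, so surrogate pairs stay together.
function codePoints(str) {
  return [...str].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase()
  );
}

// Man farmer (man + zero width joiner + ear of rice), then the NL flag
// (two regional indicator symbols).
const emojis = "\u{1F468}\u{200D}\u{1F33E}\u{1F1F3}\u{1F1F1}";

console.log(codePoints(emojis).join(" "));
// U+1F468 U+200D U+1F33E U+1F1F3 U+1F1F1
console.log(emojis.length); // 9 (UTF-16 code units)
```

The farmer is three code points joined by U+200D, while the flag is just two regional indicators with no joiner between them.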
The code that sends the download request:
```javascript
function notifyExtension(e) {
  var elem = e.currentTarget;
  var fileSaveName = elem.getAttribute("download");

  // Cancel the link's default action (returnValue for old browsers).
  e.returnValue = false;
  if (e.preventDefault) {
    e.preventDefault();
  }

  // Forward to the extension only if the link has a non-empty "loop" attribute.
  var loop = elem.getAttribute("loop");
  if (loop) {
    chrome.runtime.sendMessage({
      url: elem.getAttribute("href"),
      filename: fileSaveName,
    });
  }
  return false;
}
```
The background script that starts the download with the browser API:
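```javascript
chrome.runtime.onMessage.addListener(function (message) {
  let fname = message.filename
    .trim()
    // Replace most ASCII punctuation (but not ".") with "-".
    .replace(/[`~!@#$%^&*()_|+\-=?;:'",<>{}[\]\\/]/gi, "-")
    // Replace characters Windows forbids in filenames with "_"
    // (already covered by the previous replace).
    .replace(/[\\/:*?"<>|]/g, "_")
    // Cap the length at 240 UTF-16 code units.
    .substring(0, 240)
    // Collapse runs of whitespace into a single space.
    .replace(/\s+/g, " ");

  chrome.downloads.download({
    url: message.url,
    filename: fname,
    conflictAction: "uniquify",
    saveAs: true,
  });
});
```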
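Separately from the emoji question, `.substring(0, 240)` counts UTF-16 code units, so on a long name it can cut an astral character such as 👨 in half and leave a lone surrogate at the end. A minimal sketch of truncating by code points instead; `truncateCodePoints` is a hypothetical helper, not part of any browser API.

```javascript
// Hypothetical helper: keep at most `max` Unicode code points.
// Array.from(str) splits by code point, so surrogate pairs are never split.
function truncateCodePoints(str, max) {
  return Array.from(str).slice(0, max).join("");
}

// "ab" + 👨 (one code point, stored as two UTF-16 code units)
const sample = "ab\u{1F468}";

const byUnits = sample.substring(0, 3); // "ab" + lone high surrogate \uD83D
const byPoints = truncateCodePoints(sample, 3); // "ab" + 👨

console.log(byUnits.length, byPoints.length); // 3 4
```

This can still split a multi-code-point emoji such as 👨‍🌾 right after its joiner, so it is best applied after any joiner handling.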
The error in the browser console:
```
Unchecked lastError value: Error: filename must not contain illegal characters
```
How can I sanitize the string in JavaScript so that it is always a valid filename in situations like this? A single emoji does not seem to be a problem, but multiple emojis are!
You can use Unicode property escapes to find emojis in a string. The syntax is `\p{...}`, and it requires the `u` flag. Example:
```javascript
console.log("👨‍🌾aa".replace(/\p{So}/gu, ""))
```
There are more properties you can use with `\p{...}`; see the docs.
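Note that removing `\p{So}` on its own does not remove the joiner: U+200D belongs to general category Cf (format), not So (other symbol), so it survives the replace. A quick check, with the string written using escapes so the invisible character is visible in the source:

```javascript
// 👨‍🌾 = U+1F468, U+200D (zero width joiner), U+1F33E
const input = "\u{1F468}\u{200D}\u{1F33E}aa";

// \p{So} matches "Other Symbol" code points; the u flag is required.
const stripped = input.replace(/\p{So}/gu, "");

console.log(stripped.length); // 3
console.log(stripped.codePointAt(0).toString(16)); // 200d
console.log(/\p{Cf}/u.test(stripped)); // true
```

So the output only looks like "aa"; the joiner is still the first character.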
If a single emoji does not cause a failure but the man farmer does, the cause of the problem is the zero width joiner (U+200D) between 👨 and 🌾. It is an invalid character in filenames in Chrome. Search the string for U+200D.
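If the joiner is the only character Chrome objects to here (the diagnosis above; other invisible characters may need the same treatment), the smallest fix is to drop U+200D before calling `chrome.downloads.download`. A minimal sketch; `stripJoiners` is a hypothetical helper name.

```javascript
// Hypothetical helper: remove every zero width joiner (U+200D).
// The joined emoji falls apart into its parts (👨‍🌾 becomes 👨🌾),
// but the rest of the name is kept as-is.
function stripJoiners(name) {
  return name.replace(/\u200D/g, "");
}

const original =
  "The New World Order Presentation \u{1F468}\u{200D}\u{1F33E}\u{1F1F3}\u{1F1F1}.pdf";
const cleaned = stripJoiners(original);

console.log(cleaned.includes("\u200D")); // false
console.log(original.length - cleaned.length); // 1
```

In the background listener this would be one more `.replace(/\u200D/g, "")` in the chain, placed before the length cut.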
The resulting regex:
```javascript
/\p{So}\u{200D}\p{So}/gu
```
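Applied to the filename from the question, that regex removes the whole 👨‍🌾 sequence and leaves the flag alone. It matches exactly one joiner between two symbols, though, so a sequence with more joiners (the family emoji 👨‍👩‍👧 is used below as an assumed test case) or with a U+FE0F variation selector before the joiner leaves a U+200D behind. A minimal sketch; `ZWJ_PAIR` is just a local name.

```javascript
// The regex from the answer: symbol + zero width joiner + symbol.
const ZWJ_PAIR = /\p{So}\u{200D}\p{So}/gu;

const fileName =
  "The New World Order Presentation \u{1F468}\u{200D}\u{1F33E}\u{1F1F3}\u{1F1F1}.pdf";
console.log(fileName.replace(ZWJ_PAIR, ""));
// The New World Order Presentation 🇳🇱.pdf

// 👨‍👩‍👧 has two joiners: only the first symbol/joiner/symbol run is matched.
const family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}";
const left = family.replace(ZWJ_PAIR, "");
console.log(left.includes("\u200D")); // true
```

Stripping `\u200D` by itself handles both cases, at the cost of keeping the emoji parts in the name.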