Update:
Yes, thank you very much, satesrah, for the idea! You are right: there is a mashup of encodings, and I can't convert the whole text to Win-1251 or Win-1252.
I didn't want to insert Unicode escapes, because I wanted to keep a single encoding in this file, but the only way I see is to convert all text with such symbols into RTF escapes like \u1234?. So I created this function:
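function unicode_to_rtf_representation_u(srcStr) {
  if (srcStr == undefined) return ""; // also catches null
  let tgtStr = "";
  for (let i = 0; i < srcStr.length; i++) {
    // charCodeAt returns a UTF-16 code unit (0..65535), but RTF reads the
    // \uN parameter as a signed 16-bit integer, so units above 32767
    // have to be written as negative numbers
    let c = srcStr.charCodeAt(i);
    if (c > 32767) c -= 65536;
    // "?" is the fallback character for readers without Unicode support
    tgtStr += "\\u" + c + "?";
  }
  console.log("result string is: " + tgtStr);
  return tgtStr;
}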
It does something like this:
Abc Ø абв --> \u65?\u98?\u99?\u32?\u216?\u32?\u1072?\u1073?\u1074?
and this works.
Thank you very much again!
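A possible refinement of the function above, sketched under my own assumptions rather than taken from the update: escaping only the characters that need it keeps the ASCII part of the template readable. The name toRtfText is hypothetical. It also escapes \, { and }, which are RTF control characters, and it writes UTF-16 code units above 32767 as negative numbers, since RTF treats the \uN parameter as a signed 16-bit value.

```javascript
// Hypothetical helper: escape plain text for insertion into an RTF body.
// 7-bit ASCII is kept as-is, RTF control characters are backslash-escaped,
// and every other UTF-16 code unit becomes \uN? (N as signed 16-bit decimal).
function toRtfText(srcStr) {
  if (srcStr == null) return "";
  let out = "";
  for (let i = 0; i < srcStr.length; i++) {
    const ch = srcStr[i];
    let c = srcStr.charCodeAt(i); // UTF-16 code unit, 0..65535
    if (ch === "\\" || ch === "{" || ch === "}") {
      out += "\\" + ch;           // \\, \{ and \} are literal characters in RTF
    } else if (c < 128) {
      out += ch;                  // plain ASCII needs no escaping
    } else {
      if (c > 32767) c -= 65536;  // RTF expects a signed 16-bit value
      out += "\\u" + c + "?";     // "?" is the fallback for non-Unicode readers
    }
  }
  return out;
}

console.log(toRtfText("Abc Ø абв"));
// Abc \u216? \u1072?\u1073?\u1074?
```

Surrogate pairs (for example emoji) come out as two \uN? escapes, one per code unit, which is how RTF expects characters outside the Basic Multilingual Plane.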
Can you please help me encode non-Latin (Russian) letters that are mixed with special symbols, for example: Abc Ø абв
(English text, the special symbol Ø, i.e. Latin capital letter O with stroke, and Russian text).
I have an existing RTF template with 'placeholder' text inside, and I need to replace this 'placeholder' with 'Abc Ø абв'.
I use the function from here (at the bottom of the page) to convert UTF-8 to Win-1251. It writes the Russian letters correctly, but in the end I get "Ш" instead of 'Ø'.
Here is my example code, plus the input and output files:
input rtf: https://mega.nz/file/CtNB2CiY#yid1nLq9P6Jo8zSRAsXeGai-mZLV6xP1OvN1jDpFyG4
output rtf generated by the code below: https://mega.nz/file/asMExKJI#q8oRn1J9oWMlUck6tJ6MdpVGiIjt81kNFRo7T3eSBTU
const http = require('http');
const fs = require('fs');
const port = 3100;

// Maps Cyrillic UTF-16 code units to their Windows-1251 byte values
// (kept as char codes 0..255 so that each becomes a single byte later).
function utf8_decode_to_win1251(srcStr) {
  let tgtStr = "";
  for (let i = 0; i < srcStr.length; i++) {
    let c = srcStr.charCodeAt(i);
    if (c > 127) {
      if (c > 1024) {
        if (c === 1025) {        // 'Ё' -> 0xA8
          c = 1016;
        } else if (c === 1105) { // 'ё' -> 0xB8
          c = 1032;
        }
        c -= 848;                // 'А'..'я' (1040..1103) -> 0xC0..0xFF
      }
      // Other characters above 127 (e.g. 'Ø' = 216) pass through unchanged,
      // and byte 216 (0xD8) is 'Ш' in Windows-1251.
    }
    tgtStr += String.fromCharCode(c);
  }
  return tgtStr;
}

const server = http.createServer(function (req, res) {
  // read existing file (backslashes in JS strings must be escaped)
  fs.readFile("C:\\input.rtf", "utf8", (err, inputText) => {
    if (err) {
      console.error(err);
      res.statusCode = 500;
      res.end();
      return;
    }
    // I want to replace the 'placeholder' text in the file with this test text:
    let text = `Abc Ø абв`;
    // char codes are now 41 62 63 20 D8 20 E0 E1 E2 (hex), i.e. 'Abc Ø àáâ' when shown as Latin-1
    text = utf8_decode_to_win1251(text);
    // replace the placeholder in the input RTF with the converted text:
    inputText = inputText.replace("placeholder", text);
    // RTF is 8-bit, so write one byte per char code
    // ("binary" also gives wrong output text https://stackoverflow.com/a/34476862/348736)
    let buf = Buffer.from(inputText, "ascii");
    // write output file to disk
    fs.writeFile("C:\\output.rtf", buf, function (error) {
      // the result file contains 'Abc Ш абв', which is wrong..
      if (error) {
        console.error(error);
      } else {
        console.info('Created file C:\\output.rtf');
      }
      res.end();
    });
  });
});

server.listen(port, function () {
  console.log(`listening on port ${port}`);
});
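Why the code maps everything into 0..255 before calling Buffer.from: according to the Node.js Buffer documentation, encoding a string with "ascii" is equivalent to "latin1", which writes one byte per UTF-16 code unit, and code units above 255 are truncated. The small check below (a sketch assuming a current Node.js version; the helper name show is made up) uses "latin1" and shows that unconverted Cyrillic letters would lose their high byte and turn into digits.

```javascript
// Encode a string with the given Buffer encoding and return the bytes as hex.
const show = (s, enc) => Buffer.from(s, enc).toString("hex");

console.log(show("Ø", "latin1"));   // d8      (U+00D8 fits in one byte)
console.log(show("абв", "latin1")); // 303132  (U+0430..U+0432 keep only the low byte)
console.log(Buffer.from("абв", "latin1").toString("latin1")); // 012
```

So the conversion step is required, and whichever byte value it leaves for Ø is what the RTF reader will interpret using the document's code page.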
I don't think you can represent "Abc Ø абв" in a single 8-bit encoding, at least as far as I know.
I tried to make sense of what happens in your code. The thing is that Windows-1251 has no character Ø; you can check that in this table: https://www.ascii-code.com/CP1251. The characters абв, on the other hand, do exist in Windows-1251. So the function cannot produce valid Windows-1251 for the whole string. If you instead tried to convert "Abc Ø абв" to Windows-1252, you'd find that Windows-1252 does have Ø but does not have абв (the а here is the Cyrillic а, which is different from the Latin a). I think what's happening is that Ø keeps its Windows-1252 byte value, but the data ends up somewhere that reads it as Windows-1251.
Playing that through:
"Abc Ø абв" translates to the hex (UTF-8) 41 62 63 20 C3 98 20 D0 B0 D0 B1 D0 B2. Your function turns it into the single bytes 41 62 63 20 D8 20 E0 E1 E2.
Read as Windows-1252, those bytes print as "Abc Ø àáâ", which is exactly what you got.
If you read the same bytes as Windows-1251 instead, they print as "Abc Ш абв", which again is what happened in your example.
(You can try that out here https://www.rapidtables.com/convert/number/hex-to-ascii.html).
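The same walkthrough can be reproduced in Node.js without the online converter. This is a sketch under two assumptions: utf8_decode_to_win1251 is adapted from the question (same mapping, condensed), and readAsWin1251 is a hypothetical, deliberately partial Windows-1251 decoder that only handles ASCII and the bytes 0xC0..0xFF, which Windows-1251 maps to А..я by adding 848 (the inverse of the question's c -= 848).

```javascript
// Adapted from the question: maps Cyrillic code units to Windows-1251 bytes.
function utf8_decode_to_win1251(srcStr) {
  let tgtStr = "";
  for (let i = 0; i < srcStr.length; i++) {
    let c = srcStr.charCodeAt(i);
    if (c > 1024) {
      if (c === 1025) c = 1016;      // 'Ё' -> 0xA8
      else if (c === 1105) c = 1032; // 'ё' -> 0xB8
      c -= 848;                      // 'А'..'я' -> 0xC0..0xFF
    }
    tgtStr += String.fromCharCode(c); // 'Ø' (216) passes through as 0xD8
  }
  return tgtStr;
}

// Hypothetical, partial Windows-1251 decoder: bytes 0xC0..0xFF become А..я,
// everything else is left as its Latin-1 character (fine for ASCII only).
function readAsWin1251(buf) {
  let out = "";
  for (const b of buf) {
    out += String.fromCharCode(b >= 0xC0 ? b + 848 : b);
  }
  return out;
}

const original = "Abc Ø абв";
const bytes = Buffer.from(utf8_decode_to_win1251(original), "latin1");

console.log(Buffer.from(original, "utf8").toString("hex")); // 41626320c39820d0b0d0b1d0b2
console.log(bytes.toString("hex"));    // 41626320d820e0e1e2
console.log(bytes.toString("latin1")); // Abc Ø àáâ  (read as Latin-1 / Windows-1252)
console.log(readAsWin1251(bytes));     // Abc Ш абв  (read as Windows-1251)
```

The byte 0xD8 is the whole problem: it means Ø under Latin-1/Windows-1252 and Ш under Windows-1251, which is why escaping Ø as \u216? (as in the update) avoids it.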