javascript node.js unicode character-encoding unicode-escapes

Escaped Unicode sequences read from a text file in Node.js do not render on the console or in an API response


What I tried

'use strict'
const fs = require('fs')
const app = require('express')();

let data = fs.readFileSync('./unichar.txt')
let str = "\u5973\u88DD\u591Away\u9023\u8EAB\u88D9(\u9577\u8896)"
console.log("from file ========> ",data.toString('utf-8'))
console.log("from str variable >>>>>>>>>> ", str)

// new Buffer() is deprecated; data is already a Buffer, so Buffer.from() simply copies it
let buffer = Buffer.from(data)
console.log("After some encoding changes ++++++++> ", buffer.toString('utf8'))

app.get('/',(req,res)=>{
    res.contentType('text/plain')
    res.write(data)
    res.write(str)
    // the response is finished here, so next() must not be called afterwards
    res.end()
})

app.listen(3002)

Output

// from console
$ node index.js 
from file ========>  \u5973\u88DD\u591Away\u9023\u8EAB\u88D9(\u9577\u8896)

from str variable >>>>>>>>>>  女裝多way連身裙(長袖)
After some encoding changes ++++++++>  \u5973\u88DD\u591Away\u9023\u8EAB\u88D9(\u9577\u8896)

// from postman
\u5973\u88DD\u591Away\u9023\u8EAB\u88D9(\u9577\u8896)
女裝多way連身裙(長袖)

// from curl
$ curl localhost:3002
\u5973\u88DD\u591Away\u9023\u8EAB\u88D9(\u9577\u8896)
女裝多way連身裙(長袖)

Problem

I have a problem with Unicode text rendering.

For example, consider a file containing the following single line of Unicode escape sequences: \u5973\u88DD\u591Away\u9023\u8EAB\u88D9(\u9577\u8896)

When I read the file using the fs module and display its contents, they are not rendered as actual characters. They are displayed as the escape sequences themselves.

Whereas when I manually assign the same string to a variable and then log that variable, the actual Japanese characters are rendered on the console.

The same problem happens when the same data is sent as an HTTP response.

Why is the text from the file not rendered as actual Japanese characters?

I am confused and not sure what needs to be done to get the file contents rendered as actual Japanese characters on the console and in the HTTP response.

It would be very helpful if somebody could help me figure out the missing part.

Thank you


Solution

  • The question seems to be based on a misunderstanding: the \uxxxx notation has no special meaning in a text file. In other words, a text file with

    \u5973

    in it contains six US-ASCII characters, not one Japanese Unicode character.

    The \uxxxx notation is an escape sequence that the JavaScript parser interprets in source code such as string literals. For example, the statement

    fs.writeFileSync("./unichar.txt", "\u5973");

    produces a text file with

    女

    in it.
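
If the file cannot be changed and really does contain literal \uxxxx text, one possible workaround is to convert the escapes yourself after reading. The following is only a minimal sketch: decodeUnicodeEscapes is a hypothetical helper name, not a Node.js API, and it assumes every escape has exactly four hex digits, as in the question. String.raw is used here to stand in for the text that fs.readFileSync would return.

```javascript
'use strict'

// A backslash followed by "u5973" is six characters of text, while the
// escape "\u5973" in source code is parsed into a single character.
console.log('\\u5973'.length) // 6
console.log('\u5973'.length)  // 1

// Hypothetical helper (an assumption for illustration, not part of Node.js):
// replaces each literal \uXXXX sequence with the character it denotes.
function decodeUnicodeEscapes(text) {
    // Each \uXXXX names one UTF-16 code unit, so String.fromCharCode is enough.
    return text.replace(/\\u([0-9a-fA-F]{4})/g, (match, hex) =>
        String.fromCharCode(parseInt(hex, 16)))
}

// String.raw keeps the backslashes literal, mimicking the file contents.
const fromFile = String.raw`\u5973\u88DD\u591Away\u9023\u8EAB\u88D9(\u9577\u8896)`

console.log(decodeUnicodeEscapes(fromFile)) // 女裝多way連身裙(長袖)
```

In the question's code this would mean passing data.toString('utf-8') through the helper before logging it or writing it to the response. If you control how the file is produced, saving the actual characters as UTF-8 is probably simpler than decoding escapes at read time.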