javascript html unicode

Unicode characters are not properly rendered in transpiled JS


I have a simple transliteration application that works well in plain JavaScript (in a file named cctest.js).

function parse() {
    return '\u0d85';
}

The test HTML I have used is as follows.

<!DOCTYPE html>
<html>
<head>
    <title></title>
    <script type="text/javascript" src="cctest.js"></script>
    <script type="text/javascript" src="dist/cctest.js"></script>
</head>
<body>
    <script type="text/javascript">
        console.log(parse());
        console.log(CCTest.parse());
    </script>
</body>
</html>

I then decided to refactor it as follows, to make better use of object-oriented features (src/cctest.js).

export default class CCTest {
    parse() {
        return '\u0d85';
    }
}

src/index.js

import CCTest from "./cctest";

(function(window){
    window.CCTest = new CCTest();
})(window)

I configured this to be transpiled with Webpack and Babel, and the build succeeds. But when I open the test page, I get the following results in the console.

[Screenshot: browser console output]

As you can see, the plain JS renders the Unicode character without any issues, while the JS transpiled with Webpack and Babel gives gibberish. I have looked everywhere I could think of, but I am having a hard time figuring out what's going on. What am I doing wrong here?


Solution

  • As I dug a little deeper, I figured out that the problem was with the minification of the file. During minification, the Unicode escape '\u0d85' is replaced with the actual character, so the minified version contains ...return 'අ'.... When that file is then not decoded as UTF-8, the character's three UTF-8 bytes (0xE0, 0xB6 and 0x85) show up as mojibake (thanks to JosefZ for the hint on mojibake).

    I managed to fix this with the third-party minifier uglifyjs-webpack-plugin. The configuration in webpack.config.js would be something along the lines of:

    const UglifyJsPlugin = require('uglifyjs-webpack-plugin');
    
    module.exports = (env = {
      minify: false
    }) => {
      return {
        ...,
        optimization: {
          minimize: env.minify && env.minify === 'true',
          minimizer: [new UglifyJsPlugin({
            uglifyOptions: {
              output: {
                // escape non-ASCII characters as \uXXXX in the minified output
                ascii_only: true
              },
            },
          })],
        },
        ...
      }
    };
    

    The key here is to configure the minifier to use only ASCII characters by using the config option ascii_only: true. That resolved the problem for me.
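
To make the mechanism above concrete, here is a minimal sketch, assuming it is run with Node.js (it uses Buffer). It shows the three UTF-8 bytes of U+0D85 being misread with a single-byte charset (Latin-1 is just my choice for illustration), and roughly what escaping to ASCII does to the minified source. The names minifiedSource and escapeNonAscii are hypothetical helpers I made up for this example; they are not part of UglifyJS or Webpack, and the real minifier's escaping logic may differ in detail.

```javascript
// '\u0d85' is a one-character string; the escape only affects how the
// source file is written, not the value at runtime.
const escaped = '\u0d85';

// UTF-8 encodes U+0D85 as three bytes: e0 b6 85.
const bytes = Buffer.from(escaped, 'utf8');
console.log(bytes.toString('hex')); // prints "e0b685"

// Decoding the same bytes with a single-byte charset (Latin-1 here, as an
// assumption) yields three separate characters instead of one: mojibake.
const misread = bytes.toString('latin1');
console.log(misread.length); // prints 3

// A stand-in for minified output that contains the raw character
// rather than the escape sequence.
const minifiedSource = "return '" + escaped + "'";

// Hypothetical helper: rewrite every non-ASCII UTF-16 code unit as a \uXXXX
// escape, which approximates the effect of `ascii_only: true`.
function escapeNonAscii(source) {
  return source.replace(/[^\x00-\x7f]/g, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0'));
}

// The result is pure ASCII, so it reads the same under any ASCII-compatible
// charset the browser might pick.
console.log(escapeNonAscii(minifiedSource)); // prints "return '\u0d85'"
```

Once the output is ASCII-only, the bytes on disk no longer depend on the file being decoded as UTF-8, which is why the gibberish goes away.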