javascript, utf-8, character-encoding, ascii

How to convert large UTF-8 strings into ASCII?


I need to convert large UTF-8 strings into ASCII. It should be reversible, and ideally a quick/lightweight algorithm.

How can I do this? I need either the algorithm (using plain loops) or JavaScript code, and it should not depend on any platform, framework, or library.

Edit: I understand that the ASCII representation will not look the same and will be larger (in bytes) than its UTF-8 counterpart, since it's an encoded form of the UTF-8 original.


Solution

  • You could use an ASCII-only version of Douglas Crockford's json2.js quote function, which would look like this:

        var escapable = /[\\\"\x00-\x1f\x7f-\uffff]/g,
            meta = {    // table of character substitutions
                '\b': '\\b',
                '\t': '\\t',
                '\n': '\\n',
                '\f': '\\f',
                '\r': '\\r',
                '"' : '\\"',
                '\\': '\\\\'
            };
    
        function quote(string) {

            // If the string contains no control characters, no quote characters,
            // and no backslash characters, then we can safely slap some quotes
            // around it. Otherwise we must also replace the offending characters
            // with safe escape sequences.

            escapable.lastIndex = 0;
            return escapable.test(string) ?
                '"' + string.replace(escapable, function (a) {
                    var c = meta[a];
                    return typeof c === 'string' ? c :
                        '\\u' + ('0000' + a.charCodeAt(0).toString(16)).slice(-4);
                }) + '"' :
                '"' + string + '"';
        }

    This will produce a valid ASCII-only, JavaScript-quoted version of the input string.

    e.g. quote("Doppelgänger!") will be "Doppelg\u00e4nger!"

    To revert the encoding, you can pass the result to JSON.parse (or eval it); a manual decoder is also sketched below.

        var encoded = quote("Doppelgänger!");
        var back = JSON.parse(encoded); // or: eval(encoded);
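
    If you would rather not rely on JSON.parse or eval, the escapes can also be undone by hand. The following is only a minimal sketch under assumptions: the unquote name and the unmeta table are made up here (they are not part of json2.js), and it only handles input produced by the quote function above.

        var unmeta = {      // reverse of the meta substitution table above (assumed helper)
            'b': '\b',
            't': '\t',
            'n': '\n',
            'f': '\f',
            'r': '\r',
            '"': '"',
            '\\': '\\'
        };

        function unquote(encoded) {
            // Drop the surrounding double quotes added by quote().
            var inner = encoded.slice(1, -1);
            // Undo \uXXXX escapes and the short escapes from the meta table.
            return inner.replace(/\\u([0-9a-fA-F]{4})|\\(["\\bfnrt])/g, function (match, hex, ch) {
                return hex ? String.fromCharCode(parseInt(hex, 16)) : unmeta[ch];
            });
        }

        var roundTrip = unquote(quote("Doppelgänger!")); // "Doppelgänger!"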