javascript utf-8 google-chrome-extension iso-8859-1 transcoding

How do I transcode a Javascript string to ISO-8859-1?


I'm writing a Chrome extension that works with a website that uses ISO-8859-1. To give some context, my extension makes posting in the site's forums quicker by adding a more convenient post form. The value of the textarea where the message is written is then sent through an Ajax call (using jQuery).

If the message contains characters like á, they appear as Ã¡ in the posted message. Forcing the browser to display the page as UTF-8 instead of ISO-8859-1 makes the á appear correctly.
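The Ã¡ pattern is consistent with the two UTF-8 bytes of á (0xC3 0xA1) being read back as ISO-8859-1, one character per byte. A small sketch that reproduces it, assuming an environment with `TextEncoder` (modern browsers, Node.js 11+); `utf8AsLatin1` is a hypothetical name used only for illustration:

```javascript
// Hypothetical helper: encode a string as UTF-8, then (wrongly) decode the
// bytes as ISO-8859-1, where byte N maps to the character U+00NN.
function utf8AsLatin1(str) {
  const bytes = new TextEncoder().encode(str); // UTF-8 bytes
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b); // ISO-8859-1: byte value == code point
  }
  return out;
}

console.log(utf8AsLatin1("á")); // Ã¡
```

ASCII text is unaffected because its UTF-8 and ISO-8859-1 bytes are identical; only characters like á, é or ñ get garbled.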

It is my understanding that Javascript uses UTF-8 for its strings, so it is my theory that if I transcode the string to ISO-8859-1 before sending it, it should solve my problem. However there seems to be no direct way to do this transcoding in Javascript, and I can't touch the server side code. Any advice?

I've tried setting the created form to use iso-8859-1 like this:

var form = document.createElement("form");
form.enctype = "application/x-www-form-urlencoded; charset=ISO-8859-1";

And also:

var form = document.createElement("form");
form.encoding = "ISO-8859-1";

But that doesn't seem to work.

EDIT:

The problem actually lay in how jQuery was URL-encoding the message (or something along the way). I fixed it by telling jQuery not to process the data and encoding it myself, as shown in the following snippet:

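function cfaqs_post_message(msg) {
  var url = cfaqs_build_post_url();
  // escape() writes most characters up to U+00FF as %XX using their Latin-1
  // value (e.g. "á" -> "%E1"), which matches ISO-8859-1. It leaves "+" alone,
  // and a literal "+" would be decoded as a space, so encode it by hand.
  msg = escape(msg).replace(/\+/g, "%2B");
  $.ajax({
    type: "POST",
    url: url,
    processData: false, // send the string exactly as built above
    data: "message=" + msg + "&post=Preview+Message", // "+" encodes the space
    success: function(html) {
      // ...
    },
    dataType: "html",
    contentType: "application/x-www-form-urlencoded"
  });
}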
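The fix works because escape() is not a UTF-8 encoder: for characters up to U+00FF it emits a single %XX, which is exactly the ISO-8859-1 byte, while encodeURIComponent (which jQuery's own parameter serialization uses) emits UTF-8 bytes. escape() is deprecated, though, and turns characters above U+00FF into %uXXXX, which standard form decoders don't understand. A more explicit sketch, with a hypothetical helper name and the assumption that characters outside ISO-8859-1 should be rejected rather than substituted:

```javascript
// Hypothetical helper: percent-encode a value for an
// application/x-www-form-urlencoded body using ISO-8859-1 bytes.
// Assumption: characters above U+00FF have no ISO-8859-1 byte, so we throw.
function encodeLatin1FormComponent(str) {
  let out = "";
  for (const ch of str) {
    const code = ch.codePointAt(0);
    if (code > 0xff) {
      throw new RangeError("Not representable in ISO-8859-1: " + ch);
    }
    if (/[A-Za-z0-9*\-._]/.test(ch)) {
      out += ch; // left as-is by form encoding
    } else if (ch === " ") {
      out += "+"; // form encoding writes spaces as "+"
    } else {
      // one byte per character: the code point is the ISO-8859-1 byte
      out += "%" + code.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(escape("á"));                        // %E1     (ISO-8859-1 byte)
console.log(encodeURIComponent("á"));            // %C3%A1  (UTF-8 bytes)
console.log(encodeLatin1FormComponent("á+b c")); // %E1%2Bb+c
```

With jQuery this would take the place of the escape() line, still with processData: false, e.g. data: "message=" + encodeLatin1FormComponent(msg) + "&post=Preview+Message".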

Solution

  • It is my understanding that Javascript uses UTF-8 for its strings

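    No. JavaScript strings are sequences of UTF-16 code units, not UTF-8 bytes; a charset only comes into play when text is converted to or from bytes, for example when the page is decoded or a request is sent.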
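To see that strings hold UTF-16 code units rather than UTF-8 bytes, compare the code units with the UTF-8 byte count. A minimal sketch, assuming `TextEncoder` is available; `describeString` is a hypothetical name:

```javascript
// Hypothetical helper: shows how a string is stored (UTF-16 code units)
// versus how many bytes it takes once encoded as UTF-8.
function describeString(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0"));
  }
  return {
    codeUnits: units, // what the string actually holds
    utf8ByteLength: new TextEncoder().encode(str).length, // only after encoding
  };
}

console.log(describeString("á"));  // one code unit (00E1), two UTF-8 bytes
console.log(describeString("😀")); // two code units (D83D DE00), four UTF-8 bytes
```

The second case shows a surrogate pair: characters outside the Basic Multilingual Plane take two UTF-16 code units, which is why "😀".length is 2.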

    Each page declares its charset encoding in a meta tag inside the head element:

    <head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
    

    or

    <head>
    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"/>
    

    Besides that, each page file should actually be saved in that charset encoding. Otherwise, it will not work as expected.

    It is also a good idea to declare the charset encoding on the server side.

    Java
    <%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>
    
    PHP
    header("Content-Type: text/html; charset=UTF-8");
    
    C#
    I do not know how to...
    

    It can also be a good idea to declare the charset of each script file that contains non-ASCII characters (á, é, í, ó, ú and so on...).

    <script type="text/javascript" charset="UTF-8" src="/PATH/TO/FILE.js"></script>
    

    ...

    So it is my theory that if I transcode the string to ISO-8859-1 before sending it, it should solve my problem

    No, no.

    The target server may decode the request in a charset other than the one your page uses. For instance, Tomcat decodes request parameters as ISO-8859-1 by default, no matter how you set up your page. So, on the server side, you may have to set the request encoding to match how you set up your page.

    Java
    request.setCharacterEncoding("UTF-8")
    
    PHP
    // I do not know how to...
    

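    If you really want to change the charset encoding the form declares, try as follows. (The question reports that this alone did not help: for a native form submission, browsers pick the charset from the form's accept-charset attribute, falling back to the page's charset, and neither setting affects an Ajax request.)

    var enctype = "application/x-www-form-urlencoded; charset=ISO-8859-1";
    if ("encoding" in formElement) {
        // "encoding" is the name older Internet Explorer versions use for enctype
        formElement.encoding = enctype;
    } else {
        formElement.enctype = enctype;
    }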
    

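    Alternatively, write such characters as Unicode escape sequences. An escape like \u00E1 is plain ASCII in the source file, so it comes through intact whatever charset the script file is saved or served in; at runtime it is the same character. For instance, á is U+00E1, written \u00E1 in JavaScript. A function that produces these escapes:

    alert("á written without an escape sequence");
    function convertToUnicodeCharacterSet(value) {
        // Replace each non-ASCII UTF-16 code unit with its \uXXXX escape text
        return value.replace(/[^\x00-\x7F]/g, function (ch) {
            return "\\u" + ("0000" + ch.charCodeAt(0).toString(16).toUpperCase()).slice(-4);
        });
    }
    alert("á as a Unicode escape sequence is: " + convertToUnicodeCharacterSet("á")); // \u00E1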
    


    As a guideline, see the documentation on JavaScript escape sequences.
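An escape sequence and the literal character are the same string at runtime; only the bytes stored in the .js file differ. A small check, with a hypothetical helper `fromUnicodeEscapes` that reverses the conversion:

```javascript
// Hypothetical helper: turns "\uXXXX" escape text back into characters.
function fromUnicodeEscapes(text) {
  return text.replace(/\\u([0-9A-Fa-f]{4})/g, function (_, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

const literal = "á";      // non-ASCII bytes in the .js file
const escaped = "\u00E1"; // six ASCII characters in the .js file

console.log(literal === escaped);                       // true
console.log(escaped.charCodeAt(0).toString(16));        // e1
console.log(fromUnicodeEscapes("\\u00E1") === literal); // true
```

This is why escapes protect against a script file being served in the wrong charset, but they do not change which bytes end up in a request.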

    Added to the original answer: how I implement this with jQuery

    var dataArray = $(formElement).serializeArray();
    var pairs = [];
    for (var i = 0; i < dataArray.length; i++) {
        // name=value, both percent-encoded as UTF-8
        pairs.push(encodeURIComponent(dataArray[i].name) + "=" + encodeURIComponent(dataArray[i].value));
    }
    var queryString = pairs.join("&");
    $.ajax({
        url: "url.htm",
        data: queryString,
        contentType: "application/x-www-form-urlencoded; charset=UTF-8",
        success: function(response) {
            // process response
        }
    });
    

    It works fine without any headache.
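The serialization step can be checked outside the browser. A sketch of it as a plain function over an array of `{ name, value }` objects (the shape `serializeArray()` returns); `toQueryString` is a hypothetical name, and `URLSearchParams` is shown for comparison since it writes spaces as "+" instead of "%20":

```javascript
// Hypothetical helper: name=value pairs, both percent-encoded as UTF-8
// by encodeURIComponent, joined with "&".
function toQueryString(fields) {
  return fields
    .map(function (f) {
      return encodeURIComponent(f.name) + "=" + encodeURIComponent(f.value);
    })
    .join("&");
}

const fields = [
  { name: "message", value: "á b" },
  { name: "post", value: "Preview Message" },
];

console.log(toQueryString(fields));
// message=%C3%A1%20b&post=Preview%20Message

// Built-in alternative: same UTF-8 encoding, but spaces become "+"
console.log(new URLSearchParams(fields.map((f) => [f.name, f.value])).toString());
// message=%C3%A1+b&post=Preview+Message
```

Both forms are valid form-urlencoded bodies; note that either way á is sent as the UTF-8 bytes %C3%A1, so the server must decode the body as UTF-8.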

    Regards,