node.js utf-8 iso-8859-1 iconv charset

NodeJS and Iconv - "ISO-8859-1" to "UTF-8"


I created a Node.js application that fetches data from an external API server. That server provides its data only as 'Content-Type: text/plain;charset=ISO-8859-1', which I found in the response headers.

Now the problem for me is that special characters like 'ä', 'ö' or 'ü' are not displayed correctly. I tried to convert them to UTF-8 with Iconv, but then I got '�' instead.

My question is, what am I doing wrong?


For testing I use Postman. These are the steps I do to test everything:

Another strange thing: when I connect Postman directly to the API server, the special characters are shown as they should be, without problems. Therefore I guess my application causes the problem, but I cannot see where or why...


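// JavaScript code:
// Assumed imports (not shown in the original post)
const axios = require('axios');
const { Iconv } = require('iconv');

try {
    const response = await axios.get(
        URL,
        {
            params: params,
            headers: headers
        }
    );

    // Try to convert the response body from ISO-8859-1 to UTF-8
    var iconv     = new Iconv('ISO-8859-1', 'UTF-8');
    var converted = await iconv.convert(response.data);
    return converted.toString('UTF-8');

} catch (error) {
    throw new Error(error);
}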

Solution

  • So after some deeper research I came up with the solution to my problem.

    The cause of the trouble seems to lie in the post-processing done by axios. By default it decodes the received bytes into text (as UTF-8) before the response is handed to my Node.js application, so the ISO-8859-1 bytes are already damaged by the time Iconv sees them.

    What I did was to set the "responseType" option of the axios GET request to "arraybuffer", so that the raw bytes are returned instead of decoded text. The call changes like so:
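
To illustrate why converting after the fact cannot work, here is a minimal sketch that does not use axios at all. It assumes the body was decoded as UTF-8 somewhere before the conversion step (an assumption about what happens inside axios), and the sample bytes are made up for the illustration.

```javascript
// Bytes of "äöü" as an ISO-8859-1 server would send them (illustrative sample).
const latin1Bytes = Buffer.from([0xe4, 0xf6, 0xfc]);

// Decoding these bytes as UTF-8 is lossy: none of them forms a valid
// UTF-8 sequence, so each one is replaced by U+FFFD and the original
// byte values cannot be recovered from the resulting string.
const decodedAsUtf8 = latin1Bytes.toString('utf8');

// Decoding with the matching charset recovers the text.
const decodedAsLatin1 = latin1Bytes.toString('latin1');

console.log(JSON.stringify(decodedAsUtf8)); // three replacement characters
console.log(decodedAsLatin1);               // äöü
```

Once the replacement characters are in the string, no later charset conversion can restore the umlauts, which is why the raw bytes are needed.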

        var resArBuffer = await axios.get(
            URL,
            {
                responseType: 'arraybuffer',
                params: params,
                headers: headers
            }
        );
    

    In Node.js, axios returns the data as a Buffer when "responseType" is "arraybuffer", and Buffer provides a toString(encoding) method that decodes the bytes into a string using the given encoding:

        var response = resArBuffer.data.toString("latin1");
    

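    Another thing worth mentioning is that I used "latin1" instead of "ISO-8859-1". The reason is that "latin1" is the name Node.js uses for ISO-8859-1 in its Buffer API, while "ISO-8859-1" is not accepted there as an encoding name. Some sources even recommended using "cp1252" instead, which differs from ISO-8859-1 only in the byte range 0x80 to 0x9F, but "latin1" worked for me here.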
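
The encoding names can be checked directly in Node.js. This is a small sketch using only the built-in Buffer API; the byte 0x80 is chosen purely as an example of where ISO-8859-1 and cp1252 disagree.

```javascript
// Buffer only knows its own encoding labels.
const knowsLatin1 = Buffer.isEncoding('latin1');
const knowsIsoName = Buffer.isEncoding('ISO-8859-1');
console.log(knowsLatin1);  // true
console.log(knowsIsoName); // false

// 'latin1' maps every byte 1:1 to the code point with the same value.
// Byte 0x80 would be the euro sign in cp1252, but with 'latin1'
// it becomes the control character U+0080.
const byte80AsLatin1 = Buffer.from([0x80]).toString('latin1');
console.log(byte80AsLatin1.codePointAt(0).toString(16)); // 80
```

If the server really sends cp1252 (for example, text containing the euro sign), "latin1" would not decode those particular bytes as intended and a conversion library would be the safer choice.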

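    That left the question of UTF-8. Calling "toString('utf-8')" on the received data was the wrong way, since the bytes are ISO-8859-1 and not UTF-8, so it would still print the "�" symbols. After decoding with "latin1" the text is an ordinary JavaScript string, so the special characters are already correct. I additionally passed it through "Buffer.from(...)" with "utf-8". This round trip returns the same string, but "Buffer.from(response, 'utf-8')" on its own is how to get the actual UTF-8 bytes if they are needed: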

        var text = Buffer.from(response, 'utf-8').toString();
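
As a small check of what this last step does, the sketch below compares the string before and after the round trip. The sample text is made up, and the variable names are only illustrative.

```javascript
// A string as it looks after decoding the response with 'latin1'.
const latin1Text = Buffer.from([0xe4, 0xf6, 0xfc]).toString('latin1');

// Encode to UTF-8 bytes, then decode again (toString() defaults to UTF-8).
const utf8Bytes = Buffer.from(latin1Text, 'utf-8');
const roundTripped = utf8Bytes.toString();

console.log(roundTripped === latin1Text); // true: the string is unchanged
console.log(utf8Bytes.toString('hex'));   // c3a4c3b6c3bc: the UTF-8 encoding of äöü
```

The bytes are what matters when writing to a file or socket; for further processing inside the application the decoded string can be used as it is.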
    

    Now I get the correctly converted text I needed. I hope this thread helps anyone else out there, since this information was spread across many different threads for me.
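
Put together, the steps above can be sketched as one small helper. The function name decodeLatin1Body is hypothetical, and the axios call is shown only as a comment because it needs a real server; the helper itself uses nothing but the built-in Buffer API.

```javascript
// Hypothetical helper: turns the raw ISO-8859-1 response body into a string.
// Accepts a Buffer (what axios returns in Node.js with
// responseType: 'arraybuffer') or a plain ArrayBuffer.
function decodeLatin1Body(data) {
    const bytes = Buffer.isBuffer(data) ? data : Buffer.from(data);
    return bytes.toString('latin1');
}

// Intended usage with axios (not executed here):
//   const res = await axios.get(URL, { responseType: 'arraybuffer', params, headers });
//   const text = decodeLatin1Body(res.data);

// Stand-in for res.data: the ISO-8859-1 bytes of "Grüße".
const sampleBody = Buffer.from([0x47, 0x72, 0xfc, 0xdf, 0x65]);
const decodedBody = decodeLatin1Body(sampleBody);
console.log(decodedBody); // Grüße

// UTF-8 bytes, if the text has to be written out as UTF-8.
const bodyAsUtf8 = Buffer.from(decodedBody, 'utf8');
console.log(bodyAsUtf8.length); // 7 (ü and ß take two bytes each)
```

Keeping the decoding in one place makes it easy to swap in a different charset later, for example if the server turns out to send cp1252.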