Tags: c++, utf-8, iconv, character-set

Output buffer empty in iconv while converting from ISO-8859-9 to UTF-8


On Linux I created a file with Turkish characters and changed its character set to "ISO-8859-9". With the C++ code below, I am trying to convert it to UTF-8, but iconv leaves the output buffer empty. At the same time, iconv sets "inbytesleft" to 0, which means all of the input was converted. What could be the mistake here?

My Linux file format:

[root@osst212 cod]# file test.txt
test.txt: ISO-8859 text

Contents of the file, with my PuTTY character set configured as ISO-8859-9:

[root@osst212 cod]# cat test.txt
fıstıkçı şahap

#include <string>
#include <iostream>
#include <locale>
#include <clocale>
#include <cstdlib>
#include <cstdio>
#include <fstream>
#include <sstream>
#include <iconv.h>
#include <cstring>
#include <cerrno>
#include <csignal>

using namespace std;

int main()
{
    const char* lna = getenv("LANG");
    cout << "LANG is " << lna << endl;
    setlocale(LC_ALL, "tr_TR.ISO8859-9");

    ifstream fsl("test.txt", ios::in);
    string myString;
    if (fsl.is_open()) {
        getline(fsl, myString);
    }

    size_t ret;
    size_t inby = sizeof(myString);                     /* inbytesleft for iconv */
    size_t outby = 2 * inby;                            /* outbytesleft for iconv */

    char* input = new char[myString.length() + 1];      /* input buffer to be translated to UTF-8 */
    strcpy(input, myString.c_str());
    char* output = (char*) calloc(outby, sizeof(char)); /* output buffer */

    iconv_t iconvcr = iconv_open("UTF-8", "ISO-8859-9");
    if ((ret = iconv(iconvcr, &input, &inby, &output, &outby)) == (size_t) -1) {
        fprintf(stderr, "Could not convert to UTF-8 and error detail is %s\n", strerror(errno));
    }

    cout << output << endl;
    raise(SIGINT);
    iconv_close(iconvcr);
}

When I run it under gdb, the local variables after the iconv call are as follows. As you can see, output is empty.

(gdb) bt
#0  0x00007ffff7224387 in raise () from /lib64/libc.so.6
#1  0x0000000000401155 in main () at stack.cpp:41
(gdb) frame 1
#1  0x0000000000401155 in main () at stack.cpp:41
41      raise(SIGINT);
(gdb) info locals
lna = 0x7fffffffef72 "en_US.UTF-8"
fsl = <incomplete type>
ret = 0
inby = 0
outby = 4
myString = "f\375st\375k\347\375 \376ahap"
input = 0x606268 " \376ahap"
output = 0x60628c ""
iconvcr = 0x606a00

Solution

  • man 3 iconv

    The iconv() function converts one multibyte character at a time, and for each character conversion it increments *inbuf and decrements *inbytesleft by the number of converted input bytes, it increments *outbuf and decrements *outbytesleft by the number of converted output bytes.

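    So output is advanced to point at the next unused byte of the originally allocated buffer. After a successful conversion it points just past the converted UTF-8 bytes, at the remaining zero bytes left by calloc, which is why printing it shows an empty string.

    The proper usage is to give iconv a separate pointer and keep output pointing at the start of the buffer: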

    char* nextoutput = output;   /* iconv advances this copy; output keeps pointing at the start */
    if ((ret = iconv(iconvcr, &input, &inby, &nextoutput, &outby)) == (size_t) -1) {
        fprintf(stderr, "Could not convert to UTF-8 and error detail is %s\n", strerror(errno)); }
    cout << output << endl;
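For reference, here is a minimal self-contained sketch of the whole conversion using the pointer behaviour described in the man page excerpt. It assumes a glibc- or musl-style iconv, where the input argument is `char**`, and the function name `iso8859_9_to_utf8` is hypothetical. It also sizes the input with `in.size()` rather than `sizeof(myString)`: `sizeof` gives the size of the `std::string` object itself, not the number of characters it holds. Because iconv advances `inptr` and `outptr`, the number of UTF-8 bytes produced is `outptr` minus the start of the buffer.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper (sketch): converts ISO-8859-9 bytes to UTF-8 with iconv.
// iconv advances the pointers it is given, so separate cursor variables are
// used and the start of the output buffer is never lost.
std::string iso8859_9_to_utf8(const std::string& in)
{
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-9");
    if (cd == (iconv_t)-1)
        throw std::runtime_error(std::string("iconv_open: ") + std::strerror(errno));

    // Every ISO-8859-9 character is below U+0800, so it needs at most 2 UTF-8 bytes.
    std::string out(2 * in.size(), '\0');

    char* inptr = const_cast<char*>(in.data());  // iconv reads, but does not modify, the input bytes
    std::size_t inleft = in.size();              // number of bytes, not sizeof(std::string)
    char* outptr = &out[0];                      // cursor advanced by iconv
    std::size_t outleft = out.size();

    while (inleft > 0) {
        if (iconv(cd, &inptr, &inleft, &outptr, &outleft) == (std::size_t)-1) {
            if (errno == E2BIG) {
                // Defensive: output full. Grow the buffer and rebuild the cursor,
                // since resize() may move the data. inptr/inleft already record
                // how far the input got, so the next call resumes there.
                std::size_t used = static_cast<std::size_t>(outptr - &out[0]);
                out.resize(out.size() * 2);
                outptr = &out[0] + used;
                outleft = out.size() - used;
                continue;
            }
            int err = errno;
            iconv_close(cd);
            throw std::runtime_error(std::string("iconv: ") + std::strerror(err));
        }
    }
    iconv_close(cd);

    // outptr now points just past the last UTF-8 byte written; trim the unused tail.
    out.resize(static_cast<std::size_t>(outptr - &out[0]));
    return out;
}
```

Called on the bytes gdb shows for myString ("f\375st\375k\347\375 \376ahap"), it should return fıstıkçı şahap as 19 bytes of UTF-8. The E2BIG branch never fires with the 2× sizing used here; it is kept to show how a conversion resumes from the advanced pointers after a partial run.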