c++ qt character-encoding qtextstream

How to create an ISO 8859-15 (instead of default UTF-8) encoded text file on Linux using QTextStream?


I wrote the function below for a unit test in a Qt project I'm working on.

It creates a file (empty or filled) that is then opened in various use cases and processed, after which the outcome is evaluated. One special case I have identified is that the encoding does affect my application, so I decided to cover non-UTF-8 files too (as far as this is possible).

void TestCsvParserOperators::createCsvFile(QString& path, CsvType type, bool utf8)
{
    path = "test_data.txt";

    QFile csv(path);
    // Open for reading and writing; truncate so that any existing file is replaced
    QVERIFY(csv.open(QIODevice::ReadWrite | QIODevice::Truncate | QIODevice::Text));

    QTextStream csvStream(&csv);

    // Set encoding
    if (utf8)
    {
        csvStream.setCodec("UTF-8");
    }
    else
    {
        csvStream.setCodec("ISO 8859-15");
        csvStream.setGenerateByteOrderMark(false);
    }

    switch(type)
    {
    case EMPTY:     // File doesn't contain any data
        break;
    case INVALID:   // File contains data that is not supported
        csvStream << "abc" << '\n';
        break;
    case VALID:
    {
        // ...
        break;
    }
    }

    csv.close();
}

The project runs on Linux, but the data is exported as a plain text file on Windows (and possibly edited with Notepad) and used by my application as is. I discovered that the file is encoded as ISO 8859-15 rather than UTF-8, which led to a number of problems, including incorrectly processed characters.

The part of my application under test is:

// ...

QTextStream in(&csvFile);
if (in.codec() != QTextCodec::codecForName("UTF-8"))
{
    LOG(WARNING) << this->sTag << "Expecting CSV file with UTF-8 encoding. Found " << QString(in.codec()->name()) << ". Will attempt to convert to supported encoding";

    // Handle encoding
    // ...
}

// ...

For every combination of type and utf8 the test text file is created. However, its encoding remains UTF-8 regardless of the utf8 flag.

Calling file on the CSV file with the actual data (shipped by the client) returns

../trunk/resources/data.txt: ISO-8859 text, with CRLF line terminators

while doing the same on test_data.txt gives me

../../build/test-bin/test_data.txt: UTF-8 Unicode text

I've read somewhere that if I want to use an encoding other than UTF-8 I have to work with QByteArray, but I am unable to verify this in the Qt documentation. I've also read that setting the BOM should do the trick, but I tried both enabling and disabling its generation without any luck.

I've already written a small bash script which converts the encoding to UTF-8 (given that the input file is ISO 8859) but I'd like to

Any ideas how to achieve this?


UPDATE: I changed the content I'm writing to the text file to

csvStream << QString("...").toLatin1() << ...;

and now I get

../../build/test-bin/test_data.txt: ASCII text

which is still not what I'm looking for.
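
One possible reading of the "ASCII text" result, assuming the strings being written contain only ASCII characters: such text produces exactly the same bytes in UTF-8 and in ISO 8859-15, so the file on disk carries no trace of the codec that was used. The two encodings only diverge for non-ASCII characters such as the euro sign. The sketch below illustrates this at the byte level without Qt; toUtf8 and toLatin9 are hypothetical helpers written for this illustration, and toLatin9 deliberately handles only ASCII and the euro sign.

```cpp
#include <string>

// Hypothetical helper: UTF-8 bytes for a code point in the Basic Multilingual Plane.
std::string toUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                         // 1 byte: plain ASCII
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));           // 2-byte sequence
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));          // 3-byte sequence
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: ISO 8859-15 byte for a code point.
// Simplification: only ASCII and the euro sign are mapped; anything else becomes '?'.
std::string toLatin9(char32_t cp)
{
    if (cp < 0x80) {
        return std::string(1, static_cast<char>(cp));         // same byte as in UTF-8
    }
    if (cp == 0x20AC) {
        return std::string(1, '\xA4');                        // euro sign is 0xA4 in ISO 8859-15
    }
    return "?";
}
```

With these helpers, toUtf8(U'a') and toLatin9(U'a') both yield the single byte 0x61, while the euro sign (U+20AC) becomes E2 82 AC in UTF-8 and A4 in ISO 8859-15. A test file therefore needs at least one such character before the two encodings can be told apart.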


Solution

  • Usually this is what I do:

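    // Look up the codec for the encoding the bytes are in
    QTextCodec *codec1 = QTextCodec::codecForName("ISO 8859-15");
    QByteArray csvStreambyteArray = " .... "; // from your file
    // Decode the ISO 8859-15 bytes into a (Unicode) QString
    QString csvStreamString = codec1->toUnicode(csvStreambyteArray);
    // Write the string; the stream encodes it with its own codec
    csvStream << csvStreamString;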