The function below is a helper I wrote for a unit test in a Qt project I'm working on.
It creates a file (empty or filled) that is then opened in various test cases and processed, and the outcome is evaluated. One special case I identified is that the file encoding affects my application, so I decided to cover non-UTF-8 files too (as far as this is possible).
void TestCsvParserOperators::createCsvFile(QString& path, CsvType type, bool utf8)
{
    path = "test_data.txt";
    QFile csv(path);
    // Make sure both reading and writing access is possible. Also turn on truncation to replace any existing files
    QVERIFY(csv.open(QIODevice::ReadWrite | QIODevice::Truncate | QIODevice::Text) == true);
    QTextStream csvStream(&csv);
    // Set encoding
    if (utf8)
    {
        csvStream.setCodec("UTF-8");
    }
    else
    {
        csvStream.setCodec("ISO 8859-15");
        csvStream.setGenerateByteOrderMark(false);
    }
    switch(type)
    {
    case EMPTY: // File doesn't contain any data
        break;
    case INVALID: // File contains data that is not supported
        csvStream << "abc" << '\n';
        break;
    case VALID:
    {
        // ...
        break;
    }
    }
    csv.close();
}
While the project runs on Linux, the data is exported as a plain text file on Windows (and possibly edited with Notepad) and used by my application as is. I discovered that it is encoded not as UTF-8 but as ISO 8859-15, which led to several problems, including incorrectly processed characters.
The part of my application that is being tested is:
// ...
QTextStream in(&csvFile);
if (in.codec() != QTextCodec::codecForName("UTF-8"))
{
    LOG(WARNING) << this->sTag << "Expecting CSV file with UTF-8 encoding. Found "
                 << QString(in.codec()->name())
                 << ". Will attempt to convert to supported encoding";
    // Handle encoding
    // ...
}
// ...
Regardless of the combination of values for type and utf8, I always get my test text file. However, the encoding remains UTF-8 regardless of the utf8 flag.
Calling file on the CSV file with the actual data (shipped by the client) returns
../trunk/resources/data.txt: ISO-8859 text, with CRLF line terminators
while doing the same on test_data.txt gives me
../../build/test-bin/test_data.txt: UTF-8 Unicode text
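For context on these two results: file decides by inspecting the bytes in the file, whereas QTextStream::codec() only reports the codec the stream is configured with, so the two do not measure the same thing. The following is a minimal, Qt-free sketch of a content-based classification, under the assumption that "ASCII", "valid UTF-8" and "some other 8-bit encoding" are the only categories of interest. The names ByteClass and classifyBytes are hypothetical, and this is not the actual algorithm used by file.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch; not part of Qt or of the file tool.
enum class ByteClass { Ascii, Utf8, EightBitOther };

// Classify a raw byte buffer: pure ASCII, well-formed UTF-8 containing at
// least one multi-byte sequence, or "something else" (e.g. ISO 8859-x).
ByteClass classifyBytes(const std::string& bytes)
{
    bool sawNonAscii = false;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        if (c < 0x80) {           // plain ASCII byte
            ++i;
            continue;
        }
        sawNonAscii = true;

        std::size_t extra = 0;    // number of continuation bytes expected
        unsigned long cp = 0;     // decoded code point
        unsigned long minCp = 0;  // smallest code point allowed for this length
        if (c >= 0xC2 && c <= 0xDF)      { extra = 1; cp = c & 0x1F; minCp = 0x80; }
        else if (c >= 0xE0 && c <= 0xEF) { extra = 2; cp = c & 0x0F; minCp = 0x800; }
        else if (c >= 0xF0 && c <= 0xF4) { extra = 3; cp = c & 0x07; minCp = 0x10000; }
        else return ByteClass::EightBitOther;  // stray continuation or invalid lead byte

        if (bytes.size() - i <= extra)
            return ByteClass::EightBitOther;   // sequence is cut off at the end

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80)
                return ByteClass::EightBitOther;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong encodings, UTF-16 surrogates and out-of-range values.
        if (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return ByteClass::EightBitOther;

        i += extra + 1;
    }
    return sawNonAscii ? ByteClass::Utf8 : ByteClass::Ascii;
}
```

With this sketch, a buffer such as "abc\n" is classified as ASCII no matter which codec wrote it, because ASCII bytes are identical in UTF-8 and ISO 8859-15. Test data therefore needs at least one non-ASCII character before any tool can tell the two encodings apart.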
I've read somewhere that if I want to use an encoding other than UTF-8, I have to work with QByteArray. However, I am unable to verify this in the Qt documentation. I've also read that setting the BOM should do the trick, but I tried both enabling and disabling its generation without any luck.
I've already written a small bash script which converts the encoding to UTF-8 (given that the input file is ISO 8859) but I'd like to
Any ideas how to achieve this?
UPDATE: I changed the content I'm writing to the text file to
csvStream << QString("...").toLatin1() << ...;
and now I get
../../build/test-bin/test_data.txt: ASCII text
which is still not what I'm looking for.
Usually this is what I do:
// Look up the codec the input bytes are encoded in
QTextCodec *codec1 = QTextCodec::codecForName("ISO 8859-15");
QByteArray csvStreambyteArray = " .... "; // from your file
// Decode the raw bytes to a Unicode QString before streaming them
QString csvStreamString = codec1->toUnicode(csvStreambyteArray);
csvStream << csvStreamString;
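To make concrete which bytes an ISO 8859-15 test file should end up containing, here is a Qt-free sketch of the Unicode-to-bytes direction. It is only an illustration, not the method used above: toLatin9 and writeRawBytes are hypothetical helper names, and in a Qt test the natural counterparts would presumably be QTextCodec::fromUnicode combined with QFile::write.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: encode Unicode code points as ISO 8859-15 bytes.
// Throws std::invalid_argument for characters the charset cannot represent.
std::string toLatin9(const std::u32string& text)
{
    std::string out;
    out.reserve(text.size());
    for (char32_t cp : text) {
        switch (cp) {
        // The eight positions where ISO 8859-15 differs from ISO 8859-1.
        case 0x20AC: out += '\xA4'; break;  // EURO SIGN
        case 0x0160: out += '\xA6'; break;  // S WITH CARON
        case 0x0161: out += '\xA8'; break;  // s WITH CARON
        case 0x017D: out += '\xB4'; break;  // Z WITH CARON
        case 0x017E: out += '\xB8'; break;  // z WITH CARON
        case 0x0152: out += '\xBC'; break;  // LIGATURE OE
        case 0x0153: out += '\xBD'; break;  // LIGATURE oe
        case 0x0178: out += '\xBE'; break;  // Y WITH DIAERESIS
        // Latin-1 characters displaced by the ones above have no slot left.
        case 0xA4: case 0xA6: case 0xA8: case 0xB4:
        case 0xB8: case 0xBC: case 0xBD: case 0xBE:
            throw std::invalid_argument("not representable in ISO 8859-15");
        default:
            if (cp > 0xFF)
                throw std::invalid_argument("not representable in ISO 8859-15");
            // All remaining code points map to the byte with the same value.
            out += static_cast<char>(static_cast<unsigned char>(cp));
        }
    }
    return out;
}

// Write the bytes unmodified; binary mode avoids any newline translation.
bool writeRawBytes(const std::string& path, const std::string& bytes)
{
    std::ofstream file(path, std::ios::binary | std::ios::trunc);
    file.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    file.flush();
    return static_cast<bool>(file);
}
```

For example, encoding the text "café 5€" this way gives the single byte 0xE9 for "é" and 0xA4 for "€", where UTF-8 would need two and three bytes respectively. Writing such bytes directly, instead of passing them through a text stream again, keeps a second encoding step from altering them.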