I have a transformation on Pentaho Data Integration where the first thing I do is I use the "CSV Input" to map my flat file.
I've never had a problem with it on windows, but now I'm chaning my server that spoon is going to run to a linux server and now I'm having problems with special characters.
The first thing I noticed was that my tables where being updated because the system was understanding the names as diferent strings to the ones that are at my database.
Checking for the problem, I also noticed that if I go to my "CSV Input" -> Preview, it will show me the preview of my data with the problem above:
Special characters are not showing.
Where it should be:
Diretoria de Suporte à Decisão e Aplicação
I used a command to checked my file charset/codification and it showed:
$ file -bi foo.csv
text/plain; charset=iso-8859-1
If I open foo.csv on vi, it understands the special characters.
Any idea on what could be the problem or what should I try?
I don't have any data files with this encoding, so you'll have to do some experimenting, but there are some steps designed to deal with these issues.
First, the CSV Input
step has a field that allows you to select the encoding of the source file. The Text File Input
step has both a "Format" (meaning line terminator) and "Encoding" selector under the "Content" tab.
In Transforms, you have the Change file encoding
step under the Utility tab. This step is designed to copy many files while changing their encoding; that's why it's in a transform.
In Jobs, there's the Convert file between Windows and Unix
step under the File Management tab, but this appears to only deal with line terminators.
Either way it appears if the CSV/Text file input steps don't suit your needs, you'll have to copy the file to a new encoding before reading it in. It will probably be easiest to try handling it with the file input steps first.