Tags: data-cleaning, data-warehouse, pentaho-data-integration, stage, spoon

Pentaho Spoon: search and replace special characters in rows


I have a CSV file whose charset is reported as US-ASCII, and one column in the dataset looks like this:

id V_name
210001 cha?ne des Puys
210030 M?los
213004 G?ll?
213021 S?phan
221110 Afd?ra

And so on.

I would like to change those characters to:

id V_name
210001 chaine des Puys
210030 Milos
213004 Gollu
213021 Suphan
221110 Afdera

The thing is that there are 95 rows of this kind. How can I search and replace the characters in those rows? I am using PDI Spoon. Thanks in advance.


Solution

  • As @Iłya Bursov has stated, the source file you are reading does not contain the correct characters: the ? is literally present in the data. The original characters cannot be recovered from the file, so if you want to correct it, you'll have to do it manually.

    I don't think it is worth it, unless you know you will always receive the same set of V_name values across different files over time. In that case you could create a lookup file that maps each V_name containing ? characters in your source to a V_name_corrected with the characters displayed correctly. This seems to be an exercise, so I would leave the names as they are. In real life, I would contact the provider of the file and tell them to fix how the file is generated so that it preserves the correct characters.
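The lookup-file idea above can be sketched outside of Spoon in a few lines of Python. This is only an illustration under assumptions: the `CORRECTIONS` table and the `correct_names` function are hypothetical names, and the mapping entries are copied by hand from the examples in the question. Inside PDI, the same idea would probably be built with a lookup-style step (for instance Stream lookup or Value mapper) fed by the hand-maintained mapping.

```python
# Hand-maintained lookup table (hypothetical): V_name as it appears in the
# source file -> V_name_corrected. Entries are taken from the question.
CORRECTIONS = {
    "cha?ne des Puys": "chaine des Puys",
    "M?los": "Milos",
    "G?ll?": "Gollu",
    "S?phan": "Suphan",
    "Afd?ra": "Afdera",
}


def correct_names(rows, corrections):
    """Return copies of the rows with a V_name_corrected field added.

    Names missing from the lookup table are passed through unchanged,
    so the other rows of the file are not affected.
    """
    corrected = []
    for row in rows:
        new_row = dict(row)
        new_row["V_name_corrected"] = corrections.get(row["V_name"], row["V_name"])
        corrected.append(new_row)
    return corrected


# Sample rows from the question, plus one made-up clean row for contrast.
rows = [
    {"id": "210001", "V_name": "cha?ne des Puys"},
    {"id": "210030", "V_name": "M?los"},
    {"id": "999999", "V_name": "Etna"},  # hypothetical row without a ?
]

for row in correct_names(rows, CORRECTIONS):
    print(row["id"], row["V_name_corrected"])
```

An exact-match table is used instead of a pattern replacement because a single ? can stand for different characters in different names, so there is no general rule that would restore them.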