I am reading and processing CSV files in a Spring Boot application. The CSV file is encoded in ISO-8859-1 and contains Turkish characters. When I first read the file with BufferedReader, the content appears correct, but when it is logged or transferred to other services the Turkish characters are corrupted.
For example, expected:
EDIRNE;IPSALA;ATATÜRK;28.04.2025;16:28:00;İİÇŞÜĞİİİ
but received:
EDÝRNE;IPSALA;ATATÜRK;28.04.2025;16:28:00;ÝÇÞÜÐÝÝÝ
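(For context, an illustrative sketch not from the original post: this corruption pattern is exactly what appears when bytes encoded as ISO-8859-9 are decoded as ISO-8859-1, since the Turkish-specific letters occupy byte positions that Latin-1 assigns to Icelandic letters. The class name is hypothetical.)

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        // The Turkish-specific letters from the corrupted field.
        String turkish = "İÇŞÜĞ";
        // Encode as ISO-8859-9 (Latin-5), then decode those same bytes as ISO-8859-1 (Latin-1).
        byte[] latin5Bytes = turkish.getBytes(Charset.forName("ISO-8859-9"));
        String misdecoded = new String(latin5Bytes, StandardCharsets.ISO_8859_1);
        System.out.println(misdecoded); // ÝÇÞÜÐ — the corruption seen above
    }
}
```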
My CSV reading code:
public List<String> read(InputStream inputStream) {
    logger.info("csv read started");
    List<String> csvFileContent = new ArrayList<>();
    try {
        BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
        String line;
        while ((line = reader.readLine()) != null) {
            csvFileContent.add(line);
        }
        reader.close();
    } catch (IOException e) {
        //Handle exception
    }
    logger.info("csv read finished");
    logger.info(csvFileContent.toString());
    return csvFileContent;
}
My CSV processing code:
private CSVParser parseCsvContent(List<String> sftpContent) throws IOException {
    return CSVParser.parse(String.join("\n", sftpContent),
            CSVFormat.DEFAULT.builder()
                    .setHeader()
                    .setDelimiter(';')
                    .build());
}
code to read files from SFTP:
public List<String> readFromSFTP(String ftpAddress, Integer ftpPort, String ftpUserName, String ftpPassword, String ftpDirectory) {
    logger.info("readFromSFTP started");
    try {
        Session session = sftpSessionFactory.createSession(ftpUserName, ftpAddress, ftpPort, ftpPassword);
        ChannelSftp channel = sftpChannelService.createAndConnectChannel(session);
        logger.info("channel: " + channel.isConnected());
        String targetFileName = getValidFileNames(channel, ftpDirectory);
        logger.info("/readFromSFTP/targetedFileName: " + targetFileName);
        channel.cd(ftpDirectory);
        if (targetFileName != null) {
            InputStream inputStream = channel.get(targetFileName);
            sftpFileContent = fileReadingService.readFromStream(inputStream, targetFileName);
        } else {
            setValidFileName(null);
            logger.warn("There are no valid CSV files available for processing.");
        }
        channel.disconnect();
        session.disconnect();
    } catch (JSchException | SftpException e) {
        throw new SftpFileAccessException("Error while accessing or processing SFTP files. " + e.getMessage(), e);
    } catch (Exception e) {
        e.printStackTrace();
    }
    logger.info("readFromSFTP finished");
    return sftpFileContent;
}
I tried the following variants in the read method:
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, "ISO-8859-1"));
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"));
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8));
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, "ISO-8859-9"));
But the result did not change.
When I made the following change and debugged it, I saw that the data arrived in csvFileContent with correct Turkish characters. However, it was still corrupted when written to the log or transferred to other services.
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, "windows-1254"));
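(A note on this symptom, as an illustrative sketch not from the original post: if the decoded String looks correct in the debugger but wrong in the log, the problem is likely on the output side, not in the reader. One quick way to rule out the console encoding is to print through an explicit UTF-8 PrintStream; the same idea applies to configuring the logging framework's encoder or launching the JVM with -Dfile.encoding=UTF-8.)

```java
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class ConsoleEncodingDemo {
    public static void main(String[] args) {
        // Wrap System.out so output bytes are always emitted as UTF-8,
        // regardless of the platform default charset.
        PrintStream out = new PrintStream(System.out, true, StandardCharsets.UTF_8);
        out.println("EDIRNE;IPSALA;ATATÜRK;28.04.2025;16:28:00;İİÇŞÜĞİİİ");
    }
}
```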
Write your data in the appropriate character set/encoding, such as ISO-8859-9 (or, better, Unicode (UTF-8)):
Files.writeString ( path , turkish , Charset.forName ( "ISO-8859-9" ) );
Read your data according to that same character set/encoding:
Files.lines ( path , Charset.forName ( "ISO-8859-9" ) )
The Answer by Andy Turner is correct and clear. Your use of ISO-8859-1 (“Latin-1”) is not appropriate for the text you want to write and read. It lacks some characters needed for Turkish language.
Your data file needs to be written using a character set such as ISO-8859-9 (“Latin-5”).
To quote Wikipedia:
ISO/IEC 8859-9:1999 … is informally referred to as Latin-5 or Turkish. It was designed to cover the Turkish language (and the vast majority of users use it for that language, even though it can also be used for some other languages), designed as being of more use than the ISO/IEC 8859-3 encoding. It is identical to ISO/IEC 8859-1 except for the replacement of six Icelandic characters (Ðð, Ýý, Þþ) with characters unique to the Turkish alphabet (Ğğ, İı, Şş). And the uppercase of i is İ; the lowercase of I is ı.
Here is some example code showing how using the proper character set can successfully write and read your desired text.
This little example app writes a temporary file with your desired string. Then the app pauses, so you can go inspect that written file. When the app resumes, it reads that file, and dumps the text to console. Upon exiting, the app deletes the temporary file.
package work.basil.example.text;

import java.io.IOException;
import java.nio.charset.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class Turkish
{
    public static void main ( String[] args )
    {
        // Charset.availableCharsets ( ).forEach ( ( k , v ) -> System.out.println ( k + " = " + v ) );
        String charSetName = "ISO-8859-9"; // Latin-5, for Turkish characters in an ISO-8859 character set. Tip: Use UTF-8 instead if feasible.
        if ( ! Charset.isSupported ( charSetName ) ) { throw new RuntimeException ( "Charset not supported: " + charSetName ); }
        Charset charset = Charset.forName ( charSetName );
        final String turkish = "EDIRNE;IPSALA;ATATÜRK;28.04.2025;16:28:00;İİÇŞÜĞİİİ";

        // Write -------------------------------------------
        System.out.println ( "Press Enter/Return to write file with sample Turkish text." );
        System.console ( ).readLine ( );
        Path path = null;
        try
        {
            path = Files.createTempFile ( "ExampleTurkish" , ".txt" );
            path.toFile ( ).deleteOnExit ( ); // Eventually delete, when app exits.
            Files.writeString ( path , turkish , charset );
            System.out.println ( "Example Turkish file written to: " + path );
        }
        catch ( IOException e )
        {
            throw new RuntimeException ( e );
        }

        // Paused here, so you can examine the file with a hex-editor if you so desire.

        // Read -------------------------------------------
        System.out.println ( "Press Enter/Return to read sample Turkish file." );
        System.console ( ).readLine ( );
        try
        (
                Stream < String > linesStream = Files.lines ( path , charset )
        )
        {
            linesStream.forEach ( System.out :: println );
            System.out.println ( "Versus original: " );
            System.out.println ( turkish );
        }
        catch ( IOException e )
        {
            throw new RuntimeException ( e );
        }
    }
}
When run:
EDIRNE;IPSALA;ATATÜRK;28.04.2025;16:28:00;İİÇŞÜĞİİİ
Versus original:
EDIRNE;IPSALA;ATATÜRK;28.04.2025;16:28:00;İİÇŞÜĞİİİ
During that pause you could inspect that file with a text-editor or a hex-editor. Here we use the Hex Fiend app available on macOS to read that file.
The catch is that you must manually instruct your hex-editor as to what character set should be assumed for reading this file. Yes, amazingly, after 64 years of file systems, the computer information industry has not yet invented metadata to indicate the character set nor character encoding of a file.
To instruct Hex Fiend about the character encoding, choose Turkish (ISO Latin 5) (meaning ISO-8859-9) from the Text Encoding menu. You can then see the proper Turkish characters appear on the right side. Shown in this screenshot:
The ideal solution would be abandoning the legacy character encodings including both Latin-1 and Latin-5.
👉🏽 Instead, just use UTF-8.
In Java, that is StandardCharsets.UTF_8. In modern Java, UTF-8 is used by default for most purposes across all platforms. But I recommend specifying it explicitly rather than relying implicitly on a default.
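A minimal sketch of that explicit-UTF-8 round trip (file name is illustrative):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8RoundTrip {
    public static void main(String[] args) throws IOException {
        // Write and read the same text, naming the charset explicitly both times.
        Path path = Files.createTempFile("turkish", ".csv");
        String line = "EDIRNE;IPSALA;ATATÜRK;28.04.2025;16:28:00;İİÇŞÜĞİİİ";
        Files.writeString(path, line, StandardCharsets.UTF_8);
        String back = Files.readString(path, StandardCharsets.UTF_8);
        System.out.println(back.equals(line)); // true — no characters lost
        Files.deleteIfExists(path);
    }
}
```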
By the way, a tip: Educate the publisher of your data about using standard ISO 8601 formats for exchanging date-time values in text. Example: 2025-04-28T16:28:00 rather than 28.04.2025;16:28:00. And if that value was meant to represent a moment, attach an offset and time zone for complete information. For an offset of zero, append a Z: 2025-04-28T16:28:00Z.
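A bonus of ISO 8601 is that the java.time classes parse it directly, with no custom formatter (a small sketch):

```java
import java.time.Instant;
import java.time.LocalDateTime;

public class Iso8601Demo {
    public static void main(String[] args) {
        // A date-time with no offset parses as a LocalDateTime.
        LocalDateTime ldt = LocalDateTime.parse("2025-04-28T16:28:00");
        // A date-time with a Z offset parses as an Instant, a moment in UTC.
        Instant moment = Instant.parse("2025-04-28T16:28:00Z");
        System.out.println(ldt);    // 2025-04-28T16:28 (toString omits zero seconds)
        System.out.println(moment); // 2025-04-28T16:28:00Z
    }
}
```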