I have a Spring Boot 3 project which, among other things, has to save some CSV files to disk and then also create a ZIP of these files in another folder as a backup. The CSV files are generated correctly: I used the Jackson CSV mapper to generate the contents of each file individually (3 base classes), and then ZipOutputStream to put all 3 files into a single zip. The resulting CSV files are correct when checked on disk, but the same file extracted from the zip is incorrect. Sample of correct CSV contents from the original file:
"ID";"DATE";"REC_STATUS";"ORIG_ID";"PARAM_NAME";"PARAM_VALUE"
"109";"2023-11-27";"0";"116";"transfer1 param";"transfer1 value"
"110";"2023-11-27";"0";"116";"transfer2";"transfer2 value2"
"111";"2023-11-27";"0";"117";"transfer1 param";"transfer1 value"
"112";"2023-11-27";"0";"117";"transfer2";"transfer2 value2"
"113";"2023-11-27";"0";"118";"transfer1 param";"transfer1 value"
"114";"2023-11-27";"0";"118";"transfer2";"transfer2 value2"
"115";"2023-11-27";"0";"119";"transfer1 param";"transfer1 value"
"116";"2023-11-27";"0";"119";"transfer2";"transfer2 value2"
"117";"2023-11-27";"0";"120";"param name1";"param1 value1"
"118";"2023-11-27";"0";"120";"name2";"value2"
"119";"2023-11-27";"0";"121";"param name1";"param1 value1"
"120";"2023-11-27";"0";"121";"name2";"value2"
"121";"2023-11-27";"0";"122";"param name1";"param1 value1"
"122";"2023-11-27";"0";"122";"name2";"value2"
"123";"2023-11-27";"0";"123";"param name1";"param1 value1"
"124";"2023-11-27";"0";"123";"name2";"value2"
"125";"2023-11-27";"0";"124";"param name1";"param1 value1"
"126";"2023-11-27";"0";"124";"name2";"value2"
The same file, when extracted from the zip, contains corrupt data:
"ID";"DATE";"REC_STATUS";"ORIG_ID";"PARAM_NAME";"PARAM_VALUE"
"109";"2023-11-27";"0";"116";"transfer1 param";"transfer1 value"
"110";"2023-11-27";"0";"116";"transfer2";"transfer2 value2"
"111";"2023-11-27";"0";"117";"transfer1 param";"transfer1 value"
"112";"2023-11-27";"0";"117";"transfer2";"transfer2 value2"
"113";"2023-11-27";"0";"118";"transfer1 param";"transfer1 value"
"114";"2023-11-27";"0";"118";"transfer2";"transfer2 value2"
"115";"2023-11-27";"0";"119";"transfer1 param";"transfer1 value"
"116";"2023-11-27";"0";"119";"transfer2";"transfer2 value2"
"117";"2023-11-27";"0";"120";"param name1";"param1 value1"
"118";"2023-11-27";"0";"120";"name2";"value2"
"119";"2023-11-27";"0";"121";"param name1";"param1 value1"
"120";"2023-11-27";"0";"121";"name2";"value2"
"121";"2023-11-27";"0";"122";"param name1";"param1 value1"
"122";"2023-11-27";"0";"122";"name2";"value2"
"123";"2023-11-27";"0";"123";"param name1";"param1 value1"
"124";"2023-11-27";"0";"123";"name2";"value2"
"125";"2023-11-27";"0";"124";"param name1";"param1 value1"
"126";"2023-11-27";"0";"124";"name2";"value2"
109";"2023-11-27";"0";"116";"transfer1 param";"transfer1 value"
"110";"2023-11-27";"0";"116";"transfer2";"transfer2 value2"
"111";"2023-11-27";"0";"117";"transfer1 param";"transfer1 value"
"112";"2023-11-27";"0";"117";"transfer2";"transfer2 value2"
"113";"2023-11-27";"0";"118";"transfer1 param";"transfer1 value"
"114";"2023-11-27";"0";"118";"transfer2";"transfer2 value2"
"115";"2023-11-27";"0";"119";"transfer1 param";"transfer1 value"
"116";"2023-11-27";"0";"119";"transfer2";"transfer2 value2"
"117";"2023-11-27";"0";"120";"param name1";"param1 value1"
"118";"2023-11-27";"0";"120";"name2";"value2"
"119";"2023-11-27";"0";"121";"param name1";"param1 value1"
"120";"2023-11-27";"0";"121";"name2";"value2"
"121";"2023-11-27";"0";"122";"param name1";"param1 value1"
"122";"2023-11-27";"0";"122";"name2";"value2"
"123";"2023-11-27";"0";"123";"param name1";"param1 value1"
"124";"2023-11-27";"0";"123";"name2";"value2"
"
Notice how the contents start repeating after the last original row: the first repeated row is missing its leading ", and there is a stray " on the very last line.
The CSV generation:
```java
private String collectValuesToCSV(List<? extends Object> values, CsvMapper csvMapper, CsvSchema csvSchema) throws IOException {
    String result;
    try (StringWriter strW = new StringWriter();
         SequenceWriter seqW = csvMapper.writer(csvSchema).writeValues(strW)) {
        for (Object value : values) {
            seqW.write(value);
        }
        seqW.flush();
        strW.flush();
        result = strW.toString();
    } catch (IOException e) {
        log.fatal(String.format("%s - Unable to generate CSV", LOG_PREFIX), e);
        throw e; // rethrowing a checked IOException requires the throws clause above
    }
    return result;
}
```
```java
// usage in other parts of the code
CsvMapper csvMapper = CsvMapper.builder()
        .enable(CsvGenerator.Feature.ALWAYS_QUOTE_STRINGS)
        .enable(CsvGenerator.Feature.ALWAYS_QUOTE_EMPTY_STRINGS)
        .build();
CsvSchema csvSchema = csvMapper
        .schemaFor(MyDTO.class)
        .withColumnSeparator(';')
        .withHeader();
List<MyDTO> objects = generateObjects();
String myCsv = collectValuesToCSV(objects, csvMapper, csvSchema);
byte[] contents = myCsv.getBytes(StandardCharsets.UTF_8);
// optionally the contents may be encrypted, so a byte array is written to disk,
// but the same problem occurs even without encryption, simply writing this byte[] to disk
Path filePath = Path.of(config.baseFolder, "output_" + dateStr + ".csv");
Files.write(filePath, contents);
```
Creation of the ZIP file:
```java
private void zipCsvFiles(String dateStr, Path path1, Path path2, Path path3) throws IOException {
    String zipFileName = Path.of(config.zipFolder, "output_" + dateStr + ".zip").toString();
    try (FileOutputStream fos = new FileOutputStream(zipFileName);
         ZipOutputStream zipOut = new ZipOutputStream(fos)) {
        addZipEntry(zipOut, path1);
        addZipEntry(zipOut, path2);
        addZipEntry(zipOut, path3);
    }
}

private void addZipEntry(ZipOutputStream zipOut, Path path) throws IOException {
    try (FileInputStream fis = new FileInputStream(path.toFile())) {
        ZipEntry entry = new ZipEntry(path.getFileName().toString());
        zipOut.putNextEntry(entry);
        byte[] bytes = new byte[1024];
        while ((fis.read(bytes)) >= 0) {
            zipOut.write(bytes);
        }
        zipOut.closeEntry();
    }
}
```
The resulting zip looks okay: it lists the 3 files with the correct file names. But when opening any of them, the contents are broken. I looked into other questions on this topic and already tried some ideas (calling flush() and finish() on the streams manually), but as far as I can tell these are redundant, since the close() methods call them as needed, and close() is always called because I handle everything in try-with-resources. So I'm not sure what could be causing the problem.
The problem is in the following lines in `addZipEntry`:
```java
byte[] bytes = new byte[1024];
while ((fis.read(bytes)) >= 0) {
    zipOut.write(bytes);
}
```
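The method `read` returns the number of bytes it actually read (or -1 at the end of the stream), but you ignore that value and always write the whole array to `zipOut`. When the last read only partially fills the buffer, the rest of the array still holds bytes from the previous read, and those stale bytes are written into the entry as well. Your file is only slightly larger than the 1024-byte buffer, so the second read returns just the last few dozen bytes, and the remainder of the array is the leftover tail of the first chunk. That is why the repeated block starts in the middle of a row and ends with a lone ".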
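To see the effect in isolation, here is a small self-contained sketch (the class and method names are made up for illustration) that pushes a 10-byte input through a 4-byte buffer, once with the loop pattern from the question and once with a length-aware loop:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class PartialReadDemo {

    // Same pattern as addZipEntry in the question: the return value of read() is ignored
    static void copyIgnoringCount(InputStream in, OutputStream out, byte[] buffer) throws IOException {
        while (in.read(buffer) >= 0) {
            out.write(buffer); // writes the whole array, including stale bytes
        }
    }

    // Writes only the bytes that the last read() call actually filled in
    static void copyUsingCount(InputStream in, OutputStream out, byte[] buffer) throws IOException {
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }

    static String copy(String input, int bufferSize, boolean useCount) {
        try {
            InputStream in = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[bufferSize];
            if (useCount) {
                copyUsingCount(in, out, buffer);
            } else {
                copyIgnoringCount(in, out, buffer);
            }
            return out.toString(StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 10 input bytes through a 4-byte buffer: reads return 4, 4, 2, then -1
        System.out.println(copy("ABCDEFGHIJ", 4, false)); // ABCDEFGHIJGH
        System.out.println(copy("ABCDEFGHIJ", 4, true));  // ABCDEFGHIJ
    }
}
```

The first copy produces `ABCDEFGHIJGH`: the final read only fills in `IJ`, and `GH` is left over from the previous read. That is the same stale-tail pattern as in the extracted CSV.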
Change it to the following:
```java
byte[] bytes = new byte[1024];
int read = fis.read(bytes);
while (read >= 0) {
    zipOut.write(bytes, 0, read); // write only the bytes filled by this read
    read = fis.read(bytes);
}
```
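Alternatively, the manual buffer can be dropped altogether: `Files.copy(Path, OutputStream)` (or `InputStream.transferTo(OutputStream)`, available since Java 9; Spring Boot 3 requires Java 17) writes exactly the bytes of the file and leaves the output stream open, so `closeEntry()` still works. Below is a rough sketch along those lines. The class name, the sample data in a temp directory, and the `entriesMatch` round-trip check are illustrative additions, not part of the original code:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipCsvSketch {

    // One entry per file, named after the file; Files.copy handles the buffering
    static void zipFiles(Path zipFile, List<Path> files) throws IOException {
        try (OutputStream fos = Files.newOutputStream(zipFile);
             ZipOutputStream zipOut = new ZipOutputStream(fos)) {
            for (Path path : files) {
                zipOut.putNextEntry(new ZipEntry(path.getFileName().toString()));
                Files.copy(path, zipOut); // writes exactly the file's bytes, does not close zipOut
                zipOut.closeEntry();
            }
        }
    }

    // Reads each entry back and compares it byte-for-byte with the file of the same name
    static boolean entriesMatch(Path zipFile, List<Path> files) throws IOException {
        int matched = 0;
        try (ZipInputStream zipIn = new ZipInputStream(Files.newInputStream(zipFile))) {
            ZipEntry entry;
            while ((entry = zipIn.getNextEntry()) != null) {
                byte[] fromZip = zipIn.readAllBytes(); // stops at the end of the current entry
                Path original = null;
                for (Path p : files) {
                    if (p.getFileName().toString().equals(entry.getName())) {
                        original = p;
                    }
                }
                if (original == null || !Arrays.equals(fromZip, Files.readAllBytes(original))) {
                    return false;
                }
                matched++;
            }
        }
        return matched == files.size();
    }

    // Writes three sample CSV files (each well over 1024 bytes) to a temp dir, zips and verifies them
    static boolean selfCheck() {
        try {
            Path dir = Files.createTempDirectory("zipcsv");
            List<Path> files = new ArrayList<>();
            for (int i = 1; i <= 3; i++) {
                StringBuilder sb = new StringBuilder("\"ID\";\"VALUE\"\n");
                for (int row = 0; row < 200; row++) {
                    sb.append('"').append(row).append("\";\"value").append(i).append("\"\n");
                }
                Path csv = dir.resolve("output_" + i + ".csv");
                Files.writeString(csv, sb, StandardCharsets.UTF_8);
                files.add(csv);
            }
            Path zip = dir.resolve("output.zip");
            zipFiles(zip, files);
            return entriesMatch(zip, files);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selfCheck()); // true
    }
}
```

Inside the existing `addZipEntry`, the same idea would reduce the whole loop to `fis.transferTo(zipOut);`.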