spring-boot csv zip

Java ZipOutputStream creates corrupt file with partially duplicated contents


I have a Spring Boot 3 project that, among other things, has to save some CSV files to disk and then create a ZIP of these files in another folder as a backup. I use the Jackson CSV mapper to generate the contents of each file individually (3 base classes), and then ZipOutputStream to put all 3 files into a single ZIP. The CSV files on disk are correct, but the same files extracted from the ZIP are not. Sample of the correct CSV contents from the original file:

"ID";"DATE";"REC_STATUS";"ORIG_ID";"PARAM_NAME";"PARAM_VALUE"
"109";"2023-11-27";"0";"116";"transfer1 param";"transfer1 value"
"110";"2023-11-27";"0";"116";"transfer2";"transfer2 value2"
"111";"2023-11-27";"0";"117";"transfer1 param";"transfer1 value"
"112";"2023-11-27";"0";"117";"transfer2";"transfer2 value2"
"113";"2023-11-27";"0";"118";"transfer1 param";"transfer1 value"
"114";"2023-11-27";"0";"118";"transfer2";"transfer2 value2"
"115";"2023-11-27";"0";"119";"transfer1 param";"transfer1 value"
"116";"2023-11-27";"0";"119";"transfer2";"transfer2 value2"
"117";"2023-11-27";"0";"120";"param name1";"param1 value1"
"118";"2023-11-27";"0";"120";"name2";"value2"
"119";"2023-11-27";"0";"121";"param name1";"param1 value1"
"120";"2023-11-27";"0";"121";"name2";"value2"
"121";"2023-11-27";"0";"122";"param name1";"param1 value1"
"122";"2023-11-27";"0";"122";"name2";"value2"
"123";"2023-11-27";"0";"123";"param name1";"param1 value1"
"124";"2023-11-27";"0";"123";"name2";"value2"
"125";"2023-11-27";"0";"124";"param name1";"param1 value1"
"126";"2023-11-27";"0";"124";"name2";"value2"

The same file, when extracted from the ZIP, contains corrupt data:

"ID";"DATE";"REC_STATUS";"ORIG_ID";"PARAM_NAME";"PARAM_VALUE"
"109";"2023-11-27";"0";"116";"transfer1 param";"transfer1 value"
"110";"2023-11-27";"0";"116";"transfer2";"transfer2 value2"
"111";"2023-11-27";"0";"117";"transfer1 param";"transfer1 value"
"112";"2023-11-27";"0";"117";"transfer2";"transfer2 value2"
"113";"2023-11-27";"0";"118";"transfer1 param";"transfer1 value"
"114";"2023-11-27";"0";"118";"transfer2";"transfer2 value2"
"115";"2023-11-27";"0";"119";"transfer1 param";"transfer1 value"
"116";"2023-11-27";"0";"119";"transfer2";"transfer2 value2"
"117";"2023-11-27";"0";"120";"param name1";"param1 value1"
"118";"2023-11-27";"0";"120";"name2";"value2"
"119";"2023-11-27";"0";"121";"param name1";"param1 value1"
"120";"2023-11-27";"0";"121";"name2";"value2"
"121";"2023-11-27";"0";"122";"param name1";"param1 value1"
"122";"2023-11-27";"0";"122";"name2";"value2"
"123";"2023-11-27";"0";"123";"param name1";"param1 value1"
"124";"2023-11-27";"0";"123";"name2";"value2"
"125";"2023-11-27";"0";"124";"param name1";"param1 value1"
"126";"2023-11-27";"0";"124";"name2";"value2"
109";"2023-11-27";"0";"116";"transfer1 param";"transfer1 value"
"110";"2023-11-27";"0";"116";"transfer2";"transfer2 value2"
"111";"2023-11-27";"0";"117";"transfer1 param";"transfer1 value"
"112";"2023-11-27";"0";"117";"transfer2";"transfer2 value2"
"113";"2023-11-27";"0";"118";"transfer1 param";"transfer1 value"
"114";"2023-11-27";"0";"118";"transfer2";"transfer2 value2"
"115";"2023-11-27";"0";"119";"transfer1 param";"transfer1 value"
"116";"2023-11-27";"0";"119";"transfer2";"transfer2 value2"
"117";"2023-11-27";"0";"120";"param name1";"param1 value1"
"118";"2023-11-27";"0";"120";"name2";"value2"
"119";"2023-11-27";"0";"121";"param name1";"param1 value1"
"120";"2023-11-27";"0";"121";"name2";"value2"
"121";"2023-11-27";"0";"122";"param name1";"param1 value1"
"122";"2023-11-27";"0";"122";"name2";"value2"
"123";"2023-11-27";"0";"123";"param name1";"param1 value1"
"124";"2023-11-27";"0";"123";"name2";"value2"
"

Notice how the CSV file seems to repeat itself in the middle, but without the leading " in the row, and there is also an unneeded " in the last row.

The CSV generation:

private String collectValuesToCSV(List<?> values, CsvMapper csvMapper, CsvSchema csvSchema) throws IOException {
    String result;
    
    try (StringWriter strW = new StringWriter();
            SequenceWriter seqW = csvMapper.writer(csvSchema).writeValues(strW)) {
        for (Object value : values) {
            seqW.write(value);
        }
        
        seqW.flush();
        strW.flush();
    
        result = strW.toString();
    } catch (IOException e) {
        log.fatal(String.format("%s - Unable to generate CSV", LOG_PREFIX), e);
        throw e;
    }
    
    return result;
}

// usage in other parts of the code
CsvMapper csvMapper = CsvMapper.builder()
        .enable(CsvGenerator.Feature.ALWAYS_QUOTE_STRINGS)
        .enable(CsvGenerator.Feature.ALWAYS_QUOTE_EMPTY_STRINGS)
        .build();

CsvSchema csvSchema = csvMapper
        .schemaFor(MyDTO.class)
        .withColumnSeparator(';')
        .withHeader();

List<MyDTO> objects = generateObjects();

String myCsv = collectValuesToCSV(objects, csvMapper, csvSchema);

byte[] contents = myCsv.getBytes(StandardCharsets.UTF_8);

// optionally the contents may be encrypted, so a byte array is written to disk,
// but the same problem occurs even without encryption, simply writing this byte[] to disk

Path filePath = Path.of(config.baseFolder, "output_" + dateStr + ".csv");
Files.write(filePath, contents);

Creation of the ZIP file:

private void zipCsvFiles(String dateStr, Path path1, Path path2, Path path3) throws IOException {
    String zipFileName = Path.of(config.zipFolder, "output_" + dateStr + ".zip").toString();
    
    try (FileOutputStream fos = new FileOutputStream(zipFileName);
            ZipOutputStream zipOut = new ZipOutputStream(fos)) {
        addZipEntry(zipOut, path1);
        addZipEntry(zipOut, path2);
        addZipEntry(zipOut, path3);
    }
}

private void addZipEntry(ZipOutputStream zipOut, Path path) throws IOException {
    try (FileInputStream fis = new FileInputStream(path.toFile())) {
        ZipEntry entry = new ZipEntry(path.getFileName().toString());
        zipOut.putNextEntry(entry);

        byte[] bytes = new byte[1024];

        while ((fis.read(bytes)) >= 0) {
            zipOut.write(bytes);
        }
        
        zipOut.closeEntry();
    }
}

The resulting ZIP looks okay: it shows the 3 files with the correct filenames. But when I open any of them, the contents are broken. I looked into other questions on this topic and already tried some ideas (calling flush and finish on the streams manually), but as far as I can tell these are redundant, since the close() methods call them as needed, and close() is always called because I'm handling everything in try-with-resources. So I'm not sure what could be causing the problem.


Solution

  • The problem is in the following lines in addZipEntry:

        byte[] bytes = new byte[1024];
    
        while ((fis.read(bytes)) >= 0) {
            zipOut.write(bytes);
        }
    

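    The read method returns the number of bytes it actually read, but you ignore that value and always write the whole array to zipOut. When the last read fills only part of the buffer, the rest of the array still holds bytes from the previous read, and those stale bytes are written into the ZIP entry. That is the partially duplicated content you see at the end of the file.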

    Change it to the following:

        byte[] bytes = new byte[1024];
        int read = fis.read(bytes);
                
        while (read >= 0) {
            zipOut.write(bytes, 0, read);
            read = fis.read(bytes);
        }
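
To see the stale bytes in isolation, here is a minimal in-memory sketch of both loops. The class name BufferCopyDemo, its method names, and the tiny 4-byte buffer are assumptions chosen for illustration; none of them come from the original project.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original project.
public class BufferCopyDemo {

    // Same shape as the loop in the question: the count returned by read() is ignored.
    static void copyIgnoringCount(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        while (in.read(buffer) >= 0) {
            out.write(buffer); // always writes the full buffer, stale bytes included
        }
    }

    // Corrected loop: only the bytes delivered by the last read() are written.
    static void copyUsingCount(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        int read;
        while ((read = in.read(buffer)) >= 0) {
            out.write(buffer, 0, read);
        }
    }

    // Runs one of the two loops over an in-memory string and returns what was written.
    public static String copy(String text, int bufferSize, boolean useCount) {
        InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (useCount) {
                copyUsingCount(in, out, bufferSize);
            } else {
                copyIgnoringCount(in, out, bufferSize);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // 10 bytes with a 4-byte buffer: the reads deliver 4, 4 and 2 bytes.
        // After the third read the buffer holds "IJGH" ("GH" is left over).
        System.out.println(copy("ABCDEFGHIJ", 4, false)); // prints ABCDEFGHIJGH
        System.out.println(copy("ABCDEFGHIJ", 4, true));  // prints ABCDEFGHIJ
    }
}
```

If you don't need the manual loop at all, Files.copy(path, zipOut) or fis.transferTo(zipOut) (Java 9+) should do the copy for you without any buffer handling on your side.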