I'm having a problem with a CSV data import code into the database. I'm using Java 11 with SpringBoot 2.7.3 REST API
I have a Report class that has a Many-To-Many one-way relationship with the SensorError class, where only Report knows about SensorError and not the other way around:
@ManyToMany @JoinTable( name = "rel_report__sensor_error", joinColumns = @JoinColumn(name = "report_id"), inverseJoinColumns = @JoinColumn(name = "sensor_error_id") ) private Set<SensorError> sensorErrors = new HashSet<>();
I created a code to import CSV data:
public static List<Report> csvToReports(InputStream is, ReportRepository reportRepository,
SensorErrorRepository sensorErrorRepository) {
try (BufferedReader fileReader = new BufferedReader(new InputStreamReader(is, Charset.defaultCharset()));
CSVParser csvParser = new CSVParser(fileReader,
CSVFormat.DEFAULT.withFirstRecordAsHeader().withIgnoreHeaderCase().withTrim());) {
// Create an empty list to hold the SiraReports that will be parsed from the CSV
// file
List<Report> reports = new ArrayList<Report>();
// Parse the SensorError objects from the CSVRecord
Set<SensorError> sensorErrors = new HashSet<>();
// Get an Iterable of CSVRecords from the CSVParser
Iterable<CSVRecord> csvRecords = csvParser.getRecords();
for (CSVRecord csvRecord : csvRecords) {
Report report = new Report(csvRecord.get("final_state"), Instant.parse(csvRecord.get("datetime_begin")),
Instant.parse(csvRecord.get("datetime_end")), csvRecord.get("cutting_time"),
csvRecord.get("total_time"));
// Check if sensor_name, error_code and error_detail fields have values
if (!csvRecord.get("sensor_name").isEmpty() || !csvRecord.get("error_code").isEmpty()
|| !csvRecord.get("error_detail").isEmpty()) {
// Create a new SensorError object using the data from the CSVRecord
SensorError sensorError = new SensorError(csvRecord.get("sensor_name"), csvRecord.get("error_code"),
csvRecord.get("error_detail"));
// Add the SensorError object to the Set
sensorErrors.add(sensorError);
}
// Set the SensorError objects for the SiraReport object
report.setSensorErrors(sensorErrors);
// Save the SensorError objects to the database using the SensorErrorRepository
sensorErrorRepository.saveAll(sensorErrors);
// Add the Report object to the List
reports.add(report);
}
// Return the List of Reports
return reports;
} catch (IOException e) {
throw new RuntimeException("Failed to parse CSV file: " + e.getMessage());
}
}
This code works partially for my interest, in my CSV file:
final_state,datetime_begin,datetime_end,cutting_time,total_time,sensor_name,error_code,error_detail
TESTESENSOR,2022-09-30T18:15:00Z,2022-09-30T18:35:00Z,30min,40min, "sensor operador","operando","falha parcial","sensor navalha","erro mundial","falha total"
The code saves only the first SensorError Information inside a Report: ""sensor operador","operando","falha parcial"", but not the second information: ""sensor navalha","erro mundial","falha total"".
I already tried some things, like adding: Set sensorErrors = new HashSet<>();, inside the Loop, but then it saves the second SensorError Information and not the first one anymore. I also tried changing it to a List instead of Set, but it didn't solve the problem.
How to make it import more than one existing SensorError information in the CSV file to a single SiraReport?
Well, due to the questions made here in this topic, I could observe that my error was initially, in how the data were being sent, because if for 1 Report I have several SensorError information, then I would have to duplicate the data lines of the Report and include information other than SensorError to look like this:
final_state,datetime_begin,datetime_end,cutting_time,total_time,sensor_name,error_code,error_detail TESTESENSOR,2022-09-30T18:15:00Z,2022-09-30T18:35:00Z,30min,40min, "sensor operador","operando","falha parcial" TESTESENSOR,2022-09-30T18:15:00Z,2022-09-30T18:35:00Z,30min,40min, "sensor navalha","erro mundial","falha total"
From there, I had to review my code due to questions from @g00se and @Tom Elias, arriving at this code that works:
public static List<Report> csvToReports(InputStream is, ReportRepository reportRepository, SensorErrorRepository sensorErrorRepository) {
try (
BufferedReader fileReader = new BufferedReader(new InputStreamReader(is, Charset.defaultCharset()));
CSVParser csvParser = new CSVParser(fileReader, CSVFormat.DEFAULT.withFirstRecordAsHeader().withIgnoreHeaderCase().withTrim());
) {
Map<String, Report> reportMap = new HashMap<>();
List<Report> reports = new ArrayList<>();
Iterable<CSVRecord> csvRecords = csvParser.getRecords();
for (CSVRecord csvRecord : csvRecords) {
String datetimeBegin = csvRecord.get("datetime_begin");
String datetimeEnd = csvRecord.get("datetime_end");
String reportKey = datetimeBegin + datetimeEnd; // Create a new key using final_state and datetime_begin
Report report = reportMap.get(reportKey);
if (report == null) {
report = new Report(
csvRecord.get("final_state"),
Instant.parse(datetimeBegin),
Instant.parse(datetimeEnd),
csvRecord.get("cutting_time"),
csvRecord.get("total_time"),
);
}
// Check if sensor_name, error_code and error_detail fields have values
if (!csvRecord.get("sensor_name").isEmpty() || !csvRecord.get("error_code").isEmpty() || !csvRecord.get("error_detail").isEmpty()) {
// Create a new SensorError object using the data from the CSVRecord
SensorError sensorError = new SensorError(
csvRecord.get("sensor_name"),
csvRecord.get("error_code"),
csvRecord.get("error_detail")
);
// Add the SensorError object to the Report
report.addSensorError(sensorError);
// Save the SensorError object to the database using the SensorErrorRepository
sensorErrorRepository.save(sensorError);
}
// Save the Report object to the database using the ReportRepository
reportRepository.save(report);
// Add the Report object to the Map and List
reportMap.put(reportKey, report);
reports.add(report);
}
// Return the List of Reports
return reports;
} catch (IOException e) {
throw new RuntimeException("Failed to parse CSV file: " + e.getMessage());
}
}
I had to create a method that performs an identification if the lines sent in the CSV refer to the same Report and add the SensorError information to the same Report object.
So now the code creates a new identifier using the datetime_begin and datetime_end field combination to check if there is an existing Report with the datetime_begin and datetime_end field combination. If there is no existing Report, it creates a new Report.
In summary, it will check the duplicated lines with Report information, if it is the same Report object and then it will register the different SensorError information for that Report object.
I am aware that a better check would be by ID, but this is generated sequentially by the application.