Using a single scheduler, I want to process multiple CSV files at the same time. Each file can contain anywhere from 1 to 10K records. I want to process the files in parallel, and if a file has more than 1K records, I also want to process that file's records in parallel.
For example, 10K records split across 10 different threads.
My task is to read a DB table that holds the FTP path/URL of each file, then download those CSV files, validate the CSV data, and finally save it into a DB table.
List<CSVFileRecords> files = filesRepo.findAll();
files.forEach(file -> processFile(file));

@Async
void processFile(CSVFileRecords file) {
    InputStream i = getStream(file); // download the file
    List<Data> data = csvParser.csvToBean(i); // consider 10K records
    List<List<Data>> dataList = getListOfList(data); // split into smaller lists
    dataList.parallelStream().forEach(chunk -> processData(chunk));
}

List<Response> processData(List<Data> chunk) {
    validate();
    saveAll();
}
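No, this won't work the way you expect, because of this line:
files.forEach(file -> processFile(file));
The loop itself runs sequentially, and if processFile is called from inside the same class, the call bypasses Spring's proxy, so @Async has no effect and the files are processed one after another.
A better option is to submit each file to a thread pool explicitly: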
// 10 = number of threads in the pool
ExecutorService executorService = Executors.newFixedThreadPool(10);
files.forEach(file -> executorService.execute(() -> processFile(file)));
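One thing to note: inside processFile you are using parallelStream(), which does not run on this executor. By default it runs on the JVM-wide common ForkJoinPool, whose size depends on the number of available processors rather than on the size of the data, so the parallel streams of all files compete for the same worker threads.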
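A quick way to see which threads a parallel stream actually uses is to record the thread names while it runs. This is only a minimal sketch; the class and method names (ParallelStreamThreads, threadNamesUsed) are made up for illustration and are not part of the code in the question.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class ParallelStreamThreads {

    // Runs a parallel stream over `elements` items and collects the names
    // of every thread that handled at least one item.
    static Set<String> threadNamesUsed(int elements) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream.range(0, elements)
                 .parallel()
                 .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        // Parallelism of the shared pool that parallel streams use by default.
        System.out.println("Common pool parallelism: "
                + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("Threads used: " + threadNamesUsed(10_000));
    }
}
```

The names printed should be the calling thread plus common-pool workers, regardless of how many elements the stream has.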
A good practice is to partition your data into fixed-size chunks before processing, and to run those chunks on a thread pool that you control.
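The partitioning idea could be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: PartitionedProcessing, partition, processChunk and processAll are hypothetical names, Integer stands in for the Data type, and processChunk is only a placeholder for the validate/save step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedProcessing {

    // Splits a list into consecutive chunks of at most chunkSize elements.
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    // Hypothetical stand-in for validate() + saveAll();
    // returns the number of records handled.
    static int processChunk(List<Integer> chunk) {
        return chunk.size();
    }

    // Submits every chunk to the given pool and waits for all of them.
    static int processAll(List<Integer> records, int chunkSize, ExecutorService pool) {
        List<Future<Integer>> futures = new ArrayList<>();
        for (List<Integer> chunk : partition(records, chunkSize)) {
            Callable<Integer> task = () -> processChunk(chunk);
            futures.add(pool.submit(task));
        }
        int processed = 0;
        try {
            for (Future<Integer> future : futures) {
                processed += future.get(); // blocks until that chunk is done
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Chunk processing failed", e.getCause());
        }
        return processed;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            records.add(i);
        }
        ExecutorService pool = Executors.newFixedThreadPool(10);
        try {
            // 10K records in chunks of 1K gives 10 tasks for 10 threads.
            System.out.println("Processed " + processAll(records, 1_000, pool) + " records");
        } finally {
            pool.shutdown();
        }
    }
}
```

With a dedicated pool, the number of threads working on records is bounded by the pool size instead of depending on the shared common pool.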