Tags: java, spring-boot, multithreading, pagination, executorservice

Efficiently processing a large dataset using pagination and multithreading in Java


I'm working on a Java application that requires me to process a large dataset of rows retrieved from the database. Here is the sample situation:

class Example {
   @Autowired
   private Service service;

    public static void main(String[] args) {
        long totalRows = service.getTotalRows(); // e.g., 976451 rows
        int pageSize = 20;

        int totalPages = (int) Math.ceil((double) totalRows / pageSize); // 48823
        ExecutorService executorService = Executors.newFixedThreadPool(20); // let's say 20 threads

        // Fetch rows for each page and process
        for (int pageNumber = 0; pageNumber < totalPages; pageNumber++) {
            // Goal: fetch page 0 and assign it to one thread,
            // then fetch page 1 and assign it to another thread,
            // and so on across all 20 threads,
            // while running analyzeMethod(row) and processMethod(row) concurrently.
                
            Page<Row> all = service.getRowsByPagination(pageNumber, pageSize);
            all.get().toList().forEach(row -> {
                analyzeMethod(row);
                processMethod(row);
            });
        }
    }

    public void analyzeMethod(Row row) {
        // Perform database-related analysis
    }

    public void processMethod(Row row) {
        // Additional processing
    }
}

In simple terms, I will fetch thousands or millions of rows from a database. I have a method that retrieves them as Page objects, and I need to process them asynchronously.

I tried the approach above, but I am new to multithreading. I am not getting the expected output, and I am getting exceptions. How can I implement this scenario?


Solution

  • You just need to use the submit(Runnable task) method of ExecutorService to run analyzeMethod and processMethod asynchronously. Something like this:

    executorService.submit(() -> analyzeMethod(row));
    executorService.submit(() -> processMethod(row));
    

    Be careful with the Row objects: if these methods modify them, several threads may write to the same object at once and overwrite each other's changes.
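
    Note that submitting analyzeMethod and processMethod as two separate tasks gives no guarantee about which of them runs first for a given row. If the order matters, one option is to submit one task per page and call both methods in sequence inside it. Below is a minimal sketch of that idea, not a definitive implementation. The class name PagedProcessingSketch and the fetchPage helper are hypothetical: fetchPage stands in for service.getRowsByPagination(pageNumber, pageSize), and rows are plain integers instead of Row objects so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class PagedProcessingSketch {

    // Hypothetical stand-in for service.getRowsByPagination(pageNumber, pageSize):
    // returns the row ids that belong to the given page.
    static List<Integer> fetchPage(int pageNumber, int pageSize, int totalRows) {
        List<Integer> rows = new ArrayList<>();
        int start = pageNumber * pageSize;
        int end = Math.min(start + pageSize, totalRows);
        for (int id = start; id < end; id++) {
            rows.add(id);
        }
        return rows;
    }

    // Submits one task per page and returns the number of rows processed.
    static int processAllPages(int totalRows, int pageSize, int threads) throws Exception {
        int totalPages = (totalRows + pageSize - 1) / pageSize; // integer ceiling
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        AtomicInteger processed = new AtomicInteger(); // thread-safe counter
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int page = 0; page < totalPages; page++) {
                final int pageNumber = page; // lambdas need an effectively final copy
                futures.add(executor.submit(() -> {
                    // Each task fetches its own page, so no rows are shared between threads.
                    for (Integer row : fetchPage(pageNumber, pageSize, totalRows)) {
                        // analyzeMethod(row) and processMethod(row) would be called here, in order.
                        processed.incrementAndGet();
                    }
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // waits for the task; a failure surfaces as ExecutionException
            }
        } finally {
            executor.shutdown(); // without this, the pool threads keep the JVM alive
        }
        return processed.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(processAllPages(95, 20, 4)); // prints 95
    }
}
```

    Calling future.get() on every task is what makes exceptions visible: a task that fails inside the pool otherwise fails silently. With around 48823 pages, all tasks are queued up front, which is usually fine because each queued task only holds a page number, not the page data.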