spring-batch, spring-integration, spring-batch-integration

How to process multiple large files at the same time with multiple instances using Spring Batch Integration?


I created a Spring Batch Integration project to process multiple files, and it is working like a charm.

While I'm writing this question I have four Pods running, but the behaviour isn't what I expect: I expect 20 files being processed at the same time (five per Pod).

My poller setup uses the following parameters:

    poller-delay: 10000
    max-message-per-poll: 5
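
In the Java DSL those two values correspond to a default poller along these lines (a minimal sketch; PollerMetadata, PollerSpec, and Pollers come from org.springframework.integration):

    @Bean(name = PollerMetadata.DEFAULT_POLLER)
    public PollerSpec defaultPoller() {
        // poller-delay: 10000       -> fixed delay between polls, in milliseconds
        // max-message-per-poll: 5   -> at most five messages emitted per poll
        return Pollers.fixedDelay(10000).maxMessagesPerPoll(5);
    }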

I'm also using Redis to store the file names and to filter them:

    private CompositeFileListFilter<S3ObjectSummary> s3FileListFilter() {
        // Accept each S3 key only once across all instances (state shared via
        // the Redis metadata store), and only keys ending in ".csv".
        return new CompositeFileListFilter<S3ObjectSummary>()
                .addFilter(new S3PersistentAcceptOnceFileListFilter(
                        new RedisMetadataStore(redisConnectionFactory), "prefix-"))
                .addFilter(new S3RegexPatternFileListFilter(".*\\.csv$"));
    }
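
The filter is attached to the remote synchronizer, roughly like this (a sketch; the AmazonS3 client bean and bucket name are placeholders):

    @Bean
    public S3InboundFileSynchronizer s3InboundFileSynchronizer(AmazonS3 amazonS3) {
        S3InboundFileSynchronizer synchronizer = new S3InboundFileSynchronizer(amazonS3);
        synchronizer.setRemoteDirectory("my-bucket");
        // The composite filter decides which remote objects get synchronized;
        // the Redis-backed filter records every accepted key in the shared
        // metadata store.
        synchronizer.setFilter(s3FileListFilter());
        return synchronizer;
    }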

It seems like each pod is processing only one file. Another strange behaviour is that one of the pods registers all the files in Redis, so the other pods only pick up new files.

What is the best practice here, and how can I solve this so that multiple files are processed at the same time?


Solution

  • See this option on the S3InboundFileSynchronizingMessageSource:

    /**
     * Set the maximum number of objects the source should fetch if it is necessary to
     * fetch objects. Setting the maxFetchSize to 0 disables remote fetching, a negative
     * value indicates no limit.
     * @param maxFetchSize the max fetch size; a negative value means unlimited.
     */
    @ManagedAttribute(description = "Maximum objects to fetch")
    void setMaxFetchSize(int maxFetchSize);
    

    And here is the doc: https://docs.spring.io/spring-integration/docs/current/reference/html/ftp.html#ftp-max-fetch
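
    A minimal sketch of applying it, reusing the synchronizer and composite filter from the question (the channel name, local directory, and the fetch size of five are assumptions chosen to match the five-files-per-pod goal):

        @Bean
        @InboundChannelAdapter(value = "s3FilesChannel",
                poller = @Poller(fixedDelay = "10000", maxMessagesPerPoll = "5"))
        public S3InboundFileSynchronizingMessageSource s3MessageSource(
                S3InboundFileSynchronizer synchronizer) {
            S3InboundFileSynchronizingMessageSource source =
                    new S3InboundFileSynchronizingMessageSource(synchronizer);
            source.setLocalDirectory(new File("/tmp/s3-files"));
            source.setAutoCreateLocalDirectory(true);
            // Cap each fetch at five objects. With no cap, the first pod to poll
            // can fetch (and register in the Redis metadata store) every file it
            // lists, which matches the behaviour described in the question.
            source.setMaxFetchSize(5);
            return source;
        }

    With the fetch size capped, each pod only synchronizes (and marks in Redis) five objects per poll, leaving the remaining files for the other pods to claim.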