spring-boot amazon-s3 spring-integration pcf

How to optimally process a large number of files stored in an S3 bucket using a Spring Integration app deployed in PCF?


We have an application built with classic Spring MVC (4.x) and the Spring Integration framework, deployed on WebLogic (we didn't have any other option). It polls for files on an NFS mount and processes them.

Now we have been asked to move away from WebLogic and use PCF as the platform. The problem is that PCF is not configured to use the Volume service, so we have to poll files from S3-like storage and process them.

Since a PCF app is limited to 2 GB of disk space and 2 GB of memory (an organization limitation), what is the optimal way to process these files stored in S3 without downloading them? Or, if we have to download them, how can we optimize that?

Note: The files are zipped. As part of processing them, we have to extract some files and re-upload them to S3, and we have to reject others.


Solution

  • To avoid copying files locally, consider using a Streaming Inbound Channel Adapter: https://github.com/spring-projects/spring-integration-aws/#streaming-inbound-channel-adapter

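    // template() and S3_BUCKET are defined elsewhere in the configuration.
    // The poller checks the bucket every 100 ms; each message payload is an
    // InputStream over the remote object rather than a local copy.
    @Bean
    @InboundChannelAdapter(value = "s3Channel", poller = @Poller(fixedDelay = "100"))
    public MessageSource<InputStream> s3InboundStreamingMessageSource() {
        S3StreamingMessageSource messageSource = new S3StreamingMessageSource(template());
        messageSource.setRemoteDirectory(S3_BUCKET);
        // Accept each object only once. SimpleMetadataStore keeps its state in
        // memory, so that record is lost when the app restarts.
        messageSource.setFilter(new S3PersistentAcceptOnceFileListFilter(new SimpleMetadataStore(),
                                   "streaming"));
        return messageSource;
    }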
    

    For more on how streaming channel adapters work, see the main Spring Integration docs: https://docs.spring.io/spring-integration/reference/html/ftp.html#ftp-streaming
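
Since the question mentions that the files are zipped, here is a minimal sketch of how the `InputStream` payload could be consumed entry by entry without unpacking the archive to disk. It uses only `java.util.zip` from the JDK. The class `ZipStreamSketch`, the `EntrySink` callback, and the `.csv` selection rule are hypothetical names chosen for illustration; in a real flow the sink would be whatever component uploads the extracted entry back to S3, and the input would be the message payload rather than an in-memory sample.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipStreamSketch {

    /** Hypothetical callback; a real one would stream the entry back to S3. */
    public interface EntrySink {
        // The sink must read the stream but must not close it.
        void accept(String entryName, InputStream entryContent) throws IOException;
    }

    /**
     * Walks the archive sequentially. Wanted entries are handed to the sink,
     * all other entries are skipped and their names returned as rejected.
     */
    public static List<String> processZip(InputStream zipped,
                                          Predicate<String> wanted,
                                          EntrySink sink) throws IOException {
        List<String> rejected = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                if (wanted.test(entry.getName())) {
                    // Reads on 'zip' return -1 at the end of the current entry.
                    sink.accept(entry.getName(), zip);
                } else {
                    rejected.add(entry.getName());
                }
                zip.closeEntry();
            }
        }
        return rejected;
    }

    /** Demo helper only: buffers a whole stream in memory. */
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Builds a small in-memory archive standing in for an S3 object. */
    public static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("data.csv"));
            out.write("a,b\n1,2\n".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("notes.txt"));
            out.write("ignore me".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        List<String> rejected = processZip(
                new ByteArrayInputStream(sampleZip()),
                name -> name.endsWith(".csv"),
                (name, content) -> System.out.println(
                        "would upload " + name + " (" + readAll(content).length + " bytes)"));
        System.out.println("rejected: " + rejected);
    }
}
```

The archive is read strictly front to back, so memory use depends on the buffer size and on what the sink does with each entry, not on the size of the whole zip. The `readAll` helper here buffers an entry fully and is only meant for the demo; with the 2 GB limits from the question, a real sink would more likely copy the entry to the upload in chunks.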