
Discarding remote files that are too old with the Spring Integration SMB inbound adapter


My application has to download files from a remote file server at regular intervals. My implementation is based on Spring Integration SMB and generally works as expected (see the listing below).

However, some infrastructure and runtime conditions make things a bit more complicated: my application has no control over when files are deleted, either from the remote file server or from the local directory. So there is a real chance that files will be downloaded multiple times, unnecessarily wasting network and storage resources.

In the short term, I can fix this with an AcceptOnceFileListFilter. But for the sake of technical simplicity, I don't want to use a metadata store, so the AcceptOnceFileListFilter's in-memory state will not survive a system restart, which is a problem in the long term.

For my use case it would be acceptable if repeated downloads were limited to remote files that have not yet reached a given age (say, several days), based on their last-modified date and time.

So what I'm looking for is basically something like the opposite of a SmbLastModifiedFileListFilter (which discards files if they are too young). Is there a (simple) way to achieve this?

It would be even better if this could be combined with the existing SmbLastModifiedFileListFilter, so that only files whose age falls within a given time interval are accepted.
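In other words, a file should pass only if it is older than the SmbLastModifiedFileListFilter threshold but not older than an upper bound. A minimal JDK-only sketch of that acceptance rule follows; AgeWindow and accepts are hypothetical names used for illustration, not Spring Integration API.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical helper (not part of Spring Integration): decides whether a file
 * with the given last-modified time falls inside an age window.
 */
final class AgeWindow {

    private AgeWindow() {
    }

    /**
     * Returns true if the file is older than minAge (the "not too young" check)
     * and not older than maxAge (the missing "not too old" check).
     */
    static boolean accepts(Instant lastModified, Instant now, Duration minAge, Duration maxAge) {
        // Negative if the file's timestamp lies in the future (e.g. clock skew)
        Duration age = Duration.between(lastModified, now);
        boolean oldEnough = age.compareTo(minAge) > 0;
        boolean recentEnough = age.compareTo(maxAge) <= 0;
        return oldEnough && recentEnough;
    }
}
```

For an SmbFile, lastModified would presumably be Instant.ofEpochMilli(file.getLastModified()), since getLastModified() reports milliseconds since the epoch.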

@Bean
public IntegrationFlow integrationFlow() {
    return IntegrationFlow.from(
                    Smb.inboundAdapter(smbSessionFactory())
                            .remoteDirectory("/myRemoteDirectory")
                            .filter(fileListFilter())
                            .localDirectory(new File("/myLocalDirectory"))
                            .autoCreateLocalDirectory(true)
                            .preserveTimestamp(true),
                    e -> e.poller(Pollers.fixedDelay(60000).maxMessagesPerPoll(-1))
            )
            .channel(new NullChannel()) // Discard payload
            .get();
}

@Bean
public CompositeFileListFilter<SmbFile> fileListFilter() {
    final var filters = new CompositeFileListFilter<SmbFile>();
    filters.addFilter(new AcceptOnceFileListFilter<>());
    filters.addFilter(new SmbRegexPatternFileListFilter(".*\\.txt"));
    filters.addFilter(new SmbLastModifiedFileListFilter(30));
    // another "reverse" SmbLastModifiedFileListFilter required here
    return filters;
}

Solution

  • So you are looking for a filter that discards files which are too old: essentially the same logic as SmbLastModifiedFileListFilter, but negated.

    You can simply implement such a filter yourself:

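    import java.time.Duration;
    import java.time.Instant;
    import java.util.ArrayList;
    import java.util.List;

    import jcifs.smb.SmbFile;

    import org.springframework.integration.file.filters.FileListFilter;

    /**
     * Accepts only remote files that are not older than the configured age
     * (the reverse of SmbLastModifiedFileListFilter).
     */
    public class SmbRecentFileListFilter implements FileListFilter<SmbFile> {

        // Files last modified longer ago than this are discarded
        private Duration age = Duration.ofDays(1);

        public SmbRecentFileListFilter() {
        }

        // Maximum age as a Duration, e.g. Duration.ofDays(3)
        public SmbRecentFileListFilter(Duration age) {
            this.age = age;
        }

        @Override
        public List<SmbFile> filterFiles(SmbFile[] files) {
            List<SmbFile> list = new ArrayList<>();
            Instant now = Instant.now();
            for (SmbFile file : files) {
                if (!fileIsAged(file, now)) {
                    list.add(file);
                }
            }
            return list;
        }

        // True if the file was last modified more than 'age' before 'now'
        private boolean fileIsAged(SmbFile file, Instant now) {
            return getLastModified(file).plus(this.age).isBefore(now);
        }

        // SmbFile reports its last-modified time in milliseconds since the epoch
        private static Instant getLastModified(SmbFile remoteFile) {
            return Instant.ofEpochMilli(remoteFile.getLastModified());
        }

    }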
    

    (Not tested, though.)

    I think we could improve AbstractLastModifiedFileListFilter in the framework with a "recent" option that skips files which are too old. Feel free to raise a GitHub issue on the matter!
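Since the filter above is untested, one way to check its logic without an SMB server is to move it into a generic class with an injectable Clock. The sketch below is hypothetical (RecentFilesFilter is not a framework type): it takes a ToLongFunction that extracts the last-modified time in epoch milliseconds, so in production F would presumably be SmbFile with SmbFile::getLastModified, while a test can use plain values and a fixed clock.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

/**
 * Hypothetical, SMB-free version of the "recent files" logic: keeps only files
 * whose last-modified time (epoch millis) is no older than maxAge.
 * Clock and timestamp extractor are injected so the logic is unit-testable.
 */
final class RecentFilesFilter<F> {

    private final Duration maxAge;
    private final Clock clock;
    private final ToLongFunction<F> lastModifiedMillis;

    RecentFilesFilter(Duration maxAge, Clock clock, ToLongFunction<F> lastModifiedMillis) {
        this.maxAge = maxAge;
        this.clock = clock;
        this.lastModifiedMillis = lastModifiedMillis;
    }

    List<F> filterFiles(F[] files) {
        Instant now = this.clock.instant();
        List<F> accepted = new ArrayList<>();
        for (F file : files) {
            Instant lastModified = Instant.ofEpochMilli(this.lastModifiedMillis.applyAsLong(file));
            // Same comparison as fileIsAged(): discard if lastModified + maxAge is before now
            if (!lastModified.plus(this.maxAge).isBefore(now)) {
                accepted.add(file);
            }
        }
        return accepted;
    }
}
```

A thin FileListFilter<SmbFile> implementation could then delegate its filterFiles(SmbFile[]) to an instance built with Clock.systemUTC().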