java multithreading performance architecture application-design

File synchronizer architecture


I have to build a file synchronizer: an application that runs around the clock (24/7) and synchronizes a large number of data files from many external systems to my local system, mainly over FTP, SFTP and NFS.

There are more than twenty streams. The logic is slightly different for each of them, and it must be configurable. One requirement is that if a stream goes down for some reason, it must be possible to bring it back up without restarting the entire system.

Another requirement is that the transfer rate is balanced. In other words, it must not happen that some streams are fully synchronized while another stream is 10 hours behind.

I have some doubts about the architecture: if I build a single multithreaded application, I would have a very high thread count (more than 100, I would say), and meeting the two requirements above would make it complicated.

I was thinking of running several processes, or several instances of the same process, even if it seems a little "ugly". That way the operating system would do some of the load balancing, and it would be simpler to kill or start a stream. Performance might even be better, since several processes could use much more RAM. Does anyone have any tips or advice? Thanks a lot, and sorry for my poor English. Gian


Solution

  • As @kayaman said, 100 threads is not a lot. If that means 100 threads per unit of work, and you will have many units of work, so that the thread count grows by orders of magnitude, I would suggest having a look at fibers.

    As long as you don't block the fibers, you can have 100000+ fibers running over a handful of threads (typically one per CPU core). Each fiber then just waits for a callback from the process before continuing.

    To access your endpoints and handle them in a similar way, have a look at Apache Camel. It lets you consume FTP, SFTP, etc. and treat each one as just another endpoint (in theory you should be able to plug in email as well and process packets that are emailed to the endpoint).

    Regarding balancing the streams, this is business logic you need to implement. If one stream is receiving packets faster than another, you should be able to limit its rate by not requesting more packets under certain conditions. I would need more information on how you retrieve the packets and which libraries you are using to be of more help here.
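To make the balancing idea a bit more concrete, here is a minimal sketch in plain Java. It is only an illustration under assumptions: the class `StreamBalancer`, its methods, the stream names and the "lag in seconds" metric (age of the oldest file not yet synchronized) are all hypothetical and not taken from the question or from any library. Each stream reports its lag, and a stream is allowed to request more data only while it is not too far ahead of the stream that is furthest behind.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: decides which streams may fetch more data right now. */
final class StreamBalancer {
    // Assumed metric: per-stream lag in seconds (age of the oldest unsynchronized file).
    private final Map<String, Long> lagSeconds = new ConcurrentHashMap<>();
    // Largest allowed difference between a stream's lag and the worst lag.
    private final long maxSpreadSeconds;

    StreamBalancer(long maxSpreadSeconds) {
        this.maxSpreadSeconds = maxSpreadSeconds;
    }

    /** Called by each stream worker after it has measured its own backlog. */
    void reportLag(String stream, long seconds) {
        lagSeconds.put(stream, seconds);
    }

    /** Stop tracking a stream, e.g. while it is down, so it does not throttle the others. */
    void remove(String stream) {
        lagSeconds.remove(stream);
    }

    /** A stream may fetch only if it is at most maxSpreadSeconds ahead of the worst stream. */
    boolean mayFetch(String stream) {
        Long own = lagSeconds.get(stream);
        if (own == null) {
            return true; // unknown streams are not throttled
        }
        long worst = lagSeconds.values().stream()
                .mapToLong(Long::longValue)
                .max()
                .orElse(own);
        return worst - own <= maxSpreadSeconds;
    }

    /** The stream with the largest lag, if any stream has reported yet. */
    Optional<String> mostBehind() {
        return lagSeconds.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        StreamBalancer balancer = new StreamBalancer(3_600); // allow a spread of one hour
        balancer.reportLag("ftp-a", 120);      // almost up to date
        balancer.reportLag("sftp-b", 36_000);  // 10 hours behind
        balancer.reportLag("nfs-c", 33_000);   // a bit over 9 hours behind

        System.out.println(balancer.mayFetch("ftp-a"));  // false: too far ahead, must wait
        System.out.println(balancer.mayFetch("sftp-b")); // true
        System.out.println(balancer.mayFetch("nfs-c"));  // true
        System.out.println(balancer.mostBehind().orElse("none")); // sftp-b
    }
}
```

Each stream worker would call `mayFetch` before requesting the next file and simply sleep and retry when it returns `false`. A stream that is down should be taken out with `remove` until it is restarted; otherwise its growing lag would hold back every healthy stream.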