Problem: I have a number of file uploads arriving over HTTP in parallel ( uploads receiver ). I store them temporarily on a local disk. Another process ( uploads submitter ) gets notified about new uploads and does the specific processing ( parsing, extracting metadata, uploading to S3, etc. ). Once the upload processing is done, I want the submitter to notify the uploads receiver, so that the receiver can reply to the remote uploader with a status ( whether the submission succeeded or failed ). Using the ZeroMQ PUB/SUB pattern, what would be better:
While more recent versions may use PUB-side topic filtering, the early ZeroMQ versions used a SUB-side approach, which means that all the ( network ) message-transport traffic goes to all SUBs. This was accepted as a penalty for distributing the filtering workload, which would otherwise have to be handled at the lowest possible latency on the PUB-side.
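The difference between the two filtering strategies can be illustrated with a toy model. This is only a sketch in plain Python, not ZeroMQ code: the function `wire_messages`, the topic names and the subscriber names are all hypothetical, and the only ZeroMQ behaviour assumed is that a subscription matches a message by prefix.

```python
def wire_messages(messages, subscriptions, filter_side):
    """Count messages crossing the transport in a toy PUB/SUB model.

    messages      -- list of topic strings being published
    subscriptions -- dict: subscriber name -> list of topic prefixes
    filter_side   -- "SUB" (filter after transport) or "PUB" (filter before)
    """
    if filter_side not in ("SUB", "PUB"):
        raise ValueError("filter_side must be 'SUB' or 'PUB'")

    sent = 0
    delivered = {name: [] for name in subscriptions}
    for topic in messages:
        for name, prefixes in subscriptions.items():
            matches = any(topic.startswith(p) for p in prefixes)
            if filter_side == "SUB":
                # Every message travels to every subscriber,
                # which then drops what it did not subscribe to.
                sent += 1
                if matches:
                    delivered[name].append(topic)
            elif matches:
                # PUB-side: only matching messages are transported at all.
                sent += 1
                delivered[name].append(topic)
    return sent, delivered
```

In both modes the subscribers end up with the same messages; only the amount of traffic and the place where the matching work is done differ.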
This is important for cases where, in an open distributed system, the homogeneity of versions is not enforceable.
Since your designed architecture seems to be co-located on the same <localhost>, the performance impact remains non-distributed ( concentrated ) and may call for just some limited latency/priority tweaking, if an overall bottleneck appears as this Use-Case scales up.
As Martin Sustrik ( ZeroMQ co-father ) presented in detail, ZeroMQ subscription matching was designed with expected scales of around ten thousand subscriptions:
(cit.:) " Efficient Subscription Matching
In ZeroMQ, simple tries are used to store and match PUB/SUB subscriptions. The subscription mechanism was intended for up to 10,000 subscriptions where simple trie works well. However, there are users who use as much as 150,000,000 subscriptions. In such cases there's a need for a more efficient data structure. "
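To make the "simple trie" from the quote concrete, here is a minimal sketch of prefix-subscription matching with a byte-wise trie. It is an illustration only and not ZeroMQ's actual implementation; the class names `SubscriptionTrie` and `_Node` and the topic strings are hypothetical.

```python
class _Node:
    """One trie node: child nodes keyed by byte value, plus a subscription count."""
    __slots__ = ("children", "refcount")

    def __init__(self):
        self.children = {}
        self.refcount = 0  # how many subscriptions end exactly here


class SubscriptionTrie:
    """Stores subscription prefixes and answers 'does any prefix match this topic?'."""

    def __init__(self):
        self._root = _Node()

    def subscribe(self, prefix: bytes) -> None:
        node = self._root
        for byte in prefix:
            if byte not in node.children:
                node.children[byte] = _Node()
            node = node.children[byte]
        node.refcount += 1

    def matches(self, topic: bytes) -> bool:
        node = self._root
        if node.refcount:          # an empty prefix subscribes to everything
            return True
        for byte in topic:
            node = node.children.get(byte)
            if node is None:       # no stored prefix continues along this path
                return False
            if node.refcount:      # a stored prefix ends here, so it matches
                return True
        return False
```

The cost of one lookup grows with the length of the topic rather than with the number of stored subscriptions, which is why a plain trie is comfortable at the scales the quote mentions, while its memory footprint becomes the concern at much larger subscription counts.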
Further details on design & scaling can be found in this post by Martin.
A fair approach would be to mock up each of the approaches in question and benchmark them in-vitro at { 1.0x, 1.5x, 2.0x, 5.0x } of the expected static scales, so as to have quantitatively supported data about the real overheads, performance and latencies of the alternative strategies under review.
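A skeleton for such a scaled benchmark might look like the sketch below. It is an assumption-laden stand-in: it times only an in-process, linear prefix matcher ( `match_linear`, a hypothetical helper ), not real ZeroMQ sockets, and the `upload.<id>.` topic layout is invented for illustration. The mocked-up alternatives would replace the timed section.

```python
import time


def match_linear(topic, prefixes):
    """Naive matcher: scan every subscription prefix for each message."""
    return any(topic.startswith(p) for p in prefixes)


def benchmark(base_subscriptions, scales=(1.0, 1.5, 2.0, 5.0), messages=1000):
    """Time the matcher at several multiples of the expected subscription count."""
    results = []
    for scale in scales:
        n = int(base_subscriptions * scale)
        # Hypothetical topic layout: one subscription per upload id.
        prefixes = [f"upload.{i}." for i in range(n)]
        topics = [f"upload.{i % n}.ok" for i in range(messages)]

        start = time.perf_counter()
        hits = sum(match_linear(t, prefixes) for t in topics)
        elapsed = time.perf_counter() - start

        results.append({
            "scale": scale,
            "subscriptions": n,
            "hits": hits,
            "seconds_per_message": elapsed / messages,
        })
    return results


if __name__ == "__main__":
    for row in benchmark(base_subscriptions=1000):
        print(row)
```

Running each candidate through the same harness, several times per scale, gives comparable per-message figures instead of impressions.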
Anyway, Vovan, enjoy the worlds of smart signalling/messaging in the distributed processing.