google-cloud-platform google-cloud-firestore google-cloud-functions google-cloud-dataflow google-cloud-tasks

How to handle multiple deferred tasks as one?


There is one type of problem I'm facing in multiple projects that I haven't yet found a good solution for within the Google Cloud services.

I would like to queue/defer some work originating from user action, in such a way that the task handler processes multiple "duplicate" items as one.

For example, an email needs to be sent out when a user enables some feature, but also when the feature is disabled again. In the UI the feature can be easily toggled. I want to prevent sending multiple emails if the user is indecisive and happens to turn the toggle on and off multiple times in a short interval.

Ideally, I would place all these events in a queue like Cloud Tasks, tagging them with some sort of common identifier. Then a handler (probably triggered by a cron job) would fetch a task, and then immediately fetch all other queued tasks that share that common identifier. This way the handler could work out whether the feature ended up enabled or disabled, or whether the whole thing was a no-op because it was first enabled and then disabled again. At most one email would then be sent per cron cycle for tasks sharing the common identifier.
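To make the idea concrete, here is a minimal sketch of the coalescing step such a handler would perform, assuming each queued task can be represented as an event carrying the common identifier, the desired state, and a timestamp (all names here are hypothetical, not a real Cloud Tasks API):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ToggleEvent:
    identifier: str   # the common identifier shared by "duplicate" tasks
    enabled: bool     # desired feature state after this event
    timestamp: float  # orders events within one cron cycle

def coalesce(events: list[ToggleEvent], state_last_notified: bool) -> Optional[bool]:
    """Reduce all queued events for one identifier to at most one outcome.

    Returns the final state to email about, or None when no email is
    needed (no events, or the toggles cancelled each other out)."""
    if not events:
        return None
    final = max(events, key=lambda e: e.timestamp).enabled
    return None if final == state_last_notified else final
```

For an indecisive user who toggled on and then off again, `coalesce([on, off], state_last_notified=False)` returns `None`, so no email goes out that cycle.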

At least that's how I see it working in my mind. Possibly there's a better way.

I could roll my own solution using Firestore, as I have done in the past when I needed to queue items that had to be scheduled weeks from now, but I feel that in this case, the problem is common enough that there must be existing solutions for it.
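If you do roll your own with Firestore, one property worth exploiting is that keying the pending work by the common identifier turns rapid toggling into overwrites of a single pending document rather than a growing queue of duplicates. A minimal sketch, with a plain dict standing in for the Firestore collection (collection and field names are my own invention):

```python
from typing import MutableMapping

def queue_toggle(outbox: MutableMapping, identifier: str,
                 enabled: bool, timestamp: float) -> None:
    """Upsert the pending email keyed by the common identifier.

    With google-cloud-firestore this would be roughly
    db.collection("outbox").document(identifier).set({...}),
    so three quick toggles leave exactly one pending document
    holding only the latest desired state."""
    outbox[identifier] = {"enabled": enabled, "updated_at": timestamp}
```

A cron-triggered function would then sweep the collection, compare each document to the last state the user was notified about, and send at most one email per identifier.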


Solution

  • As the question asks for design advice, I can only offer my personal opinion, especially given the limited details about non-functional requirements, context, etc.

    The first criterion is scale. If we are talking about a few (dozen) events per minute, where the rate may fluctuate significantly over the day (i.e. none at night, a peak in the evening), I would probably use Firestore and a group of Cloud Functions (with some other GCP resources), probably similar to your past experience. If we are talking about a few hundred events per minute, with the rate more or less constant over 24 hours and no cycles, I would think about Apache Beam (Cloud Dataflow). Based on your description, it might be a nice fit.

    The second criterion is development and support skills and costs: who is going to develop the solution, how, and who will support it once it is commissioned. If you (or your company) have plenty of Java (and Apache Beam) experience, I might lean towards Cloud Dataflow, as the majority of solutions there are developed in Java. As an additional minor benefit, it might be easier to find engineers on the market who have already worked with Apache Beam. On the other hand, if such knowledge and resources are scarce, and your software development experience is more in Go or Python, that might be an indicator to stay with the bespoke Firestore and Cloud Functions solution.

    I am not sure that there is one correct design… It might be worth trying both approaches and comparing them, if you have the time/budget for it. Personally, I can speculate that the Firestore and Cloud Functions solution might be significantly cheaper in terms of GCP costs, but more expensive in initial development… But again - I don’t know your priorities.
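    For the Dataflow route, the grouping you describe is essentially what Beam's session windows do: events for one key that arrive within some gap of each other land in the same window, and only the last event in a closed window needs to trigger an email. A pure-Python sketch of that grouping, independent of any runner (the function names and the `(timestamp, enabled)` event shape are my own assumptions):

    ```python
    def sessionize(events, gap_seconds):
        """Group (timestamp, enabled) events into sessions, the way
        session windows would: a new session starts whenever the gap
        since the previous event exceeds gap_seconds."""
        sessions = []
        for ts, enabled in sorted(events):
            if sessions and ts - sessions[-1][-1][0] <= gap_seconds:
                sessions[-1].append((ts, enabled))
            else:
                sessions.append([(ts, enabled)])
        return sessions

    def final_states(sessions):
        """Only the last event of each closed session decides the email."""
        return [session[-1][1] for session in sessions]
    ```

    For example, `sessionize([(0, True), (5, False), (120, True)], 30)` yields two sessions, and `final_states` on them gives `[False, True]` - the on/off flurry at the start collapses to a single "disabled" outcome.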