
How to run a job periodically for every user at large scale?


My project has a large number of users, around 50 million.

I need to create a playlist for each user every day. To do this, I currently use the following method:

My users table has a column, last_playlist_created_at, that holds the time a playlist was last created for that user.

I run a query on the users table that selects the top 1000 users whose last_playlist_created_at is more than one day old, sorted in ascending order by last_playlist_created_at.
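The batch query could look like the following sketch. It assumes a SQL database (SQLite in memory here, purely for illustration), a hypothetical `users(id, last_playlist_created_at)` schema with ISO-8601 text timestamps, and a hypothetical `fetch_due_users` helper. Users who have never had a playlist (a NULL timestamp) are not selected by this query.

```python
import sqlite3
from datetime import datetime, timedelta

BATCH_SIZE = 1000


def fetch_due_users(conn: sqlite3.Connection, now: datetime, limit: int = BATCH_SIZE) -> list[int]:
    """Return up to `limit` user ids whose last playlist is at least one day old, oldest first."""
    cutoff = (now - timedelta(days=1)).isoformat()
    rows = conn.execute(
        """
        SELECT id FROM users
        WHERE last_playlist_created_at <= ?
        ORDER BY last_playlist_created_at ASC
        LIMIT ?
        """,
        (cutoff, limit),
    ).fetchall()
    return [row[0] for row in rows]


# Usage with an in-memory database. ISO-8601 text compares and sorts correctly
# as long as every row uses the same timestamp format.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, last_playlist_created_at TEXT)")
conn.execute("CREATE INDEX idx_users_last_playlist ON users (last_playlist_created_at)")
now = datetime(2024, 1, 10, 12, 0)
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [
        (1, (now - timedelta(days=3)).isoformat()),
        (2, (now - timedelta(hours=2)).isoformat()),
        (3, (now - timedelta(days=2)).isoformat()),
    ],
)
print(fetch_due_users(conn, now))  # [1, 3]
```

An index on last_playlist_created_at, like the one created above, should let the database read the oldest rows in order instead of scanning and sorting the whole table for every batch.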

Then I loop over the result (foreach) and publish one message per user to my message broker.

Behind the message broker, around 64 workers process the messages (each one creates a playlist for a user) and update last_playlist_created_at in the users table.

When the message broker's queue is empty, I repeat these steps (a while / do-while loop).
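To make the current design concrete, here is a minimal single-process sketch of it. It assumes Python threads stand in for the 64 workers, a `queue.Queue` stands in for the message broker, and a dict stands in for the users table; `fetch_due_batch`, `create_playlist`, `worker`, and `dispatch_until_done` are hypothetical names. It uses a fixed `now` so the loop stops once no user is due, whereas the real loop keeps polling.

```python
import queue
import threading
from datetime import datetime, timedelta

NUM_WORKERS = 64
BATCH_SIZE = 1000
ONE_DAY = timedelta(days=1)

# Hypothetical in-memory stand-ins: `users` plays the users table
# (user_id -> last_playlist_created_at) and `broker` plays the message broker.
users: dict[int, datetime] = {}
users_lock = threading.Lock()
broker: queue.Queue = queue.Queue()


def fetch_due_batch(now: datetime) -> list[int]:
    """Up to BATCH_SIZE users whose playlist is at least a day old, oldest first."""
    with users_lock:
        due = sorted(
            (uid for uid, ts in users.items() if now - ts >= ONE_DAY),
            key=lambda uid: users[uid],
        )
    return due[:BATCH_SIZE]


def create_playlist(user_id: int) -> None:
    """Hypothetical placeholder for the real playlist generation."""


def worker(now: datetime) -> None:
    """Consume messages forever: build the playlist, then update the timestamp."""
    while True:
        user_id = broker.get()
        create_playlist(user_id)
        with users_lock:
            users[user_id] = now
        broker.task_done()


def dispatch_until_done(now: datetime) -> int:
    """The question's loop: publish a batch, wait until the queue is drained, repeat."""
    published = 0
    while True:
        batch = fetch_due_batch(now)
        if not batch:
            return published
        for user_id in batch:
            broker.put(user_id)
        published += len(batch)
        broker.join()  # blocks until every published message has been processed


# Usage: 2,500 users whose playlists are two days old, a fixed clock, 64 workers.
now = datetime(2024, 1, 10)
for uid in range(2500):
    users[uid] = now - timedelta(days=2)
for _ in range(NUM_WORKERS):
    threading.Thread(target=worker, args=(now,), daemon=True).start()
published = dispatch_until_done(now)
print(published)  # 2500
```

Every batch passes through the single dispatch loop and one query on the users table, and the dispatcher waits for the queue to drain before fetching the next batch. That serial step is the part the question identifies as not scaling.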


I think the processing side is good enough and can scale as well, but the way we create a message for each user does not scale!

How should I dispatch a large number of messages, one for each of my users?
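For a sense of the throughput involved, using the figures from the question (about 50 million users, one playlist each per day, batches of 1000, 64 workers) and assuming the work is spread evenly over the day:

$$
\frac{50\,000\,000\ \text{playlists}}{86\,400\ \text{s}} \approx 579\ \text{playlists/s},
\qquad
\frac{50\,000\,000}{1000} = 50\,000\ \text{batch queries/day},
\qquad
\frac{64\ \text{workers}}{579\ \text{playlists/s}} \approx 0.11\ \text{s per playlist}.
$$

So on average the dispatcher has to publish about 579 messages per second, and each of the 64 workers has roughly 110 ms per playlist.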


Solution

  • OK, my answer is based entirely on your comment where you mentioned that you use while(true) to check whether the playlist needs to be updated, which does not seem ideal.

    Although this is a design question and there are multiple solutions, here's how I would solve it.

    First, think of updating the playlist for a user as a job.

    In your case, this is a scheduled job, i.e., one that runs once a day.

    1. Use a scheduler to schedule the time of the next job.
    2. Write a scheduled job handler that pushes due jobs to a message queue. This part exists only to handle many jobs at the same time while letting you control the flow.
    3. Generate the playlist for the user based on the job, then create a scheduled event for the next day.
    4. You could also persist the scheduled job data to avoid race conditions.
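The steps above can be sketched in a single process as follows. This is a minimal illustration, not a specific scheduler product: it assumes an in-memory min-heap (Python's `heapq`) stands in for the scheduler and a `queue.Queue` for the message queue, and `ScheduledJob`, `Scheduler`, `scheduled_job_handler`, `generate_playlist`, and `drain_queue` are hypothetical names. The handler only touches jobs that are actually due, instead of repeatedly querying the whole users table.

```python
import heapq
import queue
from dataclasses import dataclass, field
from datetime import datetime, timedelta

ONE_DAY = timedelta(days=1)


@dataclass(order=True)
class ScheduledJob:
    run_at: datetime                     # when the job is due (the only sort key)
    user_id: int = field(compare=False)  # whose playlist to rebuild


class Scheduler:
    """Step 1: an in-memory scheduler that keeps jobs ordered by due time (min-heap)."""

    def __init__(self) -> None:
        self._heap: list[ScheduledJob] = []

    def schedule(self, user_id: int, run_at: datetime) -> None:
        heapq.heappush(self._heap, ScheduledJob(run_at, user_id))

    def pop_due(self, now: datetime) -> list[ScheduledJob]:
        due = []
        while self._heap and self._heap[0].run_at <= now:
            due.append(heapq.heappop(self._heap))
        return due

    def __len__(self) -> int:
        return len(self._heap)


def scheduled_job_handler(scheduler: Scheduler, mq: queue.Queue, now: datetime) -> int:
    """Step 2: move every due job onto the message queue; returns how many were pushed."""
    due = scheduler.pop_due(now)
    for job in due:
        mq.put(job)
    return len(due)


def generate_playlist(user_id: int) -> list[str]:
    # Hypothetical placeholder for the real playlist logic.
    return [f"track-{user_id}-{i}" for i in range(3)]


def drain_queue(scheduler: Scheduler, mq: queue.Queue, playlists: dict[int, list[str]]) -> None:
    """Step 3: build each playlist, then schedule that user's next run one day later."""
    while True:
        try:
            job = mq.get_nowait()
        except queue.Empty:
            return
        playlists[job.user_id] = generate_playlist(job.user_id)
        scheduler.schedule(job.user_id, job.run_at + ONE_DAY)
        mq.task_done()


# Usage: five users due at hourly offsets; the handler runs at 02:00.
start = datetime(2024, 1, 1)
scheduler = Scheduler()
for uid in range(5):
    scheduler.schedule(uid, start + timedelta(hours=uid))

mq: queue.Queue = queue.Queue()
playlists: dict[int, list[str]] = {}
pushed = scheduled_job_handler(scheduler, mq, start + timedelta(hours=2))
drain_queue(scheduler, mq, playlists)
print(pushed, sorted(playlists))  # 3 [0, 1, 2]
```

Rescheduling from `job.run_at` rather than from the completion time keeps each user's run time from drifting later every day. Step 4 is not covered here: because the heap lives in memory, a real deployment would persist the scheduled jobs so that a restart does not lose them and a second scheduler instance does not dispatch the same job twice.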