We came up with the solution and called it "Email Pipeline." It starts with running PHP script to go over all users and creates a bunch of async tasks. Each task contains information about:
- user IDs that should be processed
- what kind of operation needs to be done
It may happen that some users do not have any recommendations. In this case, we have to clean up their data on the third-party email service. The rest of them should be updated with new recommendations data. This diversification happens on the fly and group users based on the type of task.
The tasks creation process also has a time ramp up feature. Task processing jobs are heavy and time costly operations. To mitigate that, pipeline services run on Kubernetes' cluster with autoscaling, depending on their load. So, to make sure that autoscaler has enough time to ramp up the cluster size, we create tasks with ramp up time. We increase the rate of tasks being created in 1 hour. After that time, we spawn tasks as fast as we can as all services are scaled and ready to process increased loads created by task processing jobs.
While tasks are being created, workers start to pick them up and process. The process starts with defining the type of task. In case a user doesn't have any recommendations, it goes straight to a third-party email service and nulls recommendation data for this user. If the user has products recommendations data, then we need to fetch it first.