Each of the marketplace’s 10 million users has an individual set of products recommended for purchase. Up-to-date recommendation data must be collected, stored, and delivered to each user in a “Resources you may like” (RYML) email. This requires a way to generate RYML emails with relevant content and send them to users on a regular basis.
Goals and challenges
- Gather up-to-date lists of products recommended for purchase for each user.
- Store this data and deliver it to all subscribed users via “Resources you may like” (RYML) emails on a regular basis.
- Delegate email composition and sending to an email sending service.
- Maintain the business logic for collecting recommendation data and providing it to the email sending service.
How Wise Engineering helped
We developed the Recommendations Email Generation Pipeline. It gathers accurate information about the products recommended to each user and stores it for as long as needed.
Results
- Information about recommended products is collected accurately.
- The Email Pipeline transfers the collected information for 10 million users to the email sending service within 5 hours.
- The email sending service knows whether a particular user has any recommended products and can easily send that list in email messages on a regular basis.
The marketplace has 10 million users, and each user has specific products recommended for purchase. This information must be collected and delivered to the user on a regular basis.
To begin with, we focused on how to orchestrate the microservices and queue the extracted product recommendation data. Next, we evaluated options for transforming this data into a format that third-party services can read. Finally, we designed a way to load this data into an email sending service.
The following tools and techniques were applied to achieve these goals:
1. A PHP script goes over all users, groups them by task type (whether the user has recommendations or not), and splits them into chunks. It then puts the chunks of user IDs into a queue.
2. PHP workers process the queue of user ID chunks and retrieve recommendations for each user.
3. An Elixir Graph API application exposes data on recommended products to the PHP workers. It also processes the recommendation data and applies internal rules and limits to it.
4. A Python REST API application generates product recommendation data for a particular user ID on demand.
We came up with a solution and called it “Email Pipeline.” It starts by running a PHP script that goes over all users and creates a batch of async tasks. Each task contains information about:
- user IDs that should be processed
- what kind of operation needs to be done
Some users may not have any recommendations. In this case, we have to clean up their data in the third-party email service. The rest are updated with new recommendation data. This split happens on the fly: users are grouped by task type as the tasks are created.
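The grouping step can be sketched as follows. This is a minimal illustration in Python rather than the PHP script itself; the names `TaskType`, `Task`, and `build_tasks`, the chunk size, and the `(user_id, has_recommendations)` input shape are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Iterator, List, Tuple


class TaskType(Enum):
    UPDATE = "update"    # user has recommendations that should be pushed
    CLEANUP = "cleanup"  # user has none, so stored data should be cleared


@dataclass(frozen=True)
class Task:
    task_type: TaskType
    user_ids: Tuple[int, ...]


def build_tasks(
    users: Iterable[Tuple[int, bool]], chunk_size: int = 1000
) -> Iterator[Task]:
    """Group (user_id, has_recommendations) pairs by task type, chunk by chunk.

    Users are streamed once; a task is emitted as soon as a chunk fills up,
    so the full user list never has to be held in memory.
    """
    buffers: Dict[TaskType, List[int]] = {TaskType.UPDATE: [], TaskType.CLEANUP: []}
    for user_id, has_recommendations in users:
        task_type = TaskType.UPDATE if has_recommendations else TaskType.CLEANUP
        buffer = buffers[task_type]
        buffer.append(user_id)
        if len(buffer) == chunk_size:
            yield Task(task_type, tuple(buffer))
            buffer.clear()
    # Flush the partially filled chunks left over at the end of the stream.
    for task_type, buffer in buffers.items():
        if buffer:
            yield Task(task_type, tuple(buffer))
```

Each emitted task carries exactly the two pieces of information listed above: the user IDs to process and the kind of operation to perform.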
The task creation process also has a ramp-up feature. Task processing jobs are heavy, time-consuming operations, so the pipeline services run on a Kubernetes cluster that autoscales with load. To give the autoscaler enough time to grow the cluster, we create tasks with a ramp-up period: the task creation rate increases gradually over 1 hour. After that, we spawn tasks as fast as we can, because all services are scaled and ready to handle the increased load from the task processing jobs.
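One way to picture the ramp-up is as a delay between enqueued tasks that shrinks over the first hour. The sketch below assumes a linear ramp and made-up start and end rates; the source only states that the rate increases over 1 hour, so the shape and the numbers are illustrative, as are the function names.

```python
import time
from typing import Callable, Iterable


def task_delay(
    elapsed_s: float,
    ramp_s: float = 3600.0,    # 1 hour ramp-up, as described above
    start_rate: float = 1.0,   # assumed tasks per second at the start
    end_rate: float = 50.0,    # assumed tasks per second at the end of the ramp
) -> float:
    """Return how many seconds to wait before enqueuing the next task."""
    if elapsed_s >= ramp_s:
        return 0.0  # ramp-up is over: enqueue as fast as possible
    fraction = max(elapsed_s, 0.0) / ramp_s
    rate = start_rate + (end_rate - start_rate) * fraction
    return 1.0 / rate


def enqueue_with_ramp_up(
    tasks: Iterable[object],
    enqueue: Callable[[object], None],
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Enqueue tasks, throttled by task_delay until the ramp-up ends."""
    started = now()
    for task in tasks:
        enqueue(task)
        delay = task_delay(now() - started)
        if delay > 0:
            sleep(delay)
```

The clock and sleep functions are injectable so the throttling can be tested without waiting an hour.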
While tasks are still being created, workers start picking them up and processing them. Processing starts with determining the task type. If a user has no recommendations, the worker goes straight to the third-party email service and nulls the recommendation data for this user. If the user has product recommendation data, we need to fetch it first.
To fetch this data, we call the Elixir Graph API service, which acts as middleware between the tasks (in our case, the frontend) and the recommendation storage. The recommendation data itself is held in a non-precalculated state by the Python microservice, which multiplies matrices on the fly, calculates scores for possible product recommendations, and returns the data to the caller. After the Graph API receives data from the Python microservice, the data needs to be processed and cleaned up, because the Python service only keeps the list of recommended products and knows nothing about the user. Cleanup means removing items that the user has already bought, wishlisted, added to the cart, or even seen elsewhere on the site. This information is accessible to the Graph API, so we can easily remove recommendations we don’t want to serve to the user next time.
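The cleanup step amounts to filtering the raw recommendation list against what is already known about the user. The sketch below is in Python for illustration only (the Graph API application is written in Elixir); `UserActivity`, `clean_recommendations`, and the `limit` parameter are hypothetical names, and the cap on list length merely stands in for the unspecified internal rules and limits.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class UserActivity:
    """Product IDs the user has already interacted with (assumed shape)."""
    purchased: Set[int] = field(default_factory=set)
    wishlisted: Set[int] = field(default_factory=set)
    in_cart: Set[int] = field(default_factory=set)
    seen: Set[int] = field(default_factory=set)


def clean_recommendations(
    product_ids: List[int], activity: UserActivity, limit: int = 10
) -> List[int]:
    """Drop products the user already bought, wishlisted, carted, or saw.

    The original ranking order is preserved; `limit` caps the result size.
    """
    excluded = (
        activity.purchased | activity.wishlisted | activity.in_cart | activity.seen
    )
    return [pid for pid in product_ids if pid not in excluded][:limit]
```

Keeping the filter separate from scoring matches the division of labor described above: the scoring service stays user-agnostic, and only the middleware needs access to user activity.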
We now have the set of products recommended to each user in the task, whether it is nulled data or new recommendations. The only step left is to transfer this data to the third-party email sending service, which we do through that service’s API.
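Put together, the work a worker does for one task can be sketched as below. This is an illustrative Python outline, not the PHP worker code: the task type labels and the two callables, `fetch_recommendations` (standing in for the Graph API call) and `push_to_email_service` (standing in for the third-party API call), are assumptions, and the real API of the email service is not shown in the source.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical task type labels; the source only says each task
# records what kind of operation needs to be done.
UPDATE = "update"
CLEANUP = "cleanup"


def process_task(
    task_type: str,
    user_ids: Iterable[int],
    fetch_recommendations: Callable[[int], List[int]],
    push_to_email_service: Callable[[int, Optional[List[int]]], None],
) -> int:
    """Process one task and return the number of users handled."""
    handled = 0
    for user_id in user_ids:
        if task_type == CLEANUP:
            # No recommendations: null the data stored for this user.
            push_to_email_service(user_id, None)
        elif task_type == UPDATE:
            # Fetch cleaned recommendations first, then push them.
            push_to_email_service(user_id, fetch_recommendations(user_id))
        else:
            raise ValueError(f"unknown task type: {task_type!r}")
        handled += 1
    return handled
```

Passing the two service calls in as functions keeps the sketch independent of any particular HTTP client or email provider.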
As a result, the email sending service knows whether a particular user has any recommended products, holds the list of those products if there are any, and can easily send that list in email messages on a regular basis.
The Recommended Products Emails Generation Pipeline transfers the collected information about relevant recommended products for 10 million users to the email sending service in 5 hours. This allows us to send “Resources You May Like” emails on a daily basis. All tasks related to composing emails have been delegated to a third-party email sending service, so once the recommended product data has been imported, we can send emails to all users with one click.
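To put the 5-hour figure in perspective, the average throughput it implies can be worked out directly. The helper name below is hypothetical, and the result is only an average: since task creation ramps up during the first hour, the sustained rate after the ramp-up would have to be somewhat higher.

```python
def required_throughput(total_users: int, window_hours: float) -> float:
    """Average number of users that must be processed per second."""
    return total_users / (window_hours * 3600)


users_per_second = required_throughput(10_000_000, 5)
print(f"{users_per_second:.0f} users per second on average")
```

For 10 million users in 5 hours this comes to roughly 556 users per second, or 2 million users per hour.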