Recommendations Email Generation Pipeline

Problem

Every user of a marketplace with 10 million users has an individual set of recommended products. Current, up-to-date recommendation data has to be collected, stored, and delivered to each user in the “Resources You May Like” (RYML) email. We needed a way to generate RYML emails with relevant content and send them to users on a regular basis.

Goals and challenges

  • Gather current, up-to-date lists of recommended products for each user
  • Store this data and deliver it to all subscribed users in the “Resources You May Like” (RYML) email on a regular basis
  • Delegate email composition and sending to an email sending service
  • Keep the business logic for retrieving recommendation data and passing it to the email sending service

How Wise Engineering helped

We developed the Recommendations Email Generation Pipeline. It gathers accurate information about the products recommended to each user and stores it for as long as needed.

Results

  • Recommended product data is collected accurately
  • The Email Pipeline transfers the collected data for 10 million users to the email sending service in only 5 hours
  • The email sending service knows whether each user has any recommended products and, if so, which ones, so it can easily send those lists by email on a regular basis

Background

The marketplace has 10 million users, and each user has specific products recommended for purchase. This information has to be collected and delivered to the user on a regular basis.

Implementation process

First, we focused on orchestrating a set of microservices and a queue to extract product recommendation data. Next, we evaluated options for transforming this data into a format that third-party services can read. Finally, we designed a way to load this data into the email sending service.

The following tools and techniques were applied to achieve those goals:

  1. A PHP script that goes over all users, groups them by task type (has recommendations or not), and splits them into chunks. Each chunk of user IDs is then put into a queue.
  2. PHP workers that process the queue of user IDs and get recommendations for each user.
  3. An Elixir Graph API application that exposes recommended products data to the PHP workers. It processes the recommendation data and applies internal rules and limits to it.
  4. A Python REST API application that generates product recommendations for a given user ID on demand.

Solution

We called the solution “Email Pipeline”. It starts by running a PHP script that goes over all users and creates a set of asynchronous tasks. Each task contains:

  • user IDs that should be processed
  • what kind of operation needs to be done

Some users may not have any recommendations. For them, we have to clean up their data on the third-party email service. The rest should be updated with new recommendation data. This split happens on the fly, as users are grouped by task type.
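The task-creation step can be sketched in Python (the original script is PHP). This is a minimal illustration under stated assumptions: the `has_recommendations` lookup, the operation names `"clear"` and `"update"`, the task dictionary shape, and the default chunk size of 500 are all hypothetical. Users are grouped by task type as they stream past, and a task is emitted whenever a group fills a chunk.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical operation names; the original system's task types are not documented.
CLEAR = "clear"    # user has no recommendations: null their data on the email service
UPDATE = "update"  # user has recommendations: fetch, clean, and push them

def create_tasks(user_ids: Iterable[int],
                 has_recommendations: Callable[[int], bool],
                 chunk_size: int = 500) -> Iterator[Dict]:
    """Group users by task type on the fly and emit one task per chunk of user IDs."""
    buffers: Dict[str, List[int]] = {CLEAR: [], UPDATE: []}
    for user_id in user_ids:
        op = UPDATE if has_recommendations(user_id) else CLEAR
        buffers[op].append(user_id)
        if len(buffers[op]) == chunk_size:
            yield {"operation": op, "user_ids": buffers[op]}
            buffers[op] = []  # start a fresh chunk; the yielded list is left untouched
    # Flush the partially filled chunks left at the end.
    for op, ids in buffers.items():
        if ids:
            yield {"operation": op, "user_ids": ids}
```

Because `create_tasks` is a generator, each task can be pushed to the queue as soon as it is produced, without holding all 10 million user IDs in memory.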

Task creation also has a ramp-up feature. Processing a task is a heavy, time-consuming operation, so the pipeline services run on a Kubernetes cluster that autoscales with load. To give the autoscaler enough time to grow the cluster, we create tasks with a ramp-up period: the rate of task creation increases gradually over 1 hour. After that, we spawn tasks as fast as we can, because all services are scaled up and ready for the increased load from task processing.
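The ramp-up can be expressed as a cap on the cumulative number of tasks created so far. The sketch below assumes the creation rate grows linearly from `start_rate` to `peak_rate` tasks per second over the one-hour window; the linear shape, both rates, and the polling interval are illustrative assumptions, since the text only states that the rate increases over one hour. The clock and sleep functions are injected so the loop can be tested without waiting.

```python
import math
import time
from typing import Callable, Iterable, Optional

RAMP_SECONDS = 3600.0  # one-hour ramp-up window

def allowed_tasks(elapsed: float, start_rate: float, peak_rate: float,
                  ramp_seconds: float = RAMP_SECONDS) -> Optional[int]:
    """Cumulative number of tasks that may exist after `elapsed` seconds.

    Returns None once the ramp is over, meaning "no throttling".
    """
    if elapsed >= ramp_seconds:
        return None
    slope = (peak_rate - start_rate) / ramp_seconds
    # Integral of the linear rate (start_rate + slope * t) from 0 to elapsed.
    return math.floor(start_rate * elapsed + slope * elapsed ** 2 / 2)

def produce_with_ramp(tasks: Iterable, enqueue: Callable[[object], None],
                      start_rate: float, peak_rate: float,
                      ramp_seconds: float = RAMP_SECONDS,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep,
                      poll: float = 0.1) -> None:
    """Enqueue tasks, holding back whenever the ramp-up cap has been reached."""
    start = clock()
    for created, task in enumerate(tasks):
        while True:
            cap = allowed_tasks(clock() - start, start_rate, peak_rate, ramp_seconds)
            if cap is None or created < cap:
                break
            sleep(poll)  # wait so the autoscaler can catch up before more work is released
        enqueue(task)
```

Working with a cumulative cap rather than a per-second rate makes the producer self-correcting: if it falls behind (for example, while the queue is slow), it simply catches up to the cap instead of losing the missed budget.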

While tasks are being created, workers start picking them up and processing them. Processing starts by determining the task type. If a user doesn't have any recommendations, the worker goes straight to the third-party email service and nulls the recommendation data for that user. If the user has product recommendations, the worker needs to fetch them first.
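A worker's dispatch logic might look like the Python sketch below (the real workers are PHP). It assumes each task is a dict with hypothetical `operation` (`"clear"` or `"update"`) and `user_ids` keys; `fetch_recommendations` stands in for the Graph API call and `push_to_email_service` for the third-party API, both hypothetical callables.

```python
from typing import Callable, Dict, List, Optional

def process_task(task: Dict,
                 fetch_recommendations: Callable[[int], List[int]],
                 push_to_email_service: Callable[[int, Optional[List[int]]], None]) -> int:
    """Handle one task and return the number of users processed."""
    operation = task["operation"]
    for user_id in task["user_ids"]:
        if operation == "clear":
            # No recommendations: go straight to the email service and null the data.
            push_to_email_service(user_id, None)
        elif operation == "update":
            # Recommendations exist: fetch them first, then push them.
            push_to_email_service(user_id, fetch_recommendations(user_id))
        else:
            raise ValueError(f"unknown operation: {operation!r}")
    return len(task["user_ids"])
```

Injecting the two service calls keeps the dispatch logic independent of any particular HTTP client, so it can be exercised with in-memory stubs.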

To fetch this data, we call the Elixir Graph API service, which in our case acts as middleware between the tasks (or the frontend) and the recommendation storage. The recommendation data itself is not precalculated; it is held by the Python microservice, which multiplies matrices on the fly to calculate scores for candidate products and returns the results to the caller. After Graph API has called the Python microservice and received the data, the data needs to be processed and cleaned up, because the Python service only produces a list of recommended products and knows nothing else about the user. Cleaning up means removing items that the user has already bought, wishlisted, added to the cart, or even viewed elsewhere on the site. Graph API has access to this information, so it can easily filter out recommendations that should not be served to the user.
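The two computations in this paragraph can be sketched in Python. The scoring half assumes a matrix-factorization model in which each user and each product has a latent factor vector, so a user's scores are the products of the user vector with the item-factor matrix; the text only says the service multiplies matrices, so the model, the `top_k` cut-off, and all names here are assumptions. The cleanup half, which in the real system lives in Graph API, drops products the user has bought, wishlisted, added to the cart, or seen, while preserving score order.

```python
from typing import Dict, Iterable, List, Sequence, Set

def score_products(user_vector: Sequence[float],
                   item_factors: Dict[int, Sequence[float]],
                   top_k: int) -> List[int]:
    """Score every product as a dot product with the user vector; return the top_k IDs."""
    scores = {
        product_id: sum(u * v for u, v in zip(user_vector, factors))
        for product_id, factors in item_factors.items()
    }
    # Highest score first; ties broken by product ID for deterministic output.
    ranked = sorted(scores, key=lambda pid: (-scores[pid], pid))
    return ranked[:top_k]

def clean_recommendations(recommended: Iterable[int],
                          *exclusions: Iterable[int]) -> List[int]:
    """Remove products found in any exclusion group (bought, wishlisted, in cart, seen)."""
    excluded: Set[int] = set()
    for group in exclusions:
        excluded.update(group)
    return [pid for pid in recommended if pid not in excluded]
```

For example, `clean_recommendations(ranked, purchased, wishlisted, in_cart, seen)` filters a ranked list against all four kinds of user activity in a single pass.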

At this point we have the set of products recommended to each user in the task, whether it is nulled data or new recommendations. The only step left is to transfer this data to the third-party email sending service, which we do through its API.
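Before calling the email service's API, the per-user results have to be serialized. The sketch below builds JSON request bodies in batches. The field names (`users`, `user_id`, `recommended_products`), the use of JSON `null` to mark a user whose recommendations should be cleared, and the batch size of 1,000 are all hypothetical, because the third-party service and its API are not named here.

```python
import json
from typing import Dict, Iterator, List, Optional

def build_payloads(results: Dict[int, Optional[List[int]]],
                   batch_size: int = 1000) -> Iterator[str]:
    """Yield JSON request bodies, each holding at most batch_size user records."""
    records = [
        # None serializes to JSON null, used in this sketch to mean "clear this user's data".
        {"user_id": user_id, "recommended_products": products}
        for user_id, products in results.items()
    ]
    for start in range(0, len(records), batch_size):
        yield json.dumps({"users": records[start:start + batch_size]})
```

Batching is a common way to keep the number of API calls manageable at this scale; the right batch size depends on the limits of the actual email service.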

As a result, the email sending service knows whether each user has any recommended products and, if so, which ones, so it can easily send those lists by email on a regular basis.

Results

The Recommended Products Email Generation Pipeline transfers up-to-date recommendation data for 10 million users to the email sending service in 5 hours. This allows us to send “Resources You May Like” emails daily. All tasks related to composing emails have been delegated to the third-party email sending service, so once the recommendation data has been imported, emails can be sent to all users in one click.
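For scale, the stated figures imply the following average throughput, assuming an even rate over the whole 5-hour run (the ramp-up hour means the actual rate varies):

$$
\frac{10{,}000{,}000\ \text{users}}{5\ \text{h}} = 2{,}000{,}000\ \text{users/h}, \qquad \frac{10^{7}\ \text{users}}{5 \times 3600\ \text{s}} = \frac{10^{7}}{18{,}000\ \text{s}} \approx 556\ \text{users/s}
$$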