Creating Webhooks Notification System

Problem

The ability to send a high volume of emails without maintaining an in-house email delivery server.

Goals and challenges:

  • Track user interaction with the email (up to 10K notifications per minute)
  • Reduce load on the 3rd party APP that sends a high number of emails daily
  • Webhook receiver’s downtime shouldn’t affect main application flow
  • The system has to be scalable and guarantee delivery within one day when the hook target isn’t available
  • Convenient and timely email delivery status reporting, with this info displayed on a dashboard
  • Service is easy to use and customize to meet changing requirements
  • Availability and comprehensiveness of documentation

How WE helped

We implemented a hooks delivery system with Amazon Simple Notification Service (SNS) and a proxy server that processes subscribe and redirect requests. Little coding was required, and responsibility for webhook delivery is assigned to an external service.

Results:

  • Email delivery system with the user’s activity tracking feature that processes a high number of events (up to 20K notifications per minute)
  • Information about user interaction with the email is captured and stored
  • Redelivery solution is available when the webhook receiver is unreachable
  • System is separated and does not affect main app services
  • Solution is easy to scale
  • 2 times higher level of throughput

Implementation process

To achieve the project objectives, we considered several implementation options:

  • Naive implementation
  • Implementation via SQS Queue
  • Webhooks delivery by SNS Notification

We have the following components that are responsible for events in our system:

  1. Event consumer
    Polls the queue and decides what to do with events.
  2. Event storage
    Stores all events in our system.
  3. 3rd party APP
    Has to be informed about events related to the APP account.

Naive implementation

The simplest way to inform the 3rd party APP is to make an HTTP request to the hook URL directly from the consumer, as displayed on the following diagram:

Pros:

  • Easy to implement

Cons:

  • Main application flow is affected: if the 3rd party APP doesn't respond, we receive an error in our internal consumer
  • Consumer cannot be scaled separately from the webhooks that are sent to the 3rd party APP
  • Impossible to implement webhook redelivery correctly
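
The coupling behind the first drawback can be sketched roughly as follows. This is only an illustration: `consumeEvent`, `storeEvent`, `postHook` and `postHookWithFetch` are hypothetical names, and the global `fetch` of Node.js 18+ is assumed for the HTTP call.

```javascript
// Naive approach: the consumer calls the hook URL itself.
// `storeEvent` and `postHook` are hypothetical stand-ins for the event
// storage and for an HTTP POST to the 3rd party APP.
async function consumeEvent(event, { storeEvent, postHook }) {
  await storeEvent(event);
  // The hook call sits inside the main flow: if it rejects or hangs,
  // the consumer fails or stalls with it.
  await postHook(event.hookUrl, event);
  return 'processed';
}

// One possible postHook, built on the global fetch (assumes Node.js 18+).
async function postHookWithFetch(url, event) {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(event),
  });
  if (!res.ok) throw new Error(`hook responded with ${res.status}`);
}
```

Any failure of the hook receiver surfaces as a rejected promise inside the consumer, which is exactly the dependency we wanted to avoid.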

Accordingly, we concluded that this approach would neither give us the expected results nor fully satisfy our needs.
Therefore we started to consider other options.

Implementation via SQS Queue

We were using a lot of AWS services at that moment and already had a number of SQS messaging queues. The main idea of this solution is to use a separate queue for hooks (webhooks) and a dedicated consumer to process this queue. Take a look at the schema that shows how the SQS queue works:

This way our main app flow does not depend on the working state of the external APP, and we are able to implement retries via the SQS queue.

The retry flow is as follows:

  1. Pull webhook events from the queue
  2. Send an HTTP request with the events in the body to the 3rd party APP
  3. If the request completed successfully, remove the message from the queue
  4. If the request failed, retry with the specified delay and send the message once again
  5. Set the maximum receive count equal to the delivery retry count
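
The steps above can be sketched as a small decision routine. It is a simplified model rather than our production consumer: the `queue` object with `pull`, `remove` and `retryLater`, the `sendToHook` callback, the doubling delay and the retry count of 5 are all assumptions made for illustration. Only the 15-minute ceiling comes from the SQS limit discussed in this section.

```javascript
const MAX_DELAY_SECONDS = 15 * 60; // delay ceiling discussed in this section
const MAX_RECEIVE_COUNT = 5;       // assumed delivery retry count

// Delay before the next attempt: doubles each time, capped at the ceiling.
function retryDelaySeconds(receiveCount, baseSeconds = 30) {
  return Math.min(baseSeconds * 2 ** (receiveCount - 1), MAX_DELAY_SECONDS);
}

// Decide what to do with one pulled message once the HTTP call has finished.
function nextAction(message, requestSucceeded) {
  if (requestSucceeded) return { action: 'delete' };
  if (message.receiveCount >= MAX_RECEIVE_COUNT) return { action: 'give-up' };
  return { action: 'retry', delaySeconds: retryDelaySeconds(message.receiveCount) };
}

// One pass of the flow: pull, send, then delete or schedule a retry.
// `queue` and `sendToHook` are hypothetical interfaces.
async function processBatch(queue, sendToHook) {
  for (const message of queue.pull()) {
    let ok = false;
    try {
      ok = await sendToHook(message.body);
    } catch {
      ok = false; // network errors count as failed deliveries
    }
    const decision = nextAction(message, ok);
    if (decision.action === 'retry') {
      queue.retryLater(message, decision.delaySeconds);
    } else {
      queue.remove(message); // delivered, or retries exhausted
    }
  }
}
```

With a 30-second base, the delay hits the 15-minute cap on the sixth attempt, which shows why a longer backoff schedule is out of reach with this approach.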

The flow is displayed on the diagram below:

Pros:

  • Main application flow is not affected
  • Redelivery can be easily implemented
  • Main event consumer can be scaled independently

Cons:

  • More complex to implement
  • Max redelivery delay is 15 minutes (since the max delay in SQS queues is 15 minutes), thus we can’t implement logarithmic retries for our messages
  • Retry logic and responsibility for webhook delivery have to be managed manually

This option was much better; however, not all of our primary goals would be met by an implementation via SQS Queue.

Webhooks delivery by SNS Notification

Another way to implement webhooks delivery is to use AWS SNS. The service was developed for delivering notifications and can send them via HTTP, the same way webhooks do. Also, if notification delivery fails, SNS is responsible for retrying delivery, and we can choose a backoff function (for example, linear or exponential) to manage retry frequency.
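
To give an idea of what such a retry configuration looks like, here is a sketch of an SNS delivery policy for an HTTP/S subscription. The field names follow the SNS delivery policy format, where the backoff functions are named `linear`, `arithmetic`, `geometric` and `exponential`. Every number is an assumed example rather than the setting we used, the helper `backoffPhaseRetries` is hypothetical, and the allowed limits should be checked against the AWS documentation.

```javascript
// Illustrative retry settings; all numbers are assumptions.
const deliveryPolicy = {
  healthyRetryPolicy: {
    minDelayTarget: 5,     // seconds, shortest delay between retries
    maxDelayTarget: 120,   // seconds, longest delay between retries
    numRetries: 15,        // total retries across all phases
    numNoDelayRetries: 2,  // immediate retries
    numMinDelayRetries: 3, // retries spaced by minDelayTarget
    numMaxDelayRetries: 5, // retries spaced by maxDelayTarget
    backoffFunction: 'exponential', // or 'linear', 'arithmetic', 'geometric'
  },
};

// Retries left for the backoff phase, where the delay grows from
// minDelayTarget towards maxDelayTarget along the chosen function.
function backoffPhaseRetries(policy) {
  const p = policy.healthyRetryPolicy;
  return p.numRetries - p.numNoDelayRetries - p.numMinDelayRetries - p.numMaxDelayRetries;
}

// The policy is handed to SNS as a JSON string
// (subscription attribute "DeliveryPolicy").
const deliveryPolicyJson = JSON.stringify(deliveryPolicy);
```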

This solution would give us the desired outcome; however, it cannot be applied directly for these two reasons:

  1. SNS sends a confirmation request to the subscribed endpoint (the webhook destination)
  2. SNS adds redundant metadata to delivered messages

SNS was designed for communication between components within one system, not between components of different systems with different owners. Hence, we need a proxy component inside our system that SNS can communicate with.

Let's take a look at the scheme of proposed webhooks implementation using SNS:

Flow of hooks delivery using SNS:

  1. The consumer sends events to a certain SNS topic
  2. We subscribe the 3rd party APP to the topic; the subscription URL points to our proxy and contains the endpoint that we need to redirect to (http://proxy.example.com/http%3A%2F%2Fwebhook.endoint%2Ftarget)
    P.S.: in our implementation we encoded the target URL with Base64 instead of the simple URL encoding shown above
  3. When the proxy receives a subscription confirmation request, it sends a request to SNS with the proper confirmation token
  4. When the proxy receives a request with a notification (an event in our case), it formats the event data and forwards it to the URL contained in the path part of the request URL
  5. The response from the hook endpoint is passed back as the response to the SNS request
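
A rough sketch of how a proxy can handle steps 2 to 5 is shown below. It is not our actual proxy code: the function names are hypothetical, the URL-safe `base64url` variant (Node.js 15.7+) is an assumption chosen so the encoded target never contains `/`, and the global `fetch` of Node.js 18+ is assumed for outgoing calls. The `x-amz-sns-message-type` header and the `SubscribeURL` and `Message` body fields are the ones SNS sends to HTTP/S endpoints.

```javascript
// Target URL <-> path segment of the proxy URL.
function encodeTarget(targetUrl) {
  return Buffer.from(targetUrl, 'utf8').toString('base64url');
}

function decodeTarget(pathSegment) {
  return Buffer.from(pathSegment, 'base64url').toString('utf8');
}

// Decide what the proxy should do with one incoming SNS request.
// `headers` holds lower-cased HTTP headers, `rawBody` is the body text.
function planSnsRequest(path, headers, rawBody) {
  const target = decodeTarget(path.replace(/^\//, ''));
  const body = JSON.parse(rawBody);
  const type = headers['x-amz-sns-message-type'];

  if (type === 'SubscriptionConfirmation') {
    // SubscribeURL already carries the confirmation token; visiting it
    // confirms the subscription, so the hook receiver never sees this request.
    return { action: 'confirm', url: body.SubscribeURL };
  }
  if (type === 'Notification') {
    // Drop the SNS envelope and forward only the event itself.
    return { action: 'forward', url: target, payload: body.Message };
  }
  return { action: 'ignore' };
}

// Execute the plan and return the status code to send back to SNS.
async function handleSnsRequest(path, headers, rawBody) {
  const plan = planSnsRequest(path, headers, rawBody);
  if (plan.action === 'confirm') {
    const res = await fetch(plan.url);
    return res.status;
  }
  if (plan.action === 'forward') {
    const res = await fetch(plan.url, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: plan.payload,
    });
    return res.status; // a failure status lets SNS apply its retry policy
  }
  return 200;
}
```

Because everything needed for the redirect is derived from the request itself, a handler of this kind keeps no state between requests.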

The schema of the proxy API is shown below:

Pros:

  • Main application flow isn’t affected by the state of hooks receiver application
  • The responsibility for message delivery is shifted to SNS
  • Redelivery strategy is already implemented by AWS
  • The simple proxy has good throughput, is very lightweight and can be scaled
  • Little code is needed (less code, fewer bugs)

Cons:

  • The proxy has to be developed

Solution

After considering each implementation option in terms of the benefit it would bring and the resources required, we chose to go with webhooks delivery by SNS Notification. With SNS and a simple proxy server we implemented a hooks delivery system with minimal coding and shifted responsibility for webhook delivery to an external service. As a result, we are able to send about 20,000 events per minute without any scaling, which is even higher than the planned 10,000 events per minute. The major bottleneck is the proxy, but we designed it for scaling, so throughput can easily be increased.

Additional info

We wrote our proxy on the Node.js platform. The proxy is hosted on an EC2 t2.small instance and can process about 4,000 concurrent requests with a throughput of about 400 req/sec. The proxy doesn’t interact with the database, since all data needed for the redirect is passed in the URL. This gives us the option of horizontal scaling. If we place our proxy behind a load balancer, we will be able to add more server instances and, as a result, obtain 2 times higher throughput.

Vertical scaling won’t give us the same effect. When we scale Node.js processes within one server instance, we get about 30% higher throughput for each additional processor core. Therefore horizontal scaling is the more suitable and effective approach.
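
The comparison can be made concrete with a bit of arithmetic. The sketch below uses the 400 req/sec and 30% figures quoted above. The linear growth per instance and the additive 30% gain per extra core are simplifying assumptions, not measurements.

```javascript
const REQ_PER_SEC_PER_INSTANCE = 400; // figure quoted above
const GAIN_PER_EXTRA_CORE = 0.3;      // "about 30%" per additional core

const eventsPerMinute = (reqPerSec) => reqPerSec * 60;

// Horizontal: assume throughput grows linearly with the instance count.
const horizontal = (instances) => REQ_PER_SEC_PER_INSTANCE * instances;

// Vertical: assume each extra core adds 30% of the single-core figure.
const vertical = (cores) =>
  Math.round(REQ_PER_SEC_PER_INSTANCE * (1 + GAIN_PER_EXTRA_CORE * (cores - 1)));

console.log(eventsPerMinute(horizontal(1))); // 24000
console.log(horizontal(2)); // 800
console.log(vertical(2));   // 520
```

Under these assumptions a second instance doubles throughput to about 800 req/sec, while a second core on the same instance yields only about 520 req/sec.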

We don’t need to worry about scaling SNS because AWS does it automatically; we only have to send the proper number of messages.