Queue Backlog Notification
Incident Report for Mergent
Postmortem

We encountered an unexpected surge in task submissions, leading to a significant backlog in our task queues. This unusual influx resulted in delays and a build-up of uninvoked tasks.

What Happened

The backlog was first detected at 12:41 PM PT (8:41 PM UTC), when our monitoring systems alerted us.

Upon investigation, we found that a combination of a sudden increase in task submissions and a bottleneck in our processing capabilities led to the backlog.

Response and Resolution

In response, we provisioned an additional 200 worker servers across AWS, Render, and Heroku. Heroku is a new, temporary addition to our existing infrastructure on AWS and Render. This expansion of our processing capacity allowed us to work through the backlog effectively.
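As a rough illustration of why added worker capacity clears a backlog, the sketch below estimates drain time as the backlog size divided by the net processing rate (capacity minus incoming submissions). The function name, the per-worker throughput, and every number here are hypothetical and chosen only for illustration; none of them are measurements from this incident.

```python
import math


def drain_time_seconds(backlog, arrival_rate, workers, per_worker_rate):
    """Estimate the seconds needed to clear a backlog.

    Rates are in tasks per second. Returns math.inf when capacity does not
    exceed the arrival rate, because the backlog would never shrink.
    """
    capacity = workers * per_worker_rate
    net_rate = capacity - arrival_rate
    if net_rate <= 0:
        return math.inf
    return backlog / net_rate


# Hypothetical figures for illustration only (not data from the incident).
backlog = 420_000        # queued tasks waiting to run
arrival_rate = 1_500.0   # new tasks submitted per second during the surge
per_worker_rate = 10.0   # tasks one worker completes per second

before = drain_time_seconds(backlog, arrival_rate, workers=100,
                            per_worker_rate=per_worker_rate)
after = drain_time_seconds(backlog, arrival_rate, workers=300,
                           per_worker_rate=per_worker_rate)

print(before)  # inf: 100 workers cannot keep up with the surge
print(after)   # 280.0 seconds once 200 workers are added
```

Under these assumed numbers, the original pool falls further behind every second, while the enlarged pool outpaces submissions and clears the queue in a few minutes. A real estimate would also need to account for variable task durations and changing submission rates.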

Although distributing worker servers across three cloud providers was meant to be a temporary solution, we are evaluating whether to keep this setup in place going forward.

In addition, we have added a new component, “Task Execution,” to our status page so that task execution status is tracked independently of API status.

We apologize for any inconvenience and appreciate your patience. Providing the world’s most reliable task queue remains our top priority.

Thank you for your continued support.

Mergent Team

Posted Dec 05, 2023 - 02:29 UTC

Resolved
This incident has been resolved.
Posted Dec 05, 2023 - 00:44 UTC
Monitoring
A fix has been implemented, and our estimates show that the backlog will be clear within the next 7 minutes.
Posted Dec 05, 2023 - 00:28 UTC
Update
We are continuing to work on a fix for this issue.
Posted Dec 04, 2023 - 23:52 UTC
Identified
We have detected an unusual surge in task submissions that has led to a backlog in our processing queues. Our team is actively investigating the cause of this surge and implementing measures to handle the increased load more efficiently.

This does not affect the API or the dashboard.
Posted Dec 04, 2023 - 20:41 UTC