Significant disruption to run starts (runs stuck in queueing)
Resolved
Mar 07 at 09:15pm GMT
We are confident that most queues have caught up again but are still monitoring the situation.
If you are experiencing unexpected queue times, this is most likely due to plan or custom queue limits. Should this persist, please get in touch.
Affected services
Trigger.dev API
Updated
Mar 07 at 08:56pm GMT
The service is stable again and metrics are looking good. We're still catching up with a backlog of runs. You may see increased queue times until this is fully resolved. We'll keep you updated.
Affected services
Trigger.dev API
Updated
Mar 07 at 08:16pm GMT
We managed to clear the huge backlog of VMs that had completed but hadn't been cleaned up as they normally are. This was causing a number of issues, including the initial drop in runs starting and a further drop between 8:00 and 8:13pm UTC.
There's a backlog of runs, and it will unfortunately take some time for everything to catch up. We're actively monitoring the situation and doing what we can to improve throughput.
Affected services
Trigger.dev API
Updated
Mar 07 at 07:30pm GMT
Runs are executing at normal speed (as of 7.30pm UTC). There is a backlog of runs to work through.
There's still a problem we're looking at where completed run VMs aren't being cleared properly. We believe this is what caused the issue in the first place. It hasn't happened before, and there have been no code changes.
Affected services
Trigger.dev API
Updated
Mar 07 at 07:17pm GMT
A significant proportion of runs are not starting. There is an issue in our worker cluster that we are trying to diagnose.
There have been no deployments today. We're unsure at this point whether this is a cloud provider issue.
Affected services
Trigger.dev API
Created
Mar 07 at 06:48pm GMT
There is an issue in the worker cluster that we are investigating.
This is causing runs to be slow to start.
Affected services
Trigger.dev API