Previous incidents
Deployments failing due to upstream provider
Resolved Oct 28 at 09:20pm GMT
Depot resolved the issue on their side. Deployments are now working normally again.
us-east-1 slow dequeues for large machines
Resolved Oct 27 at 09:21pm GMT
The issue is now resolved. We scaled up massively to absorb a load spike from one of our customers. Mainly runs on large machines were impacted.
Dashboard runs list is delayed
Resolved Oct 22 at 12:29pm BST
We are still working with ClickHouse Cloud to identify the root cause of the issue, but we have pushed out a fix that has allowed the runs list to come back online and show live results in the meantime. We suspect the issue is related to our ClickHouse server being "rotated" into another AZ, which caused network degradation and caused our replication pipeline from PostgreSQL to ClickHouse to fall behind and become unable to keep up with changes. The fix was to stop sending payload data to ClickHouse wh...
Deploys are still impacted by the us-east-1 outage
Resolved Oct 20 at 10:00pm BST
Our remote build provider Depot.dev was able to fully recover after ongoing EC2 instance issues in us-east-1, where AWS had several regressions and limited service availability throughout the day.
Runs list and dashboard logs are impacted by the AWS us-east-1 outage
Resolved Oct 20 at 06:12pm BST
Our ClickHouse instance is back online and serving queries. The runs list is now working, but some requests syncing run state to ClickHouse are still queued and waiting to finish. We should be all caught up in about 5-10 minutes.
Realtime is slow to update
Resolved Oct 20 at 02:11pm BST
Realtime is now back to normal operation, serving live updates. We'll continue to monitor the situation.
AWS outage in us-east-1 is causing service disruption
Resolved Oct 20 at 10:33am BST
AWS's latest us-east-1 update stated that they are seeing "significant signs of recovery", and we're seeing the same in our worker cluster: image pulls are working again, and both cold and warm starts are processing as normal. We'll continue to closely monitor the situation.
us-east-1 slow dequeues for large machines
Resolved Oct 05 at 06:14pm BST
Runs on large-2x machines were processing more slowly due to extremely high volume. One of our customers caused an infinite loop in their large-machine tasks, which kept triggering more runs. Runs on small and medium machines were unlikely to have been impacted by this.
us-east-1 slow dequeues for cold starts
Resolved Oct 02 at 07:56pm BST
This resolved shortly after a problematic control plane rollout.