Previous incidents

November 2025
No incidents reported
October 2025
Oct 22, 2025
1 incident

Dashboard runs list is delayed

Degraded

Resolved Oct 22 at 12:29pm BST

We are still working with ClickHouse Cloud to identify the root cause of the issue, but we have pushed out a fix that has brought the runs list back online and shows live results in the meantime. We suspect the issue is related to our ClickHouse server being "rotated" into another AZ, which degraded network performance and caused our replication pipeline from PostgreSQL to ClickHouse to fall behind and become unable to keep up with changes. The fix was to stop sending payload data to ClickHouse wh...

Oct 20, 2025
4 incidents

Deploys are still impacted by the us-east-1 outage

Degraded

Resolved Oct 20 at 10:00pm BST

Our remote build provider, Depot.dev, has fully recovered from ongoing EC2 instance issues in us-east-1. AWS had several regressions and limited service availability throughout the day.

Runs list and dashboard logs are impacted by the AWS us-east-1 outage

Degraded

Resolved Oct 20 at 06:12pm BST

Our ClickHouse instance is back online and serving queries. The runs list is now working, but some requests that sync run state to ClickHouse are still queued and waiting to finish. We expect to be fully caught up in about 5-10 minutes.

Realtime is slow to update

Degraded

Resolved Oct 20 at 02:11pm BST

Realtime is now back to normal operation, serving live updates. We'll continue to monitor the situation.

AWS outage in us-east-1 is causing service disruption

Degraded

Resolved Oct 20 at 10:33am BST

The latest AWS update for us-east-1 states that AWS is seeing "significant signs of recovery", and we're seeing the same in our worker cluster: image pulls are working again, and both cold and warm starts are processing normally. We'll continue to closely monitor the situation.

Oct 05, 2025
1 incident

us-east-1 slow dequeues for large machines

Degraded

Resolved Oct 05 at 06:14pm BST

Runs on large-2x machines were processing more slowly due to an unusually high volume of runs. One customer's large-machine tasks entered an infinite loop that kept triggering more runs. Runs on small and medium machines were unlikely to have been affected.

Oct 02, 2025
1 incident

us-east-1 slow dequeues for cold starts

Degraded

Resolved Oct 02 at 07:56pm BST

This resolved shortly after a problematic control plane rollout.

September 2025
Sep 26, 2025
1 incident

us-east-1 runs are slow to start executing

Degraded

Resolved Sep 26 at 06:35am BST

Service in us-east-1 has fully recovered.

Sep 24, 2025
1 incident

Large machines are slow to dequeue runs

Degraded

Resolved Sep 24 at 05:30pm BST

Larger machines are now dequeuing faster.

We are going to change the packing algorithm we use, which should make this much less likely to happen in the future. That work will begin this week.

Sep 15, 2025
2 incidents

Slower dequeuing and API responses

Degraded

Resolved Sep 15 at 06:40pm BST

The database load is back to normal.

We're still investigating why this happened. Our leading theory at the moment is an autovacuum issue, possibly related to transaction ID wraparound.

V4 runs are slow to dequeue in the us-nyc-3 region

Degraded

Resolved Sep 15 at 01:45pm BST

This issue had two causes. First, our cloud provider in us-nyc-3 was taking an abnormally long time to spin up new servers and was running into capacity issues. Second, some runs got stuck after restoring from a snapshot and did not complete within 2 minutes; this also happened mostly in us-nyc-3.

Sep 10, 2025
1 incident

eu-central-1 runs are slow to dequeue

Degraded

Resolved Sep 10 at 06:18pm BST

We were seeing crashes on multiple servers in the EU only. They were caused by an out-of-memory issue in our "supervisor", which meant some servers weren't dequeuing consistently. We've changed some settings to allow for more memory. We're still investigating why these supervisor processes were using too much memory and crashing, and we're monitoring the situation. Runs in us-east-1 were not affected.
