Previous incidents
We're experiencing longer than normal queue times on v4
Resolved Aug 20 at 08:44pm BST
Queue times are back to normal. We had an unprecedented number of new v4 runs. We've adjusted our autoscaling rules across multiple services to account for this. We are looking into how to avoid this happening as v4 scales up.
Run log failures and cascading API failures
Resolved Aug 01 at 01:29am BST
This was resolved at 00:29am. Logs and the API started recovering once a valid partition was in place.
Runs are missing from the dashboard and runs.list is degraded
Resolved Jul 23 at 03:45pm BST
The dashboard/runs.list is back to normal. We're working on and deploying multiple changes that will reduce these kinds of issues and help prevent them from happening.
Runs are missing from the dashboard and runs.list is degraded
Resolved Jul 23 at 12:14am BST
The runs list is now fully operational. There is still missing data that we will be backfilling ASAP.
Some runs list calls impacted by ClickHouse server crashes
Resolved Jul 18 at 04:26pm BST
We've opened a case with ClickHouse Cloud to try and understand why this happened.
Batches with more than 20 runs are slow to process
Resolved Jul 15 at 03:15pm BST
Batches are processing as normal now. We have increased future capacity.
This was caused by a customer's runaway loop of batches, and this part of the system didn't have enough capacity to process them all fast enough.
We are updating how we process and rate-limit batches to prevent this from happening again, and improving our internal alerts in case similar issues happen in the future.
Realtime not processing updates
Resolved Jul 13 at 05:37pm BST
Realtime is sending updates again. The attached storage stopped working, and restarting the AWS task didn't fix it. A hard reset made it healthy again.
We're looking into how to prevent this from happening again.
Realtime not sending updates
Resolved Jul 04 at 10:34am BST
This is resolved – Realtime is sending updates again.
Restarting Electric released and reacquired the Postgres replication slot. We're discussing why this happened to try and prevent it in the future.