All services are online

Last updated on Jun 14 at 09:49am BST

Uptime over the last 30 days is shown for: cloud, API, OpenTelemetry

Previous incidents

Jun 13, 2024

v3 runs are paused due to network issues


Resolved Jun 13 at 01:20pm BST

Runs are operating at full speed.

We think this issue was caused by the clean-up operations that clear completed pods. There are far more runs than a week ago, so the list of completed pods can get very large, which puts a strain on the system, including internal networking. We've increased the frequency of the clean-up and are monitoring the load, including networking. After 15 mins everything seems normal.
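To illustrate why a more frequent clean-up helps, here is a rough sketch of the arithmetic. It assumes a steady completion rate, and the function name and all numbers are hypothetical, not measurements from this incident.

```python
def pods_per_cleanup_pass(completed_per_minute: float, interval_minutes: float) -> float:
    """Approximate number of completed pods that pile up between two clean-up passes.

    Assumes a constant completion rate (a simplification for illustration).
    """
    return completed_per_minute * interval_minutes


# Hypothetical figures: 100 pods complete per minute.
hourly = pods_per_cleanup_pass(100, 60)   # one clean-up pass per hour
frequent = pods_per_cleanup_pass(100, 5)  # one clean-up pass every 5 minutes
print(hourly, frequent)
```

Under these assumptions, each pass handles a list that grows in proportion to the interval, so shortening the interval keeps the list small even when the number of runs goes up.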


Jun 12, 2024

v3 runs have stopped


Resolved Jun 12 at 11:10pm BST

v3 runs are now executing again.

Networking was down because of an issue with BPF. While networking was down, tasks couldn't heartbeat back to the platform. If the platform doesn't receive a heartbeat from a run every 2 mins, that run fails. Fewer than 500 runs in total failed because of this.
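The heartbeat rule can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the function names and data shapes are hypothetical, and only the 2 minute limit comes from the note above.

```python
from datetime import datetime, timedelta, timezone

# From the incident note: a run fails if no heartbeat arrives for 2 minutes.
HEARTBEAT_TIMEOUT = timedelta(minutes=2)


def is_run_failed(last_heartbeat: datetime, now: datetime,
                  timeout: timedelta = HEARTBEAT_TIMEOUT) -> bool:
    """Return True when the gap since the last heartbeat exceeds the timeout."""
    return now - last_heartbeat > timeout


def find_stale_runs(last_heartbeats: dict[str, datetime], now: datetime) -> list[str]:
    """Return the IDs of runs whose last heartbeat is older than the timeout."""
    return sorted(
        run_id for run_id, seen in last_heartbeats.items()
        if is_run_failed(seen, now)
    )


# Hypothetical example: one run is still heartbeating, one went silent.
now = datetime(2024, 6, 12, 22, 0, tzinfo=timezone.utc)
heartbeats = {
    "run_a": now - timedelta(seconds=30),
    "run_b": now - timedelta(minutes=5),
}
print(find_stale_runs(heartbeats, now))
```

With networking down, every running task looks like `run_b` here: the task may still be healthy, but its heartbeats never reach the platform.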

You can filter by status "System Failure" in the runs list to find these runs. To bulk replay them, select all on the page, move to the next page and select all again, then replay them using the bottom bar.


Jun 10, 2024

v2 runs are slower than normal to start


Resolved Jun 10 at 01:30pm BST

v2 p95 start times have been under 2s for 10 mins, so we're resolving this issue.
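A check like "p95 under 2s for 10 mins" could be expressed as in the sketch below. It assumes start times are grouped into one list of samples per minute and uses the nearest-rank percentile method. Both are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    if not values:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(values)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def can_resolve(per_minute_start_times: list[list[float]],
                threshold_s: float = 2.0,
                window_minutes: int = 10) -> bool:
    """True when every one of the last `window_minutes` minutes has samples
    and a p95 start time below `threshold_s` seconds."""
    if len(per_minute_start_times) < window_minutes:
        return False
    recent = per_minute_start_times[-window_minutes:]
    return all(len(s) > 0 and p95(s) < threshold_s for s in recent)


# Hypothetical data: ten healthy minutes, then the same with a slow final minute.
healthy = [[0.5, 0.8, 1.2, 1.9] for _ in range(10)]
degraded = healthy[:9] + [[0.5, 0.8, 3.0]]
print(can_resolve(healthy), can_resolve(degraded))
```

Requiring the whole window to pass, rather than just the latest minute, avoids resolving an incident during a brief dip in start times.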

We think this is because there are a lot of schedules that send an event at midday UTC on a Monday. We're looking into what we can do about that.
