Intermittent DNS failures affecting some run executions
Resolved
Jan 23 at 04:19am GMT
Full service has been restored and task execution is back to normal. If your runs failed between 01:37 and 04:19 UTC, you can now retry them.
What happened: During a period of high activity, a backlog of completed runs built up faster than our cleanup processes could handle, which put pressure on internal services and caused intermittent failures.
What we did: We spun up additional cleanup capacity to clear the backlog and restore normal operation.
What we're doing next: We're increasing resource limits on critical internal services and adding better alerting so we can catch this earlier if it happens again.
Created
Jan 23 at 01:37am GMT
We are experiencing intermittent issues that may cause some task runs to fail. Automatic retries are in place and should recover most affected runs. Our team is actively working on a resolution.