Run log failures and cascading API failures
Resolved
Aug 01 at 01:29am BST
This was resolved at 00:29 UTC. Logs and the API started recovering once a valid partition was in place.
Affected services
Trigger.dev cloud
Trigger.dev API
Trigger.dev OpenTelemetry
Realtime
Created
Aug 01 at 01:00am BST
Runs created after midnight UTC on August 1st failed to have their run logs written. This lasted from 00:00 to 00:29 UTC and affected logs for runs created during that window.
The issue also caused a spike in open database connections, which led to other API failures during this window, especially from 00:21 to 00:29 UTC.
This issue was caused by an invalid table partition for the run logs table, which meant inserts failed. We are looking at how we can prevent this from happening again in future.
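As an illustration only, one way to catch this class of problem ahead of time is to check that the partition ranges cover the near future without a gap. The sketch below is not our actual fix. It assumes time-range partitions described as half-open (start, end) pairs in UTC, and the function name, the lookahead parameter, and the example dates (including the year) are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def find_coverage_gap(partitions, now, lookahead):
    """Return the first instant in [now, now + lookahead) that no partition covers.

    partitions: list of (start, end) half-open ranges, timezone-aware datetimes.
    lookahead: positive timedelta describing how far ahead to check.
    Returns None when the whole window is covered.
    """
    cursor = now
    horizon = now + lookahead
    for start, end in sorted(partitions):
        if end <= cursor:
            continue  # partition ends before the point we still need to cover
        if start > cursor:
            return cursor  # nothing covers the cursor, so inserts here would fail
        cursor = end  # covered up to the end of this partition
        if cursor >= horizon:
            return None
    return cursor if cursor < horizon else None


# Hypothetical example: only a July partition exists shortly before midnight.
utc = timezone.utc
july = (datetime(2024, 7, 1, tzinfo=utc), datetime(2024, 8, 1, tzinfo=utc))
august = (datetime(2024, 8, 1, tzinfo=utc), datetime(2024, 9, 1, tzinfo=utc))
now = datetime(2024, 7, 31, 23, 0, tzinfo=utc)

gap_without_august = find_coverage_gap([july], now, timedelta(hours=2))
gap_with_august = find_coverage_gap([july, august], now, timedelta(hours=2))
```

Run on a schedule with a lookahead of several days, a check like this could raise an alert while there is still time to create the missing partition, rather than at the moment inserts start failing.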