Dashboard runs list is delayed
Resolved
Oct 22 at 12:29pm BST
We are still working with ClickHouse Cloud to identify the root cause, but we have pushed out a fix that has brought the runs list back online and showing live results in the meantime. We suspect the issue is related to our ClickHouse server being "rotated" into another availability zone (AZ), which caused network degradation and left our replication pipeline from PostgreSQL to ClickHouse unable to keep up with changes. The fix was to stop sending payload data to ClickHouse. That data is not currently used, although we had planned to use it in the future. Turning it off while we investigate the underlying cause seems like a good tradeoff, and we suspect it will prevent this issue going forward.
Affected services
Updated
Oct 22 at 11:26am BST
We're working with ClickHouse Cloud to find the root cause of the issue. In the meantime, we are working on a temporary mitigation that we hope will help. Some runs may be temporarily missing from the dashboard, but they will be backfilled into ClickHouse once the issue has been addressed.
Created
Oct 22 at 10:23am BST
We're currently experiencing an issue with our ClickHouse cluster that is causing the runs list in the dashboard and the runs.list() endpoint to return stale data, because inserts into ClickHouse have degraded p95 latencies. We're investigating the issue. Run executions are unaffected.