[08:06:39] interesting thing - ml-serve-codfw alerted about api latencies, the issue seems to be related to some knative controllers getting stuck and being slow. I am pretty sure these are bugs in our version, which is too old (0.18.1), we need to get to a newer one asap (so k8s 1.23 asap as well :D)
[08:09:40] it doesn't happen in eqiad, so I suspect it is due to "manual" configs (I usually test and re-create pods in codfw, that is the only difference from eqiad)
[08:10:05] see https://grafana.wikimedia.org/d/000000435/kubernetes-api?var-site=codfw&var-cluster=k8s-mlserve&orgId=1&from=1668843647063&to=1668845385430
[08:10:14] this is after deleting some knative pods