[14:36:44] FIRING: LiftWingServiceErrorRate: ...
[14:36:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=fiwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[18:36:44] FIRING: LiftWingServiceErrorRate: ...
[18:36:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=fiwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[18:55:57] firiiiiiing
[19:31:44] RESOLVED: LiftWingServiceErrorRate: ...
[19:31:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=fiwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[19:48:44] I had to do a dummy commit to be able to redeploy the service, which resolved the issue
[19:49:28] CPU utilization reached 100%
[19:52:20] I attached an incident record to the related task https://phabricator.wikimedia.org/T363336#10204784
[19:59:51] Lift-Wing, Machine-Learning-Team: Set up a livenessProb to kill throttled isvcs - https://phabricator.wikimedia.org/T376543 (isarantopoulos) NEW
[20:00:35] also added the above task as a conversation starter so we can discuss a solution (maybe not this one)
[20:00:37] * isaranto afk!