[07:06:27] (PS4) AikoChou: reference-quality: add reference-risk model [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1076163 (https://phabricator.wikimedia.org/T372405)
[07:13:48] isaranto: ack!
[07:13:56] morning folks :)
[07:43:35] (PS1) AikoChou: locust: add reference-risk model [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1077310 (https://phabricator.wikimedia.org/T372405)
[07:45:18] (PS2) AikoChou: locust: update load testing result for reference_quality [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1077069
[09:07:58] (CR) Kevin Bazira: "Thank you for working on this, Aiko. I've tested the ref-risk isvc locally, and it ran successfully:" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1076163 (https://phabricator.wikimedia.org/T372405) (owner: AikoChou)
[09:19:05] Morning!
[09:39:28] (CR) Kevin Bazira: [C:+1] locust: update load testing result for reference_quality [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1077069 (owner: AikoChou)
[10:40:34] * aiko lunch!
[11:26:40] ditto
[13:50:17] (PS1) Kevin Bazira: article-country: containerize model-server [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1077391 (https://phabricator.wikimedia.org/T371897)
[13:55:15] (CR) AikoChou: "I had another round of review and have a few more suggestions for your implementation. I also resolved some of Ilias's comments that I bel" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1075033 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[14:01:47] sorry 5 mins late
[14:02:34] (CR) Kevin Bazira: "I built the model-server image locally and the largest layer is ~495MB as shown here:" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1077391 (https://phabricator.wikimedia.org/T371897) (owner: Kevin Bazira)
[15:12:55] (CR) AikoChou: reference-quality: add reference-risk model (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1076163 (https://phabricator.wikimedia.org/T372405) (owner: AikoChou)
[15:13:55] (CR) AikoChou: [C:+2] locust: update load testing result for reference_quality [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1077069 (owner: AikoChou)
[15:14:09] (CR) AikoChou: [V:+2 C:+2] locust: update load testing result for reference_quality [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1077069 (owner: AikoChou)
[16:00:45] (PS5) AikoChou: reference-quality: add reference-risk model [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1076163 (https://phabricator.wikimedia.org/T372405)
[17:09:44] FIRING: LiftWingServiceErrorRate: ...
[17:09:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=fiwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[17:34:44] RESOLVED: LiftWingServiceErrorRate: ...
[17:34:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=fiwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[19:54:14] FIRING: ErrorBudgetBurn: - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[23:54:14] FIRING: ErrorBudgetBurn: - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn