[05:54:08] Good morning! No alerts this weekend \o/
[06:06:27] first deployment for "dummy" articlequality model https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1046133
[08:57:02] Machine-Learning-Team, DC-Ops, ops-codfw, SRE: hw troubleshooting: memory errors during boot for ml-staging2001.codfw.wmnet - https://phabricator.wikimedia.org/T366670#9897295 (klausman) Tuesday sounds good. I'll drain and shutdown the machine on Tuesday 17:00 CEST/15:00 UTC/10:00CDT, does that w...
[09:15:12] morning!
[09:15:25] no alerts! nice \o/
[09:16:38] Morning everyone :)
[09:37:23] Morning folks!
[09:37:50] * isaranto afk lunch and errand!
[10:10:23] Machine-Learning-Team, Moderator-Tools-Team, Research, Temporary accounts, Trust and Safety Product Team: RevertRisk model readiness for temporary accounts - https://phabricator.wikimedia.org/T352839#9897693 (MunizaA) Open→Resolved @kostajh Liftwing is now running version 0.8.0 of...
[10:32:39] * klausman lunch as well
[12:40:34] Machine-Learning-Team, Research, Patch-For-Review: Add Article Quality Model to LiftWing - https://phabricator.wikimedia.org/T360455#9898352 (isarantopoulos) Dummy version has been deployed on ml-staging-codfw experimental. It is just a dummy service that returns the json input passed in the POST req...
[12:44:47] hello folks!
[12:45:17] good job :) I don't want to ruin the party, but don't lower the attention to the problem since it is still there :(
[12:45:49] o/ Luca!
[12:46:05] no party here, just some temporary relief :)
[12:49:09] yes yes I know :)
[13:44:38] Machine-Learning-Team, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q3:rack/setup/install ml-staging2003 - https://phabricator.wikimedia.org/T357415#9898750 (Jhancock.wm)
[14:23:53] Machine-Learning-Team, Observability-Metrics, Patch-For-Review: SLO dashboards for Lift Wing showing unexpected values - https://phabricator.wikimedia.org/T359879#9898929 (herron) >>! In T359879#9875881, @elukey wrote: > @herron let's double check, maybe we can drop the secondary rules and keep going...
[14:25:29] Good morning all
[14:28:44] hey Chris!
[16:00:50] (PS1) AikoChou: articlequality: add feature preprocess [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1046720 (https://phabricator.wikimedia.org/T360455)
[16:06:22] ---^ submit an initial ver for feature preprocessing. It can work, returns normalised features :)
[16:19:37] thanks aiko, will review!
[16:20:01] I'm having some issues with RESTBase/Gateway/Rest api
[16:20:44] I'm just updating the task with more info!
[16:28:04] ack!
[16:38:13] Machine-Learning-Team, Research, Patch-For-Review: Add Article Quality Model to LiftWing - https://phabricator.wikimedia.org/T360455#9899838 (isarantopoulos) I'm trying to migrate the following example request to be used by the service: ` response = request.get("https://en.wikipedia.org/w/rest.php/v...
[16:47:38] (CR) Ilias Sarantopoulos: "Just a missing requirement. Other than that LGTM!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1046720 (https://phabricator.wikimedia.org/T360455) (owner: AikoChou)
[16:48:49] aiko: so I'm going to replace the get_article_html function from the utils.py once I figure out the best way to do it
[16:49:14] going afk, going to dig a bit deeper tomorrow!
[17:49:49] Machine-Learning-Team, Research, Patch-For-Review: Add Article Quality Model to LiftWing - https://phabricator.wikimedia.org/T360455#9900244 (Isaac) > The above would return a json that contains an "html" entry which we could use. The issue is that it seems to not be supported by the REST Gateway (e....
[18:09:44] FIRING: LiftWingServiceErrorRate: ...
[18:09:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=arwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[18:11:27] Sorry I jinxed it
[18:44:44] RESOLVED: LiftWingServiceErrorRate: ...
[18:44:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=arwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[19:10:28] lol
[22:54:46] whaaat is happening