[06:53:56] Ouch --^
[06:53:59] o/
[07:24:56] whenever someone has time please take a look here https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1039776
[07:25:09] I'll deploy it when it is ready
[07:39:33] I also set maxReplicas back to the default value (6) as we don't need 8 https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1039980
[07:39:45] for viwiki-reverted that is
[08:20:31] +1
[08:20:45] good morning o/
[08:29:49] o/ thanks!
[08:34:19] * isaranto afk for a bit
[09:15:08] deploying the changes now!
[09:36:33] staging done, deploying to prod now
[09:43:01] all done! double-checking that all pods are in the desired state as this affects many services
[10:10:58] all seems good!
[10:26:01] nice!
[10:26:44] Machine-Learning-Team, Patch-For-Review: Apply multi-processing to preprocess() in isvcs that suffer from high latency - https://phabricator.wikimedia.org/T349274#9870565 (isarantopoulos) Applied multiprocessing to `eswiki-damaging` and `viwiki-reverted` but only for large revisions (to avoid cpu throttl...
[10:38:52] * isaranto lunch!
[12:34:27] I also deployed the change that reduces maxReplicas in viwiki-reverted from 8 to 6 (the default for revscoring-editquality-reverted)
[13:29:29] \o/
[13:37:35] 🤞
[13:39:01] isaranto: wdyt about trying https://phabricator.wikimedia.org/T363336#9855485 ?
[13:39:29] in theory we could select some models + rev-id combinations, and check before/after
[13:39:43] from some quick tests it seems not affecting prediction values
[13:39:47] but shaving down the total time
[13:45:21] elukey: yes we mentioned that we'd try this as well but I didn't create the ticket to track it yet!
[13:45:39] oook! I can help in case!
[13:49:34] thanks for offering! we'll ping you once we start working on it
[13:50:19] at least for feedback since you've worked on this already
[13:55:53] worked is a big word, I just did some hacking :D
[14:16:53] klausman: o/ https://phabricator.wikimedia.org/T356252#9870963
[14:16:58] not urgent, when you have a moment
[14:25:50] checking
[14:41:58] elukey: want em to edit the main comment tickboxes or will you do so?
[14:42:02] me*
[14:42:35] go ahead, thanks for checking!
[14:44:01] np, thanks for the heads up
[15:54:01] I'm checking issues on mwparserfromhell repo to see if there is any clue
[15:54:34] https://github.com/earwig/mwparserfromhell/issues/40
[15:55:07] looks like the tokenizer has many issues and some of them are not solved yet
[15:56:02] ack :(
[15:57:48] one thing that we could do is to contact the mwparserfromhell's author privately asking for advice
[15:58:20] maybe with a way to repro etc.., to know if there is a "natural" bottleneck due to size or if it is something else
[15:58:56] using the extra option that I indicated in the task shaves a lot of seconds from the total, maybe there are other things that they can suggest to improve the perfs
[16:00:15] aiko: --^
[16:02:41] ok! I'll look a bit more deeply into this and contact the author for help
[16:08:20] ack!
[16:08:26] going afk folks, have a nice weekend!
[16:08:42] ciao Luca , have a nice weekend o/
[16:09:25] bye Luca! :D
[16:30:13] aiko: mercelisv As we're going for the first release of the liftwing-python package in PyPI I created an example of what we had discussed about listing all available models https://github.com/wikimedia/liftwing-python/pull/8
[16:32:38] I'm going afk as well folks, have a nice weekend!
[16:34:05] isaranto: thanks for working on it!
[16:35:05] have a nice weekend o/
[22:25:44] FIRING: LiftWingServiceErrorRate: ...
[22:25:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=frwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[22:30:44] RESOLVED: LiftWingServiceErrorRate: ...
[22:30:44] LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://grafana.wikimedia.org/d/G7yj84Vnk/istio?orgId=1&refresh=30s&var-cluster=eqiad%20prometheus/k8s-mlserve&var-namespace=revscoring-editquality-damaging&var-backend=frwiki-damaging-predictor-default.%2A - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate