[01:14:15] FIRING: ErrorBudgetBurn: - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[03:07:51] Machine-Learning-Team, ORES, Moderator-Tools-Team, Spike: [SPIKE] Investigate how to install ORES in idwiki - https://phabricator.wikimedia.org/T374077 (Scardenasmolinar) NEW
[03:08:46] Machine-Learning-Team, MediaWiki-extensions-ORES, Growth-Team, MediaWiki-Recent-changes, and 2 others: Enable Revert Risk RecentChanges filter on id.wiki - https://phabricator.wikimedia.org/T365701#10120111 (Scardenasmolinar) Done!
[05:14:15] FIRING: ErrorBudgetBurn: - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[05:50:36] Machine-Learning-Team, Content-Transform-Team, Research, Patch-For-Review: Add Article Quality Model to LiftWing - https://phabricator.wikimedia.org/T360455#10120161 (isarantopoulos) I have temporarily disabled the production deployments (available through the API Gateway) until we finalize the s...
[07:23:35] Hello!
[07:27:55] (PS2) Kevin Bazira: locust: add Makefile to run load tests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1070575 (https://phabricator.wikimedia.org/T369728)
[07:28:30] (CR) Kevin Bazira: "After suggestions implemented in patchset 2, instead of running `make logo-detection` we run `MODEL_LOCUST_DIR="logo_detection" make run-l" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1070575 (https://phabricator.wikimedia.org/T369728) (owner: Kevin Bazira)
[09:07:33] (CR) AikoChou: [C:+1] "Thanks for working on this, Kevin. I tested it and it works great!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1070575 (https://phabricator.wikimedia.org/T369728) (owner: Kevin Bazira)
[09:09:00] (CR) AikoChou: [C:+2] "Thanks for the review :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1070060 (https://phabricator.wikimedia.org/T371902) (owner: AikoChou)
[09:11:37] (CR) Ilias Sarantopoulos: "Thanks for working on this. some last details and we're good to go!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1070575 (https://phabricator.wikimedia.org/T369728) (owner: Kevin Bazira)
[09:14:15] FIRING: ErrorBudgetBurn: - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[09:14:59] (Merged) jenkins-bot: reference-need: initial commit [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1070060 (https://phabricator.wikimedia.org/T371902) (owner: AikoChou)
[09:28:00] (PS3) Kevin Bazira: locust: add Makefile to run load tests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1070575 (https://phabricator.wikimedia.org/T369728)
[09:40:06] (CR) Ilias Sarantopoulos: [C:+1] "Nice, thanks!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1070575 (https://phabricator.wikimedia.org/T369728) (owner: Kevin Bazira)
[10:25:49] * aiko lunch!
[10:44:31] (CR) Kevin Bazira: [C:+2] "Thanks for the reviews :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1070575 (https://phabricator.wikimedia.org/T369728) (owner: Kevin Bazira)
[10:45:45] (CR) Kevin Bazira: [V:+2 C:+2] locust: add Makefile to run load tests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1070575 (https://phabricator.wikimedia.org/T369728) (owner: Kevin Bazira)
[11:03:11] Machine-Learning-Team, Patch-For-Review: Create a Makefile to run locust load tests - https://phabricator.wikimedia.org/T369728#10120896 (kevinbazira) This Makefile has been added and can be used as shown below: `bash # ssh into a statbox then clone the LW isvc repo $ ssh stat1008.eqiad.wmnet $ git clone ht...
[11:24:26] /me lunch!
[11:24:54] lol
[11:31:19] Machine-Learning-Team, MediaWiki-extensions-ORES, Growth-Team, MediaWiki-Recent-changes, and 2 others: Enable Revert Risk RecentChanges filter on id.wiki - https://phabricator.wikimedia.org/T365701#10120966 (Samwalton9-WMF)
[11:31:21] Machine-Learning-Team, ORES, Moderator-Tools-Team, Spike: [SPIKE] Investigate how to install ORES in idwiki - https://phabricator.wikimedia.org/T374077#10120967 (Samwalton9-WMF)
[13:14:15] FIRING: ErrorBudgetBurn: - https://alerts.wikimedia.org/?q=alertname%3DErrorBudgetBurn
[13:16:23] ---^ mmm not sure what this is, looking
[13:22:30] I thought this was manually triggered by Keith, but it looks like the budget is indeed burning
[13:23:10] referring to the latency budget. we probably need to redefine this and have something useful that we won't ignore
[13:23:11] https://grafana-rw.wikimedia.org/d/slo-Lift_Wing_Revscoring/lift-wing-revscoring-slo-s?orgId=1
[13:26:07] we also need to update the dashboards for the new quarter!
[13:26:12] yeah it's a disaster right now lol
[13:26:47] it seems like a disaster
[13:27:18] disaster is in the eye of the beholder :P
[13:27:45] :D
[13:35:00] (PS1) Kevin Bazira: Makefile: add support for reference-need [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1070941 (https://phabricator.wikimedia.org/T371902)
[13:38:03] I silenced this alert for 10 days https://alerts.wikimedia.org/?q=team%3Dml
[13:40:30] (PS2) Kevin Bazira: Makefile: add support for reference-need [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1070941 (https://phabricator.wikimedia.org/T371902)
[13:41:40] (CR) Kevin Bazira: "I tested this locally by running:" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1070941 (https://phabricator.wikimedia.org/T371902) (owner: Kevin Bazira)
[13:56:36] I started looking into the RRML load test that we discussed this week https://phabricator.wikimedia.org/T372298
[13:57:04] and I see a new model deployed for rrml in august which seems super fast compared to the previous one (serves requests in under 1s)
[13:58:10] lol nevermind on the "new model", the model was august 2023
[13:58:16] but the requests are still fast
[13:58:26] e.g. `time curl https://api.wikimedia.org/service/lw/inference/v1/models/revertrisk-multilingual:predict -X POST -d '{"rev_id": 1241713361, "lang": "en"}' -H "Content-type: application/json"`
[13:58:43] ` 0,02s user 0,02s system 5% cpu 0,708 tota`
[13:59:01] will run it with locust and wrk and let you know
[14:13:50] yes the model of august 2023 did improve performance, but I don't recall it becoming super fast
[14:14:18] but yeah we should run some load tests to verify
[14:35:22] Machine-Learning-Team, Automoderator, Moderator-Tools-Team (Kanban): [SPIKE]Perform a load test for Multilingual Revert Risk on LiftWing[4H] - https://phabricator.wikimedia.org/T372298#10121788 (Kgraessle) In progress→Resolved
[14:47:00] yes results from RRLA and RRML seem comparable https://phabricator.wikimedia.org/P68708
[14:51:24] I also found this load test which tells a similar story https://phabricator.wikimedia.org/P64095
[16:46:24] * isaranto afk
[19:03:17] Is it? I saw RRLA has an average time of 167ms, while RRML averages 479ms. but 90% of RRML requests are under 1s which is great
[19:03:36] https://phabricator.wikimedia.org/P52256
[19:03:37] https://phabricator.wikimedia.org/P52253
[19:07:19] I found the load tests I did for RRLA and RRML
[19:11:03] you can see the difference between them. note that this was tested from staging, so only one replica was used. the second link you shared is test results from codfw, where the model benefits from autoscaling
[19:20:53] Machine-Learning-Team, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install ml-serve1009-1011 (3x), ml-lab1001-1002 (2x), dse-k8s-worker1009 (1x) - https://phabricator.wikimedia.org/T372432#10123018 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host dse-k8...
[19:51:29] Machine-Learning-Team, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install ml-serve1009-1011 (3x), ml-lab1001-1002 (2x), dse-k8s-worker1009 (1x) - https://phabricator.wikimedia.org/T372432#10123081 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host dse-k8s-wo...
[20:38:17] Machine-Learning-Team, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install ml-serve1009-1011 (3x), ml-lab1001-1002 (2x), dse-k8s-worker1009 (1x) - https://phabricator.wikimedia.org/T372432#10123172 (Jclark-ctr)
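Note on the RRLA/RRML comparison from the 19:03 messages: the two figures quoted there (an average response time, and "90% of requests under 1s") are different summaries of the same latency samples, and can disagree when a few slow requests pull the average up. A minimal sketch of how both could be computed follows. The function name `summarize_latencies` and the sample values are hypothetical and made up for illustration; they are not taken from the linked pastes, and the nearest-rank percentile used here may differ from what locust or wrk report.

```python
from statistics import mean


def summarize_latencies(latencies_ms):
    """Return (average, p90) for a non-empty list of latencies in milliseconds."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Nearest-rank 90th percentile: the smallest sample such that at least
    # 90% of all samples are less than or equal to it (integer ceiling of 0.9 * n).
    rank = (9 * len(ordered) + 9) // 10
    return mean(ordered), ordered[rank - 1]


# Made-up samples for illustration only, NOT measurements from this log.
samples_ms = [120, 150, 180, 200, 250, 300, 400, 600, 900, 1700]
avg, p90 = summarize_latencies(samples_ms)
print(f"avg={avg:.0f}ms p90={p90}ms p90_under_1s={p90 < 1000}")
```

With samples like these, a single slow request raises the average noticeably while the 90th percentile stays under one second, which is why it helps to look at both numbers when comparing two models.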