[07:59:29] morning o/
[07:59:50] it's friday!
[13:11:36] aiko: mandatory reference to Rebecca Black's video - https://www.youtube.com/watch?v=kfVsfOSbJY0
[13:23:30] lol never heard that song
[13:24:26] super famous, 171M views :D
[13:24:59] It was a huge meme about 13 years ago x)
[13:25:21] people still comment on it though
[13:25:32] lol
[13:25:36] I clearly remember it when somebody says "it's friday"
[13:25:51] Same. I think it's a generational marker of some sort :p
[13:25:58] ahahhaha yes
[13:42:22] I had just been working at G for a year or so at the time, and my team was one floor down from the YT SREs. It was wild :)
[14:07:04] (CR) AikoChou: [V:+2 C:+2] outlink: move test_transformer to unit test directory [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1031393 (owner: AikoChou)
[14:07:54] elukey: btw, what's going on with change 1018995? It's the last of the mw-api-int-ro changes from c.laime.
[14:13:22] klausman: can you please post the full link when mentioning gerrit patches? :)
[14:13:33] sorr, https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1018995
[14:13:36] +y
[14:15:37] I think it was left out for some reason, lemme check if it works
[14:18:00] yeah it fell through the cracks, but it works due to the service entry port 80 bug
[14:18:14] so httpbb of course didn't highlight it
[14:18:26] I think that we can safely deploy, or wait for monday
[14:18:54] Either works for me. Want me to do it?
[14:25:52] sure
[14:26:02] Alright.
[14:36:53] deployed in codfw and confirmed working, deploying in eqiad
[14:38:16] Nice, thanks
[14:40:03] All done and confirmed working
[14:41:37] \o/
[14:54:25] Machine-Learning-Team, MW-on-K8s, serviceops, SRE: Migrate ml-services to mw-api-int - https://phabricator.wikimedia.org/T362316#9850230 (elukey) Open→Resolved
[15:49:36] folks I have found something interesting for viwiki, all in https://phabricator.wikimedia.org/T363336#9850489
[15:49:48] cc: aiko --^ (we were discussing it during our last meeting)
[16:19:44] elukey: ohhh interesting so get_revscoring_extractor_cache with high latency and fetch_features with high latency are related, but that makes sense
[16:20:27] thanks for digging into the logs
[16:22:19] I'll look into fetch features and try to figure out what happened 🤔
[16:25:16] I am trying to see if I can repro locally using only revscoring, but so far only pickle errors :(
[16:27:30] what is the pickle error saying?
[16:27:50] nevermind fixed, it was the scikit-learn version sigh
[16:28:05] basically trying to do https://github.com/wikimedia/revscoring?tab=readme-ov-file#example
[16:32:22] ok I think I found a way
[16:32:25] The batch query would work as an amplification, in that a would-be attacker could test many revisions at the same time, to find one that breaks the service
[16:32:54] nice, then we can prove if the problem is with revscoring, not the inference service or others
[16:33:29] yeah I have the extractor taking long time on my laptop
[16:34:07] ohhh!
[16:34:09] klausman: not sure if it is an amplification, it increases the noise since the more requests are in flight the more a heavy rev-id stalls everything
[16:34:44] The change/revision you mentioned on the phab task doesn't look special in any way, as far as I can tell
[16:35:25] let's also be mindful that this channel is publicly logged, so if we have to copy/paste anything let's use the task
[16:35:43] acK!
[16:35:49] I am pretty sure this is genuine traffic that hits a bottleneck, but better to be safe :D
[16:36:07] klausman: it may be related to some template changes? Too ignorant about wikitext, not sure
[16:36:47] yeah, same
[16:37:03] not speaking the wiki language doesn't help, either
[16:38:38] aiko: I am going to write in the task how to repro locally, I have a horrible python script that is not pretty but it works
[16:38:54] I used bullseye container + the monkey patching that Ilias did for pyenchant
[16:39:01] Extracting features..
[16:39:01] Took 40.8557 seconds
[16:39:21] now it is a matter of profiling
[16:39:24] * elukey writes in the task
[16:41:33] ack! the important thing is it works :D now we know the bottleneck is in revscoring
[17:06:04] same to you!
[17:15:35] bye Luca! have a nice weekend :)
[17:30:32] Add a first stab at profiling
[17:32:07] added*
[17:32:17] I'm heading out now, have a nice weekend everyone
[17:48:04] o/
[21:51:30] (PS1) Rockingpenny4: Adds article topic model to ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/1037858 (https://phabricator.wikimedia.org/T218132)
[21:52:21] (Abandoned) Rockingpenny4: Adds article topic model to ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/1037858 (https://phabricator.wikimedia.org/T218132) (owner: Rockingpenny4)
[21:59:21] (PS7) Rockingpenny4: Adds article topic model to ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/1035044 (https://phabricator.wikimedia.org/T218132)
[22:01:19] (CR) CI reject: [V:-1] Adds article topic model to ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/1035044 (https://phabricator.wikimedia.org/T218132) (owner: Rockingpenny4)
[23:09:57] (PS9) Rockingpenny4: Adds article topic model to ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/1035044 (https://phabricator.wikimedia.org/T218132)