[05:39:44] Machine-Learning-Team, Add-Link, Growth-Team, User-notice: Deploy "add a link" to 18th round of wikis - https://phabricator.wikimedia.org/T308144 (kevinbazira)
[05:41:11] Machine-Learning-Team, Add-Link, Growth-Team, User-notice: Deploy "add a link" to 18th round of wikis - https://phabricator.wikimedia.org/T308144 (kevinbazira) The training pipelines of the two biggest wikis run for a really long time and got stuck a couple of times but they have finally complete...
[09:40:30] Machine-Learning-Team, DBA, Data-Platform-SRE, Infrastructure-Foundations, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (MatthewVernon)
[10:03:24] hey folks, last code review and then the ores legacy endpoint should be up and running
[10:08:06] o/
[10:08:24] nicee
[10:10:44] hopefully when creating the prod one it will be easier :D
[10:37:31] * elukey lunch!
[11:30:01] (PS5) Ilias Sarantopoulos: LLM: model server example with bloom [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/919293 (https://phabricator.wikimedia.org/T333861)
[12:34:09] (CR) Kevin Bazira: [C: +1] "Super cool!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/919293 (https://phabricator.wikimedia.org/T333861) (owner: Ilias Sarantopoulos)
[12:35:13] Machine-Learning-Team, DBA, Data-Platform-SRE, Infrastructure-Foundations, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (MatthewVernon)
[12:41:43] Machine-Learning-Team, DBA, Data-Platform-SRE, Infrastructure-Foundations, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (Jelto)
[13:13:24] isaranto: `curl "https://ores-legacy.k8s-ml-staging.discovery.wmnet:31443/v3/scores/enwiki/123433/damaging" -i --http1.1` now works!
[13:14:29] :tada
[13:14:40] 🎉 :) Great work!
[13:27:19] Machine-Learning-Team: Create a staging ingress configuration for ml-staging-codfw - https://phabricator.wikimedia.org/T335756 (elukey) And we are finally done! ` elukey@stat1004:~$ time curl "https://ores-legacy.k8s-ml-staging.discovery.wmnet:31443/v3/scores/enwiki/123433/damaging" -i --http1.1 HTTP/1.1 20...
[13:28:18] isaranto: IIUC so far everything is unblocked, so we can test / fix / deploy ores-legacy in staging without any issue
[13:28:25] lemme know if anything is missing
[13:29:10] thank you, you're amazing!
[13:29:34] <3
[13:29:52] Morning all
[13:30:00] morning!
[13:30:38] after I get some ORES stuff off my plate (ORES/hackathon) I'll take a closer look and document things that are missing from ores-legacy (exceptions/response codes etc)
[13:31:11] lemme know if you want me to pick up anything
[13:31:27] * elukey commutes to the office, back in a bit
[13:39:12] elukey: \o
[13:39:22] ah, bad timing on my side
[13:40:06] question: when setting up the revertrisk stuff, should I also add it to `tlsExtraSANs`, or is it better to do that later?
[13:49:17] Machine-Learning-Team, DBA, Data-Platform-SRE, Infrastructure-Foundations, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (ssingh)
[13:53:07] Lift-Wing, Machine-Learning-Team: Lift Wing model registry - https://phabricator.wikimedia.org/T336674 (isarantopoulos)
[13:53:53] klausman: o/ it is needed yes, so if we use change prop we'll be able to call lift wing for rr
[13:53:55] above --^ I added the ticket we mentioned last week. I can come back with a small POC to better explain the concept
[13:54:21] elukey: alrighty. 
got a handful of CLs I'll send your way
[13:59:05] all done
[14:01:12] Right back at ya :)
[14:01:30] Machine-Learning-Team, Data-Engineering, Event-Platform Value Stream: Create new mediawiki.page_links_change stream based on fragment/mediawiki/state/change/page - https://phabricator.wikimedia.org/T331399 (Ottomata)
[14:02:21] One day I will figure out what prefix to use on what commit :-/
[14:04:24] look at what other SREs do, basically something that is indicative of the commit's content and consistent (more or less) with the rest :)
[14:07:17] That's the thing: I don't see much consistency there, either.
[14:07:47] Ok, private data stuff is done (both pseudo-private and actual)
[14:09:42] kevinbazira: o/ I am checking the board with all the tasks, and you have 7 assigned to you in the in-progress state :)
[14:09:57] should some of those be re-assigned to say the Growth team and moved to Watching?
[14:13:27] elukey: weird. I am seeing a diff for ores-legacy
[14:14:17] https://phabricator.wikimedia.org/P48230
[14:14:41] klausman: yeah I haven't rolled out ores-legacy configs in prod yet, so they may pop up.. Not sure if we have ores-legacy in production yet, so it should be safe to proceed for admin_ng
[14:14:57] Alright
[14:16:13] syncing codfw
[14:18:04] Hm. Namespace is not visible in get namespaces
[14:18:22] ah, is it because it's empty? 
[14:19:26] revertrisk Active 2m33s
[14:19:30] --^
[14:19:33] it is there
[14:19:37] Oh, I am an idiot
[14:19:49] looking at eqiad while deploying to codfw
[14:20:06] ahahha yeah that is not ideal :)
[14:20:16] syncing eqiad
[14:20:54] revertrisk Active 18s
[14:20:56] yaaaay
[14:21:21] aiko: revertrisk namespaces created in codfw and eqiad
[14:21:32] \o/
[14:24:11] Lift-Wing, Machine-Learning-Team, Patch-For-Review: Move Revert-risk multilingual model from staging to production - https://phabricator.wikimedia.org/T333124 (klausman) Namespaces are live in both eqiad and codfw: `# kube_env admin ml-serve-eqiad # kubectl get namespaces revertrisk NAME ST...
[14:32:29] klausman: later on could you also check if the documentation is up to date, or if we need to add anything?
[14:32:39] (about this procedure I mean)
[14:33:52] Sure
[14:35:50] I skimmed the docs while preparing the CLs, and I don't think anything is missing or wrong
[14:38:48] Machine-Learning-Team, API-Portal, Platform Team Initiatives (API Gateway): Add documentation about LiftWing to the API Portal - https://phabricator.wikimedia.org/T325759 (elukey) Open→Resolved
[14:38:50] Machine-Learning-Team, Epic: Lift Wing improvements to get out of MVP state - https://phabricator.wikimedia.org/T333453 (elukey)
[14:39:20] Machine-Learning-Team: Add guidelines about how to write/use asyncio code in KServe - https://phabricator.wikimedia.org/T324313 (elukey) Open→Resolved a:elukey
[14:39:22] Machine-Learning-Team, Epic: Lift Wing improvements to get out of MVP state - https://phabricator.wikimedia.org/T333453 (elukey)
[14:41:07] klausman: ok thanks
[14:42:40] One thing I've been wondering about re: namespaces is how we sort them in the configs. At the moment we vaguely do this: keep the revscoring ones next to each other, vaguely sorted; experimental, ores and everything else hangs out near the end. 
It's not that anything is hard to find, but I wonder if we should have a stricter/more precise scheme
[14:46:48] we can use any convention, I don't have any preference
[14:49:23] I created https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing#Contributing_to_the_project for community folks that want to participate in the project
[14:49:44] essentially the goal is to let them introduce themselves in here so we can meet (virtually) and decide what work to do based on their preferences etc..
[14:49:49] lemme know if it sounds ok
[14:51:02] Machine-Learning-Team, Epic: Lift Wing improvements to get out of MVP state - https://phabricator.wikimedia.org/T333453 (elukey)
[14:51:05] Machine-Learning-Team, Epic: Migrate ORES clients to LiftWing - https://phabricator.wikimedia.org/T312518 (elukey)
[14:51:14] Lift-Wing, Machine-Learning-Team, Documentation: Create technical documentation for Lift Wing Infrastructure - https://phabricator.wikimedia.org/T276601 (elukey) Open→Resolved a:elukey We created https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing with a ton of material, and tod...
[14:57:56] Lift-Wing, Machine-Learning-Team, Documentation: Improve Lift Wing documentation - https://phabricator.wikimedia.org/T316098 (elukey) In my opinion the documentation is good enough for the moment, I am inclined to close the task. We'll surely improve it over time based on people's feedback, but for t...
[14:58:01] Machine-Learning-Team, Epic: Lift Wing improvements to get out of MVP state - https://phabricator.wikimedia.org/T333453 (elukey)
[14:58:12] Lift-Wing, Machine-Learning-Team, Documentation: Improve Lift Wing documentation - https://phabricator.wikimedia.org/T316098 (elukey) Open→Resolved a:elukey
[15:00:26] Machine-Learning-Team: create checklist before adding models to api gateway/prod - https://phabricator.wikimedia.org/T332711 (elukey) We have https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing#Hosting_a_model that should take care of it, maybe we are missing a good checklist.
[15:01:56] Machine-Learning-Team: Create documentation for a workflow for evaluating models submitted for deployment. - https://phabricator.wikimedia.org/T269171 (elukey)
[15:02:11] Machine-Learning-Team: create checklist before adding models to api gateway/prod - https://phabricator.wikimedia.org/T332711 (elukey)
[15:04:22] Machine-Learning-Team, Add-Link, Growth-Team, User-notice: Deploy "add a link" to 13th round of wikis - https://phabricator.wikimedia.org/T308138 (elukey) a:kevinbazira→kostajh
[15:04:25] Machine-Learning-Team, Add-Link, Growth-Team, Chinese-Sites, User-notice: Deploy "add a link" to 14th round of wikis - https://phabricator.wikimedia.org/T308139 (elukey) a:kevinbazira→kostajh
[15:04:28] Machine-Learning-Team, Add-Link, Growth-Team, User-notice: Deploy "add a link" to 12th round of wikis - https://phabricator.wikimedia.org/T308137 (elukey) a:kevinbazira→kostajh
[15:05:28] Machine-Learning-Team, Add-Link, Growth-Team, User-notice: Deploy "add a link" to 15th round of wikis - https://phabricator.wikimedia.org/T308141 (elukey) a:kevinbazira→kostajh
[15:06:46] kevinbazira: I have reassigned the old deploy add a link task to kosta*jh so that they'll know our part is completed
[15:06:58] now the board looks more up to date
[16:41:23] * elukey afk!
[16:41:29] have a good rest of the day folks
[16:44:45] ciao o/
[16:44:59] going afk as well!
[22:32:42] (PS1) Umherirrender: Handle possible null statistics on SpecialORESModels [extensions/ORES] - https://gerrit.wikimedia.org/r/919927 (https://phabricator.wikimedia.org/T329304)
[22:41:14] Machine-Learning-Team, ORES, Patch-For-Review, Wikimedia-production-error: PHP Notice: Trying to access array offset on value of type null (in SpecialORESModels) - https://phabricator.wikimedia.org/T329304 (Umherirrender) a:Umherirrender