[06:02:36] (CR) Elukey: events: add code to generate predicted_classification events (2 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/907923 (https://phabricator.wikimedia.org/T331401) (owner: AikoChou)
[06:05:59] Morning!
[06:13:54] morning :)
[06:14:49] elukey: so the ml-cache reimage is just reimage, a chown -R and?
[06:17:35] klausman: I wrote everything that I did in the task
[06:17:44] Excellent
[06:17:45] lemme know if there is anything missing
[06:19:03] I have a long-ish errand after lunch, so I'll try and get at least one machine done before noon.
[06:19:12] Or is mixing distros not supported?
[06:25:33] nono it is fine, the cassandra version doesn't change
[06:25:46] I have reimaged eqiad one at a time
[06:26:20] Alrighty. I'll start in a few mins, then
[06:26:20] but it is not that urgent, if you want to finish the SLO work first it is fine
[06:26:34] nah, I'm glad to be doing something more hands-on
[06:44:18] Morning folks!
[06:45:34] elukey: final confirmation: reimaging ml-cache2001~2003 with sudo cookbook sre.hosts.reimage -t T331712 --os buster ml-cache200x, one at a time, then do the chown/touch/r-p-a dance
[06:46:10] +1
[06:46:26] isaranto: Καλημέρα!
[06:47:01] Machine-Learning-Team, SRE: Migrate ml-cache to Bullseye - https://phabricator.wikimedia.org/T331712 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by klausman@cumin2002 for host ml-cache2001.codfw.wmnet with OS buster
[06:50:36] need to run an errand, bbl!
[07:22:15] Machine-Learning-Team, SRE: Migrate ml-cache to Bullseye - https://phabricator.wikimedia.org/T331712 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by klausman@cumin2002 for host ml-cache2001.codfw.wmnet with OS buster completed: - ml-cache2001 (**PASS**) - Downtimed on Icinga/Alertm...
[07:22:33] Ok, 2001 done, now proceeding with the other two (one at a time)
[07:23:46] Machine-Learning-Team, SRE: Migrate ml-cache to Bullseye - https://phabricator.wikimedia.org/T331712 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by klausman@cumin2002 for host ml-cache2002.codfw.wmnet with OS buster
[07:55:54] Machine-Learning-Team, SRE: Migrate ml-cache to Bullseye - https://phabricator.wikimedia.org/T331712 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by klausman@cumin2002 for host ml-cache2002.codfw.wmnet with OS buster completed: - ml-cache2002 (**PASS**) - Downtimed on Icinga/Alertm...
[07:56:00] Two done, one to go
[07:57:57] Machine-Learning-Team, SRE: Migrate ml-cache to Bullseye - https://phabricator.wikimedia.org/T331712 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by klausman@cumin2002 for host ml-cache2003.codfw.wmnet with OS buster
[08:29:38] Machine-Learning-Team, SRE: Migrate ml-cache to Bullseye - https://phabricator.wikimedia.org/T331712 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by klausman@cumin2002 for host ml-cache2003.codfw.wmnet with OS buster completed: - ml-cache2003 (**PASS**) - Downtimed on Icinga/Alertm...
[08:31:05] and all done
[08:31:33] moritzm: do you prefer me closing T331712 (Bullseye for ml-cache) or should I leave it open?
[08:36:04] klausman: nice! Please close the task, looks good
[08:42:16] ack
[08:42:38] done & done
[08:42:55] Machine-Learning-Team, SRE: Migrate ml-cache to Bullseye - https://phabricator.wikimedia.org/T331712 (klausman) Open→Resolved All machines in codfw done.
[08:52:41] closed tasks are usually better than open tasks :-)
[08:53:10] moritzm: in theory the ml team should be all on bullseye, nothing left that I can think of
[08:55:15] klausman: ml-cache200[1-3] are still on Buster, though?
[08:55:38] elukey: *cough* ORES *cough*
[08:56:37] I am an idiot. I indeed used --os buster %-)
[08:56:48] Oh well, at least now I know how to do it
[08:56:50] oh noes!
[08:57:11] I didn't really see the "buster" reference earlier on
[08:57:22] moritzm: yes yes ORES is a special one right
[08:57:47] Machine-Learning-Team, SRE: Migrate ml-cache to Bullseye - https://phabricator.wikimedia.org/T331712 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by klausman@cumin2002 for host ml-cache2001.codfw.wmnet with OS bullseye
[08:59:21] moritzm: not sure if we updated you on our plan for ores, but isaranto is working on a k8s service called "ores-legacy" that basically mocks the ORES API and calls lift wing behind the scenes. When it is ready (hopefully in this Q) we'll be able to flip ores.w.o to it, and then nuke the ores bare metal nodes
[09:11:23] yeah, I'm aware, I'm subscribed to some of the tasks
[09:12:54] there's still plenty of time to get rid of it in time :-)
[09:31:27] Machine-Learning-Team, SRE: Migrate ml-cache to Bullseye - https://phabricator.wikimedia.org/T331712 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by klausman@cumin2002 for host ml-cache2001.codfw.wmnet with OS bullseye completed: - ml-cache2001 (**PASS**) - Downtimed on Icinga/Aler...
[09:42:10] Machine-Learning-Team, SRE: Migrate ml-cache to Bullseye - https://phabricator.wikimedia.org/T331712 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by klausman@cumin2002 for host ml-cache2002.codfw.wmnet with OS bullseye
[10:11:56] Machine-Learning-Team, SRE: Migrate ml-cache to Bullseye - https://phabricator.wikimedia.org/T331712 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by klausman@cumin2002 for host ml-cache2003.codfw.wmnet with OS bullseye
[10:13:32] Machine-Learning-Team, SRE: Migrate ml-cache to Bullseye - https://phabricator.wikimedia.org/T331712 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by klausman@cumin2002 for host ml-cache2002.codfw.wmnet with OS bullseye completed: - ml-cache2002 (**PASS**) - Downtimed on Icinga/Aler...
[10:30:00] * elukey lunch!
[10:43:55] Machine-Learning-Team, SRE: Migrate ml-cache to Bullseye - https://phabricator.wikimedia.org/T331712 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by klausman@cumin2002 for host ml-cache2003.codfw.wmnet with OS bullseye completed: - ml-cache2003 (**PASS**) - Downtimed on Icinga/Aler...
[10:44:12] Alright, now they all should be on Bullseye. Lunch!
[10:47:15] (PS15) Ilias Sarantopoulos: feat: use Lift Wing instead of ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T332953)
[10:48:54] (CR) CI reject: [V: -1] feat: use Lift Wing instead of ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T332953) (owner: Ilias Sarantopoulos)
[11:04:39] (PS16) Ilias Sarantopoulos: feat: use Lift Wing instead of ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/910439 (https://phabricator.wikimedia.org/T332953)
[11:14:05] sry for all the noise --^ I'll take some time to figure out how to run all CI steps locally. initially I avoided that to cut some corners :)
[11:35:28] Machine-Learning-Team, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (fgiunchedi)
[12:03:58] Machine-Learning-Team, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (MoritzMuehlenhoff)
[15:15:33] isaranto: o/
[15:16:06] I re-ran sextant with the serviceops templates and added more things to the fast-api chart
[15:16:20] it was missing containers and configs, now the diff looks better
[15:21:01] ack. will review! I also owe a review to Janis for the updated scaffolding model
[15:22:07] ah wait I need to fix one thing, but mostly it looks correct
[15:22:23] the old scaffolding model was missing some stuff
[15:26:49] thanks!! I just commented the scratch dir stuff in values.yaml, so it will not come up in the diff
[15:27:01] good point on latest though
[15:27:02] lemme check
[15:28:09] we could leave it like that for the moment, what do you think?
[15:34:23] sure, ofc I don't mind!
[15:42:08] "MountVolume.SetUp failed for volume "envoy-config-volume" : configmap "ores-legacy-main-envoy-config-volume" not found"
[15:42:14] so something still not working :)
[15:44:04] having some weird network issues.. everything takes forever to load
[15:44:07] * isaranto sighs
[15:48:14] Did u use the code from this patch with sextant? https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/885281
[15:54:12] yes exactly
[15:56:51] the config volume config is missing from the scaffold stuff, I'll comment
[16:01:34] all right I think it is all for today :)
[16:01:39] Have a nice weekend folks!
[16:02:53] cu Luca, have a great time!
[16:26:08] heading out as well. Monday is Worker's day, public holiday in CH. See you on Tuesday!
[17:35:41] logging off, cu all!
[20:52:00] Machine-Learning-Team, ORES, Advanced-Search, All-and-every-Wikisource, and 65 others: Remove unnecessary targets definitions - https://phabricator.wikimedia.org/T328497 (Jdlrobson) p:Triage→High @kostajh https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/903733 is...
[23:33:09] Machine-Learning-Team, DBA, Data-Engineering, Infrastructure-Foundations, and 9 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (colewhite)