[07:16:46] Lift-Wing, Machine-Learning-Team (Active Tasks): Sunset MiniKF sandboxes - https://phabricator.wikimedia.org/T293677 (kevinbazira) @Acraze, thank you for working on the model storage and sharing this documentation. The model upload worked successfully: ` root@ml-sandbox:/srv/home/kevinbazira# mc ls mym...
[09:09:15] Machine-Learning-Team, serviceops, Patch-For-Review: Move Docker settings for kubernetes workers to overlay fs - https://phabricator.wikimedia.org/T300744 (elukey) ==== Partitioning ==== As far as I can see, the kubernetes-node.cfg partman recipe creates two raid1s, one hosting root on ext4 and the...
[09:43:36] Machine-Learning-Team, serviceops, Patch-For-Review: Move Docker settings for kubernetes workers to overlay fs - https://phabricator.wikimedia.org/T300744 (JMeybohm) For partitioning I'd prefer sticking close to the standard as well. In addition to an LV for /var/lib/docker we should probably think a...
[09:52:50] Machine-Learning-Team, serviceops, Patch-For-Review: Move Docker settings for kubernetes workers to overlay fs - https://phabricator.wikimedia.org/T300744 (Joe) >>! In T300744#7677738, @JMeybohm wrote: > For partitioning I'd prefer sticking close to the standard as well. In addition to an LV for /var...
[10:09:19] Machine-Learning-Team, serviceops, Patch-For-Review: Move Docker settings for kubernetes workers to overlay fs - https://phabricator.wikimedia.org/T300744 (JMeybohm) >>! In T300744#7677758, @Joe wrote: >>>! In T300744#7677738, @JMeybohm wrote: >> For partitioning I'd prefer sticking close to the stan...
[11:29:41] * elukey lunch!
[14:35:53] Machine-Learning-Team, serviceops, Patch-For-Review: Move Docker settings for kubernetes workers to overlay fs - https://phabricator.wikimedia.org/T300744 (elukey) Thanks for all the inputs on the partitioning, I'll try to come up with a partman recipe with all the suggestions highlighted. >>! In T3...
[14:49:09] Hi everyone,
[14:49:09] Could someone help me understand how the utility function dump_cache.py works? @elukey, @accraze, @chrisalbon?
[14:49:09] Especially the call to the solve function: https://github.com/wikimedia/revscoring/blob/master/revscoring/utilities/dump_cache.py#L80
[14:51:12] Or does someone know where I could find halfak on IRC (he's not on Libera Chat)?
[14:52:40] Hi SiMaig, I think that Andy may be a good point of contact for that revscoring Python code, even if I am sure that nobody in the team has touched it in a long time
[14:53:01] I don't think Aaron is on IRC, but you could ask for help in the gh issue
[15:20:19] Thanks @elukey. I will try to summarise my question on github (not easy 😅)
[15:45:51] morning all!
[16:21:28] o/
[16:22:31] o/
[16:26:37] folks as FYI, I am working on https://phabricator.wikimedia.org/T300744 with ServiceOps
[16:26:54] we'd need to move to Overlayfs for Docker (we use device mapper now)
[16:27:09] and at the same time we'd like to move to Debian 11 too (we are on 10 now)
[16:27:20] it shouldn't affect anything on our stack
[16:27:33] but it will require a bit of time since all nodes will be reimaged etc..
[16:27:43] we have the new codfw worker nodes, I'll start from them :)
[16:28:00] this is the baseline to then think about the kubernetes upgrade to 1.2x
[16:39:52] okay good to know, thanks elukey
[16:41:02] Lift-Wing, SRE, ops-codfw: ml-serve2001 logged a corrected memory error - https://phabricator.wikimedia.org/T299427 (Papaul) @klausman can we close this now?
[16:45:08] Lift-Wing, SRE, ops-codfw: ml-serve2001 logged a corrected memory error - https://phabricator.wikimedia.org/T299427 (klausman) Open→Resolved Yes, I think so. Since the reboot, everything has been quiet: `root@ml-serve2001:/sys/devices/system/edac# grep . mc/mc*/*count mc/mc0/ce_count:0 mc/m...
[16:49:18] the task if anybody wants to chime in is https://phabricator.wikimedia.org/T300744
[16:49:29] (ah already mentioned, sorry :D)
[17:14:01] (PS4) Halfak: nlwiki articlequality, hiwiki editquality, ores observability [services/ores/deploy] - https://gerrit.wikimedia.org/r/755731 (https://phabricator.wikimedia.org/T300195)
[17:51:12] going afk, have a nice day/weekend folks!
[18:30:38] have a great weekend elukey!
[19:32:13] Lift-Wing, Machine-Learning-Team (Active Tasks): Sunset MiniKF sandboxes - https://phabricator.wikimedia.org/T293677 (ACraze) @kevinbazira - I took a look at your isvc spec, tried to deploy it and noticed that the Knative Revisions were failing. ` kubectl describe isvc enwiki-articlequality-test-by-kevi...