[00:27:14] * bd808 off
[07:42:14] * Reedy waits for horizon to load
[08:04:45] that worker is the one that was stuck on NFS and I restarted earlier
[08:16:16] ?
[08:20:09] tools-k8s-worker-nfs-56,
[08:20:36] (from the comment before)
[08:21:34] I must have lost something
[08:22:00] https://usercontent.irccloud-cdn.com/file/A1PvhD9s/image.png
[08:22:00] ` Warning FailedMount 29s (x7 over 111s) kubelet MountVolume.SetUp failed for volume "kube-api-access-4hdj9" : [failed to fetch token: serviceaccounts "default" is forbidden: User "system:node:tools-k8s-worker-nfs-56" cannot create resource "serviceaccounts/token" in API group "" in the namespace "tool-wikibugs-testing": no relationship found between node 'tools-k8s-worker-nfs-56' and this object, failed to sync configmap
[08:22:11] earlier
[08:22:22] yesterday afternoon
[08:22:23] oh ok!
[08:22:28] yes, I see that
[09:54:46] * arturo reimaging laptop
[10:16:52] this thing with debian, with 10000 ISO image options to download, and 9999 of them missing firmware, it is ridiculous
[10:17:35] I think the one that the big download button on https://www.debian.org/ links to has the firmware these days?
[10:17:48] but that one is stable
[10:17:50] I want testing
[10:18:31] I've always just installed stable and then upgraded to testing
[10:18:46] on the other hand, the testing installer image I just tried is 1) missing firmware for the wifi card, 2) affected by debian bug #1067831
[10:19:31] I might need to do the upgrade this time
[10:46:26] not sure what you mean, the default testing netinst images available at https://cdimage.debian.org/cdimage/daily-builds/daily/arch-latest/amd64/iso-cd/ do include firmware?
[10:46:59] which is the default one being pointed to if one selects testing
[10:47:20] but yeah, with the t64 mess it's a poor time to install testing possibly :-)
[10:50:47] moritzm: the installer warned me about missing iwlwifi firmware :-( it is true I did not use the daily installer build, but a weekly one I think
[10:52:08] https://saimei.ftp.acc.umu.se/cdimage/daily-builds/daily/arch-latest/amd64/iso-cd/debian-testing-amd64-netinst.iso <-- no, it was daily
[11:01:22] ah, iwlwifi is still special I think
[11:01:45] IIRC it cannot be redistributed unless one accepts the EULA in debconf or similar crap
[11:02:39] I see
[11:02:54] but also not for all cards, my Lenovo X1 also has some wifi card managed by iwlwifi, but I could install it via regular d-i over wifi
[11:59:06] arturo: are you planning to release a new version of jobs-cli or should I go for it?
[11:59:19] (with the healthcheck fix)
[12:02:41] dcaro: mmm
[12:02:58] sent https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/25 (being proactive xd)
[12:03:18] thanks, please go ahead and do it
[12:03:24] I'm still fighting the laptop
[12:03:44] 👍
[12:04:13] this patch LGTM, thanks!
[12:43:22] alerts.wikimedia.org can now silence metricsinfra alerts
[12:47:06] taavi: 🎉 good work
[12:47:41] and silencing metricsinfra alerts from cookbooks is basically just pending a new spicerack release
[12:58:47] * arturo food time
[13:24:34] \o/
[14:12:39] hm, I'm seeing errors when deploying jobs-api on worker-nfs-52, it fails to mount the secrets
[14:13:10] is that the same worker that was rebooted yesterday?
[14:13:15] MountVolume.SetUp failed for volume "kube-api-access-7zjsd" : failed to sync configmap cache: timed out waiting for the condition
[14:13:20] no, it's a different one
[14:19:58] I have no idea what that means
[14:20:13] does it have good connectivity with etcd?
[14:20:20] or the api server?
[14:21:48] it's non-responsive, trying the console
[14:22:08] how are the D procs?
[14:22:23] it does not show in the graphs
[14:23:26] wait, it's responding
[14:25:42] hmm, one of the pods for jobs-api is actually running there
[14:26:09] correctly, it seems
[14:26:13] maybe it's just a warning?
[14:29:58] where did you see the message?
[14:30:03] in `describe pod`?
[15:02:12] get events -n jobs-api
[15:46:58] * arturo offline
[15:48:08] * dcaro off
[15:51:58] Notes at https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Monthly_meeting/2024-04-09. Please review and correct anything I summarized badly
[16:02:38] thx bd808
[16:21:08] dcaro: is there any reason to scale up core/ram/cpu for the cloudcephmon refresh? Or are we good with what we have? (It looks like the default spec we would get has 2x the RAM)
[16:25:42] andrewbogott: well, we don't have a problem right now, though having more memory allows for things like having a node down for longer (because it has to keep track of the things to shift around when it comes back), so it would be nicer, but not necessary
[16:25:58] if it's not a big price jump, I think it's ok to use the new default
[16:26:22] ok. Momentum seems to be towards scaling up so we can just go with that.
[16:49:47] old busted: openstack on kubernetes; new hotness: kubernetes on kubernetes with PXE boot images for the bare metal hosts at the bottom of the stack, all managed via Helm
[16:49:53] https://github.com/aenix-io/kubefarm
[16:51:41] that doesn't seem terrible, although I'm always puzzled about what people are thinking they'll use for a UI when migrating off of openstack to pure k8s.
[16:56:41] Is it expected that https://prometheus-alerts.wmcloud.org/ is down?
[16:57:51] Kubefarm is from a hosting company, so I imagine they were already pretty deeply invested in their own management console. Horizon is sort of a PoC system already for exposing OpenStack management to tenants. Blog post on the project at https://kubernetes.io/blog/2021/12/22/kubernetes-in-kubernetes-and-pxe-bootable-server-farm/
[16:59:21] (found while web searching for the truly cursed idea of running k8s from UEFI)
[17:00:58] I don't necessarily mean web UI, I just mean UI at all. K8s has an API but surely no public clouds are exposing the APIs needed to create your own cluster, right?
[17:01:08] (I mean, except via custom per-cloud web ui)
[17:01:47] that new alert is me, likely I just need to rerun puppet on the checker host
[18:23:05] * bd808 lunch
[23:41:27] * bd808 off
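
[Editor's note] For reference, a minimal sketch of the kind of kubectl inspection discussed above (`describe pod`, `get events -n jobs-api`), assuming kubectl access to the Toolforge cluster. The namespace (jobs-api) and node name (tools-k8s-worker-nfs-52) come from the log; the pod name used below is a placeholder.

```
# List recent events in the jobs-api namespace, newest last
# (this is where the FailedMount warnings show up)
kubectl get events -n jobs-api --sort-by=.lastTimestamp

# Find the jobs-api pods scheduled on the suspect worker
kubectl get pods -n jobs-api -o wide \
    --field-selector spec.nodeName=tools-k8s-worker-nfs-52

# Inspect one pod's events and volume mounts (pod name is an example)
kubectl describe pod jobs-api-abc123 -n jobs-api
```

As the log suggests ("maybe it's just a warning?"), these FailedMount events are emitted as warnings while the kubelet retries the mount, and the pod in question was in fact running, so they may not indicate a persistent failure.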