[08:39:13] blancadesal: Raymond_Ndibe: did we completely remove the ability to give a buildservice image a non-default name?
[08:41:28] taavi: I don't remember off the top of my head. If we did, that was probably unintentional. Do you want me to check?
[08:42:14] if you have a moment, that'd be nice
[08:50:32] I think it happened when we moved most of the logic out of the cli to the new go api, but I have not been able to find any info about whether that was a deliberate decision. I'd say it probably wasn't.
[08:51:20] I have nothing against bringing that option back
[08:53:17] is this related to the pywikibot image?
[08:53:55] sort of, yeah. I can live with it using the default name, but if there was an easy option to change it, that'd be nice
[08:58:21] which part(s) of the "%s/tool-%s/tool-%s:latest" pattern would you expect to be able to change?
[08:59:52] the latter tool-%s
[09:01:55] ok, I can make the changes to the api & cli a bit later today
[12:33:10] jbond: looks like the puppetserver puppetization won't run g10k correctly on the first run. I sent https://gerrit.wikimedia.org/r/c/operations/puppet/+/975257 but I'm not confident it's the right fix
[12:35:17] taavi: ack thanks, i'm about to grab some food but plan to build a puppetserver host later today so i can take a deeper look then
[12:35:30] sgtm, thanks
[13:24:40] i think the current neutron alerts are for cloudvirts that are decom'd or being reimaged
[13:41:33] taavi: for the ability to give the buildservice images custom names:
[13:41:33] https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/57
[13:41:33] https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/25
[13:41:57] blancadesal: thanks! the api one seems to have quite a few unrelated changes?
[13:43:21] when you run the oapi generator to reflect changes to the openapi yaml, it does that :/ do you know if that can be avoided other than by manually resetting?
[13:56:31] seems to me like an issue that should be fixed in the tool, to be honest
[13:56:41] anyway, I'll wrap up the Puppet 7 thing I'm looking at and then review those
[13:59:31] yup, it's annoying
[14:46:14] blancadesal: both patches lgtm!
[14:47:50] thanks taavi! I'll merge them now but won't have time to deploy them today – feel free to do so if you need them urgently, otherwise I'll do it on monday morning
[14:48:05] sure, I'll deploy then
[14:48:27] thanks for the patches, really appreciated
[14:49:58] you're welcome, glad I could be of help
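To make the naming discussion above concrete, here is a minimal Go sketch of the kind of logic the api/cli change implies: only the final tool-%s segment of the "%s/tool-%s/tool-%s:latest" pattern becomes overridable, while the registry and the per-tool namespace stay fixed. The function, parameter names, registry value, and example names are illustrative assumptions, not the actual builds-api code (that lives in the merge requests linked above).

```go
// Illustrative only: a hypothetical helper showing how an optional custom
// image name could slot into the "%s/tool-%s/tool-%s:latest" pattern.
// This is not the actual builds-api code; see the merge requests above.
package main

import "fmt"

// imageRef returns the full image reference for a tool's buildservice image.
// An empty customName falls back to the default "tool-<toolname>" image name;
// the registry and the per-tool namespace are never overridden.
func imageRef(registry, tool, customName string) string {
	name := customName
	if name == "" {
		name = fmt.Sprintf("tool-%s", tool)
	}
	return fmt.Sprintf("%s/tool-%s/%s:latest", registry, tool, name)
}

func main() {
	// "registry.example.org" and the tool/image names are placeholders.
	fmt.Println(imageRef("registry.example.org", "mytool", ""))          // default name
	fmt.Println(imageRef("registry.example.org", "mytool", "pywikibot")) // custom name
}
```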
[15:07:10] taavi: if https://gerrit.wikimedia.org/r/c/operations/puppet/+/975089 is wrong, then should puppet create the subdirs there?
[15:08:07] * andrewbogott wonders why a bunch of decommissioned servers are alerting
[15:17:20] jbond: any ideas why /etc/ssl/certs/wmf-ca-certificates.crt would be empty on some cloud vps nodes?
[15:31:35] taavi: i don't know, no. i noticed it i think 2 weeks ago and fixed a few things, but i'm not sure why it's empty
[15:31:51] i did ping here but it was a weekend
[15:32:29] taavi: for cloud i think you can just cp /var/lib/puppet/ssl/certs/ca.pem /etc/ssl/certs/wmf-ca-certificates.crt
[15:35:50] oh right, I remember now
[15:35:59] anyhow, https://gerrit.wikimedia.org/r/c/operations/puppet/+/975299/ should fix it for all
[15:37:56] balloons: if you'd like a distraction, I could use a second opinion on https://phabricator.wikimedia.org/T348643. As far as I can tell, the problem either was already over when we started upgrading (like, it was a weird one-time episode), or else it was resolved by firmware updates. So I'm inclined to tell Dell to stand down (but we still need better monitoring on our end).
[15:38:03] taavi: +1 thanks
[15:46:23] Hmm.. are you comfortable with the drives and servers now? I'm just thinking aloud about what we would ask of Dell at this point.
[15:48:27] I think I'm comfortable -- I don't really see anything going wrong now, although clearly something went wrong in the past.
[15:49:40] I'm a little confused about the cases where the number went down -- I was assuming that 'Offline uncorrectable sectors' referred to bad sectors that were marked out on the drive itself, but the fact that that number went /down/ in a few cases has me questioning everything :)
[15:50:17] I expect to lose some sectors over time from an ssd, but I don't have any real expectation of what the rate would be.
[15:56:46] balloons: komla: the pywikibot image should be functional now, I wrote some docs at https://wikitech.wikimedia.org/wiki/Help:Toolforge/Running_Pywikibot_scripts
[15:57:19] taavi, amazing! Thank you so much for writing docs as well!
[15:58:23] thanks taavi!
[16:01:14] yay taavi!
[16:11:55] komla, let's follow up directly on tools that we know were waiting for this. There's a column on the workboard, I believe
[20:27:54] I accidentally rebooted cloudvirt1058
[20:28:22] I was trying to restart a neutron service, but typed `systemctl reboot` instead of `systemctl restart` and systemd accepted it
[20:31:32] affected VMs: https://phabricator.wikimedia.org/P53555
[20:32:45] the automatic failover I set up for the Toolforge k8s haproxy layer just paid off
[20:32:46] Nov 17 20:29:10 tools-k8s-haproxy-4 Keepalived_vrrp[588]: (VRRP1) Entering MASTER STATE
[20:33:24] andrewbogott: ^ do you think this is worth a notice somewhere?
[21:26:42] sorry taavi, I was doing a laptop upgrade. I don't think it needs a notice, but I'm going to check in on the nfs clients of the rebooted hosts.
[21:36:39] seems fine
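On the "we still need better monitoring on our end" comment in the Dell/SMART thread above, below is a rough Go sketch of one way such a check could read the raw Offline_Uncorrectable count with smartctl. This is an assumption-level illustration, not an existing WMF/Toolforge alert: the parsing only handles the ATA-style attribute table that `smartctl -A` prints (NVMe and SAS output look different), and attribute naming can vary by drive.

```go
// Sketch only: report the raw Offline_Uncorrectable count for an ATA/SATA
// drive by shelling out to smartctl. Not the actual monitoring used on
// the cloudvirts discussed above.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

func offlineUncorrectable(device string) (int64, error) {
	// smartctl -A prints the SMART attribute table; attribute 198 is
	// Offline_Uncorrectable on most ATA drives.
	out, err := exec.Command("smartctl", "-A", device).Output()
	if err != nil && len(out) == 0 {
		// smartctl uses a bitmask exit status, so it can exit non-zero
		// while still printing a usable table; only give up if we got
		// no output at all.
		return 0, fmt.Errorf("smartctl %s: %w", device, err)
	}
	for _, line := range strings.Split(string(out), "\n") {
		fields := strings.Fields(line)
		// Table rows look like: ID# ATTRIBUTE_NAME FLAG VALUE ... RAW_VALUE
		if len(fields) >= 10 && fields[1] == "Offline_Uncorrectable" {
			return strconv.ParseInt(fields[len(fields)-1], 10, 64)
		}
	}
	return 0, fmt.Errorf("no Offline_Uncorrectable attribute found on %s", device)
}

func main() {
	device := "/dev/sda"
	if len(os.Args) > 1 {
		device = os.Args[1]
	}
	n, err := offlineUncorrectable(device)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%s Offline_Uncorrectable raw value: %d\n", device, n)
}
```

A check like this would still need a baseline per drive, since (as noted above) the raw count can apparently go down as well as up, so alerting on any nonzero value would be noisy.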