[07:23:07] good morning
[07:30:21] morning
[08:00:56] morning
[08:08:48] there are a bunch of puppet failures on toolsbeta and in cloudinfra, are they known?
[08:20:53] morning, does not ring any bells (there was some movement by andrewbogot.t on puppet, but iirc it was fixed already)
[08:27:21] stashbot is back
[08:27:21] See https://wikitech.wikimedia.org/wiki/Tool:Stashbot for help.
[08:44:11] the puppet errors seem to be related to a ca certificate: Info: Not using expired certificate for ca from cache; expired at 2024-03-31 20:35:10 UTC
[08:44:19] that expired yesterday :/
[08:44:54] (for cloudinfra)
[08:46:36] I updated this: https://wikitech.wikimedia.org/wiki/Help:Project_puppetserver#Renewing_puppetserver_CA_certificate not long ago, but it's not updated for puppet 7 (it's from before the move)
[08:49:28] o/
[08:52:56] the issuer cert is expired
[08:53:15] I'll open a task
[08:53:18] \o
[08:55:26] T361563
[08:55:27] T361563: [cloudinfra] puppet CA cert expired - https://phabricator.wikimedia.org/T361563
[09:14:59] cloud-cumin-04 gets permission denied when trying to ssh to cloudinfra VMs, is that expected? (it's been a while since I used it)
[09:15:29] also other VMs too (toolsbeta)
[09:15:46] what is cloud-cumin-04 ?
[09:16:13] a VM?
[09:16:22] probably needs a `keyholder arm`, although we should consider dropping those in favour of the new cloudcumin* hosts
[09:17:07] yes, cloudcumin hosts work just fine:
[09:17:13] aborrero@cloudcumin1001:~ $ sudo cumin 'O{project:cloudinfra}' 'date'
[09:17:25] aborrero@cloudcumin1001:~ $ sudo cumin 'O{project:toolsbeta}' 'date'
[09:17:31] ^^^ both work as expected
[09:18:01] `keyholder arm` was the issue yes
[09:29:47] there's some uncommitted changes on cloudinfra-internal-puppetserver-1 related to designate and nova, it's not letting the repo rebase the puppet code, anyone knows what they are?
(I'll just stash them if nobody claims them)
[09:30:13] I have no idea, but maybe andrew has
[09:42:13] hmm, there's something weird going on there, the puppet-git-sync-upstream script ends up getting stuck, and creating those changes
[09:42:25] manually rebasing says I should not do it there
[09:42:32] Local rebases are not allowed in this repository. Please go to frontend puppetmaster and rebase (why do you want to rebase anyways????)
[09:42:41] that'll be fixed by https://gerrit.wikimedia.org/r/c/operations/puppet/+/1015625
[09:43:47] I'm a bit off the loop on how is the puppet code designed to get updated
[09:44:16] those hooks are meant to force you to use the 'puppet-merge' command on the puppetmasters?
[09:44:40] on wikiprod puppetservers, yes
[09:45:20] but they were also accidentally copied to the cloud vps puppetservers, where we don't want that
[09:45:38] ack, that makes sense, thanks
[10:07:54] there's some race condition on the tools etcd servers, where the etcd certs are provisioned before the etcd service, and try to change the owner to `etcd` but the user does not exist yet
[10:08:18] andrew was working on that iirc
[10:11:10] ack
[10:12:13] probably https://gerrit.wikimedia.org/r/c/operations/puppet/+/1015363
[10:13:03] I think he missed the certs xd
[10:14:13] hmm, that have explicitly `before => [Service['etcd'], Package['etcd-server']],` I'll leave it for andrew
[11:39:08] quick review here? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1016318
[11:52:45] ^^^ self merging, as the reimage is already in flight, and I need it
[12:00:05] Could I get a +1 for https://phabricator.wikimedia.org/T361566
[12:01:08] 👀
[12:01:50] Rook: +1'd
[12:04:48] what do they need a floating IP for?
[12:44:59] ceph sees host=cloudcephosd1034 as down, anyone doing anything with it?
[12:45:09] that's me
[12:45:25] oh, I see it did not silence the alerts :/
[12:47:13] quick review?
https://gerrit.wikimedia.org/r/c/operations/puppet/+/1015580
[12:52:28] LGTM
[12:52:44] thanks!
[13:12:06] taavi: thank you for fixing T361537 while I slept :)
[13:12:07] T361537: connectivity from cloudbackup200[34] and eqiad ceph - https://phabricator.wikimedia.org/T361537
[14:04:06] dcaro: here's my latest attempt to fix puppet ordering for etcd https://gerrit.wikimedia.org/r/c/operations/puppet/+/1016346
[14:16:44] andrewbogott: +1d and added a comment, not sure it's going to fix it in just one round xd
[14:17:16] thx
[14:18:10] I'm going to just try it, since it shouldn't break any existing nodes
[14:40:10] quick review here? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1016363
[14:42:45] * dcaro taking a break, having some neck pain
[14:43:15] thanks for the review! get better soon
[15:29:58] andrewbogott: is toolsbeta-test-localdisk something you're working on? it's been alerting about puppet issues for a while
[15:30:30] I'm not working on it. Probably it can be deleted.
[15:31:56] * taavi deletes
[15:55:40] "sudo killed by Wheel of Misfortune on tools bastion" -- oops. the wheel of misfortune killed my `become wikibugs` process.
[15:56:16] bd808: too much playing with fire, you eventually burn yourself :-)
[15:57:43] if `become` commands have normally been killable by that script I'm pretty surprised this is the first time it has hit me.
[15:58:01] I think there was a way to exclude proc names from it
[15:59:15] also: "Age of candidate processes in days, defaults to 3"
[15:59:51] I think I see what changed. `which bash` == /usr/bin/bash on bookworm. The script excludes "/bin/bash" from the killer
[16:00:02] * bd808 will send a patch
[16:00:11] right, usrmerge
[16:00:28] didn't I fix that when moving to buster?
[16:00:35] hmmm... no we also have "/usr/bin/bash" in the SHELLS list so that's not it
[16:00:54] `sudo` is not in SHELLS though, with or without usrmerge
[16:01:55] semi-related: the vs code remote server is non-free, right?
since I see someone running that on tools-sgebastion-10
[16:02:13] maybe just excluding the become path should do the trick
[16:03:01] "/usr/bin/sudo -niu tools.wikibugs" is what `ps` sees for a `become wikibugs` exec.
[16:04:42] * arturo offline
[16:04:56] taavi: I'm not sure. The top of https://code.visualstudio.com/license/server seems to say that the code is under the MIT license?
[16:07:34] bd808: I suspect it's the same thing as with vs code itself, where the code is available under a free license but then the version microsoft distributes has additional spyware and is not freely licensed.
[16:08:55] :nod: that fits with "This license applies to the Visual Studio Code Server product. Source code for Visual Studio Code Server is available at https://github.com/Microsoft/vscode under the MIT license agreement." later in that page
[16:09:09] * dcaro off
[16:09:48] Explaining this to the community I'm sure will be "fun". I'm sure ChatGPT has told them it is fine...
[16:10:19] * bd808 is still stewing about "GPT" being cited as an authority last week
[16:32:42] i sent a polite email to that person asking them not to use the non-free vscode server version on toolforge
[16:35:19] 0_o they are staff
[16:38:01] dhinus: any way to tell what command it's actually trying to run here?
[16:38:04] https://www.irccloud.com/pastebin/jUAlBdGB/
[16:38:33] adding that '...' truncation in the logs is not very friendly
[16:38:39] which log file is that?
[16:39:03] cloudcumin1001:/var/log/spicerack/wmcs/toolforge/add_k8s_etcd_node-extended.log
[16:40:58] the line above seems to list the command as 'sudo -i kubectl --namespace=kube-system get configmap kubeadm-config -o yaml'
[16:41:30] Yeah, that's /probably/ the command it's trying to run :)
[16:41:48] But that command works, so I was hoping to find some subtle difference in what it's doing on -7
[16:42:10] (I mean, that same command works fine on -7)
[17:01:23] andrewbogott: you could try running cookbook --verbose but I don't think you'll find anything more useful
[17:01:46] the command that is truncated is probably just identical to the one above as t.aavi mentioned
[17:02:18] * dhinus logs off
[17:07:31] So it must be a race, it certainly works any time I run it directly.
[17:15:53] Ah yeah 'HOWELL EDITH B ' shows another thing 'over $50'
[17:24:07] whoah, that was definitely in the wrong window
[17:24:17] too many conversations!
[17:27:37] * bd808 long lunch