[13:57:16] jbond: I went to puppet-merge, and there's a change from you in the queue as well as my pc2010 decommissioning
[13:57:41] (FTR, happy for you to merge my change if you like, but thought we should co-ordinate)
[14:01:26] Emperor: sorry about that, have merged both yours and mine now
[14:01:38] thanks :)
[14:07:40] jbond: re.error: unknown extension ? jbond, probably didn't go as smoothly as expected :-)
[14:08:07] I guess no impact as it is a new service, I guess
[14:08:33] let me know if I can help
[14:09:04] mutante disabled puppet now
[14:09:22] I stopped ircecho and disabled puppet on alert1001, ack
[14:09:40] let me know once/if you want it back
[14:09:56] ack looking now, looks like an issue with the regex
[14:10:17] yep, *nod*, cool
[14:15:52] volans: can i get a second set of eyes https://gerrit.wikimedia.org/r/c/operations/puppet/+/719271
[14:16:12] sure
[14:16:18] thx
[14:17:15] jbond: do you have handy the path of the last_run_summary.yaml file?
[14:17:23] to double check it
[14:18:06] /var/lib/puppet/state/last_run_summary.yaml
[14:18:08] volans: ^^
[14:18:48] you don't want to remove the ()?
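Regarding the re.error at 14:07:40: the log does not show the offending pattern, so this is only a guess at one way to hit that message. Python's re module raises "unknown extension ?<..." when a group is written Ruby-style as (?<name>...); Python spells named groups (?P<name>...). The pattern and the group name "seconds" below are made up for illustration.

```python
import re

# Hypothetical patterns; the real one from the change is not shown in the log.
RUBY_STYLE = r"(?<seconds>\d+)"     # named group as Ruby writes it
PYTHON_STYLE = r"(?P<seconds>\d+)"  # named group as Python's re expects it


def compiles(pattern):
    """Return (True, None) if the pattern compiles, else (False, error message)."""
    try:
        re.compile(pattern)
        return True, None
    except re.error as exc:
        return False, str(exc)


ruby_ok, ruby_msg = compiles(RUBY_STYLE)
python_ok, _ = compiles(PYTHON_STYLE)

# With the Python spelling the group can be read back by name.
match = re.search(PYTHON_STYLE, "total: 42")
```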
[14:19:20] not sure we need to but can
[14:20:25] volans: updated
[14:21:17] jbond: still missing split()
[14:21:22] looks good otherwise
[14:22:10] ack thanks should be good now :) (switching between ruby and python today :/)
[14:23:58] jbond: sorry, strip wants a string not a list ;)
[14:24:04] strip('()')
[14:24:13] or strip(')(')
[14:24:21] based on what you like more aesthetically ;)
[14:24:47] ack fixing now
[14:25:40] i worry about someone who finds `strip(')(')`
[14:25:50] more aesthetically pleasing :P
[14:26:26] looks like you are trying to use a curried strip
[14:27:10] thanks
[14:28:55] lol
[14:28:57] np
[14:36:58] I'm available to help setting up a pontoon stack for testing FWIW
[14:37:02] mutante: fix has been merged but will take 30 mins to roll out
[14:38:46] jbond: ACK, I am keeping an eye on the Icinga numbers, thx
[14:40:01] thx
[14:43:27] jbond: I'm seeing errors with "TypeError: string indices must be integers" on toolforge hosts now
[14:44:08] majavah: the change is still rolling out. e29ef93727 is the one that fixes things
[14:44:23] I can also confirm CRITs are going down
[14:44:41] ah, that was only a partial fix, thanks
[14:44:52] majavah: yes i had to send a follow up :)
[14:45:00] ... two follow ups actually
[14:46:53] for some reason snapshot1015 creates cron spam about this.. but nothing else does
[14:47:55] I'm seeing "ImportError: cannot import name 'Info'" too :/
[14:48:12] majavah: where are you seeing that?
[14:48:22] mutante: it's now a systemd timer, so somehow absenting the old cron job might not have worked?
[14:48:30] jbond: some toolforge hosts, I think they're all stretch
[14:48:53] majavah: good point, probably it was disabled while it ran everywhere else.. looking
[14:49:06] majavah: what hostname?
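On the strip() exchange at 14:23 to 14:25: str.strip() takes a single string and treats it as a set of characters to remove from both ends, so strip('()') and strip(')(') behave the same, while a list is rejected with TypeError. A small sketch; the sample value is invented and is not taken from last_run_summary.yaml.

```python
# Invented sample value; the real field contents are not shown in the log.
raw = "(1.23 seconds)"

# The argument is a set of characters, so the order inside it does not matter.
a = raw.strip("()")
b = raw.strip(")(")

# split() then breaks the cleaned string on whitespace.
parts = a.split()

# A list is not accepted: strip() wants a str (or None).
try:
    raw.strip(["(", ")"])
    list_accepted = True
except TypeError:
    list_accepted = False
```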
[14:50:37] !log snapshot1015 - manually removed prometheus-puppet-agent-stats from crontab which was sending spam and is now a timer
[14:50:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:45] tools-clushmaster-02.tools.eqiad1.wikimedia.cloud is one example
[14:50:55] (sorry, have to go afk for a bit)
[14:52:42] mutante: 84 left \o/
[14:53:12] Amir1: :))
[14:56:32] fyi missing Info issue is due to stretch using an old library, going to backport the buster version
[15:11:44] Icinga CRITs were down to 380 or so, then I rescheduled them all, now cleared up, starting icinga-wm again, re-enabling puppet on alert1001
[15:12:09] mutante: great thanks
[15:12:21] yep, np
[15:12:44] majavah: you may need to update python3-prometheus-client on stretch nodes (or wait for unmanaged updates to take care of it)
[15:15:00] Hi SREs. I think I may have gotten the 'jenkins' k8s user auto-banned on the staging cluster. All k8s requests that I'm sending are being rejected with "Forbidden". Can someone have a look?
[15:20:36] hey, dancy, I am afraid I know very little about k8s, but a good starting point would be to ping the person on clinic duty on the -operations channel, if you haven't gotten an answer yet, to get you to the right people
[15:20:59] thx, I'll ask over there.
[15:21:33] that person may not be the right one, but will know how to best route your request 0:-)
[15:35:13] godog, arturo, marostegui: I got another CR for ::haproxy, it's a NOOP from haproxy point of view so it should be innocuous: https://gerrit.wikimedia.org/r/c/operations/puppet/+/719282/
[15:36:13] that looks good to me yeah
[15:36:33] vgutierrez: +1
[15:36:37] thx <3
[15:40:04] yeah LGTM
[16:39:02] I'm doing some bulk transfers with transfer.py in verbose mode atm, and in verbose mode a few steps fail with each transfer, but I can't see what the commands are because they're abbreviated. Seems like the transfer itself is successful and there are no errors when not in verbose mode - anyone seen this before?
[18:13:25] hnowlan, it is likely they are not failing
[18:13:47] just cumin tells you a command returning non-0 cumin considers it as not a failure
[18:14:00] that is why we don't print those in non-verbose mode
[18:15:13] there are a few checks with grep and other stuff where we expect a non-zero value
[18:16:43] as long as transfer.py doesn't return a non-zero, things will be ok, and by default, checksummed on origin and destination for your tranquility :-)
[18:20:39] for example, one of the sanity checks is that the file exists on the origin, but not on the destination
[18:21:44] it is implemented with the same function, so one of the 2 calls will fail "from the point of view of cumin" and that is ok, transfer.py knows the right logic - I recommend you not use the verbose mode, it is confusing, but I cannot modify it other than hide it in non-verbose mode
[18:30:32] I think it could be fixed with the ok_codes parameter, but I would have to check it, not sure that option was available when it was implemented
[18:35:21] the problem is, by ignoring the timeout, we won't be able to see it failed
[18:35:38] so I guess we could change the commands to always return 0 when expected
[18:37:29] something like "! command"
[22:24:28] debian experts: is this me doing something wrong in my sources or is it something that doesn't exist yet for bullseye but will sometime soon?
[22:24:31] https://www.irccloud.com/pastebin/X8lhegTc/
[22:52:24] andrewbogott: see https://www.debian.org/releases/bullseye/amd64/release-notes/ch-information.en.html#security-archive - the URL format changed
[23:47:05] ooo.. that'll be useful in our frack testing also. thanks!
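On the 18:13 to 18:37 discussion of checks that are expected to exit non-zero: a generic sketch of the idea using plain subprocess, not cumin or transfer.py. The helper run_check and its ok_codes argument are hypothetical; the argument name only echoes the parameter mentioned in the chat and does not reproduce cumin's API.

```python
import subprocess
import sys


def run_check(argv, ok_codes=(0,)):
    """Run argv and report success if its exit code is in ok_codes (hypothetical helper)."""
    result = subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode in ok_codes


# Stand-ins for "does the file exist?" checks: one exits 0, the other exits 1.
exists_cmd = [sys.executable, "-c", "raise SystemExit(0)"]
missing_cmd = [sys.executable, "-c", "raise SystemExit(1)"]

# Origin: the file must exist, so only exit code 0 is acceptable.
origin_ok = run_check(exists_cmd)
# Destination: the file must NOT exist, so exit code 1 is the expected outcome.
destination_ok = run_check(missing_cmd, ok_codes=(1,))
# Without ok_codes, the same expected result is reported as a failure.
destination_naive = run_check(missing_cmd)
```

The "! command" suggestion reaches the same result from the shell side: the leading ! inverts the exit status, so an expected non-zero becomes 0 before the caller ever sees it.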