[08:53:16] looking at https://wikitech.wikimedia.org/wiki/Operating_system_upgrade_policy is pretty depressing from the point of the dbs
[08:53:57] because running dbs is a joyful experience ?
[08:53:58] we should have all dbs migrated from stretch to buster by the end of this Q, putting us 2Qs after the end of the deprecation,
[08:54:02] effie: :D
[08:54:22] leaving us with a full.. 6 months of running buster before _that_ enters deprecation too
[08:56:45] kormat: what's not depressing from the point of dbs?
[08:56:54] RhinosF1: truth
[08:57:25] 🥀
[09:02:17] kormat: my dbs are all upset with me the last few days
[09:05:57] RhinosF1: you have my sympathies
[09:08:28] kormat: does suggest room for process pain-removal? Upgrades _shouldn't_ be a nightmare (ask me about the time I did RHEL7->18.04 crossgrades on systems we couldn't run the risk of losing data from some day...)
[09:09:45] Emperor: 😮
[09:11:13] RHEL7->18.04 ?
[09:11:17] that actually sucks
[09:14:14] Emperor: so one of the big issues is qualifying a new mariadb release (e.g. going from 10.4 on buster to 10.5 on bullseye). this takes a Very Long Time
[09:16:49] we push our dbs hard, to the point where relatively frequently we're the ones reporting bugs to upstream that no-one else has run into
[09:17:17] there's no way to predict how long it will take until we're confident that 10.5 works properly for us
[09:18:06] e.g. when we upgraded our first section to have a 10.4 primary, we left it running for 6 weeks before starting to upgrade any more
[09:18:52] there are other issues, like mw's use of db 'groups', which make reimaging hosts a lot more painful than it would otherwise be
[09:19:13] cf. https://phabricator.wikimedia.org/T263127
[09:21:24] effie: it was quite exciting, but we only had a few of those (from a brief flirtation with RHEL); most of the hosts were still on 12.04...
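[Editor's note: the qualification process described above implies keeping track of which hosts in a section are still on the old MariaDB release. A minimal sketch of that bookkeeping, assuming version strings in the usual `SELECT VERSION()` format; the host names and target release are illustrative, not WMF's actual tooling.]

```python
import re

def parse_mariadb_version(version_string):
    """Extract (major, minor) from a MariaDB version string,
    e.g. '10.4.22-MariaDB-log' -> (10, 4)."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))

def hosts_still_to_upgrade(host_versions, target=(10, 5)):
    """Given {host: version string}, return hosts below the target release,
    relying on tuple comparison: (10, 4) < (10, 5)."""
    return sorted(
        host for host, version in host_versions.items()
        if parse_mariadb_version(version) < target
    )

# Hypothetical snapshot of a section mid-migration:
section = {
    "db1001": "10.4.22-MariaDB-log",
    "db1002": "10.5.13-MariaDB-log",
    "db1003": "10.4.22-MariaDB-log",
}
print(hosts_still_to_upgrade(section))  # ['db1001', 'db1003']
```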
[09:22:50] kormat: kittens
[09:22:57] I think such an experience would leave me with some chronic emotional pain
[09:23:09] and I used to run gentoo servers a few jobs ago
[09:24:14] <_joe_> yeah the pain is not the upgrade process per se, it's verifying everything still works well that takes time
[09:24:20] <_joe_> on all layers of our infra
[09:24:25] _joe_: yeah
[09:24:33] <_joe_> and that's pretty common for a website of a certain size
[09:24:36] well, the upgrade process itself is also pretty painful for the DBs
[09:24:52] _joe_: still, we are entitled to complain a bit
[09:24:55] primary switchovers are no joke
[09:25:07] helps relieve the pain
[09:25:14] <_joe_> right; but it's not for e.g. php 7.2 to 7.4, but the overall process is painful nonetheless
[09:25:15] effie: quite
[11:34:32] huh. i can't edit https://wikitech.wikimedia.org/wiki/SRE/Clinic_Duty/Access_requests
[11:34:55] moritzm: assuming you can, the link at the top of https://wikitech.wikimedia.org/wiki/SRE/Clinic_Duty/Access_requests#Creating_new_shell_users is broken
[11:35:02] should be https://github.com/wikimedia/puppet/blob/production/modules/admin/README.md
[11:36:11] let me try
[11:37:54] kormat: https://wikitech.wikimedia.org/w/index.php?title=SRE/Clinic_Duty/Access_requests&curid=442028&diff=1927971&oldid=1923535
[11:38:34] majavah: 👀 can you see what the acl on that page is?
[11:38:36] you can't edit it due to https://wikitech.wikimedia.org/w/index.php?title=Special:Log&logid=943249
[11:38:54] sigh, i see
[11:39:59] jbond edited the page after that, so he must be an admin too
[11:40:45] https://wikitech.wikimedia.org/wiki/Special:ListUsers/contentadmin and https://wikitech.wikimedia.org/wiki/Special:ListUsers/sysop, if you're curious
[11:40:46] <_joe_> yeah the admins on wikitech are set(sre) - {kormat}
[11:41:01] _joe_: i can't really argue with that
[11:41:18] <_joe_> thankfully you understand.
[11:42:16] ok, jbond+moritzm are both there.
that's sufficient most likely
[11:42:49] majavah: thanks!
[11:47:39] "The following units failed: generate_os_reports.service" is this related to Moritz's os upgrade tool, or something else?
[11:49:58] ack, will have a look
[11:54:14] not trying to people people, 0:-), just I hadn't seen that service before
[11:54:18] *ping
[15:30:17] kormat: you are a sysop on wikitech now. sorry ;)
[15:30:30] bd808: 🙀
[15:31:01] also, is that page protection really necessary? Like why did Neil decide that this particular page needed to be locked?
[15:31:13] i really have no idea
[15:31:47] A whole lot more of wikitech than that page is "security-critical processes"
[15:34:25] there's been a discussion about that, if not a task (which pages should we semiprotect) for ages
[15:34:38] but it needs someone to drive it. meh
[15:34:58] and maintain the acl group(s)
[15:35:38] yeah, I can get behind locking that page, but if we're going to do it, we need to grant all SREs sysop in their onboarding checklist
[15:36:05] "the public can't edit this" is very reasonable, but "some SREs can't edit it" doesn't really make sense
[15:36:12] even better if we could base it on ldap/ops or something, but
[15:36:25] does this qualify me?
https://usercontent.irccloud-cdn.com/file/DqORyKVh/20211003_120949.jpg
[15:36:59] 👎
[15:39:02] oh wow an old-timer
[15:39:31] * kormat creaks
[15:53:01] > even better if we could base it on ldap/ops -- In theory we can do this (until I get to make wikitech a SUL wiki) -- https://www.mediawiki.org/wiki/Extension:LDAP_Authentication/Configuration_Options#Synchronizing_LDAP_groups_with_MediaWiki_security_groups
[15:54:06] I think the core feature there would allow us to have anyone in the ops ldap group granted sysop on wikitech automagically, but that grant would not be revoked if the ldap membership is withdrawn
[15:54:27] But it might be just as easy to add "grant sysop on wikitech" as an onboarding step
[15:54:54] * bd808 tries not to get sucked into reading laner's code
[15:55:14] we do have an offboarding script that might take care of the removal part, potentially
[15:58:19] I can't seem to find the source of the `kubernetes`, `k8s_event`, and `old_k8s_event` fields in logstash. Does someone know what generates these fields?
[15:58:41] <_joe_> cwhite: I would assume mmkubernetes
[15:59:11] <_joe_> but akosiaris probably knows better
[16:00:28] mmkubernetes sounds right -- https://www.rsyslog.com/doc/master/configuration/modules/mmkubernetes.html#fields
[16:01:23] _joe_: I assumed so as well, but the mmkubernetes code doesn't really explain clearly why there's three copies of mostly the same data. It's possible I wasn't looking at the code in the right repo though (gerrit).
[16:01:58] <_joe_> I'm going afk now, sorry
[16:03:17] k8s_event is eventrouter I think, not mmkubernetes
[16:03:25] but the kubernetes struct is indeed from mmkubernetes
[16:03:54] cwhite: if you are looking at eventrouter logs specifically, that would match my understanding
[16:04:08] other apps shouldn't have that.
[16:05:33] akosiaris: aha! that is the truth!
[16:05:33] <_joe_> oh right
[16:05:39] <_joe_> so actual kubernetes events, right
[16:05:47] <_joe_> so having those fields makes sense
[16:06:17] yeah they are part of what eventrouter emits. It's the actual data we want from that app
[16:07:15] akosiaris: That makes sense. Thanks.
[16:38:03] ganeti1009 keeps consuming more memory, a leak in some process? In theory vms have static allocations, so memory shouldn't vary much
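[Editor's note: the field-origin discussion above (the `kubernetes` struct is metadata added by rsyslog's mmkubernetes module, while `k8s_event`/`old_k8s_event` are payloads only eventrouter emits) can be sketched as a small classifier. The field names follow the conversation; the function and return labels are illustrative assumptions, not actual logstash tooling.]

```python
def classify_k8s_log_record(record):
    """Guess where a logstash record's kubernetes-related fields came from.

    Per the discussion: pod logs relayed through rsyslog's mmkubernetes
    module gain a 'kubernetes' metadata struct, but only eventrouter's own
    output carries 'k8s_event' (the actual Kubernetes event payload).
    """
    has_event = "k8s_event" in record or "old_k8s_event" in record
    has_metadata = "kubernetes" in record
    if has_event:
        return "eventrouter event"
    if has_metadata:
        return "ordinary pod log"
    return "non-kubernetes log"

# Illustrative records (field contents are made up):
print(classify_k8s_log_record({"kubernetes": {"namespace_name": "foo"},
                               "k8s_event": {"reason": "Scheduled"}}))  # eventrouter event
print(classify_k8s_log_record({"kubernetes": {"namespace_name": "foo"}}))  # ordinary pod log
```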