[03:06:28] rzl: mutante: from the department of "well, there's your problem": https://i.imgur.com/Dwc5Rej.png
[03:07:54] rzl: mutante: host dashboard + nic_saturation_exporter confirms we were absolutely saturating tx on that host: https://i.imgur.com/8gExh2R.png
[03:09:18] oh, this is really interesting: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=grafana1002&var-datasource=thanos&var-cluster=misc&from=now-6h&to=now&viewPanel=6
[03:09:35] the high disk utilization (it's more of a queue saturation metric) is because the hardware NIC was saturated
[03:09:45] and that's a mirrored virtual block device over the network
[08:16:27] backups seem to be now back on track and I don't expect any more alerts about them
[08:23:44] <_joe_> so any new alert is a new problem, right?
[08:29:16] Hello, is someone available for an access request question?
[08:31:26] I think my access request got stuck https://phabricator.wikimedia.org/T309045, I'd like to make a db export the Growth team had scheduled for yesterday
[08:35:10] <_joe_> sergi0: it's waiting for Tyler's approval
[08:35:20] <_joe_> every access request needs a manager to approve them.
[08:35:45] <_joe_> sergi0: but I can help you making the db export in the meantime
[08:39:49] ok, got you, I thought it was stuck for other reason. The export instructions are here: https://phabricator.wikimedia.org/T307451#7923806. It's fairly easy. The from/to should be for yesterday. --from=20220509000000 --to=20220526093322
[08:44:59] <_joe_> also eswiki?
[08:45:48] yes
[08:47:10] <_joe_> sergi0: ok, I see a lot of PII in that file; So if you can get me access to the gdrive, I'll upload it there
[08:47:39] ok, let me see
[08:48:23] <_joe_> I sent an access request
[08:48:55] Yeah, me too I don't have access neither. Trying to figure out who needs to approve.
[08:49:09] <_joe_> heh ok, let me know :)
[08:49:21] Yes, asap. Thanks a lot _joe_ !
[09:08:52] Off for dentist appt., back about 1pm CEST.
[10:39:50] * topranks back
[12:33:32] which one of you lovely lovely SREs fancy a little side project/challenge/chance to break LDAP? T309390
[12:33:32] T309390: [Tracking] Rename LDAP/shell account samtar to theresnotime - https://phabricator.wikimedia.org/T309390
[12:34:07] (just fleshing out the idea at the moment, I know its not a simple task and have noted as such <3)
[17:01:26] TheresNoTime: it's so far beyond a simple task that I can guarantee that no single person currently employed by the Wikimedia Foundation knows enough about all of the systems involved to be able to perform all of the manual changes that would be needed to accomplish it.
[17:04:34] \o/
[17:05:49] is that collection of tasks at least *really nicely documented* eh? /j
[17:05:58] TheresNoTime: is there a reason we shouldn't just transfer all access currently held by uid=samtar to uid=theresnotime? that would accomplish the same thing unless I'm missing something
[17:06:49] hmmmmm
[17:07:52] y'all know more about that than me, I've just done a little bit of reading and a whole load of guesstimating
[17:09:27] "We do not rename users (Developer accounts) anymore. It can (and has) lead to various problems and errors all over the many separate systems which consume Developer accounts as their local databases and authentication methods will get out of sync. We can reconsider when we have better tooling for identity management" -- https://wikitech.wikimedia.org/wiki/SRE/LDAP/Renaming_users
[17:09:29] I too would love the ability to rename cn=Majavah to cn=Taavi, but I don't see any sane way to get it done
[17:10:10] Changing the cn only breaks some things ;) Changing the uid breaks pretty much everything.
[17:11:08] O.ri reported a bug just today in Striker that turned out to be caused by his cn having been changed in the past
[17:12:46] I blame the "little knowledge is dangerous" that I suffer from :P but figuring out which tools/systems use which part of the ldap distinguished name feels like something someone should spend some time doing :D
[17:13:30] I know that much. I don't know how gerrit works internally with it's "pile of git commits as a database" bullshit
[17:14:32] the problem really is N different systems using different parts of the same LDAP record plus different local representations
[17:15:52] *nod* I'm not precious about that task/request *at all*, if the sum result of it is an extra 5 minutes of someone figuring something out, then it was worth the time spent logging it :)
[17:16:07] gerrit as an example uses all three of uid (shell name), cn (wikitech user name), and mail as it creates the account.
[17:16:54] wikitech (MediaWiki) is the only system in the mix that has a native concept of account renaming
[17:17:32] and how does that interact with ldap again..?
[17:17:40] I blame some smart people in the past of MediaWiki for making the Wikimedia community think that renaming accounts is a thing that happens
[17:19:22] wikitech uses [[Extension:LDAP Authentication]] as its authn source. There are local MediaWiki accounts in the labswiki database but the password verification is in LDAP.
[17:20:20] hmm
[17:20:32] Cloud VPS/Toolforge use the same LDAP directory in their NSS stack to provide passwd + group info, plus ssh key storage
[17:21:28] Gerrit uses the same LDAP record for authn similarly to wikitech
[17:21:44] Phabricator uses the same LDAP record for authn similarly to wikitech
[17:22:11] I'm half tempted to try to spin up a replica of this beautiful soup of tools.. a lot of it is puppetized I imagine?
[17:23:46] all of it should be in some way shape or form. not necessarily in a way that can be reused outside of the prod hosting environment, but that's a different ops/puppet.git problem
[17:25:57] The dev environment for Striker has a bunch of the bits. There is a rotten puppet role for that in mediawiki-vagrant and a more up to date version in Striker's git repo itself using docker-compose and some sketchy containers. That stack does not include the gerrit part. Or the IDP server that is used to front various other prod services.
[17:30:39] anyway, I'll stop spewing neckbeard doom and gloom. I'm sure that it is not technical impossible, but I also know from hard one experience that it is not trivial and that I don't have enough superpowers to do an end to end change across all of the affected systems.
[17:30:57] s/hard one/hard won/
[17:31:54] The last time I really tried was for https://phabricator.wikimedia.org/T171417
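
Editor's sketch, not part of the log: the "N different systems using different parts of the same LDAP record" point can be pictured as a small lookup from consumer to the attributes it keys on. Only the Gerrit row (uid, cn, mail) is stated in the discussion. The other rows, the consumer names, and the `affected_by` helper are assumptions made for illustration, and the real map would be larger and would also have to account for each system's local copy of the account.

```python
# Illustrative only: which LDAP attributes each consumer is assumed to key on.
CONSUMERS = {
    # Stated in the discussion: Gerrit uses uid, cn and mail when creating an account.
    "gerrit": {"uid", "cn", "mail"},
    # Assumption: cn is the wikitech user name, so wikitech keys on cn.
    "wikitech": {"cn"},
    # Assumption: the NSS passwd/group data is keyed on the shell name (uid).
    "cloud-vps-nss": {"uid"},
}


def affected_by(changed_attrs):
    """Return the consumers that key on at least one of the changed attributes."""
    changed = set(changed_attrs)
    return sorted(name for name, attrs in CONSUMERS.items() if attrs & changed)


if __name__ == "__main__":
    for attr in ("uid", "cn", "mail"):
        print(attr, "->", affected_by({attr}))
```

A table like this only answers "who reads this attribute"; it says nothing about how each system's local representation would be brought back in sync after a change, which is the hard part described above.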