[11:25:32] btullis: hey, I'm just trying to push some dns updates with the sre.dns.netbox cookbook
[11:25:43] system saying you have a lock on it now
[11:26:12] which is fine, you may see a bunch of new entries for stuff in magru (Brazil POP), they are safe to proceed with
[11:26:13] Yes, I'm running a decom and I see your changes too. Are you happy for me to proceed?
[11:26:20] yep fire away
[11:26:22] thanks!
[11:26:24] Thanks. Doing so now.
[11:27:03] just hoping the manual zone file edits I made the other day were complete, it may error on updating if not
[11:27:08] let's see
[11:27:24] Looks clean, as far as I can see.
[11:27:42] great!
[11:27:48] yeah it'd be obvious if there was a problem
[11:56:25] I've been asked by a data-products member to either a) provide them with a username/password to log to the Mediawiki Server Admin Log, or b) create a new username/password for the MW SAL. Do any of you know where to look? Thanks!
[11:58:21] brouberol: I've never had to do this from k8s before, but I started by looking at the way `helmfile` logs to SAL when we run it. It uses this script: https://github.com/wikimedia/operations-puppet/blob/production/modules/helmfile/files/helmfile_log_sal.sh
[11:59:15] I don't think "Mediawiki Server admin log" is a thing
[12:03:58] Is that referring to https://wikitech.wikimedia.org/wiki/Server_Admin_Log ?
[12:05:09] My guess is either that or Mediawiki logging (kafka)?
[12:05:25] or the mediawiki user activity log table
[12:06:03] My understanding is that it's either some specific SAL (_à la_ Prod/RelEng/etc), or there's a misunderstanding
[12:06:25] I agree with jynus. I think that we probably need to go back and make sure that the requirements are clear. It may be that someone has said 'we want this app to log to SAL' when there might be better options, like eventstreams. https://wikitech.wikimedia.org/wiki/Irc.wikimedia.org#Avoid_new_use
[12:06:38] yeah, please ask what they want to achieve 0:-)
[12:07:06] haha, yep, my bad. I assumed I didn't know what he was asking for, but that the ask itself was valid
[12:07:22] in 90% of cases the intention is legitimate, but the implementation being asked for is the wrong one
[12:08:55] or maybe they just want to do something like irc logging, but for a separate project?
[12:12:48] The confusion came from https://sal.toolforge.org/analytics also being displayed at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:13:08] anyway, thanks, we're clarifying with them
[12:56:47] topranks: I am finishing my week; as Chris is around, you should be ok
[12:57:09] cdanis: quiet day today
[12:57:22] jynus: ok great - enjoy your weekend :)
[12:58:00] cdanis: see if you can (and want to) ask urand*m about the expiration alerts for cassandra
[12:58:10] Does anyone have any idea why I might be getting these CI failures on the puppet repo at the moment? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1021901 Am I doing something stupid, or is there an issue?
[12:58:11] when he is around
[12:59:26] btullis: without looking too in depth I would suspect something related to the new dc setup
[13:00:08] there may be some noise due to some dependency cycles
[13:00:36] jynus: Ah, thanks. Makes sense.
[13:01:24] jynus: Thanks!
[13:06:51] btullis: indeed seems to be related somehow
[13:07:00] sukhe: maybe you can make sense of it?
[13:07:11] might relate to the changes you made the other day
[13:09:36] more like some omission on our side but I'm not sure what
[13:17:19] cdanis: I know about those aqs certificate expiry alerts. They are genuine, related to T352647. It is a coincidence that the 30-day warning fired on all hosts just before the old certificates were about to be replaced with PKI certificates. The codfw certs on aqs2* have been replaced this week; the eqiad certs will be done sometime next week.
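A minimal sketch, not taken from the alerting setup itself, of why an expiry warning can fire on a whole group of hosts at once: assuming the alert triggers once a certificate has 30 days or less left, every host whose certificate came from the same batch crosses that line on the same day. The host names and dates are hypothetical, and the `<=` comparison is an assumption about how the threshold is applied.

```python
from datetime import date, timedelta

# Assumed threshold: warn once 30 or fewer days remain (the 30-day warning above).
WARNING_WINDOW = timedelta(days=30)


def hosts_in_warning(expiries: dict[str, date], today: date) -> list[str]:
    """Return the hosts whose certificate expires within WARNING_WINDOW of today.

    Already-expired certificates (negative remaining time) are included too.
    """
    return sorted(
        host for host, expiry in expiries.items() if expiry - today <= WARNING_WINDOW
    )


# Hypothetical fleet: two hosts share a certificate batch, one was already renewed.
expiries = {
    "eqiad-host-1": date(2024, 5, 15),
    "eqiad-host-2": date(2024, 5, 15),
    "codfw-host-1": date(2025, 4, 20),
}
print(hosts_in_warning(expiries, date(2024, 4, 19)))  # ['eqiad-host-1', 'eqiad-host-2']
print(hosts_in_warning(expiries, date(2024, 4, 1)))   # []
```

Hosts renewed at a different time (like the already-replaced batch in this sketch) drop out of the shared warning window.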
[13:17:19] T352647: Move Cassandra clusters to PKI - https://phabricator.wikimedia.org/T352647
[13:18:05] Should I add a silence for the aqs1* certificate alerts until next week, or would you prefer that I don't?
[13:18:39] topranks: thanks, looking, though I don't see yet how it is related to us
[13:24:37] sukhe: I saw regexes of IP ranges and may have jumped to conclusions, yeah
[13:30:07] btullis: Ah thanks! Yeah please do.
[13:36:56] btullis: this doesn't seem to be related to the recent dc setup but modules/puppetmaster/spec/defines/puppetmaster_web_frontend_spec.rb is failing, even though the output seems to match
[13:37:58] I think it might be worth filing a task about this. jhathaway ^ if you are around, see if you can spot something? I
[13:38:04] have to step away for a bit
[14:31:18] effie: Not urgent but in case you missed it: the memcached exporter packaging needs a bit more work. The issue in https://phabricator.wikimedia.org/T350807#9722593 will show up more and more as packages get upgraded.
[14:31:34] If you think the package is correct and puppet is wrong I'm happy to write a puppet patch
[14:34:30] sukhe: sorry was in a meeting, looking
[14:37:16] andrewbogott: thank you for finding this, I will sort it out as soon as I can; the package is actually wrong, I should have renamed the binary accordingly
[14:37:46] I don't know why I didn't get a notification for this
[14:37:54] ok! I'll leave it to you then, thanks.
[14:39:10] btullis: CI looks odd indeed, I'll try to dig in and see what I can find
[14:41:49] jhathaway: thanks!
[14:46:17] jhathaway: Also thanks.
[18:19:08] a user has root on a production VM but not global root / no access to the puppetmaster and /srv/private. But they need to store private data somewhere (email addresses, names). We have 2 machines and want to sync that data between them. I am thinking just a manual "git init" once somewhere in /srv and then puppetize rsync of that.
[18:51:31] maybe object storage?
[18:51:40] but then they need pw for that
[18:58:16] nod... currently looking for a puppet abstraction like git::clone, but just for creating a repo locally. But I guess it's just an Exec of git init
[19:00:54] could make something that inits it, sets permissions for the right group (not always wikidev) and also syncs it to another host
[19:03:22] back in the old days it would have been an NFS mount :o
[19:14:42] we still have NFS to the dumps server, not sure if that would help in your case though
[19:18:30] heh! thanks. yea, I don't think I want to become that special case though
[19:24:37] a long long time ago when the bastion server was called "fenari", user homes were NFS mounted from nfs1.pmtpa.wmnet, the deployment server was also the bastion and when NFS went down we couldn't log in. Happy Friday
[19:26:59] º╲˚\╭ᴖ_ᴖ╮/˚╱º ＹＡＹ！
[19:50:31] pwstore working for anyone today? If so then I've no doubt forgotten how to use it again
[19:51:16] '.users file is not signed properly.' and when I check .users it tells me 'There is no indication that the signature belongs to the owner.'
[19:51:42] is your ~/.pws-trusted-users up-to-date with https://office.wikimedia.org/wiki/Pwstore#User_database
[19:52:08] Yep, I just checked it.
[19:53:34] does `pws update-keyring` help?
[19:54:22] nope, it won't run because '.users file is not signed properly.'
[19:54:53] I seem to recall this error
[19:56:22] wfm, trying to remember what had happened to me months ago...
[19:56:36] what happens if you move `.keyring` to some other file name?
[19:57:52] that does a bunch of things but then ultimately fails and gets me back in the state I was in before
[19:58:00] andrewbogott: did you do the step with `gpg --import jmm.key jhathaway.key volans.key slyngshede.key`
[19:58:02] ?
[19:58:09] cdanis: I did
[19:58:09] ^aha, I think that was it
[19:58:12] oh :(
[19:58:22] but also, this is in an existing setup that has worked for years so surely I had their keys already?
[19:59:25] ~/.pw-trusted-users?
[20:00:59] brett: I think we did that one already :)
[20:01:06] sorry :(
[20:01:07] Is it possible this is happening because /my/ key is expired?
[20:01:30] * andrewbogott googles how to check that
[20:02:09] andrewbogott: so what's the exact command you're running and the exact error you get?
[20:02:25] nope, my key isn't expired
[20:02:41] since I think 'There is no indication that the signature belongs to the owner.' just means that you've not explicitly marked that key as verified, not that the signature itself is invalid or so
[20:03:11] The first sign of trouble was this:
[20:03:15] https://www.irccloud.com/pastebin/vM6ZqRYs/
[20:04:49] `[GNUPG:] NO_PUBKEY ABA34714F5533665`
[20:05:12] does `gpg --keyid-format long --list-key ABA34714F5533665` show Jesse's key?
[20:05:57] yes
[20:06:22] Do I need to do gpg --keyserver --recv-keys in some form?
[20:07:27] what about `gpg --no-default-keyring --keyring ./.keyring --keyid-format long --list-key ABA34714F5533665` in the pws directory?
[20:07:44] Every time I want to complain about gpg I just remember that my tax dollars are paying to keep it difficult
[20:08:46] taavi: when you say 'the pws' dir do you mean where the pws binary is, or in our password repo? (which for some reason is checked out on my system as 'pw')
[20:08:51] the password repo
[20:09:32] https://www.irccloud.com/pastebin/jVJBRC0i/
[20:10:25] ok, try `gpg --no-default-keyring --keyring ./.keyring --import keys/jhathaway.key` and then `pws update-keyring`?
[20:11:36] hm, that produced a similar warning but I can use pws now
[20:12:23] So I think I'm fixed, thank you! Is that maneuver in the docs and I just skipped over it by mistake, or should I add docs?
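A minimal sketch, not part of pwstore, of the diagnosis above: pull the missing key IDs out of GnuPG's machine-readable status output (the `[GNUPG:] NO_PUBKEY ABA34714F5533665` line quoted above uses that format, which `gpg --status-fd` emits) and build the per-repo import command that fixed things here. The function names are hypothetical, and mapping a key ID to the right file under `keys/` is left out, since the log only shows `keys/jhathaway.key`.

```python
# GnuPG status lines start with this prefix, followed by a keyword and its arguments.
STATUS_PREFIX = "[GNUPG:] "


def missing_pubkeys(status_output: str) -> list[str]:
    """Return the key IDs gpg reported as NO_PUBKEY, in order, without duplicates."""
    missing: list[str] = []
    for line in status_output.splitlines():
        if not line.startswith(STATUS_PREFIX):
            continue  # skip ordinary human-readable gpg messages
        fields = line[len(STATUS_PREFIX):].split()
        if len(fields) >= 2 and fields[0] == "NO_PUBKEY" and fields[1] not in missing:
            missing.append(fields[1])
    return missing


def import_command(key_file: str, keyring: str = "./.keyring") -> list[str]:
    """Build the argv that imports one key into the password repo's own keyring.

    Mirrors the command used in the log; meant to be run from the repo checkout.
    """
    return ["gpg", "--no-default-keyring", "--keyring", keyring, "--import", key_file]


sample = "\n".join([
    "gpg: Signature made ...",  # hypothetical human-readable line, ignored
    "[GNUPG:] NEWSIG",
    "[GNUPG:] NO_PUBKEY ABA34714F5533665",
])
print(missing_pubkeys(sample))  # ['ABA34714F5533665']
print(" ".join(import_command("keys/jhathaway.key")))
# gpg --no-default-keyring --keyring ./.keyring --import keys/jhathaway.key
```

The argv list could be handed to `subprocess.run` from inside the password repo, followed by `pws update-keyring`, matching the order of steps that worked in the exchange above.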
[20:12:40] good question
[20:12:41] (the docs being https://office.wikimedia.org/wiki/Pwstore)
[20:15:24] so I think the failure mode is that you did not run anything that would have triggered a `pws update-keyring` (updating a file, basically) between Jesse's key being added and Jesse signing `.users` for the first time
[20:16:11] ideally `pws update-keyring` would import the keys specified in ~/.pws-trusted-users regardless of whether it can verify the signature for .users or not
[20:16:20] yeah, ideally :)
[20:17:50] but shouldn't the gpg --import jmm.key jhathaway.key volans.key slyngshede.key bit have taken care of this?
[20:19:51] that command wouldn't update the .keyring file in your pw repo though, right?
[20:20:01] sorry just read backscroll
[20:20:04] yeah, for some reason pws maintains that separate keyring file that was out-of-date here
[20:20:12] sorry for all the pings jhathaway :-)
[20:21:02] no problem at all, taavi
[20:21:47] and it makes it a hidden file, so you don't even know it's there ;)
[20:26:43] and of course wikitech-static has recovered on its own while I was looking for the rackspace password
[20:42:55] andrewbogott: for next time, from the alert hosts: SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -4 root@wikitech-static.wikimedia.org
[20:42:58] ;)
[20:43:59] unless you needed the rackspace management console
[20:44:01] ofc
[20:44:09] ssh wasn't responding so I needed mgmt to reboot
[20:44:23] but now it's back
[20:44:26] interesting: 20:44:18 up 95 days
[20:44:42] yeah, it was maybe under load
[20:44:45] OOM of mysql
[20:45:04] but maybe that was a long time ago, checking the time (it was from dmesg)
[20:45:43] nah, old thing
[20:45:49] maybe network?
[20:46:42] yeah, could've been network on my end or on wikitech-static's end. Something kept the daily sync from working (which I'm rerunning now)
[20:46:50] * volans shouldn't be looking at this time, I'm clearly tired :)
[20:47:45] andrewbogott: mmmh / is full
[20:47:55] well that would do it
[20:47:59] I'll do some cleanup
[20:48:11] thanks
[20:50:06] I've been wondering about a future for wikitech-static where we actually just bake a container image with all the bits and bobs in it and deploy that anywhere folks want.
[20:51:00] mediawiki + apache + an sqlite db + local media files I think would be the bits and bobs
[20:51:27] You mean instead of daily sync we'd rebuild the container?
[20:51:33] yeah
[20:52:01] and then pull it from wherever. there is some nice magic in podman for setting that up
[20:52:27] that's appealing