[08:19:12] Hello, I just saw the email about holding MW deployments, but I would need to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/817195 for https://phabricator.wikimedia.org/T313382
[08:19:17] So I am wondering if I can proceed or not
[08:21:02] marostegui: the current problem is with scap that doesn't restart php-fpm on the canaries
[08:21:12] so I think you're totally fine
[08:21:44] Roger, thank you volans
[08:22:00] Also you are oncall, so I can proceed blindly
[08:22:09] lol
[08:22:51] in volans we trust!
[08:22:58] (blindly)
[08:42:28] <_joe_> marostegui: I'm working on it
[08:42:42] _joe_: <3
[08:42:51] <_joe_> volans: we *need* to restart php manually on the canaries if we do deployments now
[08:46:00] _joe_: yes, I'm aware, manuel's patch is not touching anything scap related, hence my good to go :)
[08:46:24] volans: But I do deploy with scap sync-file
[08:46:43] marostegui: the above patch is in the puppet repo... wrong link?
[08:46:52] volans: oh yes :(
[08:46:55] ahhhh
[08:47:00] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/817194
[08:47:03] sorry :(
[08:47:05] I deployed that already
[08:47:37] ok, then we need to restart the php-fpm on the canaries manually, but I'm sure joe has the cumin command handy :D
[08:50:37] marostegui: I *think* you can use restart-php-fpm-all as command and A:mw-canary as cumin alias
[08:50:44] _joe_: is the above correct? ^^^
[08:51:07] * marostegui waits
[08:53:26] I'm not sure if you need to batch them or not
[09:10:12] <_joe_> volans: yes
[09:10:27] <_joe_> volans: batching you can do or... you can trust poolcounter
[09:10:48] <_joe_> but if you wait ~ 15 minutes, jnuche and I got to the bottom of what is not working right now
[09:10:57] yeah that's why I wasn't sure, just the canaries and they have poolcounter anyway :)
[09:10:59] <_joe_> and I have a puppet fix
[09:11:31] good, no prob for me.
Just be aware that marostegui already merged the patch (because of the confusion with the wrong link above)
[09:11:37] <_joe_> ok
[09:11:49] <_joe_> marostegui: did you restart php on the canaries?
[09:11:53] <_joe_> i assume not
[09:12:03] I think he was waiting for your confirmation on the above commands
[09:12:14] <_joe_> volans: go on please :)
[09:12:20] ok, doing
[09:13:27] {done}
[09:16:49] _joe_: Nope, I was waiting indeed
[09:16:55] Thank you both
[09:21:04] anytime :)
[09:27:47] If I want to mention two bugs in a CR, should I use >1 Bug: pseudo-header, or Bug:Txxx,TYYY ?
[09:28:15] multiple Bug: lines
[09:28:23] Emperor: I normally put Bug: TTT and below Bug: TXXXX etc
[09:28:44] thanks :) I couldn't manage the requisite google-fu to find the answer
[09:54:12] jbond: \o I think ssh_known_hosts generation is currently broken on HEAD. At least for _some_ hosts
[09:54:23] And seeing as you made changes there yesterday... :)
[09:54:51] https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/36380/console is a run as of HEAD a few minutes ago, against ores1001
[09:56:57] https://gerrit.wikimedia.org/r/c/operations/puppet/+/816772 was merged yesterday, possibly related
[09:57:00] jbond --^
[09:57:33] sorry, in a meeting now but will take a look in a sec
[09:57:38] TIA
[09:58:11] elukey: that's a revert, but yes, the last commit on the file mentioned in the error. Though I wonder how a revert would break stuff.
[10:00:10] Maybe https://gerrit.wikimedia.org/r/c/operations/puppet/+/816724 needs reverting as well. I figure it's possible that the (now deleted) `select` could handle Nil as an input, and created something `join` was then fine with
[10:00:23] klausman: the parameters variable is probably missing when evaluating the erb file, https://gerrit.wikimedia.org/r/c/operations/puppet/+/816772/2/modules/profile/manifests/ssh/client.pp
[10:01:24] You're right.
That `ensure` logic is probably the linchpin
[10:03:33] not super urgent, let's wait for John's feedback
[10:03:35] :)
[10:03:43] sorry, here, looking now. i think i already have a fix for this
[10:04:17] klausman: just to confirm i think this is only affecting pcc right?
[10:04:24] or did you see the issue on a server as well
[10:05:14] Correct, I've only seen it on pcc
[10:05:19] ack thanks
[10:19:08] elukey: klausman: jelto: i think pcc should be fixed now (https://gerrit.wikimedia.org/r/c/operations/puppet/+/816850) let me know if you still see issues
[10:19:33] thx! My pcc checks a minute or two ago went as expected (showing _my_ errors :))
[10:22:13] great :)
[10:22:56] jbond: pcc looks fine now, thanks a lot :)
[10:23:11] cool :)
[12:24:41] jbond: thanks!
[13:57:57] Rook: happy WMFversary ;P
[13:58:12] :) thank you!
[14:45:00] Krinkle: This is going to be an interesting one: https://phabricator.wikimedia.org/T313811
[14:48:45] <_joe_> marostegui: it's hosting stuff that was in redis before?
[14:49:15] _joe_: it is hosting all the mainstash stuff
[14:49:21] <_joe_> then I can answer your question myself - it can go read-only, we did so during the dc switchover
[14:49:27] I guess we can put codfw in RO
[14:49:29] And not eqiad
[14:49:40] <_joe_> you can also evict data randomly, no one will notice
[14:49:45] I just tested dbctl and it is accepted as an option
[14:49:49] so I guess we should be fine
[14:49:49] <_joe_> or lose a full shard
[14:50:04] <_joe_> so 1/18th of the data
[14:50:06] <_joe_> it's also ok
[14:50:28] <_joe_> or if your replica to codfw is completely broken and replicates random statements but not all
[14:50:30] <_joe_> also ok
[16:26:53] moritzm: I think we've set up everything we need to for dhinus but the sso dialog (e.g. for icinga.wikimedia.org) is still turning him away. Can you help us understand what piece is missing?
[16:27:02] username FNegri
[16:34:15] andrewbogott: try https://idp.wikimedia.org/logout and then logging back in?
[16:34:57] cdanis: that worked, thanks!
[16:35:07] \o/
[16:35:17] I think group memberships are only fetched when you auth, or something
[16:37:48] optionally let him try to run a command in Icinga. often they will be able to login but are missing those privs
[16:38:48] hey mutante I was just about to ping you :) There's a thing on the onboarding list about adding an email address to exim, do you know is that in the mediawiki.org file? Or something else?
[16:39:02] + also what is that for?
[16:39:29] andrewbogott: do you have the full text / what does it say?
[16:39:46] probably means to add yourself to root@ and other aliases
[16:40:08] " add to root@ alias in exim (private.git make sure to use your email username, not shell) can access cloudcontrol1003.wikimedia.org"
[16:40:25] ah, yea. so this is in puppetmaster1001:/srv/private/modules/privateexim/files/wikimedia.org
[16:40:28] in the private repo
[16:40:37] can be used to test committing in private repo
[16:40:44] great, that's where we were :)
[16:41:01] you can add yourself to root@ or optionally other things
[16:41:07] dns-admin@ peering@ or whatnot
[16:41:38] got it, thanks mutante !
[16:41:48] if you want you can ssh to mx1001.wikimedia.org and run puppet there after a commit
[16:41:55] to see the actual exim change
[16:42:05] cool
[16:55:16] thanks, exim change done, I also ran puppet on mx1001
[16:55:47] while I'm here, could anyone please approve my subscription requests for ops@lists.wikimedia.org and ops-private@ ?
[17:17:25] dhinus: the "Administrator" account password is in pwstore if you want to approve it yourself :)
[17:28:18] legoktm: thanks, will do that tomorrow!
[17:34:38] do the -owner special addresses still exist? you can ping the list admins at ops-owner@lists and ops-private-owner@lists if they do
[17:36:24] yes, they do
[17:40:31] dhinus: those addresses specifically go to the specific list owners
[17:47:21] thanks mutante! I'll send an email then :)