[02:42:13] (Bit old but might be of interest for those that haven't read it before): Etsy's work around their PHP and k8s services https://codeascraft.com/2021/06/15/improving-the-deployment-experience-of-a-ten-year-old-application/
[12:47:46] <_joe_> has anyone ever tried to run rsyslog as non-root here?
[15:16:34] _joe_: for when you have time https://gerrit.wikimedia.org/r/724990
[15:16:50] _joe_: I think I tried and mostly failed
[15:16:59] (missing permissions)
[15:24:43] <_joe_> Amir1: yea sorry in a meeting
[16:18:19] hello everyone, I'm not sure if this is the best place to ask about mailman, but I'll try :-). Is it possible to override the "hold for moderation" notice sent when a mail is sent to a moderation-enabled list?
[16:27:28] urbanecm: for a specific, approved user you mean?
[16:28:33] legoktm: i mean, WMCZ will now have an announcement-list, which will be moderated for all users. I want users who will send a message there to be notified that their mail is held for moderation, but I want to customize that message
[16:29:31] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/mailman-templates/+/refs/heads/master/en/list%3Auser%3Anotice%3Ahold.txt
[16:29:43] that's the template you want to override
[16:30:04] so it is in "templates", where i override list:user:notice:hold?
[16:30:05] thanks
[16:30:10] yep :)
[16:49:22] <_joe_> Amir1: sorry I forgot to merge your patch
[16:49:38] all good. It's not urgent
[16:50:36] <_joe_> just done
[16:51:08] Awesome, in half an hour-ish I will make the patch to remove the absented ones
[16:55:17] <_joe_> it's awesomer that absenting the systemd timer seems to do the right thing
[17:13:43] _joe_: should I expect to find the working dir for every pcc test in /srv/jenkins-workspace/puppet-compiler? I have a test where I can see the results but not e.g. the git checkout it used for the run... but maybe those are only created sometimes, or cleaned up most of the time, or...?
[17:16:28] <_joe_> andrewbogott: yes you should find all the runs that happened on that node IIRC
[17:16:55] ok -- looks like they get cleaned up since I can see them while the run is in progress but generally not after. That's fine, though, I got a peek at the file I wanted to see
[17:17:09] (and the hiera value is there, which brings me back to the original mystery of why pcc can't find it)
[17:17:43] If you're curious, the new key was added https://gerrit.wikimedia.org/r/c/labs/private/+/724831/3/hieradata/common.yaml and here's a run where that key isn't found: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31389/console
[17:18:08] I assume we're misunderstanding where/when common.yaml is loaded, except I see /other/ keys in common.yaml loading *shrug*
[17:32:46] <_joe_> common.yaml gets loaded from the private repo the same way it's loaded from the public one - it means that a namespaced key will be searched in a specific file
[17:33:21] <_joe_> so foo::bar::baz => hieradata/common/foo/bar.yaml
[18:02:37] hello, i'm running the cookbook sre.hosts.decommission
[18:02:52] it is on the generate-dns-snippets step
[18:02:59] it is showing me a diff, and it mostly looks good
[18:03:00] except
[18:03:19] it is adding a record for +cloudinstances2b-gw.openstack.eqiad1
[18:03:27] which looks like it conflicts with another record of the same name
[18:03:37] +cloudinstances2b-gw.openstack.eqiad1 1H IN A 185.15.56.238
[18:03:37] cloudinstances2b-gw.openstack.eqiad1 1H IN A 185.15.56.244
[18:04:02] Might be related to https://gerrit.wikimedia.org/r/c/operations/dns/+/684864
[18:04:06] but, that was back in april
[18:04:10] arturo: ^ ?
[18:04:33] I'm not sure if i should proceed with this cookbook step
[18:05:42] volans|off: (oh off..)
[18:06:35] if it was just the addition of .238 i'd proceed, but the diff doesn't look like it is removing .244
[18:06:37] so i'm not so sure
[18:06:55] elukey: you have a bit more experience with these cookbooks I think, any advice?
[18:07:20] Full paste here:
[18:07:24] https://www.irccloud.com/pastebin/nmb4KwZS/
[18:12:29] that seems related to T292097?
[18:12:29] T292097: Netbox info missing on some WMCS elements - https://phabricator.wikimedia.org/T292097
[18:13:50] ah great thank you majavah
[18:14:40] topranks: yt?
[18:14:58] yeah hey
[18:15:02] see ^^
[18:15:10] (hello! :) )
[18:16:30] hmm... apologies for this, certainly wasn't expecting it to cause this.
[18:17:11] That name was set on the reverse entry for the IP already, so I assumed it made sense to document it in netbox the same:
[18:17:17] cathal@officepc:~$ dig +short -x 185.15.56.238
[18:17:17] cloudinstances2b-gw.openstack.eqiad1.wikimediacloud.org.
[18:17:25] yeah it makes sense, just not sure what to do with the .244
[18:18:26] I will back out my change from earlier I think, which should let you proceed for now. I can discuss with arturo why .244 is set up with the same name.
[18:18:54] ok thank you
[18:18:57] Probably some simple mix-up, but my fix earlier is not important, rolling back is probably simplest
[18:19:04] yeah
[18:19:20] so what do I do with this cookbook step then....hmmmm
[18:19:30] i'd like to just rerun this step, but my only options are go or abort
[18:19:45] Ok so that entry is gone in Netbox now, which should mean there won't be a conflict.
[18:20:40] yeah but...if i hit go...will it do the right thing?
[18:20:50] i guess so, the diff it is showing me is probably just the current diff.
[18:20:58] i'd expect it to do the right thing....
[18:21:01] hmmm ok. re-running makes sense, avoiding the conflict, but I suspect if you hit 'go' it'll add what it's built for the diff and is in memory (from before my revert)
[18:21:08] oh you think?
[18:21:26] it's in the middle of the cookbook and has already taken some decom actions
[18:21:43] maybe the cookbook can figure it out if i abort and rerun?
[18:22:24] ok then, i'll abort and just try
[18:22:28] My gut feeling would be to 'go', worst case we can probably re-run
[18:22:30] oh
[18:22:31] oh
[18:22:32] haha
[18:22:34] my gut is also to go
[18:22:34] (to fix duplicate entry)
[18:22:41] oh i see
[18:22:42] ok
[18:22:50] alright...'go' then!
[18:22:51] going
[18:23:02] go for it :D
[18:23:12] it did say
[18:23:13] wikimediacloud.org-eqiad | 1 +
[18:23:14] so
[18:23:22] i think it did add .238 without removing .244
[18:23:50] maybe just an authdns update on an ns server would do
[18:23:51] ?
[18:24:02] ok, well hopefully that won't cause any immediate issue. we can do an authdns update to remove it I think yeah.
[18:24:19] let me see if I can do that.
[18:24:31] oh i just started running one on ns0 to see if it would diff
[18:25:39] ok cool I'll leave it
[18:26:26] didn't see a diff
[18:26:44] it's a cookbook you need to run
[18:26:50] oh
[18:26:59] topranks: maybe you can run that then
[18:27:06] sre.dns.netbox or sre.netbox.dns, can't remember which
[18:27:19] Yeah all the NS are returning the .238 address now when I check manually.
[18:27:27] let me have a look I'm not sure which myself.
[18:34:09] ok thank you
[18:35:37] Ok the cookbook ran and diff looked good - removing the newly added entry, but NS boxes are still returning .238.
[18:35:51] I'll run authdns-update again see if that will force it.
[18:41:11] ottomata: you're good anyway? the hosts decommission cookbook completed ok otherwise yeah?
[18:41:20] yup looks fine for my stuff
[18:45:19] Cool, this as far as I can tell is also ok. The .238 entry created by netbox is gone on the authdns servers, but there was a manual entry for that hostname there pointing to it all along:
[18:45:21] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/dns/+/refs/heads/master/templates/wikimediacloud.org
[18:45:31] anyway no harm done I think.
[18:48:33] ya, looks good
[18:48:38] thank you for your help!
[18:54:53] np... sorry for the confusion!
[20:28:04] anyone that understands puppet better than me know why https://puppet-compiler.wmflabs.org/compiler1002/31422/wdqs2002.codfw.wmnet/prod.wdqs2002.codfw.wmnet.err is failing with ` Error: Function lookup() did not find a value for the name 'profile::query_service::oauth_settings' (file: /srv/jenkins-workspace/puppet-compiler/31422/production/src/modules/profile/manifests/query_service/gui.pp, line: 1) on node wdqs2002.codfw.wmnet`?
[20:28:21] In the patch in question I set `default_value` to try to avoid this lookup problem, but I still have that error: https://gerrit.wikimedia.org/r/c/operations/puppet/+/725104/5/modules/profile/manifests/query_service/gui.pp
[20:28:44] Shouldn't setting `default_value` make puppet happy even if the lookup can't find a value by that name?
[20:40:11] ryankemper: it fails to find a value before the change and does find one after the change, the error is only in "production" so before
[20:40:25] and the "check experimental" fails for some other reason
[20:40:37] it looks fine here: https://puppet-compiler.wmflabs.org/compiler1001/31423/
[20:40:42] mutante: oh! that makes sense, thanks
[20:41:00] https://puppet-compiler.wmflabs.org/compiler1001/31423/wdqs1012.eqiad.wmnet/index.html
[20:41:11] production catalog is empty because it was "pre-broken"
[20:41:19] and the new change is the fix
[20:42:11] it's the bracket part of "Hosts that have no differences (or compile correctly only with the change)"
[20:42:55] does have a difference, but one you want. why you can't always trust the "check experimental" method vs "manually" running compiler is another story I think
[20:51:51] Yeah presumably the check experimental is just zuul / integration.wikimedia.org being finicky w/ read timeouts
[20:52:03] which I was also getting w/ local PCC runs, but not on the run I posted above
[20:52:28] I have had a case where "check experimental" told me it was success but if you click through to the results it was failed
[20:53:06] and since then I usually use the compiler form directly
[20:58:15] Oh yeah I've had many of those cases, I always click into it for that reason
[21:10:16] Kind of, maybe, silly question, but can we schedule for https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210930T2300 non-training patches?
[21:10:32] It is not clear to me whether "normal" stuff can go there
[21:14:55] if it would fit in the Morning backport window I guess it also fits here. both say "backport". one just says "and config training". shrug
[21:15:22] if in doubt ask Tyler though
[21:15:29] And it's linked to the normal docs
[21:15:44] thanks mutante
[21:15:49] nice to see you btw
[21:15:54] yw, likewise
[21:16:56] hauskatze: the thread "[Wikitech-l] How we deploy code" might be a good place to clarify
[21:17:02] since it's going on right now
[21:17:25] * hauskatze opens another browser tab
[21:17:40] https://wikitech.wikimedia.org/wiki/Deployments/Train_vs_backport
[21:50:08] I have removed my patches from the window as I won't be able to stay awake
[21:50:27] I'll try to get them deployed next week
[22:17:35] mutante: btw when it comes to running pcc locally, do you know how to combine multiple roles? `'O:wcqs::public or O:wdqs::public or O:wdqs::internal'` doesn't seem to do the trick
[22:18:04] with `check experimental` I can just have a separate `Hosts` line for each role, but not sure what the equivalent pcc cli command is
[22:23:41] ryankemper: see https://wikitech.wikimedia.org/wiki/Help:Puppet-compiler#Host_variable_override for the syntax -- the simplified O: syntax doesn't support "or" the way cumin does
[22:24:16] you could say `cumin:O:wcqs::public or O:wdqs::public or O:wdqs::internal` but then I think it would run on all 23 hosts instead of picking a representative subset
[22:24:51] Ah I see
[22:25:39] it does accept comma-separated hostnames, so your best bet might to just be to pick a few hosts yourself and list em out
[22:25:53] s/might to/might/
[22:26:34] but there might be a better trick I don't know
[23:28:03] ryankemper: I did not have a better approach, would just pick a host manually from each role.. OR .. I would compile on "C:classname" right away. I am not even running it locally
[23:28:35] but maybe just the C: option, it picks a representative subset
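
Note on the hiera lookup rule from 17:32-17:33: _joe_ described a namespaced key as being searched in a file derived from the key's namespace (foo::bar::baz => hieradata/common/foo/bar.yaml). A minimal Python sketch of just that one mapping follows. The function name and the `base` default are made up for illustration, and the real hiera hierarchy likely has more levels than this single rule, so treat it as a reading aid and not as how puppet or pcc actually resolves keys.

```python
from typing import Optional


def namespaced_key_to_file(key: str, base: str = "hieradata/common") -> Optional[str]:
    """Map a namespaced hiera key to the per-namespace file it would be searched in.

    Hypothetical helper: it only encodes the rule quoted in the log,
    foo::bar::baz => hieradata/common/foo/bar.yaml.
    Returns None for keys without a namespace (no "::"), since the log
    does not say where those are looked up.
    """
    parts = key.split("::")
    if len(parts) < 2:
        return None
    # Drop the last component (the variable name); the rest is the path.
    namespace = parts[:-1]
    return "/".join([base, *namespace]) + ".yaml"


if __name__ == "__main__":
    print(namespaced_key_to_file("foo::bar::baz"))
    print(namespaced_key_to_file("profile::query_service::oauth_settings"))
```

If that rule is the whole story, a key such as profile::query_service::oauth_settings placed only in a top-level common.yaml would not be where the lookup searches, which may be what andrewbogott ran into with the new key in labs/private.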