[07:24:38] puppet is stopped on prometheus1003, it's been stopped for a week, is that expected?
[07:24:41] godog: ^
[07:31:02] marostegui: yes expected, I'm decom'ing it
[07:31:26] cool, thanks!
[10:45:42] marostegui: you're fine with me merging "production.pp: Remove tendril grants (cc493cf7a0)" ?
[10:53:57] jayme: yes please!
[10:53:59] thanks
[11:05:14] hnowlan: what's the status of restbase2009? It's marked as "staged" in Netbox, has had Puppet disabled for 23 days with no reason given, and just now sent a cron-spam email because of an expired cert for debmonitor
[11:05:55] volans: it's awaiting a replacement - I will decommission it today
[11:06:31] ack, thx
[12:37:48] would some sre be willing to take a look at https://gerrit.wikimedia.org/r/c/operations/puppet/+/761009?
[12:37:59] and then also at https://gerrit.wikimedia.org/r/c/operations/puppet/+/761011
[12:44:53] <_joe_> zabe: I'll take a look later, after lunch :)
[12:45:10] thanks :)
[12:45:28] <_joe_> there's a few apache patches lingering around, I should merge a few of them
[13:27:57] jbond: I'm seeing pcc errors like '[ 2022-02-10T13:27:13 ] CRITICAL: Unexpected error running run_host on ntp-02.cloudinfra.eqiad.wmflabs: [Errno 13] Permission denied: '/var/lib/catalog-differ/puppet/yaml/cloudinfra/yaml/facts/ntp-02.cloudinfra.eqiad.wmflabs.yaml'', known/expected?
[13:30:19] ahh taavi sorry that's me doing some testing, one sec let me fix
[13:31:08] taavi if you re-run now it should work, sorry about that
[13:31:24] great, thanks
[13:32:40] indeed it works now
[13:32:45] great :)
[13:33:31] except in https://puppet-compiler.wmflabs.org/pcc-worker1001/33696/ the "no differences"/"differences" headings seem very wrong..
deployment-deploy03 is the only one with changes
[13:36:26] taavi: hmm that looks like a bug, one i bet is hard to reproduce :(
[13:37:04] can you raise a task for that
[13:38:01] and indeed the first two attempts i have tried don't reproduce ;( (https://puppet-compiler.wmflabs.org/pcc-worker1003/33699/ https://puppet-compiler.wmflabs.org/pcc-worker1001/33698/)
[13:39:11] :/ I don't think it's a big deal if it's very rare
[13:41:20] taavi: ack looks like it may be this T224977
[13:41:21] T224977: puppet-catalog-compiler: compilation result randomly places servers in the 'failed' section - https://phabricator.wikimedia.org/T224977
[13:41:52] it's at least similar enough to track on that
[14:36:46] I'm looking into adding probes for text/upload/ncredir in service::catalog, unlike all other services they can have multiple IPs and ipv6, and for "categorisation" purposes I'm wondering if we have sth in puppet that given an ip address will return which site and sphere it is in? essentially the "reverse" of slice_network_constants
[14:39:12] mmhh actually as I'm saying it perhaps it'd be best to get the networks I'm interested in and match the addresses as I iterate over services
[19:31:26] is anyone around to help with some cert stuff on deployment-puppetmaster04? TLDR is the wrong private key is 'live' in /var/lib/puppet , but the correct one is in the public/private repo. How can I safely get the correct key in place?
[19:31:52] ticket for context https://phabricator.wikimedia.org/T299797
[19:37:29] (when y'all aren't busy with the known incident - when would/should https://www.wikimediastatus.net update?)
[19:40:25] tn: the graphs update automatically, we'll post a status update shortly
[19:40:40] was just curious ^^ <3 good luck!
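(Editor's note on the 14:39:12 idea of fetching the networks of interest and matching each service address against them.) A minimal sketch of that lookup, using only Python's standard `ipaddress` module. The `NETWORKS` table, the site names `site-a`/`site-b`, and the function names are hypothetical placeholders; the prefixes are documentation ranges, not real production networks, and this is not how the Puppet code does it.

```python
import ipaddress

# Hypothetical (site, sphere) -> prefixes table. The prefixes are
# documentation ranges (RFC 5737 / RFC 3849), used here as placeholders only.
NETWORKS = {
    ("site-a", "public"): ["192.0.2.0/24", "2001:db8:a::/48"],
    ("site-b", "public"): ["198.51.100.0/24", "2001:db8:b::/48"],
    ("site-a", "private"): ["203.0.113.0/24"],
}


def parse_networks(table):
    """Flatten the table into a list of (network, site, sphere) tuples."""
    return [
        (ipaddress.ip_network(cidr), site, sphere)
        for (site, sphere), cidrs in table.items()
        for cidr in cidrs
    ]


def locate(address, networks):
    """Return (site, sphere) for an IPv4 or IPv6 address, or None if unmatched."""
    ip = ipaddress.ip_address(address)
    matches = [
        (net, site, sphere)
        for net, site, sphere in networks
        if ip.version == net.version and ip in net
    ]
    if not matches:
        return None
    # Prefer the most specific (longest) prefix if several networks match.
    _, site, sphere = max(matches, key=lambda m: m[0].prefixlen)
    return site, sphere


if __name__ == "__main__":
    nets = parse_networks(NETWORKS)
    for addr in ["192.0.2.10", "2001:db8:b::1", "10.0.0.1"]:
        print(addr, locate(addr, nets))
```

Iterating over services and calling `locate` on each of their addresses gives the "reverse" mapping without needing a dedicated Puppet function; a service with several IPs simply yields several (site, sphere) pairs.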
[19:40:57] hi tn
[19:41:43] heya mafk
[20:47:02] tn: we're going to do some iterating on how we use the status page, but, for now we're only posting manual updates about an ongoing outage if it has been ongoing for at least 15 minutes and we expect it to last at least another 15 or so. I did (just now) retroactively add an incident to the page
[20:50:53] cdanis: ah I see :-) I wasn't aware until someone pointed out a moment ago that the status page is not entirely public yet
[20:51:36] I saw it mentioned on ops-l a while back - >15mins sounds like a good standard though
[20:51:49] tn: it's public but not yet publicized, if that distinction makes sense :)
[20:52:02] we plan to make it publicized Soon(tm)
[20:53:01] ^^
[21:13:58] anyone know where the $::site variable comes from in our puppet config, my grepping is failing me
[21:15:32] jhathaway: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/manifests/realm.pp#7
[21:15:41] taavi: thanks!
[21:15:58] <_joe_> that file should be renamed "globals.pp"
[21:17:04] should have upped my grep game, easy to find in retrospect :)
[21:17:35] if only we had symbol search for puppet ;)
[21:18:28] I have tried puppet's language server, but it doesn't seem to work too well out of the box
[21:24:51] Can any wizards help me with the correct syntax for the `Hosts:` line in my commit message?
On cumin I can run `sudo -E cumin 'R:icinga::monitor::elasticsearch::cirrus_settings_check' 'foo'` to select the nodes i'm looking for
[21:25:09] but not sure what the correct syntax is to wrangle `R:icinga::monitor::elasticsearch::cirrus_settings_check` into the syntax that the `Hosts:` line wants
[21:25:32] https://wikitech.wikimedia.org/wiki/Help:Puppet-compiler#Host_variable_override mentions that
[21:25:44] > you can use cumin: followed by a cumin query using the puppetdb grammar
[21:26:11] but i'm still a bit confused
[21:33:23] I tried literally doing `Hosts: cumin:R:icinga::monitor::elasticsearch::cirrus_settings_check`, not expecting it to work, and can confirm it didn't work :D (or at least I assume so, zuul has been churning for 7 minutes now)
[21:35:21] ryankemper: R doesn't seem to be supported?
[21:35:30] only P, C, O?
[21:36:13] mafk: yeah only P, C, and O are supported as simplified cumin syntax, but in the next bullet it mentions a bit about `cumin puppetdb backend expressions`
[21:36:23] unfortunately I don't know what puppetdb backend expressions actually are, lol
[21:36:45] but it wouldn't surprise me if the R can't be done at all, so your hunch is probably right
[21:37:33] ah if `puppetdb backend expressions` means the same thing as `puppetdb host selection` then it indeed wouldn't be possible: https://wikitech.wikimedia.org/wiki/Cumin#PuppetDB_host_selection not sure if it's just different wording or actually a different concept
[21:39:06] cumin:P:elastic ?
[21:39:30] I don't really speak Puppet
[21:40:22] ryankemper: or don't worry about the "Hosts:" line in the commit message at all and instead go do the puppet compiler web form yourself and instead of host names enter a class name, with C:foo and it should compile it on one host of each group using this class
[21:40:47] or re:elastic.*wmnet
[21:41:38] mutante: I have a psychological attachment to the `Hosts` directive because I like that it implicitly documents the expected set of hosts (either explicitly or implicitly) that will be changed by the patch
[21:42:06] But yeah in this case I was trying to directly match on stuff that included the `Define icinga::monitor::elasticsearch::cirrus_settings_check` but I could prob go up a level and match on a class or profile and be able to use that in the `Hosts` line instead
[21:42:24] ryankemper: just keep in mind it's possible the "check experimental" claims there is no change when there actually is
[21:42:41] mutante: ah, why might it do that?
[21:44:08] I don't really know why but it happened to me before and then I went back to using the compiler directly.
[21:47:50] sorry, I know that's not very satisfying to get a vague warning but no explanation but maybe double check the compiler output link if it's a critical change
[21:48:19] if it's of the type where you want the compiler to "prove nothing changes"
[21:54:43] I didn't know the Hosts: shortcuts, very useful to me from now on
[21:55:30] e.g. https://puppet-compiler.wmflabs.org/pcc-worker1001/1190/
[21:57:05] not sure why it didn't pick the mw maintenance host in codfw - I assume not operative so nothing to check
[22:00:12] mafk: if it realizes the same class is applied on multiple hosts. e.g.
"node 'mwmaint1002.eqiad.wmnet', 'mwmaint2002.codfw.wmnet' { [22:00:21] then it picks just one of them as an example [22:00:24] to be faster [22:00:31] ah [22:00:35] if you gave it a class name that is [22:00:57] I usually check both [22:01:12] so I guess re:mwmaint.*wmnet would do the trick next time [22:01:30] With O: it checked labs too which isn't bad either [22:01:38] if it had actually compiled on both but with no change then it should show up in a "no differences" section [22:02:13] the change we deployed two weeks ago showed slightly different results for eqiad and codfw iirc [22:02:25] in the beginning it did not have this and that meant more compiling on "*" which takes a long time and quickly makes the puppet compiler instances run out of disk [22:03:23] yeah, no need to check on all unless really required [22:03:54] A:mw-maintenance [22:04:24] that is the cumin alias and it is defined as: mw-maintenance: P{O:mediawiki::maintenance} [22:04:41] that P there means it's one of those puppetdb ones [22:05:31] all hosts using role::mediawiki::maintenance [22:06:16] you can also define your own alias in modules/profile/templates/cumin/aliases.yaml.erb