[08:44:18] volans: debugging T335835 we've noticed that the varnish-restart cookbook is triggering a puppet failure due to unmet service dependencies
[08:44:50] * volans reading
[08:45:12] vgutierrez: is there any detail on the specific failure? the task seems generic
[08:45:19] aka we stop varnish manually, that triggers a stop for some other services like varnishmtail@default
[08:45:41] then the puppet run will attempt to start varnishmtail@default, but that fails cause varnish is still stopped
[08:45:47] also are you talking about the roll-restart-varnish or the run-puppet-restart-varnish one?
[08:46:12] vgutierrez: that seems a puppet issue
[08:46:28] run-puppet-restart-varnish
[08:46:46] yeah, or "design" issue in our cookbook
[08:46:47] as puppet doesn't manage varnish daemon but tries to start the varnishmtail without checking if varnish is actually running
[08:46:55] either that or at systemd unit level
[08:47:49] not sure what you can change in the cookbook as you can't start varnish before the puppet run, it would not work for your migration
[08:48:19] <_joe_> I concur with volans
[08:48:45] <_joe_> you need varnishmtail to require varnish in systemd
[08:49:03] <_joe_> or, alternatively, to not manage varnishmtail in puppet
[08:49:50] hmmm BindsTo already implies Requires=
[08:50:16] at least from systemd documentation
[08:50:19] "Configures requirement dependencies, very similar in style to Requires=.
However, this dependency type is stronger: in addition to the effect of Requires= it declares that if the unit bound to is stopped, this unit will be stopped too"
[08:59:12] <_joe_> it's different
[08:59:27] <_joe_> requires would make starting mtail also start varnish AIUI
[08:59:34] <_joe_> while bindsto make it dependent
[08:59:45] <_joe_> which is ok, but then don't manage the service from puppet
[09:00:00] <_joe_> as all those units basically are tied to the destiny of varnish
[09:00:07] <_joe_> which isn't managed by puppet
[10:02:04] _joe_: are you joining the meeting?
[10:02:29] <_joe_> duesen: sigh for some reason I got no notification :/
[10:02:32] <_joe_> sorry, joining
[11:16:33] _joe_: hm, I am unsure I added the rate limit change to the right part of the deployment schedule. Where would it go, correctly.
[11:19:14] <_joe_> klausman: it was in the right place :)
[11:19:20] darn!
[11:19:27] <_joe_> I was confused as it disappeared
[11:19:42] It says "6 patches max" and mine was the 7th
[11:20:15] So I thought it'd better be removed anyway
[11:22:14] <_joe_> klausman: well add it, I'm not sure I'll be able to merge my patches, they're still waiting review
[11:22:53] ah, ack. added it back
[11:23:07] (yaay for second-guessing yourself :))
[12:09:54] I think you need both (requires + bindsto)
[12:10:35] err sorry, no, it's After+BindsTo
[12:11:17] (is the magic combo, in the unit section, to say that one service /really/ needs the other, and hook up all the right systemd-level automation to force sequencing of start/stop/etc)
[12:12:11] but: if you do that, you should mirror it in puppet with a dependency between the service units at the puppet level (so that puppet acts on the lower service before it acts on the upper service)
[12:12:41] unless puppet5 and ordering in the profile fixes that, which might be true
[12:13:39] otherwise puppet might e.g.
have a pending config change that affects the varnish service, but start varnishmtail before it does any of that, which triggers systemd to start varnish, then after puppet configures+restarts varnish.
[12:37:31] klausman: your patch has actually been deployed by kamila_ just now
[12:37:37] we didn't wait for the backport window
[12:37:42] klausman: just deployed it
[12:37:47] ^ that :D
[12:38:01] you said it was somewhat urgent, so I wanted to get it deployed asap
[12:39:01] sorry for the confusion
[12:39:23] neat!
[12:39:40] thanks a ton, I will do some testing in a moment or five
[12:39:50] cool, hope it works :D
[12:42:12] It does!
[12:42:23] \o/
[12:42:33] thanks again :)
[12:43:28] * kamila_ survived their first scap deployment \o/
[12:43:39] thank you claime for help and moral support :D
[12:44:19] kamila_: happy to help <3
[12:48:54] <3
[13:50:39] bblack: fyi dependencies via catalog order is best effort. for complex and critical dependencies it's best to be explicit
[13:53:46] jbond: yeah I've been trying to wrap my head around that.
[13:54:05] my current understanding is something like:
[13:54:39] By default, things in a class are applied in evalutation order (the order they appear in the file linearly)
[13:55:02] but: if there's other dependencies involved, they can override that and swap things around
[13:55:13] *evaluation
[13:55:16] is that roughly right?
[13:55:20] yes that's about right
[15:47:45] jbond: hi! pcc question
[15:47:59] if I do say Hosts: P:lvs, it just picks one such node
[15:48:09] how can I get it to pick all nodes matching P:lvs without passing them explicitly?
[15:48:55] sukhe: IIRC PCC does pick one host per role by design
[15:49:23] I can check if there is a way to force it but I'm not sure
[15:49:33] volans: yeah I think it is by design and expected
[15:49:47] but I was curious if I can have it do all, sometimes it's kinda important for sanity, such as for the lvs ones
[15:49:52] usually I do 2-3 randomized and that's fine
[15:51:31] sukhe: try sudo cumin 'P:LVS' 'date' and then copy/paste the resulting list into compiler form
[15:51:36] lvs[2010-2013].codfw.wmnet,lvs[6001-6003].drmrs.wmnet,lvs[1017-1020].eqiad.wmnet,lvs[5004-5006].eqsin.wmnet,lvs[3005-3007].esams.wmnet,lvs[4008-4010].ulsfo.wmnet
[15:51:40] I *think* that if you pass to it the output of cumin 'P:lvs'
[15:51:50] it does what you want, but you were looking for a way to avoid that
[15:51:59] mutante: yep, that's what I have been doing
[15:52:02] mutante: if you just want the host selection no need to pass any command
[15:52:04] getting the list through cumin and outputting that
[15:52:07] cumin will just print the hosts and exit
[15:52:32] volans: ah, ACK!
[15:52:41] what I want is something like: Hosts: P:lvs* or something
[15:52:53] and it's fine if it doesn't exist but I was curious if it does
[15:55:03] sukhe you can use the cumin syntax e.g. `Hosts: cumin:P:lvs`
[15:55:14] oh really!
[15:55:33] you can also use `Hosts: re:lvs.*`
[15:55:33] so basically prefixing cumin?
[15:55:37] yes
[15:55:51] TIL, nice
[15:56:04] doh, I forgot about that extension
[15:56:14] jbond: thanks!
[15:56:33] so.. P:lvs = 1 random host cumin:P:lvs = all hosts, so it's like "P:lvs --all"
[15:57:31] As volans says it's best to rely on the `Hosts P:lvs` method as it tries to do some sane filtering. but yes sometimes you do want to test more things. in those cases I generally would use `P:lvs` until things are pretty healthy then do the full set of hosts as a sanity check at the end
[15:57:32] my random feature request: can cumin's host selector have a regex option?
:)
[15:58:15] re:foo or whatever
[15:58:37] bblack: for which selection?
[15:58:38] mutante: `P:lvs` gets all hosts that have the lvs profile and then tries to filter them for uniqueness which is a bit fuzzy but it should work out to one random host from every role that uses profile::lvs
[15:59:30] `Hosts: auto` does the same logic for every resource in the change
[15:59:44] thanks jbond
[16:00:06] so for instance if you make a change to e.g. ferm::service and use Hosts: auto, it should end up testing one host from every role
[16:00:57] volans: for cumin CLI host selection
[16:02:22] it doesn't support full regex but has globbing, also you can use full puppetdb regex against the hostname or fqdn facts
[16:02:38] $ sudo cumin 'F:hostname ~ "cum.n.001.*"'
[16:02:42] cloudcumin2001.codfw.wmnet,cloudcumin1001.eqiad.wmnet,cumin1001.eqiad.wmnet
[16:03:29] ah I didn't even know about the globbing. that's good enough most of the time I think :)
[16:03:44] globbing is there since day 1 :D
[16:03:45] https://wikitech.wikimedia.org/wiki/Cumin#PuppetDB_host_selection
[16:04:15] then you have the clustershell expansion of ints so cp10[12-32]*
[16:04:21] yeah I hate those
[16:05:01] (personal preference, I just don't like that syntax)
[16:05:08] how much the fact that cp clusters have odd and even hosts affects your preference? :)
[16:05:17] but I never realized about the glob support.
[16:05:55] well even the other day, I wanted to do a cumin for basically dns[3456]001
[16:06:17] dns?001 :)
[16:06:24] I don't want [12]
[16:06:42] 'dns[3-6]001*'
[16:06:58] ah no
[16:06:58] is that clustershell I guess?
[16:07:04] damn
[16:07:25] no that's correct
[16:07:33] 001 there are only esams and drmrs
[16:07:46] well yeah, that was half the battle too, I was simplifying :)
[16:08:00] all of the non-eqiad-codfw 'dns[3-6]00*'
[16:08:45] not understanding clustershell well, I tried at one point yesterday to see what 'dns[3000-7000]' would do.
I ended up canceling, I think it hangs up trying to iterate all those numbers or something.
[16:09:00] yes it expands it
[16:09:13] that's querying puppetdb for 4k hostnames :D
[16:09:19] lol
[16:09:39] and you need the * at the end if you don't put the fqdn part
[16:10:08] yeah I guess I had .wikimedia.org, they're all the same
[16:10:54] is clustershell a superset of shell glob?
[16:11:00] no
[16:11:56] has its own powerful syntax
[16:12:16] yeah I've seen some of it
[16:12:32] I was just trying to understand where the glob support came in from and if it was like shell
[16:12:46] https://clustershell.readthedocs.io/en/latest/api/NodeSet.html
[16:12:54] the glob is cumin
[16:13:26] so you need the F:hostname~ part to switch?
[16:13:36] to switch to what?
[16:13:54] how do I know if what I type will be interpreted by glob rules or clustershell rules?
[16:15:05] pure host selection is clustershell grammar + * for simple globbing
[16:15:30] then there are the puppetdb parts like class/resources/facts
[16:15:30] ah got it
[16:15:51] sorry when my brain reads "glob", I read that as a specific language, the one used in e.g. glob(3), not just the *
[16:16:25] which also isn't very standard and has lots of variations lol
[16:16:33] it should be all documented in https://doc.wikimedia.org/cumin/master/api/cumin.backends.puppetdb.html#cumin.backends.puppetdb.PuppetDBQuery and https://wikitech.wikimedia.org/wiki/Cumin#PuppetDB_host_selection
[16:16:39] but let me know if something is amiss
[16:16:40] but basically "shell glob"
[16:17:13] yeah it's not a full shell glob
[16:19:00] anyways, I'll stop being cantankerous :) Regex > all
[16:19:33] (but yeah, I'm sure the problem is all the backends don't support it)
[16:19:33] also if you run a cumin query with -d/--debug and then tail -n1 /var/log/cumin/cumin.log you get the puppetdb query
[16:23:23] bblack: that said, 90% of the time you should look more at puppet resources than hostname regexes :D
[16:25:21] maybe.
we don't have good metadata for some cases, but maybe we should.
[16:26:17] the example in my mind from the other day, is I was trying to find a way to succinctly say "I want to run this on the first of the two dns boxes in every edge site". With the random numbering and them just being cluster peers, it's hard to say that.
[16:26:43] it didn't even matter I guess which was first. More that I could run a command on exactly 1/2 at 4 separate sites, and then at a later time run it on the opposite set.
[16:27:39] I guess we could give them some metadata about that. Have something visible from puppet that labels them as [a, b, c, ...] within a site, regardless of the hardware-process-driven dnsX00N numbering.
[16:29:14] could you use odd/even or you have cases in which the hardware-process-driven dnsX00N numbering ended up being both odd/even?
[16:29:29] (the abc labels being sort of like a sub-role. if dns3001 is 'a', and you replace it with dns3005, dns3005 now becomes the a-node)
[16:29:45] volans: yeah it's different all over as things evolve over the years, in the general case
[16:31:13] or some general concept of consistent ordinals within a "cluster"
[16:31:16] or just have that into the cumin aliases if it's the only use case
[16:31:22] yeah
[16:32:30] or some concept of a consistent ordering within a selected group
[16:32:57] I donno
[16:34:04] in the long view, the trailing part of the hostname numerals is kinda-random
[16:35:04] the real answer is our operations should eventually evolve to a state where we're not running cumin commands on them as part of anything routine :)
[16:36:22] ehehe
[16:36:34] write a cookbook for it and will be much easier to do any reasoning about them
[16:36:41] :P
[16:38:08] cookbooks as a general concept, I find super-valuable, but some of the recent stuff I've seen flying around, seems a bit over-abstracted at times. Hard to wade through the complexity and be sure what's actually happening.
[16:38:55] kinda reminds me of eventloop programming. you have to have a good model of the entrypoints in your head, I guess, and how they're connected.
[16:39:46] you're not forced to use that abstraction. if the choice is between not having a cookbook and having one that does everything from scratch, not using some abstraction, I still prefer the latter
[16:40:14] we're trying to improve the abstraction you're referring to (the batch classes) also to have better and automated documentation on what they actually do
[16:48:30] bblack: for the interim of a perfect solution, this would work for now:
[16:48:37] A: 'dns[3-6]* and dns*[1,3,5].*' B: 'dns[3-6]* and dns*[2,4,6].*'
[16:49:42] :)
[21:10:10] jhathaway: any concerns about https://gerrit.wikimedia.org/r/c/operations/puppet/+/927795 ?
[21:10:31] andrewbogott: looking now...
[21:12:54] thx
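
The per-site labelling idea from 16:27:39 (label hosts a, b, c within a site, regardless of the dnsX00N numbering) can be sketched in Python. This is only an illustration under stated assumptions: `label_by_site` is a hypothetical helper, not part of cumin or puppet; the hostnames in the example are made up; and names are assumed to look like dns, one site digit, three more digits, and an optional domain.

```python
import re
from collections import defaultdict

# Assumed hostname shape: dns<site digit><3-digit number>[.domain]
HOST_RE = re.compile(r"^dns(\d)(\d{3})(?:\..*)?$")


def label_by_site(hosts):
    """Group hosts by ordinal position within each site.

    The lowest-numbered host of every site gets label 'a', the next 'b',
    and so on, so the split does not depend on odd/even trailing digits.
    """
    by_site = defaultdict(list)
    for host in hosts:
        match = HOST_RE.match(host)
        if match is None:
            raise ValueError(f"unexpected hostname: {host}")
        site, number = match.group(1), int(match.group(2))
        by_site[site].append((number, host))

    groups = defaultdict(list)
    for site in sorted(by_site):
        # Sort by the numeric suffix so labels are stable for a given host set.
        for index, (_, host) in enumerate(sorted(by_site[site])):
            groups[chr(ord("a") + index)].append(host)
    return dict(groups)


# Made-up example: site 5 has two odd-numbered hosts, which an odd/even
# expression would put in the same half.
groups = label_by_site(["dns3001", "dns3002", "dns5005", "dns5003"])
print(",".join(groups["a"]))
print(",".join(groups["b"]))
```

Joining one group with commas gives an explicit host list, similar in shape to the one pasted at 15:51:36. Unlike the odd/even expressions at 16:48:37, the split stays balanced when both hosts at a site end in an odd (or even) digit, though the labels shift if a host is replaced by one with a different number.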