[10:55:05] stupid question: is there a more git-ier way to "find a file by name in the current branch" than what I currently do (running find)?
[11:01:27] <_joe_> git ls-files
[11:02:12] <_joe_> the closest thing to what you want is git ls-files **filename
[11:03:07] *filename*
[11:03:38] TIL
[11:04:00] I am reading the help for pattern matching
[11:04:09] <_joe_> I haven't used git ls-files since I'm back on linux fwiw
[11:04:25] <_joe_> gnu find is vastly superior
[11:04:49] <_joe_> the advantage of ls-files is when you have a lot of cruft around the repo
[11:06:18] I think it doesn't do exactly what I need :-(, I will look at git-find or maybe create an alias
[11:06:33] jynus: what are you looking for?
[11:07:07] my number 1 use case is "I know the file I am looking for but do not know the extension (e.g. script-name.py(.erb?))"
[11:07:59] $ git ls-files '*cross-validate*'
[11:07:59] modules/openldap/files/cross-validate-accounts.py
[11:08:27] weird, it didn't work for me, maybe I typed the name wrong
[11:08:41] if it works, then it is what I was looking for
[11:08:49] without quotes your bash might expand the * glob against files in the current dir
[11:09:05] I see, I did "git ls-files 'cross-validate-accounts*'"
[11:09:09] and that doesn't work
[11:09:25] as ls-files is for the full path
[11:09:32] I think that works for me
[11:09:32] yep
[11:09:40] thank you, volans
[11:09:49] no prob
[11:10:49] I also have an alias that lists which files a given commit touched
[11:15:51] wait, how do you know I was about to send https://gerrit.wikimedia.org/r/c/operations/puppet/+/856518 ?
[11:16:58] what's the relation? btw for that I replied on task and Filippo was instead adding a check in CI to ensure the ssh_keys is always there
[11:17:00] I've been using fd-find (packaged in Debian) as a GNU find alternative for some time now, can recommend, with it it's as simple as "fdfind cross-validate"
[11:17:46] which task?
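Note on the 11:07-11:09 exchange: the difference between the two patterns comes down to the pattern being matched against the whole repository-relative path, not just the file name. The sketch below only approximates that behaviour with Python's fnmatch; it does not call git. The helper name ls_files and the one-entry path list are assumptions for illustration (the path itself is the one shown in the log).

```python
from fnmatch import fnmatchcase

# The path is the one printed in the log; the list is illustrative only.
tracked = [
    "modules/openldap/files/cross-validate-accounts.py",
]

def ls_files(pattern, paths=tracked):
    """Hypothetical stand-in for `git ls-files <pattern>`.

    The pattern is matched against the full path, and `*` may cross `/`.
    """
    return [p for p in paths if fnmatchcase(p, pattern)]

# No leading wildcard: the pattern has to match from the start of the path.
print(ls_files("cross-validate-accounts*"))  # → []

# Leading `*` absorbs the directory part, so the file is found.
print(ls_files("*cross-validate*"))
```

Quoting the pattern in the shell matters for the same reason given at 11:08:49: unquoted, the shell may expand the glob before git ever sees it.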
[11:17:51] indeed, I went the CI route to validate ssh_keys exists
[11:18:07] the task that introduced the changes to data.yaml missing the ssh_keys
[11:18:13] https://phabricator.wikimedia.org/T322795 this
[11:19:04] that is only laterally related to my issue
[11:19:36] let's talk on operations
[14:34:53] fyi, mwmaint1002 seems to be failing to run puppet :/
[14:36:14] it's been failing since last week; bblack, at first sight it could be related to your reverts, not yet sure
[14:36:51] see https://puppetboard.wikimedia.org/report/mwmaint1002.eqiad.wmnet/1a9725d03c8c024f2448f8f47c4f2da7ded7ca01
[14:36:54] Resource type not found: Profile::Lvs::Classes
[14:38:37] yes, it was removed in 0ad867a5789d766f30b704bab8af4ef36c7c030b
[14:42:04] hmm bblack ^^
[14:42:56] and/or brett & jbond
[14:48:50] will have to wait for bblack, I'm not sure what issue caused the revert
[14:49:22] I'll look at mwmaint to see if there is a simple workaround in the meantime
[14:57:40] also I think that icinga should alert if puppet is broken for a long time, even on a single host. It's just a warning in the UI currently AFAICT
[14:59:09] the problematic change is https://gerrit.wikimedia.org/r/c/operations/puppet/+/841148 however it can't be simply reverted as it makes use of the changes that were reverted (cc joe)
[14:59:21] I have a meeting now but can take another look after
[15:03:11] <_joe_> this is the second time that things break because of the LVS refactorings. Maybe a larger pool of reviewers might help next time.
[15:03:45] <_joe_> what was reverted and why?
[15:04:31] this is broken because of the revert, I'm not sure why it was reverted
[15:05:00] start of the revert chain is https://gerrit.wikimedia.org/r/c/operations/puppet/+/855691/2
[15:05:47] <_joe_> ah it's a revert of the change that had me change stuff :D
[15:07:21] <_joe_> bblack: can you please explain the reason for that chain of reverts?
[15:07:36] <_joe_> I'd like to understand which version of the code I should aim for
[15:07:41] <_joe_> to fix my code
[15:07:54] <_joe_> given I already adapted my code to the refactor
[15:08:24] <_joe_> it's a bit unnerving if I have to change it again to the old structure, then again to the new one, to be honest
[15:11:41] _joe_: fyi the old data structure does not have a profile::lvs::configuration::all_class_host equivalent, as lvs_class_hosts only includes data for the host/realm where the catalog is compiled
[15:12:13] <_joe_> jbond: no I think the revert went further
[15:12:27] <_joe_> all the way back to having $lvs_class_hosts
[15:13:06] hi. I can help shed some light. not bblack obviously, but we encountered multiple issues when setting up lvs4008 as the new high-traffic1 host
[15:13:10] <_joe_> sorry, I was using $lvs_class
[15:13:25] the issue was that, for example, type Profile::Lvs::Class_hosts assumes there can only be one primary:
[15:13:28] 'primary' => String[1],
[15:13:29] _joe_: sorry, that's what I meant, i.e. after the reverts there is no equivalent of :all_class_host
[15:13:41] ahh ok yes lvs class may be good enough
[15:13:42] that's usually true but not during transitions, such as what we have currently:
[15:13:46] 'ulsfo' => [ 'lvs4005', 'lvs4007', 'lvs4008' ],
[15:14:29] <_joe_> sukhe: the solution was to override the data structure in hiera on the host you're installing
[15:14:34] * jbond goes back to meeting, will read scrollback and respond when back
[15:14:51] <_joe_> not reverting a chain of changes and breaking other stuff, IMHO
[15:15:06] <_joe_> but ok, I'll change the code to unbreak mwmaint1001
[15:15:32] <_joe_> *1002
[15:17:57] _joe_: so sorry, I must be missing something here, but you are saying that the solution was to override class_hosts and set high-traffic1 + primary to lvs4008 in the lvs4008 hiera?
[15:18:26] <_joe_> the solution for making puppet work, yes
[15:18:55] <_joe_> once you were ready to flip it, just remove the data structure there and add it to the general one
[15:19:02] <_joe_> that's one option at least
[15:19:43] <_joe_> anyways, sorry, let me focus on fixing puppet on mediawiki::maintenance
[15:19:47] ok, I am not sure about that one
[15:21:51] my understanding was that would still say that high-traffic1 + primary is just lvs4008 (assuming we override that in the hiera) but in reality what we wanted was that high-traffic1 can be more than one host, which the refactor didn't allow
[15:22:54] <_joe_> ok, perfect, with the current status of that code, it's impossible for me to fix mine
[15:23:09] I think you will need to revert the reverts?
[15:23:14] <_joe_> the revert wasn't complete so I don't have the right data structure anywhere anymore
[15:23:49] <_joe_> I frankly don't know, for now I'll just hardcode the currently correct values in puppet, it's the only thing I can do
[15:24:04] <_joe_> unless I want to actually find what other changes I would need to revert
[15:24:15] _joe_: you should definitely wait for bblack's reply. while I was there, I am not aware of the historical context around then
[15:26:57] also I am supposed to move lvs4005 -> lvs4008 today
[15:26:57] <_joe_> sukhe: I'll find a solution.
[15:27:07] but I guess I should wait now, in case we want to resolve this cleanly :P
[15:29:00] <_joe_> sukhe: should take me another 10-15 minutes
[15:30:00] _joe_: take your time please! I am ready on the other fronts and assuming no big changes and clean PCC for lvs4008 (and I guess another "older" host), I will go ahead when we are ready
[15:40:15] <_joe_> ok, puppet is running on the maintenance server
[15:40:45] <_joe_> ofc I made a mistake lol
[15:50:58] _joe_: looks good now?
[15:52:38] <_joe_> sukhe: yes, sorry, go on!
[15:53:37] thanks! I wasn't blocked on you (was looking at other things) but wanted to make sure that resolved it :)
[16:21:57] the context on the revert chain is... complicated? it depends on which level you want to talk about it
[16:24:12] the terminology is terrible too (even in the existing code) because of the reuse of terms like "class" and "role"
[16:24:48] but yeah, our workflows for smoothly replacing servers have generally involved multiple hosts being "primary" in a given traffic class and DC, and that workflow was broken.
[16:25:48] not that I'm defending the old data design, it's terrible and could use refacoring
[16:25:54] *refactoring
[16:26:28] but the refactor that happened and was reverted, the design of it wasn't right either.
[16:27:25] arguably we could make some deeper changes here. Even the notion of "primary" and "secondary" is questionable, vs just using the primacy of current MED weights or whatever.
[16:28:47] but the reason this turned into a sudden revert is that it was blocking us on the ulsfo transition, and we exhausted our willingness to try to patch on top of it to get back to what we wanted.
[16:30:51] the real-world shape of the data is more like: within a datacenter and "traffic-class", there can be any number of LVSes, and their "primacy" is really defined by the configured BGP MEDs (lowest MED is "primary", but there can be more than two hosts and more than two distinct MED values).
[16:31:19] and for any pybal-restart workflow automation, we have to account for this as well (and ideally restart from the least-primary end of the set)
[16:32:06] one host can also be in multiple "traffic-class" (as all the current "secondary" nodes are), so it's not a strictly hierarchical relationship, either.
[16:32:38] it's more like each LVS node has an array of traffic classes it supports and the MED weight it has within each class
[16:33:06] lvs4010: [high-traffic1:100, high-traffic2:100, low-traffic:80]
[16:33:15] or if you invert the data to a class-first view, it might also be:
[16:33:57] high-traffic1: eqiad: [lvs4008:100, lvs4010:50, lvs4007:75, ...]
[16:34:50] primary and secondary are overloaded terms in the current stuff: they were the original shorthand for configuring two fixed MED weights, but then also "secondary" implies it's in all traffic classes currently.
[16:35:47] the refactor wasn't a pure refactor, in that it changed the data's semantics along the way. We do need to do that anyways, but it needs more thought and input to cover all the cases.
[18:06:32] Did someone change my on-call shift? Looks like amer-day-pool1 has me only for today
[18:13:06] It looks like urandom is set to take over for the rest of the week for some reason
[18:15:37] maybe lmata knows?
[18:17:14] No shift change from my end for on-call; I did add you to clinic duty
[18:26:45] the Next Handoff time appears to have been changed for both amer-day-pool1 and emea-day-pool2
[18:29:54] yeah what rzl said, assuming that was done by mistake and set the next handoff to next Monday
[19:14:18] o.O
[21:10:41] mutante: ready for me to merge "phabricator: add parameter for mysql port"?
[21:11:52] (done)
[21:13:25] andrewbogott: not really, but I got puppet disabled :) thanks
[21:13:41] oops, hope I didn't break anything
[21:14:10] no, you did not. I got the agent disabled
[21:14:29] I was considering a follow-up to avoid changing the phab prod config
[21:14:56] adding the mysql.port parameter. don't want to break phab, but it's under control
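Note on the 16:30-16:33 messages: the node-first and class-first views of the LVS data, and the "restart from the least-primary end" ordering, could be sketched roughly as below. This is not the Puppet or hiera structure from the repository. The lvs4010 entry is the one quoted in the log; the other two hosts, all their MED values, and the function names class_view and restart_order are assumptions made for illustration.

```python
# Node-first view: each LVS host maps traffic class -> MED.
# lvs4010 is quoted from the log; lvs-a and lvs-b are hypothetical hosts.
node_view = {
    "lvs4010": {"high-traffic1": 100, "high-traffic2": 100, "low-traffic": 80},
    "lvs-a": {"high-traffic1": 50},
    "lvs-b": {"high-traffic1": 75, "low-traffic": 40},
}

def class_view(nodes):
    """Invert to a class-first view: traffic class -> {host: MED}."""
    inverted = {}
    for host, classes in nodes.items():
        for traffic_class, med in classes.items():
            inverted.setdefault(traffic_class, {})[host] = med
    return inverted

def restart_order(hosts_to_med):
    """Least-primary first: highest MED first, lowest MED (the primary) last."""
    return sorted(hosts_to_med, key=lambda host: hosts_to_med[host], reverse=True)

by_class = class_view(node_view)
ht1 = by_class["high-traffic1"]
print(ht1)
print(min(ht1, key=ht1.get))  # host with the lowest MED, i.e. the primary
print(restart_order(ht1))
```

Nothing in this shape limits a class to one primary and one secondary: any number of hosts can sit in a class, and a host can appear in several classes, which is the non-hierarchical relationship described at 16:32:06.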