[07:30:32] I'm going to be rebooting the codfw bastion (bast2002) in 5 minutes
[07:30:47] let me know if that's inconvenient for anyone
[07:33:48] good morning :) We could use a puppet-merge for the scap script which sync the deployment hosts. That is to use `rsync --new-compress` https://gerrit.wikimedia.org/r/c/operations/puppet/+/774824
[07:34:11] we already have that option set in scap itself but I have missed the `scap-master-sync` is provided by Puppet :]
[07:56:33] <_joe_> hashar: can look in a few
[07:57:53] \o/ :]
[08:26:45] <_joe_> hashar: DRY pls :P
[08:26:55] <_joe_> I know it's not your fault
[08:27:41] yeah I felt like writing a rsync function or add an env variable to pass all the options
[08:27:50] then felt it was way easier to just copy paste :D
[08:28:58] <_joe_> hashar: yeah merging now, but yeah :)
[08:39:54] <_joe_> hashar: done, btw
[08:47:32] _joe_: merci beaucoup!
[13:13:07] volans: for your admiration, the command at the top of https://phabricator.wikimedia.org/T303174
[13:14:02] lol
[13:14:18] very impressive :)
[13:14:36] * kormat bows
[13:14:47] you know we also have https://debmonitor.wikimedia.org/kernels/ :-P
[13:15:13] volans: can you estimate how long it would take to get the same result using that?
[13:15:29] 5 minutes
[13:15:35] reeeally. do tell.
[13:15:49] click on kernel, filter with 'db', copy+paste
[13:16:06] repeat for the few kernels you are interested
[13:16:25] ah. so entirely manual copy+pasting
[13:16:40] and then editing it into the phab format
[13:16:49] i want something i can run multiple times a day
[13:17:08] n
[13:17:12] keeping track by hand of which hosts are done/not done is just too error-prone
[13:17:21] Oops, wrong window.
[13:18:48] yeah agree, I thought was a one-off
[13:20:04] the other option, just to mess with your sanity, would be to use the json output of cumin and do it in jq :-P
[13:20:41] oh dear goddess no :P
[13:20:47] though i do appreciate the 'thought'
[13:21:00] jq is always hard to wrap your head around :)
[13:22:00] but I tend to use CLI solutions too, just because they're more-composable into bigger things. If I start at debmonitor in a browser, I've gotta paste the data back somewhere into a file eventually to do something with it. The workflow is already there on the CLI
[13:22:07] character-building, is jq's UI...
[13:22:22] * volans take a note for a CLI for debmonitor :D
[13:23:09] that's the one thing I find most-frustrating about netbox, too. It's a GUI-first tool, or at least has GUI-first documentation for us in practice.
[13:24:39] netbox APIs allow to do almost everything, but are a bit painful because of pynetbox that is not "magic" enough to convert the REST API into something totally useful
[13:24:51] I end up often to play with nbshell (netbox django shell)
[13:24:52] jq is a bit like sed/awk in that you read it later and go "what does that do?"
[13:28:03] yeah, CLI-ness or GUI-ness tends to pervade a design in general. I can't think of great examples of tools that do both amazingly without some tradeoff in some direction.
[13:28:13] or API-ness I guess, is another
[13:37:18] just because I was nerdsniped into it... one-liner in the task :D
[13:37:59] hahaha
[13:38:14] phab-compatible output ;)
[13:38:32] nice :)
[13:38:59] * Emperor resists urge to golf
[13:39:17] volans: impressive :)
[13:40:18] ... how are you sorting the output?
[13:41:08] `-o txt` does that automatically?
[13:41:41] yes
[13:41:49] very cute.
[13:41:50] both txt and json output do sort keys
[13:42:12] i'll have to remember that
[13:42:19] and I'll have to document it :D
[13:42:26] * kormat grins
[13:58:32] does cumin and/or cookbook stuff yet have some kind of dc/cluster -aware shuffler?
[13:58:37] I was thinking about that yesterday
[13:58:57] something similar in spirit to the cron_splay thing (but hopefully far less ugly heh)
[14:00:45] bblack: for cumin on the general use case we have the very old T164587 where I think an agreement was not found, but could be resumed
[14:00:46] T164587: cumin could use randomization/splay options - https://phabricator.wikimedia.org/T164587
[14:01:05] (the idea is take a given hostlist, break it up into per-dc sublists based on the first digit of the numeric part, shuffle randomly in each sublist, then zipper them back together for a pseudo-random order that rotates between DCs fairly
[14:01:06] for any wmf-specific logic it would be better to put it into spicerack that's not a general purpose tool
[14:01:09] )
[14:01:24] s/based on the first digit of the numeric part/on netbox's site property/
[14:01:35] sure :)
[14:02:41] I can foresee different services needing different shuffling algorithms though, like do first one dc then another, or do first replicas and then masters, etc...
[14:03:00] yeah, especially stateful thing
[14:03:02] s
[14:03:30] you could have a generic concept of a shuffle strategy and then various common ones or whatever, too
[14:04:00] I guess we could put something into the logic behind the rolling operations
[14:04:14] I guess is where it would be mostly needed
[14:04:32] but for stateless services, there's a common pattern of "I want to hit all these servers, but do it in the least-disruptive way by paying attention to redundancies". Commonly for us, the important subdivisions would be the site, and maybe the cluster or profile name.
[14:04:53] e.g. if we operate on all cpNNNN, split by text-v-upload and per-dc, to do the rotation/shuffling.
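(Editor's sketch, not part of the original conversation.) The shuffle described at 14:01:05 could look roughly like the following in Python: split the host list into per-site sublists, shuffle each one, then interleave them. The function names `site_of` and `dc_aware_shuffle` are hypothetical and exist in neither cumin nor spicerack. `site_of` uses the "first digit of the numeric part" rule from the chat purely as a stand-in; as suggested at 14:01:24, a real version would more likely look up netbox's site property, and the `key` parameter is where a cluster or profile grouping could be plugged in.

```python
import random
import re
from collections import defaultdict
from itertools import zip_longest


def site_of(hostname):
    """Hypothetical grouping key: the first digit in the hostname (cp2001 -> '2')."""
    match = re.search(r"\d", hostname)
    return match.group(0) if match else ""


def dc_aware_shuffle(hosts, key=site_of, rng=None):
    """Return hosts in a pseudo-random order that rotates between groups."""
    rng = rng or random.Random()

    # Break the host list into one sublist per group (per site by default).
    groups = defaultdict(list)
    for host in hosts:
        groups[key(host)].append(host)

    # Shuffle randomly inside each sublist.
    sublists = []
    for name in sorted(groups):
        members = groups[name]
        rng.shuffle(members)
        sublists.append(members)

    # Zipper the sublists back together, skipping the padding that
    # zip_longest adds once the shorter sublists run out.
    ordered = []
    for batch in zip_longest(*sublists):
        ordered.extend(host for host in batch if host is not None)
    return ordered


if __name__ == "__main__":
    example = ["cp1001", "cp1002", "cp2001", "cp2002", "cp3001"]
    print(dc_aware_shuffle(example))
```

With uneven group sizes the tail of the result comes only from the larger groups, so the rotation is fair only while every group still has hosts left. The other strategies mentioned in the chat (one DC first, replicas before masters) would need a different ordering step.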
[14:05:17] trying to bring that back to something more generic and cumin-appropriate, though
[14:05:38] maybe something based on IP addresses would work, in some sense
[14:05:47] cumin has FQDNs, so just grouping by the way clustershell groups might be already useful
[14:06:17] (you could think of something sort of like hamming distance that's specific to IPs, that gives a measure of how "close" they are to being on the same network without actually knowing all the subnet info explicitly)
[14:06:46] (and then sort such that it maximizes the avg distance from each list entry to the next)
[14:08:27] to practically explain my previous statement: https://phabricator.wikimedia.org/P24161
[14:08:29] hmm that actually works if you just treat IPs like raw numbers in network order and then go for average distances, too
[14:09:03] note the 2 groups in ulsfo because of the missing host in the middle ;)
[14:09:13] heh yeah
[14:10:45] I can't think of an efficient way to numerically sort for maximizing the gaps between neighbors, though
[14:10:45] also, technically speaking NodeSet is a set, so has no concept of ordering
[14:10:51] probably any such sort scales terribly :)
[14:43:43] volans: we need a hall of shame for awesome* one-liners
[14:43:53] (*: in both senses of the word including and especially the original)
[14:44:11] eheheh :)
[15:11:18] cdanis: I have some properly awful ones lying around
[15:11:27] same!
[15:12:34] https://github.com/wtsi-ssg/ceph-disk-utils/blob/dbec53ae865196199c4f15fdc602c83efc6a57ea/ceph_remove_failed_osd.sh#L101-L105
[15:13:12] (in my defence, that is preceded by a lengthy comment explaining it)
[15:18:25] the use of jq <<< '{}' to emit JSON output is kindof vile too
[18:37:14] https://github.com/grafana/grafana/pull/35104
[18:37:31] Too bad the redirects are now broken. Our dashboards are linked in quite a few places, including e.g. other people's blog posts externally.