[08:47:31] akosiaris: https://github.com/Demonware/balanced
[09:11:04] hmm?
[09:11:22] ah, took me a bit to realize what it does
[09:14:37] kinda abandoned though
[09:14:44] 3 commits in a single day and then nothing
[09:17:18] kube-router is a similar project, Way more mature and trustworthy https://github.com/cloudnativelabs/kube-router
[10:03:27] akosiaris: o/
[10:03:43] elukey: welcome back
[10:03:59] thanks! Same for you, even if I am probably a little late :D
[10:06:09] :D
[10:30:57] icinga is confusing me again. i merged a CR to disable notifications for a host (db2118). i ran puppet on the host, twice. and on alert1001, twice. and still the web ui doesn't show the host as having notifications disabled
[10:31:03] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=db2118
[10:32:16] (having to run puppet on a host to tell icinga to disable notifications is still so brain-shatteringly counter-intuitive after 1.5 years here)
[10:34:14] I'll take a look kormat
[10:34:44] it looks like maybe the third run did the trick?
[10:35:00] yeah. wtf.
[10:35:31] FWIW I agree, the "monitoring" use cases solved with exported resources and puppetdb is counterintuitive
[10:35:45] "third time's a charm"
[10:38:19] kormat: being a codfw host I guess you might have hit T263578 (cc jbond)
[10:38:19] T263578: puppetdb seems to be slow on host reimage - https://phabricator.wikimedia.org/T263578
[10:39:11] * kormat stares in horror
[10:41:11] Emperor: "good" news! ^ that may be what bit us when we reimaged that host together last week
[10:47:47] fyi volans i have only managed to take a quick update on T263578, plan to look at it more in depth on friday and early next week if i dont get to it before then
[10:47:47] T263578: puppetdb seems to be slow on host reimage - https://phabricator.wikimedia.org/T263578
[10:48:10] (:
[10:55:40] jbond: ack, thx!
[11:12:15] for the Emacs lovers: you can now include a new profile::emacs on roles where it is desired.
for the Emacs lovers and haters: yes, by default the ~ backup files are disabled :) also see: ..from 2017 https://gerrit.wikimedia.org/r/c/operations/puppet/+/377721 and now https://gerrit.wikimedia.org/r/c/operations/puppet/+/714414
[11:22:19] mutante: <3
[11:25:27] hnowlan: :) just include it in the role class of your choice and that way we don't have to agree on standard_packages or not :p
[11:48:16] well that's a new and exciting failure for wmf-auto-reimage: https://phabricator.wikimedia.org/P17079
[11:48:18] volans: any idea?
[11:54:49] I had this earlier today: https://phabricator.wikimedia.org/P17080 but the reimage continued and finished correctly
[11:55:14] Not sure if it is somewhat related to yours
[11:55:50] ah. _you_ broke it. now i understand.
[11:57:15] :p
[11:59:48] nngh. ah. i ran it from cumin2001, instead of cumin2002. that's probably the problem with mine
[12:00:11] hurray for having a choice of 2 partially-supported hosts in codfw :(
[12:01:35] I used cumin1001
[12:03:05] T276589
[12:03:06] T276589: migrate services from cumin2001 to cumin2002 - https://phabricator.wikimedia.org/T276589
[12:04:07] mutante: my point is that the db tools are only supported on cumin2001, and other more general sre tools are only supported on cumin2002, and some work fine on both
[12:05:10] kormat: yea, I was just wondering what the difference is, if that is just hardware refresh or buster->bullseye or something.
then found the racking ticket as resolved and then this linked from it
[12:05:22] mutante: it's buster vs bullseye
[12:05:35] ah, seems like both then
[12:05:38] *nod*
[12:05:55] with an added sprinkle of the cumin setup only reaaally supporting 2 hosts total
[12:06:05] due to how data gets synced between sites
[12:06:27] aha, well, I will stick to 1001 for now :)
[12:06:47] yeaah
[12:07:26] the part that is known to me is just the "icinga downtime failed" but then the rest finishes normally and I just have to "manually" repeat the downtime with the other cookbook
[12:09:32] whatever went wrong for my run, it looks like the host isn't properly enrolled into puppet
[12:11:35] (the host key isn't showing up in https://config-master.wikimedia.org/known_hosts.ecdsa, and icinga isn't picking the host either)
[12:12:14] I can confirm you have it in site.pp
[12:12:37] mutante: it's a reimage, so everything should be fine on that end
[12:13:14] ok... it's finally shown up in eqiad's puppetdb :/
[12:13:37] (extrapolating from the fact that it's in the config-master url now)
[12:14:22] [puppetmaster1001:~] $ sudo puppet cert list --all | grep db2118
[12:14:32] + "db2118.codfw.wmnet"
[12:36:01] kormat: what's up?
[12:36:27] * volans back from lunch
[12:37:11] wmf-auto-reimage failed from cumin2001, paste above
[12:37:25] i've been kinda guessing at what steps needed to be manually done to finish it
[12:37:27] the host was not in puppetdb
[12:37:38] when the reimage was expecting it
[12:37:47] might be the above mentioned puppetdb issue in codfw
[12:37:48] ah. so it is purely the puppetdb codfw slowness issue, then
[12:38:02] volans: it took maybe 15mins for it to show up in eqiad
[12:38:11] :/
[12:38:12] not nice
[12:38:39] no actually, make that more like 30 minutes
[12:38:45] what the hell
[12:40:29] first puppet run finished around 11:26 i think.
it didn't show up in eqiad puppetdb until 12:12
[12:40:45] that's "not good"
[17:24:18] Something has changed in our buster docker base image that removes manpages from /usr/share/man which breaks stuff like our openjdk packages installing. Any ideas what it might have been? https://phabricator.wikimedia.org/T289694#7309377 Manpages are absent from /etc/alternatives/ etc too (unsurprisingly)
[17:26:04] if I had to guess, the cause is https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/message/KSJUFA4F7YP7R6RCYRZHSQ3YPC3UJUKN/
[17:26:44] ftr we have a similar issue with toolforge images, worked around it with https://github.com/wikimedia/operations-docker-images-toollabs-images/blob/d6adb6fc2b9f8f27d4b7c494d86f54d96b97a628/jdk17-sssd/base/Dockerfile.template#L6
[17:26:51] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=955619 seems like a bug in the package
[17:27:46] I guess manually creating the directory is an appropriate workaround
[17:29:09] yeah, spose so. I'm mostly curious as to what changed in the image itself
[17:31:30] it means that no manpages get installed for any packages, which I can totally see as a benefit for keeping docker images small but it's just unusual