[07:35:36] hi folks
[07:36:01] just sent another email to Singtel, the ulsfo - eqsin transport is still flapping afaics
[07:37:16] elukey: thanks, they followed up to me over the weekend, I'll check what are the latest
[07:37:57] XioNoX: ah snap ok, I quickly checked on cr4-ulsfo and the bfd session was flapping so I thought to just ping them again
[07:38:21] elukey: it's fine, don't worry! thanks for keeping an eye on it
[07:58:51] elukey: replied and CCed noc@ on the ticket
[08:23:53] I updated the Singtel NOC contacts on Netbox as well
[08:28:13] super thanks
[12:19:16] <_joe_> if you had to detect from within some code if you're running inside a docker container or not, what would you do?
[12:22:00] _joe_: I would probably start by looking at virt-what: https://packages.debian.org/bullseye/virt-what - This detects a docker environment as well as hypervisors.
[12:22:24] <_joe_> btullis: yeah I should look at what they do
[12:29:25] Best guess: looking closely at DMI info
[12:30:52] virt-what just seems to be testing for the presence of a `/root/.dockerinit` file: https://salsa.debian.org/libvirt-team/virt-what/-/blob/debian/sid/virt-what.in#L340
[12:31:27] and just had a look at what systemd-detect-virt does; it tests for the presence of /.dockerenv
[12:31:29] Not sure if that's still reliable these days though. https://superuser.com/a/1021925/38404
[12:35:20] <_joe_> it's not
[12:48:52] yeah it seems that people in addition look for the names in /proc/1/cgroup like https://stackoverflow.com/a/69860299
[12:53:55] <_joe_> volans: that's implementation-dependent
[12:54:25] great
[12:54:30] <_joe_> but yeah, this is mostly an academic question, I'll just detect the user we're running as is "wikimedia" and we're running in "/wikimedia" as cwd
[13:14:39] fyi on podman+crun, virt-what returns "lxc", while systemd-detect-virt returns "podman"
[13:15:16] (I was toying with that a few months ago)
[13:17:31] (that's with bullseye)
[14:45:14] Am I right that if I want to make /path/to/somewhere/file.ext in puppet I have to make the directory /path/to/somewhere myself (and likewise recursively up to / if the dirs won't be there otherwise)?
[14:46:03] Emperor: see mkdir_p ;)
[14:46:09] does what it sunds
[14:46:11] *sounds
[14:46:18] correct, although there's a helper wmflib::dir::mkdir_p that might be helpful
[14:48:09] thanks :)
[16:26:52] miss you razzi ;)
[16:27:20] :) good to meet up in the sre meeting
[16:29:17] Does anybody know why this haproxy command would be giving permission denied: `echo 'set server clouddb1018.eqiad.wmnet state drain' | sudo socat /run/haproxy/haproxy.sock stdio` gives `Permission denied`
[16:30:11] razzi: if that's related to the cookbook, it does run as root currently, so no need to add sudo if that helps
[16:30:42] volans: ah yeah I was just testing the command manually, the cookbook has no sudo
[16:31:36] on what node is the command being executed?
[16:31:40] I mean the cumin target
[16:32:06] elukey: dbproxy1018.eqiad.wmnet
[16:33:00] I already ok'd the updating of the views from data-persistence so now's a fine time to try to depool it manually, then update the cookbook
[16:35:06] was the command used before? I mean, is it in any guide used by the cloud team?
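An aside on _joe_'s container-detection question from 12:19: the checks mentioned in the thread can be combined into a best-effort probe. This is a minimal sketch, not what virt-what or systemd-detect-virt actually implement; the cgroup substrings and the `/run/.containerenv` path (assumed to be podman's marker file) go beyond what was said in the channel, and all function names are hypothetical. As _joe_ said, none of it is reliable: with cgroup v2 and cgroup namespaces, /proc/1/cgroup inside a container often reads just `0::/`, so a negative answer proves nothing.

```python
import os
from pathlib import Path

# Substrings container runtimes have used in cgroup paths. Assumed, not
# exhaustive, and (as noted in the chat) implementation-dependent.
CGROUP_MARKERS = ("docker", "kubepods", "containerd", "libpod")

# /.dockerenv is what systemd-detect-virt looks for (per the chat);
# /run/.containerenv is assumed to be podman's equivalent.
MARKER_FILES = (".dockerenv", "run/.containerenv")


def cgroup_mentions_container(cgroup_text: str) -> bool:
    """True if any line of a /proc/<pid>/cgroup dump names a known runtime."""
    return any(m in line for line in cgroup_text.splitlines() for m in CGROUP_MARKERS)


def probably_in_container(root: str = "/") -> bool:
    """Best-effort guess: True is a hint, False proves nothing.

    `root` lets tests point the probe at a fake filesystem tree.
    """
    base = Path(root)
    if any((base / name).exists() for name in MARKER_FILES):
        return True
    try:
        return cgroup_mentions_container((base / "proc/1/cgroup").read_text())
    except OSError:
        return False


def matches_app_convention(user: str, cwd: str) -> bool:
    """_joe_'s pragmatic check: running as 'wikimedia' with '/wikimedia' as cwd."""
    return user == "wikimedia" and os.path.normpath(cwd) == "/wikimedia"
```

For the convention check, something like `matches_app_convention(pwd.getpwuid(os.getuid()).pw_name, os.getcwd())` avoids depending on environment variables that a container may not set.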
[16:36:59] not yet, it is supposed to be replacing the manual process of editing hieradata/hosts/dbproxy1018.yaml and reloading haproxy
[16:37:49] but was it tested somewhere with haproxy? Or is this the first time that you are running it? (trying to get the context)
[16:38:19] also I have never used dbproxy1018, what is the blast radius if it goes down or haproxy gets into a weird state?
[16:39:16] (for example, netcat or similar tools were tested instead of socat? etc..)
[16:40:25] This is the first run, was perhaps overly optimistic to run the whole cookbook
[16:40:47] I am very ignorant with socat, but I am puzzled by the stdio at the end.. this is why I asked if it ran somewhere before prod :)
[16:40:57] the command was never tested on a test instance of haproxy?!?
[16:41:44] it was not, this was my oversight. Fortunately nothing happened; the host that is supposed to be depooled is still pooled
[16:42:43] ack, but please make sure to test all the untested commands in some test/sandbox environment in isolation, so that you're sure they are correct
[16:43:26] If haproxy on dbproxy1018 gets into a weird state, queries to the public wikireplicas would fail, so toolforge bots and quarry (https://quarry.wmcloud.org/) would not work
[16:43:27] sorry gotta go afk now for a bit
[16:44:02] sg, nothing urgent here
[16:45:53] razzi: I'd suggest to go with the puppet version of the procedure, that is safer (and IIUC already battle tested) and then experiment with haproxy on a local setup or similar
[16:45:59] anybody have any experience making grafana dashboards for ephemeral jobs?
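For the runtime-API command razzi was testing, here is a minimal Python sketch of what the socat pipeline does: open the UNIX socket, send one newline-terminated command, and read until haproxy closes the connection. Two things differ from the command pasted at 16:29. The runtime API addresses a server as `<backend>/<server>` (the backend name `mariadb` in the usage is a hypothetical placeholder; use the one from haproxy.cfg). And a `Permission denied` reply, which haproxy sends when the socket's `level` is too low for the command (elukey's explanation at 17:01), is raised as an exception instead of being printed. The helper names are made up, and per the discussion above this belongs on a sandbox haproxy before dbproxy1018.

```python
import socket

ADMIN_STATES = ("ready", "drain", "maint")


def set_server_state_cmd(backend: str, server: str, state: str) -> str:
    """Build a runtime API command; servers are addressed as <backend>/<server>."""
    if state not in ADMIN_STATES:
        raise ValueError(f"state must be one of {ADMIN_STATES}, got {state!r}")
    return f"set server {backend}/{server} state {state}"


def haproxy_command(socket_path: str, command: str, timeout: float = 5.0) -> str:
    """Send one command, like `echo CMD | socat unix-connect:PATH stdio`.

    In non-interactive mode (the default) haproxy answers a single command
    and closes the connection, so reading until EOF collects the whole reply.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks).decode()


def run_admin_command(socket_path: str, command: str, timeout: float = 5.0) -> str:
    """Like haproxy_command, but raise instead of returning a 'Permission denied' reply."""
    reply = haproxy_command(socket_path, command, timeout)
    if reply.strip().startswith("Permission denied"):
        raise PermissionError(f"{command!r} rejected; check the stats socket's `level` setting")
    return reply
```

Usage would look like `run_admin_command("/run/haproxy/haproxy.sock", set_server_state_cmd("mariadb", "clouddb1018", "drain"))`, run as root as the cookbook already is.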
[16:46:23] (I mean for depooling clouddb1018)
[16:46:25] I finally have metrics in prometheus via push gateway, but making dashboards is going to be a little weird because the value in pushgateway only changes after the next job run
[16:46:36] sounds good elukey, I'll do the manual way for this round of updates
[16:46:53] razzi: ack let us know if you need help
[16:47:53] razzi: lemme try one thing
[16:49:02] <_joe_> sorry I just read the backlog
[16:49:33] <_joe_> if we want to make the pooled/depooled state of haproxy backends programmable, I'd try to think if there is a way to do it using conftool
[16:52:34] +1 --^
[16:52:50] razzi: the command is wrong, I see on various guides that things like `echo "show stat" | socat unix-connect:/run/haproxy/haproxy.sock stdio` work
[16:54:01] elukey: `echo 'show stat' | sudo socat /run/haproxy/haproxy.sock stdio` works as well; it's not the socat part
[16:54:49] good signs that testing is needed then :)
[16:54:53] let's go with the puppet way
[16:56:04] <_joe_> in general, we can't keep the pooled/depooled state of a backend solely in the proxy; that easily introduces inconsistencies and more importantly they might not be persisted across restarts
[16:57:36] it is a very good point that needs some follow up (the state is already in puppet and can be changed, so conftool seems a very nice follow up)
[16:57:39] razzi: --^
[16:58:07] indeed, it sounds like conftool is the way to go
[16:58:08] Is there some environment that already has a test haproxy, or should I go about setting it up in cloud services?
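_joe_'s point at 16:56 is that the pooled/depooled state should live in a source of truth outside haproxy and be pushed to it. Whatever the conftool integration ends up looking like, it needs a reconcile step. A hypothetical sketch (this is not conftool's API; the function and server names are made up) that diffs desired against observed admin states and emits `set server` commands:

```python
from typing import Dict, List

# Hypothetical sketch, not conftool's API. Both mappings are
# server name -> admin state ("ready", "drain" or "maint").


def reconcile(backend: str, desired: Dict[str, str], observed: Dict[str, str]) -> List[str]:
    """Return the `set server` commands that make haproxy match the source of truth.

    Servers haproxy does not know about are skipped: adding servers is a
    config (puppet) change, not a runtime one.
    """
    commands = []
    for server in sorted(desired):
        current = observed.get(server)
        if current is not None and current != desired[server]:
            commands.append(f"set server {backend}/{server} state {desired[server]}")
    return commands
```

Because the desired state lives outside the proxy, running the same reconcile after an haproxy restart reapplies it, which addresses the persistence concern; running it when nothing has drifted produces no commands.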
[16:59:12] <_joe_> razzi: ofc going with conftool means we have to see how to integrate it with haproxy, that might take some elbow grease I fear :)
[17:01:23] razzi: for the current issue - haproxy's unix socket doesn't accept unauthenticated admin commands unless explicitly told to in the config
[17:02:21] got it elukey
[17:03:20] Thanks _joe_ volans elukey for the input, back to the drawing board I go, this time with more information
[17:03:20] But first, I'll finish off this round of updates the manual way
[17:13:10] razzi, elukey: thanks for loving the wiki-replicas <3
[18:30:03] answering my q earlier about ephemeral job dashboards in grafana: TIL about State timeline visualizations!
[18:30:18] they only change the viz when the value itself changes!
[18:58:34] I have one last task on my onboarding: "Add to Exim mail aliases (root via private.git:modules/privateexim/files/wikimedia.org)" - any idea how I can do this?
[19:05:40] inflatador: have you made changes in the private puppet repo before?
[19:07:07] rzl Indeed!
[19:07:17] oh good! this will be easy then :)
[19:07:47] "modules/privateexim/files/wikimedia.org" is the path of a file in private puppet -- in that file, the line starting "root:" is the config for the alias you're looking for
[19:08:04] as long as you're comfy editing private puppet, all you have to do is add yourself there
[19:08:19] Ah, thanks for the direction! Will give it a shot
[19:30:35] inflatador: lgtm :)
[19:31:53] rzl awesome, I can now close my onboarding ticket (let's not think too much about why it's taken 3 months ;P )
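On the ephemeral-job dashboard question (asked at 16:45:59, answered at 18:30): a common complement to a State timeline panel is to have the job also push the time of its last successful run, so staleness stays visible even when the other values stop changing. A minimal stdlib sketch, assuming the standard Pushgateway HTTP API (`PUT /metrics/job/<job>`); the gateway URL, job name and metric names are hypothetical:

```python
import time
import urllib.parse
import urllib.request
from typing import Dict, Optional

# Hypothetical metric name; pick one that fits local naming conventions.
LAST_SUCCESS = "last_success_timestamp_seconds"


def job_metrics_body(metrics: Dict[str, float], now: Optional[float] = None) -> str:
    """Render finite gauges plus a last-success timestamp in the Prometheus text format."""
    gauges = dict(metrics)
    gauges[LAST_SUCCESS] = time.time() if now is None else now
    lines = []
    for name, value in sorted(gauges.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {float(value)!r}")
    # The text format requires every line, including the last, to end in "\n".
    return "\n".join(lines) + "\n"


def push_url(gateway: str, job: str) -> str:
    """Grouping key is just the job; the gateway attaches job="<job>" to each series."""
    return f"{gateway.rstrip('/')}/metrics/job/{urllib.parse.quote(job, safe='')}"


def push_job_metrics(gateway: str, job: str, metrics: Dict[str, float]) -> None:
    """PUT replaces every metric previously pushed under this job's grouping key."""
    request = urllib.request.Request(
        push_url(gateway, job),
        data=job_metrics_body(metrics).encode(),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
```

Call `push_job_metrics` only after a run succeeds, e.g. `push_job_metrics("http://pushgateway.example:9091", "views-update", {"rows_processed": 42})`; a panel on `time() - last_success_timestamp_seconds{job="views-update"}` then grows steadily whenever runs stop succeeding. A POST instead of PUT would replace only same-named metrics, and the prometheus_client library's `push_to_gateway` does the same job with less code.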