[06:32:07] hello folks, there are some BFD alerts for cr{3,4}-uslfo, all related to dns4004
[06:32:27] ah still in setup, okok so nothing to worry about
[06:48:51] jbond: o/ ok to re-enable puppet on bast1003?
[07:36:12] hello, can someone change the -operations topic to set clinic duty to me? thanks
[07:39:53] XioNoX: done
[07:41:15] thx
[08:55:36] elukey: yes done sorry about that
[09:37:42] jbond: <3
[12:11:58] it is puppet question time! I need to select hostnames from a hash like this https://phabricator.wikimedia.org/P35393, one per rack+row. I tried within puppet, though I can't mark a given rack/row as "seen" since I can't modify existing hashes; my next avenue would be a ruby function, thoughts?
[12:22:07] <_joe_> godog: you need one per rack/row?
[12:22:36] <_joe_> ok so my solution would be an ugly map/reduce chain in puppet I think
[12:23:44] <_joe_> something like
[12:24:33] _joe_: correct yeah, one per rack/row
[12:25:34] <_joe_> $inverted = $hash.map |$k,$v| {{$v['rack'] => $k}}.reduce({}) |$memo, $val| {$memo.merge($val)}
[12:25:56] <_joe_> this gives you a hash of rack (which is really rack/row) => list of hosts
[12:26:23] <_joe_> then if you want you can just get the first element of each key
[12:27:24] <_joe_> $inverted.map |_,$v| {$v[0]}
[12:27:34] <_joe_> ofc this is not a random list
[12:28:04] sigh, ok I think that'd work and I'll try it
[12:28:06] thank you _joe_
[12:28:21] <_joe_> godog: ehhh I just realized
[12:28:31] <_joe_> .merge will overwrite the hash keys IIRC
[12:28:31] godog: we can also modify the exported data to the format needed..
[12:28:42] <_joe_> so you don't even need to extract the first element
[12:29:38] volans: that's true we could, I'd like to first try and bend puppet to the data rather than the other way around
[12:36:10] <_joe_> godog: added a paste
[12:37:56] cheers _joe_ ! that works indeed
[12:41:55] hi, git/beta cluster question if anyone has a moment: can I cherry-pick https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Phonos/+/841488 to the beta cluster? If so, would I pick to `/srv/mediawiki-staging/php-master/extensions/Phonos`?
[12:42:31] (tl;dr, I don't want to +2 it as it needs testing on the beta cluster to know if it fixes the issue)
[12:47:25] _joe_: godog: fyi you can make that hash.map.reduce look a bit nicer with:
[12:47:28] Hash($hash.map |$k,$v| {[$v['rack'], $k]}).values
[12:47:32] * jbond added to paste
[12:49:16] +++++++++++++++++++++
[12:49:30] * jbond cat ^^^
[12:50:03] cat: '^^^': No such file or directory
[12:50:09] the software or the animal? :-P
[12:50:09] lol thank you jbond
[12:51:08] hytg
[12:51:27] the animal, apparently she wants to chat today
[12:51:36] cat the animal (or feline) would be 'f'
[12:53:52] hnowlan, urandom: restbase1028 is unresponsive; the host is up, but logins fail over both SSH and the serial console. known issue? (nothing in system event log)
[12:56:19] TheresNoTime: tl;dr is that it's possible but hard and disruptive
[12:57:54] taavi: I crossed my fingers and +2'd :)
[13:17:19] moritzm: no, not known - I will restart it
[13:17:39] Will have a look at it first but that doesn't sound good
[13:19:34] when I tried the serial console it would still initiate the login, but then stalled, so sounds like extremely high load or so
[13:22:50] it's dropped out of cassandra connectivity since it became unresponsive in icinga and hasn't come back, not sure there's much to be done other than reboot and try to figure out what happened
[13:23:25] have you checked HW logs/errors?
[13:23:46] https://wikitech.wikimedia.org/wiki/SRE/Dc-operations/Platform-specific_documentation/Dell_Documentation#Show_logs
[13:25:12] yeah, there was nothing in SEL
[13:39:54] not sure there's much that can be done for it at this point
[13:45:18] <_joe_> reboot it?
[13:51:34] Sorry to trouble you. Since yesterday I've suddenly started having trouble building from our base golang docker images. apt signatures fail like this...
[13:51:38] https://www.irccloud.com/pastebin/Hrtplgop/
[13:52:23] btullis: sounds like your local copy of the base image (docker-registry.wikimedia.org/bullseye or similar) is outdated
[13:52:24] Could anything have changed here, or am I doing something wrong?
[13:52:29] btullis: there is also
[13:52:31] 09:52:13 <+icinga-wm> PROBLEM - puppet last run on restbase1028 is CRITICAL: CRITICAL: Puppet last ran 2 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[13:52:46] can this be lreaed?
[13:52:51] related
[13:52:54] nvm, I don't think this is
[13:53:38] OK, thanks. purging all base bullseye images and trying again.
[13:53:55] of course replace bullseye with the actual distro that image is using
[13:55:30] Thanks. Yep, it is using bullseye.
[13:55:33] btullis: haven't seen that before, but my hunch is that maybe too restrictive seccomp filters restrict utime() or similar?
[13:55:45] that would at least explain the fallout
[13:56:14] btullis: hmm.. I didn't look properly and thought you were building the image and not using it. so probably not what I said :/
[13:56:26] although checking that the image is up to date is still a good first step
[13:59:48] +1 Yeah, my paste wasn't very clear but it's a reduction of the problem. I'm building a new image *from* the golang1.15 image but it's consistently failing at `RUN {{ "git curl" | apt_install }}` since yesterday.
[14:00:22] Purging all my local images for good measure.
[14:02:55] <_joe_> btullis: let me try to reproduce
[14:04:26] <_joe_> btullis: cannot reproduce
[14:04:34] _joe_: Thanks. If this turns out to be a 'reboot workstation' issue I will kick myself (and the workstation).
[14:04:35] <_joe_> with docker-registry.wikimedia.org/golang1.15:latest
[14:05:54] <_joe_> docker run --rm -ti --user root --entrypoint /bin/bash docker-registry.wikimedia.org/golang1.15:latest
[14:06:05] <_joe_> remove the local image first
[14:10:45] Yep, after purging all local images it's working again. I had tried removing the golang and bullseye named images without effect, but now it's fine. Thanks again and sorry for the noise.
[14:44:51] there is a question on slack for serviceops
[16:00:17] as a heads-up, we're troubleshooting an eqiad row D networking issue
[17:04:15] you're meant to use systemd::unit with override => true for unit override files, but systemd::unit forces the use of $title as the service unit name, which means you're effectively limited to one override file per service
[17:04:54] grumble grumble
[17:05:17] there should really be a separate systemd::override resource type
[17:50:26] ori: you can use the `override_filename` parameter to have multiple overrides
[17:51:23] https://github.com/wikimedia/puppet/blob/production/modules/systemd/manifests/unit.pp#L69-L73
[18:19:07] do we have any shared configs or templates for rsyslog & logrotate? it seems most modules just roll their own bespoke config
[18:28:20] jbond: I don't think that would work, because the override dir must include the unit name, which (currently) must be the resource title.
[18:31:50] ori: it's perhaps a badly named parameter but it is expected to take just a filename, not a full path. it then builds the path using "${override_dir}/${override_filename}" where $override-is the correct patch e.g. $override_dir = "${systemd::override_dir}/${unit_name}.d"
[18:32:22] *where $override_dir is the correct path
[18:35:21] maybe I'm missing something, but for 'bar.conf' to override something for unit 'foo', it must be placed in /etc/systemd/system/foo.service.d/, and for it to be placed there, the resource title must be either 'foo' or 'foo.service'
[18:39:48] ori: ahh i think i see what you mean, it won't work because you will get duplicate resource definitions.
[18:40:00] duplicate e.g. https://phabricator.wikimedia.org/P35415
[18:41:52] i'll take a proper look tomorrow but i think something like this should fix the issue https://gerrit.wikimedia.org/r/c/operations/puppet/+/841577
[18:59:24] * jbond will also add a systemd::override helper (https://gerrit.wikimedia.org/r/c/operations/puppet/+/841577/2/modules/systemd/manifests/override.pp)
[23:29:44] Any known issues with PCC? This change shouldn't be a noop: https://puppet-compiler.wmflabs.org/pcc-worker1002/37511/
[23:59:36] cwhite: not a bug but a limitation -- PCC doesn't diff file contents copied with `source =>`, it just diffs the paths
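A minimal Python model of the one-host-per-rack selection discussed between 12:11 and 12:50. It assumes the hash in P35393 maps hostname => {'rack' => ...}, as both _joe_'s and jbond's snippets imply; the paste itself is not reproduced here and the hostnames below are hypothetical. It follows the shape of `Hash($hash.map |$k,$v| {[$v['rack'], $k]}).values`: turn each host into a [rack, host] pair, build a hash from the pairs so duplicate racks collapse, and keep the values. Which host survives for a duplicate rack is an assumption: this model keeps the last one, matching _joe_'s recollection that `.merge` overwrites keys, so it is worth checking against real Puppet output.

```python
from typing import Dict, List


def one_host_per_rack(hosts: Dict[str, Dict[str, str]]) -> List[str]:
    """Return one hostname per distinct 'rack' value.

    Models Hash($hash.map |$k,$v| {[$v['rack'], $k]}).values.
    """
    # Step 1, like $hash.map |$k,$v| {[$v['rack'], $k]}: one pair per host.
    pairs = [(attrs["rack"], name) for name, attrs in hosts.items()]
    # Step 2, like Hash(...): a later pair for the same rack replaces an
    # earlier one, so only the last host seen per rack is kept.
    by_rack = dict(pairs)
    # Step 3, like .values: drop the rack keys.
    return list(by_rack.values())


if __name__ == "__main__":
    # Hypothetical input standing in for the data in P35393.
    hosts = {
        "host1001": {"rack": "A1"},
        "host1002": {"rack": "A1"},
        "host1003": {"rack": "B2"},
    }
    print(one_host_per_rack(hosts))  # ['host1002', 'host1003']
```

As _joe_ notes, the pick is deterministic rather than random: it depends only on input order.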
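A rough Python model of the systemd override exchange (17:04 to 18:59). Following the log, it assumes the override file lands at `${override_dir}/${override_filename}` with `$override_dir = "${systemd::override_dir}/${unit_name}.d"`, that `systemd::override_dir` is `/etc/systemd/system`, and that a bare title such as 'foo' means 'foo.service'. The `Catalog` class is a hypothetical stand-in for Puppet's rule that a (type, title) pair may only be declared once. It shows why `override_filename` alone does not help: both overrides for unit 'foo' still need the title 'foo'. A type whose title is decoupled from the unit name avoids the clash; the 'systemd::override' title used here is illustrative and does not describe the parameters of jbond's proposed helper.

```python
from typing import Dict, Tuple

# Assumed value of systemd::override_dir (the directory ori quotes at 18:35).
OVERRIDE_DIR = "/etc/systemd/system"


def unit_name_for(title: str) -> str:
    """Assumption: a bare title like 'foo' refers to 'foo.service'."""
    return title if "." in title else f"{title}.service"


def override_path(unit: str, override_filename: str) -> str:
    """Models "${override_dir}/${override_filename}" with
    $override_dir = "${systemd::override_dir}/${unit_name}.d"."""
    override_dir = f"{OVERRIDE_DIR}/{unit_name_for(unit)}.d"
    return f"{override_dir}/{override_filename}"


class Catalog:
    """Hypothetical stand-in for Puppet's unique (type, title) rule."""

    def __init__(self) -> None:
        self.resources: Dict[Tuple[str, str], str] = {}

    def declare(self, rtype: str, title: str, path: str) -> None:
        key = (rtype, title)
        if key in self.resources:
            raise ValueError(f"duplicate declaration: {rtype}[{title}]")
        self.resources[key] = path


if __name__ == "__main__":
    catalog = Catalog()
    # With systemd::unit the title doubles as the unit name, so a second
    # override for 'foo' reuses the title 'foo' and is rejected.
    catalog.declare("systemd::unit", "foo", override_path("foo", "a.conf"))
    try:
        catalog.declare("systemd::unit", "foo", override_path("foo", "b.conf"))
    except ValueError as err:
        print(err)  # duplicate declaration: systemd::unit[foo]
    # A type with an independent (hypothetical) title can still place a
    # second file in /etc/systemd/system/foo.service.d/.
    catalog.declare("systemd::override", "foo-b", override_path("foo", "b.conf"))
    print(sorted(catalog.resources.values()))
```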
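A simplified Python model of the PCC limitation described at 23:59. PCC compares compiled catalogs, and a file resource that uses `source =>` only records the `puppet:///` URL in the catalog, not the file's bytes, so editing the file leaves that catalog entry unchanged. A resource that uses `content =>` (for example via Puppet's `file()` function) carries the text itself, so the same edit shows up in a diff. The catalog shape, the `compile_catalog` helper and the module path are illustrative assumptions, not PCC's real data model.

```python
from typing import Dict, Optional, Tuple

# Simplification: a catalog maps a resource title to its parameters.
Catalog = Dict[str, Dict[str, str]]
Change = Tuple[Optional[Dict[str, str]], Optional[Dict[str, str]]]


def diff_catalogs(old: Catalog, new: Catalog) -> Dict[str, Change]:
    """Return resources whose parameters differ between two catalogs."""
    changed: Dict[str, Change] = {}
    for title in sorted(old.keys() | new.keys()):
        if old.get(title) != new.get(title):
            changed[title] = (old.get(title), new.get(title))
    return changed


def compile_catalog(file_text: str, use_source: bool) -> Catalog:
    """Hypothetical compile step producing a single File resource."""
    if use_source:
        # Only the URL goes into the catalog; the agent fetches the bytes later.
        params = {"source": "puppet:///modules/example/app.conf"}
    else:
        # The file text is inlined into the catalog.
        params = {"content": file_text}
    return {"File[/etc/app.conf]": params}


if __name__ == "__main__":
    before, after = "level=info\n", "level=debug\n"
    # source => : the edit is invisible to a catalog diff.
    print(diff_catalogs(compile_catalog(before, True), compile_catalog(after, True)))  # {}
    # content => : the edit appears as a changed resource.
    print(diff_catalogs(compile_catalog(before, False), compile_catalog(after, False)))
```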