[09:29:35] errand+lunch
[13:15:26] o/
[13:21:33] dcausse pfischer I'm hijacking our pairing today to go over some OpenSearch migration stuff w/the DPE SREs. Y'all are welcome to join, but just a heads-up since it might not be that relevant
[13:22:10] inflatador: np
[13:27:34] had to restart opensearch_1@relforge-eqiad-small-alpha.service on relforge1003, somehow it got stuck not willing to accept relforge1008
[13:31:42] dcausse I one-offed that one to test what would happen when a non-existent host was added to master config, guess I forgot to put it back ;(
[13:32:02] no worries!
[13:59:26] \o
[14:00:35] .o/
[14:01:58] o/
[14:08:20] rewrote the mjolnir bits in airflow...mixed feelings :P On the one hand it's much more like everything else, but there was a small nicety in the old way that it was very clear how data flowed and how we could change things like swap dbn for a different labeling algo, or norm_query for a different clustering algo
[14:08:28] but on the other hand, we've never done that...perhaps useless feature
[16:02:32] workout, back in ~40
[16:25:13] taavi are you making changes to switches in CODFW? I'm having issues with a reimage/vlan move and p-apaul noticed that `user taavi` was in the output. If you wanna join us in #wikimedia-dcops we're having the discussion there
[16:25:29] oops, wrong room ;(
[16:25:51] just pinged taavi in #sre
[16:38:18] OK, the reimage stuff is sorted...working out for real now ;)
[17:25:33] back
[17:28:32] yet another reimage failed due to bad regexes :(
[17:46:48] ryankemper I started on the ferm changes we talked about yesterday but haven't had time to do much, LMK if you wanna take over https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1137045
[17:54:00] we should probably force shard reallocation after that
[18:57:33] back
[19:07:14] next batch starting (elastic[2063,2077,2079])
[19:08:18] * inflatador may have won the prize for most linting errors in 4 lines of code
[20:19:01] ryankemper I polished up https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1137045 but it's still pretty terrible. I might try and spin up that new spicerack REPL they mentioned at the last SRE mtg and give it a whirl
[20:21:29] I’ll take a look in 10’
[20:23:14] if ferm is not restarted something else might be wrong as ferm::rule notifies service[ferm] as per modules/ferm/manifests/rule.pp
[20:26:34] agreed, something is wrong with the ferm notifier. I've seen it happen a few times in the past, but I dunno why it's happening every single time
[21:41:20] volans: I think the issue is the ferm config itself isn't changing; rather the hostname that it's resolving is
[21:42:22] e.g. ferm has the equivalent of `@resolve(cirrussearch2055.codfw.wmnet elastic2055.codfw.wmnet)` and until our rolling-operation cookbook runs the rename for elastic2055->cirrussearch2055 it's not going to resolve cirrussearch2055
[23:03:04] ryankemper patch up for conftool https://gerrit.wikimedia.org/r/c/operations/puppet/+/1137086 . I also noticed that dc ops checked out cirrussearch2091 which had a hw failure (ref T391639), so I'm trying to reimage it now
[23:03:05] T391639: Comm Error: Backplane 0 on cirrussearch2091 (Row/Rack A7) - https://phabricator.wikimedia.org/T391639
[23:18:58] The PXE was set to boot off the wrong NIC. I set it to the right NIC and it found the TFTP server, but threw an error. Trying again with HTTP
[23:44:14] yup, it can't seem to download from the install server...just keeps saying the PXE file is zero bytes
[23:44:57] I can download the file from the install server, to the install server...not that that proves much
[23:45:17] maybe a NIC firmware update is in order, but I'm pretty sure I already did that. Anyway, will hit it tomorrow. G'nite!
[23:45:32] Server IP address is 208.80.153.105
[23:45:32] NBP filename is http://208.80.154.10/efiboot/snponly.efi
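
Editor's note on the ferm exchange at 21:41-21:42: the point seems to be that a trigger watching the rule text never fires when only the DNS answer behind that text changes. The Python below is a rough sketch of that idea, not how Puppet or ferm actually work. The function names, the dict standing in for DNS, and the 192.0.2.x addresses are all made up for illustration; the log does not say what either hostname resolves to.

```python
import hashlib


def text_fingerprint(rule_text):
    """Hash of the rule as written on disk (what a file-change trigger would see)."""
    return hashlib.sha256(rule_text.encode("utf-8")).hexdigest()


def resolved_fingerprint(hostnames, dns):
    """Hash of what each hostname resolves to at this moment.

    `dns` is a plain dict standing in for a resolver (hypothetical);
    a name missing from it is treated as not resolving at all.
    """
    pairs = sorted((h, tuple(sorted(dns.get(h, [])))) for h in hostnames)
    return hashlib.sha256(repr(pairs).encode("utf-8")).hexdigest()


# Rule text quoted from the log; it is identical before and after the rename.
RULE_BEFORE = "@resolve(cirrussearch2055.codfw.wmnet elastic2055.codfw.wmnet)"
RULE_AFTER = RULE_BEFORE

HOSTS = ["cirrussearch2055.codfw.wmnet", "elastic2055.codfw.wmnet"]

# Made-up DNS snapshots using documentation-range addresses (assumption).
DNS_BEFORE = {"elastic2055.codfw.wmnet": ["192.0.2.55"]}
DNS_AFTER = {"cirrussearch2055.codfw.wmnet": ["192.0.2.155"]}

text_changed = text_fingerprint(RULE_BEFORE) != text_fingerprint(RULE_AFTER)
resolution_changed = (
    resolved_fingerprint(HOSTS, DNS_BEFORE) != resolved_fingerprint(HOSTS, DNS_AFTER)
)

print("rule text changed:", text_changed)
print("resolved addresses changed:", resolution_changed)
```

Under these assumptions the text comparison reports no change while the resolution comparison does, which would match the theory that nothing notifies ferm after the rename.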