[07:49:36] fyi, we're going to depool esams in 10-ish minutes, that will be the final depool before the migration
[07:51:23] please don't forget to not use bast3006/3007 anymore, next working one will be 3001 or 3008 I guess
[08:06:11] bast3007 was still WIP and not fully set up, I decommed it yesterday and will reinstall it on one of the new knams Ganeti clusters
[08:08:26] noted
[08:46:05] FYI folks esams depool is in 15 mins time... we got our UTC mixed up here in netops :)
[10:54:47] I am getting prometheus puppet errors like this: "parameter 'network_devices' entry 'asw-a-codfw' unrecognized key 'manufacturer'" [Profile::Netbox::Data]. I am worried this could affect further prometheus deployments. Not sure if I should ask netops or foundations?
[10:55:26] it started failing around 19h UTC yesterday
[10:56:03] https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=prometheus4002&service=puppet+last+run
[10:56:35] maybe something on netbox got updated and some puppet code needs a change?
[10:56:56] jynus: I'll take a look
[10:57:51] only surfacing it because it may be affecting some fixes dcaro is doing on monitoring (unrelated)
[10:58:22] I saw that on a PCC run today, for pormetehus1005
[10:59:12] dcaro: between pormetehus and harpoxi, not sure if on mobile, dislexya or typing too fast :-D
[10:59:26] :-P
[10:59:29] that's just me xd
[11:00:02] https://puppet-compiler.wmflabs.org/output/948092/42841/prometheus1005.eqiad.wmnet/prod.prometheus1005.eqiad.wmnet.err <- the failed pcc
[11:00:31] to me it sounds like some new key was added somewhere but the code handling it wasn't
[11:00:59] just checking, but I think it should be fixed; I had forgotten to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/931930
[11:01:06] but I didn't see anything obvious on hiera
[11:01:23] ah, nice catch, jbond!
[11:02:27] thanks!
[11:02:39] yes, looks to be working now
[11:02:42] it wasn't a big issue but I think it was blocking further deployments on that cluster
[11:02:49] thank you!
[11:03:47] and because it was a missing parameter, my git grep didn't find that line
[12:31:42] anyone able to review a small change to a pybal related file https://gerrit.wikimedia.org/r/c/operations/puppet/+/948127/1/modules/pybal/files/pybal-eval-check.py ( _joe_ perhaps?)
[13:19:39] jbond: +1ed, I was a bit puzzled while checking the umask, 0o077 vs 0o77, but they seem to be the same
[13:24:11] elukey: cheers
[14:56:19] jbond: any way to easily filter log levels on puppetboard? can't seem to figure it out from the interface
[14:57:02] or how to do an exact match for "err" I guess :)
[14:58:00] sukhe: I didn't see a way, you can sort by status which puts all the err ones at the top, but that's not always that useful either as you lose the time order
[15:00:07] jbond: yeah :(
[16:54:11] I could use an extra pair of eyes, if someone wants to help, on why we are seeing monitoring for ns2-v4 when it seems to be gone from everywhere
[16:58:47] sukhe: We got page 3942 that may be related to it. Is there any way I can help?
[16:59:08] denisse: thanks, cdanis ACKed it and downtimed as well
[16:59:28] the help I am guessing I need is to figure out why it is still showing up in icinga
[16:59:33] when even the puppet runs confirm it was removed
[16:59:46] downtiming is fine but this should not be there
[17:01:04] Let me check...
[17:01:07] thank you
[17:06:54] denisse: can the source of this be some other repo?
[18:13:57] the fix for this was to reload icinga, which I assumed had already happened, but a manual reload fixed it
[21:28:55] !log rebooting wikitech-static-ord via rackspace UI
[21:28:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
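
Note on the 13:19:39 umask remark: a minimal Python sketch of why `0o077` and `0o77` behave the same. Leading zeros after the `0o` prefix do not change the value of an octal literal, so both spell the integer 63. The helper names `effective_mode` and `roundtrip_umask` are hypothetical and only illustrate the point; they are not taken from `pybal-eval-check.py`.

```python
import os

# Both literals are the same integer: extra leading zeros after "0o" are ignored.
SAME_VALUE = (0o077 == 0o77)


def effective_mode(requested: int, mask: int) -> int:
    """Return the permission bits left after a umask clears its bits."""
    return requested & ~mask


def roundtrip_umask(mask: int) -> int:
    """Set the process umask to `mask`, restore the previous one,
    and return the value that was actually in effect in between."""
    previous = os.umask(mask)      # os.umask returns the old mask
    in_effect = os.umask(previous)  # restoring returns the mask we just set
    return in_effect


if __name__ == "__main__":
    print(SAME_VALUE)
    print(oct(effective_mode(0o666, 0o077)))
    print(roundtrip_umask(0o077) == roundtrip_umask(0o77))
```

With either spelling, a file requested with mode `0o666` ends up readable and writable by its owner only, since the mask clears every group and other bit.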