[08:48:06] I've been a bit of an idiot and set my Hosts: tag in a commit to a way too large set, it's going to take ages to run, is there something on how I can kill a job
[08:48:08] ?
[08:49:28] Ah, I wasn't logged in
[08:50:29] claime: hit the small x button on the UI :D
[08:51:06] volans: Yeah I wasn't logged into jenkins and was frantically looking for it lol
[08:51:21] :)
[09:31:26] claime Amir1 FYI I'm merging the decom of mgmt interfaces from icinga at https://gerrit.wikimedia.org/r/q/topic:bug%252FT310266 I'm expecting a worst case of icinga failing its configuration
[09:31:50] wait I'm on call?
[09:31:53] ack
[09:31:55] thanks
[09:32:02] Amir1: Yes, we are :D
[09:32:12] lol
[09:32:18] I should check the rooster more often
[09:32:39] You took over for v.gutierrez
[09:33:07] godog: ack, thanks for the heads up
[09:33:08] no, roster, rooster is something else
[09:33:22] Amir1: Yes, rooster is a male hen
[09:33:25] A cock if you will
[09:33:29] :P
[09:34:36] 🤭
[09:36:40] Back in a minute, coffee mainline running low
[09:41:32] back
[10:32:02] hashar: looks like contint1001 crashed again, I'll powercycle it and !log cc mutante
[11:37:16] godog: was it a hardware crash?
[11:37:31] well if it required a powercycle I guess it was
[11:45:22] DIMM_A1 seems bogus
[11:45:42] but the host is also way out of warranty
[11:46:01] and also way over our usual refresh period
[11:47:53] replacement hardware is already around: https://phabricator.wikimedia.org/T313832
[12:40:31] hashar: correct yeah, what volans said
[13:04:03] * claime afk lunch
[14:56:44] PSA: haven't had time to investigate yet but I think that our fix to https://phabricator.wikimedia.org/T313603 is no longer working?? filed https://phabricator.wikimedia.org/T324466
[14:57:03] (for clarity, that's off-hours pages hitting batphone ~immediately instead of only after 5 minutes)
[15:06:35] cdanis: :(
[15:17:55] sobanski: I found out: update is scheduled of backup1001 & backup2001, which is not surprising
[15:18:38] Yeah, I misread the spreadsheet as the hosts had a line with an unrelated host between them
[15:19:07] I was thinking it was something else on codfw I didn't know about
[18:03:09] volans: ACK, thank you. finally I could answer questions like that and learned about superset dashboard
[18:05:01] yw :)
[19:07:28] godog: btw, if there's a desire to reduce these kinds of rules, the resourceloader ones we probably don't need anymore now that we use Thanos. https://gerrit.wikimedia.org/g/operations/puppet/+/6c1ebc884b8b90645934ad3c5c7bbe6d90679064/modules/profile/files/prometheus/rules_ops.yml#236
[19:07:54] Before actual removal, I'll need to double check if we have any alert rules or secondary dashboards using it, but the main ResourceLoader dashboard I've converted to not rely on these anymore.