[08:27:55] <_joe_> Emperor, Amir1: do you need anything to start working on https://phabricator.wikimedia.org/T408062 on my side?
[08:41:13] _joe_: I'm reading about Asana, presumably this comes under "Engineers might need access ... They have been asked to provide or contribute to a status update in Asana (e.g., an Engineer is providing a hypothesis status update for a Product Manager who is on vacation)." ?
[08:43:10] ( from https://office.wikimedia.org/wiki/Product_%26_Technology/Asana_Guide )
[08:43:44] <_joe_> Emperor: yeah as I said, your manager should be able to get you access
[08:43:47] in terms of the work itself rather than Asana, no I think I'm good ATM
[08:44:23] <_joe_> yeah Asana is just so that you can write the reports directly instead of having to use me as a proxy, but either way is ok really.
[08:48:03] OK, I've sent techsupport a mail and cc'd kwaku.ofori
[10:01:34] marostegui: if you are happy with it I'm ready to run the decommission cookbook on es2026 unless we want to use the host for something else
[10:18:23] federico3: let me check
[10:32:57] Amir1: discussion elsewhere reminds me, are you OK for me to roll-restart the ms frontends as yet?
[11:16:52] marostegui I mentioned to tappof that losing the text of MySQL alerts about what went wrong with mysql checks was probably a big issue for you, the dbas, but I told him to talk to you directly to discuss it, as mine was a subjective appreciation and in the end not the person on the receiving end of those
[11:19:27] marostegui: to further illustrate what jynus is talking about, you can take a look at these alert instances generated by the NRPE wrapper https://w.wiki/Fpky
[11:20:07] here you'll find a log link that will take you to the NRPE plugin output
[11:21:49] (Clearly specific to this alert)
[11:27:50] So my suggestion was to at least see if it could be exposed by bots
[11:36:49] federico3: you can go ahead with es2026
[11:37:13] Emperor: some were done but not all, let me check
[11:37:14] ok
[11:37:23] jynus tappof thanks I am checking now
[11:38:28] tappof: where is the link?
[11:38:45] I keep failing at finding alert manager UI any usable :(
[11:39:54] marostegui: At the bottom of every alert, you should see two labels: 'logs' and 'runbook'. The 'logs' one is the one you're interested in
[11:40:58] https://usercontent.irccloud-cdn.com/file/UNCouUvE/grafik.png
[11:42:32] Emperor: codfw is done
[11:42:40] feel free to reboot those frontends
[11:43:13] Amir1: ack, thanks.
[11:43:45] tappof: ah yes thanks
[11:45:12] unfortunately no examples on the last 30 days
[11:45:32] Anyway, where is this discussion happening? I am jumping into it without much context other than what jaime provided earlier
[11:47:36] marostegui: I can see entries for both alerts in the last 15 minutes: https://w.wiki/FqLX
[11:47:49] marostegui: I'll share you the task
[11:48:35] tappof:
[11:48:35] No results found
[11:48:43] that's what I get and with your above link too
[11:50:32] That's interesting... meanwhile, here's the link to the comment describing the current POC/MVP implementation: https://phabricator.wikimedia.org/T350360#11139874
[11:51:32] thank you
[11:52:25] FIRING: [2x] SystemdUnitFailed: swift_dispersion_stats.service on ms-fe2009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:54:49] [consequence of the reboot, I've given them a kick]
[11:57:25] RESOLVED: [2x] SystemdUnitFailed: swift_dispersion_stats.service on ms-fe2009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:57:38] et voila
[12:02:39] sorry marostegui, the previous short link was wrong: could you try this one instead https://w.wiki/FqLs ?
[15:05:54] marostegui: do you have a preference on what to do on https://phabricator.wikimedia.org/T391581 ?
[15:16:12] I'll reply in a bit