[00:28:49] (SystemdUnitFailed) firing: (2) update-ubuntu-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:28:49] (SystemdUnitFailed) firing: generate_os_reports.service Failed on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:37:36] netbox, Infrastructure-Foundations: Netbox custom validator: don't require a cable ID on "planned" cables - https://phabricator.wikimedia.org/T357259 (ayounsi)
[08:10:02] Mail, Infrastructure-Foundations, MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), Patch-For-Review, User-notice: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984 (Tacsipacsi) Resolved→Open p:Triage→Unbreak! The revert shoul...
[08:10:14] Mail, Infrastructure-Foundations, MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), Patch-For-Review, User-notice: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984 (Tacsipacsi)
[08:21:02] hey folks, I've been getting puppet failure emails from the `puppet-dev` cloud vps project for the last few days, is anyone working on fixing that?
[08:29:46] (SystemdUnitFailed) firing: generate_os_reports.service Failed on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:56:55] netops, Ganeti, Infrastructure-Foundations, SRE, Patch-For-Review: Investigate Ganeti in routed mode - https://phabricator.wikimedia.org/T300152 (ayounsi) >>! In T300152#9514644, @bking wrote: > @ayounsi Apologies for the trouble, I didn't realize `sretest2005` was in active use. Unfortunatel...
[08:57:04] Mail, Infrastructure-Foundations, MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), Patch-For-Review, User-notice: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984 (Ladsgroup) p:Unbreak!→Medium
[08:57:14] Mail, Infrastructure-Foundations, MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), Patch-For-Review, User-notice: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984 (Ladsgroup)
[08:57:37] Mail, Infrastructure-Foundations, MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), Patch-For-Review, User-notice: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984 (Ladsgroup) If you disagree with a change, it doesn't mean it has to be rever...
[09:02:08] Mail, Infrastructure-Foundations, MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), Patch-For-Review, User-notice: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984 (Tacsipacsi)
[09:02:27] Mail, Infrastructure-Foundations, MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), Patch-For-Review, User-notice: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984 (Tacsipacsi) p:Medium→Unbreak! This is **BROKEN** and causes **DATA L...
[09:04:58] Mail, Infrastructure-Foundations, MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), Patch-For-Review, User-notice: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984 (Ladsgroup) It doesn't lead to data loss...
[09:06:34] Mail, Infrastructure-Foundations, MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), Patch-For-Review, User-notice: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984 (Tacsipacsi) Then how do you call the fact that the database write necessary...
[09:11:52] Mail, Infrastructure-Foundations, MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), Patch-For-Review, User-notice: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984 (Ladsgroup) If that's data loss, we are having data loss since T29884
[09:35:08] Mail, Infrastructure-Foundations, MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), Patch-For-Review, User-notice: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984 (Tacsipacsi) No, because bot edits used to be correctly highlighted on Specia...
[09:44:42] CAS-SSO, Infrastructure-Foundations: OpenID Connect logout does not log out of IdP on idp.wmcloud.org - https://phabricator.wikimedia.org/T356784 (SLyngshede-WMF) So what's happening is exactly what is suppose to be happening. The idp.wmcloud.org simply delegates authorization to idp.wikimedia.org, whi...
[09:44:53] CAS-SSO, Infrastructure-Foundations: OpenID Connect logout does not log out of IdP on idp.wmcloud.org - https://phabricator.wikimedia.org/T356784 (SLyngshede-WMF) Open→In progress
[09:54:58] moritzm jhathaway sent a few puppetdb/puppetserver patches your way re: pontoon and puppetserver, there's a couple of more "meaty" ones I'm happy to discuss more in depth too
[09:55:54] Mail, Infrastructure-Foundations, MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), Patch-For-Review, User-notice: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984 (Joe) p:Unbreak!→Medium @Tacsipacsi I would ask you to keep emotions...
[09:56:02] Mail, Infrastructure-Foundations, MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), Patch-For-Review, User-notice: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984 (IKhitron) I've asked to retrieve my bot flag, which I temporarily gave away...
[11:25:05] Mail, Infrastructure-Foundations, MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), Patch-For-Review, User-notice: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984 (Ladsgroup) I don't think what you're describing will happen. Will try it aft...
[12:06:11] slyngs, moritzm: FYI https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem&q=name%3DPuppetPendingCertificateRequest
[12:07:30] I was going to point out that that was weird as that's not new, but it's also a month old. I'll take a look
[12:07:47] thx
[12:08:11] if the metric is no more we need to either adjust the alert or remove it if it's obsolete and there is another one replacing it
[12:09:36] The alerting rule is kinda wonky
[12:11:10] Like someone merged it wrong
[12:11:44] :(
[12:12:06] moritzm: for puppet certs about to expire do we take care of them or do we tell the service owner to take care of them? (stat1005 in this case)
[12:12:28] IIRC the cookbook is puppet5/7 aware and should work fine
[12:13:53] that's up to the individual service owners usually (unless it's some emergency), so a quick ping to the -analytics channel should be fine I think
[12:14:19] stat1005 is still on Buster and thus Puppet 5
[12:14:29] k thx
[12:15:04] {done}
[12:15:09] the ping not the fix
[12:16:28] Hm, local pint seems to think everything is fine
[12:17:16] local pint probably doesn't check if the metric has data in prometheus in prod
[12:17:30] pint can work in 3 modes IIRC
[12:19:35] Oh, right and puppet_ca_pending_certificate is empty
[12:29:46] (SystemdUnitFailed) firing: generate_os_reports.service Failed on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:10:12] CAS-SSO, Infrastructure-Foundations: OpenID Connect logout does not log out of IdP on idp.wmcloud.org - https://phabricator.wikimedia.org/T356784 (CCicalese_WMF) In progress→Invalid Thank you, that makes perfect sense. And, we wouldn't want to log out of all services when we log out of Catalyst....
[13:26:21] Mail, Infrastructure-Foundations, MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), User-notice: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984 (Ladsgroup) Open→Resolved I call this resolved. Let me know if you have any concerns. I con...
[15:20:51] SRE-tools, Infrastructure-Foundations, Wikimedia-Mailing-lists, serviceops: Support services VIPs with not marked as VIP in Netbox - https://phabricator.wikimedia.org/T295793 (joanna_borun) p:Triage→Medium
[15:24:13] SRE-tools, Infrastructure-Foundations, Spicerack: [spicerack] python-kafka does not support python 3.12, there's a fix but there has not been any releases since 2020 - https://phabricator.wikimedia.org/T354410 (joanna_borun) p:Triage→Medium
[15:29:20] SRE-tools, Infrastructure-Foundations: Upgrade BGPAlerter to 1.33 - https://phabricator.wikimedia.org/T354998 (joanna_borun) p:Triage→Low
[15:29:52] Packaging, Infrastructure-Foundations: Build and package gnmic - https://phabricator.wikimedia.org/T347461 (MoritzMuehlenhoff) p:Triage→Medium
[15:31:02] netbox, Cloud-VPS, Infrastructure-Foundations, cloud-services-team: Netbox device location information not available on the first Puppet run of a device - https://phabricator.wikimedia.org/T347375 (joanna_borun) p:Triage→Medium
[15:33:02] netbox, Infrastructure-Foundations, Patch-For-Review: Upgrade Netbox to 3.7.x - https://phabricator.wikimedia.org/T336275 (joanna_borun) p:Triage→Medium
[15:36:44] SRE-tools, Data-Persistence, Infrastructure-Foundations, Patch-For-Review: Automation to change a server's vlan - https://phabricator.wikimedia.org/T350152 (joanna_borun) p:Triage→Medium
[15:43:35] netbox, Infrastructure-Foundations: Netbox custom validator: don't require a cable ID on "planned" cables - https://phabricator.wikimedia.org/T357259 (ayounsi) p:Triage→Low a:ayounsi
[15:47:12] Mail, Infrastructure-Foundations, SRE: CA App Synthetic Monitor Mail (SMTP): Connection timed out; connect(): -2 - https://phabricator.wikimedia.org/T240906 (joanna_borun) a:lmata
[15:47:38] Mail, Infrastructure-Foundations, SRE: CA App Synthetic Monitor Mail (SMTP): Connection timed out; connect(): -2 - https://phabricator.wikimedia.org/T240906 (joanna_borun) @lmata is it still valid issue?
[15:50:08] Mail, Infrastructure-Foundations, Observability-Alerting, SRE-Sprint-Week-Sustainability-March2023, and 2 others: Improve outbound mail service alerting - https://phabricator.wikimedia.org/T197172 (joanna_borun) Open→Invalid
[15:51:21] Mail, Infrastructure-Foundations, Observability-Alerting, SRE-Sprint-Week-Sustainability-March2023, and 2 others: Improve outbound mail service alerting - https://phabricator.wikimedia.org/T197172 (joanna_borun) Works with current setup if there are any outstanding issues please reopen or create...
[15:52:23] netbox, Infrastructure-Foundations: Netbox CSV dumps run at the same time - https://phabricator.wikimedia.org/T262678 (Volans) Open→Declined We're not doing anymore CSV dumps, see T310615. Closing it.
[15:54:14] SRE-tools, netbox, Infrastructure-Foundations: Netbox support for svc allocation - https://phabricator.wikimedia.org/T263429 (joanna_borun) p:High→Medium
[15:57:31] SRE-tools, Ganeti, Infrastructure-Foundations: Update makevm to include completion of the installation with the puppet runs - https://phabricator.wikimedia.org/T306661 (joanna_borun) Open→Resolved
[16:04:01] thanks godog, I'll take a look at them today
[16:29:46] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:59:36] Mail, Infrastructure-Foundations, SRE: CA App Synthetic Monitor Mail (SMTP): Connection timed out; connect(): -2 - https://phabricator.wikimedia.org/T240906 (lmata) Open→Resolved >>! In T240906#9534209, @joanna_borun wrote: > @lmata is it still valid issue? it shouldn't be, watchmouse has be...
[20:07:04] Mail, Infrastructure-Foundations, MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), User-notice: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984 (Tacsipacsi) >>! In T356984#9532727, @Joe wrote: > @Tacsipacsi I would ask you to keep emotions in...
[20:29:46] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:51:50] jhathaway: nice callout on that patch, I've never seen that before personally but it does sound ugly
[20:53:15] thanks