[02:00:35] FIRING: SystemdUnitFailed: generate_vrts_aliases.service on mx-in2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:49:17] FIRING: [2x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:59:17] FIRING: [2x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:44:17] RESOLVED: SystemdUnitFailed: generate_vrts_aliases.service on mx2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:05:35] FIRING: SystemdUnitFailed: generate_vrts_aliases.service on mx1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:04:17] RESOLVED: SystemdUnitFailed: generate_vrts_aliases.service on mx1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:20:03] netops, Infrastructure-Foundations: BGP status (instance cr2-eqord) - April 2024 - Equinix peering AS15830 - https://phabricator.wikimedia.org/T363895#9959755 (ayounsi) Open→Resolved a:ayounsi
[08:28:07] We would like to inform you, contact(s) of us.wmf5, that your LIR is entitled to free tickets to attend and participate at the RIPE 89 Onsite Meeting. RIPE 89 will be held in Prague, Czechia from 28 Oct – 1 Nov 2024
[08:30:13] wow nice
[08:48:17] Packaging, Infrastructure-Foundations: Package ipxe-qemu - https://phabricator.wikimedia.org/T369136#9959960 (MatthewVernon) Is this important enough for us that we should think about spending WMF time on updating the version in Debian? [honest question]
[08:53:45] elukey: last year (or the year before I don't remember) was in Rome, but we didn't go :D
[08:54:26] buuuuu
[09:06:53] no pad for today's meeting?
[10:24:24] Packaging, Infrastructure-Foundations: Package ipxe-qemu - https://phabricator.wikimedia.org/T369136#9960309 (ayounsi) > Is this important enough for us that we should think about spending WMF time on updating the version in Debian? I'm not familiar with Debian packaging, but I guess it depends on how mu...
[10:36:32] Packaging, Infrastructure-Foundations: Package ipxe-qemu - https://phabricator.wikimedia.org/T369136#9960366 (elukey) I am in favor of spending time to update the Debian's version, but reading https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=985047 doesn't give a lot of hope (the package seems somehow a...
[10:56:19] Packaging, Infrastructure-Foundations: Package ipxe-qemu - https://phabricator.wikimedia.org/T369136#9960412 (MatthewVernon) We have a number of WMF SREs who have uploading rights to Debian (who could, assuming the existing maintainer isn't opposed, upload a new version), be that as a one-off or as a des...
[10:59:34] Packaging, Infrastructure-Foundations: Package ipxe-qemu - https://phabricator.wikimedia.org/T369136#9960423 (ayounsi) A one-off would be good enough for us, as we especially need the few latest commits. I agree we shouldn't "own" it as it's not a critical part of our infra.
[11:03:50] Mail, Infrastructure-Foundations: Alert email sent from backupmon1001 didn't reach engineer's google inbox (was: check-dbbackup-time sometimes doesn't send email alerts) - https://phabricator.wikimedia.org/T369253#9960439 (LSobanski) There is {T369341} but the switch to new outgoing MX servers only happe...
[11:44:19] netbox, Infrastructure-Foundations, Patch-For-Review: Upgrade Netbox to 4.x - https://phabricator.wikimedia.org/T336275#9960538 (ayounsi) @papaul it should be fixed now The issue is with https://github.com/wikimedia/operations-puppet/blob/bce727b5d3031a5f8850e22c38cdbe0416b246a2/modules/profile/manif...
[13:25:33] netops, Infrastructure-Foundations, SRE: Move asw-c-codfw and asw-d-codfw CR uplinks to Spine switches - https://phabricator.wikimedia.org/T366941#9960830 (Papaul) @cmooney the 18th works for me thanks.
[13:27:08] netops, Infrastructure-Foundations, SRE: Upgrade Management routers to 22.4R3-S2 - https://phabricator.wikimedia.org/T369504 (cmooney) NEW p:Triage→Medium
[14:05:39] Mail, Infrastructure-Foundations: Alert email sent from backupmon1001 didn't reach engineer's google inbox (was: check-dbbackup-time sometimes doesn't send email alerts) - https://phabricator.wikimedia.org/T369253#9961121 (MatthewVernon) [it might be worth making sure we emit emails with a deliverable re...
[14:34:18] volans I'm trying to run `cookbook -sre.wdqs.reboot --query 'search-loader1002.eqiad.wmnet' --task-id T366555 --reason 'security updates'` but I'm getting `"Selected hosts ({hosts}) must be all be query service hosts for the same dataset` ... any suggestions?
[14:36:17] inflatador: probably not passing this? https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/wdqs/__init__.py#L22
[14:36:36] That error is broken
[14:36:48] XioNoX thanks, it's a PEBKAC error
[14:36:57] I'm trying to run this cookbook against a non-wdqs host
[14:37:03] ;(
[14:37:07] sorry to bug everyone
[14:37:16] also yeah, the error message is problematic https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/wdqs/__init__.py#L30
[14:37:28] also I'm not here :
[14:37:33] missing f-string, as well as `{hosts}` is not defined
[14:37:39] (as I'm in data-persistence until end of october ;) )
[14:37:51] doubly sorry for buggin' ya then ;)
[14:37:56] and the cookbook doesn't use the current batch classes but is an old one-off one
[14:38:11] Agreed, this cookbook needs help...will mark a task for that as well
[14:38:38] I think we already have some examples of using the batch classes but if anyone has a ticket or CR feel free to link
[14:40:06] XioNoX: it's using a format() rather than f-string
[14:40:15] Which technically should work.
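Note on the `{hosts}` exchange above: a literal `{hosts}` in the output suggests the placeholder was never interpolated. The following is a minimal Python sketch of two ways that can happen. It is not the actual cookbook code; the variable names and the exact message wording here are hypothetical, and which case (if either) applies to the cookbook is an assumption.

```python
# Hypothetical host name, used only for illustration.
hosts = "search-loader1002.eqiad.wmnet"

# Case 1: a plain string with no f prefix and no .format() call.
# The braces are just characters, so "{hosts}" is printed literally.
plain = "Selected hosts ({hosts}) must all be query service hosts"

# Case 2: .format() is called, but it binds only to the last operand
# of "+", so the placeholder in the first operand is left untouched.
partial = "Selected hosts ({hosts}) must all be " + "query service hosts".format(
    hosts=hosts
)

# Working variants: .format() applied to the whole string, or an f-string.
formatted = "Selected hosts ({hosts}) must all be query service hosts".format(
    hosts=hosts
)
fstring = f"Selected hosts ({hosts}) must all be query service hosts"

for message in (plain, partial, formatted, fstring):
    print(message)
```

So `format()` does work in principle, as said above, but only when it is applied to the string that actually contains the placeholder.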
[14:41:06] The error message is kind of baffling, but that might just be me coming back from a 4-day weekend ;)
[14:41:23] Mail, Infrastructure-Foundations: Alert email sent from backupmon1001 didn't reach engineer's google inbox (was: check-dbbackup-time sometimes doesn't send email alerts) - https://phabricator.wikimedia.org/T369253#9961299 (jhathaway) p:Triage→High
[14:41:57] Mail, Infrastructure-Foundations: Alert email sent from backupmon1001 didn't reach engineer's google inbox (was: check-dbbackup-time sometimes doesn't send email alerts) - https://phabricator.wikimedia.org/T369253#9961304 (jhathaway) I'll take a look at this today
[14:41:59] Mail, Infrastructure-Foundations: Alert email sent from backupmon1001 didn't reach engineer's google inbox (was: check-dbbackup-time sometimes doesn't send email alerts) - https://phabricator.wikimedia.org/T369253#9961305 (jhathaway) I'll take a look at this today
[14:42:03] inflatador: it's weirdly worded
[14:42:31] It means the selection must be all wcqs hosts or all wdqs hosts
[14:42:41] Not a mix of wc/dqs
[14:42:49] Or not either of them
[14:43:01] Getting off train
[14:43:24] yeah, I get it now. Will still add rewording to the ask
[15:23:17] Tbh it only makes sense cause I read the code. Definitely could be better.
[15:36:31] SRE-tools, Infrastructure-Foundations, Spicerack: [spicerack] python-kafka does not support python 3.12, there's a fix but there has not been any releases since 2020 - https://phabricator.wikimedia.org/T354410#9961563 (Volans) @elukey do you know how much of an effort would it be to change library ba...
[16:02:03] netops, Infrastructure-Foundations, SRE: Move asw-c-codfw and asw-d-codfw CR uplinks to Spine switches - https://phabricator.wikimedia.org/T366941#9961749 (cmooney)
[16:05:26] XioNoX: oooh https://github.com/netbox-community/netbox/issues/14554
[16:06:10] niiiice!!!!
[16:06:31] you're now an official Netbox developer
[16:06:35] it's all yours from now on!
[16:06:36] :D
[16:06:47] netops, Infrastructure-Foundations, SRE: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - https://phabricator.wikimedia.org/T348977#9961762 (cmooney)
[16:07:19] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad - https://phabricator.wikimedia.org/T365998#9961779 (cmooney)
[16:07:46] hahaha, nah I'm good
[16:08:02] wasn't an offer, was an affirmation ;)
[16:08:17] but it should be there the day we upgrade prod to 4.0.7
[16:15:35] FIRING: SystemdUnitFailed: generate_vrts_aliases.service on mx-in2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:14:17] FIRING: [2x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:37:22] SRE-tools, Infrastructure-Foundations, Puppet-Infrastructure, Spicerack, and 2 others: Migrate puppet merges to a cookbook - https://phabricator.wikimedia.org/T366355#9962238 (jhathaway) I agree that decoupling makes sense and that it is worth the effort to try and run the current script on the p...
[17:51:36] SRE-tools, Infrastructure-Foundations, Puppet-Infrastructure, Spicerack, and 2 others: Migrate puppet merges to a cookbook - https://phabricator.wikimedia.org/T366355#9962378 (CDanis) +1
[18:01:32] SRE-tools, Infrastructure-Foundations, Puppet-Infrastructure, Spicerack, and 2 others: Migrate puppet merges to a cookbook - https://phabricator.wikimedia.org/T366355#9962431 (CDanis) >>! In T366355#9954359, @elukey wrote: > I've also checked what puppet-merge does behind the scenes, and the gist...
[18:14:17] RESOLVED: SystemdUnitFailed: generate_vrts_aliases.service on mx2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed