[01:17:46] netops, Infrastructure-Foundations: codfw: upgrade routers (2026) - https://phabricator.wikimedia.org/T417871#11832284 (Papaul)
[09:18:01] CAS-SSO, netbox, Infrastructure-Foundations: Unable to log in to Netbox - https://phabricator.wikimedia.org/T373702#11832777 (MoritzMuehlenhoff) Open→Resolved >>! In T373702#11801704, @Southparkfan wrote: > As a member of the `netbox-readonly-access`, I can still not log in to NetBox. Loo...
[09:25:04] SRE-tools, Infrastructure-Foundations, Data-Platform-SRE (2026-03-27 - 2026-04-17): debmonitor-client crashes for growthbook image - https://phabricator.wikimedia.org/T423413#11832785 (brouberol) a:brouberol
[09:26:13] SRE-tools, Infrastructure-Foundations, Data-Platform-SRE (2026-03-27 - 2026-04-17): debmonitor-client crashes for growthbook image - https://phabricator.wikimedia.org/T423413#11832790 (brouberol) @elukey I'd appreciated guidance as to how to build the new `docker-report` deb package, and deploy it. T...
[10:14:25] FIRING: SystemdUnitFailed: netbox_ganeti_eqsin02_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:36:13] netbox, netops, SRE-tools, bacula, and 2 others: netbox2003 backups (maybe others?) are missconfigured or failing to find the configured directory - https://phabricator.wikimedia.org/T423689 (jcrespo) NEW
[10:36:53] netbox, netops, SRE-tools, bacula, and 2 others: netbox2003 backups (maybe others?) are missconfigured or failing to find the configured directory - https://phabricator.wikimedia.org/T423689#11833112 (jcrespo) Let me know if this box or any other requires investigation.
[10:38:56] netbox, netops, SRE-tools, bacula, and 2 others: netbox2003 backups (maybe others?) are missconfigured or failing to find the backup directory - https://phabricator.wikimedia.org/T423689#11833117 (jcrespo)
[11:34:04] Weird netbox validator bug: if I set this cable to "connected", with label 1099, it fails with "Error: unable to find cable's site", even though both sides have a site properly defined. The error is from https://github.com/wikimedia/operations-software-netbox-extras/blob/master/validators/dcim/cable.py#L48 - maybe topranks or elukey has an idea what the issue is?
[11:40:55] XioNoX: what is the cable in Netbox?
[11:41:12] topranks: mr1-codfw- to the OOB circuit
[11:42:31] topranks: the previous cable got deleted when the uplink got moved from ge-0/0/5 to 0/0/7, so I've re-created it
[11:43:00] XioNoX: hmm ok
[11:43:30] I was speculating it was a repeat of the weird bug we hit before with cables, but that was on ones that had been "moved"; a brand new one shouldn't hit that
[11:44:25] FIRING: [7x] SystemdUnitFailed: netbox_ganeti_codfw_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:46:53] yeah, I'm running out of ideas..
[11:49:25] FIRING: [8x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:54:25] FIRING: [8x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:59:25] FIRING: [8x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:03:18] XioNoX: that one is a bit of a head scratcher
[12:03:40] you probably already did but trying in nbshell suggests it should be ok
[12:03:44] https://www.irccloud.com/pastebin/uSZpIXZG/
[12:04:18] That old task - T410455 - was to do with terminations somehow still referencing an "old" device but yep shouldn't be related here as this is a brand new cable.
[12:04:18] T410455: lsw1-d6-eqiad outage Nov 18 2025 - https://phabricator.wikimedia.org/T410455
[12:08:37] thx, I'll keep digging
[12:09:25] FIRING: [7x] SystemdUnitFailed: netbox_ganeti_codfw_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:14:25] RESOLVED: [3x] SystemdUnitFailed: netbox_ganeti_eqsin02_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:16:09] SRE-tools, Infrastructure-Foundations, Data-Platform-SRE (2026-03-27 - 2026-04-17): debmonitor-client crashes for growthbook image - https://phabricator.wikimedia.org/T423413#11833335 (elukey) I already had the docker-report repo checked out in my home dir on build2002, so I pulled your changes and r...
[12:25:57] SRE-tools, Infrastructure-Foundations, Data-Platform-SRE (2026-03-27 - 2026-04-17): debmonitor-client crashes for growthbook image - https://phabricator.wikimedia.org/T423413#11833361 (elukey) The docker-report run now works, but it happened the same also tonight EU time (no growthbook error reported...
[12:40:06] topranks: I manually fixed the cable in nbshell, but yeah no idea what's going on
[12:58:51] SRE-tools, Infrastructure-Foundations, Data-Platform-SRE (2026-03-27 - 2026-04-17): debmonitor-client crashes for growthbook image - https://phabricator.wikimedia.org/T423413#11833429 (brouberol) I had modified the script on disk with `PYTHONPATH=''` to run the whole command, to ensure our change wou...
[12:59:17] SRE-tools, Infrastructure-Foundations, Data-Platform-SRE (2026-03-27 - 2026-04-17): debmonitor-client crashes for growthbook image - https://phabricator.wikimedia.org/T423413#11833432 (brouberol) Thank you @elukey for your assistance in the review, build and release process! I think we can now close...
[13:01:35] SRE-tools, Infrastructure-Foundations, Data-Platform-SRE (2026-03-27 - 2026-04-17): debmonitor-client crashes for growthbook image - https://phabricator.wikimedia.org/T423413#11833440 (elukey) In progress→Resolved Thank you for the code fixes!
[13:40:06] netbox, netops, SRE-tools, bacula, and 2 others: netbox2003 backups (maybe others?) are missconfigured or failing to find the backup directory - https://phabricator.wikimedia.org/T423689#11833550 (ayounsi) We're not doing Netbox CSV dumps anymore. So you can remove that directory from backups.
[13:56:55] netbox, netops, SRE-tools, bacula, and 3 others: netbox2003 backups (maybe others?) are missconfigured or failing to find the backup directory - https://phabricator.wikimedia.org/T423689#11833584 (jcrespo) That was the only thing being backed up. ` bacula::director::fileset { 'netbox':...
[14:00:28] netbox, netops, SRE-tools, bacula, and 3 others: netbox2003 backups (maybe others?) are missconfigured or failing to find the backup directory - https://phabricator.wikimedia.org/T423689#11833590 (ayounsi) Yeah, Postgres is where all the data are. So +1 to not backup anything on the frontends.
[14:02:43] netbox, netops, SRE-tools, bacula, and 3 others: netbox2003 backups (maybe others?) are missconfigured or failing to find the backup directory - https://phabricator.wikimedia.org/T423689#11833597 (jcrespo) p:Triage→Medium a:jcrespo
[16:39:02] netops, DC-Ops, Infrastructure-Foundations, ops-eqsin, SRE: EQSIN:Switch refresh diagram and wiring - https://phabricator.wikimedia.org/T423724 (Papaul) NEW
[18:59:38] Mail, Infrastructure-Foundations, MediaWiki-Email, MediaWiki-extensions-EmailAuth, and 4 others: Could not send confirmation email: Unknown error in PHP's mail() function. - https://phabricator.wikimedia.org/T383047#11834654 (TAndic) Hi all, I just experienced this error as well as a few communit...
[20:30:25] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:35:25] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed