[03:11:41] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[04:11:41] RESOLVED: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[06:20:07] FIRING: SystemdUnitFailed: netbox_ganeti_magru03_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:34:25] RESOLVED: SystemdUnitFailed: netbox_ganeti_magru03_sync.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:18:25] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:18:40] FIRING: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:35:04] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Servers exposing incorrect LLDP info - https://phabricator.wikimedia.org/T250367#11516063 (elukey) >>! In T250367#11511124, @ayounsi wrote: >> Is sretest2003 the only one that shows this behavior, or do we have others? I am particularly i...
[13:18:25] RESOLVED: SystemdUnitFailed: update-ubuntu-mirror.service on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:56:04] netops, Infrastructure-Foundations, Traffic: magru hosts (erronouesly) reported down due to TTL exceeded - https://phabricator.wikimedia.org/T414473 (ssingh) NEW
[14:56:21] netops, Infrastructure-Foundations, Traffic: magru hosts (erroneously) reported down due to TTL exceeded - https://phabricator.wikimedia.org/T414473#11516987 (ssingh)
[15:13:42] netops, Infrastructure-Foundations, Traffic: magru hosts (erroneously) reported down due to TTL exceeded - https://phabricator.wikimedia.org/T414473#11517055 (taavi)
[15:14:46] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: eqiad: rows C/D Upgrade Decom Asw Switches in Rows C & D - https://phabricator.wikimedia.org/T412525#11517056 (cmooney) @Jclark-ctr I went to do this but it turns out we need to disconnect all the switch - switch links before the de...
[15:31:37] o/ anything I can do to move https://gerrit.wikimedia.org/r/c/operations/puppet/+/1211651 & friends ahead?
[15:38:11] taavi: sorry for not getting back to you on those changes, they are on my todo list, but kept getting pushed down, i'll take another look at them today
[15:38:38] last time I worked on them I had trouble duplicating the issue in a test environment
[16:20:43] netops, Infrastructure-Foundations, Data-Engineering (Q3 FY25/26 January 1st - March 31th), Epic: SDS 1.3.8 Review network constraints of 100% sampled instruments - https://phabricator.wikimedia.org/T414487 (Milimetric) NEW
[16:24:43] same here, also haven't found the time to properly look into these
[17:34:37] netops, Infrastructure-Foundations, Data-Platform-SRE (2026.01.05 - 2026.01.23): Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11518247 (JAllemandou)
[18:28:41] netops, Infrastructure-Foundations, Data-Platform-SRE (2026.01.05 - 2026.01.23): Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11518529 (cmooney) Huh yeah this is quite odd alright. Taking dse-k8s-worker1011 and dse-k8s-worker1013 as two example hosts to...
[19:45:35] netops, Infrastructure-Foundations, Data-Platform-SRE (2026.01.05 - 2026.01.23): Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11518808 (CDanis) I took a quick look at the state of sockets on dse-k8s-worker1010, since FIN_WAIT_1 is //not// supposed to stic...
[19:52:09] netops, Infrastructure-Foundations, Data-Platform-SRE (2026.01.05 - 2026.01.23): Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11518838 (cmooney) Thanks @cdanis, yeah in terms of the TCP state machine I wasn't quite sure how the apparent packet loss transl...