[09:21:53] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Joe)
[10:54:02] netops, Infrastructure-Foundations, SRE: Announce internal/core routes from CRs to L3 switches - https://phabricator.wikimedia.org/T344547 (cmooney) >>! In T344547#9108360, @ayounsi wrote: > Some downsides I can think of: additional config, more complex to troubleshoot (more prefixes in the routing t...
[13:58:49] netops, Infrastructure-Foundations, SRE: Announce internal/core routes from CRs to L3 switches - https://phabricator.wikimedia.org/T344547 (ayounsi) Cool, thanks for the details, makes sense to use `prefix-limit` with `teardown` then, maybe some timeout so it automatically recovers and double check ou...
[14:26:42] (SystemdUnitFailed) firing: anycast-healthchecker.service Failed on doh1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:30:24] ^ will resolve
[14:31:42] (SystemdUnitFailed) resolved: (2) anycast-healthchecker.service Failed on doh1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:42:13] (SystemdUnitFailed) firing: (6) anycast-healthchecker.service Failed on doh1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:42:28] (SystemdUnitFailed) resolved: (5) anycast-healthchecker.service Failed on doh1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:59:12] (SystemdUnitFailed) firing: (6) anycast-healthchecker.service Failed on doh4001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:02:27] (SystemdUnitFailed) resolved: (6) anycast-healthchecker.service Failed on doh4001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:24:55] Traffic: Investigate why Traffic SLO Grafana dashboard has negative values on combined SLI - https://phabricator.wikimedia.org/T341606 (BCornwall) In progress→Stalled Setting as stalled: Any opinions on how to go about this with my above comment in mind? Thanks!
[19:37:09] Traffic, SRE, observability: HAProxy metrics go down on config reload - https://phabricator.wikimedia.org/T343000 (BCornwall) I'm not sure that a smaller period does fix things. Attached is a 5m and 2m. Switching to irate() is showing similar things, too. {F37627399} {F37627398}
[20:50:53] Traffic, Thumbor: Cannot download large (3GB) PDF files from commons - https://phabricator.wikimedia.org/T341755 (BCornwall) I am able to reproduce. My work laptop with a throughput of ~300Mbps was able to download it just fine: ` < last-modified: Wed, 12 Jul 2023 07:30:13 GMT < accept-ranges: bytes <...