[00:25:13] Traffic: LVS hosts have missing metrics even though PyBal never goes down - https://phabricator.wikimedia.org/T353760 (Reedy)
[00:27:16] Traffic: LVS hosts have missing metrics even though PyBal never goes down - https://phabricator.wikimedia.org/T353760 (BCornwall) The old title was more descriptive IMO - The metric, pybal_monitor_down_results_total, is missing specifically when PyBal never goes down.
[00:29:42] Traffic: LVS hosts have missing metrics when PyBal never goes down - https://phabricator.wikimedia.org/T353760 (BCornwall)
[00:37:05] Traffic: LVS hosts have missing metrics when PyBal never goes down - https://phabricator.wikimedia.org/T353760 (BCornwall) a:BCornwall→None
[00:51:52] Traffic: LVS hosts have missing metrics when PyBal never goes down - https://phabricator.wikimedia.org/T353760 (Reedy) >>! In T353760#9417365, @BCornwall wrote: > The old title was more descriptive IMO - The metric, pybal_monitor_down_results_total, is missing specifically when PyBal never goes down. It doe...
[01:04:02] Traffic: pybal_monitor_down_results_total metric only created when PyBal goes down - https://phabricator.wikimedia.org/T353760 (BCornwall)
[01:04:11] Traffic: pybal_monitor_down_results_total metric only created when PyBal goes down - https://phabricator.wikimedia.org/T353760 (BCornwall) Hopefully this helps!
[07:59:03] moritzm: I'm ready for restarting pdns-recursor on wikidough and dns hosts
[08:38:17] ack, thx
[08:41:54] 👍
[09:58:40] Traffic: sre.dns.roll-restart-reboot-wikimedia-dns cookbook sometimes cannot remove downtime - https://phabricator.wikimedia.org/T353779 (Fabfur)
[10:00:35] Traffic: sre.dns.roll-restart-reboot-wikimedia-dns cookbook sometimes cannot remove downtime - https://phabricator.wikimedia.org/T353779 (Fabfur)
[10:05:11] moritzm: pdns-recursor restarted on all hosts
[10:15:54] thx!
[10:38:13] Traffic: sre.dns.roll-restart-reboot-wikimedia-dns cookbook sometimes cannot remove downtime - https://phabricator.wikimedia.org/T353779 (Volans) The cookbook defines the restart of bird in the `post_action()` that is called after `action()`, but the check for icinga being optimal is part of the `action()` o...
[13:10:31] Traffic, SRE, SRE-swift-storage, MediaWiki-Platform-Team (Radar), Performance Issue: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (TheDJ) >>! In T211661#8377883, @Ladsgroup wrote: > The best part: We don't even pre-generate thumbnails for these...
[16:23:15] Traffic, PyBal: pybal_monitor_down_results_total metric only created when PyBal goes down - https://phabricator.wikimedia.org/T353760 (Vgutierrez)
[17:24:38] Traffic: sre.dns.roll-restart-reboot-wikimedia-dns cookbook sometimes cannot remove downtime - https://phabricator.wikimedia.org/T353779 (BCornwall) Open→In progress p:Triage→Medium a:BCornwall
[18:25:10] netops, Infrastructure-Foundations, SRE: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - https://phabricator.wikimedia.org/T348977 (cmooney) > it seems this limitation does not apply to 22.2 which we are using in codfw. An update on this. It seems that we do have this bug in 22.2, but we don't...
[18:35:21] netops, Infrastructure-Foundations, SRE: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - https://phabricator.wikimedia.org/T348977 (cmooney)
[18:42:45] Traffic, Data-Engineering, Observability-Logging, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117 (Ottomata) To do this migration plan ^, we'd need Kafka jumbo to support 2x webrequest volume while we migrate. Let's check with Data Platform...
[19:18:39] hello all, i am working to roll out a new community-crm service. it is installed on a prod vps (crm2001) and will have traffic routed through the CDN.
[19:18:59] as this is my first time doing this, i want to make sure i'm not missing anything.
[19:20:02] i have the following patchsets that i think will cover most of what is needed, but looking for advice on what to do or what i may have missed: https://gerrit.wikimedia.org/r/c/operations/puppet/+/983951 https://gerrit.wikimedia.org/r/c/operations/dns/+/983950
[21:49:56] Traffic, PyBal: pybal_monitor_down_results_total metric only created when PyBal goes down - https://phabricator.wikimedia.org/T353760 (BCornwall) It might be worth investigating if we can exclude this alert from the linter; There shouldn't be any adverse effects with this problem (which is somewhat pedan...
[22:18:27] Traffic: sre.dns.roll-restart-reboot-wikimedia-dns cookbook sometimes cannot remove downtime - https://phabricator.wikimedia.org/T353779 (BCornwall) Some time ago we discussed stopping/starting bird.service via systemd dependencies - T336792 is related. @ssingh Do you recall why we didn't implement BindsTo=...
[22:18:30] Traffic: sre.dns.roll-restart-reboot-wikimedia-dns cookbook sometimes cannot remove downtime - https://phabricator.wikimedia.org/T353779 (BCornwall) In progress→Open