[06:53:30] Traffic, conftool, Patch-For-Review: FY 24/25 WE 4.3.11 Define a policy for maintenance of requestctl rules - https://phabricator.wikimedia.org/T393381#10957256 (Joe) Open→Resolved I will resolve this task because the policy is established and we did the first round of cleanups. I will st...
[10:14:11] Traffic, Experimentation Lab: Block requests to /evt-103e/v2/events with no Edge Unique - https://phabricator.wikimedia.org/T398181#10957866 (Vgutierrez) a:Vgutierrez
[10:43:39] o/ I have some cleanup for thumbor/kartotherian that requires an LVS restart. Would some time soon suit? https://gerrit.wikimedia.org/r/c/operations/puppet/+/1161485
[11:00:01] * vgutierrez looking
[11:00:23] hnowlan: yeah
[11:09:43] thanks! \
[11:43:05] Traffic, MediaWiki-Core-AuthManager, MediaWiki-Platform-Team: [WE5.5.3] Decide how to expose session information to infrastructure layers in front of MediaWiki - https://phabricator.wikimedia.org/T394012#10958204 (Tgr) a:Tgr
[12:54:13] Traffic: Append requestctl rule name to X-Analytics header in HAProxy - https://phabricator.wikimedia.org/T397917#10958469 (Vgutierrez) p:Triage→Medium
[13:41:39] hnowlan: are you taking care of restarting pybal on lvs2013|lvs2014?
[13:43:32] I'm asking cause I'm gonna need to restart pybal in lvs2014 as well
[13:43:48] there is a pending restart alert too fwiw.
[13:43:56] I think that is probably from hnowlan's run
[13:45:30] yep
[13:56:52] hnowlan: ping? :)
[13:59:18] vgutierrez: https://puppetboard.wikimedia.org/report/lvs2014.codfw.wmnet/5078c21b15f402a619e8c844b0e177adb3df4dda I would say restart it?
[13:59:29] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/0c8a9e1a20070d92f8abb3ba11677b5bf88c0ddc%5E%21/#F0
[14:04:34] yeah... I'll take care of that
[14:10:42] (done)
[14:10:54] thanks
[14:11:34] I'm proceeding with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1164466
[14:12:12] cool, gl :)
[14:12:36] I got cumin!
:D
[14:14:56] I'll reload the config in a secondary LVS in both liberica and LVS and I'll check that the rate of healthchecks going via haproxy healthcheck backend is the same
[14:16:20] and that the cluster is still flagged as healthy of course :D
[14:17:41] sounds like a plan
[14:19:59] https://www.irccloud.com/pastebin/wuQ2ekPG/
[14:20:04] all healthy
[14:20:12] let's see if haproxy metrics are still happy
[14:22:09] hmmm nope
[14:22:10] https://grafana.wikimedia.org/goto/VTZsFZsHg?orgId=1
[14:22:20] vgutierrez: agh, apologies - I was on lunch
[14:22:34] hnowlan: no problem
[14:22:39] I indeed forgot codfw, my bad
[14:24:26] sukhe: something is wrong with this chunk of config https://www.irccloud.com/pastebin/j0BZlwBo/
[14:24:35] lol of course
[14:24:43] vgutierrez: hmm acl hc-path path_beg /varnish-fe
[14:24:43] acl hc-unique-path path_beg /varnish-fe-hc-5ebea9
[14:24:44] it isn't hc-host but hc-allowed-src
[14:24:45] * vgutierrez dummy
[14:27:09] sukhe: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1165035
[14:29:03] !
[14:34:25] ran puppet in cp4045
[14:34:30] let's see if the metric recovers tehre
[14:34:32] *there even
[14:34:46] https://grafana.wikimedia.org/goto/fidh5ZsHR?orgId=1
[14:34:48] sweet
[14:35:34] nice
[14:35:48] I'll trigger a puppet run on A:cp-upload
[14:36:04] and let text catch up at the regular pace
[15:24:39] Traffic, SRE Observability, Patch-For-Review, sre-alert-triage: Alert in need of triage: AlertLintProblem (instance localhost:9123) - https://phabricator.wikimedia.org/T396321#10959156 (tappof) Open→Resolved a:tappof
[15:24:51] Traffic, SRE Observability, Patch-For-Review, sre-alert-triage: Alert in need of triage: AlertLintProblem (instance localhost:9123) - https://phabricator.wikimedia.org/T396320#10959159 (tappof) Open→Resolved a:tappof
[15:28:39] netops, fundraising-tech-ops, Infrastructure-Foundations: network and DNS configuration for new eqiad frack pay-lb servers - https://phabricator.wikimedia.org/T397865#10959219 (Jgreen)
[15:52:35] netops, cloud-services-team, DNS, Infrastructure-Foundations, and 2 others: Cloud: define relationship between wikimediacloud.org domain, CIDR prefixes and netbox automation - https://phabricator.wikimedia.org/T266331#10959423 (ayounsi) Open→Declined Closing for now, please reopen if...
[20:05:14] Traffic: 429 Error from cp5022 when accessing Wikimedia project - https://phabricator.wikimedia.org/T397804#10960728 (Aklapper) Hi, which Wikipedias is this about?
[22:01:39] Traffic, Mobile-Content-Service, Wikipedia-Android-App-Backlog: [[2025 Coeur d'Alene shooting]] showing old version in Android app - https://phabricator.wikimedia.org/T398243#10961137 (jeremyb-phone) more tags because this probably isn't an Android bug. I guess something isn't being purged somewhere?...
[22:02:27] Traffic, Page Content Service, Wikipedia-Android-App-Backlog: [[2025 Coeur d'Alene shooting]] showing old version in Android app - https://phabricator.wikimedia.org/T398243#10961140 (jeremyb-phone)