[00:36:25] Traffic, MobileFrontend, Reader Experience Team: Decide how to configure $wgMobileUrlCallback during mobile domain sunset - https://phabricator.wikimedia.org/T400855 (Krinkle) NEW
[01:07:55] Traffic, MobileFrontend, Reader Experience Team: Decide how to configure $wgMobileUrlCallback during mobile domain sunset - https://phabricator.wikimedia.org/T400855#11048762 (Krinkle) In reviewing [text-frontend.inc.vcl.erb](https://gerrit.wikimedia.org/g/operations/puppet/+/c334b0dc5e1284be8c6796d6...
[05:28:16] Traffic, SRE: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11048975 (Joe) >>! In T400119#11047688, @DavidBrooks wrote: > @Joe I wasn't addressing AWB used as a bot, but as an interactive Windows app. Still, the rest of your comment seems applicable. T...
[07:37:59] Traffic, SRE, User-notice: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11049050 (jeremyb-phone) please send advanced notice of changes like this to wikitech-ambassadors and make sure it gets tagged #User-notice on phab. thank you!
[11:21:10] Traffic, SRE, User-notice: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11049531 (Joe)
[11:25:26] Traffic, SRE, User-notice: Make InstantCommons and other uses of ForeignApiRepo use policy-compliant user agents - https://phabricator.wikimedia.org/T400881 (Joe) NEW
[11:26:13] Traffic, MediaWiki-File-management, SRE, User-notice: Make InstantCommons and other uses of ForeignApiRepo use policy-compliant user agents - https://phabricator.wikimedia.org/T400881#11049563 (Joe) a:Joe→None
[12:06:18] Traffic, MediaWiki-File-management, SRE, User-notice: Make InstantCommons and other uses of ForeignApiRepo use policy-compliant user agents - https://phabricator.wikimedia.org/T400881#11049785 (Peachey88)
[12:09:37] Traffic, MediaWiki-File-management, SRE: Make InstantCommons and other uses of ForeignApiRepo use policy-compliant user agents - https://phabricator.wikimedia.org/T400881#11049796 (taavi) (#user-notice, not relevant to editors of Wikimedia wikis.)
[12:13:42] netops, cloud-services-team, Cloud-VPS, Infrastructure-Foundations, SRE: Use vlan trunking instead of multiple physical interfaces - https://phabricator.wikimedia.org/T316114#11049843 (taavi) Open→Resolved I /think/ this is done for cloudvirts and ceph nodes are tracked separately...
[12:41:46] Traffic: New software: ProxyTester - https://phabricator.wikimedia.org/T400244#11049956 (Fabfur) >>! In T400244#11044566, @CDanis wrote: > Just a note that the existing config validity tests in puppet `modules/profile/files/cache/haproxy/tests` have been broken by `bullseye-backports` no longer existing on m...
[12:55:09] Traffic, MediaWiki-extensions-QuickInstantCommons, MediaWiki-File-management, SRE: Make InstantCommons and other uses of ForeignApiRepo use policy-compliant user agents - https://phabricator.wikimedia.org/T400881#11050004 (A_smart_kitten) (assuming this will also apply to QuickInstantCommons, rem...
[14:35:50] Traffic, MediaWiki-extensions-QuickInstantCommons, MediaWiki-File-management, SRE: Make InstantCommons and other uses of ForeignApiRepo use policy-compliant user agents - https://phabricator.wikimedia.org/T400881#11050371 (Bawolff) > Provide a way to override the UA used by ForeignFileRepo to add...
[14:43:40] FIRING: VarnishHighThreadCount: Varnish's thread count on cp5032:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5032 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[14:48:40] FIRING: VarnishHighThreadCount: Varnish's thread count on cp5032:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5032 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[14:48:48] oh hello
[15:00:27] expensive requests?
[15:02:27] yeah pretty much. a specific IP but should be recovering soon.
[15:03:34] not a specific IP
[15:03:49] the whole cluster got impacted
[15:04:41] but only cp5032 was completely overwhelmed
[15:04:41] oh OK, I was just looking at cp5032 since it alerted above
[15:04:46] to the point that it got depooled
[15:04:56] vgutierrez@lvs5005:~$ journalctl -u liberica-cp.service --since=today
[15:04:56] Jul 31 14:31:13 lvs5005 libericad[438785]: time=2025-07-31T14:31:13.208Z level=INFO msg="detected healthcheck state change" service=upload-httpslb_443 hostname=cp5032.eqsin.wmnet address=10.132.0.16 healthcheck_name=HTTPCheck healthcheck_id=3133991392 healthcheck_result_old=true healthcheck_result=false
[15:04:56] Jul 31 14:31:13 lvs5005 libericad[438785]: time=2025-07-31T14:31:13.208Z level=INFO msg="detected healthcheck state change" service=upload-httpslb6_443 hostname=cp5032.eqsin.wmnet address=2001:df2:e500:101:10:132:0:16 healthcheck_name=HTTPCheck healthcheck_id=141935780 healthcheck_result_old=true healthcheck_result=false
[15:04:56] Jul 31 14:31:18 lvs5005 libericad[438785]: time=2025-07-31T14:31:18.211Z level=INFO msg="detected healthcheck state change" service=upload-httpslb6_443 hostname=cp5032.eqsin.wmnet address=2001:df2:e500:101:10:132:0:16 healthcheck_name=HTTPCheck healthcheck_id=141935780 healthcheck_result_old=false healthcheck_result=true
[15:04:56] Jul 31 14:31:18 lvs5005 libericad[438785]: time=2025-07-31T14:31:18.213Z level=INFO msg="detected healthcheck state change" service=upload-httpslb_443 hostname=cp5032.eqsin.wmnet address=10.132.0.16 healthcheck_name=HTTPCheck healthcheck_id=3133991392 healthcheck_result_old=false healthcheck_result=true
[15:04:58] and within that the specific IP I was talking about
[15:05:20] vgutierrez: interesting. should we have an alert for this I wonder?
[15:05:26] an alert for what?
[15:05:31] a single cp host being depooled?
[15:05:31] I know we don't do that for Pybal
[15:05:46] not single, maybe some threshold
[15:06:02] meeting :D
[15:06:44] essentially the only visibility there is to this is through the journal for liberica-cp.
[15:06:54] that's not true
[15:06:55] but I also don't know if there is value in alerting specifically
[15:06:56] you got metrics as well
[15:07:24] yeah but right now in the absence of alerts using those metrics
[15:07:55] anyway let's talk later, enjoy the meeting
[15:18:40] FIRING: [2x] VarnishHighThreadCount: Varnish's thread count on cp5032:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5032 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:38:40] RESOLVED: VarnishHighThreadCount: Varnish's thread count on cp5032:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/wiU3SdEWk/cache-host-drilldown?viewPanel=99&var-site=eqsin&var-instance=cp5032 - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:39:28] Traffic, SRE, User-notice: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11050767 (DavidBrooks) >>! In T400119#11048975, @Joe wrote: > Oh sorry, I've never used AWB (or windows, in the last few decades). If it's a Windows application, as in used by...
[16:11:43] Traffic, HaproxyKafka, Patch-For-Review: HaproxyKafka alert on too many dropped messages - https://phabricator.wikimedia.org/T400684#11051042 (Fabfur) Open→Resolved a:Fabfur
[16:12:51] Traffic, SRE, User-notice: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11051059 (Alien333) Where does UAs like `MediaWiki-JS/1.45.0-wmf.12`, the defaults used by a plain `new mw.Api()` in an on-wiki script, stand with this? (If they're going to...
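The 15:05 exchange suggests alerting when some threshold of cp hosts is depooled. It treats the liberica-cp journal as one source for this and notes that metrics are another. Below is a minimal sketch of the journal-based idea. It assumes that the logfmt layout of the four libericad lines pasted at 15:04:56 holds for every service and host.

The sketch replays the "detected healthcheck state change" events and keeps the last result per (service, hostname). It then flags services where the number of currently failing hosts reaches a threshold. The function names and the count-based threshold rule are hypothetical illustrations. They are not part of liberica or of any existing alert. An alerting rule built on the metrics mentioned at 15:06:56 would likely be the more practical route.

```python
import re
from collections import defaultdict

# Matches logfmt-style key=value pairs; a value is either double-quoted or a run of non-space characters.
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line):
    """Return the key=value fields of one libericad journal line as a dict (quotes stripped)."""
    fields = {}
    for key, value in PAIR_RE.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def failing_hosts(lines):
    """Replay healthcheck state changes in order.

    Returns {service: set of hostnames whose most recent healthcheck_result was false}.
    Assumes the field names seen in the pasted lines (service, hostname, healthcheck_result).
    """
    latest = {}  # (service, hostname) -> True if the last reported check passed
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("msg") != "detected healthcheck state change":
            continue
        latest[(fields["service"], fields["hostname"])] = fields["healthcheck_result"] == "true"
    down = defaultdict(set)
    for (service, host), healthy in latest.items():
        if not healthy:
            down[service].add(host)
    return dict(down)


def services_over_threshold(lines, threshold):
    """Hypothetical alert rule: services with at least `threshold` hosts currently failing."""
    return {svc: hosts for svc, hosts in failing_hosts(lines).items() if len(hosts) >= threshold}
```

With all four pasted lines as input, `failing_hosts` returns an empty dict, because cp5032 passed its checks again on both services five seconds later. With only the first two lines, it reports cp5032.eqsin.wmnet as failing for both upload-httpslb_443 and upload-httpslb6_443.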
[16:26:08] Traffic, MediaWiki-Platform-Team, WikimediaDebug, Patch-For-Review: X-Wikimedia-Debug cookie not routed correctly in Kubernetes on POST requests - https://phabricator.wikimedia.org/T397439#11051132 (Krinkle)
[16:59:57] Traffic, Patch-For-Review: Migrate MarkMonitor redirection services over to ncredir - https://phabricator.wikimedia.org/T400731#11051321 (BCornwall) Open→In progress
[17:02:04] Traffic, MediaWiki-extensions-QuickInstantCommons, MediaWiki-File-management, SRE: Make InstantCommons and other uses of ForeignApiRepo use WMF policy-compliant user agents - https://phabricator.wikimedia.org/T400881#11051324 (Reedy)
[17:04:18] Traffic, SRE, User-notice: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11051328 (jeremyb-phone) >>! In T400119#11051059, @Alien333 wrote: > Where does UAs like `MediaWiki-JS/1.45.0-wmf.12`, the defaults used by a plain `new mw.Api()` in an on-wik...
[17:31:17] Traffic, DNS, FR-donorrelations, SRE: Custom URL for survey pop-up - https://phabricator.wikimedia.org/T400278#11051448 (CDobbins)
[17:35:06] Traffic, MediaWiki-Platform-Team, WikimediaDebug: X-Wikimedia-Debug cookie not routed correctly in Kubernetes on POST requests - https://phabricator.wikimedia.org/T397439#11051473 (Krinkle) p:Triage→Medium a:Tgr
[17:36:45] Traffic, MediaWiki-Platform-Team, WikimediaDebug: X-Wikimedia-Debug cookie not routed correctly in Kubernetes on POST requests - https://phabricator.wikimedia.org/T397439#11051483 (Krinkle) `lang=irc,name=wikimedia-sre Krinkle: I can simply merge that one if that's fine. (doing now) <...
[17:38:38] Traffic, Phabricator, SRE: traffic from Discord and Slack unfurler service is blocked by phabricator.wikimedia.org - https://phabricator.wikimedia.org/T400540#11051487 (CDobbins)
[19:23:52] Traffic, Phabricator, SRE: traffic from Discord and Slack unfurler service is blocked by phabricator.wikimedia.org - https://phabricator.wikimedia.org/T400540#11051804 (ssingh) Hi, thanks for reporting @Novem_Linguae. The issue should be resolved now; I did a quick test but let us know if there is an...
[19:28:35] Traffic, Phabricator, SRE: traffic from Discord and Slack unfurler service is blocked by phabricator.wikimedia.org - https://phabricator.wikimedia.org/T400540#11051814 (Michael) Woohoo! Can confirm! Thank you so much @Novem_Linguae, @ssingh and Traffic Team! 🏆
[20:10:52] Traffic, MobileFrontend, Reader Experience Team, Patch-For-Review: Decide how to configure $wgMobileUrlCallback during mobile domain sunset - https://phabricator.wikimedia.org/T400855#11051896 (Krinkle)
[20:11:48] Traffic, MobileFrontend, Reader Experience Team, Patch-For-Review: Decide how to configure $wgMobileUrlCallback during mobile domain sunset - https://phabricator.wikimedia.org/T400855#11051897 (Krinkle)
[20:44:50] Traffic, Phabricator, SRE: traffic from Discord and Slack unfurler service is blocked by phabricator.wikimedia.org - https://phabricator.wikimedia.org/T400540#11051974 (AntiCompositeNumber) Still no cards on Discord, including brand new tasks.
[21:29:21] Traffic, Phabricator, SRE: traffic from Discord and Slack unfurler service is blocked by phabricator.wikimedia.org - https://phabricator.wikimedia.org/T400540#11052148 (ssingh) >>! In T400540#11051974, @AntiCompositeNumber wrote: > Still no cards on Discord, including brand new tasks. Sorry about th...
[22:24:48] Traffic, SRE, User-notice: Block traffic from user-agents not honoring our policy - https://phabricator.wikimedia.org/T400119#11052410 (Alien333) >>! In T400119#11051328, @jeremyb-phone wrote: > if they're requests in response to user clicks then they should be fine. And if not? For instance, for scr...