[09:16:57] Traffic, Data-Persistence, MediaViewer, SRE-swift-storage, Thumbor: FY 25/26 WE 5.4.7 Standardize thumbnail sizes - https://phabricator.wikimedia.org/T408062#11532571 (MatthewVernon) There is probably further documentation improvement; the former I've updated with the new set of standard...
[09:19:45] netops, Infrastructure-Foundations, SRE, Data-Platform-SRE (2026.01.05 - 2026.01.23), Essential-Work: Socket leaking on some dse-k8s row C & D hosts - https://phabricator.wikimedia.org/T414460#11532578 (JAllemandou) Unfortunately the problem is not solved as shown in [[ https://grafana.wikime...
[09:32:38] netops, Infrastructure-Foundations, SRE: Offline script - adjust to work with fundraising - https://phabricator.wikimedia.org/T414321#11532621 (ayounsi) a:cmooney→Jclark-ctr @Jclark-ctr we had a look at the decom cookbook and offline script without seeing any smoking gun on why it would misbe...
[09:44:51] Traffic, Data-Engineering, Infrastructure-Foundations, Patch-For-Review: Export development_network_probe data to Puppet servers for CDN deployment - https://phabricator.wikimedia.org/T402512#11532749 (brouberol) @MoritzMuehlenhoff re `cn=ops,ou=groups,dc=wikimedia,dc=org` understood! The LDAP/Ai...
[09:52:04] Traffic, Data-Engineering, Infrastructure-Foundations, Patch-For-Review: Export development_network_probe data to Puppet servers for CDN deployment - https://phabricator.wikimedia.org/T402512#11532864 (brouberol)
[10:41:59] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Servers exposing incorrect LLDP info - https://phabricator.wikimedia.org/T250367#11533161 (ayounsi) > Just to understand this point - do you mean that their firmware doesn't expose them because it is old etc.. or because they are supermic...
[11:26:48] Traffic, Infrastructure-Foundations, serviceops: Ownership of the sre.deploy.hiddenparma cookbook - https://phabricator.wikimedia.org/T383809#11533270 (MLechvien-WMF) AFAICT this does not fall in the scope of cookbooks Serviceops maintains so removing that tag, but please reach out to me if any doubts.
[11:26:59] Traffic, Infrastructure-Foundations: Ownership of the sre.deploy.hiddenparma cookbook - https://phabricator.wikimedia.org/T383809#11533271 (MLechvien-WMF)
[11:36:10] Traffic, Data-Engineering, Infrastructure-Foundations: Export development_network_probe data to Puppet servers for CDN deployment - https://phabricator.wikimedia.org/T402512#11533322 (brouberol) https://airflow-sre.wikimedia.org has been deployed! {F71570205} Give it a good hour for the ATS change...
[13:53:03] dzahn was on PTO this week right?
[13:58:37] FYI, I'm upgrading Bird on the remaining hcaptcha-proxy* hosts to 2.18 (the pops on routed ganeti already use that version)
[13:58:38] yes vgutierrez
[13:59:00] ack... I've cleaned up nft prometheus leftovers on tcp-proxy instances
[13:59:10] `sudo -i cumin 'A:tcpproxy' 'rm /var/lib/prometheus/node.d/firewall-running.prom'`
[13:59:36] I'll forward the info, thanks!
[13:59:37] it was triggering a stale prom alert
[14:07:01] moritzm: thanks!
[14:07:13] let us know when you are done, I can help check quickly
[14:08:02] codfw and ulsfo are done and look good to me so far
[14:08:31] (based on the anycast dashboards)
[14:22:01] moritzm: looks good, thanks!
[14:22:33] ack, I'll resume with eqiad/eqsin/drmrs shortly
[14:33:39] Traffic, DC-Ops, ops-eqsin, SRE: cp5022 is unreachable - https://phabricator.wikimedia.org/T414411#11534114 (ssingh) Hi @RobH: Thanks for following up on this. Any update from the `eqsin` folks?
[15:28:23] Traffic, MW-Interfaces-Team, ServiceOps new, Epic, and 3 others: rest gateway: implement cost-based rate limits - https://phabricator.wikimedia.org/T412586#11534332 (matmarex)
[15:28:29] Traffic, MW-Interfaces-Team, ServiceOps new, Epic, and 3 others: Epic: Enforce API rate limits (WE5.1.3c) - https://phabricator.wikimedia.org/T412585#11534335 (matmarex)
[15:28:39] Traffic, MW-Interfaces-Team, serviceops, Epic, and 3 others: Epic: API Rate Limiting Architecture - https://phabricator.wikimedia.org/T399291#11534337 (matmarex)
[15:44:17] netops, Traffic, Infrastructure-Foundations: magru hosts (erroneously) reported down due to TTL exceeded - https://phabricator.wikimedia.org/T414473#11534422 (cmooney) p:Triage→Medium
[17:03:09] Traffic, MW-Interfaces-Team, ServiceOps new, Epic, and 3 others: Epic: API Rate Limiting Architecture - https://phabricator.wikimedia.org/T399291#11534778 (MLechvien-WMF)
[17:06:35] the hcaptcha update is complete, I'll prep patches to also initially enable 2.18 for the DNS nodes in ulsfo, and when that works fine we can also extend this to DNS at large, ok?
[17:07:11] none of these are on routed Ganeti, but I think it makes sense if we use the same Bird release across all services
[17:08:38] yep
[17:08:39] thanks!
[17:59:36] netops, Traffic, Infrastructure-Foundations: magru hosts (erroneously) reported down due to TTL exceeded - https://phabricator.wikimedia.org/T414473#11534914 (ssingh) No new incidents of this happening observed here as well, thanks!
[18:22:04] Traffic, SRE, Patch-For-Review: Offer AuthDNS service over IPv6 - https://phabricator.wikimedia.org/T81605#11534944 (ssingh) @cmooney: Per the discussion above with Arzhel, we think that `2a02:ec80:53::1/128` is better for readability and consistency with other v6 records, than the current `2a02:ec80:...
[18:58:53] Traffic, SRE, Patch-For-Review: Offer AuthDNS service over IPv6 - https://phabricator.wikimedia.org/T81605#11535000 (cmooney) >>! In T81605#11534944, @ssingh wrote: > @cmooney: Per the discussion above with Arzhel, we think that `2a02:ec80:53::1/128` is better for readability and consistency with oth...
[19:01:20] Traffic, SRE, Patch-For-Review: Offer AuthDNS service over IPv6 - https://phabricator.wikimedia.org/T81605#11535001 (ssingh) Many thanks @cmooney 🙏! I will go ahead with `2620:0:860:53::1/128` for `ns1` and update that everywhere in the current CRs.
[19:43:40] Traffic, SRE, Patch-For-Review: Offer AuthDNS service over IPv6 - https://phabricator.wikimedia.org/T81605#11535037 (cmooney) As discussed I think a good way to bring this live might be: # Update the puppet repo to make the authdns boxes announce the new IPs at all sites # Merge the patch to enable...
[20:02:01] Traffic, SRE, Patch-For-Review: Offer AuthDNS service over IPv6 - https://phabricator.wikimedia.org/T81605#11535054 (ssingh) >>! In T81605#11535037, @cmooney wrote: > As discussed I think a good way to bring this live might be: >[...] Sounds like a plan and it makes sense -- we can test everything o...
[20:57:36] Traffic, Commons: HTTP 429 error on original image requests on Commons (iOS app by default hiding the Referrer header) - https://phabricator.wikimedia.org/T413570#11535105 (Jonesey95) I corresponded with the e-mail address listed on the error page, and I received a response from Giuseppe Lavagetto that t...
[20:58:14] Traffic, Wikimedia-Site-requests, Logos: logos/manage.py failing due to 429 (thumbnail steps) - https://phabricator.wikimedia.org/T414048#11535107 (Jonesey95) I corresponded with the e-mail address listed on the error page, and I received a response from Giuseppe Lavagetto that this was a bug. It was...
[22:00:32] Traffic, MediaWiki-Debug-Logger, SRE, MediaWiki-Platform-Team (Q3 Kanban Board): Pass through information about the client from the CDN to MediaWiki to Logstash - https://phabricator.wikimedia.org/T412396#11535200 (Tgr) Code-wise this is done. Should probably update some dashboards.