[02:54:55] FIRING: VarnishPrometheusExporterDown: Varnish Exporter on instance cp3073:9331 is unreachable - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/000000304/varnish-dc-stats?viewPanel=17 - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[06:06:19] Traffic, SRE: Regression: Reading spam blacklists of all projects suddenly returns status 429 on fifth consecutive read - https://phabricator.wikimedia.org/T369414#9964011 (Count_Count) @bd808 Thank you! I have started using the Wikimedia REST API instead (see https://phabricator.wikimedia.org/T369414#99...
[06:54:55] FIRING: VarnishPrometheusExporterDown: Varnish Exporter on instance cp3073:9331 is unreachable - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/000000304/varnish-dc-stats?viewPanel=17 - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[07:42:01] FIRING: PurgedHighBacklogQueue: Large backlog queue for purged on cp3073:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=esams%20prometheus/ops&var-instance=cp3073 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[07:44:41] ^^ expected
[07:49:14] netops, Traffic, Infrastructure-Foundations, SRE: Do we need ping offload servers at all POPs? - https://phabricator.wikimedia.org/T345809#9964152 (ayounsi) Open→Declined Closing this task as afaik we haven't seen any issue in esams, and the proper path forward is tracked in {T367973}...
[07:49:49] Traffic, SRE: reprovision ping VM in esams - https://phabricator.wikimedia.org/T345743#9964168 (ayounsi) Open→Declined Closing this task as afaik we haven't seen any issue in esams, and the proper path forward is tracked in {T367973}. Please re-open if you disagree.
[07:58:39] Traffic, Data Products, SRE: Data Quality - requestctl not getting set - https://phabricator.wikimedia.org/T342577#9964185 (Joe) Open→Resolved a:Joe I'll boldly consider this task resolved, please reopen it if the problem is still present.
[08:06:09] Traffic, conftool, Sustainability (Incident Followup): Make it easier to create a new requestctl object - https://phabricator.wikimedia.org/T310009#9964221 (Joe)
[08:06:10] Traffic, conftool: [EPIC] FY 24/25 WE 4.3.4 Improve our existing tooling to allow quicker reaction times to ongoing attacks. - https://phabricator.wikimedia.org/T369480#9964222 (Joe)
[08:42:01] RESOLVED: [2x] PurgedHighBacklogQueue: Large backlog queue for purged on cp3073:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=esams%20prometheus/ops&var-instance=cp3073 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[09:30:20] Traffic, conftool: Allow integrating requestctl rules into haproxy - https://phabricator.wikimedia.org/T369606 (Joe) NEW
[09:56:49] Traffic, conftool: Allow integrating requestctl rules into haproxy - https://phabricator.wikimedia.org/T369606#9964737 (Vgutierrez) please note that we won't have visibility via `X-Requestctl` of traffic impacted by these rules in the TLS termination layer till T351117 is completed
[11:55:24] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e3-eqiad - https://phabricator.wikimedia.org/T365995#9965147 (Marostegui) databases are ready
[12:55:46] Hi folks! Just a reminder that some of your cloud-vps VMs are about to age out. Please have a look at https://phabricator.wikimedia.org/T360710 and clean up as needed.
[12:56:13] (Also the entire project is not yet claimed on https://wikitech.wikimedia.org/wiki/News/Cloud_VPS_2024_Purge )
[12:58:02] thx for the heads up andrewbogott
[14:33:47] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e3-eqiad - https://phabricator.wikimedia.org/T365995#9965612 (hnowlan) kubernetes* and mw* are ready
[14:54:07] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad - https://phabricator.wikimedia.org/T365998#9965690 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=6a298ae5-e736-4051-8220-9ec4f352950a) set...
[15:00:26] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad - https://phabricator.wikimedia.org/T365998#9965719 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=39fcbcd0-8c16-4208-ac06-f4b442e55a54) set...
[15:03:19] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad - https://phabricator.wikimedia.org/T365998#9965737 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=2a5cb43e-793c-4103-9499-369354315479) set...
[15:23:29] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad - https://phabricator.wikimedia.org/T365998#9965800 (cmooney) Upgrade complete, all looks good network side at first glance, all online hosts are pingable again.
[15:30:16] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e3-eqiad - https://phabricator.wikimedia.org/T365995#9965849 (cmooney) Switch upgrade completed without issue. All connected hosts are back online and responding to p...
[15:30:40] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e3-eqiad - https://phabricator.wikimedia.org/T365995#9965851 (MatthewVernon) Swift looks OK, thanks.
[15:32:22] Traffic, cloud-services-team, Cloud-VPS (Debian Buster Deprecation): Replace or remove Debian Buster VMs in 'traffic' cloud-vps project - https://phabricator.wikimedia.org/T360710#9965865 (Vgutierrez)
[15:34:02] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e3-eqiad - https://phabricator.wikimedia.org/T365995#9965869 (Marostegui) Repooling databases
[15:58:20] Traffic, SRE, Patch-For-Review: Migrate DNS depooling of sites from operations/dns (git) to confctl - https://phabricator.wikimedia.org/T369366#9966009 (Scott_French) Thanks for the excellent / detailed write-up, @ssingh! I have a couple of concerns with reuse the node entity type to represent geodn...
[16:15:03] Traffic, SRE, Patch-For-Review: Migrate DNS depooling of sites from operations/dns (git) to confctl - https://phabricator.wikimedia.org/T369366#9966053 (ssingh) Thanks for the feedback, @Scott_French! Comments in-line: >>! In T369366#9966009, @Scott_French wrote: > Thanks for the excellent / detaile...
[17:22:24] Traffic, SRE, Patch-For-Review: Migrate DNS depooling of sites from operations/dns (git) to confctl - https://phabricator.wikimedia.org/T369366#9966347 (CDanis) Is your only concern about a new type the bad UX of needing to pass `--object-type`, @ssingh ? If so, I think that we could and should do s...
[17:47:42] Traffic, SRE, Patch-For-Review: Migrate DNS depooling of sites from operations/dns (git) to confctl - https://phabricator.wikimedia.org/T369366#9966434 (ssingh) >>! In T369366#9966347, @CDanis wrote: > Is your only concern about a new type the bad UX of needing to pass `--object-type`, @ssingh ? > >...
[18:00:55] andrewbogott: Thanks for the reminder. Regarding puppetmaster, it seems we're still running buster in prod: What is the expectation for cloud instances?
[18:02:13] you have an old 5.x puppetmaster and a new 7.x puppetserver (which I think I built.) Probably VMs are already migrated over to the new one, let me check...
[18:03:02] Oh yeah, I remember you doing that. Thanks
[18:03:29] yeah, looks like. If you find things using the old one it's easy to switch it over, just add puppetmaster: traffic-puppetserver-bookworm.traffic.eqiad1.wikimedia.cloud in hiera
[18:03:50] Is it breaking your apples:apples comparison with prod to be running puppet 7 in cloud-vps?
[18:04:24] heh, I don't imagine we're actually apples:apples. It was more a curiosity
[18:04:27] it's a bit of a mess in there
[18:04:54] It sounds like we can just can the buster puppetmaster at this point?
[18:05:04] (not sure who's on today so I'm not tagging anyone)
[18:05:25] brett: are we sure that all the other VMs are pointing to the bookworm one?
[18:05:34] It's set in the hiera prefix
[18:05:43] Yeah, shut it down at least and see if anything breaks
[18:05:48] ok then we should purge that as well
[18:05:56] I already removed the dnsbox one since we don't use it
[18:06:59] The cpupload and cptext instances I made long ago aren't pointing to it (they're not pointing to... anything? I think I didn't know what I was doing). Those can probably just be killed off
[18:17:40] Traffic, SRE, Patch-For-Review: Migrate DNS depooling of sites from operations/dns (git) to confctl - https://phabricator.wikimedia.org/T369366#9966509 (ssingh) Reworking my own thought process on this, the current definition for this change (assuming Node) looks like: ` eqiad: geodns: generic...
[18:19:35] andrewbogott: The two buster instances we still have are shut down and will be deleted pending final confirmation with v that it's not needed any more.
[18:19:43] Thanks again for the poke - I had been meaning to do it but it slipped
[18:20:09] thanks for taking care of it!
[18:24:16] Traffic, cloud-services-team, Cloud-VPS (Debian Buster Deprecation): Replace or remove Debian Buster VMs in 'traffic' cloud-vps project - https://phabricator.wikimedia.org/T360710#9966519 (BCornwall) Open→In progress p:Triage→Medium a:BCornwall @Vgutierrez: traffic-cache-atstext-b...
[18:27:13] Traffic, SRE, Patch-For-Review: Migrate DNS depooling of sites from operations/dns (git) to confctl - https://phabricator.wikimedia.org/T369366#9966558 (Scott_French) Thanks, @ssingh! Ah, interesting - I wasn't aware of the prior art with dnsbox. Indeed, reusing node for a fundamentally "host shaped...
[19:29:40] Traffic, SRE, Patch-For-Review: Migrate DNS depooling of sites from operations/dns (git) to confctl - https://phabricator.wikimedia.org/T369366#9966803 (ssingh) >>! In T369366#9966558, @Scott_French wrote: > Thanks, @ssingh! > > Ah, interesting - I wasn't aware of the prior art with dnsbox. Indeed,...
[20:38:49] Traffic, DNS, fundraising-tech-ops, SRE, Patch-For-Review: Cleanup unused DNS subdomains - https://phabricator.wikimedia.org/T367012#9967143 (Dzahn)
[20:39:20] Traffic, DNS, fundraising-tech-ops, SRE, Patch-For-Review: Cleanup unused DNS subdomains - https://phabricator.wikimedia.org/T367012#9967145 (Dzahn) Done with the part I could handle. Just leaves benefactors now.
[21:26:19] HTTPS, Traffic, MediaWiki-Action-API, MediaWiki-REST-API, and 4 others: Proposal: fail explicitly and revoke relevant API keys over plain-text HTTP connection for all Wikimedia APIs - https://phabricator.wikimedia.org/T368344#9967335 (Tgr)
[21:27:46] HTTPS, Traffic, MediaWiki-Action-API, MediaWiki-REST-API, and 4 others: Proposal: fail explicitly and revoke relevant API keys over plain-text HTTP connection for all Wikimedia APIs - https://phabricator.wikimedia.org/T368344#9967332 (Tgr) I think this wouldn't be very useful as a security measur...
[21:49:05] Traffic, SRE, Patch-For-Review: Migrate DNS depooling of sites from operations/dns (git) to confctl - https://phabricator.wikimedia.org/T369366#9967398 (Scott_French) Thanks, @ssingh! In short, and I realize this doesn't help much, my understanding is that what makes sense as an object name vs. an o...
[21:59:09] Traffic, DNS, fundraising-tech-ops, serviceops, SRE: redirect benefactors.wikimedia.org (was: Cleanup unused DNS subdomains) - https://phabricator.wikimedia.org/T367012#9967449 (Dzahn)
[22:01:59] Traffic, DNS, fundraising-tech-ops, serviceops, SRE: redirect benefactors.wikimedia.org (was: Cleanup unused DNS subdomains) - https://phabricator.wikimedia.org/T367012#9967466 (Pppery) T367012#9874025 - the original title of this ticket was to redirect benefactors before I expanded it so we'...
[22:03:06] Traffic, DNS, fundraising-tech-ops, serviceops, SRE: redirect benefactors.wikimedia.org (was: Cleanup unused DNS subdomains) - https://phabricator.wikimedia.org/T367012#9967470 (Dzahn) hah! ;) not wrong though. The other stuff is done and want to clarify what's left:)