[00:29:41] swfrench-wmf: looks good for the failoid bits (I do remember we did it before but not sure for what). I think we can discuss the "hard turn down" bit to avoid alerts. specifically I think we should follow the steps in https://wikitech.wikimedia.org/wiki/LVS#Remove_a_load_balanced_service
[00:30:27] like for example when you remove the DYNAs from the wmnet and the associated mocks, there is an authdns-update step to push the change out
[00:31:14] and then once you remove it from service.yaml (production -> lvs_setup), that then again requires a run on A:dnsbox, but this time an agent run (for the templates)
[00:32:25] anyway, some minor details but no concerns I think for setting the swift-ro failoid bit. and what about swift-rw though?
[00:33:40] nvm, I see the commit message mention and it being a/p and already failoid-ing.
[00:34:30] sukhe: thanks! yeah, we used it for the appservers-ro, and indeed the a-p service is already resolving to failoid :)
[00:35:05] also yeah, happy to discuss the turndown procedure a bit - it's not quite the same as the LVS turndown procedure, since indeed no LVS service is being turned down
[00:35:11] (it's just the discovery service)
[00:35:32] meaning, just with proper sequencing in terms of puppet runs we should be able to avoid any alerts
[00:36:42] that is, aside from the authdns-update in step #1 of the task description in https://phabricator.wikimedia.org/T376237
[00:36:54] that clearly has to happen first :)
[00:37:56] ah, then I misunderstood the final intent. then yeah, no state change required in the service definition at least.
[00:38:37] swfrench-wmf: yeah I think the steps look good but for example I just wanted to make sure that authdns-update is run in #1 since it's not explicitly mentioned. rather, step 2 mentions puppet run on A:dnsbox which is correct
[00:39:02] but step #1 doesn't mention authdns-update explicitly I think and I just wanted to make sure that that was part of it
[00:39:43] thanks for checking! yeah, I took that as implicit that if I'm merging operations/dns changes, then an authdns-update is needed :)
[00:39:55] cool, fair enough.
[00:40:38] there was once upon a time where I did get this wrong and previously someone else did
[00:40:45] hence this https://gerrit.wikimedia.org/r/plugins/gitiles/operations/dns/+/refs/heads/master/utils/mock_etc/discovery-geo-resources#1 :)
[00:41:34] and yes, I did skip over this when I made the mistake but hopefully never again. hence the caution.
[00:43:25] ah yes, re: the service definition - the only changes there are to remove the discovery entries that drive the creation of the confd resources consumed by gdnsd
[00:43:25] so yeah, as long as we do the "reverse" of the turnup where we remove the DYNA records and mocks, and then authdns-update, and only after that do we remove the discovery entries from the service catalog, we should be good
[00:43:56] no, I really appreciate you confirming! it's super easy to get some of the sequencing wrong, so it's appreciated :)
[00:45:21] * swfrench-wmf makes authdns-update runs explicit in the task description
[00:47:44] thanks and no worries, also we have an alert for this anyway now in a way and a very aggressive one at that
[00:48:00] essentially if changes are merged to ops/dns but authdns-update is not run, you will know for sure :)
[00:48:27] but yes I was mostly being explicit given how thorough the rest of your instructions are, I just wanted to make sure this is captured since the order is important
[00:49:18] ah, right! yes, as someone who has had to roll back operations/dns changes before due to finding not-yet-deployed changes, I'm glad to hear that alert exists
[00:50:29] :D
[00:50:43] anyway gl for the change -- seems certainly worthwhile!
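The turndown ordering agreed on in the discussion above (remove the DYNA records and mocks from operations/dns, run authdns-update, only then remove the discovery entries from the service catalog, followed by a puppet agent run on A:dnsbox) can be expressed as a simple precedence check. This is a minimal, hypothetical Python sketch: the step labels are illustrative strings, not real commands, cookbooks, or the wording of the task description, and it only checks relative order, not whether each step actually succeeded.

```python
# Hypothetical sketch: check that a discovery-service turndown plan follows the
# ordering discussed in the log. Step labels are illustrative, not real commands.

TURNDOWN_STEPS = [
    "remove DYNA records and mocks from operations/dns",
    "authdns-update",
    "remove discovery entries from the service catalog",
    "puppet agent run on A:dnsbox",
]

# Each (before, after) pair means `before` must happen earlier than `after`.
REQUIRED_ORDER = list(zip(TURNDOWN_STEPS, TURNDOWN_STEPS[1:]))


def ordering_violations(plan: list[str]) -> list[str]:
    """Return a description of every missing step and every broken precedence rule."""
    required = {step for pair in REQUIRED_ORDER for step in pair}
    # Report required steps that do not appear in the plan at all.
    problems = [f"missing step: {step}" for step in sorted(required - set(plan))]
    # Map each step to its position; a repeated step keeps its last position.
    position = {step: i for i, step in enumerate(plan)}
    for before, after in REQUIRED_ORDER:
        if before in position and after in position and position[before] > position[after]:
            problems.append(f"{before!r} must come before {after!r}")
    return problems
```

For example, a plan that removes the catalog entries before running authdns-update reports exactly that rule as broken, while the plan in `TURNDOWN_STEPS` itself reports no problems. Checking only adjacent pairs keeps the rules readable; a real plan with optional or parallel steps would need a proper dependency graph instead.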
[02:52:54] Traffic, Community-Tech (Sea Lion Squad), MediaWiki-Platform-Team (Radar), SEO: Suppress mobile redirect for Googlebot Smartphone on Commons - https://phabricator.wikimedia.org/T397267#10940606 (tstarling) Our webrequest metrics confirm that we're not sending redirects to Googlebot anymore. There...
[07:44:02] Domains, Traffic, cloud-services-team, IPv6: Add IPv6 glue records for WMCS Designate-hosted domains - https://phabricator.wikimedia.org/T397185#10940955 (taavi) Open→Resolved a:ssingh Thanks! Everything looks fine from my end so closing.
[08:07:12] Traffic: varnish 7.1.1-2~bpo11+wmf1 crash - https://phabricator.wikimedia.org/T396581#10941004 (Vgutierrez) it looks like the latest update of the description of the task is wrong, eqiad was flagged as upgraded when actually codfw was upgraded
[08:07:17] Traffic: varnish 7.1.1-2~bpo11+wmf1 crash - https://phabricator.wikimedia.org/T396581#10941005 (Vgutierrez)
[08:31:19] Traffic, Experimentation Lab: libvmod_wmfuniq: add stats counter for cookie values of incorrect length - https://phabricator.wikimedia.org/T394862#10941066 (Vgutierrez) Open→Resolved this is currently being deployed in the CDN, all DCs got this but eqiad at this point
[09:59:16] Traffic, Hiddenparma, Patch-For-Review: Requestctl should use x-provenance header - https://phabricator.wikimedia.org/T396621#10941438 (ops-monitoring-bot) Deployed hiddenparma to alert[1002,2002].wikimedia.org with reason: x-provenance support - fabfur@cumin1002 - T396621
[10:17:05] Traffic, Phabricator, Release-Engineering-Team: Phabricator videos fail in Firefox ("Range" request gets 503 from Varnish) - https://phabricator.wikimedia.org/T397661#10941540 (Vgutierrez) FWIW I can't reproduce on Linux with firefox or chrome. With the provided curl reproducer I can see how the requ...
[10:19:06] Traffic, Liberica, Patch-For-Review: Switch to katran as forwarding plane on non-core DCs - https://phabricator.wikimedia.org/T396561#10941548 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=1d1b3cdd-4a2d-4663-a715-fdbe776f5534) set by vgutierrez@cumin1002 for 1 day, 0:00:00 on 1 ho...
[10:33:22] Traffic, Hiddenparma, Patch-For-Review: Requestctl should use x-provenance header - https://phabricator.wikimedia.org/T396621#10941679 (ops-monitoring-bot) Deployed hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix previous release - fabfur@cumin1002 - T396621
[12:05:55] Traffic, MW-on-K8s, serviceops, SRE, Release-Engineering-Team (Seen): Deploy mediawiki kubernetes services - https://phabricator.wikimedia.org/T321786#10942023 (Aklapper) a:Clement_Goubert→None @Clement_Goubert Removing task assignee as this open task has been assigned for more than t...
[12:07:39] Traffic, SRE: Wikidough: Support EDNS(0) Padding: RFC 7830 and RFC 8467 - https://phabricator.wikimedia.org/T274431#10942073 (Aklapper) a:ssingh→None @ssingh Removing task assignee as this open task has been assigned for more than two years - See the email sent on 2025-05-22. Please assign this t...
[12:21:10] Traffic, Beta-Cluster-Infrastructure, Data-Persistence, SRE: ATS isn't caching documents in deployment-cache-upload07 - https://phabricator.wikimedia.org/T322575#10942159 (Aklapper) a:Vgutierrez→None @Vgutierrez: Removing task assignee as this open task has been assigned for more than two...
[12:25:36] Wikimedia-Apache-configuration, serviceops, SRE: Incorrect handling of ETags taking precedence over timestamps in conditional requests - https://phabricator.wikimedia.org/T320241#10942309 (Aklapper) a:jijiki→None @jijiki: Removing task assignee as this open task has been assigned for more tha...
[12:28:37] netops, Infrastructure-Foundations, SRE: Store network users in Bitu/LDAP - https://phabricator.wikimedia.org/T335870#10942412 (Aklapper) a:SLyngshede-WMF→None @SLyngshede-WMF: Removing task assignee as this open task has been assigned for more than two years - See the email sent on 2025-05-2...
[12:35:26] Traffic, Liberica, Patch-For-Review: Switch to katran as forwarding plane on non-core DCs - https://phabricator.wikimedia.org/T396561#10942585 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=33987842-29ae-497b-b1d4-33ecbdd1ee31) set by vgutierrez@cumin1002 for 1 day, 0:00:00 on 1 ho...
[12:55:38] Traffic, Liberica, Patch-For-Review: Switch to katran as forwarding plane on non-core DCs - https://phabricator.wikimedia.org/T396561#10942679 (Vgutierrez)
[13:06:43] Traffic, MW-on-K8s, serviceops, SRE, Release-Engineering-Team (Seen): Deploy mediawiki kubernetes services - https://phabricator.wikimedia.org/T321786#10942721 (Clement_Goubert) a:Clement_Goubert
[13:08:28] Traffic, Liberica, Patch-For-Review: Switch to katran as forwarding plane on non-core DCs - https://phabricator.wikimedia.org/T396561#10942740 (Vgutierrez)
[13:23:06] Traffic: Upgrade to ATS 9.2.11 - https://phabricator.wikimedia.org/T397456#10942799 (ssingh)
[13:32:19] Traffic, MW-on-K8s, serviceops, SRE, Release-Engineering-Team (Seen): Deploy mediawiki kubernetes services - https://phabricator.wikimedia.org/T321786#10942871 (Clement_Goubert) In progress→Resolved
[14:18:32] Traffic, Hiddenparma, Patch-For-Review: Requestctl should use x-provenance header - https://phabricator.wikimedia.org/T396621#10943078 (Fabfur)
[14:53:24] Traffic: Upgrade to ATS 9.2.11 - https://phabricator.wikimedia.org/T397456#10943215 (ssingh)
[15:18:14] Traffic: wmfuniq-keygen: Install to /usr/bin, not /usr/sbin - https://phabricator.wikimedia.org/T392937#10943377 (BCornwall) In progress→Resolved
[15:29:41] Traffic: Upgrade to ATS 9.2.11 - https://phabricator.wikimedia.org/T397456#10943406 (ssingh)
[16:27:07] Traffic: Upgrade to ATS 9.2.11 - https://phabricator.wikimedia.org/T397456#10943711 (BCornwall) Open→In progress a:BCornwall
[16:54:12] Traffic: varnish 7.1.1-2~bpo11+wmf1 crash - https://phabricator.wikimedia.org/T396581#10943900 (Vgutierrez) In progress→Resolved
[18:02:20] Traffic: Upgrade to ATS 9.2.11 - https://phabricator.wikimedia.org/T397456#10944212 (ssingh)
[18:30:18] greetings, traffic friends - as usual, I come to you with news of me doing weird things :)
[18:30:18] tl;dr - some time after 13:00 UTC tomorrow, I'd like to begin work on the first part of T352245: migrating the TLS proxy in front of etcd to cfssl/PKI certificates [0].
[18:30:18] while I've chatted with v.gutierrez about the liberica control-plane bits, this also involves pybal and various confd restarts, so I wanted to broadcast more widely here.
[18:30:18] in any case, feel free to reach out if you want to chat about any of this or tell me to get lost. as usual, I'll be loud about how things are progressing tomorrow.
[18:30:18] [0] https://phabricator.wikimedia.org/T352245#10935894
[18:30:19] T352245: Migrate the etcd main cluster to cfssl-based PKI - https://phabricator.wikimedia.org/T352245
[18:34:46] swfrench-wmf: thanks for checking, there should be no work on our end that would interfere with this but yeah, check once before starting if you'd like
[18:37:26] sukhe: great, thank you very much
[20:57:15] Traffic: Upgrade to ATS 9.2.11 - https://phabricator.wikimedia.org/T397456#10944931 (CDobbins)