[00:00:37] 10netops, 06Infrastructure-Foundations, 06SRE: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#10005374 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=23e26d8b-bf98-4528-9f4f-f796eb123261) set by cmooney@cumin1002 for 0:15:00 on 1 host(s) and th...
[00:02:19] 10netops, 06Infrastructure-Foundations, 06SRE: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#10005377 (10ops-monitoring-bot) VM netflow2003.codfw.wmnet rebooted by cmooney@cumin1002 with reason: increase VM RAM
[04:48:23] 06Traffic, 06Data Products, 06Data-Engineering, 10Observability-Logging: New software: haproxykafka - https://phabricator.wikimedia.org/T370668#10005622 (10Fabfur) >>! In T370668#10003489, @Ottomata wrote: > I might be out of my league here, but have yall considered the [[ https://www.haproxy.com/blog/exte...
[04:55:13] 06Traffic, 06Data Products, 06Data-Engineering, 10Observability-Logging: Remove Benthos from ulsfo hosts - https://phabricator.wikimedia.org/T370741 (10Fabfur) 03NEW
[06:05:16] 06Traffic, 10conftool, 13Patch-For-Review: Allow integrating requestctl rules into haproxy - https://phabricator.wikimedia.org/T369606#10005674 (10Joe)
[07:20:29] 06Traffic, 10conftool, 13Patch-For-Review: Allow integrating requestctl rules into haproxy - https://phabricator.wikimedia.org/T369606#10005723 (10Joe) >>! In T369606#9985617, @CDanis wrote: > As @Fabfur points out, in haproxy 3.0+ (but not haproxy 2.8.x) we have the option of evaluating many ACLs together w...
[07:28:21] 06Traffic, 10conftool: Integrate requestctl haproxy rules into our TLS terminator - https://phabricator.wikimedia.org/T370745 (10Joe) 03NEW
[07:42:07] 06Traffic, 10conftool: Integrate requestctl haproxy rules into our TLS terminator - https://phabricator.wikimedia.org/T370745#10005760 (10Fabfur) If the requestctl rules are defined in a separate backend, they obviously need to be evaluated strictly after the ones in frontend (so, they are necessarily last on...
[08:28:35] 10Wikimedia-Apache-configuration, 06collaboration-services, 10Phabricator, 10Release-Engineering-Team (Priority Backlog 📥), and 3 others: Apache 2.4.61 throws a 403 Forbidden for links containing %3F - https://phabricator.wikimedia.org/T370110#10005864 (10hashar) Is the `B` flag the reason the issue trigge...
[08:58:09] 06Traffic, 06Movement-Insights: Disable Chrome Private Prefetch Proxy - https://phabricator.wikimedia.org/T364126#10005922 (10OSefu-WMF) Despite disabling prefetch using Google's methodology, we continue to receive ~150-200k requests per day that have Google's prefetch header. Many of these requests come from...
[09:07:12] 06Traffic, 06Movement-Insights: Disable Chrome Private Prefetch Proxy - https://phabricator.wikimedia.org/T364126#10005950 (10OSefu-WMF)
[09:10:10] 06Traffic, 06Movement-Insights: Disable Chrome Private Prefetch Proxy - https://phabricator.wikimedia.org/T364126#10005956 (10OSefu-WMF) 05In progress→03Resolved Closing this task as implementation is complete. Continuing impact monitoring here - T370750
[09:36:18] 06Traffic, 10conftool: Integrate requestctl haproxy rules into our TLS terminator - https://phabricator.wikimedia.org/T370745#10006006 (10Vgutierrez) requestctl integration could be a great candidate to write a custom SPOA (Stream Processing Offload Agent) that handles all the requestctl rules and returns an...
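The Icinga downtime / Alertmanager silence entries above (e.g. the 00:00:37 one) are normally produced by the sre.hosts.downtime cookbook run from a cumin host. A minimal sketch, assuming the flag names shown (--minutes, --reason); check `cookbook sre.hosts.downtime --help` before relying on them:

    # Set a 15-minute Icinga downtime + Alertmanager silence for one host,
    # mirroring the 00:00:37 log entry (flag names are assumptions).
    sudo cookbook sre.hosts.downtime --minutes 15 \
        --reason "increase VM RAM" 'netflow2003.codfw.wmnet'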
[11:18:44] 10netops, 06Infrastructure-Foundations, 06SRE: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#10006345 (10cmooney) So, we hit a bit of a speed-bump in codfw with the gnmic stats once the new switches were made live there. We now have 36 active gnmic subscriptions...
[13:11:50] 10netops, 06Infrastructure-Foundations, 06SRE: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#10006695 (10ops-monitoring-bot) VM netflow1002.eqiad.wmnet rebooted by cmooney@cumin1002 with reason: increase VM RAM
[13:16:50] 10netops, 06Infrastructure-Foundations, 06SRE: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#10006699 (10ops-monitoring-bot) VM netflow3003.esams.wmnet rebooted by cmooney@cumin1002 with reason: increase VM RAM
[13:24:10] 10netops, 06Infrastructure-Foundations, 06SRE: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#10006727 (10ops-monitoring-bot) VM netflow4002.ulsfo.wmnet rebooted by cmooney@cumin1002 with reason: increase VM RAM
[13:24:27] 10netops, 06Infrastructure-Foundations, 06SRE: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#10006728 (10ops-monitoring-bot) VM netflow5002.eqsin.wmnet rebooted by cmooney@cumin1002 with reason: increase VM RAM
[13:30:48] 10netops, 06Infrastructure-Foundations, 06SRE: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#10006766 (10ops-monitoring-bot) VM netflow6001.drmrs.wmnet rebooted by cmooney@cumin1002 with reason: increase VM RAM
[13:33:50] 10netops, 06Infrastructure-Foundations, 06SRE: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#10006779 (10cmooney) In Eqiad our netflow VM was also running a little hot, and swapping to disk. I've now increased the resources for it and also the other netflow VMs i...
[13:34:50] 10netops, 06Infrastructure-Foundations, 06SRE: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#10006783 (10ops-monitoring-bot) VM netflow7001.magru.wmnet rebooted by cmooney@cumin1002 with reason: increase VM RAM
[13:37:47] 10netops, 06Infrastructure-Foundations, 06SRE: Set Leaf switches in Codfw rows C & D to active and make new vlans live - https://phabricator.wikimedia.org/T370629#10006786 (10cmooney) 05Open→03Resolved All actions complete. @papaul, @Jhancock.wm please note that after this change if running the netb...
[14:33:00] 06Traffic, 06SRE, 13Patch-For-Review: Show a better error page when returning an HTTP 429, not the "Our servers are currently under maintenance" one for 5xxs - https://phabricator.wikimedia.org/T354718#10007033 (10CDobbins) This has been deployed as of 14:25 on 7/23/24, with CR #1041705. 1. I added a n...
[14:33:18] 06Traffic, 06SRE, 13Patch-For-Review: Show a better error page when returning an HTTP 429, not the "Our servers are currently under maintenance" one for 5xxs - https://phabricator.wikimedia.org/T354718#10007036 (10CDobbins) 05Open→03Resolved a:03CDobbins
[14:46:25] 10netops, 06Traffic, 06Infrastructure-Foundations, 06SRE: Upgrade anycast-healthchecker to 0.9.8 (from 0.9.1-1+wmf12u1) - https://phabricator.wikimedia.org/T370068#10007060 (10ssingh) On `dns6001`, we have anycast-hc 0.9.8 running with the patch to change the logging level to WARN for when a service is dow...
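The "increase VM RAM" reboots above correspond to resizing Ganeti VMs. A minimal sketch of the usual sequence on a Ganeti cluster; the instance name is from the log, but the memory target is an assumption (the actual value used is not recorded here):

    # Grow a Ganeti instance's memory, then reboot so the new size takes effect.
    # memory is given in MiB; 8192 is an assumed target, not from the log.
    sudo gnt-instance modify -B memory=8192 netflow1002.eqiad.wmnet
    sudo gnt-instance reboot netflow1002.eqiad.wmnet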
[14:49:00] 10netops, 06Data-Persistence, 06DBA, 06Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad - https://phabricator.wikimedia.org/T365998#10007067 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=85a0a04b-e091-4107-9bc3-7c9ca22300c8) se...
[14:57:07] 10netops, 06Data-Persistence, 06DBA, 06Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad - https://phabricator.wikimedia.org/T365998#10007073 (10MatthewVernon) @cmooney Swift (ms-be) and Ceph (moss-be) ready when you are.
[15:01:15] 10netops, 06Data-Persistence, 06DBA, 06Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad - https://phabricator.wikimedia.org/T365998#10007080 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=71f4229e-483c-4848-9bc3-6926b62b02ae) se...
[15:01:45] 10netops, 06Data-Persistence, 06DBA, 06Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad - https://phabricator.wikimedia.org/T365998#10007081 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=18d9056a-9166-4006-b516-a07496523fd2) se...
[15:21:36] 10netops, 06Data-Persistence, 06DBA, 06Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad - https://phabricator.wikimedia.org/T365998#10007228 (10cmooney) Upgrade complete, things look ok network-wise and all hosts are back pinging again. Thanks all f...
[15:26:44] 10netops, 06Data-Persistence, 06DBA, 06Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad - https://phabricator.wikimedia.org/T365998#10007276 (10MatthewVernon) Both Ceph and Swift back to normal, thanks.
[16:05:41] hello traffic - FYI, in about an hour, I'll be starting to turn down the api-https and appservers-https LVS services following the instructions in https://wikitech.wikimedia.org/wiki/LVS#Remove_a_load_balanced_service.
[16:05:41] I plan to use the restart-pybal cookbook in step #4, which looks pretty straightforward, but I might pop in here occasionally to confirm some things as I go :)
[16:06:29] swfrench-wmf: works for us, happy to
[16:18:07] 06Traffic, 10MW-on-K8s, 06serviceops, 06SRE, and 2 others: Spin down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949#10007582 (10Scott_French) Silenced ProbeDown for api-https:443 and appservers-https:443 for 24h: * f6f67d8d-6381-43b3-9262-9a8cf58f2b19 * ed0d352b-fb83-4bd4-...
[16:22:09] swfrench-wmf: I updated the instructions a bit, please try them and let me know if they can be improved
[16:22:20] so please refresh the page above (https://wikitech.wikimedia.org/wiki/LVS#Remove_the_service_from_the_load-balancers_and_the_backend_servers) :)
[16:23:32] 10netops, 06Data-Persistence, 06DBA, 06Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad - https://phabricator.wikimedia.org/T365998#10007599 (10cmooney) 05Open→03Resolved
[16:24:27] 10netops, 06Infrastructure-Foundations, 06SRE: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - https://phabricator.wikimedia.org/T348977#10007601 (10cmooney) 05Open→03Resolved
[16:30:06] sukhe: ah, thanks for letting me know! I'll take a look now
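For readers following along, the turn-down sequence discussed here is roughly: merge the change, run the puppet agent on the LVS hosts, then restart pybal one site at a time, secondary/backup balancer first. A minimal sketch from a cumin host; the cookbook path, its arguments, and the lvs2013/lvs2011 backup/primary pair are all assumptions, so treat the wikitech page above as authoritative:

    # One site at a time, backup balancer before primary.
    sudo cumin 'lvs2*' 'run-puppet-agent'   # pick up the de-configured services
    sudo cookbook sre.loadbalancer.restart-pybal 'lvs2013.codfw.wmnet'   # backup first
    sudo cookbook sre.loadbalancer.restart-pybal 'lvs2011.codfw.wmnet'   # then primary
    # then repeat the same order for eqiad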
[16:36:31] sukhe: cool, so you transposed it, so that it's only one DC at a time (i.e., rather than both backups in both DCs, then primaries in both DCs)
[16:36:42] that makes sense and SGTM :)
[16:37:45] would it be alright to flip the order, so that I'm doing codfw first, then eqiad?
[16:37:47] swfrench-wmf: yep.
[16:38:01] swfrench-wmf: as long as you hit the secondary/backup first, I think either is fine
[16:38:17] sukhe: great, thanks!
[16:40:47] sukhe: one additional question, for the `ipvsadm` cleanup step (#6), presumably it makes sense to do this with the same ordering, right?
[16:41:13] swfrench-wmf: yes, good point. I will add it to the notes
[16:41:52] sukhe: oh, great - thanks! I can also do so after the dust settles :)
[16:42:25] so in theory, it shouldn't matter as long as the agent run and pybal restart have been completed already for that host/site, but might as well be consistent
[16:42:49] right, yeah - the only trick is remembering to vary the service IP across DCs
[16:42:56] yep
[16:43:11] I just wasn't sure if the `ipvsadm` invocation was risky enough that it warranted the same sequencing
[16:43:20] https://sal.toolforge.org/log/JtwswI8BhuQtenzvYXCr example
[16:43:50] nice, thank you!
[17:26:11] sukhe: FYI, I'll be starting the pybal restarts soon - currently waiting for run-puppet-agent on LVS hosts
[17:26:34] ok!
[17:50:37] 10Wikimedia-Apache-configuration, 06collaboration-services, 10Phabricator, 13Patch-For-Review, and 4 others: Apache 2.4.61 throws a 403 Forbidden for links containing %3F - https://phabricator.wikimedia.org/T370110#10007941 (10Dzahn) >>! In T370110#10005864, @hashar wrote: > * **The `phorge` module in Pupp...
[17:54:31] sukhe: any guidance on how long to wait between the secondary and low-traffic primary to make sure things are "good"?
[17:55:22] swfrench-wmf: I think you should feel free to move on to lt primary
[17:55:31] if secondary looks good
[17:56:49] sukhe: is there a better litmus test for "good" than the service seems to have successfully restarted? (e.g., pybal on lvs1020 is up per grafana)
[17:57:22] swfrench-wmf: in this case, pybal successfully having restarted with no other errors and the IPVS diff check indicating that the service was removed is more than enough
[17:57:34] great, thank you!
[17:57:53] there is another WARNING on icinga for lvs1020
[17:57:55] Check if Pybal has been restarted after pybal.conf was changed
[17:58:01] but this is expected
[17:58:14] it is saying that pybal has not been restarted after the conf file was changed and so it should be
[17:58:24] which you already did, so running this again to see if it clears up is all we need to do
[17:58:37] sukhe@lvs1020:~$ /usr/local/lib/nagios/plugins/check_pybal_restart --service pybal.service --file /etc/pybal/pybal.conf
[17:58:40] OK: pybal.service was restarted after /etc/pybal/pybal.conf was changed.
[17:58:58] so all good IMO
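The `ipvsadm` cleanup step discussed above amounts to deleting the now-unconfigured virtual services from the kernel IPVS table on each balancer, remembering that the service IP differs per DC. A minimal sketch; the VIP:port shown is illustrative only:

    # On the LVS host, after the puppet run + pybal restart for that site:
    sudo ipvsadm -L -n                 # confirm the stale virtual service is still listed
    sudo ipvsadm -D -t 10.2.2.22:443   # delete it (example VIP:port, varies per DC)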
[18:01:34] sukhe: great, thank you very much
[18:02:23] I'll move ahead with the `ipvsadm` cleanups
[18:03:03] thanks :)
[18:05:56] happy to review and +1 here if need be since I always get icky doing manual ipvs cleans as well :P
[18:09:43] looks good fwiw, saw SAL
[18:26:57] 06Traffic, 13Patch-For-Review: Improve HAProxy unexpected restart alert - https://phabricator.wikimedia.org/T362833#10008131 (10BCornwall) 05In progress→03Resolved
[18:28:29] <+jinxer-wm> FIRING: [8x] ConfdResourceFailed: confd resource _srv_config-master_pybal_codfw_api-https.toml has errors
[18:28:38] ^ this one usually happens after pybal restarts
[18:28:45] or it can, in some cases
[18:29:08] we fixed them in the past by manually deleting some .err files
[18:29:22] yeah it's definitely related to the recent cleanup
[18:29:29] swfrench-wmf: ^ or I can take care of it, let me know
[18:29:36] but since this is confd, all yours :)
[18:30:31] if you can find the path to the .err files...
[18:30:40] then deleting them should fix that
[18:31:39] ah, missed that! what channel was that in?
[18:31:39] I can take a look, but I'm trying to debug a puppet issue as a result of the cleanup =/
[18:31:47] -operations
[18:31:57] swfrench-wmf: what kind of issue?
[18:32:07] puppet one
[18:32:32] puppet failures on the bare-metal mwdebug hosts
[18:32:42] looking
[18:33:33] 06Traffic: prometheus-lvs-realserver-mss crashed on ncredir2002 - https://phabricator.wikimedia.org/T354721#10008157 (10BCornwall) 05Open→03Stalled
[18:34:18] mutante: ack, thanks! I was looking at the backscroll, and didn't realize it had just started :)
[18:34:31] I can take care of that once I sort out the puppet issues
[18:35:19] swfrench-wmf: thanks, one by one. puppet is fixed though!
[18:35:25] mwmaint1002 works
[18:35:39] oh, mwdebug
[18:35:39] mutante: it's a different one, this https://puppetboard.wikimedia.org/report/mw2268.codfw.wmnet/5278161e99f7c075e854022d63ea5f5867c9a636
[18:35:53] swfrench-wmf: we are including the lvs profile, simply removing that should fix it
[18:35:55] it's a result of https://gerrit.wikimedia.org/r/c/operations/puppet/+/1050382
[18:36:04] 06Traffic, 06Data Products, 06Data-Engineering, 10Observability-Logging: New software: haproxykafka - https://phabricator.wikimedia.org/T370668#10008166 (10Ottomata) > I would need a serious help w/ C. Ya, me too! Perhaps the SPOE go lib @Vgutierrez mentioned might be easier? > ATM we decided to go down...
[18:37:04] 06Traffic, 10observability, 06SRE: HAProxy metrics go down on config reload - https://phabricator.wikimedia.org/T343000#10008167 (10BCornwall) 05In progress→03Stalled
[18:39:15] sukhe: yeah, I think you're right
[18:39:33] let me try to figure out a safe place to peel that out
[18:39:52] I can take the icinga alert
[18:39:59] you do that other thing
[18:40:44] mutante: thank you!
[18:46:25] 06Traffic, 10Sustainability (Incident Followup): cp3050 seemd more affected then otheres in recent incident - https://phabricator.wikimedia.org/T330682#10008202 (10BCornwall) @CDanis Friendly ping.
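The manual fix alluded to above (deleting stale confd `.err` files) looks roughly like this; the directory is an assumption, so confirm the path against the Confd wikitech page referenced just below:

    # Remove stale per-template error markers left behind after the confd
    # resource itself has recovered (path is an assumption).
    sudo find /var/run/confd-template -name '*.err' -delete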
[18:48:12] 16x RESOLVED - fix documented at https://wikitech.wikimedia.org/wiki/Confd#Stale_template_error_files_present
[18:48:27] no issues I can see in the actual confd.log on alert1001
[18:48:37] so it was broken but now it's not
[18:49:03] swfrench-wmf: so in case you haven't fixed it already
[18:49:11] modules/role/manifests/mediawiki/appserver.pp has
[18:49:17] which has include ::profile::mediawiki::webserver
[18:49:22] 06Traffic, 10Sustainability (Incident Followup): Experiment with single backend CDN nodes - https://phabricator.wikimedia.org/T288106#10008209 (10BCornwall) 05Open→03Stalled
[18:49:28] and then that has a require on lvs::realservers
[18:49:49] which is then controlled by has_lvs
[18:49:59] as in hieradata/role/common/mediawiki/appserver/api.yaml, has_lvs: true
[18:50:05] so it needs to be false here
[18:50:24] and similarly for the rest of the stuff that was removed
[18:50:30] +1
[18:50:38] sukhe: thanks! yes, that's exactly the patch I'm writing a commit message for :)
[18:50:53] nice
[18:51:21] btw, I find it weird that it's just "has_lvs" and not profile::mediawiki::appserver::has_lvs or something, but I digress
[18:51:38] ditto :)
[18:52:57] it's bad puppet style, but it's so old we didn't have a style guide when it was added.. afaict
[18:53:05] mutante: yep
[18:53:11] I guess that's it
[18:54:43] every time we do some kind of style improvement across all modules.. it's like "yea, I'm fine merging all these, but I won't touch LVS"
[18:57:35] 06Traffic, 10Sustainability (Incident Followup): cp3050 seemd more affected then otheres in recent incident - https://phabricator.wikimedia.org/T330682#10008250 (10Vgutierrez) 05Stalled→03Invalid cp3050 is no longer being used, definitely this task can be closed now
[19:03:07] thank you both for the reviews - initially, I misread an instance of has_lvs: false elsewhere in hieradata, thinking it provided a default :)
[19:04:54] IMO the PCC failure is expected since the catalog is failing to compile
[19:04:57] merging should fix it
[19:05:01] exactly, yeah
[19:05:18] all the hosts are in "or compile correctly only with the change"
[19:10:06] alright, that seems to do the trick
[19:10:28] sukhe: mutante: thank you both again for your help :)
[19:10:52] yw, this was an epic one
[19:10:57] np, glad to see it all done :)
[19:37:21] 06Traffic, 10MW-on-K8s, 06serviceops, 06SRE, and 2 others: Spin down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949#10008340 (10Volans)
[19:41:34] 06Traffic, 10MW-on-K8s, 06serviceops, 06SRE, and 2 others: Spin down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949#10008343 (10Volans) I took the liberty of adding a cleanup item to the task description. If that should be part of another task, feel free to move it around.
[19:57:14] 10netops, 06Infrastructure-Foundations, 06SRE: Add data to automation for new switches in codfw C/D - https://phabricator.wikimedia.org/T369106#10008422 (10cmooney) 05Open→03Resolved
[20:06:08] 06Traffic, 06SRE: Research and respond to Let's Encrypt's intent to deprecate OCSP in favour of CRLs - https://phabricator.wikimedia.org/T370821 (10ssingh) 03NEW
[20:12:26] 06Traffic, 06SRE: Research and respond to Let's Encrypt's intent to deprecate OCSP in favour of CRLs - https://phabricator.wikimedia.org/T370821#10008529 (10BBlack) Firefox has historically been the reason we've been stapling OCSP for the past many years. If our certificate has an OCSP URI in its metadata, th...
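Two quick openssl checks are handy for the OCSP discussion above: whether a certificate advertises an OCSP responder URI at all, and whether a server is currently stapling a response. The commands use standard openssl options; the certificate file and hostname are illustrative:

    # Does the cert carry an OCSP responder URI in its AIA extension?
    openssl x509 -in cert.pem -noout -ocsp_uri
    # Is the server stapling an OCSP response during the TLS handshake?
    openssl s_client -connect en.wikipedia.org:443 -status </dev/null 2>/dev/null \
        | grep -i 'OCSP'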
[20:14:29] 06Traffic, 06SRE: Research and respond to Let's Encrypt's intent to deprecate OCSP in favour of CRLs - https://phabricator.wikimedia.org/T370821#10008544 (10BBlack) Note also Digicert's annual renewal is coming soon in T368560. We should maybe look at whether the OCSP URI is optional in the form for making t...
[20:56:18] 06Traffic, 10MW-on-K8s, 06serviceops, 06SRE, and 2 others: Spin down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949#10008652 (10Scott_French) Many thanks, all who helped get this out the door. At this point, the LVS service turndown is done, and we've shaken out a handful...
[20:56:41] 06Traffic, 10MW-on-K8s, 06serviceops, 06SRE, and 2 others: Spin down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949#10008653 (10Scott_French)
[21:13:51] 06Traffic: prometheus-lvs-realserver-mss crashed on ncredir2002 - https://phabricator.wikimedia.org/T354721#10008696 (10Vgutierrez) 05Stalled→03Resolved This was solved a long time ago, but I never got around to closing the task.
[21:41:16] 06Traffic, 06SRE, 13Patch-For-Review: purged issues while kafka brokers are restarted - https://phabricator.wikimedia.org/T334078#10008822 (10BCornwall) 05Open→03Stalled
[21:41:30] 06Traffic, 10MW-on-K8s, 06serviceops, 06SRE, 10Release-Engineering-Team (Seen): Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536#10008833 (10Krinkle)
[21:43:12] 06Traffic, 06SRE: Webrequests live data shows traffic without TLS on varnish for upload.w.o - https://phabricator.wikimedia.org/T340097#10008836 (10BCornwall) 05In progress→03Stalled
[21:43:35] 06Traffic: Clean up Varnish VCL - https://phabricator.wikimedia.org/T370200#10008842 (10BCornwall) a:03BCornwall
[21:55:06] 10netops, 06Data-Persistence, 06DBA, 06Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad - https://phabricator.wikimedia.org/T365998#10008870 (10Ladsgroup) I'm repooling the replicas now.
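The repooling mentioned in the final entry is typically done with dbctl from a cumin host. A minimal sketch, assuming a hypothetical replica name (the exact hosts repooled for T365998 are not in this log):

    # Repool a database replica and commit the change (db1234 is hypothetical).
    sudo dbctl instance db1234 pool
    sudo dbctl config commit -m "Repool db1234 after lsw1-f3-eqiad upgrade T365998"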