[06:37:21] <_joe_> can someone with some basic understanding of how kubernetes deployments work and how mediawiki works take a look at https://wikitech.wikimedia.org/wiki/MediaWiki_On_Kubernetes/How_it_works and leave me some feedback?
[06:37:47] <_joe_> the goal of that page is to allow any SRE to understand how the whole thing is built and eventually modify it if they need to
[07:43:03] _joe_: I may not be that person, but it seems reasonably clear to me; though you say the php7.4-cli image installs excimer and that php7.4-fpm-multiversion-base also does so? and "we don't need a rolling restart of the pods" appears twice in the mediawiki-mcrouter section, once seemingly in the middle of a link
[08:06:46] <_joe_> Please be bold and fix the obvious mistakes :)
[08:07:00] <_joe_> but yeah on the excimer thing, I need to clarify it
[08:24:44] <_joe_> dcausse: can I ask you for a +1 for https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/939702/ ?
[08:24:59] <_joe_> would you be ok with deploying this change today in case?
[08:26:13] _joe_: sure, looking
[08:26:30] <_joe_> it's the move of rdf-streaming-updated to use the k8s api
[08:26:42] <_joe_> s/ed/er/ :)
[08:29:32] _joe_: thanks! I'll take care of the rdf-streaming-updater deploy once ready, no problem
[08:29:46] <_joe_> oh great :)
[08:29:53] <_joe_> yeah I need to do some prep first
[08:40:32] <_joe_> dcausse: at your earliest convenience, we can merge the change
[08:41:30] _joe_: please merge you want
[08:41:35] *when
[08:50:07] _joe_: I forgot about the test we run in dse-k8s to test the new flink deployment model (https://gerrit.wikimedia.org/r/940870)
[08:50:23] <_joe_> ah
[08:50:27] <_joe_> sorry I wasn't aware :)
[08:50:31] my bad
[08:53:47] <_joe_> dcausse: I +1'd it
[08:54:32] _joe_: thanks! do I need to wait for the change to propagate or can I go ahead?
[08:54:54] I'm seeking a (straightforward) +1 on https://gerrit.wikimedia.org/r/c/operations/puppet/+/940868 if someone has two minutes
[08:55:04] <_joe_> dcausse: what do you mean?
[08:55:54] <_joe_> we have already set up the puppet part, let me make sure of it
[08:56:46] <_joe_> dcausse: yeah go ahead at your earliest convenience :)
[08:56:57] _joe_: thanks! deploying
[08:58:47] cheers _joe_
[08:59:07] <_joe_> godog: hehe I was about to tell you I did give you the +1
[08:59:39] yeah! looking forward to wrapping this up
[09:13:17] <_joe_> dcausse: https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&refresh=30s&var-dc=codfw%20prometheus%2Fk8s&var-service=mediawiki&var-namespace=mw-api-int&var-release=main&var-container_name=All requests flowing to codfw as well
[09:13:39] <_joe_> so we're now serving rdf-streaming-updater more efficiently :)
[09:13:47] yes seems to work well, latencies dropping as well, thanks! :)
[09:15:16] <_joe_> cool :)
[09:17:21] will deploy the change in dse-k8s and we should be good
[09:18:06] <_joe_> great, thanks again for the help :)
[09:19:14] np!
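For readers following along without the Gerrit change open: the patch being discussed (939702) moves rdf-streaming-updater's MediaWiki API traffic onto the dc-local mw-api-int service through the envoy service mesh. Below is a minimal, hypothetical values.yaml sketch of the general shape such a deployment-charts change takes; the key names (`mesh`, `discovery.listeners`, `mwApiEndpoint`) and the local listener port are illustrative assumptions, not copied from the actual patch. The `mw-api-int-async-ro` destination name does appear in the Grafana dashboard linked later in the log.

```yaml
# Hypothetical values.yaml fragment for the rdf-streaming-updater release.
# Key names and the port are illustrative, not taken from Gerrit 939702.
mesh:
  enabled: true            # run the envoy sidecar in each pod
discovery:
  listeners:
    - mw-api-int-async-ro  # dc-local MediaWiki internal API (read-only/async profile)
app:
  config:
    # The job now calls the sidecar's local listener; envoy forwards the
    # request to mw-api-int in the same datacenter instead of crossing DCs.
    mwApiEndpoint: http://localhost:6500/w/api.php
```

The point of routing through a local envoy listener rather than a shared cross-DC endpoint is exactly what the latency discussion later in the log shows: each cluster's updater talks to the MediaWiki API in its own datacenter.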
[09:38:24] can't see envoy telemetry from k8s-dse but looking at the envoy config deployed via configmaps I see the new endpoint being used, going to assume that it's working as expected but please let me know if not
[09:43:56] <_joe_> https://grafana.wikimedia.org/d/b1jttnFMz/envoy-telemetry-k8s?orgId=1&var-datasource=thanos&var-site=codfw&var-prometheus=k8s&var-app=flink-session-cluster-taskmanager&var-destination=mw-api-int-async-ro&var-destination=mwapi-async&viewPanel=6 interesting
[09:44:38] <_joe_> p90 plummeted but p99 went up
[09:58:31] not sure I understand the p99s but from my end this is way better overall
[10:01:18] <_joe_> I would say that some specific requests are slower than they were before, while on average you get the advantage of being dc-local
[10:02:19] <_joe_> dcausse: eqiad's situation is less rosy
[10:02:29] <_joe_> so I'll take a better look at what's going on
[10:03:32] thanks!
[10:15:59] <_joe_> dcausse: are the rdf-streaming-updaters in codfw and eqiad listening to the same topics?
[10:16:10] <_joe_> and each updating the local WDQS cluster?
[10:21:07] _joe_: yes they do exactly the same thing
[10:21:26] <_joe_> ok then we have some checks to perform I guess
[10:22:25] <_joe_> dcausse: ahhh snap, in eqiad we have twice the requests because of the dse cluster
[10:22:44] <_joe_> which might in turn have some limitations of its own
[10:24:16] I can stop our test running from dse-k8s if this helps
[10:24:21] <_joe_> but most importantly, seems like it's causing some cpu throttling in the eqiad mw-api-int cluster
[10:24:28] <_joe_> dcausse: nah it's ok
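The CPU throttling _joe_ spots at the end is the kind of thing the standard cAdvisor metrics expose on any Kubernetes cluster: `container_cpu_cfs_throttled_periods_total` counts CFS scheduling periods in which a container hit its CPU limit. As a sketch of how one might watch for it, here is a minimal hypothetical Prometheus alerting rule; the rule/alert names, namespace label, and 25% threshold are assumptions for illustration, not WMF's actual alerting config.

```yaml
# Hypothetical Prometheus rule: fraction of CFS periods in which containers
# in the mw-api-int namespace were throttled, averaged over 5 minutes.
groups:
  - name: mw-api-int-saturation
    rules:
      - alert: MwApiIntCpuThrottling
        expr: |
          sum(rate(container_cpu_cfs_throttled_periods_total{namespace="mw-api-int"}[5m]))
            /
          sum(rate(container_cpu_cfs_periods_total{namespace="mw-api-int"}[5m]))
            > 0.25
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "mw-api-int pods are being CPU-throttled"
```

Throttling like this also fits the p90/p99 pattern discussed above: most requests get faster once they stay dc-local, but requests that land on a throttled pod stall for whole scheduling periods, dragging the tail (p99) up even as the median and p90 drop.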