[09:58:58] lunch
[12:42:37] o/
[13:55:04] inflatador: are we ok for the switch upgrade in E2?
[13:55:44] topranks checking
[13:56:15] it's elastic1091 & elastic1092, plus wdqs1018 and wdqs1020
[13:57:51] topranks OK, we're ready. Sorry for not getting to that sooner
[13:58:01] no probs at all thanks!
[14:23:56] upgrade is done if you want to repool
[15:00:33] office hours on https://meet.google.com/vgj-bbeb-uyi
[15:00:57] dcausse, pfischer, ebernhardson, dr0ptp4kt ^
[16:04:48] picking up my cat, back in ~20
[16:08:09] Trey314159, pfischer uploaded https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1051798 if you have time please feel free to +1 if you don't see any issues and I'll try to ship this tomorrow
[16:08:25] dinner
[16:20:02] back
[18:12:20] lunch, back in ~45
[18:50:35] hi, are there any Extension:CirrusSearch understanders in the chat? :)
[18:51:14] dcausse is probably already away, ebernhardson is on vacation. Maybe dr0ptp4kt knows enough.
[18:51:21] Or very unlikely me...
[18:51:31] * inflatador is also in the "unlikely" camp
[18:52:02] pfischer might know some as well, but should also not be around at this time.
[18:52:24] I just noticed, there are a large number of request traces that look like this one: https://trace.wikimedia.org/trace/385340f9223ae3ef94a18ec5c5f3f336
[18:52:52] a user call to Special:Search with a query, then five parallel calls back to the MediaWiki API for the same cirrus-config-dump
[18:53:05] I can't imagine that's intentional behavior
[18:53:53] does this seem like a new issue, or has it been going on for a while?
[18:54:02] very likely going on for a while
[18:54:07] (at a guess)
[18:54:16] the new thing here is having distributed tracing :)
[18:54:24] I know, it's awesome~
[18:54:35] interesting... but this will need to wait until tomorrow
[18:54:39] no worries, I can file a task
[18:54:48] ACK, was gonna offer but you have the most context
[18:54:52] please do! and tag Discovery-Search
[18:54:58] I'm quite confident they are separate requests though -- you can dig into each one and see that it was routed to a different mw-api-int instance
[18:55:01] ok will do :)
[18:55:33] oh, I also owe you a task about Extension:CirrusSearch not propagating the headers we need to get its queries to elastic included in the traces
[18:55:34] cdanis: are requests parallelized?
[18:55:59] gehel: according to the trace, these are, yes
[18:56:23] yeah, that's what I'm seeing on the trace, but that surprises me even more
[18:56:27] right?
[18:56:58] * gehel is looking forward to having elastic included in the distributed traces
[18:57:55] right now we can see traffic towards it, but it's not connected to any parent requests: https://trace.wikimedia.org/search?service=search-omega-eqiad
[18:58:01] is the distributed tracing something we could extend to more systems? I'm specifically thinking Blazegraph. With internal federation, that might become useful
[18:58:19] gehel: yes, assuming the systems in question support propagating the usual opentelemetry headers
[18:58:48] anything that propagates `traceparent` and `tracestate` should 'just work' when running with our mesh on k8s
[18:59:02] there's some work for bare-metal I haven't done yet, but the pieces are there and it's "just" a matter of putting them together
[18:59:35] blazegraph is definitely in the baremetal category. Adding opentelemetry to it should not be too complicated (but some work)
[19:00:02] (to be clear, by propagate I mean "copies the incoming request header to any outgoing request headers made for service calls 'on behalf' of that incoming request")
[19:00:10] (like we generally already did with x-request-id)
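For context on what "propagate" means above, here is a minimal sketch of copying the incoming trace-context headers onto outgoing calls made on behalf of a request. This is purely illustrative: it assumes a Flask-style Python service and the requests library, and the handler and downstream URL are placeholders, not actual MediaWiki, CirrusSearch, or mesh code.

```python
# Minimal sketch (not production code): forward W3C trace-context headers
# from an incoming HTTP request to outgoing service calls made on its behalf.
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

# Headers copied verbatim so the tracing backend can stitch the outgoing
# call into the same trace as the incoming request.
TRACE_HEADERS = ("traceparent", "tracestate", "x-request-id")


def propagated_headers(incoming_headers):
    """Return only the trace-context headers present on the incoming request."""
    return {
        name: incoming_headers[name]
        for name in TRACE_HEADERS
        if name in incoming_headers
    }


@app.route("/search")
def search():
    # Hypothetical downstream call "on behalf of" the incoming request;
    # the URL and action parameter are placeholders for illustration.
    resp = requests.get(
        "http://mw-api-int.example/w/api.php",
        params={"action": "cirrus-config-dump"},
        headers=propagated_headers(request.headers),
        timeout=5,
    )
    return jsonify(resp.json())
```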
[19:00:39] okay, I need a snack now but I'll file tasks before this all swaps out of my brain
[19:00:45] thanks for the quick look!
[19:01:23] actually, now that I think about it, it's going to be a mess for Blazegraph. Too many thread pools, so no way to easily match incoming requests to outgoing requests without being quite invasive
[19:01:53] gehel: surely there's a per-unit-of-work piece of context somewhere being passed from pool to pool?
[19:03:27] yes, of course, but that's blazegraph internals, so it would need some deep changes. In the more common case of a single thread, there is a thread context that can be used, independently of whatever the application is doing. So it becomes very easy to add that kind of behaviour, without touching application code.
[19:05:31] can you shape the top-level federated queries such that you inject a sparql comment with some machine-readable tags in each of the subqueries?
[19:06:33] some prior art: https://google.github.io/sqlcommenter/
[19:06:57] anyway, actually going AFK for a bit now :)
[19:37:40] these are requests made to fetch the config of sister wikis to perform interwiki searches, they should be heavily cached... 200ms to fetch the mw-config does not seem right tho :(
[19:52:50] hm... this should use the wan object cache but I'm not finding the cache key group in https://grafana.wikimedia.org/d/lqE4lcGWz/wanobjectcache-key-group?orgId=1 ...
[19:54:38] ah maybe not, it might be using the local server cache but the code is a bit messy so that might be a mistake
[20:11:33] workout, back in ~40
[20:52:31] back
[21:18:35] g.ehel and c.danis, apologies - i was taking a certification exam (🤞). i'm winding down shortly, for holiday and vacation. wishing you and everyone here well.
[21:18:58] dr0ptp4kt wishing the best for you and your CKA status!
[21:19:55] thanks inflatador - i'm suspecting borderline. i had one process i was proficient enough in, but i misread part of the instruction, and so burned about 10 minutes more than i wanted to on that. caveat emptor! talk to you later, be well!
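As a footnote to the sqlcommenter-style idea floated at 19:05 (tagging each federated subquery with a machine-readable comment so Blazegraph-side query logs can be matched back to a trace), a rough sketch under stated assumptions: the tag names and helper function are hypothetical, the example values are illustrative, and the only real guarantee relied on is that SPARQL ignores `#` comment lines.

```python
# Illustrative sketch of sqlcommenter-style tagging for SPARQL subqueries:
# prepend a comment line carrying trace context so downstream query logs can
# be correlated with the originating request. Tag names are hypothetical.
from urllib.parse import quote


def tag_sparql_query(query: str, traceparent: str, request_id: str) -> str:
    """Prepend a machine-readable comment that the SPARQL engine ignores."""
    tags = {
        "traceparent": traceparent,
        "x_request_id": request_id,
    }
    comment = "# " + ",".join(
        f"{key}='{quote(value)}'" for key, value in sorted(tags.items())
    )
    return f"{comment}\n{query}"


if __name__ == "__main__":
    # Example subquery and example trace identifiers for illustration only.
    subquery = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 10"
    print(tag_sparql_query(
        subquery,
        traceparent="00-385340f9223ae3ef94a18ec5c5f3f336-00f067aa0ba902b7-01",
        request_id="example-request-id",
    ))
```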