[08:04:24] <_joe_> btullis: is it possible that the high rate of memcached errors from the dumps jobs is related to your change
[08:07:27] <_joe_> the timing seems suspiciously similar https://logstash.wikimedia.org/goto/2a745ba1802ed97e53c76349982b519a
[08:07:44] _joe_: I doubt that it's related to yesterday's change. That was only Flink-related. I can't immediately think why it would have started.
[08:08:34] <_joe_> btullis: in any case, it seems to be a consistent problem for dumps since yesterday at 13:00Z
[08:09:29] Yep, I will check it out with high priority.
[08:11:00] <_joe_> and yes I also don't see how your change could cause this, tbh
[08:14:16] but it's definitely in our wheelhouse, as the saying goes.
[09:04:23] I still have no idea why they suddenly started, but we now have lots of failed requests to memcached on 127.0.0.1:11213 - e.g. https://logstash.wikimedia.org/goto/d6588eb85ff96a5e9724b530a14ba9b0
[09:05:08] We don't run an mcrouter container in our mediawiki pods and we never have. But perhaps we should.
[09:10:38] https://github.com/wikimedia/operations-deployment-charts/blob/master/helmfile.d/dse-k8s-services/mediawiki-dumps-legacy/values.yaml#L199
[10:22:31] <_joe_> or run them as daemonsets, yes
[10:31:01] Thanks. Can we silence or filter out these alerts while we work on this? I'm still unclear on why they suddenly started, seeing as we have been running these dumps for ~3 months now. They have been working fine without a connection to memcached, so I would like to understand the benefit of adding mcrouter to the setup, other than reducing the noisy errors.
[14:16:10] thanks for looking into this, btullis. Indeed, if these have been running all along, then it's puzzling why we're only seeing the errors now, so suddenly.
[14:16:10] I believe the only way to suppress the errors in the alert would be to update the expression to exclude dumps temporarily. Is there a task I can reference in a TODO adjacent to where I would do that? (In which case, I can do so.)
[14:16:10] although, oddly, it looks like this stopped around 9:20 UTC today?
[14:21:21] swfrench-wmf: hmm.. might it have to do with per-wiki configs?
[14:24:22] could very well be, yeah - I would say that my knowledge of "how dumps works, including what wikis are processed when and on what schedule" is near zero :)
[14:42:33] It seemed to be affecting only wikidatawiki and commonswiki, but we also got some similar alerts from the wikibase dumps.
[14:45:20] swfrench-wmf: If you have a patch, you could attach it to this task: T352650 - or if you wanted to make a new task for handling it, then you could add that as a parent.
[14:45:21] T352650: WE 5.4 KR - Hypothesis 5.4.4 - Q3 FY24/25 - Migrate current-generation dumps to run on kubernetes - https://phabricator.wikimedia.org/T352650
[14:47:19] There is some background information on how dumps works now, here: https://wikitech.wikimedia.org/wiki/Dumps/Airflow
[14:47:50] I could also give an overview in an SRE staff meeting some time, if that would help.
[14:52:42] thanks, btullis!
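
On the mcrouter idea raised at 09:05 and 10:22 (sidecar container vs. daemonset): below is a minimal sketch of what enabling an mcrouter sidecar might look like in the service's helmfile values. This assumes the chart exposes a cache.mcrouter block; the key names, port, and backend shown are illustrative assumptions, not the chart's actual schema, and would need to be checked against operations/deployment-charts before use.

cache:
  mcrouter:
    # Hypothetical values: key names are assumptions, not the chart's real schema.
    enabled: true
    # mcrouter would listen on localhost inside the pod, so MediaWiki can keep
    # using 127.0.0.1:11213 (the address the failing requests were aimed at).
    port: 11213
    pools:
      main:
        # Placeholder backends; real values would point at the production
        # memcached pool for the target datacentre.
        servers:
          - memcached-backend.example.wmnet:11211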
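
The 14:16 messages suggest temporarily updating the alert expression to exclude dumps, with a TODO pointing at a task. A minimal sketch of how such a rule could look, assuming a Prometheus alerting rule; the alert name, metric name (mediawiki_memcached_error_total), deployment label, and threshold are all hypothetical and not taken from the real rule in use.

groups:
  - name: mediawiki-memcached
    rules:
      - alert: MediaWikiMemcachedHighErrorRate
        # TODO: T352650 - drop the mediawiki-dumps-legacy exclusion once the
        # dumps pods have a working mcrouter (sidecar or daemonset) setup.
        expr: |
          sum by (deployment) (
            rate(mediawiki_memcached_error_total{deployment!="mediawiki-dumps-legacy"}[5m])
          ) > 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High rate of memcached errors from {{ $labels.deployment }}"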