[14:37:35] Hi all! is anyone around to back me up for a high-priority (but not SUPER urgent) deployment for the rest gateway? https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1259956
[14:38:14] This would help fix some issues that the wikipedia app is having with hitting rate limits. claime, Raine, are you around?
[14:39:34] duesen: After the service switchover
[14:39:50] oh right, that's today... never mind then
[14:40:04] We should be ok to do it in an hour or so
[14:40:18] It's just services + traffic, not the mediawiki switchover, so it's less involved and risky
[14:40:31] yeah, that's when my block of meetings starts, and then it's dinner time... I may come back to it in the evening, but tomorrow works as well.
[14:40:51] Tomorrow would be morning, as we do the mediawiki switchover in the afternoon :)
[14:41:29] I'll ping you if we're available earlier than in an hour though, we may be able to squeeze it in
[14:53:51] duesen: I'll be around in the evening
[18:07:34] Raine, claime: walked the dog, had a coffee, wrote an email... feeling better now :)
[18:07:59] \o/
[18:08:17] I rebased the docs patch. I'll hit +2 on it, then go and test it together with the patch we merged earlier, on staging.
[18:08:32] Version bump is merged too
[18:08:59] saw it, thank you!
[18:13:51] ...running make check...
[18:17:32] hm, two tests flaked out, running again. the more tests I add, the more can flake out... but as long as they don't fail consistently, it should be fine. rate limiting *is* timing sensitive...
[18:18:47] is it because sometimes you're going to be at the minute boundary?
[18:20:10] yes... I tried to account for that, but... I guess I have to look into that issue again.
[18:21:02] on the second try, two different tests flaked out. I'll do a third run just for the charm, but if nothing is failing consistently, it's just a timing issue
[18:21:15] ack
[18:21:36] I added a lot more tests recently. that increases the chance of flakes.
I should look into counter-measures, though
[18:24:22] two different failures again! I'll take that
[18:25:32] applying to codfw
[18:28:22] Raine: uh, something doesn't look right. The ratelimiter metrics vanished entirely. Traffic still looks good, so it's probably a prometheus issue. But I have no visibility into the effects of my change.
[18:28:37] Raine: is it possible that prometheus is not finding the new pods?
[18:28:53] doesn't seem likely
[18:29:58] the stats vanished retroactively. I can't see older metrics anymore either...
[18:30:15] https://grafana-rw.wikimedia.org/d/UOH-5IDMz/api-and-rest-gateway
[18:30:27] the ratelimit section. you need to pick a policy at the top
[18:30:29] grafana logged you out, maybe?
[18:30:51] works for me
[18:32:04] works for eqiad, broken for codfw
[18:32:24] huh right, sorry
[18:32:43] some tag value changed?
[18:32:47] eh, but that's just the DC switchover?
[18:33:03] hm? how so?
[18:33:06] we're in single-DC mode for 1 week
[18:33:51] oh...! ok, I didn't know that
[18:34:03] the traffic stats are still there, but everything routes to eqiad?
[18:34:29] yeah
[18:34:42] so... I guess I can deploy to eqiad then.
[18:34:57] got me scared for a minute :D
[18:35:06] :D
[18:43:16] Raine: looking good, thank you!
[18:48:00] \o/
[19:09:04] apergos: --^
[19:10:47] lol
[19:11:06] finally some time zone overlap and I miss it!
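(Editor's note: the minute-boundary flakiness discussed above can be illustrated with a minimal sketch. This assumes a fixed-window limiter keyed on the current wall-clock minute; the class and test names here are hypothetical and are not the gateway's actual implementation.)

```python
import time


class FixedWindowLimiter:
    """Hypothetical fixed-window rate limiter: allows `limit` requests
    per wall-clock minute; the counter resets at each minute boundary."""

    def __init__(self, limit):
        self.limit = limit
        self.window = None
        self.count = 0

    def allow(self, now=None):
        now = time.time() if now is None else now
        window = int(now // 60)          # which minute we are in
        if window != self.window:        # crossed a minute boundary
            self.window = window
            self.count = 0               # ...so the counter resets
        self.count += 1
        return self.count <= self.limit


def flaky_test():
    # Flakes: if the wall clock crosses a minute boundary between the
    # two calls, the counter resets and the second request is allowed.
    limiter = FixedWindowLimiter(limit=1)
    assert limiter.allow()
    assert not limiter.allow()           # may fail near a boundary


def deterministic_test():
    # Counter-measure: inject a fixed clock so the test never
    # straddles a window boundary.
    limiter = FixedWindowLimiter(limit=1)
    t = 120.0                            # exactly the start of a minute
    assert limiter.allow(now=t)
    assert not limiter.allow(now=t + 1)  # same window, deterministic
```

Pinning the clock (or starting each test right after a fresh window begins) is the usual fix for this class of flake, since the test outcome then no longer depends on when CI happens to run it.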