[07:24:06] iirc the task api is relying on a stateful index so it might well be elastic losing track of the requests it sent and since it's generally all async I hope it's not holding a thread
[09:58:58] lunch
[13:19:05] greetings
[13:21:29] o/
[13:33:56] \o
[14:27:37] So for today's retro I wanted to do a retro on query service SLOs (and maybe about query service in general)
[14:28:17] I'm struggling to think of an appropriate format though, I don't think our usual start doing / stop doing / more of / less of makes much sense for that
[14:28:31] So maybe we keep it more free-form as far as doc structure? I'm open to any ideas
[14:32:50] ryankemper: free-form sounds good to me!
[14:32:59] great
[15:03:10] ebernhardson: mpham: retro meet.google.com/eki-rafx-cxi
[15:58:07] My pc froze
[15:58:16] I had to restart
[16:36:14] anyone need puppet patches? I kinda lost track of time, looks like no one is here
[17:04:08] dcausse just curious about https://gerrit.wikimedia.org/r/c/operations/puppet/+/775254 , how were you checking if the jvmquake triggered? Was it just looking for /tmp/jvmquake_warn_gc like the default?
[17:15:37] inflatador: i think via the alerts that are being sent out, BlazegraphJvmQuakeWarnGC
[17:15:46] inflatador: although i realize now they didn't really go out, Reason: The message is larger than the 40 KB maximum size
[17:16:02] it was sent to discovery-alerts but the mailing list filtered it. hmm
[17:16:23] but yes those are via the file flag, collected through prometheus
[17:18:06] OK https://github.com/wikimedia/operations-alerts/blob/master/team-search-platform/blazegraph.yaml#L28 got it
[17:20:34] yup thats the one, and the actual collection was done here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/770978
[17:22:36] ebernhardson nice, thanks! Any idea if prometheus or any other process cleans that up ? I guess it doesn't really matter, just curious
[17:22:46] oops sorry missed the ping
[17:23:03] i suppose i don't really know how the file gets deleted :)
[17:23:08] sadly this flag has to be deleted manually when the alert is triggered
[17:23:27] it's just to "learn" about jvmquake behavior
[17:23:32] makes sene
[17:23:34] *sense
[17:24:02] I reset it manually earlier today, the values I've were really bad in the end
[17:24:14] s/I've/I've set/
[17:25:47] s/I've set/I set
[17:25:48] :P
[17:26:01] meh :)
[17:34:54] I don't know if it's a horrible idea, but i turned off the size check for discovery-alerts so it will forward mails regardless of length
[17:35:16] probably doesn't matter, it's not like we are all syncing a local POP3
[17:36:18] * dcausse is curious to see what we missed
[17:37:10] dcausse: i just accepted the two that it filtered, they shuld come through now
[17:37:36] really doesn't look like 40kb+, but maybe i'm just old :P
[17:37:57] oh, it's mountains of html.
[17:38:02] well, whatever. it's fine :)
[17:38:45] jvmquake again, sigh...
[17:39:04] dcausse: those should be the old messages that i accepted, not new ones
[17:40:16] I'll silence it while we merge the new settings
[17:41:25] oh Ryan just merged it, thanks!
[17:42:23] ryankemper: forgot to mention this in the patch but this will require a rolling-restart of blazegraph on the public wdqs-cluster to be fully effective
[17:42:56] dcausse: oh right! I forgot about that aspect
[17:43:03] I'll go ahead and roll a wdqs deploy
[17:43:11] thanks!
[17:43:24] going offline for dinner
[18:04:09] random other followup, we wondered yesterday why eswiki wasn't in the ltr dataset. It seems when we ran the AB test we were having trouble collecting data from eswiki and cswiki, so they weren't tested: https://gerrit.wikimedia.org/r/377393
[18:04:44] s/in the ltr dataset/using ltr for ranking/
[18:10:10] (a bit sad, considering. would be nice to revisit that testing at some point)
[18:12:37] lunch, back in ~30-45
[20:10:57] hmm, sometimes i'm not very smart. I've been wondering why the custom field i added wasn't making it through, when it reality it was i just had a `head -n 20` at the end of my jq pipeline reading the output...
[20:11:00] so, i guess it's lunch time :)
[20:33:38] ebernhardson: haha that's a classic one
[21:08:49] back
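
Note on the jvmquake flag discussed around 17:04-17:24: the log says the alert is driven by a flag file (/tmp/jvmquake_warn_gc by default) that is collected through prometheus and has to be deleted by hand once the alert has fired. A minimal sketch of that idea follows. It is not the collector from the linked puppet patch; the metric name and both function names are hypothetical, and the only assumption about Prometheus is its plain "name value" text sample format.

```python
import os
import tempfile

# Hypothetical metric name; the real collector's naming is not shown in the log.
METRIC_NAME = "jvmquake_warn_gc_flag_present"


def flag_metric(flag_path: str) -> str:
    """Return one text-format sample: 1 if the flag file exists, else 0."""
    value = 1 if os.path.exists(flag_path) else 0
    return f"{METRIC_NAME} {value}"


def reset_flag(flag_path: str) -> bool:
    """Delete the flag file by hand; return True if a file was removed."""
    try:
        os.remove(flag_path)
        return True
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    # Work in a temporary directory so the demo never touches a real flag file.
    with tempfile.TemporaryDirectory() as tmp:
        flag = os.path.join(tmp, "jvmquake_warn_gc")
        print(flag_metric(flag))   # flag absent
        open(flag, "w").close()    # simulate jvmquake touching the flag
        print(flag_metric(flag))   # flag present, alert would fire
        reset_flag(flag)           # the manual cleanup mentioned in the log
        print(flag_metric(flag))   # flag absent again
```

Because nothing removes the file automatically, the sample stays at 1 until someone runs the equivalent of reset_flag, which matches the "has to be deleted manually" remark above.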