[07:50:12] Plumber/landlord coming early tomorrow, won't be at the first hour of weds meeting. Will be there for the skip level with tajh
[09:50:19] lunch
[13:15:14] pfischer if you wanna take a look at https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/49, I think that will unblock the dev elastic container build
[13:35:54] sure
[13:38:33] inflatador: 👍
[13:47:25] cool, merged
[13:52:28] they're publishing the image now
[14:53:18] Rebooting elastic codfw for security updates
[15:02:42] I have an idea about the flink/ZK stuff... I'm thinking the config should be in the Flink Operator instead of the app config
[15:12:12] hopping on in 3 minutes
[15:53:53] OK, merging https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/953688
[16:31:38] so the flink ZK stuff is getting further... it's trying and failing to do leader election
[16:36:20] inflatador: I didn't get to CR before it got merged, but I noticed that the docker image patch introduced two changelog entries rather than just one https://gitlab.wikimedia.org/repos/releng/dev-images/-/commit/60dab96e36d4f42dc3893ea9b0994b2cf99c8448
[16:36:26] get to the cr before*
[16:37:00] Looks to me like there should only be one net-new entry AFAICT
[16:37:29] ryankemper: Good catch, not sure what happened there but I can remove it
[16:37:53] inflatador: oh nevermind, I noticed jforrester already fixed it https://gitlab.wikimedia.org/repos/releng/dev-images/-/commit/d7e79f62a70ea5b530a93470b70f4dd8a5ad75c5
[16:40:49] Oh nice
[16:41:09] workout/lunch, back in ~75m
[17:48:26] ebernhardson: any objection to canceling our 1:1 today? Long day on my side already :/
[17:55:01] back
[18:03:41] rebooting eqiad elastic for security updates, holler if anything seems amiss
[18:13:49] gehel: sure, no worries
[18:35:35] well, the flink/zk stuff doesn't look firewall related, at least based on https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Exec_into_a_pod_and_run_commands
[19:14:25] huh, in presto single vs double quote is 'string' vs "column_name"
[19:15:25] also, apparently the answer to why a request has multiple indexes is that those were write requests. They were writing to multiple clusters, hence multiple index names. Essentially a reminder to better filter the sources :)
[19:17:18] or actually, it's saneitizer doing get's
[19:53:06] grocer's apostrophe alert ;P
[20:04:56] meh, i strew commas and apostrophes about without abandon :)
[20:10:12] pfischer: i couldn't help poking it a bit more, in theory this should be the requests that returned duplicate results (in theory, should line up with your collected metrics): https://superset.wikimedia.org/superset/sqllab/?savedQueryId=758
[20:10:37] although, my query returned 7 rows for 8am-9am UTC today, and the graph shows 5. So maybe something's not completely aligned
[21:02:26] Big D'oh! I realized that all in-place reindexing (with our tools) is either 37-38 seconds, 67-68, or 97-98... that's 7-8 seconds of overhead and polling every 30 seconds. Not sure why some runs have a fraction of a second *less* overhead than the baseline, but the 30-second resolution is pointless on a tiny data set! I didn't notice because all indexing runs are very consistent (±1 sec), so the consistency wasn't weird, and the 7-8 seconds of overhead masked the pattern a bit.
[21:03:03] oh, yea that makes sense. You could change the polling code, sec
[21:04:02] Trey314159: you want includes/Maintenance/Reindexer.php in the monitorSleepSeconds function. Change it to always yield 1
[21:04:42] * ebernhardson also feels like ratio is the wrong name for the argument to that function, but whatever :P
[21:06:45] That was the plan... though thanks for the pointer. I'd initially just changed MONITOR_SLEEP_SECONDS to 1... didn't look further to realize it was more complicated. (Hint: It's always more complicated!)
[21:07:24] it uses a generator function so i could start the polls out quickly but slow them down as time went on; it was the fix for how, in the past, reindexing a 5-document test wiki still took 30s
[21:08:40] yeah, makes sense
[21:09:33] i guess you can change that constant to 1, it's used as the max value returned
[21:12:45] Yeah, I saw that after looking a little more closely
[21:24:39] With 1-second resolution re-index timing, the trend is much clearer... re-index = index + 12s (on my machine). That's comforting, actually.
[21:35:20] * ebernhardson wonders why cloudelastic1001 returns ipv4, but cloudelastic1002 returns ipv4 and ipv6 from nslookup
[21:35:45] oh nevermind, i'm blind, it does :P
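
A minimal sketch of the poll-backoff generator described in the 21:04-21:09 exchange, assuming a ramping interval capped at MONITOR_SLEEP_SECONDS. The monitorSleepSeconds name, the MONITOR_SLEEP_SECONDS constant, and the $ratio argument come from the conversation; the surrounding class, the $start parameter, and the waitForCompletion helper are hypothetical, and this is not the actual CirrusSearch Reindexer.php implementation.

<?php

// Hypothetical illustration only; names not mentioned in the chat are invented.
class ReindexPoller {
    // Maximum seconds to sleep between polls of reindex progress (30s in the chat).
    private const MONITOR_SLEEP_SECONDS = 30;

    /**
     * Yield an increasing sleep interval, capped at the maximum.
     * Early polls are quick (1s, 2s, 4s, ...) so a tiny test wiki finishes
     * without waiting out a full 30-second poll cycle; long reindexes
     * settle into the 30-second cadence.
     */
    private function monitorSleepSeconds( int $start = 1, int $ratio = 2 ): \Generator {
        $sleep = $start;
        while ( true ) {
            yield min( $sleep, self::MONITOR_SLEEP_SECONDS );
            $sleep *= $ratio;
        }
    }

    /**
     * Poll until the supplied check reports the reindex is done,
     * sleeping for the generator-provided interval between checks.
     */
    public function waitForCompletion( callable $isDone ): void {
        foreach ( $this->monitorSleepSeconds() as $seconds ) {
            if ( $isDone() ) {
                return;
            }
            sleep( $seconds );
        }
    }
}

With this shape, dropping MONITOR_SLEEP_SECONDS to 1 (or making the generator always yield 1, as suggested at 21:04) caps every poll at one second, which is what gives the 1-second timing resolution mentioned at 21:24.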