[09:38:09] started to ingest enwiki in relforge using an ingest pipeline and model that I think Erik uploaded, 3.7M articles so far (~50% of the data), definitely too slow :)
[09:39:20] the ingest pipeline does not automatically scale sadly, had to parallelize bulk imports to all 3 relforge machines
[09:42:17] I might kill the import sometime today, otherwise it might annoy anyone else trying to use relforge, it's at page_id 35915956 (created in May 2012)
[09:58:05] also, could not use the new dumps, seems like the redirect array is an array of arrays instead of an array of objects, filing a task
[10:49:06] the reindexer complains about "Unexpected missing pods: {'mw-script.codfw.l6bq6791-6bgv8', 'mw-script.codfw.9u20xxi4-r6sxj'}" but those pods are there and running
[11:09:57] seems like the way we extract pods from the k8s state is too restrictive, will investigate after lunch
[11:09:59] lunch
[12:53:08] ah, stupid me, the script assumes we run in eqiad and thus is polling wikikube@eqiad
[13:47:08] started the reindex@codfw, if nothing goes wrong in the next 30 mins I'll run the same reindex concurrently in eqiad
[13:49:53] ah, forgot that it's doing big wikis first, doing commons now and I guess nothing will budge much in the next 30 mins :)
[14:07:38] o/
[15:09:03] \o
[15:11:25] o/
[15:12:02] ebernhardson: had a doubt: is the reindex orchestrator safe to run concurrently on the three clusters (eqiad, codfw, cloudelastic)?
[15:13:23] dcausse: should be, yea. Each uses its own state directory (by default, named by cluster)
[15:13:56] ok thanks, going to start it in eqiad and cloudelastic
[15:35:29] ouch, the 2x ab-test has been running all month. It was only supposed to get an extra week.
[15:36:35] Trey314159: Sorry, I was a bit late. Are you around for our 1:1?
[15:36:51] be there in a moment
[15:57:11] quick school pickup, will be 5 mins late to triage
[16:00:52] need a 5 min break too, see you at 5 past
[17:00:55] Taking the dog out
[17:01:43] workout, back in ~40
[17:10:10] dinner
[17:52:02] dinner
[18:10:47] back
[18:30:50] hmm, both dwell time visualizations in this report are garbage :P The violin graph doesn't really suggest anything, and the graph of time vs proportion of visits says something is happening but leaves it very fuzzy
[18:30:57] maybe need an area-under-the-curve
[18:38:26] turns out, the area-under-the-curve of a survival function is... the mean. I guess I should just provide the mean dwell time
[18:38:56] (not exactly, but basically)
[18:39:51] I guess the mean is in the violin graph, but not visible enough or annotated with a number
[19:23:51] ryankemper ebernhardson we are getting an alert for morelike latency in eqiad and I have to leave in 5m, any chance one of y'all could take a look?
[19:24:21] I'm back in 15, will look then
[19:24:56] ryankemper ebernhardson looks like it cleared... not sure what happened but might be worth a retrospective look https://grafana.wikimedia.org/goto/Yv3S2AWDg?orgId=1
[19:29:09] :S yea it's been flapping recently. On Friday I tried banning/unbanning one of the nodes that was under the heaviest load, but it's not clear it helped
[19:36:16] err, I guess that would have been Wednesday
[19:57:08] proposed report for commonswiki 2x near match: https://people.wikimedia.org/~ebernhardson/T408154-AB-Test-Metrics-Commonswiki-Near-Match-2x.html
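
A side note on the [18:38:26]/[18:38:56] exchange above, written out as a worked equation. This assumes dwell time is a nonnegative random variable and ignores censoring, which is presumably what the "not exactly, but basically" caveat is about:

```latex
% For a nonnegative dwell time T with survival function S(t) = P(T > t):
\[
  \mathbb{E}[T] \;=\; \int_0^{\infty} \Pr(T > t)\,dt \;=\; \int_0^{\infty} S(t)\,dt
\]
% so the area under the survival curve equals the mean dwell time.
% If observation is cut off at some t_max, the integral over [0, t_max]
% only recovers a truncated mean -- hence "not exactly, but basically".
```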
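On the ban/unban mentioned at [19:29:09]: in practice this goes through WMF tooling, but assuming it amounts to excluding the node from shard allocation (so its shards, and their query load, drain to other nodes), here is a minimal sketch against the stock Elasticsearch cluster-settings API. The endpoint and node name are placeholders, not the real hosts:

```python
import requests

# Placeholder endpoint; the real clusters sit behind LVS/TLS and are
# normally managed via cookbooks rather than raw API calls.
ES = "http://localhost:9200"


def ban_node(node_name: str) -> None:
    """Exclude a node from shard allocation so its shards migrate elsewhere."""
    resp = requests.put(
        f"{ES}/_cluster/settings",
        json={"transient": {"cluster.routing.allocation.exclude._name": node_name}},
        timeout=30,
    )
    resp.raise_for_status()


def unban_nodes() -> None:
    """Clear the exclusion list so banned nodes can hold shards again."""
    resp = requests.put(
        f"{ES}/_cluster/settings",
        json={"transient": {"cluster.routing.allocation.exclude._name": None}},
        timeout=30,
    )
    resp.raise_for_status()


if __name__ == "__main__":
    ban_node("elastic1234")  # hypothetical node name
```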
[20:26:54] inflatador: ebernhardson: yup, there's been a handful over the last week, spikes in fulltext & morelike & prefix, not so much compsuggest. It correlates pretty heavily with overall qps, and starts inflating the threadpool
[20:29:32] eqiad gets a lot more traffic than codfw, so it's kind of our canary. We might just be getting to a place where the ~50 nodes we have aren't enough under sustained high loads
[20:36:10] tl;dr: sustained high load -> threadpool queue grows -> average latency goes up a bit, but p95 really goes into the 1-2 second range because a lot of requests are blocked waiting on their threadpool job to run
[20:36:24] here's some fulltext wonkiness from today that illustrates it pretty well https://grafana-rw.wikimedia.org/d/dc04b9f2-b8d5-4ab6-9482-5d9a75728951/elasticsearch-percentiles?orgId=1&from=2025-11-25T02:05:25.662Z&to=2025-11-28T15:27:39.926Z&timezone=utc&var-cluster=elasticsearch&var-exported_cluster=production-search
[21:29:29] 👀
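
One way to watch the queue growth described at [20:36:10] is the `_cat/thread_pool` API. A minimal sketch (the endpoint is a placeholder; in production the queries would go through the search service endpoint), sorted so the most backed-up nodes show up first:

```python
import requests

# Placeholder endpoint for illustration only.
ES = "http://localhost:9200"


def search_threadpool_pressure():
    """Return per-node search threadpool stats, sorted by queue depth."""
    resp = requests.get(
        f"{ES}/_cat/thread_pool/search",
        params={"format": "json", "h": "node_name,active,queue,rejected,completed"},
        timeout=30,
    )
    resp.raise_for_status()
    return sorted(resp.json(), key=lambda row: int(row["queue"]), reverse=True)


if __name__ == "__main__":
    for row in search_threadpool_pressure()[:10]:
        print(row["node_name"], "active:", row["active"],
              "queue:", row["queue"], "rejected:", row["rejected"])
```

A growing `queue` column across many nodes lines up with the p95 inflation: requests aren't slower to execute so much as slower to get picked up.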