[06:35:47] inflatador: nice! seems like the cpu change you made had a great impact (almost doubled the ingestion rate apparently!), wdqs2022 seems to outperform old hosts now
[07:59:29] ^ That sounds great! Thanks! Simple change with big impact...
[08:59:12] errand
[10:18:38] lunch
[13:31:26] o/
[13:33:24] o/
[13:34:23] hm, seems like zuul no longer wants to accept a patch if cindy has voted V-1
[13:39:25] Following the conversation on jupyter notebooks yesterday, I've tried the pycharm / jupyter integration. It works and provides much better auto completion!
[13:57:44] gehel: Yes, it's pretty seamless.
[14:07:41] cindy failing on the case requiring @many_redirect, which of course returns the right page when queried after the fact... thought that was because of redirects not being handled properly, but the cirrusdoc prop seems to properly check that the page requested is part of the redirect array...
[14:07:48] not sure what's going on :/
[14:24:19] well... stopped cindy to get one patch merged
[14:24:51] will send a test patch that waits a bit before searching, to see...
[14:30:41] back in ~30... might be slightly late to retro
[15:01:21] \o
[15:15:22] o/
[15:16:05] 21h runtime on all the sup pods I restarted yesterday, doing better :)
[15:19:18] o/
[15:19:20] :)
[15:25:46] there are still the actual problems it was running into to deal with, though; I see it getting exceptions from cirrusbuilddoc, will have to check if those are related to the revdelete that's already in or something else
[15:25:49] ebernhardson: yeah! Thank you for the bad-response patch. I tweaked the SUP dashboard today, memory consumption is now down to < 50mb (compared to 1.2gb before).
[15:26:29] What revdelete issue?
[15:26:30] interesting, what changed the memory usage so much? Is it just a difference of being backlogged or not?
[15:26:52] pfischer: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/1017922
[15:27:15] No, the grafana dashboard was using a lot of memory, due to too many series I guess
[15:27:32] Capped those via the topk operator
[15:27:32] pfischer: basically SUP requests by rev-id, and some rev-ids are deleted as spam/vandalism/etc. Because SUP requests later than the old cirrus updater did, it was running into many more than we previously saw
[15:28:22] pfischer: oh, ok, that makes sense. And yeah, that dashboard was a bit laggy. Will check it out
[15:38:14] I brought up the rate-limiting discussion with Janis today, and since this is a long-wanted feature, sre/ops is willing to implement this the envoy way, with a central envoy rate-limiter service. They will discuss it next week, but in case they commit to it, we (based on a header to be defined) would be the first ones to be restricted by a rate limit. All we'd have to do then is handle 429 responses with a shorter-than-usual retry delay.
[15:38:41] pfischer: awesome! that is certainly some needed infrastructure
[15:47:03] pfischer: thanks!
[15:47:17] cool, keep us posted!
[16:00:58] somehow I hadn't remembered this from the related articles analysis -
[16:00:59] When compared to the control group, no significant impact on pageviews was observed, hinting at the conclusion that while the feature's content curation is preferred among users, it does not significantly affect user behavior - if a user was interested in continuing to read, she will select an article suggested by related pages with a higher probability than another article on a page.
[16:01:01] However, the feature has little to no effect on users who were not interested in accessing more articles
[16:27:33] inflatador: would it make sense to re-enable puppet on a few other wdqs20* hosts (e.g. wdqs2023, this one is way behind the others) to see?
[16:29:02] dcausse Y, finally out of mtgs. Was planning on enabling on all hosts, but I can start with wdqs2023 if you prefer
[16:29:28] inflatador: no, all hosts is fine I guess, I all for it :)
[16:29:33] *I'm
[16:30:18] btw the bios tweak on wdqs2023 did not seem to have much effect, seems to be the slowest one tho
[16:31:08] ebernhardson: wondering how they can identify "users who were not interested in accessing more articles"
[16:31:48] dcausse ACK, will roll back the BIOS change but probably won't reboot to apply until the backlog is cleared
[16:32:02] sure
[16:32:27] the puppet change is active on all w[cq]s hosts now
[16:50:23] lunch, back in ~1h
[16:51:57] dr0ptp4kt: it's a bit hectic over here, I'll try to reschedule our 1:1 for later tonight
[16:57:18] pfischer: let's do tuesday - i sent a new time.
[17:08:28] dcausse: I think it was looking at the number of page views per session; people were clicking the related articles but they weren't viewing any more pages than without it
[17:09:10] ok, thanks
[17:41:41] back
[17:46:54] dinner
[18:12:43] * ebernhardson realizes there is an `os_family` I could group on instead of `device_family`. Unsurprisingly that does better at showing android vs iphone vs etc :P
[18:46:56] ebernhardson: you mention superset in T358349#9727873, but I suspect that's the plan and it isn't done yet, right?
[18:46:57] T358349: Search Metrics - Number of Searches - https://phabricator.wikimedia.org/T358349
[18:47:48] gehel: right, I just kicked it off to calculate dailies but I haven't actually tried to query it from superset. But it's writing the same data to a table under my username, so it should be queryable
[18:48:00] cool!
[20:03:26] * ebernhardson realizes that all these column names are terrible and will require explanation :P
[20:04:04] ebernhardson: which host are you on now for those notebooks?
[20:04:38] dr0ptp4kt: stat1008, I combined them into stat1008:~ebernhardson/T358345.ipynb
[20:05:25] thanks!
[20:32:08] * ebernhardson is often both impressed and saddened by spark. It's processing terabytes of data and pulling out useful information, very nice. But the stats say it works out to ~3 megabytes of webrequest data per second per core, which doesn't seem all that impressive :)
[20:32:42] just a ballpark from the spark executor stats of input vs task time
[21:09:36] wdqs2023 doesn't want to catch up... giving up and will do a data xfer shortly
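
A minimal sketch of the 429 handling described in the [15:38:14] message, assuming a plain HTTP client: the real SUP fetcher runs inside Flink, and the endpoint, header handling, and delay values below are illustrative assumptions only.

```python
# Sketch only: URL, header fallback, and delay values are hypothetical.
import time
import requests

RATE_LIMITED_RETRY_DELAY = 1.0   # shorter than the usual error backoff
DEFAULT_RETRY_DELAY = 30.0

def fetch_with_backoff(url, params, max_attempts=5):
    """Fetch a URL, retrying quickly on 429 and slowly on other errors."""
    for attempt in range(max_attempts):
        resp = requests.get(url, params=params, timeout=10)
        if resp.status_code == 200:
            return resp.json()
        if resp.status_code == 429:
            # Being throttled is not a hard failure: honour Retry-After if
            # the rate limiter sends one, otherwise use the short delay.
            delay = float(resp.headers.get("Retry-After", RATE_LIMITED_RETRY_DELAY))
        else:
            delay = DEFAULT_RETRY_DELAY
        time.sleep(delay)
    raise RuntimeError(f"giving up on {url} after {max_attempts} attempts")
```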
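
A rough illustration of the grouping swap mentioned at [18:12:43], assuming a Spark session on a stat host and a webrequest table exposing a `user_agent_map` column; the table name, partition filter, and output shape are assumptions, not the actual notebook.

```python
# Sketch under assumed table/column names; the point is only the
# device_family -> os_family swap.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

requests_df = spark.table("wmf.webrequest").where(
    (F.col("year") == 2024) & (F.col("month") == 4) & (F.col("day") == 18)
)

# Grouping on os_family (android / iOS / ...) gives a handful of readable
# buckets, where device_family splinters into thousands of handset models.
by_os = (
    requests_df
    .groupBy(F.col("user_agent_map")["os_family"].alias("os_family"))
    .count()
    .orderBy(F.desc("count"))
)
by_os.show(20, truncate=False)
```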
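
The ~3 MB/s-per-core figure from [20:32:08] is just input bytes divided by summed task-core-seconds from the executor stats; a worked example with placeholder numbers (only the method comes from the conversation, the inputs are invented).

```python
# Back-of-the-envelope throughput estimate; both figures are placeholders.
input_bytes = 5 * 1024**4          # e.g. 5 TiB of webrequest data read
total_task_core_seconds = 1.8e6    # task time summed across all executor cores

mb_per_second_per_core = input_bytes / total_task_core_seconds / 1024**2
print(f"~{mb_per_second_per_core:.1f} MB/s per core")   # ~2.9 with these numbers
```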