[07:00:55] hi dcausse
[07:01:11] atsukoito: hi!
[09:07:44] errand
[10:22:03] lunch
[12:46:04] \o
[12:46:47] o/
[12:47:20] meh 38min for a cindy run
[12:47:27] with both images? that is a bit long :S
[12:47:35] but i suppose about expected
[12:47:42] might have been unlucky with the for loop and the sleep 600 but still :/
[12:48:42] I remember a time when cindy was faster than jenkins, this last test jenkins +2ed before cindy finished the first round on opensearch 1.3
[12:49:03] wow, yea it used to be ~10-15min iirc
[12:49:33] jenkins ran in 14m
[12:49:49] they've also made jenkins faster, iirc there is some parallelism in there now
[12:50:07] other approach is a parallel run but it has to go to another vm and possibly another jenkins account
[12:50:24] does not seem great either
[12:50:59] yea it's not. i had cindy doing parallel feature files a while back, but it led to intermittent errors with timeouts
[12:51:46] or a smaller test suite for the legacy image we're supposed to support
[12:52:44] could maybe limit it to *_api.feature, not sure how complete the coverage is but it should hit many of the boxes
[12:53:08] but the slowness is probably also just loading up the indexes. I've wondered if we could slim down the amount of data we load into opensearch but it's tedious to figure out
[12:56:25] true...
[13:03:50] looking in horizon if we could get a beefier vm
[13:04:07] what's fai?
[13:04:46] hmm, not sure? Acronym sounds familiar though
[13:05:10] plausibly https://github.com/faiproject/fai ?
[13:05:26] but not sure why we would use it, we use puppet
[13:06:20] it's from Brian, could definitely be some testing of some sre tooling
[13:06:51] but unsure if going with 8 cores for cindy will make things a lot faster
[13:10:32] can't hurt to try
[13:11:26] ok, we'll have to upgrade to trixie anyways
[13:16:09] if only I remember where the "create" button is in horizon :)
[13:16:46] ah it's because of poor french translation...
[13:19:40] wondering what's the g4.cores8.ram24.disk20.ephemeral90.4xiops profile
[13:28:03] was tempted by this "4xiops" suffix because I/O could well be the bottleneck as well
[13:29:31] too tempting, trying this one
[13:29:31] yea that does seem possible
[13:29:49] I've pondered if we could cheat by putting the indexes on a ramdisk, but never got around to playing with it
[13:31:31] oh could be nice indeed
[14:13:16] meh, i guess i also have to use the fixes on cindy locally. unit tests still ran, but integration fails with search resolving to 127.0.0.1
[14:21:07] create-env 2m26 on the new host vs 2m52 on the old one, not a game changer but at least something
[14:21:49] ebernhardson: this is the custom mw build I'm using on cindy https://people.wikimedia.org/~dcausse/mw-with-dps-mem-fix
[14:22:14] this along with https://gitlab.wikimedia.org/repos/search-platform/cirrus-integration-test-runner/-/merge_requests/27
[14:22:21] thanks! i'll try that
[14:22:37] i ended up making a custom mw image previously to embed ghostscript and the sury keys, but i guess that wasn't enough
[14:22:59] oh, on the cindy host?
[14:23:35] could save some time in create-env I guess if there was a local image ready
[14:24:00] locally, although could also do it in cindy i guess
[14:24:25] sure, finishing setting up cirrus-integ5 and I think we could try this
[14:24:42] using this script to build it: https://phabricator.wikimedia.org/P93568
[14:24:56] thanks!
[14:26:17] for another time, but i was also pondering getting rid of first-run. The problem is we never run first-run, so when we do it's probably broken
[14:26:44] maybe call it ensure-prereqs.sh and run it every time, with guards to avoid repeating work...but then it's not clear that helps anything because the important bits still don't run
[14:26:54] but if the important bits run, it slows everything down :P
[14:28:01] yes... I guess it's fine to call it manually
[14:28:25] and whenever we change the mw image, does happen that often
[14:28:30] *not
[14:29:12] for me it somewhat failed to run create-env and had to run first-run manually
[14:29:30] yea, sadly not surprised :(
[14:29:46] too easy to add stuff to create-env without knowing it'll break on the first run
[14:34:14] ebernhardson: is there something you need to back up on cirrus-integ4? I might delete it
[14:39:57] hm... perhaps mwcli can set up two different systems concurrently, containers have a "default" somewhere in their name
[14:47:33] it has a --context option...
[15:01:01] dcausse: no i don't think i need anything
[15:01:50] yea i had noticed --context, had thought it might be useful for building a smoke test in the repo itself without mucking with my local env. I suppose it could be used for parallelizing the two images
[15:11:37] ok dropping cirrus-integ4
[15:14:29] dcausse: will we try doing `ttmserver` tomorrow again?
[15:14:46] atsukoito: sure
[15:15:00] atsukoito: is the plan to drop auth completely?
[15:15:23] could not figure out why the username/password were not passed properly
[15:16:08] I've rolled the diff to drop auth completely, and tested it doesn't need it. I'll make a revision of this morning's diff but without auth then
[15:16:32] sounds good
[15:32:16] hm the --context will run into many issues, it needs separate mediawiki folders, which in retrospect makes sense
[15:32:36] since mwcli seems to alter some config files
[15:32:43] ahh, yea i suppose
[15:33:29] it does seem to be a tedious change to support this properly...
[15:33:57] leaning towards a simplified test suite
[15:37:14] seems reasonable
[16:01:56] found a 90G disk on this host, mounted /srv to it and moved /var/lib/docker there, not very hopeful it'll make a diff but trying anyways
[16:28:20] dcausse: tomorrow's backport is gonna be busy, https://schedule-deployment.toolforge.org/window/1780470000
[16:29:31] atsukoito: indeed... the afternoon window is still empty but might fill up quickly...
[16:31:31] pretty sure tomorrow morning we won't have time to test ttm, there are 3 non mw-config patches already scheduled
[16:32:09] atsukoito: would the UTC afternoon window work?
[16:32:32] yes, sure!
[16:32:48] sounds good, we should get the first slot there
[16:35:13] booked, see you tomorrow!
[16:35:20] * atsukoito checking out earlier
[16:39:28] take care!
[16:40:50] sigh I think I forgot about "cirrus-artifacts" on cirrus-integ5
[16:41:16] hopefully the doc is sufficient? I attempted to show all parts
[16:41:32] looking
[16:47:07] yes should work, completely missed that part
[17:00:01] looking at recent runs, cindy took between 25m and 30m at least since june 2025...
[17:00:20] I guess the maximum acceptable is something under 30m
[17:20:17] ouch, i guess i didn't notice :S
[17:20:30] i wonder how long it takes locally...i feel like it's more like 15m
[17:21:44] there's a random(10m) penalty we take in CI but rarely (never?) saw runs under ~20m or so on the few historical patches in gerrit
[17:23:01] create-env is around 2m20, we could speed that up if we run a smaller test suite without the additional wikis I guess
[17:24:11] fresh takes a bit of time to boot for the npm deps
[17:27:56] hmm, yea even that takes some time. The per-wiki setup is also annoyingly long, it would be nice if that could somehow be cached but then it has to know when to invalidate...and that's nearly impossible
[17:28:24] yes...
[17:28:32] with 8 cores, maybe it can init the wikis in parallel (via & and wait) but never tried
[17:40:04] looking at timestamps from wdio (actually running) it's around 17m, with create_new_page taking 4min
[17:42:48] I'm sure we could trim that one down and move to a unit test, it has plenty of variations on the query shape but in the end that's just testing a boolean in the SearchResultSet
[17:43:32] yea let's go for it, we haven't critically considered what's in the suite for a long time
[17:43:53] a bare run with a single image is 20m on the new cindy host
[17:45:33] ok will continue on trying to optimize things a bit further
[17:45:37] dinner
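
A minimal sketch of the "& and wait" idea from 17:28:32, for illustration only. The `init_wiki` function and the wiki names are hypothetical placeholders, not taken from the cindy scripts; the real per-wiki setup would go where the `sleep` is. It shows starting each job in the background, recording its PID, and waiting on each PID so that a failing job is not silently ignored.

```shell
#!/usr/bin/env bash
# Sketch: run per-wiki setup jobs concurrently with `&` and `wait`.
# init_wiki and the wiki names below are hypothetical placeholders.

init_wiki() {
    local wiki="$1"
    # Placeholder for the real (slow) per-wiki setup work.
    sleep 1
    echo "initialized ${wiki}"
}

wikis=(wiki_a wiki_b wiki_c)
pids=()

for wiki in "${wikis[@]}"; do
    init_wiki "$wiki" &     # start each setup in the background
    pids+=("$!")            # remember the PID of the job just started
done

failures=0
for pid in "${pids[@]}"; do
    # `wait <pid>` returns that job's exit status, so failures are counted
    wait "$pid" || failures=$((failures + 1))
done

echo "failed jobs: ${failures}"
```

Waiting on each PID individually, rather than using a bare `wait`, is what lets the script notice that one of the setups failed. The order of the "initialized" lines is not deterministic, and whether the real setup steps are safe to run concurrently is untested here.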