[08:31:32] kostajh: I see what you mean about ParallelCommand challenges, like putting each chromedriver on a different port. Splitting into two groups at a higher level seems like a good if messy workaround for the short-term. But hashar is probably right about the fixed costs of each job being very high. The nice thing about ParallelCommand is that it can fork the process after all dependencies are
[08:31:38] installed, so makes sense to still pursue that as a longer-term plan I suppose.
[08:49:32] The group A/B splitting paradigm could also help with ParallelCommand implementation of Selenium tests. In the PHPUnit group A/B patch (https://gerrit.wikimedia.org/r/c/integration/quibble/+/742200), there are two LocalSettings.php files and separate MW installs created. If we want to run all core/skin/extension Selenium tests in parallel in a single job, it would be easier if each core/skin/extension has its own LocalSettings.php
[08:49:32] / database tables to work with
[09:00:28] So the separate LocalSettings.php is just for targeting different databases? That's neat.
[09:00:53] Ideally we don't need to isolate the dbs, but I can see why it's useful in practice.
[09:02:32] I've seen tests which install an app's db and then make a copy for each suite; but that's probably not a big advantage over simply reinstalling.
[09:06:02] hii
[09:06:13] yesterday I looked a bit at npm install performance
[09:06:36] and well I don't get how it takes several minutes when running it from Wikibase :/
[09:15:24] there's just a massive amount of dependencies, I think?
[09:20:14] running `npm install` for Wikibase locally takes 1m40s and downloads 320mb of files into `node_modules`
[09:20:48] (side note, can't remember if I shared this; I wrote some notes about selenium + CI at https://www.kostaharlan.net/posts/wikimedia-selenium )
[09:21:32] probably nothing you all don't already know, though :)
[09:40:08] there's this from 2019 about npm install, not sure how accurate it is in 2022 http://www.tiernok.com/posts/2019/faster-npm-installs-during-ci/
[09:43:05] testing `pnpm install` on Wikibase I get an error with `install:bridge`, "Unknown option: 'Wikibase:remoteVersion_vue'"
[10:33:03] I also found that on Wikibase using `npm --no-audit ci` saves me ~ 10 seconds
[10:35:07] kostajh: I am guessing it is something that only npm supports
[10:35:15] there are a few use cases here and there that forced us to move to npm 7
[10:35:23] though I am not sure it was for mediawiki extensions
[10:41:41] https://blog.npmjs.org/post/618653678433435649/npm-v7-series-arborist-deep-dive is my read of the day ;)
[11:20:18] Quibble, MediaWiki-Core-Tests, Browser-Tests, MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), and 2 others: Run browser tests in parallel - https://phabricator.wikimedia.org/T226869 (hashar)
[12:01:43] (PS7) Awight: Wrapper to pretty-print parallel job progress [integration/quibble] - https://gerrit.wikimedia.org/r/693458
[12:01:46] (PS44) Awight: Parallelism as a command object [integration/quibble] - https://gerrit.wikimedia.org/r/587885 (https://phabricator.wikimedia.org/T235449)
[12:01:49] rebased.
[12:02:42] Is there anything else I can do to fancy these up for review?
[12:03:19] (PS13) Awight: Split extension and skin npm and composer tests [integration/quibble] - https://gerrit.wikimedia.org/r/587888
[12:12:24] (PS6) Awight: Split core npm and composer tests [integration/quibble] - https://gerrit.wikimedia.org/r/588087
[12:15:47] (CR) jerkins-bot: [V: -1] Split core npm and composer tests [integration/quibble] - https://gerrit.wikimedia.org/r/588087 (owner: Awight)
[12:22:04] (PS4) Awight: Sequence of commands as a command [integration/quibble] - https://gerrit.wikimedia.org/r/588083
[12:25:03] (PS5) Awight: Sequence of commands as a command [integration/quibble] - https://gerrit.wikimedia.org/r/588083
[12:25:30] No need to look at the Sequence stuff yet, I rebased just to check for surprises.
[12:28:20] (CR) jerkins-bot: [V: -1] Sequence of commands as a command [integration/quibble] - https://gerrit.wikimedia.org/r/588083 (owner: Awight)
[12:34:44] awight: https://gerrit.wikimedia.org/r/c/integration/quibble/+/693458/ is a no-op on its own, right? I think it's fine to +2
[12:56:00] I am still triaging my vacations email :D
[13:06:39] (CR) Hashar: [C: +1] "`atexit` always triggers a warning, then I don't have any good idea right now to replace it by something else :D" [integration/quibble] - https://gerrit.wikimedia.org/r/693458 (owner: Awight)
[13:11:22] (CR) Awight: Wrapper to pretty-print parallel job progress (1 comment) [integration/quibble] - https://gerrit.wikimedia.org/r/693458 (owner: Awight)
[13:26:31] (CR) Awight: [C: -1] "Going to try rewriting this without `atexit`." [integration/quibble] - https://gerrit.wikimedia.org/r/693458 (owner: Awight)
[13:47:56] Quibble, MediaWiki-Core-Tests, Browser-Tests, MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), and 2 others: Run browser tests in parallel - https://phabricator.wikimedia.org/T226869 (Krinkle) FYI: > Change by Awight **merged**: > > [mediawiki/core] selenium: run 4 tests in parallel > > Quibble, MW-1.36-notes (1.36.0-wmf.26; 2021-01-12), Patch-For-Review, User-awight: Consider httpd for quibble instead of php built-in server - https://phabricator.wikimedia.org/T225218 (kostajh) The conclusion to this was {https://phabricator.wikimedia.org/T285649} \o/
[14:13:30] ah, maxInstances is running entire suites concurrently? or running each `it()` test within a suite concurrently?
[14:13:36] https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-selenium-docker/94938/console#console-section-9 suggests it is the former
[14:13:45] and AbuseFilter went from 1m36s to 36 seconds
[14:14:29] WikibaseLexeme went from 3m36s to 1m26s although there are two failures https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-selenium-docker/94940/console#console-section-9
[14:15:41] that's because it has some overly strict code `const title = RecentChangesPage.lastLexeme.getText();` which should be refactored to look for what it wants, since there's no guarantee about what will be "last"
[14:35:56] kostajh: I think I've been saying the wrong thing. https://webdriver.io/docs/organizingsuites/
[14:36:17] Apparently, maxInstances runs suites concurrently, but cases from each suite are sequential.
[14:36:34] That seems like a good default, really.
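(Editor's sketch of the maxInstances behaviour described just above, not taken from any of the patches: a minimal WebdriverIO-style config object. The spec glob is a hypothetical path; the value 4 only echoes the "run 4 tests in parallel" commit title.)

```javascript
// Sketch of the relevant part of a wdio.conf.js. In a real config file this
// object would be exported, e.g. `exports.config = config;`.
const config = {
	// Hypothetical glob: each matching spec file is one "suite" for the runner.
	specs: [ './tests/selenium/specs/**/*.js' ],
	// Upper bound on how many spec files run at the same time.
	// The it() cases inside a single spec file still run one after another.
	maxInstances: 4
};
```

With this setting, state shared between spec files (the same wiki, the same recent changes list) can be modified by another suite mid-run, which fits the "last lexeme" failures mentioned above.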
[14:49:50] (oops, and this means some of our commit messages are misleading)
[15:07:40] right
[15:07:53] and that also makes it easier to understand how to fix the errors in wikibase, in theory anyway
[15:08:02] cause it's conflicting code in the before() hook in each suite
[15:12:02] (CR) Awight: [C: -1] Wrapper to pretty-print parallel job progress (1 comment) [integration/quibble] - https://gerrit.wikimedia.org/r/693458 (owner: Awight)
[15:13:54] great!
[15:24:05] (PS8) Awight: Wrapper to pretty-print parallel job progress [integration/quibble] - https://gerrit.wikimedia.org/r/693458
[15:24:09] (PS45) Awight: Parallelism as a command object [integration/quibble] - https://gerrit.wikimedia.org/r/587885 (https://phabricator.wikimedia.org/T235449)
[15:24:15] (PS14) Awight: Split extension and skin npm and composer tests [integration/quibble] - https://gerrit.wikimedia.org/r/587888
[15:24:42] (CR) Awight: "Smoke-tested the happy path locally." [integration/quibble] - https://gerrit.wikimedia.org/r/693458 (owner: Awight)
[15:27:22] (CR) jerkins-bot: [V: -1] Parallelism as a command object [integration/quibble] - https://gerrit.wikimedia.org/r/587885 (https://phabricator.wikimedia.org/T235449) (owner: Awight)
[15:27:39] (CR) jerkins-bot: [V: -1] Split extension and skin npm and composer tests [integration/quibble] - https://gerrit.wikimedia.org/r/587888 (owner: Awight)
[15:28:10] (CR) jerkins-bot: [V: -1] Wrapper to pretty-print parallel job progress [integration/quibble] - https://gerrit.wikimedia.org/r/693458 (owner: Awight)
[15:28:47] (CR) Awight: "Was also able to locally verify that a failed subprocess will immediately stop the application and print errors." [integration/quibble] - https://gerrit.wikimedia.org/r/693458 (owner: Awight)
[15:32:02] (PS9) Awight: Wrapper to pretty-print parallel job progress [integration/quibble] - https://gerrit.wikimedia.org/r/693458
[15:32:06] (PS46) Awight: Parallelism as a command object [integration/quibble] - https://gerrit.wikimedia.org/r/587885 (https://phabricator.wikimedia.org/T235449)
[15:32:13] (PS15) Awight: Split extension and skin npm and composer tests [integration/quibble] - https://gerrit.wikimedia.org/r/587888
[15:40:32] Okay, the ParallelCommand branch is ready for re-review.
[16:16:33] ok, will look tomorrow, hopefully
[18:39:12] (CR) Hashar: Wrapper to pretty-print parallel job progress (1 comment) [integration/quibble] - https://gerrit.wikimedia.org/r/693458 (owner: Awight)
[20:32:27] https://gerrit.wikimedia.org/r/c/mediawiki/core/+/744073 has flaky tests on both selenium and api testing
[21:33:50] 00:01:42.945 1) should GET an increased site edits stat
[21:33:50] AssertionError: expected 178 to be above 178
[21:33:55] pretty sure I have seen that one before
[21:37:15] 2022-01-06T20:36:23 23355 123 GET http://127.0.0.1:9413/api.php?format=json&action=query&meta=siteinfo&siprop=statistics application/json - - -
[21:37:15] 2022-01-06T20:36:23 48058 170 POST http://127.0.0.1:9413/api.php application/json - - -
[21:37:15] 2022-01-06T20:36:23 14647 116 POST http://127.0.0.1:9413/index.php application/json - - -
[21:37:15] 2022-01-06T20:36:23 17308 123 GET http://127.0.0.1:9413/api.php?format=json&action=query&meta=siteinfo&siprop=statistics application/json - - -
[21:37:41] I am betting the second query to siteinfo > statistics yields a stale/cached result
[21:41:20] the second query has
[21:41:25] [DBQuery] SiteStatsUpdate::doUpdate [0s] localhost:/workspace/db/quibble-mysql-8of2tix2/socket: UPDATE `site_stats` SET ss_total_edits=GREATEST(`ss_total_edits`+1,0),ss_total_pages=GREATEST(`ss_total_pages`+1,0) WHERE ss_row_id = 1
[21:41:25] [DeferredUpdates] DeferredUpdates::run: ended SiteStatsUpdate #691, processing time: 0.00060820579528809
[21:41:46] so the edit counts are deferred
[21:42:06] if one retrieves stats, edits, then retrieves stats again, they are not updated yet
[21:42:49] and in that build the stats update for the edit is actually done at the end of the request retrieving the stats
[21:43:22] that is how I understand it
[21:45:37] it should probably be marked as skipped for now and a task filed for pet to look at it
[21:45:45] sleep &
[22:14:49] there's another flaky test, disabled in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/751997
[22:14:51] anyway i'm out now too
[22:15:22] we should probably tell #engineering-all / wikitech-l to keep an eye out for these and file tasks so they can be fixed/skipped
[22:54:08] hrm now a third test is failing.
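(Editor's sketch relating to the deferred site-stats update discussed at 21:41-21:43. The channel's suggestion was to skip the test and file a task; this only illustrates one alternative a test could take if the counter is eventually consistent, namely polling instead of asserting on the first read. `waitForValue` and `fakeTotalEdits` are hypothetical names, and the fake stands in for the siteinfo statistics request.)

```javascript
// Poll `fetchValue` until `predicate` accepts the result or attempts run out.
async function waitForValue( fetchValue, predicate, { attempts = 5, delayMs = 200 } = {} ) {
	let last;
	for ( let i = 0; i < attempts; i++ ) {
		last = await fetchValue();
		if ( predicate( last ) ) {
			return last;
		}
		if ( i < attempts - 1 ) {
			// Give the deferred update a chance to land before reading again.
			await new Promise( ( resolve ) => setTimeout( resolve, delayMs ) );
		}
	}
	throw new Error( `Condition not met after ${ attempts } attempts; last value: ${ last }` );
}

// Hypothetical stand-in for the statistics request: the first read still
// reports the old count, later reads report the incremented one.
let reads = 0;
const fakeTotalEdits = async () => ( ++reads < 2 ? 178 : 179 );

waitForValue( fakeTotalEdits, ( edits ) => edits > 178, { delayMs: 10 } )
	.then( ( edits ) => console.log( 'edits after retry:', edits ) );
```

Polling hides the timing dependency rather than removing it, so it would not replace filing a task about the underlying behaviour.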