[00:12:29] (PS1) Thcipriani: feat: Add script to dump bundle as wikitext [tools/release] - https://gerrit.wikimedia.org/r/723012
[00:13:23] (CR) jerkins-bot: [V: -1] feat: Add script to dump bundle as wikitext [tools/release] - https://gerrit.wikimedia.org/r/723012 (owner: Thcipriani)
[00:14:56] (PS2) Thcipriani: feat: Add script to dump bundle as wikitext [tools/release] - https://gerrit.wikimedia.org/r/723012
[00:15:50] (CR) jerkins-bot: [V: -1] feat: Add script to dump bundle as wikitext [tools/release] - https://gerrit.wikimedia.org/r/723012 (owner: Thcipriani)
[00:16:34] grrr...won't let me use python3 exclusive syntax features
[00:16:35] fun
[00:17:41] (PS3) Thcipriani: feat: Add script to dump bundle as wikitext [tools/release] - https://gerrit.wikimedia.org/r/723012
[00:19:01] (CR) jerkins-bot: [V: -1] feat: Add script to dump bundle as wikitext [tools/release] - https://gerrit.wikimedia.org/r/723012 (owner: Thcipriani)
[00:20:47] (PS4) Thcipriani: feat: Add script to dump bundle as wikitext [tools/release] - https://gerrit.wikimedia.org/r/723012
[01:20:49] The ice in my ice tea has melted and I blame the MediaWiki-Docker dev-images jobrunner
[01:22:13] It enters a rather heavy and tight PHP bootup/shutdown loop, unthrottled, if the database is temporarily unavailable or if any part of core, vendor, or an extension can't initialise the maintenance script context for other reasons.
[01:58:43] that's because you forgot to install mediawiki-tea-frozen-ice package
[02:27:57] we should... probably improve that.
[02:38:45] Release-Engineering-Team (Yak Shaving 🐃🪒), MW-on-K8s, MediaWiki Train Development Environment, Release Pipeline, and 2 others: Train-dev: Ability to deploy to k8s - https://phabricator.wikimedia.org/T287993 (jeena) There is still a bit of work to do here, such as adding the mw-stable helm repo, c...
[02:39:08] Release-Engineering-Team (Yak Shaving 🐃🪒), MW-on-K8s, MediaWiki Train Development Environment, Release Pipeline, and 2 others: Train-dev: Ability to deploy to k8s - https://phabricator.wikimedia.org/T287993 (jeena)
[02:45:36] The previous dev setups often didn't have jobrunners churning in the background at all. mw runs 1 job post-send per page view which tends to work out fairly well (could be increased on dev setups).
[02:46:06] Looking at the git history I can correlate it with https://gerrit.wikimedia.org/r/c/releng/dev-images/+/597826/ T246942 T246935
[02:46:06] T246935: Job queue runners for MediaWiki-Docker - https://phabricator.wikimedia.org/T246935
[02:46:06] T246942: TimedMediaHandler's ffmpeg processes get stuck when using resource limits on Docker image - https://phabricator.wikimedia.org/T246942
[02:46:26] That's a fairly narrow edge case to have something like this in the background all the time, with a pretty hard to find sweet spot
[02:47:15] That is, the most ideal in terms of overhead would be to run it with --wait and maxtime=24h and e.g. a 1min sleep between iterations of entrypoint.sh for cases where it's failing or broken for some reason
[02:47:31] but the problem with that is that then your changes won't apply since it's a running process.
[02:48:27] the other thing is that with the default being sqlite, you really don't want a process in the background hitting the database in a constant loop
[02:48:38] this loop being the one inside JobRunner.php (not the shell script)
[02:49:11] the problem I ran into was in fact that the database was locked during an edit, and for some reason never unlocked.
[02:49:25] I had to restart docker-compose in order to get out of it.
[02:52:18] the cheap win in the short-term would be to add non-zero breathing room between jobrunner/entrypoint.sh iterations, and also to invoke it with something other than maxjobs=1.
the docs seem to recommend `maxjobs=20 --wait` which means it will keep polling and popping without spawning or booting up new php processes until 20 jobs have passed, then sleep 10 seconds, and then start afresh.
[02:52:19] https://www.mediawiki.org/wiki/Manual:Job_queue#Cron
[02:53:36] Right now it uses maxjobs=1 without a time bound which means it's not unlikely that my edit locally today will run code I had checked out last week, if no job ran since, and --wait is still waiting in that runJobs.php process for the next job.
[02:54:15] also, when we do make an edit, there's usually a couple jobs queued at once, so maxjobs=1 is definitely too low. no need to restart and respawn for every job.
[02:56:42] perhaps `--wait --maxtime=60` would be better, which means for 60 seconds the PHP code will poll db-select() queries for jobs and run any that it sees during that period, and then yields back to entrypoint.sh where we could wait 10s unconditionally, and then repeat.
[02:57:54] although really for anything other than the two people in the world debugging TMH, it might be better to just turn this off entirely and let the default handle it. That way it simply does nothing when you're just working on other stuff with the laptop, and when you interact with the wiki, post-send it runs jobs.
[03:00:38] the jobrunner could be commented out and be there to opt-in to without any config change, plus the above optimisation to avoid melting any more ice tea, plus the default wgJobRunRate. Afaik it's fine for those who use TMH to have both the default and the opt-in jobrunner enabled. At worst, one of the post-pageview job runs will timeout quickly and fail and be immediately retried by the background runner.
[04:18:54] Krinkle: The point of this is to more closely replicate production than systems that came before it, hence the split out of the job runner.
[05:01:38] James_F: I note that the node12 image updates npm from 6 to 7, but node14 goes back to npm 6. Unintentional?
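The jobrunner proposal from 02:56:42 (run `runJobs.php` with `--wait --maxtime=60`, then pause 10s unconditionally before the next iteration) can be sketched roughly as follows. This is only an illustration under assumptions: the real loop lives in the shell script jobrunner/entrypoint.sh, which is not shown in this log, so the Python function names (`build_run_jobs_command`, `run_loop`), the `maintenance/runJobs.php` relative path, and the `iterations` parameter are hypothetical stand-ins.

```python
import subprocess
import time
from typing import Callable, List, Optional


def build_run_jobs_command(max_time_s: int = 60, wait: bool = True) -> List[str]:
    """Build the runJobs.php invocation suggested in the log.

    The script path is an assumption (relative to a MediaWiki checkout).
    """
    cmd = ["php", "maintenance/runJobs.php", f"--maxtime={max_time_s}"]
    if wait:
        # --wait keeps the PHP process polling for new jobs until maxtime is hit.
        cmd.append("--wait")
    return cmd


def run_loop(
    run: Callable[[List[str]], int] = subprocess.call,
    sleep: Callable[[float], None] = time.sleep,
    pause_s: float = 10,
    iterations: Optional[int] = None,
) -> int:
    """Run the job runner repeatedly with a fixed pause between iterations.

    The unconditional pause is the "breathing room": even if PHP exits
    immediately (database down, broken extension), the loop cannot spin
    faster than once per pause_s seconds. iterations=None loops forever;
    a number is accepted only so the sketch can be exercised in a test.
    """
    done = 0
    while iterations is None or done < iterations:
        run(build_run_jobs_command())  # the exit code is deliberately ignored
        sleep(pause_s)
        done += 1
    return done
```

Because each iteration starts a fresh PHP process after at most 60 seconds, code checked out since the last run is picked up on the next iteration, which addresses the stale-process concern raised at 02:53:36.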
[05:02:03] I didn’t make the node 14 image. :-)
[05:02:24] It should use 7, I agree.
[05:03:32] ok, maybe we can adjust that before the fresh-node14 release
[05:08:54] I'm skipping the qemu test pipeline for the newer node flavours for now. All the logic we own is the same between them, and I'll just keep node10 for now both for the test and so as to smoothen the transition.
[05:09:14] In a future release we can make node14 the default, drop node10 etc.
[05:10:09] probably in a week or two
[06:12:37] Hello, is a Phabricator admin around? I need one for a confidential consultation.
[06:39:57] urbanecm: What sort of phab admin?
[06:41:57] Reedy: the one that can delete comments of others
[06:42:09] I can do that
[06:44:00] I'll PM
[07:43:08] Krinkle: er, which node14 image?
[09:51:35] Release-Engineering-Team (Done by Wed 06 Oct), Add-Link, Growth-Team, Release Pipeline (Blubber), ci-test-error (WMF-deployed Build Failure): hudson.remoting.ProxyException: org.wikimedia.integration.ExecutionContext$NameNotFoundException: no value boun... - https://phabricator.wikimedia.org/T291554
[10:52:30] Release-Engineering-Team (Done by Wed 06 Oct), Add-Link, Growth-Team, Release Pipeline (Blubber), ci-test-error (WMF-deployed Build Failure): hudson.remoting.ProxyException: org.wikimedia.integration.ExecutionContext$NameNotFoundException: no value boun...
- https://phabricator.wikimedia.org/T291554
[10:52:59] (CR) Hashar: [C: +2] "Let's give that a try, I will update the tox job next and retrigger whatever invokes shellcheck to verify ;)" [integration/config] - https://gerrit.wikimedia.org/r/721895 (owner: Ebernhardson)
[10:55:08] (Merged) jenkins-bot: dockerfile: [tox-buster] Install shellcheck from buster-backports and cascade [integration/config] - https://gerrit.wikimedia.org/r/721895 (owner: Ebernhardson)
[10:57:14] !log Building tox images for improved shellcheck package https://gerrit.wikimedia.org/r/721895
[10:57:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:02:39] (PS1) Hashar: jjb: update integration-config tox image for shellcheck [integration/config] - https://gerrit.wikimedia.org/r/723146
[11:12:31] (CR) Hashar: "check experimental" [integration/config] - https://gerrit.wikimedia.org/r/721889 (owner: Ebernhardson)
[11:13:04] (CR) Hashar: "I have deployed the job so we get the new shellcheck ;)" [integration/config] - https://gerrit.wikimedia.org/r/723146 (owner: Hashar)
[11:36:36] (CR) Hashar: [C: +2] jjb: update integration-config tox image for shellcheck [integration/config] - https://gerrit.wikimedia.org/r/723146 (owner: Hashar)
[11:38:38] (Merged) jenkins-bot: jjb: update integration-config tox image for shellcheck [integration/config] - https://gerrit.wikimedia.org/r/723146 (owner: Hashar)
[11:39:16] (PS4) Hashar: jjb: Pass shellcheck at severity=critical [integration/config] - https://gerrit.wikimedia.org/r/721889 (owner: Ebernhardson)
[11:39:19] (PS1) Hashar: zuul: promote shellcheck job for integration/config [integration/config] - https://gerrit.wikimedia.org/r/723160
[11:42:07] (CR) Hashar: [C: +2] "It will be passing with https://gerrit.wikimedia.org/r/c/integration/config/+/721889/" [integration/config] - https://gerrit.wikimedia.org/r/723160 (owner: Hashar)
[11:43:40]
Beta-Cluster-Infrastructure, Maps, Product-Infrastructure-Team-Backlog: Puppet config is broken for the maps instance on deployment-prep - https://phabricator.wikimedia.org/T291624 (Jgiannelos)
[11:43:57] (Merged) jenkins-bot: zuul: promote shellcheck job for integration/config [integration/config] - https://gerrit.wikimedia.org/r/723160 (owner: Hashar)
[11:44:40] !log Reloaded zuul for "zuul: promote shellcheck job for integration/config" https://gerrit.wikimedia.org/r/c/integration/config/+/723160/1
[11:44:42] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:45:13] (CR) Hashar: "recheck now that shellcheck job is in test/gate-and-submit" [integration/config] - https://gerrit.wikimedia.org/r/721889 (owner: Ebernhardson)
[12:02:32] (CR) Hashar: [C: +2] "Jobs updated and shellcheck is now enforced by CI \\o//" [integration/config] - https://gerrit.wikimedia.org/r/721889 (owner: Ebernhardson)
[12:02:46] ebernhardson: shellcheck stuff finally reviewed and deployed. Thank you!
[12:05:01] (Merged) jenkins-bot: jjb: Pass shellcheck at severity=critical [integration/config] - https://gerrit.wikimedia.org/r/721889 (owner: Ebernhardson)
[14:26:51] legoktm: ci node14-test-browser. https://gerrit.wikimedia.org/r/c/fresh/+/698271/8/bin/fresh-node14#173 https://gerrit.wikimedia.org/r/c/fresh/+/723027/1/CHANGELOG.md
[14:35:44] Beta-Cluster-Infrastructure: Upload cache not invalidated after purge - https://phabricator.wikimedia.org/T291643 (dom_walden)
[14:42:04] (PS2) Hashar: Docker: [tox-buster] Add several packages so we can scrap sub-images [integration/config] - https://gerrit.wikimedia.org/r/721884 (https://phabricator.wikimedia.org/T291292) (owner: Jforrester)
[14:42:27] (CR) Hashar: [C: +2] "Rebased and building it!"
[integration/config] - https://gerrit.wikimedia.org/r/721884 (https://phabricator.wikimedia.org/T291292) (owner: Jforrester)
[14:44:52] (Merged) jenkins-bot: Docker: [tox-buster] Add several packages so we can scrap sub-images [integration/config] - https://gerrit.wikimedia.org/r/721884 (https://phabricator.wikimedia.org/T291292) (owner: Jforrester)
[14:46:18] !log Building releng/tox-buster:0.5.0 image
[14:46:21] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:58:30] Beta-Cluster-Infrastructure, Patch-For-Review: Upload cache not invalidated after purge - https://phabricator.wikimedia.org/T291643 (Reedy) > I notice that [[https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/refs/heads/master/wmf-config/reverse-proxy-staging.php|reverse-proxy-s...
[15:07:48] (PS1) Reedy: Add api tests to FlaggedRevs [integration/config] - https://gerrit.wikimedia.org/r/723217
[15:10:13] (CR) Reedy: [C: +2] Add api tests to FlaggedRevs [integration/config] - https://gerrit.wikimedia.org/r/723217 (owner: Reedy)
[15:10:31] (CR) Lucas Werkmeister (WMDE): [C: +1] Add api tests to FlaggedRevs [integration/config] - https://gerrit.wikimedia.org/r/723217 (owner: Reedy)
[15:12:35] (Merged) jenkins-bot: Add api tests to FlaggedRevs [integration/config] - https://gerrit.wikimedia.org/r/723217 (owner: Reedy)
[15:13:17] (CR) Hashar: "The docker image went from 1.06 GB to 1.11 GB which sounds good." [integration/config] - https://gerrit.wikimedia.org/r/721884 (https://phabricator.wikimedia.org/T291292) (owner: Jforrester)
[15:13:35] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/723217
[15:13:40] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:13:41] legoktm: James_F: actually, nevermind regarding the npm mismatch. I thought it was a mistake, but I think it kinda makes sense.
node14 is based on nodejs.org packages which bundle npm 6 with node 14. npm 7 wasn't bundled until node 16
[15:13:51] But, the node12 images we have use Debian Bullseye packages directly
[15:14:03] Huh.
[15:14:04] and for some reason upstream debian decided to package npm 7 together with node 12
[15:14:19] and our node12 image is just plain debian, as will be for prod
[15:14:20] Yeah, Debian’s node packaging is always odd.
[15:14:30] although we won't use npm in prod so maybe it doesn't matter
[15:14:40] 🤷🏽‍♂️
[15:14:53] is CI mostly node14 now?
[15:15:02] Migrating everything in CI to npm 7 is a good outcome.
[15:15:13] No, nothing uses 14 yet.
[15:15:26] Maybe something for a quiet weekend.
[15:15:29] right, but not if CI and dev envs go back to using npm 6 after the node14 update
[15:15:40] maybe we should postpone that until we're on node 16 like upstream does
[15:16:02] or otherwise do something to make our node14 images use npm 7
[15:16:11] Or we could “fix” the node14 images.
[15:16:13] Yeah.
[15:17:35] I wonder what would happen if we install npm from apt-get in node14 dockerfile, and then leave out the last few untar steps (e.g. mv node-${NODE_VERSION}/lib/node_modules, and ln -s for npm)
[15:17:53] probably debian pulls in nodejs as dep for npm so that won't work
[15:18:12] indeed, and a million other debian packages
[15:18:24] because they decided to bundle each internal directory of npm-cli as a separate debian package
[15:18:29] https://packages.debian.org/stable/npm
[15:18:43] even debian can't escape the node dependency web
[15:18:58] debian Buster has nodejs 10 and npm 5.8.0 (but npm 7.4.0 via buster-backports)
[15:19:07] so depends on where we pick the packages I guess
[15:19:22] this is a bullseye image
[15:19:34] node12 installs nodejs/npm from debian (node 12, npm 7).
[15:19:44] node14 installs nodejs/npm from nodejs.org (node14, npm 6)
[15:19:47] the node14 image is provisioned from upstream iirc so it would come with whatever npm version is bundled with
[15:20:13] yeah, the problem is, once we upgrade to using node12 and npm 7, we can't go back to npm 6
[15:20:18] yup bullseye has node 12 / npm 7
[15:20:31] so we need to make the node14 image use npm7 as well.
[15:20:43] it's just a matter of how we install it.
[15:20:47] do we still have our mirror repo?
[15:21:05] https://github.com/wikimedia/integration-config/blob/master/dockerfiles/node10/Dockerfile.template#L9
[15:21:07] I am half thinking we might want to write a policy for nodejs/npm similar to the one we have for PHP at https://www.mediawiki.org/wiki/Support_policy_for_PHP
[15:21:25] we can use that like we did for node10, we can ship npm 7 there so that the version stays stable
[15:21:52] I mean mediawiki doesn't "support" it. It's just a few linters and unit test tools. It'd be like having a phpunit support policy.
[15:22:09] and we can go back, by regenerating the lock files.
[15:22:12] for mediawiki sure
[15:22:18] then we have the various nodejs based services
[15:22:22] it's just easier if we avoid it, so that people have a consistent dev environment experience.
[15:22:48] regardless, if npm 7 still supports node 10, I guess we can upgrade npm on CI indeed
[15:23:15] though I think James_F already upgraded most things to node 12
[15:23:20] no, I don't think we should change the node 10 image two weeks before removing it.
[15:23:31] this is just about node14
[15:23:46] we can upgrade our npm mirror to npm 7, and then use that for the node14 image
[15:25:15] or get npm 7 from the debian package?
[15:25:33] no, because we can't mix part-debian and part-nodejs.org tarball
[15:25:42] installing npm from debian will mess up the image completely
[15:25:46] The only user of node10 is Krinkle for fresh.
[15:26:19] I was so hoping to drop that integration/npm mirror :/
[15:26:42] Yes, aren’t we all?
[15:26:47] well, the alternative is to follow upstream node.js tarballs only and use it in both the node12 and node14 image.
[15:27:03] that means CI-node12 will be downgraded from npm 7 to npm 6, and we'll stay on that for node14 as well.
[15:27:19] This would match what people use locally if they use homebrew or nvm, or anything else really.
[15:28:13] that is until some development team starts requiring npm 7 features of some sort :\
[15:28:17] if not already
[15:28:23] npm 7 came with nodejs 16
[15:28:33] we were on npm 5 and nodejs 10 until a week ago
[15:28:46] AHHH
[15:29:06] I guess I am mixing it up with people starting to use npm 6 features so :D
[15:29:57] so get nodejs/npm from upstream to match the expectation one would have from their local install
[15:30:01] and stop using the debian package
[15:30:04] that sounds good for the CI images
[15:30:10] Right now we have: CI-node10 (debian buster node10 + our npm 5 mirror), CI-node12 (debian bullseye node12+npm7), and CI-node14 (nodejs.org tarball for node14 which includes npm6)
[15:30:21] The problem is that we can't have newer node go back to older npm.
[15:30:25] but not sure what we would want for the mediawiki/services
[15:30:27] Krinkle: No? Not a week ago. Months ago.
[15:30:37] I'll give you 1 month.
[15:31:32] I don't think anyone outside us three cares about the npm version. Historically, things have been least surprising when we don't come up with our own version combinations that differ from upstream node.js
[15:31:37] Regressing from npm7 feels like a mistake.
[15:31:55] The new version of package-lock makes for dirty CI rubs.
[15:31:59] Runs, even.
[15:32:24] lockversion2 is compat with npm 6 afaik
[15:33:05] I think it’s forwards compat but not back.
[15:33:17] npm 6 will install npm7's lock version2 file.
[15:33:31] tested, ensured and blogged/bragged about upstream.
[15:33:56] Hmm.
[15:34:47] I'm not saying I prefer to stay on the older npm, but compat wouldn't be an issue, and it is what matches upstream packaging, and means we don't have to keep the npm.git mirror.
[15:36:29] if we want to keep the happy debian accident of npm 7, we'd update our mirror and use that, and then when node16 rolls around we'd try to remember to not install debian packages for node/npm in CI and thus never use prod packaging for nodejs. That doesn't feel great either. But the only other way we have experience with that installs an npm version of choice on top of a debian nodejs package is by using our npm.git mirror.
[15:37:39] I kinda like the idea of the CI node images just installing nodejs tarballs. Won't match prod 100% but would be a lot easier to maintain potentially and side-steps the whole debian/npm problem.
[15:38:16] but we could start that from node16 forward as otherwise we'd downgrade npm (since tarball for node12 doesn't come with npm7)
[15:39:18] or we could do it straight away, if we downgrade node12 images to npm6 implicitly by switching to node14's pattern of installing tarballs
[15:39:49] When you say “CI images” you mean the CI-only ones and not the service images, just to clarify? Or both?
[15:41:21] the SRE maintained node12-devel image relies on the debian package and I expect services to be built on top of that one
[15:41:43] whereas the "CI images" would be the one defined in integration/config and pushed to "/releng" and used for linters
[15:42:54] I mean the integration docker images published under releng indeed.
[15:43:35] I am tempted to just use the upstream tarball for sake of simplicity
[15:43:36] Assuming the SRE/devel ones only use pure debian, I guess that has its own set of issues, but at least it's somewhat consistent (albeit delayed as it can't get node14 for a while naturally).
[15:44:31] operations/docker-images/production-images : images/nodejs12-devel/Dockerfile.template:RUN {{ "nodejs npm" | apt_install }}
[15:44:43] so debian package ( same for nodejs10-devel )
[15:52:52] James_F: Krinkle: so I guess stick to npmjs tarball and thus to npm 6?
[15:54:14] !log gitlab1001: brief downtime to apply [[gerrit:714382|gitlab cas: uid instead of CN; add nickname_key]] for T288392
[15:54:17] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:54:18] T288392: GitLab uses 'real name' as username (rather than 'shell name' or an user-specified name) - https://phabricator.wikimedia.org/T288392
[15:58:03] (PS1) Jforrester: Zuul: Create 'extension-apitests' template for simplicity [integration/config] - https://gerrit.wikimedia.org/r/723224
[15:58:35] I'm not sure I agree with this plan.
[16:03:02] The ideal would've been debian not shipping an out of channel npm version, and us getting free drinks on a nearby beach of our choosing.
[16:04:37] second-best, might perhaps be then, to temporarily continue our npm mirror at version 7, and use it until we switch to pure upstream for node16+, assuming we don't want to differ from upstream long-term, just that we want to hold on to the upgrade we already got this time, right?
[16:05:35] * addshore plays a sad trombone and points at gitlab
[16:05:51] oh, 500s are gone for me now!
[16:06:19] addshore: brennen was doing a fix of some sort. I guess it got restarted
[16:06:26] aaah cool! :)
[16:07:37] (PS1) Krinkle: Install 7.5.2 [integration/npm] - https://gerrit.wikimedia.org/r/723226 (https://phabricator.wikimedia.org/T267888)
[16:08:07] addshore: https://sal.toolforge.org/log/MHNdE3wB1jz_IcWughYB
[16:08:28] oooh, is my name going to change *looks*
[16:09:37] alas, it didn't work.
[16:09:41] reverting that change.
[16:10:37] PROBLEM - SSH on contint2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:12:06] addshore: should be back up in ~30s.
[16:12:14] (CR) Hashar: [C: +1] "This way we get a consistent npm version across nodejs images at the expense of having to keep maintaining this mirror. But that seems fi" [integration/npm] - https://gerrit.wikimedia.org/r/723226 (https://phabricator.wikimedia.org/T267888) (owner: Krinkle)
[16:12:34] James_F: mirroring of our npm 7 ^
[16:23:43] Release-Engineering-Team (Doing), GitLab (Auth & Access), Patch-For-Review, Privacy, User-brennen: GitLab uses 'real name' as username (rather than 'shell name' or an user-specified name) - https://phabricator.wikimedia.org/T288392 (brennen) No love on this one on gitlab1001. I get a 422 wit...
[16:25:35] brennen: where does it use realname from? I appear as "Addshore" afaik
[16:27:11] oh, not realname from ldap or something, but just the username of the dev / wikitech account?
[16:27:25] addshore: your CN in LDAP is most likely Addshore.
[16:27:58] (CN is what gets used currently.)
[16:31:15] Mine is similar :D
[16:38:20] (CR) Krinkle: [C: +2] "I'm landing this in order to test the image end to end. Final decision can be made in the integration/config repo."
[integration/npm] - https://gerrit.wikimedia.org/r/723226 (https://phabricator.wikimedia.org/T267888) (owner: Krinkle)
[16:38:40] (CR) Krinkle: [V: +2 C: +2] Install 7.5.2 [integration/npm] - https://gerrit.wikimedia.org/r/723226 (https://phabricator.wikimedia.org/T267888) (owner: Krinkle)
[16:48:04] (PS1) Krinkle: node14: Update from bundled npm 6 to pinned npm 7 [integration/config] - https://gerrit.wikimedia.org/r/723256 (https://phabricator.wikimedia.org/T267888)
[16:55:18] I'm applying the -test and -test-browser updates as well locally, but the non-trivial part is up for review ^ cc hashar James_F
[16:56:16] (PS1) Jeena Huneidi: build-mw-image-loop: pull before pushing charts [tools/train-dev] - https://gerrit.wikimedia.org/r/723266
[17:04:49] Release-Engineering-Team (Done by Wed 06 Oct), Add-Link, Growth-Team, Release Pipeline (Blubber), ci-test-error (WMF-deployed Build Failure): hudson.remoting.ProxyException: org.wikimedia.integration.ExecutionContext$NameNotFoundException: no value boun...
- https://phabricator.wikimedia.org/T291554
[17:08:00] (PS2) Jeena Huneidi: build-mw-image-loop: pull before pushing charts [tools/train-dev] - https://gerrit.wikimedia.org/r/723266
[17:11:12] (CR) Ahmon Dancy: [V: +2 C: +2] mirror-repos.sh: Add two more repos to mirror [tools/train-dev] - https://gerrit.wikimedia.org/r/722976 (owner: Ahmon Dancy)
[17:11:26] (PS1) Ahmon Dancy: WIP: Access train-dev git server instead of gerrit [tools/train-dev] - https://gerrit.wikimedia.org/r/723267
[17:11:37] RECOVERY - SSH on contint2001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:14:04] Release-Engineering-Team (Seen), GitLab (CI & Job Runners), User-brennen: Document long-term requirements for GitLab job runners - https://phabricator.wikimedia.org/T286958 (wkandek) [[ https://people.wikimedia.org/~oblivian/ci/ci-threat.pdf | Separation into untrusted and trusted environments that h...
[17:19:44] (CR) Jeena Huneidi: [C: +1] "It's working for me" [tools/release] - https://gerrit.wikimedia.org/r/722979 (owner: Ahmon Dancy)
[17:24:53] (CR) Ahmon Dancy: Make auto-stage more responsive to interruption (1 comment) [tools/release] - https://gerrit.wikimedia.org/r/722979 (owner: Ahmon Dancy)
[17:25:57] (PS3) Ahmon Dancy: Make auto-stage more responsive to interruption [tools/release] - https://gerrit.wikimedia.org/r/722979
[17:26:22] (CR) Ahmon Dancy: Make auto-stage more responsive to interruption (1 comment) [tools/release] - https://gerrit.wikimedia.org/r/722979 (owner: Ahmon Dancy)
[17:34:22] (CR) Ahmon Dancy: [V: +2 C: +2] "Works for me."
[tools/train-dev] - https://gerrit.wikimedia.org/r/723266 (owner: Jeena Huneidi)
[17:47:59] (PS2) Krinkle: node14: Update from bundled npm 6 to pinned npm 7 [integration/config] - https://gerrit.wikimedia.org/r/723256 (https://phabricator.wikimedia.org/T267888)
[17:49:44] Krinkle: in case you're wondering why beta cluster goes read only so much https://phabricator.wikimedia.org/T277862#7374919
[17:50:06] (breaking tests depending on beta cluster)
[17:51:16] the third one is RL. I will look into it.
[17:51:38] RL runs during update.php?
[17:54:41] this is all write queries whether during update.php or outside of it
[17:55:02] one thing can be that update.php flushes all caches for RL
[18:00:21] Krinkle: yup, DatabaseUpdater::purgeCache() has `$this->db->delete( 'module_deps', '*', __METHOD__ );`
[18:08:58] Amir1: sure, it clears the cache, but I mean the module dep store code isn't run, right?
[18:09:07] (and it has to clear the cache)
[18:09:53] yup but I assume automated browser tests trigger an insert there
[18:11:26] kind of. after a deploy, the first few page views will make load.php fetches that lazy compile LESS files and then we insert post-send the files we discovered that weren't tracked in module_deps
[18:11:35] which then makes the version hash correct after that
[18:12:03] Yup, primary db writes on GET. https://phabricator.wikimedia.org/T113916
[18:12:08] it's being worked on
[18:12:21] it's blocked under a mountain of stuff
[18:12:43] will get unblocked as part of multi-dc work starting next month in Q2
[18:13:45] but in terms of db traffic, it is expected.
probably the same for memcached backend and objectcache table
[18:13:55] but we probably silence those or track them elsewhere
[18:14:37] we could potentially run purgeModuleDeps.php instead within the db updater which would preserve more of the cache between deployments
[18:14:47] but I don't know if it's worth optimising for
[18:15:00] I assume the db writes aren't causing a problem in beta right?
[18:15:43] they are debounced with a non-blocking lock that degrades gracefully and doesn't lock from or to anything else outside the RL postsend updates
[18:17:17] > I assume the db writes aren't causing a problem in beta right?
[18:17:40] Generally the writes on beta cluster are heavy and I see a lot of read-only errors there
[18:19:18] so I want to reduce it and this would help but the first two seem more important atm
[18:19:45] the first one doing eight times more write queries than RL module dependency
[18:27:07] Release-Engineering-Team (Doing), GitLab (Auth & Access), Patch-For-Review, Privacy, User-brennen: GitLab uses 'real name' as username (rather than 'shell name' or an user-specified name) - https://phabricator.wikimedia.org/T288392 (brennen) Googling on that error again led to this issue: [[...
[18:37:12] (PS2) Jforrester: Zuul: Create 'extension-apitests' template for simplicity [integration/config] - https://gerrit.wikimedia.org/r/723224
[18:39:53] (CR) Jforrester: [C: +2] Zuul: Create 'extension-apitests' template for simplicity [integration/config] - https://gerrit.wikimedia.org/r/723224 (owner: Jforrester)
[18:41:45] (Merged) jenkins-bot: Zuul: Create 'extension-apitests' template for simplicity [integration/config] - https://gerrit.wikimedia.org/r/723224 (owner: Jforrester)
[18:41:52] !log Zuul: Create 'extension-apitests' template for simplicity
[18:41:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:45:13] Release-Engineering-Team (Doing), GitLab (Auth & Access), Patch-For-Review, Privacy, User-brennen: GitLab uses 'real name' as username (rather than 'shell name' or an user-specified name) - https://phabricator.wikimedia.org/T288392 (brennen) It also looks like `extern_uid` can be modified thr...
[19:26:14] hashar: does it seem normal for `INFO:backend.MySQL:Terminating MySQL` to take ~2 minutes? looking at https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php72-docker/114519/console
[19:55:48] !log gitlab-ansible-test: sudo gitlab-ctl cleanse to drop test data and reset
[19:55:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:56:46] kostajh: yes I had a bug filed about it from October 2020 iirc
[19:57:38] which was related to IO throttling or Docker poor performance due to syscall being passed through a filter
[19:57:52] I think I had it fixed, I guess it is reappearing :-\
[19:58:55] https://phabricator.wikimedia.org/T265615 "Terminating MySQL takes several minutes in (Wikibase?)
CI jobs"
[20:00:54] so yeah disk io being throttled
[20:01:20] then MySQL data are supposedly on a tmpfs and thus should not be throttled
[20:01:44] Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (2020-10-01 to 2020-12-31 (Q2)), Quibble, and 2 others: Terminating MySQL takes several minutes in (Wikibase?) CI jobs - https://phabricator.wikimedia.org/T265615 (hashar) Res...
[20:01:52] or maybe that is set at the Qemu level and does impact io made to a tmpfs
[20:02:01] no idea, that got to be investigated ;)
[20:08:24] hashar: gotcha, yeah I remember that task now. thanks!
[20:08:50] but tmpfs should not be affected
[20:09:02] so who knows :-( it is going to be a long and tedious debug session I guess
[20:10:44] Release-Engineering-Team (Seen), Data³, Quality-and-Test-Engineering-Team (QTE), User-zeljkofilipin: Release Engineering Data Collection and Retention (aka Data³) - https://phabricator.wikimedia.org/T216085 (thcipriani)
[20:13:04] anyway I should sleep. *wave*
[20:15:52] Night
[21:37:20] Phabricator: give visibility for "in progress" tasks on a work board - https://phabricator.wikimedia.org/T291593 (mmodell) a:mmodell
[21:37:54] Phabricator: give visibility for "in progress" tasks on a work board - https://phabricator.wikimedia.org/T291593 (mmodell)
[21:37:56] Phabricator: Evaluate adding "In progress" status to Phabricator.
- https://phabricator.wikimedia.org/T288956 (mmodell)
[21:54:29] hehe, i am on pipeline #575 on gitlab already D:
[21:58:49] spammer
[22:03:26] (PS1) Ahmon Dancy: auto-stage: Be resilient to changes in the origin [tools/release] - https://gerrit.wikimedia.org/r/723318
[22:04:19] (PS2) Ahmon Dancy: auto-stage: Be resilient to changes in the origin [tools/release] - https://gerrit.wikimedia.org/r/723318
[22:06:40] Release-Engineering-Team (Doing), GitLab (Auth & Access), Patch-For-Review, Privacy, User-brennen: GitLab uses 'real name' as username (rather than 'shell name' or an user-specified name) - https://phabricator.wikimedia.org/T288392 (brennen) Seems like this works. Applied production config,...
[22:07:47] (CR) Ahmon Dancy: [C: +2] auto-stage: Be resilient to changes in the origin [tools/release] - https://gerrit.wikimedia.org/r/723318 (owner: Ahmon Dancy)
[22:08:53] (Merged) jenkins-bot: auto-stage: Be resilient to changes in the origin [tools/release] - https://gerrit.wikimedia.org/r/723318 (owner: Ahmon Dancy)
[22:15:06] (PS2) Ahmon Dancy: Access train-dev git server instead of gerrit [tools/train-dev] - https://gerrit.wikimedia.org/r/723267
[22:44:37] Release-Engineering-Team (Doing), Release, Train Deployments: 1.38.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T281166 (dduvall)
[22:44:40] Release-Engineering-Team (Doing), Release, Train Deployments: 1.38.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T281165 (dduvall)
[22:44:49] Release-Engineering-Team (Doing), Release, Train Deployments: 1.38.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T281165 (dduvall) Open→Resolved
[22:45:57] Release-Engineering-Team (Doing), Release, Train Deployments: 1.38.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T281166 (DannyS712)
[22:58:05] (CR) Jeena Huneidi: [C: +2] "Sorry I thought this merged already"
[tools/release] - https://gerrit.wikimedia.org/r/722979 (owner: Ahmon Dancy)
[22:59:09] (PS4) Jeena Huneidi: Make auto-stage more responsive to interruption [tools/release] - https://gerrit.wikimedia.org/r/722979 (owner: Ahmon Dancy)
[23:44:27] Release-Engineering-Team, MediaWiki-Core-Tests, MediaWiki-Docker: "npm test" takes a long time after having used MediaWiki-Docker - https://phabricator.wikimedia.org/T291674 (Krinkle)
[23:54:26] Release-Engineering-Team (Doing), Release, Train Deployments: 1.38.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T281166 (dduvall)
[23:54:29] Release-Engineering-Team (Doing), Release, Train Deployments: 1.38.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T281165 (dduvall)