[00:39:43] PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:40:59] RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:46:29] Project beta-scap-sync-world build #35662: FAILURE in 2 min 9 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/35662/
[01:56:09] Project beta-scap-sync-world build #35663: STILL FAILING in 1 min 45 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/35663/
[02:13:09] Project beta-scap-sync-world build #35664: STILL FAILING in 8 min 46 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/35664/
[02:15:41] Yippee, build fixed!
[02:15:41] Project beta-scap-sync-world build #35665: FIXED in 1 min 17 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/35665/
[02:36:14] Release-Engineering-Team (Next), Release, Train Deployments: 1.38.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T293959 (Jdlrobson)
[02:36:27] Release-Engineering-Team (Next), Release, Train Deployments: 1.38.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T293959 (Jdlrobson)
[02:39:40] Release-Engineering-Team (Next), Release, Train Deployments: 1.38.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T293959 (Jdlrobson)
[02:44:12] Release-Engineering-Team (Next), Release, Train Deployments: 1.38.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T293959 (Jdlrobson) To the train deployer, (to explain all the above) Changes on T289619 introduced a regression (T299352). I replicated the error early on Friday and p...
[08:00:07] kostajh: I am going to switch the jobs to Quibble 1.3.0
[08:01:01] (CR) Hashar: [C: +2] jjb: switch Quibble jobs to 1.3.0 (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/754567 (owner: Hashar)
[08:02:06] !log Updating Jenkins jobs for Quibble 1.3.0
[08:02:07] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[08:02:54] (Merged) jenkins-bot: jjb: switch Quibble jobs to 1.3.0 [integration/config] - https://gerrit.wikimedia.org/r/754567 (owner: Hashar)
[08:04:35] hashar: +1
[08:07:54] !log Updating Jenkins jobs for Quibble to pass `--parallel-npm-install` https://gerrit.wikimedia.org/r/c/integration/config/+/754569
[08:07:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[08:08:10] kostajh: and we now have --parallel-npm-install
[08:08:14] well almost
[08:10:55] heh
[08:11:15] did I mess up the jjb config?
[08:17:10] kostajh: no it is working fine
[08:17:23] it is just that I have said "we now" while the jobs were still being deployed
[08:17:31] they are all updated now
[08:17:41] ah, ok
[08:18:02] but https://gerrit.wikimedia.org/r/c/integration/config/+/754569 is not merged?
[08:18:20] (CR) Hashar: [C: +2] "Deployed" [integration/config] - https://gerrit.wikimedia.org/r/754569 (owner: Kosta Harlan)
[08:18:27] yeah too many tasks in //
[08:18:28] :D
[08:19:43] :)
[08:20:16] (Merged) jenkins-bot: jjb: Pass --parallel-npm-install to Selenium jobs [integration/config] - https://gerrit.wikimedia.org/r/754569 (owner: Kosta Harlan)
[08:20:20] * kostajh watches https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-selenium-docker/96963/console
[08:44:10] (PS1) Kosta Harlan: BrowserTests: Rework npm parallel install using ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/754866
[08:46:47] (CR) Kosta Harlan: Parallelism as a command object (1 comment) [integration/quibble] - https://gerrit.wikimedia.org/r/587885 (https://phabricator.wikimedia.org/T235449) (owner: Awight)
[08:47:57] (CR) jerkins-bot: [V: -1] BrowserTests: Rework npm parallel install using ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/754866 (owner: Kosta Harlan)
[08:50:09] hashar: looks like it worked, and was a little faster, though we'd have to look at averages over time
[08:50:52] kostajh: on top of that I am probably going to rebuild all agents soon with higher IO which would help a bit as well
[08:50:57] (PS2) Kosta Harlan: BrowserTests: Rework npm parallel install using ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/754866
[08:51:05] looking at https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/754548/2#message-a1057fdca2d063467e800bc89790c7cce4910c48, one build had no change, the other was a few minutes faster, but there's so many other factors...
[09:52:57] (Restored) Kosta Harlan: [DNM] Add more dependencies to full run to check build time [integration/quibble] - https://gerrit.wikimedia.org/r/738066 (owner: Kosta Harlan)
[09:53:04] (PS5) Kosta Harlan: [DNM] Add more dependencies to full run to check build time [integration/quibble] - https://gerrit.wikimedia.org/r/738066
[09:53:38] (PS6) Kosta Harlan: [DNM] Add more dependencies to full run to check build time [integration/quibble] - https://gerrit.wikimedia.org/r/738066
[10:00:16] (PS7) Kosta Harlan: [DNM] Add more dependencies to full run to check build time [integration/quibble] - https://gerrit.wikimedia.org/r/738066
[10:03:30] (CR) jerkins-bot: [V: -1] [DNM] Add more dependencies to full run to check build time [integration/quibble] - https://gerrit.wikimedia.org/r/738066 (owner: Kosta Harlan)
[10:07:42] (PS8) Kosta Harlan: [DNM] Add more dependencies to full run to check ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/738066
[10:13:03] (CR) jerkins-bot: [V: -1] [DNM] Add more dependencies to full run to check ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/738066 (owner: Kosta Harlan)
[10:21:31] (PS19) Kosta Harlan: Split extension and skin npm and composer tests [integration/quibble] - https://gerrit.wikimedia.org/r/587888 (owner: Awight)
[10:21:34] (PS3) Kosta Harlan: BrowserTests: Rework npm parallel install using ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/754866
[10:21:37] (PS9) Kosta Harlan: [DNM] Add more dependencies to full run to check ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/738066
[10:21:40] (PS1) Kosta Harlan: ParallelCommand: Fallback to default of two workers [integration/quibble] - https://gerrit.wikimedia.org/r/754873
[10:23:01] (CR) Kosta Harlan: Parallelism as a command object (1 comment) [integration/quibble] - https://gerrit.wikimedia.org/r/587885 (https://phabricator.wikimedia.org/T235449) (owner: Awight)
[10:27:30] (CR) jerkins-bot: [V: -1] [DNM] Add more dependencies to full run to check ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/738066 (owner: Kosta Harlan)
[10:31:26] (PS2) Kosta Harlan: ParallelCommand: Fallback to default of two workers [integration/quibble] - https://gerrit.wikimedia.org/r/754873
[10:31:32] (PS20) Kosta Harlan: Split extension and skin npm and composer tests [integration/quibble] - https://gerrit.wikimedia.org/r/587888 (owner: Awight)
[10:31:38] (PS4) Kosta Harlan: BrowserTests: Rework npm parallel install using ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/754866
[10:31:44] (PS10) Kosta Harlan: [DNM] Add more dependencies to full run to check ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/738066
[10:37:33] (CR) jerkins-bot: [V: -1] [DNM] Add more dependencies to full run to check ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/738066 (owner: Kosta Harlan)
[10:42:57] (CR) Kosta Harlan: "I'm not sure why this doesn't work. Locally I see e.g." [integration/quibble] - https://gerrit.wikimedia.org/r/738066 (owner: Kosta Harlan)
[10:52:06] (CR) Kosta Harlan: "I still get errors when running the unit tests on macOS, though: https://gitlab.wikimedia.org/-/snippets/12" [integration/quibble] - https://gerrit.wikimedia.org/r/754873 (owner: Kosta Harlan)
[11:12:15] (PS1) Giuseppe Lavagetto: stage-srv-mediawiki: also remove noc.w.org files [tools/release] - https://gerrit.wikimedia.org/r/754889
[11:19:54] Hey! I'm hearing from colleagues at WMDE that CI might be broken. At a glance it looks like the new quibble containers might be missing `ext-xmlwriter`
[11:20:16] See, for example, https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php73-noselenium-docker/45876/console
[11:30:41] yup, I got this error from four jobs in a single gate-and-submit build
[11:30:44] https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php73-noselenium-docker/45876/console
[11:30:46] https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php74-noselenium-docker/44397/console
[11:30:48] https://integration.wikimedia.org/ci/job/wikibase-client-docker/26479/console
[11:30:53] https://integration.wikimedia.org/ci/job/wikibase-repo-docker/26475/console
[11:31:17] the client/repo jobs also complain about other missing extensions: dom, intl, mbstring, xml, xmlreader
[11:32:04] hashar: kostajh: ^^
[11:35:21] tarrow: Lucas_WMDE checking
[11:35:21] I’ll create a phab task
[11:35:27] +1 :]
[11:35:37] cheers!
[11:36:10] I wonder why it refers to /etc/php/8.1 when the build is for 7.3
[11:36:21] and Composer is operating significantly slower than normal because you do not have the PHP curl extension enabled.
[11:36:25] which does not seem right
[11:36:29] I will roll back to previous images
[11:38:23] hashar: https://phabricator.wikimedia.org/T299389
[11:38:26] excellent
[11:38:28] and thanks!
[11:38:32] the jobs are being rolled back
[11:38:44] and I guess if one looks at docker-registry.wikimedia.org/releng/quibble-buster-php73:1.3.0 there will be some funky files under /etc/php
[11:39:04] I have upgraded the jobs to quibble 1.3.0 this morning with new images I built yesterday
[11:39:44] !log Rolling back Quibble 1.3.0 jobs due to php configuration files with at least releng/quibble-buster73:1.3.0 # T299389
[11:39:46] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:39:46] T299389: Wikibase CI broken due to missing PHP extensions: dom, intl, mbstring, xml, xmlreader, xmlwriter - https://phabricator.wikimedia.org/T299389
[11:40:09] 737 jobs being updated
[11:40:23] I will send the revert commits once it is done
[11:40:56] kostajh: looks like the sury.org php7.3 package is broken causing docker-registry.wikimedia.org/releng/quibble-buster-php73:1.3.0 to lack php config files :D
[11:41:03] I will dig into it this afternoon I guess
[11:41:59] ouch
[11:43:19] :/
[11:44:39] Continuous-Integration-Infrastructure, Wikidata, ci-test-error (WMF-deployed Build Failure): Wikibase CI broken due to missing PHP extensions: dom, intl, mbstring, xml, xmlreader, xmlwriter - https://phabricator.wikimedia.org/T299389 (hashar) p:Triage→High I have upgraded the Jenkins jobs thi...
[11:45:24] (PS1) Hashar: Revert "jjb: Pass --parallel-npm-install to Selenium jobs" [integration/config] - https://gerrit.wikimedia.org/r/754895 (https://phabricator.wikimedia.org/T299389)
[11:45:26] (PS1) Hashar: Revert "jjb: switch Quibble jobs to 1.3.0" [integration/config] - https://gerrit.wikimedia.org/r/754896 (https://phabricator.wikimedia.org/T299389)
[11:45:42] Lucas_WMDE: tarrow: taavi: should be fixed now!
[11:46:27] thanks, I’ll retry my gate-and-submit
[11:46:36] I will have to inspect the various releng/quibble*:1.3.0 images
[11:47:34] if only there was a way to inspect the tree of layers ..
[11:47:34] wooo!
[11:47:46] Maybe time for CI for the CI?
[11:48:07] yeah potentially
[11:48:57] Did you want something like this? https://github.com/wagoodman/dive
[11:49:34] docker had an option to show the tree of layers but it got removed for some reason :/
[11:49:41] so yeah something like that would do :]
[11:53:10] :)
[11:58:18] hashar: yikes, sorry.
[12:04:56] Continuous-Integration-Infrastructure, Wikidata, Patch-For-Review, ci-test-error (WMF-deployed Build Failure): Wikibase CI broken due to missing PHP extensions: dom, intl, mbstring, xml, xmlreader, xmlwriter - https://phabricator.wikimedia.org/T299389 (hashar) From `apt list`: | php-apcu | 5.1.21...
[12:05:27] kostajh: not your fault :]
[12:05:36] it is definitely an issue with the php7.3 packages provided by sury.org
[12:05:50] php-xdebug and php-apcu are made to depend on the 8.1 versions :/
[12:08:37] Continuous-Integration-Infrastructure, Wikidata, Patch-For-Review, ci-test-error (WMF-deployed Build Failure): Wikibase CI broken due to missing PHP extensions: dom, intl, mbstring, xml, xmlreader, xmlwriter - https://phabricator.wikimedia.org/T299389 (hashar)
[12:10:18] Continuous-Integration-Infrastructure, Wikidata, Patch-For-Review, ci-test-error (WMF-deployed Build Failure): Wikibase CI broken due to missing PHP extensions: dom, intl, mbstring, xml, xmlreader, xmlwriter - https://phabricator.wikimedia.org/T299389 (kostajh) Can/should we use pecl to install a...
[12:23:37] (CR) Hashar: [C: +2] "deployed it a few minutes ago" [integration/config] - https://gerrit.wikimedia.org/r/754895 (https://phabricator.wikimedia.org/T299389) (owner: Hashar)
[12:23:40] (CR) Hashar: "deployed it a few minutes ago" [integration/config] - https://gerrit.wikimedia.org/r/754896 (https://phabricator.wikimedia.org/T299389) (owner: Hashar)
[12:25:26] (Merged) jenkins-bot: Revert "jjb: Pass --parallel-npm-install to Selenium jobs" [integration/config] - https://gerrit.wikimedia.org/r/754895 (https://phabricator.wikimedia.org/T299389) (owner: Hashar)
[12:25:56] Thanks for the speedy fix <3
[12:41:32] (CR) Hashar: [C: +2] "deployed" [integration/config] - https://gerrit.wikimedia.org/r/754896 (https://phabricator.wikimedia.org/T299389) (owner: Hashar)
[12:43:24] (PS1) Hashar: dockerfiles: quibble images to use soname xdebug/apcu [integration/config] - https://gerrit.wikimedia.org/r/754905 (https://phabricator.wikimedia.org/T754896)
[12:43:26] (Merged) jenkins-bot: Revert "jjb: switch Quibble jobs to 1.3.0" [integration/config] - https://gerrit.wikimedia.org/r/754896 (https://phabricator.wikimedia.org/T299389) (owner: Hashar)
[12:45:02] Continuous-Integration-Infrastructure, Patch-For-Review: Provide buster-based Ruby CI jobs - https://phabricator.wikimedia.org/T280874 (Nikerabbit) @Jdforrester-WMF Why do you say translatewiki-rake-docker is unused? I just noticed we haven't been running rake tests for translatewiki.net repo for soon a...
[12:45:35] Continuous-Integration-Infrastructure, Wikidata, Patch-For-Review, ci-test-error (WMF-deployed Build Failure): Wikibase CI broken due to missing PHP extensions: dom, intl, mbstring, xml, xmlreader, xmlwriter - https://phabricator.wikimedia.org/T299389 (hashar) >>! In T299389#7627900, @kostajh wro...
[12:46:31] kostajh: pecl is surely appealing, but I would rather not have to deal with the compilation mess when building images :]
[12:47:29] (CR) Hashar: [C: +2] "checked manually locally" [integration/config] - https://gerrit.wikimedia.org/r/754905 (https://phabricator.wikimedia.org/T754896) (owner: Hashar)
[12:47:36] Continuous-Integration-Config, translatewiki.net: Upgrade bundler to 2.x in rake-docker jobs - https://phabricator.wikimedia.org/T243280 (Nikerabbit) Open→Invalid translatewiki-rake-docker is not currently being run: https://phabricator.wikimedia.org/T280874#7628017
[12:49:50] (Merged) jenkins-bot: dockerfiles: quibble images to use soname xdebug/apcu [integration/config] - https://gerrit.wikimedia.org/r/754905 (https://phabricator.wikimedia.org/T754896) (owner: Hashar)
[12:51:13] (CR) Hashar: "That is for T299389" [integration/config] - https://gerrit.wikimedia.org/r/754905 (https://phabricator.wikimedia.org/T754896) (owner: Hashar)
[12:51:41] (CR) Awight: [C: +1] ParallelCommand: Fallback to default of two workers (2 comments) [integration/quibble] - https://gerrit.wikimedia.org/r/754873 (owner: Kosta Harlan)
[12:53:35] (CR) Awight: Parallelism as a command object (1 comment) [integration/quibble] - https://gerrit.wikimedia.org/r/587885 (https://phabricator.wikimedia.org/T235449) (owner: Awight)
[12:55:54] (CR) Awight: Parallelism as a command object (1 comment) [integration/quibble] - https://gerrit.wikimedia.org/r/587885 (https://phabricator.wikimedia.org/T235449) (owner: Awight)
[13:00:38] (PS51) Awight: Parallelism as a command object [integration/quibble] - https://gerrit.wikimedia.org/r/587885 (https://phabricator.wikimedia.org/T235449)
[13:01:43] (CR) Kosta Harlan: [C: +1] Parallelism as a command object [integration/quibble] - https://gerrit.wikimedia.org/r/587885 (https://phabricator.wikimedia.org/T235449) (owner: Awight)
[13:02:09] (CR) Awight: "PS 51: fall back to serial execution if cpu count is unavailable. Use more stable os.cpu_count API." [integration/quibble] - https://gerrit.wikimedia.org/r/587885 (https://phabricator.wikimedia.org/T235449) (owner: Awight)
[13:19:04] (CR) Hashar: "Successfully published image docker-registry.discovery.wmnet/releng/quibble-buster-php81:1.3.0-s1" [integration/config] - https://gerrit.wikimedia.org/r/754905 (https://phabricator.wikimedia.org/T754896) (owner: Hashar)
[13:24:58] (PS1) Hashar: jjb: switch Quibble jobs to 1.3.0 (take 2) [integration/config] - https://gerrit.wikimedia.org/r/754932 (https://phabricator.wikimedia.org/T299389)
[13:25:25] (PS1) Hashar: jjb: Pass --parallel-npm-install to Selenium jobs (take 2) [integration/config] - https://gerrit.wikimedia.org/r/754934
[13:46:00] Phabricator (Upstream), PHP 8.0 support: Wrap get_magic_quotes_gpc check in PhabricatorStartup.php (removed in PHP 8.0) - https://phabricator.wikimedia.org/T299399 (Aklapper)
[13:47:35] Phabricator, Security-Team, Security: Audit members of acl*security for more than x duration of no activity (Jan 2022) - https://phabricator.wikimedia.org/T299400 (Aklapper)
[13:56:02] (CR) Awight: "What about running the NpmInstall in the same thread as each browser test? This avoids some of the inefficiency you pointed out, of fanni" [integration/quibble] - https://gerrit.wikimedia.org/r/754866 (owner: Kosta Harlan)
[13:56:31] (CR) Awight: "Should include a test." [integration/quibble] - https://gerrit.wikimedia.org/r/754866 (owner: Kosta Harlan)
[13:57:10] (CR) Awight: BrowserTests: Rework npm parallel install using ParallelCommand (1 comment) [integration/quibble] - https://gerrit.wikimedia.org/r/754866 (owner: Kosta Harlan)
[14:32:11] (CR) Awight: [DNM] Add more dependencies to full run to check ParallelCommand (1 comment) [integration/quibble] - https://gerrit.wikimedia.org/r/738066 (owner: Kosta Harlan)
[14:32:46] (CR) Awight: BrowserTests: Rework npm parallel install using ParallelCommand (1 comment) [integration/quibble] - https://gerrit.wikimedia.org/r/754866 (owner: Kosta Harlan)
[14:34:18] (PS1) Hashar: dockerfiles: fix php 8.1 being used in php73 image [integration/config] - https://gerrit.wikimedia.org/r/754951 (https://phabricator.wikimedia.org/T299389)
[14:34:57] (CR) Hashar: "To be revisited, some image should probably not be renamed" [integration/config] - https://gerrit.wikimedia.org/r/754951 (https://phabricator.wikimedia.org/T299389) (owner: Hashar)
[14:35:04] (CR) Hashar: [C: -2] dockerfiles: fix php 8.1 being used in php73 image [integration/config] - https://gerrit.wikimedia.org/r/754951 (https://phabricator.wikimedia.org/T299389) (owner: Hashar)
[14:37:54] Phabricator, MediaWiki-extensions-Translate, translatewiki.net, I18n: Improvements for automatic reporting of tasks from translatewiki to Phabricator - https://phabricator.wikimedia.org/T188379 (Nikerabbit) There is more flexibility now, but I am curious which of the issues still persist from the...
[14:44:44] (CR) Kosta Harlan: BrowserTests: Rework npm parallel install using ParallelCommand (1 comment) [integration/quibble] - https://gerrit.wikimedia.org/r/754866 (owner: Kosta Harlan)
[14:46:26] (CR) Awight: BrowserTests: Rework npm parallel install using ParallelCommand (1 comment) [integration/quibble] - https://gerrit.wikimedia.org/r/754866 (owner: Kosta Harlan)
[14:52:57] (CR) Kosta Harlan: BrowserTests: Rework npm parallel install using ParallelCommand (1 comment) [integration/quibble] - https://gerrit.wikimedia.org/r/754866 (owner: Kosta Harlan)
[15:00:03] !log Updating Jenkins jobs for Quibble 1.3.0 with proper PHP version in the images # T299389
[15:00:05] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:00:05] T299389: Wikibase CI broken due to missing PHP extensions: dom, intl, mbstring, xml, xmlreader, xmlwriter - https://phabricator.wikimedia.org/T299389
[15:00:23] (CR) Hashar: [C: +2] jjb: switch Quibble jobs to 1.3.0 (take 2) [integration/config] - https://gerrit.wikimedia.org/r/754932 (https://phabricator.wikimedia.org/T299389) (owner: Hashar)
[15:02:27] (Merged) jenkins-bot: jjb: switch Quibble jobs to 1.3.0 (take 2) [integration/config] - https://gerrit.wikimedia.org/r/754932 (https://phabricator.wikimedia.org/T299389) (owner: Hashar)
[15:03:05] Continuous-Integration-Infrastructure, Wikidata, Patch-For-Review, ci-test-error (WMF-deployed Build Failure): Wikibase CI broken due to missing PHP extensions: dom, intl, mbstring, xml, xmlreader, xmlwriter - https://phabricator.wikimedia.org/T299389 (hashar) It is fixed by https://gerrit.wikime...
[15:03:25] kostajh: I am going to apply --parallel-npm-install again
[15:03:52] hashar: ok!
[15:04:18] some of the images erroneously use php8.1 instead of php7.x, will dig into those later
[15:04:28] but for the quibble ones that is fixed
[15:13:12] kostajh: I have refreshed the jobs with `--parallel-npm-install`
[15:13:17] (CR) Hashar: [C: +2] "Jobs deployed" [integration/config] - https://gerrit.wikimedia.org/r/754934 (owner: Hashar)
[15:13:30] nice
[15:13:49] will have to deal with the other broken images :/
[15:15:30] maintenance-disconnect-full-disks build 352565 integration-agent-docker-1008 (/: 20%, /srv: 100%, /var/lib/docker: 35%): OFFLINE due to disk space
[15:16:15] (Merged) jenkins-bot: jjb: Pass --parallel-npm-install to Selenium jobs (take 2) [integration/config] - https://gerrit.wikimedia.org/r/754934 (owner: Hashar)
[15:20:35] maintenance-disconnect-full-disks build 352566 integration-agent-docker-1008 (/: 20%, /srv: 26%, /var/lib/docker: 34%): RECOVERY disk space OK
[15:27:11] Quibble, MediaWiki-Core-Tests, Browser-Tests, MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), and 2 others: Run browser tests in parallel - https://phabricator.wikimedia.org/T226869 (Lucas_Werkmeister_WMDE)
[15:56:52] Continuous-Integration-Infrastructure, DC-Ops, SRE, netops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (Papaul) @hashar let me know when this is offline so i can take over
[15:57:20] (PS2) Hashar: dockerfiles: fix php 8.1 being used in php73 image [integration/config] - https://gerrit.wikimedia.org/r/754951 (https://phabricator.wikimedia.org/T299389)
[16:02:33] Continuous-Integration-Infrastructure, DC-Ops, SRE, netops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (hashar) @Papaul the machine is shutting down. I am on IRC if you want t...
[16:03:47] PROBLEM - Host contint2001 is DOWN: PING CRITICAL - Packet loss = 100%
[16:05:14] ^ host is under maintenance
[16:05:56] hashar: I don't have to see how many test failures I caused!
[16:06:35] I have shut down CI entirely, so hopefully not much
[16:08:39] hashar: when it runs, I probably broke everything
[16:21:56] (PS1) Ahmon Dancy: updated git merge-base comment [tools/scap] - https://gerrit.wikimedia.org/r/754982
[16:22:14] If someone with gerrit super powers and a bit of time is around, T298683 could use a look from someone who knows how to debug gerrit account status after a block/unblock cycle.
[16:22:15] T298683: Account recovery help needed for Developer account Iniquity - https://phabricator.wikimedia.org/T298683
[16:23:18] bd808: Taking a look
[16:23:26] <3
[16:31:07] Release-Engineering-Team (Next), Release, Train Deployments: 1.38.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T293959 (Jdlrobson)
[16:40:41] bd808: I did something and added a note to the ticket
[16:42:05] dancy: thanks! I tried something similar with the rest api, but didn't know how to actually check to see if it had worked.
[16:42:27] I have somehow managed to avoid a gerrit admin hat lo these many years :)
[16:45:52] RECOVERY - Host contint2001 is UP: PING OK - Packet loss = 0%, RTA = 31.66 ms
[16:46:51] Continuous-Integration-Infrastructure, DC-Ops, SRE, netops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (Papaul) reset IDRAC, upgrade BIOS and IDRAC.
[16:48:26] PROBLEM - Check systemd state on contint2001 is CRITICAL: CRITICAL - degraded: The following units failed: ferm.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:50:32] hashar: Zuul came back and now it's being super weird
[16:50:34] https://gerrit.wikimedia.org/r/c/mediawiki/core/+/754909
[16:52:56] RECOVERY - Check systemd state on contint2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:54:32] Continuous-Integration-Infrastructure, DC-Ops, SRE, netops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (hashar) Open→Resolved a:Papaul I have restarted ferm. Zuul...
[16:56:44] RhinosF1: just recheck those
[16:56:45] :)
[16:56:56] Continuous-Integration-Infrastructure, DC-Ops, SRE, netops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (Papaul) @hashar no problem you can close the task once all is back onli...
[16:57:35] RhinosF1: there is a race condition which is that Zuul starts processing events before Jenkins had the opportunity to register the jobs in Zuul
[16:57:48] Ah
[16:57:53] it is an ordering problem, Zuul should start after Jenkins
[16:58:05] in most cases it is not an issue, but on a fresh boot that tends to happen :]
[16:58:09] those can be `recheck`
[16:58:21] I am more or less off, but will watch here from time to time
[16:58:33] ok
[16:58:43] It's not showing the queue at the moment to be run
[17:01:00] are Zuul and Jenkins supposed to be fully back up now or are they still starting up?
[17:02:18] because Zuul(?) is stubbornly refusing to resume the gate-and-submit of https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/754562 AFAICT
[17:03:07] Continuous-Integration-Infrastructure, DC-Ops, SRE, netops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (Papaul)
[17:04:55] Lucas_WMDE: I don't think it'll catch up past abandoned jobs
[17:05:00] From when it was killed
[17:05:06] yes, that’s why I left several new comments on the change
[17:05:10] which I thought would restart it
[17:05:29] Although https://gerrit.wikimedia.org/r/c/mediawiki/core/+/754909 isn't moving
[17:05:36] recheck is all it'll hear
[17:05:43] hashar: it's not joining the queue
[17:09:18] yeah I think something isn’t working properly yet
[17:10:53] (CR) Ahmon Dancy: "recheck" [tools/scap] - https://gerrit.wikimedia.org/r/754982 (owner: Ahmon Dancy)
[17:20:58] Release-Engineering-Team (Next), Release, Train Deployments: 1.38.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T293959 (jeena) @Jdlrobson Thanks for your writeup on the blockers, it's really helpful! Is there anything remaining to be done for T299352?
[17:31:45] still nothing going on in Zuul…
[17:42:17] (CR) Ahmon Dancy: "recheck" [tools/scap] - https://gerrit.wikimedia.org/r/754982 (owner: Ahmon Dancy)
[17:49:34] now it’s working again \o/ (see discussion in -operations)
[17:49:44] (CR) Ahmon Dancy: [C: +2] updated git merge-base comment [tools/scap] - https://gerrit.wikimedia.org/r/754982 (owner: Ahmon Dancy)
[17:51:34] (Merged) jenkins-bot: updated git merge-base comment [tools/scap] - https://gerrit.wikimedia.org/r/754982 (owner: Ahmon Dancy)
[18:01:03] !log added ryankemper as a member of the deployment-prep project
[18:01:04] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:08:54] Release-Engineering-Team (Next), Release, Train Deployments: 1.38.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T293959 (jeena) I am going ahead with the train after being advised T299352 is not currently a blocker.
[18:13:21] Release-Engineering-Team (Next), Release, Train Deployments: 1.38.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T293959 (Jdlrobson) In my opinion T299352 is still a blocker.
[18:15:49] maintenance-disconnect-full-disks build 352596 integration-agent-docker-1013 (/: 24%, /srv: 95%, /var/lib/docker: 29%): OFFLINE due to disk space
[18:20:51] maintenance-disconnect-full-disks build 352597 integration-agent-docker-1013 (/: 24%, /srv: 45%, /var/lib/docker: 1%): RECOVERY disk space OK
[19:03:01] (CR) 20after4: [C: +2] stage-srv-mediawiki: also remove noc.w.org files [tools/release] - https://gerrit.wikimedia.org/r/754889 (owner: Giuseppe Lavagetto)
[19:04:36] (Merged) jenkins-bot: stage-srv-mediawiki: also remove noc.w.org files [tools/release] - https://gerrit.wikimedia.org/r/754889 (owner: Giuseppe Lavagetto)
[19:05:28] maintenance-disconnect-full-disks build 352606 integration-agent-docker-1013 (/: 24%, /srv: 95%, /var/lib/docker: 9%): OFFLINE due to disk space
[19:06:28] twentyafterfour: are you deploying to testwikis?
[19:06:46] jeena: no but I can if you'd like
[19:06:53] I didn't mean to do that :)
[19:07:56] :P waiting on some fixes still
[19:09:42] (CR) 20after4: [C: +1] mirror-repos.sh: Get the name of the default branch [tools/train-dev] - https://gerrit.wikimedia.org/r/754072 (owner: Ahmon Dancy)
[19:10:33] (CR) 20after4: [C: +1] mirror-repos.sh: Move --prune into git_remote_update [tools/train-dev] - https://gerrit.wikimedia.org/r/754073 (owner: Ahmon Dancy)
[19:10:44] maintenance-disconnect-full-disks build 352607 integration-agent-docker-1013 (/: 24%, /srv: 43%, /var/lib/docker: 9%): RECOVERY disk space OK
[19:47:02] (PS3) Hashar: dockerfiles: fix php 8.1 being used in php73 image [integration/config] - https://gerrit.wikimedia.org/r/754951 (https://phabricator.wikimedia.org/T299389)
[19:52:07] Release-Engineering-Team (Seen), serviceops: contint hardware refresh - https://phabricator.wikimedia.org/T294276 (hashar)
[19:52:12] Continuous-Integration-Infrastructure, DC-Ops, SRE, netops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (hashar)
[19:52:48] Continuous-Integration-Infrastructure, DC-Ops, SRE, netops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (hashar) CI had to be restarted after the machine went up due to some od...
[19:52:54] Continuous-Integration-Infrastructure, Release-Engineering-Team (Radar), SRE, ops-codfw, serviceops-radar: contint2001.mgmt disappeared from Icinga - https://phabricator.wikimedia.org/T298861 (hashar) Stalled→Resolved a:jbond The DRAC on contint2001.wikimedia.org has been upgraded...
[19:53:25] (CR) Hashar: [C: +2] "I have checked the image, php is 7.3 but there are php8.1* packages installed which is annoying." [integration/config] - https://gerrit.wikimedia.org/r/754951 (https://phabricator.wikimedia.org/T299389) (owner: Hashar)
[19:56:33] (Merged) jenkins-bot: dockerfiles: fix php 8.1 being used in php73 image [integration/config] - https://gerrit.wikimedia.org/r/754951 (https://phabricator.wikimedia.org/T299389) (owner: Hashar)
[19:56:49] !log building Docker images for https://gerrit.wikimedia.org/r/754951
[19:56:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:08:11] Release-Engineering-Team (Next), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T293959 (Jdlrobson)
[20:09:13] (CR) Hashar: "Successfully published image docker-registry.discovery.wmnet/releng/composer-php73:0.4.0" [integration/config] - https://gerrit.wikimedia.org/r/754951 (https://phabricator.wikimedia.org/T299389) (owner: Hashar)
[20:14:46] (PS1) Hashar: jjb: fix php 8.1 packages used ini php73 images [integration/config] - https://gerrit.wikimedia.org/r/755016 (https://phabricator.wikimedia.org/T299389)
[20:16:47] (CR) Hashar: [C: +2] "Jobs updated" [integration/config] - https://gerrit.wikimedia.org/r/755016 (https://phabricator.wikimedia.org/T299389) (owner: Hashar) [20:17:44] one less incident [20:17:57] cause surely php73 container randomly running php 8.1 instead is a fault :] [20:18:17] I don't think those would have any impact [20:18:22] Continuous-Integration-Infrastructure, Wikidata, Patch-For-Review, ci-test-error (WMF-deployed Build Failure): Wikibase CI broken due to missing PHP extensions: dom, intl, mbstring, xml, xmlreader, xmlwriter - https://phabricator.wikimedia.org/T299389 (hashar) Open→Resolved a:hashar S... [20:19:17] (Merged) jenkins-bot: jjb: fix php 8.1 packages used ini php73 images [integration/config] - https://gerrit.wikimedia.org/r/755016 (https://phabricator.wikimedia.org/T299389) (owner: Hashar) [20:49:14] Phabricator, Security-Team, Security: Audit members of acl*security for more than x duration of no activity (Jan 2022) - https://phabricator.wikimedia.org/T299400 (Dsharpe) I can take this on, but I don't think I have access to run the query. I don't see a way to get the equivalent result through th... 
[21:29:18] Gerrit: Upgrade Gerrit from 3.3.6 to 3.3.9 - https://phabricator.wikimedia.org/T299451 (hashar) [21:30:49] (PS1) Hashar: Merge tag 'v3.3.9' into wmf/stable-3.3 [software/gerrit] (wmf/stable-3.3) - https://gerrit.wikimedia.org/r/755024 (https://phabricator.wikimedia.org/T240264) [21:34:17] (CR) jerkins-bot: [V: -1] Merge tag 'v3.3.9' into wmf/stable-3.3 [software/gerrit] (wmf/stable-3.3) - https://gerrit.wikimedia.org/r/755024 (https://phabricator.wikimedia.org/T240264) (owner: Hashar) [21:35:54] (PS2) Hashar: Merge tag 'v3.3.9' into wmf/stable-3.3 [software/gerrit] (wmf/stable-3.3) - https://gerrit.wikimedia.org/r/755024 (https://phabricator.wikimedia.org/T240264) [21:37:47] (CR) jerkins-bot: [V: -1] Merge tag 'v3.3.9' into wmf/stable-3.3 [software/gerrit] (wmf/stable-3.3) - https://gerrit.wikimedia.org/r/755024 (https://phabricator.wikimedia.org/T240264) (owner: Hashar) [21:41:09] (PS1) Hashar: Update Gerrit to 3.3.9 [software/gerrit] (deploy/wmf/stable-3.3) - https://gerrit.wikimedia.org/r/755028 (https://phabricator.wikimedia.org/T299451) [21:46:40] (PS3) Hashar: Merge tag 'v3.3.9' into wmf/stable-3.3 [software/gerrit] (wmf/stable-3.3) - https://gerrit.wikimedia.org/r/755024 (https://phabricator.wikimedia.org/T240264) [22:12:17] (PS2) Hashar: Update Gerrit to 3.3.9 + plugins [software/gerrit] (deploy/wmf/stable-3.3) - https://gerrit.wikimedia.org/r/755028 (https://phabricator.wikimedia.org/T240264) [22:13:32] Release-Engineering-Team (Next), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T293958 (Zabe) [22:13:44] Release-Engineering-Team (Next), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T293958 (Zabe) [22:13:55] Release-Engineering-Team (Next), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.18 deployment blockers - 
https://phabricator.wikimedia.org/T293959 (Zabe) [22:15:04] Gerrit, Patch-For-Review, Privacy, Upstream: Gerrit loads font from fonts.googleapis.com and fonts.gstatic.com - https://phabricator.wikimedia.org/T240264 (hashar) [22:15:06] Gerrit, Patch-For-Review: Upgrade Gerrit from 3.3.6 to 3.3.9 - https://phabricator.wikimedia.org/T299451 (hashar) [22:16:29] Phabricator (Upstream), PHP 8.0 support, Upstream: Wrap get_magic_quotes_gpc check in PhabricatorStartup.php (removed in PHP 8.0) - https://phabricator.wikimedia.org/T299399 (mmodell) Already fixed upstream: https://we.phorge.it/rP67cf80b377bd33b5ff259fff26e09a3c1424f422 [22:16:34] Gerrit, Patch-For-Review: Upgrade Gerrit from 3.3.6 to 3.3.9 - https://phabricator.wikimedia.org/T299451 (hashar) I have uploaded gerrit-3.3.9.war as well as our plugins build from https://gerrit.wikimedia.org/r/c/operations/software/gerrit/+/755024/3/