[00:07:08] Release-Engineering-Team (Next), Release, Train Deployments: 1.38.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T293960 (jeena)
[00:09:27] Release-Engineering-Team (Next), Release, Train Deployments: 1.38.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T293960 (jeena)
[01:02:30] Release-Engineering-Team (Next), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T293958 (ppelberg)
[01:04:08] Release-Engineering-Team (Next), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T293959 (Zabe)
[03:16:11] Continuous-Integration-Infrastructure, Browser-Tests, User-zeljkofilipin: specFileRetries does not seem to apply in extension wdio.conf.js - https://phabricator.wikimedia.org/T296826 (Krinkle) >>! In T296826#7541133, @zeljkofilipin wrote: > @Krinkle, @Catrope do you agree to add @kostajh to the packa...
[04:28:43] PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:58:54] Project-Admins: Create a generic #SVG tag - https://phabricator.wikimedia.org/T287930 (Volker_E) @Aklapper Hi, one of the various examples is {T299370} General SVG standard usage in our software discussing tasks, similar to #CSS. Tasks that are not [[ https://phabricator.wikimedia.org/project/profile/211/ |...
[06:05:06] Project-Admins: Create a generic #SVG tag - https://phabricator.wikimedia.org/T287930 (Volker_E)
[06:14:54] Release-Engineering-Team (Next), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T293959 (Tgr)
[06:31:03] RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:04:28] (PS1) Hashar: Run git-fat pull for gerrit deploy/wmf branches [integration/config] - https://gerrit.wikimedia.org/r/755317
[09:08:22] (CR) Hashar: [C: +2] "Tried it and it works, will prevent us from merging patches that do not have artifacts uploaded yet." [integration/config] - https://gerrit.wikimedia.org/r/755317 (owner: Hashar)
[09:10:18] (Merged) jenkins-bot: Run git-fat pull for gerrit deploy/wmf branches [integration/config] - https://gerrit.wikimedia.org/r/755317 (owner: Hashar)
[09:21:37] (CR) Hashar: [C: +2] Merge tag 'v3.3.9' into wmf/stable-3.3 [software/gerrit] (wmf/stable-3.3) - https://gerrit.wikimedia.org/r/755024 (https://phabricator.wikimedia.org/T240264) (owner: Hashar)
[09:25:02] (CR) Hashar: "recheck should trigger git fat pull" [software/gerrit] (deploy/wmf/stable-3.3) - https://gerrit.wikimedia.org/r/755028 (https://phabricator.wikimedia.org/T240264) (owner: Hashar)
[09:27:13] (CR) Hashar: [C: +2] Update Gerrit to 3.3.9 + plugins [software/gerrit] (deploy/wmf/stable-3.3) - https://gerrit.wikimedia.org/r/755028 (https://phabricator.wikimedia.org/T240264) (owner: Hashar)
[09:28:30] (Merged) jenkins-bot: Merge tag 'v3.3.9' into wmf/stable-3.3 [software/gerrit] (wmf/stable-3.3) - https://gerrit.wikimedia.org/r/755024 (https://phabricator.wikimedia.org/T240264) (owner: Hashar)
[09:28:33] (Merged) jenkins-bot: Update Gerrit to 3.3.9 + plugins [software/gerrit] (deploy/wmf/stable-3.3) - https://gerrit.wikimedia.org/r/755028 (https://phabricator.wikimedia.org/T240264) (owner: Hashar)
[09:32:29] PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:40:16] bah poor mgmt interface
[10:01:16] Gerrit, Patch-For-Review: Upgrade Gerrit from 3.3.6 to 3.3.9 - https://phabricator.wikimedia.org/T299451 (hashar) I have updated the replica: ` [2022-01-19T09:59:02.105Z] [main] INFO com.google.gerrit.pgm.Daemon : Gerrit Code Review [replica] 3.3.9 ready ` `ssh -p 29418 gerrit-replica.wikimedia.org ger...
[10:07:29] Gerrit, Patch-For-Review: Upgrade Gerrit from 3.3.6 to 3.3.9 - https://phabricator.wikimedia.org/T299451 (hashar) I will upgrade the Gerrit primary later.
[10:15:01] Beta-Cluster-Infrastructure: beta-scap-sync-world job is stuck, beta cluster not updated - https://phabricator.wikimedia.org/T299485 (kostajh) p:Triage→High
[10:15:33] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure: beta-scap-sync-world job is stuck, beta cluster not updated - https://phabricator.wikimedia.org/T299485 (Majavah)
[10:22:49] (CR) Kosta Harlan: ParallelCommand: Fallback to default of two workers (1 comment) [integration/quibble] - https://gerrit.wikimedia.org/r/754873 (owner: Kosta Harlan)
[10:22:52] (Abandoned) Kosta Harlan: ParallelCommand: Fallback to default of two workers [integration/quibble] - https://gerrit.wikimedia.org/r/754873 (owner: Kosta Harlan)
[10:30:22] (CR) Kosta Harlan: [C: +1] Parallelism as a command object (1 comment) [integration/quibble] - https://gerrit.wikimedia.org/r/587885 (https://phabricator.wikimedia.org/T235449) (owner: Awight)
[10:32:41] RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[10:32:47] (PS21) Kosta Harlan: Split extension and skin npm and composer tests [integration/quibble] - https://gerrit.wikimedia.org/r/587888 (owner: Awight)
[10:33:07] (PS5) Kosta Harlan: BrowserTests: Rework npm parallel install using ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/754866
[10:38:28] !log kill some stuck jobs T299485
[10:38:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:38:30] T299485: beta-scap-sync-world job is stuck, beta cluster not updated - https://phabricator.wikimedia.org/T299485
[10:40:14] (PS6) Kosta Harlan: BrowserTests: Rework npm parallel install using ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/754866
[10:40:23] (PS11) Kosta Harlan: [DNM] Add more dependencies to full run to check ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/738066
[10:40:34] (CR) Kosta Harlan: [DNM] Add more dependencies to full run to check ParallelCommand (1 comment) [integration/quibble] - https://gerrit.wikimedia.org/r/738066 (owner: Kosta Harlan)
[10:41:52] thanks Reedy
[10:47:25] kostajh: Should all be up to date now...
[10:48:53] (PS7) Kosta Harlan: BrowserTests: Rework npm parallel install using ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/754866
[10:48:57] (PS12) Kosta Harlan: [DNM] Add more dependencies to full run to check ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/738066
[10:50:34] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure: beta-scap-sync-world job is stuck, beta cluster not updated - https://phabricator.wikimedia.org/T299485 (kostajh) Open→Resolved a:kostajh Seems like this is fixed, thanks @Reedy
[10:52:00] (CR) jerkins-bot: [V: -1] BrowserTests: Rework npm parallel install using ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/754866 (owner: Kosta Harlan)
[10:53:59] (PS8) Kosta Harlan: BrowserTests: Rework npm parallel install using ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/754866
[10:54:05] (CR) jerkins-bot: [V: -1] [DNM] Add more dependencies to full run to check ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/738066 (owner: Kosta Harlan)
[10:54:15] (PS13) Kosta Harlan: [DNM] Add more dependencies to full run to check ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/738066
[10:57:03] (CR) jerkins-bot: [V: -1] BrowserTests: Rework npm parallel install using ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/754866 (owner: Kosta Harlan)
[10:57:30] (CR) jerkins-bot: [V: -1] [DNM] Add more dependencies to full run to check ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/738066 (owner: Kosta Harlan)
[10:58:27] (PS9) Kosta Harlan: BrowserTests: Rework npm parallel install using ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/754866
[10:58:33] (PS14) Kosta Harlan: [DNM] Add more dependencies to full run to check ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/738066
[11:04:38] Release-Engineering-Team (Next), Release, Train Deployments: 1.38.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T293960 (Lucas_Werkmeister_WMDE)
[11:09:55] Quibble: Switch QUnit tests to use Apache backend - https://phabricator.wikimedia.org/T299491 (kostajh)
[11:10:06] Quibble: Switch QUnit tests to use Apache backend - https://phabricator.wikimedia.org/T299491 (kostajh)
[11:10:10] Continuous-Integration-Config, Quibble, MW-1.38-notes (1.38.0-wmf.18; 2022-01-17), Patch-For-Review: Switch all Quibble Selenium jobs to use apache - https://phabricator.wikimedia.org/T285649 (kostajh)
[11:11:06] Quibble: Quibble ci-full-run should use Apache backend - https://phabricator.wikimedia.org/T299492 (kostajh)
[11:14:36] (CR) jerkins-bot: [V: -1] [DNM] Add more dependencies to full run to check ParallelCommand [integration/quibble] - https://gerrit.wikimedia.org/r/738066 (owner: Kosta Harlan)
[11:23:37] (CR) Kosta Harlan: "Filed T299492 about the failure here" [integration/quibble] - https://gerrit.wikimedia.org/r/738066 (owner: Kosta Harlan)
[11:24:36] Quibble: Switch QUnit tests to use Apache backend - https://phabricator.wikimedia.org/T299491 (kostajh)
[11:24:38] Quibble: Quibble ci-full-run should use Apache backend - https://phabricator.wikimedia.org/T299492 (kostajh)
[11:34:06] Phabricator (Upstream), PHP 8.0 support, Upstream: Wrap get_magic_quotes_gpc check in PhabricatorStartup.php (removed in PHP 8.0) - https://phabricator.wikimedia.org/T299399 (Aklapper) Open→Invalid Eh, thanks! I guess I'll need to spend time to locally switch to the we.phorge.it version, as I...
[11:49:14] Scap, SRE: scap fails deployments on bullseye/python 3.9 - https://phabricator.wikimedia.org/T299501 (Joe)
[11:57:48] Scap, SRE: scap fails deployments on bullseye/python 3.9 - https://phabricator.wikimedia.org/T299501 (Joe) The problem arises because pyyaml version 5.3.1 by default uses the safe loader for python objects, so to make the yaml load we need to change the code from: ` yaml.load(dump) ` to ` yaml.load(du...
[12:16:52] Release-Engineering-Team (Next), Release, Train Deployments: 1.38.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T293960 (Lucas_Werkmeister_WMDE)
[13:28:43] Gerrit, Privacy, Upstream: Gerrit loads font from fonts.googleapis.com and fonts.gstatic.com - https://phabricator.wikimedia.org/T240264 (hashar)
[13:28:54] Gerrit: Upgrade Gerrit from 3.3.6 to 3.3.9 - https://phabricator.wikimedia.org/T299451 (hashar) Open→Resolved Both primary and replica are now running 3.3.9.
[13:29:50] Gerrit, Release-Engineering-Team (Seen): Use upstream Gerrit.war instead of building our own - https://phabricator.wikimedia.org/T268019 (hashar)
[13:29:56] Gerrit, Privacy, Upstream: Gerrit loads font from fonts.googleapis.com and fonts.gstatic.com - https://phabricator.wikimedia.org/T240264 (hashar) Open→Resolved a:hashar The patch is now included in the gitiles plugin by upstream. The Gerrit 3.3.9 deployment I have completed a minute ago u...
[13:59:01] (PS1) Giuseppe Lavagetto: deploy: explicitly use the unsafe loader in yaml [tools/scap] - https://gerrit.wikimedia.org/r/755388 (https://phabricator.wikimedia.org/T299501)
[14:50:32] Continuous-Integration-Infrastructure, Browser-Tests, User-zeljkofilipin: specFileRetries does not seem to apply in extension wdio.conf.js - https://phabricator.wikimedia.org/T296826 (zeljkofilipin) Thanks @Krinkle! Welcome @kostajh to the package collaborators!
[14:57:01] GitLab (CI & Job Runners), Release-Engineering-Team (Radar), Security-Team, serviceops, and 2 others: Setup GitLab Runner in trusted environment - https://phabricator.wikimedia.org/T295481 (Jelto)
[15:07:10] PROBLEM - SSH on contint1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[15:19:08] RECOVERY - SSH on contint1001 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[15:40:49] PROBLEM - SSH on contint1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[15:41:09] (CR) Ahmon Dancy: [C: +2] deploy: explicitly use the unsafe loader in yaml [tools/scap] - https://gerrit.wikimedia.org/r/755388 (https://phabricator.wikimedia.org/T299501) (owner: Giuseppe Lavagetto)
[15:43:23] (Merged) jenkins-bot: deploy: explicitly use the unsafe loader in yaml [tools/scap] - https://gerrit.wikimedia.org/r/755388 (https://phabricator.wikimedia.org/T299501) (owner: Giuseppe Lavagetto)
[15:49:14] (PS1) Giuseppe Lavagetto: make-container-image: simplify fixing symlinks [tools/release] - https://gerrit.wikimedia.org/r/755401
[16:25:47] Project-Admins, Data-Engineering: Make EChetty Editor of Data-Catalog workboard - https://phabricator.wikimedia.org/T299541 (odimitrijevic)
[16:26:57] Project-Admins, Data-Engineering: Create a workboard for Data-Catalog component - https://phabricator.wikimedia.org/T299357 (odimitrijevic) Thanks so much! Can you please also add @Echetty, our product manager as a trusted contributor?
[16:35:59] Continuous-Integration-Infrastructure, Release-Engineering-Team: contint1001.wikimedia.org is almost unresponsive - https://phabricator.wikimedia.org/T299542 (hashar)
[16:36:23] Continuous-Integration-Infrastructure, Release-Engineering-Team: contint1001.wikimedia.org is almost unresponsive - https://phabricator.wikimedia.org/T299542 (hashar) p:Triage→Unbreak! There are a bunch of processes such as `/usr/bin/node /opt/lib/node_modules/jest-worker/build/workers/processChild...
[16:46:02] Continuous-Integration-Infrastructure, Release-Engineering-Team: contint1001.wikimedia.org is almost unresponsive - https://phabricator.wikimedia.org/T299542 (hashar) I think that comes from termbox, the first event in zuul would be: 2022-01-19 14:45:40,777 DEBUG zuul.DependentPipelineManager: Found job...
[16:46:51] Project-Admins, Data-Engineering: Make EChetty Editor of Data-Catalog workboard - https://phabricator.wikimedia.org/T299541 (odimitrijevic) Open→Invalid Closing as duplicate of T299357
[16:52:24] Project-Admins, Data-Engineering: Make EChetty Editor of Data-Catalog workboard - https://phabricator.wikimedia.org/T299541 (Aklapper) (Feel free to {nav icon=anchor,name=Edit Related Tasks... > Close As Duplicate} in the upper right corner. Thanks!)
[16:52:38] Project-Admins, Data-Engineering: Create a workboard for Data-Catalog component - https://phabricator.wikimedia.org/T299357 (Aklapper)
[16:52:42] Project-Admins, Data-Engineering: Make EChetty Editor of Data-Catalog workboard - https://phabricator.wikimedia.org/T299541 (Aklapper)
[16:53:08] Project-Admins, Data-Engineering: Create a workboard for Data-Catalog component - https://phabricator.wikimedia.org/T299357 (Aklapper) >>! In T299357#7633085, @odimitrijevic wrote: > Can you please also add @Echetty, our product manager as a trusted contributor? {{Done}}
[16:54:04] GitLab (Infrastructure), serviceops, Patch-For-Review: Migrate gitlab-test instance to puppet - https://phabricator.wikimedia.org/T297411 (Dzahn) a:Dzahn
[16:54:53] (CR) Ahmon Dancy: [C: +2] make-container-image: simplify fixing symlinks [tools/release] - https://gerrit.wikimedia.org/r/755401 (owner: Giuseppe Lavagetto)
[16:55:00] Project-Admins, Data-Engineering: Allow folks to create/edit workboard for #Data-Catalog component - https://phabricator.wikimedia.org/T299357 (Aklapper)
[16:55:08] Project-Admins, Data-Engineering: Allow folks to create/edit workboard for #Data-Catalog component - https://phabricator.wikimedia.org/T299357 (Aklapper) Open→Resolved
[16:55:34] Continuous-Integration-Infrastructure, Release-Engineering-Team, ops-eqiad: contint1001.wikimedia.org is almost unresponsive - https://phabricator.wikimedia.org/T299542 (hashar) Hello #ops-eqiad contint1001.wikimedia.org is unresponsive. Moritz tried to reach it through the serial console but it...
[16:55:38] Continuous-Integration-Infrastructure, Release-Engineering-Team, ops-eqiad: contint1001.wikimedia.org is almost unresponsive - https://phabricator.wikimedia.org/T299542 (Lucas_Werkmeister_WMDE) This was probably caused by https://integration.wikimedia.org/ci/job/termbox-pipeline-rehearse/92/console,...
[16:56:08] (Merged) jenkins-bot: make-container-image: simplify fixing symlinks [tools/release] - https://gerrit.wikimedia.org/r/755401 (owner: Giuseppe Lavagetto)
[16:59:39] PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:03:12] is there anything I can do to my puppet patch waiting for PCC that's been stuck as "queued" in Zuul for half an hour now?
[17:06:58] I think contint1001 is having issues at the moment. May or may not be related. What's the id of your change?
[17:07:42] 745199
[17:17:06] taavi: I ran the pcc job manually for your change. Results are here: https://puppet-compiler.wmflabs.org/pcc-worker1002/33329/
[17:17:37] thanks!
[17:21:34] GitLab (Infrastructure), serviceops, Patch-For-Review: Migrate gitlab-test instance to puppet - https://phabricator.wikimedia.org/T297411 (Dzahn) per the gitlab IC meeting we had today, the plan is: - give Jelto / Arnold / Brennen access to existing project "devtools" that has Gerrit and Phabricator...
[17:22:37] Continuous-Integration-Infrastructure, Release-Engineering-Team, ops-eqiad: contint1001.wikimedia.org is almost unresponsive - https://phabricator.wikimedia.org/T299542 (hashar) @Cmjohnson acknowledged the issue and will be able to restart the host in a couple hours. The services offered by contint1...
[17:22:48] dancy: yeah we have lost contint1001, filed as https://phabricator.wikimedia.org/T299542
[17:22:51] GitLab (Infrastructure), serviceops, Patch-For-Review: Migrate gitlab-test instance to puppet - https://phabricator.wikimedia.org/T297411 (Dzahn) Open→In progress
[17:22:54] the host will be powercycled later today
[17:23:13] but jobs should be runnable on contint2001
[17:23:19] so we are more or less covered hopefully
[17:25:05] maybe you can help me understand something. Why would Taavi's change (https://gerrit.wikimedia.org/r/c/operations/puppet/+/745199) wait so long for the PCC compiler when all the PCC nodes appear to be idle?
[17:26:45] HMM
[17:28:14] maybe check experimental is broken
[17:28:55] RECOVERY - SSH on contint1001 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[17:28:58] usually I look at zuul debug log
[17:29:17] PROBLEM - Check systemd state on contint1001 is CRITICAL: CRITICAL - starting: Late bootup, before the job queue becomes idle for the first time, or one of the rescue targets are reached. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:29:47] https://integration.wikimedia.org/zuul/#q=745199 shows "operations-puppet-catalog-compiler-test" with state "queued"
[17:29:57] Continuous-Integration-Infrastructure, Release-Engineering-Team, ops-eqiad: contint1001.wikimedia.org is almost unresponsive - https://phabricator.wikimedia.org/T299542 (Joe) No need for further restarts, I was able to powercycle the server using ipmi. @Cmjohnson you don't need to do anything :)
[17:30:00] taking a look at the debug log now.
[17:30:09] Continuous-Integration-Infrastructure, Release-Engineering-Team, ops-eqiad: contint1001.wikimedia.org is almost unresponsive - https://phabricator.wikimedia.org/T299542 (Joe) Open→Resolved
[17:30:57] Continuous-Integration-Infrastructure, Release-Engineering-Team, ops-eqiad: contint1001.wikimedia.org is almost unresponsive - https://phabricator.wikimedia.org/T299542 (hashar) Thank you very much for the powercycle.
[17:31:17] !log Adding https://integration.wikimedia.org/ci/computer/contint1001/ back to the pool after the machine got powercycled # T299542
[17:31:19] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:31:20] T299542: contint1001.wikimedia.org is almost unresponsive - https://phabricator.wikimedia.org/T299542
[17:35:35] GitLab (Infrastructure), serviceops, Patch-For-Review: Migrate gitlab-test instance to puppet - https://phabricator.wikimedia.org/T297411 (Dzahn) [x] added @Jelto @Arnoldokoth and @brennen to project `devtools`, with admin privileges After logging out and back in on Horizon you should see the new pr...
[17:53:46] Release-Engineering-Team (Radar), serviceops-radar, Cloud-VPS (Quota-requests): Request increased quota for devtools Cloud VPS project - https://phabricator.wikimedia.org/T299561 (Dzahn)
[17:54:31] GitLab, serviceops: upgrade gitlab-runners to bullseye - https://phabricator.wikimedia.org/T297659 (Dzahn)
[17:54:37] Release-Engineering-Team (Radar), serviceops-radar, Cloud-VPS (Quota-requests): Request increased quota for devtools Cloud VPS project - https://phabricator.wikimedia.org/T299561 (Dzahn)
[17:54:45] GitLab (Infrastructure), serviceops, Patch-For-Review: Migrate gitlab-test instance to puppet - https://phabricator.wikimedia.org/T297411 (Dzahn)
[17:55:25] Release-Engineering-Team (Radar), serviceops-radar, Cloud-VPS (Quota-requests): Request increased quota for devtools Cloud VPS project - https://phabricator.wikimedia.org/T299561 (Dzahn) oh yea, one more reason is also that we will want to test whether gitlab-runner (as opposed to gitlab-server) will...
[17:55:42] GitLab, GitLab-Test, Release-Engineering-Team (Radar), serviceops-radar, Cloud-VPS (Quota-requests): Request increased quota for devtools Cloud VPS project - https://phabricator.wikimedia.org/T299561 (Dzahn)
[17:58:55] GitLab (Infrastructure), serviceops, Patch-For-Review: Migrate gitlab-test instance to puppet - https://phabricator.wikimedia.org/T297411 (Dzahn) In progress→Stalled And yes, we are at quota limit. I think it's just "instance count" and not more specific about CPU/disk etc. Part of the reaso...
[18:00:01] RECOVERY - Check systemd state on contint1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:01:32] Release-Engineering-Team (Seen), MediaWiki-extensions-UserMerge, Stewards-and-global-tools, MW-1.38-notes (1.38.0-wmf.13; 2021-12-13), Patch-For-Review: Undeploy UserMerge Extension from WMF production - https://phabricator.wikimedia.org/T216089 (Majavah)
[18:04:02] Release-Engineering-Team (Seen), MediaWiki-extensions-UserMerge, Stewards-and-global-tools, MW-1.38-notes (1.38.0-wmf.13; 2021-12-13), Patch-For-Review: Undeploy UserMerge Extension from WMF production - https://phabricator.wikimedia.org/T216089 (Majavah)
[18:04:18] Release-Engineering-Team (Next), Wikimedia-Site-requests, WikimediaMessages, EngProd-Virtual-Hackathon, MW-1.38-notes (1.38.0-wmf.17; 2022-01-10): Put "shim" code for namespaces, logs, and log i18n into WikimediaMessages so we can undeploy extensions - https://phabricator.wikimedia.org/T222918 (...
[18:11:04] Project-Admins, Data-Engineering: Allow folks to create/edit workboard for #Data-Catalog component - https://phabricator.wikimedia.org/T299357 (odimitrijevic) Wonderful! Thank you so much!
[18:17:57] Continuous-Integration-Infrastructure, Release-Engineering-Team, SRE, ops-eqiad: contint1001.wikimedia.org is almost unresponsive - https://phabricator.wikimedia.org/T299542 (hashar) I think one of the follow up action is T290608 which is that obsolete intermediate Docker layers and containers a...
[18:20:04] GitLab, GitLab-Test, Release-Engineering-Team (Radar), serviceops-radar, Cloud-VPS (Quota-requests): Request increased quota for devtools Cloud VPS project - https://phabricator.wikimedia.org/T299561 (bd808) You can see quota vs usage at https://openstack-browser.toolforge.org/project/devtool...
[18:20:11] Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing), Release Pipeline, Patch-For-Review: Pipeline lib still leaks containers on contint1001 / contint2001 - https://phabricator.wikimedia.org/T290608 (hashar) contint1001 went unresponsive today. A series of change got send that...
[18:20:52] !log Adding https://integration.wikimedia.org/ci/computer/contint1001/ back to the pool again
[18:20:52] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:51:28] (PS20) Kosta Harlan: [WIP] Run PHPUnit tests in parallel [integration/quibble] - https://gerrit.wikimedia.org/r/742200 (https://phabricator.wikimedia.org/T50217)
[18:51:35] (PS42) Kosta Harlan: [DNM] CI full run with extensions [integration/quibble] - https://gerrit.wikimedia.org/r/742201
[18:55:04] gerrit upgraded (3.3.6 > 3.3.9 ) - contint1001 outage - ton of meetings
[18:55:09] I call it a day
[18:58:24] Phabricator, serviceops, Patch-For-Review: move phabricator to new hardware generation - https://phabricator.wikimedia.org/T280597 (Aklapper)
[18:58:32] Phabricator, Release-Engineering-Team (Next), serviceops: Deprecate git-ssh service on phabricator.wikimedia.org - https://phabricator.wikimedia.org/T296022 (Aklapper)
[18:59:02] (CR) jerkins-bot: [V: -1] [DNM] CI full run with extensions [integration/quibble] - https://gerrit.wikimedia.org/r/742201 (owner: Kosta Harlan)
[19:05:18] hashar: Have a good night.
[19:06:09] almost :]
[19:08:12] dancy: for contint1001 I guess we can reclaim a bunch of disk via a `docker image prune`
[19:08:39] and the CI agent I think that is triggered by the maintenance full disk job
[19:08:58] and the WMCS agents have it in a weekly cron (with `docker container prune` as well)
[19:09:18] on contint1001 / contint2001 they are never garbage collected cause there is no such cron
[19:09:40] and the maintenance full disk job would not garbage collect since there is plenty of disk space on the docker partition
[19:09:49] so the job logic is always under the threshold
[19:14:02] oh
[19:14:14] that is exactly what dancy's `docker system prune` is doing
[19:14:39] /srv/docker/image/overlay2/layerdb/sha256 and /srv/docker/image/overlay2/imagedb/content/sha256 are shrinking
[19:14:40] :]
[19:15:02] so indeed time for a good night
[19:15:21] ah
[19:15:30] I haven't dug into that one since I was in a training before our team meeting
[19:15:38] sorry wrong backscroll
[19:16:57] Project-Admins, Infrastructure-Foundations, PM, Puppet: Clarify Puppet tag - https://phabricator.wikimedia.org/T295221 (Aklapper) @joanna_borun: ping
[19:18:19] about pcc on https://gerrit.wikimedia.org/r/c/operations/puppet/+/745199 grepping for the change number on contint2001 in /var/log/zuul/zuul.log gives some clues
[19:19:30] taavi commented `check experimental` in Gerrit
[19:20:01] the event reaches the Zuul scheduler which processes the comment event which matches the experimental pipeline. The change is added to the pipe:
[19:20:10] 2022-01-19 16:35:06,253 INFO zuul.Scheduler: Adding operations/puppet, to
[19:21:13] it is launched a while later:
[19:21:15] 2022-01-19 17:53:37,505 INFO zuul.Gearman: Launch job operations-puppet-catalog-compiler-test (uuid: f435a99ed8f246e5babf5ef165f7b8c7) for change with dependent changes []
[19:22:54] GitLab, GitLab-Test, Release-Engineering-Team (Radar), serviceops-radar, Cloud-VPS (Quota-requests): Request increased quota for devtools Cloud VPS project - https://phabricator.wikimedia.org/T299561 (Dzahn) a:Dzahn OK, thanks bd808! Will do and come back to this.
[19:23:54] and well Zuul had too many job requests at that time https://grafana.wikimedia.org/d/000000322/zuul-gearman?viewPanel=10&orgId=1&from=1642604869791&to=1642620211303
[19:23:57] GitLab, serviceops: upgrade gitlab-runners to bullseye - https://phabricator.wikimedia.org/T297659 (Dzahn)
[19:24:05] GitLab, GitLab-Test, Release-Engineering-Team (Radar), serviceops-radar, Cloud-VPS (Quota-requests): Request increased quota for devtools Cloud VPS project - https://phabricator.wikimedia.org/T299561 (Dzahn) Open→In progress
[19:24:09] since the `experimental` pipeline has a low precedence, its functions are run last
[19:24:13] GitLab (Infrastructure), serviceops, Patch-For-Review: Migrate gitlab-test instance to puppet - https://phabricator.wikimedia.org/T297411 (Dzahn)
[19:24:28] the python based gearman server first runs all functions marked `high` precedence
[19:24:33] then the `normal` precedence
[19:24:37] and finally the `low` precedence
[19:24:52] so as long as there are jobs in the high or normal queues, nothing runs from the low queue
[19:26:07] tldr: requests to puppet compiler via check experimental have low precedence and would run after everything else
[19:26:44] I am off for real now
[19:34:10] Release-Engineering-Team (Next), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T293959 (kostajh)
[19:34:19] Deployments, Release-Engineering-Team: Deployment calendar: MediaWiki branch cut reported as between 3-4 AM UTC but actually it seems to be between 2-3 AM - https://phabricator.wikimedia.org/T297724 (Aklapper) @thcipriani Should this remain open per last comment?
[20:02:22] Hmm.. Thanks for the info about the prioritization Antoine.
[20:09:38] Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing), Release Pipeline, Patch-For-Review: Pipeline lib still leaks containers on contint1001 / contint2001 - https://phabricator.wikimedia.org/T290608 (dancy) >>! In T290608#7633747, @hashar wrote: > @dancy issued a `docker syste...
[20:14:28] Phabricator, iNaturalist: Rename Phabricator Board from iNaturalist to Wikiproject Biodiverstiy - https://phabricator.wikimedia.org/T299578 (Andrawaag)
[20:16:42] Release-Engineering-Team (Seen), MediaWiki-extensions-UserMerge, Stewards-and-global-tools, MW-1.38-notes (1.38.0-wmf.13; 2021-12-13), Patch-For-Review: Undeploy UserMerge Extension from WMF production - https://phabricator.wikimedia.org/T216089 (Majavah) a:Majavah
[20:29:44] hey folks! there seems to be a patch stuck in the queue (the SecurePoll one) https://integration.wikimedia.org/zuul/
[20:34:28] rebasing did the trick
[20:36:52] (PS1) Ahmon Dancy: DNM: Test a failing pipeline [tools/scap] - https://gerrit.wikimedia.org/r/755484 (https://phabricator.wikimedia.org/T290608)
[20:40:05] (CR) jerkins-bot: [V: -1] DNM: Test a failing pipeline [tools/scap] - https://gerrit.wikimedia.org/r/755484 (https://phabricator.wikimedia.org/T290608) (owner: Ahmon Dancy)
[20:54:57] cool, ty zabe :)
[20:55:01] (PS1) MSantos: add parsoid as dependency for Kartographer [integration/config] - https://gerrit.wikimedia.org/r/755487
[20:55:33] (PS2) MSantos: add parsoid as dependency for Kartographer [integration/config] - https://gerrit.wikimedia.org/r/755487
[21:14:55] (PS3) Ahmon Dancy: Enforce pipefail on all run step commands [integration/pipelinelib] - https://gerrit.wikimedia.org/r/702778 (https://phabricator.wikimedia.org/T290608) (owner: Dduvall)
[21:20:28] (CR) Ahmon Dancy: [C: +2] Enforce pipefail on all run step commands [integration/pipelinelib] - https://gerrit.wikimedia.org/r/702778 (https://phabricator.wikimedia.org/T290608) (owner: Dduvall)
[21:21:31] (Merged) jenkins-bot: Enforce pipefail on all run step commands [integration/pipelinelib] - https://gerrit.wikimedia.org/r/702778 (https://phabricator.wikimedia.org/T290608) (owner: Dduvall)
[21:21:56] (CR) Ahmon Dancy: "recheck" [tools/scap] - https://gerrit.wikimedia.org/r/755484 (https://phabricator.wikimedia.org/T290608) (owner: Ahmon Dancy)
[21:53:41] Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing), Release Pipeline, Patch-For-Review: Pipeline lib still leaks containers on contint1001 / contint2001 - https://phabricator.wikimedia.org/T290608 (dancy) Discussed with @dduvall today. The problem is that if a pipeline step...
[22:08:09] (PS1) Ahmon Dancy: PipelineRunner.run(): Forcibly remove container on exception [integration/pipelinelib] - https://gerrit.wikimedia.org/r/755496 (https://phabricator.wikimedia.org/T290608)
[22:15:21] (CR) Dduvall: [C: +2] PipelineRunner.run(): Forcibly remove container on exception [integration/pipelinelib] - https://gerrit.wikimedia.org/r/755496 (https://phabricator.wikimedia.org/T290608) (owner: Ahmon Dancy)
[22:16:00] (Merged) jenkins-bot: PipelineRunner.run(): Forcibly remove container on exception [integration/pipelinelib] - https://gerrit.wikimedia.org/r/755496 (https://phabricator.wikimedia.org/T290608) (owner: Ahmon Dancy)
[22:16:39] (CR) Ahmon Dancy: "recheck" [tools/scap] - https://gerrit.wikimedia.org/r/755484 (https://phabricator.wikimedia.org/T290608) (owner: Ahmon Dancy)
[22:17:15] Project-Admins: Create project tag for Data-Engineering - https://phabricator.wikimedia.org/T287531 (Aklapper) Followup is T298671
[22:18:57] (Abandoned) Ahmon Dancy: DNM: Test a failing pipeline [tools/scap] - https://gerrit.wikimedia.org/r/755484 (https://phabricator.wikimedia.org/T290608) (owner: Ahmon Dancy)
[22:20:41] Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing), Release Pipeline, Patch-For-Review: Pipeline lib still leaks containers on contint1001 / contint2001 - https://phabricator.wikimedia.org/T290608 (dancy) Open→Resolved a:dancy There are more improvements to the cl...
[23:29:11] (PS1) Bartosz Dziewoński: Zuul: [DiscussionTools] Add Gadgets as dependency for Phan jobs [integration/config] - https://gerrit.wikimedia.org/r/755502
[23:29:43] (CR) Bartosz Dziewoński: "Needed for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DiscussionTools/+/755494" [integration/config] - https://gerrit.wikimedia.org/r/755502 (owner: Bartosz Dziewoński)
[23:32:01] (CR) jerkins-bot: [V: -1] Zuul: [DiscussionTools] Add Gadgets as dependency for Phan jobs [integration/config] - https://gerrit.wikimedia.org/r/755502 (owner: Bartosz Dziewoński)