[00:40:40] (Queue (Jenkins jobs + Zuul functions) alert) firing: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org
[00:45:02] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [400.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[01:02:07] zuul / jenkins seem not extremely happy. There is a huge "waiting" backlog in zuul but almost nothing happening in jenkins.
[01:04:26] bd808: I'd be willing to try the gracefuul zuul reload https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Restart
[01:06:16] no wait, it's still doing stuff
[01:06:29] looking at tail -f -n100 /var/log/zuul/zuul.log like it says on wikitech
[01:06:35] I see it is merging things
[01:06:54] so I guess I take that back and just let it do its thing
[01:07:04] yeah zuul seems ok really. there are just a lot of 'stuck' coverage and code health jobs
[01:07:33] fundraising-civicrm-docker
[01:09:02] cscott pushed a ton of things at nearly the same time -- https://gerrit.wikimedia.org/r/q/topic:%22po-getpageproperty%22+(status:open%20OR%20status:merged) -- might be part of it
[01:10:39] (Queue (Jenkins jobs + Zuul functions) alert) firing: (2) Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org
[01:11:23] ah, I bet it is then
[01:20:41] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [200.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[01:25:40] (Queue (Jenkins jobs + Zuul functions) alert) resolved: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org
[08:12:47] Continuous-Integration-Infrastructure, Release-Engineering-Team (Yak Shaving 🐃🪒), serviceops: contint1001 and contint2001 need a newer version of Docker installed - https://phabricator.wikimedia.org/T300682 (MoritzMuehlenhoff) >>! In T300682#7715571, @dduvall wrote: > @Muehlenhoff for some reason, it...
[08:23:22] (PS1) Legoktm: Add Echo as a dependency of OATHAuth for phan [integration/config] - https://gerrit.wikimedia.org/r/763458
[08:23:49] (CR) Legoktm: "I messed up in f9260fcb61fedae7, forgot that phan has a separate list." [integration/config] - https://gerrit.wikimedia.org/r/763458 (owner: Legoktm)
[08:29:59] Release-Engineering-Team (Doing), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T300198 (TTO)
[08:37:37] Continuous-Integration-Infrastructure, Release-Engineering-Team (Yak Shaving 🐃🪒), serviceops: contint1001 and contint2001 need a newer version of Docker installed - https://phabricator.wikimedia.org/T300682 (MoritzMuehlenhoff) These have now been imported: ` jmm@gin:~$ curl -s https://apt.wikimedia...
[09:25:38] Release-Engineering-Team, Scap: Refactor scap sync-canary - https://phabricator.wikimedia.org/T301717 (jnuche) a:jnuche
[09:51:05] Release-Engineering-Team (Doing), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T300198 (hashar) I went to push wmf.22 on all wikis since I have missed this task has been added as a blocker. Then as @TTO mentioned on T301936#7...
[10:07:58] Project-Admins, User-Urbanecm: Create tag for server-side upload requests - https://phabricator.wikimedia.org/T295231 (Aklapper) >>! In T295231#7514648, @Legoktm wrote: > The situation today is that chunked uploading is much more mature and probably the more reliable way to upload stuff (when it works th...
[11:07:14] !log Disabled deployment-deploy03 Jenkins agent in order to revert some mediawiki/core patch and test the outcome
[11:07:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:12:58] !log Bringing deployment-deploy03 back
[11:13:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:41:06] Release-Engineering-Team: Update [[mw:Release_Manager_(Wikimedia)]] - https://phabricator.wikimedia.org/T301971 (Aklapper)
[12:48:52] GitLab (Infrastructure), serviceops, Patch-For-Review: Migrate gitlab-test instance to puppet - https://phabricator.wikimedia.org/T297411 (Jelto)
[12:54:16] GitLab (Infrastructure), serviceops, Patch-For-Review: Migrate gitlab-test instance to puppet - https://phabricator.wikimedia.org/T297411 (Jelto) p:Medium→Low I cleaned up the old `gitlab-ansible-test` instance together with floating IP and disk. I also added some docs in https://wikitech.wik...
[12:54:40] GitLab (Infrastructure), serviceops, Patch-For-Review: Migrate gitlab-test instance to puppet - https://phabricator.wikimedia.org/T297411 (Jelto)
[13:16:31] (CR) Reedy: [C: +2] Add Echo as a dependency of OATHAuth for phan [integration/config] - https://gerrit.wikimedia.org/r/763458 (owner: Legoktm)
[13:18:58] (Merged) jenkins-bot: Add Echo as a dependency of OATHAuth for phan [integration/config] - https://gerrit.wikimedia.org/r/763458 (owner: Legoktm)
[13:20:13] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/763458
[13:20:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:33:00] (CR) Hashar: [C: +2] "I have deployed the jobs:" [integration/config] - https://gerrit.wikimedia.org/r/763207 (https://phabricator.wikimedia.org/T301453) (owner: Btullis)
[13:34:47] (Merged) jenkins-bot: Define pipelines for datahub [integration/config] - https://gerrit.wikimedia.org/r/763207 (https://phabricator.wikimedia.org/T301453) (owner: Btullis)
[13:35:04] !log Reloading Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/763207
[13:35:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:36:42] (CR) Btullis: Define pipelines for datahub (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/763207 (https://phabricator.wikimedia.org/T301453) (owner: Btullis)
[13:37:17] (CR) Hashar: [C: +2] "Jobs updated" [integration/config] - https://gerrit.wikimedia.org/r/762482 (https://phabricator.wikimedia.org/T284774) (owner: Hashar)
[13:39:06] (Merged) jenkins-bot: qemu-run: use one line per qemu-system-x86_64 option [integration/config] - https://gerrit.wikimedia.org/r/762482 (https://phabricator.wikimedia.org/T284774) (owner: Hashar)
[13:40:24] (CR) Hashar: [C: +2] "I have deployed the jobs, triggered a build of fresh-test which ran on the old agent-qemu-1001 and I have confirmed /srv/vm-images/qemu-de" [integration/config] - https://gerrit.wikimedia.org/r/762483 (https://phabricator.wikimedia.org/T284774) (owner: Hashar)
[13:40:33] (CR) jerkins-bot: [V: -1] qemu-run: avoid copying image and faster disk IO [integration/config] - https://gerrit.wikimedia.org/r/762483 (https://phabricator.wikimedia.org/T284774) (owner: Hashar)
[13:40:52] (PS2) Hashar: qemu-run: avoid copying image and faster disk IO [integration/config] - https://gerrit.wikimedia.org/r/762483 (https://phabricator.wikimedia.org/T284774)
[13:40:58] hashar: Apologies for jumping the gun with T301453 - Should I merge https://gerrit.wikimedia.org/r/c/analytics/datahub/+/762950 so that it will try again?
[13:40:58] T301453: Create DataHub containers with deployment pipeline - https://phabricator.wikimedia.org/T301453
[13:41:02] (CR) Hashar: [C: +2] qemu-run: avoid copying image and faster disk IO [integration/config] - https://gerrit.wikimedia.org/r/762483 (https://phabricator.wikimedia.org/T284774) (owner: Hashar)
[13:41:10] (PS2) Hashar: qemu-run: allocate more CPU to the VM [integration/config] - https://gerrit.wikimedia.org/r/762484 (https://phabricator.wikimedia.org/T284774)
[13:42:53] (Merged) jenkins-bot: qemu-run: avoid copying image and faster disk IO [integration/config] - https://gerrit.wikimedia.org/r/762483 (https://phabricator.wikimedia.org/T284774) (owner: Hashar)
[13:51:04] (PS3) Hashar: qemu-run: allocate more CPU to the VM [integration/config] - https://gerrit.wikimedia.org/r/762484 (https://phabricator.wikimedia.org/T284774)
[13:52:53] (CR) Hashar: [C: +2] "I have removed `-cpu max` since it is not supported on Qemu 2.8 on agent-qemu-1001. The fresh-test job is not necessarily faster but I be" [integration/config] - https://gerrit.wikimedia.org/r/762484 (https://phabricator.wikimedia.org/T284774) (owner: Hashar)
[13:53:14] (PS7) Hashar: jjb: adjust qemu-run.bash to use a qcow2 image [integration/config] - https://gerrit.wikimedia.org/r/759499 (https://phabricator.wikimedia.org/T284774)
[13:54:38] (Merged) jenkins-bot: qemu-run: allocate more CPU to the VM [integration/config] - https://gerrit.wikimedia.org/r/762484 (https://phabricator.wikimedia.org/T284774) (owner: Hashar)
[14:10:54] (PS8) Hashar: jjb: adjust qemu-run.bash to use a qcow2 image [integration/config] - https://gerrit.wikimedia.org/r/759499 (https://phabricator.wikimedia.org/T284774)
[14:13:00] (PS2) Jaime Nuche: Remove obsolete "scap sync" command [tools/scap] - https://gerrit.wikimedia.org/r/760611 (owner: Ahmon Dancy)
[14:15:38] (CR) jerkins-bot: [V: -1] Remove obsolete "scap sync" command [tools/scap] - https://gerrit.wikimedia.org/r/760611 (owner: Ahmon Dancy)
[14:24:15] (PS3) Jaime Nuche: Remove obsolete "scap sync" command [tools/scap] - https://gerrit.wikimedia.org/r/760611 (owner: Ahmon Dancy)
[14:25:16] (CR) jerkins-bot: [V: -1] Remove obsolete "scap sync" command [tools/scap] - https://gerrit.wikimedia.org/r/760611 (owner: Ahmon Dancy)
[14:26:34] (PS4) Jaime Nuche: Remove obsolete "scap sync" command [tools/scap] - https://gerrit.wikimedia.org/r/760611 (owner: Ahmon Dancy)
[14:27:24] (CR) jerkins-bot: [V: -1] Remove obsolete "scap sync" command [tools/scap] - https://gerrit.wikimedia.org/r/760611 (owner: Ahmon Dancy)
[14:29:13] Beta-Cluster-Infrastructure: Failed to fetch global:echo:seen:message:time:156760 : (0) (curl error: 7) Couldn't connect to server - https://phabricator.wikimedia.org/T301988 (dom_walden)
[14:32:27] (PS5) Jaime Nuche: Remove obsolete "scap sync" command [tools/scap] - https://gerrit.wikimedia.org/r/760611 (owner: Ahmon Dancy)
[14:35:44] Beta-Cluster-Infrastructure: Failed to fetch global:echo:seen:message:time:156760 : (0) (curl error: 7) Couldn't connect to server - https://phabricator.wikimedia.org/T301988 (Majavah)
[14:35:46] Beta-Cluster-Infrastructure: deployment-echostore01 periodically going offline - https://phabricator.wikimedia.org/T296013 (Majavah)
[14:36:24] Beta-Cluster-Infrastructure: Setup prometheus-mysqld-exporter on beta cluster - https://phabricator.wikimedia.org/T301989 (dom_walden)
[14:36:32] (PS6) Jaime Nuche: Remove obsolete "scap sync" command [tools/scap] - https://gerrit.wikimedia.org/r/760611 (owner: Ahmon Dancy)
[14:47:25] Release-Engineering-Team (Doing), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T300198 (hashar)
[14:57:34] (CR) Hashar: [C: +2] "I have pooled agent-qemu-1003 and unpooled agent-qemu-1001." [integration/config] - https://gerrit.wikimedia.org/r/759499 (https://phabricator.wikimedia.org/T284774) (owner: Hashar)
[14:59:17] (Merged) jenkins-bot: jjb: adjust qemu-run.bash to use a qcow2 image [integration/config] - https://gerrit.wikimedia.org/r/759499 (https://phabricator.wikimedia.org/T284774) (owner: Hashar)
[15:16:49] Krinkle: fresh-test worked with node12 https://gerrit.wikimedia.org/r/c/fresh/+/763525 :]
[15:30:08] hashar: nice!
[15:32:25] I still have to figure out why docker pull is so slow
[15:32:31] looks like disk writes are capped somehow
[15:36:22] Hey folks.
[15:36:24] Mornin
[15:36:26] (US)
[15:41:21] Beta-Cluster-Infrastructure, Traffic, HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on February 16, 2022. - https://phabricator.wikimedia.org/T301995 (AlexisJazz)
[15:41:53] Beta-Cluster-Infrastructure, Traffic, HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on February 16, 2022. - https://phabricator.wikimedia.org/T301995 (AlexisJazz)
[15:41:54] Beta-Cluster-Infrastructure, Quality-and-Test-Engineering-Team (QTE), SRE, Traffic, and 2 others: [epic] The SSL certificate for Beta cluster domains fails to properly renew & deploy - https://phabricator.wikimedia.org/T293585 (AlexisJazz)
[15:46:05] (CR) Ahmon Dancy: Remove obsolete "scap sync" command (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/760611 (owner: Ahmon Dancy)
[15:48:07] (CR) Ahmon Dancy: [C: -1] "Tested. Looks great. Just needs the Bug: in the commit message. Feel free to self-+2 after correcting that." [tools/scap] - https://gerrit.wikimedia.org/r/760611 (owner: Ahmon Dancy)
[15:58:19] !log root@deployment-cache-upload06:~# touch /srv/trafficserver/tls/etc/ssl_multicert.config && systemctl reload trafficserver-tls.service # T301995
[15:58:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:58:22] T301995: The certificate for upload.wikimedia.beta.wmflabs.org expired on February 16, 2022. - https://phabricator.wikimedia.org/T301995
[15:58:39] (PS7) Jaime Nuche: Remove obsolete "scap sync" command [tools/scap] - https://gerrit.wikimedia.org/r/760611 (https://phabricator.wikimedia.org/T301716) (owner: Ahmon Dancy)
[16:00:03] (CR) Ahmon Dancy: [C: +1] Remove obsolete "scap sync" command [tools/scap] - https://gerrit.wikimedia.org/r/760611 (https://phabricator.wikimedia.org/T301716) (owner: Ahmon Dancy)
[16:00:43] (CR) Jaime Nuche: [C: +2] Remove obsolete "scap sync" command [tools/scap] - https://gerrit.wikimedia.org/r/760611 (https://phabricator.wikimedia.org/T301716) (owner: Ahmon Dancy)
[16:01:19] (CR) Jaime Nuche: [C: +2] Remove obsolete "scap sync" command (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/760611 (https://phabricator.wikimedia.org/T301716) (owner: Ahmon Dancy)
[16:01:29] (CR) Jaime Nuche: [V: +2 C: +2] Remove obsolete "scap sync" command [tools/scap] - https://gerrit.wikimedia.org/r/760611 (https://phabricator.wikimedia.org/T301716) (owner: Ahmon Dancy)
[16:02:15] Release-Engineering-Team (Done by Feb 23🔥), Scap, Patch-For-Review: Delete scap sync command - https://phabricator.wikimedia.org/T301716 (jnuche) Open→Resolved
[16:02:17] Release-Engineering-Team (Done by Feb 23🔥), Scap, Documentation: scap help needs updating - https://phabricator.wikimedia.org/T301343 (jnuche)
[16:05:07] Beta-Cluster-Infrastructure, SRE, Traffic, HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on February 16, 2022. - https://phabricator.wikimedia.org/T301995 (Zabe) Since there seems to be a valid, I did the same mitigation as in T271808 and T293070. ` root@deployment-cache-up...
[16:06:32] Beta-Cluster-Infrastructure, SRE, Traffic, HTTPS: The certificate for upload.wikimedia.beta.wmflabs.org expired on February 16, 2022. - https://phabricator.wikimedia.org/T301995 (AlexisJazz) >>! In T301995#7718593, @Zabe wrote: > Since there seems to be a valid certificate, I did the same mitigat...
[16:52:53] (PS1) Hashar: jjb: update Quibble jobs from 1.3.0 to 1.4.0 [integration/config] - https://gerrit.wikimedia.org/r/763559 (https://phabricator.wikimedia.org/T300340)
[16:54:15] (CR) Hashar: "I am guessing I will deploy them on Friday morning." [integration/config] - https://gerrit.wikimedia.org/r/763559 (https://phabricator.wikimedia.org/T300340) (owner: Hashar)
[17:05:56] (CR) Dduvall: [C: +2] scap prep auto: Add staging fingerprint support [tools/scap] - https://gerrit.wikimedia.org/r/761451 (https://phabricator.wikimedia.org/T301417) (owner: Ahmon Dancy)
[17:08:18] (Merged) jenkins-bot: scap prep auto: Add staging fingerprint support [tools/scap] - https://gerrit.wikimedia.org/r/761451 (https://phabricator.wikimedia.org/T301417) (owner: Ahmon Dancy)
[17:09:43] (PS1) Ahmon Dancy: train-dev: Remove unused download_if_missing function [tools/train-dev] - https://gerrit.wikimedia.org/r/763564
[17:14:50] Release-Engineering-Team (Done by Feb 23🔥), Scap: Add rollback mechanism to `scap prep auto` - https://phabricator.wikimedia.org/T301417 (dduvall) p:Triage→Medium a:dduvall
[17:18:12] Release-Engineering-Team (Doing), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T300198 (hashar) Open→Resolved Based on the log triage this is a success!
[17:18:23] train is a success. I am off for the evening ;)
[17:18:35] Have a good nice hashar!
[17:18:43] thank you :-]
[17:19:18] (CR) Ahmon Dancy: [C: +2] train-dev: Remove unused download_if_missing function [tools/train-dev] - https://gerrit.wikimedia.org/r/763564 (owner: Ahmon Dancy)
[17:21:07] (Merged) jenkins-bot: train-dev: Remove unused download_if_missing function [tools/train-dev] - https://gerrit.wikimedia.org/r/763564 (owner: Ahmon Dancy)
[17:29:23] (PS1) Ahmon Dancy: avoid spamming the console when installing the helm diff plugin [tools/train-dev] - https://gerrit.wikimedia.org/r/763569
[17:30:13] (PS1) Ahmon Dancy: use rsync_helper in a couple more places [tools/train-dev] - https://gerrit.wikimedia.org/r/763570
[17:30:22] (PS1) Ahmon Dancy: Enable minikube registry addon [tools/train-dev] - https://gerrit.wikimedia.org/r/763571
[17:30:44] (CR) jerkins-bot: [V: -1] avoid spamming the console when installing the helm diff plugin [tools/train-dev] - https://gerrit.wikimedia.org/r/763569 (owner: Ahmon Dancy)
[17:32:53] (PS2) Ahmon Dancy: avoid spamming the console when installing the helm diff plugin [tools/train-dev] - https://gerrit.wikimedia.org/r/763569
[17:33:10] (CR) jerkins-bot: [V: -1] Enable minikube registry addon [tools/train-dev] - https://gerrit.wikimedia.org/r/763571 (owner: Ahmon Dancy)
[17:33:23] (CR) Ahmon Dancy: [C: +2] use rsync_helper in a couple more places [tools/train-dev] - https://gerrit.wikimedia.org/r/763570 (owner: Ahmon Dancy)
[17:34:16] (Merged) jenkins-bot: use rsync_helper in a couple more places [tools/train-dev] - https://gerrit.wikimedia.org/r/763570 (owner: Ahmon Dancy)
[17:38:31] (CR) Ahmon Dancy: [C: +2] avoid spamming the console when installing the helm diff plugin [tools/train-dev] - https://gerrit.wikimedia.org/r/763569 (owner: Ahmon Dancy)
[17:38:47] (PS2) Ahmon Dancy: Enable minikube registry addon [tools/train-dev] - https://gerrit.wikimedia.org/r/763571
[17:39:40] Continuous-Integration-Infrastructure, Jenkins, MediaWiki-Core-Tests, MediaWiki-ResourceLoader, and 2 others: Add tests to check that all modules using require have required module files listed in `packageFiles` - https://phabricator.wikimedia.org/T301924 (Jdlrobson) Don't think this is one is Ve...
[17:39:59] (Merged) jenkins-bot: avoid spamming the console when installing the helm diff plugin [tools/train-dev] - https://gerrit.wikimedia.org/r/763569 (owner: Ahmon Dancy)
[17:45:30] (CR) Ahmon Dancy: [C: +2] Enable minikube registry addon [tools/train-dev] - https://gerrit.wikimedia.org/r/763571 (owner: Ahmon Dancy)
[17:46:21] (Merged) jenkins-bot: Enable minikube registry addon [tools/train-dev] - https://gerrit.wikimedia.org/r/763571 (owner: Ahmon Dancy)
[17:51:22] GitLab (Administration, Settings & Policy), Release-Engineering-Team, Upstream, User-brennen: GitLab group permissions are not inherited by sub-groups for groups of users invited to the parent repo - https://phabricator.wikimedia.org/T300939 (brennen)
[17:59:22] GitLab (Integrations), Release-Engineering-Team (Seen): Gerritlab - https://phabricator.wikimedia.org/T300819 (brennen) p:Triage→Low Having skimmed the README, this doesn't look like it would specifically require any action on the GitLab administration side for contributors to use. I'm not sure...
[18:01:40] (Queue (Jenkins jobs + Zuul functions) alert) firing: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org
[18:01:51] GitLab (Misc), Release-Engineering-Team (Seen): Gerritlab - https://phabricator.wikimedia.org/T300819 (brennen)
[18:08:23] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [400.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[18:16:40] (Queue (Jenkins jobs + Zuul functions) alert) firing: (2) Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org
[18:36:40] (Queue (Jenkins jobs + Zuul functions) alert) resolved: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org
[18:52:39] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [200.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[19:13:22] Release-Engineering-Team (Radar), SRE Observability: Alert RelEng when mw-client-error editing dashboard shows errors at a rate of over 1000 errors in a 12 hr period - https://phabricator.wikimedia.org/T293694 (brennen) Capturing a couple of points from this morning's discussion: - Drastically increas...
[19:17:55] Release-Engineering-Team (Radar), SRE Observability: Alert RelEng when mw-client-error editing dashboard shows errors at a rate of over 1000 errors in a 12 hr period - https://phabricator.wikimedia.org/T293694 (brennen) (Noting that stack trace thoughts above are more geared towards PHP errors than the c...
[19:19:09] Release-Engineering-Team (Radar), SRE Observability: Alert RelEng when mw-client-error editing dashboard shows errors at a rate of over 1000 errors in a 12 hr period - https://phabricator.wikimedia.org/T293694 (colewhite) Had a meeting about this today. Key takeaways for me: # The workflow improvemen...
[19:19:27] Phabricator: Change Phabricator username - https://phabricator.wikimedia.org/T302022 (mdipietro)
[19:35:52] Continuous-Integration-Config, MediaWiki-extensions-Page_Forms: Add phan to PageForms - https://phabricator.wikimedia.org/T228155 (Daimona) Open→Resolved
[20:03:17] GitLab (CI & Job Runners), Release-Engineering-Team (Next), User-brennen: Provision untrusted GitLab job runners to handle user-level projects and merge requests from forks - https://phabricator.wikimedia.org/T297426 (brennen) @thcipriani we discussed creating this task yesterday; I'd forgotten it al...
[20:03:39] GitLab (CI & Job Runners), Release-Engineering-Team (Next), User-brennen: Provision untrusted instance-wide GitLab job runners to handle user-level projects and merge requests from forks - https://phabricator.wikimedia.org/T297426 (brennen)
[20:06:09] (PS1) Hashar: jjb: compress quibble full run debug log [integration/config] - https://gerrit.wikimedia.org/r/763592
[20:23:26] PROBLEM - Check systemd state on doc1001 is CRITICAL: CRITICAL - degraded: The following units failed: rsync-doc-doc1002.eqiad.wmnet.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:41:56] Hi releng, what's the process for requesting a new user to be part of a gerrit group? In this case we'd like to add Anthony Quhen (aqu) to analytics (https://gerrit.wikimedia.org/r/admin/groups/d34747bee94be39cff54b5fda1ae36b575107792,members)
[20:45:25] razzi: https://phabricator.wikimedia.org/project/view/3957/
[20:46:31] Thanks p858snake !
[21:19:50] Phabricator: Change Phabricator username - https://phabricator.wikimedia.org/T302022 (Aklapper) Open→Resolved a:Aklapper Hej hej! :) Done. You may also change further information on https://phabricator.wikimedia.org/people/editprofile/29609/ if wanted.
[21:20:05] RECOVERY - Check systemd state on doc1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:28:26] Continuous-Integration-Config: PHP notices should fail CI tests - https://phabricator.wikimedia.org/T302033 (Huji)
[21:34:32] can haz acl*repository-admins on phabricator? :)
[21:35:14] I want to test pushing over https
[21:35:20] and make a new repo for that
[21:47:11] mutante: yeah, one second
[21:47:47] brennen: :)) thanks!
[21:48:18] !log added Dzahn (mutante) to acl*repository-admins on phabricator
[21:48:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:49:42] brennen: confirmed it works:) ty!
[21:50:06] sure thing. :)
[22:28:19] (PS1) Ahmon Dancy: WIP: build/push/deploy container images [tools/scap] - https://gerrit.wikimedia.org/r/763609
[22:29:54] (CR) jerkins-bot: [V: -1] WIP: build/push/deploy container images [tools/scap] - https://gerrit.wikimedia.org/r/763609 (owner: Ahmon Dancy)
[22:30:46] Continuous-Integration-Config: PHP notices should fail CI tests - https://phabricator.wikimedia.org/T302033 (Zabe) This is suppossed to be the case, since `convertNoticesToExceptions="true"` is being set in `phpunit.xml.dist` and `tests / phpunit / suite.xml`. But apparently it isn't working.
[22:57:26] Release-Engineering-Team (Radar), SRE Observability: Alert RelEng when mw-client-error editing dashboard shows errors at a rate of over 1000 errors in a 12 hr period - https://phabricator.wikimedia.org/T293694 (colewhite)