[05:44:24] (PS1) Legoktm: dockerfiles: Upgrade Rust images to 1.56.1 [integration/config] - https://gerrit.wikimedia.org/r/736326
[05:44:26] (PS1) Legoktm: jjb: Use Rust 1.56.1 images [integration/config] - https://gerrit.wikimedia.org/r/736327
[09:06:39] (CR) Hashar: [C: +2] "Neat" [integration/config] - https://gerrit.wikimedia.org/r/736326 (owner: Legoktm)
[09:08:24] (Merged) jenkins-bot: dockerfiles: Upgrade Rust images to 1.56.1 [integration/config] - https://gerrit.wikimedia.org/r/736326 (owner: Legoktm)
[09:18:54] Project-Admins: Create project tag for Airflow - https://phabricator.wikimedia.org/T294781 (Aklapper) @mforns: Requested public project #Airflow has been created: #Airflow. I assume this will supersede the "Airflow" column on https://phabricator.wikimedia.org/tag/analytics/ ? Should tickets in that column b...
[09:30:39] (CR) Hashar: "Successfully published image docker-registry.discovery.wmnet/releng/rust:1.56.1-1" [integration/config] - https://gerrit.wikimedia.org/r/736326 (owner: Legoktm)
[09:31:34] (CR) Hashar: [C: +2] "INFO:jenkins_jobs.builder:Reconfiguring jenkins job rust-coverage-publish" [integration/config] - https://gerrit.wikimedia.org/r/736327 (owner: Legoktm)
[09:32:59] Release-Engineering-Team (Seen), Scap, User-brennen: Investigate scap cluster_ssh idling until pressing ENTER repeatedly - https://phabricator.wikimedia.org/T223287 (hashar) Open→Declined After talking about it again with others, there is no proof pressing {key RETURN} actually makes thing an...
[09:33:17] (Merged) jenkins-bot: jjb: Use Rust 1.56.1 images [integration/config] - https://gerrit.wikimedia.org/r/736327 (owner: Legoktm)
[10:24:01] (CR) Hashar: "I have pushed this change since there is no CI for the patch-queue branch." [integration/zuul] (patch-queue/debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/731900 (https://phabricator.wikimedia.org/T289512) (owner: Hashar)
[11:16:49] (PS1) Hashar: Normalize files indentation [tools/train-dev] - https://gerrit.wikimedia.org/r/736442
[11:16:51] (PS1) Hashar: Trim trailing whitepsace [tools/train-dev] - https://gerrit.wikimedia.org/r/736443
[12:14:55] Release-Engineering-Team (Next), Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (daniel) ##### Risky Patch! 🚂🔥 * **Change**: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/699067/ * **Summary**: ** Rewrite of how we do in-process caching...
[13:37:17] Release-Engineering-Team (Seen), Scap, User-brennen: Investigate scap cluster_ssh idling until pressing ENTER repeatedly - https://phabricator.wikimedia.org/T223287 (Reedy) I'm not sure I ever said it made the process any faster... But it did at least make the output continue...
[14:05:27] (CR) Nikki Nikkhoui: CI configuration (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/736012 (https://phabricator.wikimedia.org/T288134) (owner: WQuarshie)
[14:40:36] (PS3) Jforrester: jjb, Zuul: [mediawiki/services/example-node-api] Add rehearse, publish steps [integration/config] - https://gerrit.wikimedia.org/r/736012 (https://phabricator.wikimedia.org/T288134) (owner: WQuarshie)
[14:40:39] (CR) Jforrester: jjb, Zuul: [mediawiki/services/example-node-api] Add rehearse, publish steps (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/736012 (https://phabricator.wikimedia.org/T288134) (owner: WQuarshie)
[14:41:54] (CR) Nikki Nikkhoui: jjb, Zuul: [mediawiki/services/example-node-api] Add rehearse, publish steps (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/736012 (https://phabricator.wikimedia.org/T288134) (owner: WQuarshie)
[14:43:36] (PS4) Jforrester: jjb, Zuul: [mediawiki/services/example-node-api] Add rehearse, publish steps [integration/config] - https://gerrit.wikimedia.org/r/736012 (https://phabricator.wikimedia.org/T288134) (owner: WQuarshie)
[14:43:49] (CR) Jforrester: [C: +2] "jjb definitions deployed." [integration/config] - https://gerrit.wikimedia.org/r/736012 (https://phabricator.wikimedia.org/T288134) (owner: WQuarshie)
[14:46:00] (Merged) jenkins-bot: jjb, Zuul: [mediawiki/services/example-node-api] Add rehearse, publish steps [integration/config] - https://gerrit.wikimedia.org/r/736012 (https://phabricator.wikimedia.org/T288134) (owner: WQuarshie)
[14:48:33] !log Zuul: [mediawiki/services/example-node-api] Add rehearse, publish steps T288134
[14:48:35] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:48:35] T288134: Deploy prototype API - https://phabricator.wikimedia.org/T288134
[15:01:20] (CR) Jforrester: "Very nice." [tools/release] - https://gerrit.wikimedia.org/r/736072 (owner: Thcipriani)
[15:12:10] Continuous-Integration-Infrastructure: Migrate quibble images from node10 to something modern - https://phabricator.wikimedia.org/T294931 (Jdforrester-WMF)
[15:13:25] (CR) Ahmon Dancy: "Do you have plans to move this into scap?" [tools/release] - https://gerrit.wikimedia.org/r/736072 (owner: Thcipriani)
[15:23:57] Release-Engineering-Team, Scap: RESTBase deployment fails with scap internal error - https://phabricator.wikimedia.org/T294936 (Pchelolo)
[15:32:46] Continuous-Integration-Config, Release-Engineering-Team (Next), MediaWiki-Core-Tests, Code-Health, and 6 others: Reduce runtime of MW shared gate Jenkins jobs to 5 min - https://phabricator.wikimedia.org/T225730 (Jdlrobson) If we wanted to reduce the time of this job personally I'd suggest revisi...
[15:36:37] (CR) Hashar: Output one line command to reproduce a run (3 comments) [integration/quibble] - https://gerrit.wikimedia.org/r/735659 (https://phabricator.wikimedia.org/T201503) (owner: Hashar)
[15:47:13] (CR) Thcipriani: stage-train: this should make Tuesdays one command (1 comment) [tools/release] - https://gerrit.wikimedia.org/r/736072 (owner: Thcipriani)
[15:49:58] (CR) Ahmon Dancy: [C: +2] train-dev: automatically create the build directory [tools/train-dev] - https://gerrit.wikimedia.org/r/735610 (owner: Hashar)
[15:50:36] (Merged) jenkins-bot: train-dev: automatically create the build directory [tools/train-dev] - https://gerrit.wikimedia.org/r/735610 (owner: Hashar)
[15:57:46] (CR) Ahmon Dancy: [C: -1] "typo, otherwise LGTM" [tools/train-dev] - https://gerrit.wikimedia.org/r/735611 (owner: Hashar)
[15:58:56] (CR) Ahmon Dancy: mirror-repos: run up to 6 git remote update in parallel (1 comment) [tools/train-dev] - https://gerrit.wikimedia.org/r/735612 (owner: Hashar)
[16:02:05] (CR) Ahmon Dancy: [C: -1] mirror-repos: run up to 6 git remote update in parallel (2 comments) [tools/train-dev] - https://gerrit.wikimedia.org/r/735612 (owner: Hashar)
[16:07:18] Release-Engineering-Team, Scap: RESTBase deployment fails with scap internal error - https://phabricator.wikimedia.org/T294936 (thcipriani) More info on this ` 15:22:44 [restbase2010.codfw.wmnet] Unhandled error: Traceback (most recent call last): File "/usr/lib/python3/dist-packages/scap/cli.py", lin...
[16:10:36] thcipriani: out of curiosity, which debian version is restbase2010 using?
[16:11:05] stretch-wikimedia only has scap 4.0.0-1, while buster-wikimedia has 4.0.2-1
[16:14:32] Release-Engineering-Team (Radar), Scap, serviceops: RESTBase deployment fails with scap internal error - https://phabricator.wikimedia.org/T294936 (thcipriani) ` [thcipriani@deploy1002 ~]$ SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service -oIdentitiesOnly=yes -oIdentityFile=/etc/keyholder...
[16:14:45] Release-Engineering-Team (Radar), Scap, serviceops: RESTBase deployment fails with scap internal error - https://phabricator.wikimedia.org/T294936 (Majavah) [[ https://debmonitor.wikimedia.org/packages/scap | Debmonitor ]] reveals restbase2010 (and a bunch of other servers) are still using scap 4.0.0...
[16:16:37] majavah: ^ that's super interesting, thanks for that comment.
[16:17:03] looks like restbase2010 is stretch
[16:17:04] majavah: see https://phabricator.wikimedia.org/T294148#7460181
[16:17:23] * legoktm comments directly
[16:18:44] oh boy: packaging issues with stretch hosts?
[16:19:05] Release-Engineering-Team (Radar), Scap, serviceops: RESTBase deployment fails with scap internal error - https://phabricator.wikimedia.org/T294936 (Legoktm) >>! In T294936#7478496, @thcipriani wrote: > Pinging #serviceops for help: could a serviceopsen ensure that scap is at version 4.0.2 everywhere?...
[16:20:36] Release-Engineering-Team (Radar), Scap, serviceops: RESTBase deployment fails with scap internal error - https://phabricator.wikimedia.org/T294936 (dancy) @Legoktm I will prepare a new release.
[16:21:25] (CR) Hashar: Support cloning from local repositories (2 comments) [tools/train-dev] - https://gerrit.wikimedia.org/r/735611 (owner: Hashar)
[16:21:33] ah, interesting, thanks legoktm
[16:21:49] (PS3) Hashar: Support cloning from local repositories [tools/train-dev] - https://gerrit.wikimedia.org/r/735611
[16:22:00] Release-Engineering-Team (Doing), Scap, serviceops: RESTBase deployment fails with scap internal error - https://phabricator.wikimedia.org/T294936 (thcipriani)
[16:23:21] :)
[16:23:27] Release-Engineering-Team (Doing), Scap, serviceops: RESTBase deployment fails with scap internal error - https://phabricator.wikimedia.org/T294936 (Legoktm) >>! In T294936#7478503, @Majavah wrote: > I don't have an explanation for the one host running scap 3.17.1 visible on debmonitor, mw2280 has be...
[16:24:29] (PS3) Hashar: mirror-repos: run up to 6 git remote update in parallel [tools/train-dev] - https://gerrit.wikimedia.org/r/735612
[16:24:38] (CR) Hashar: mirror-repos: run up to 6 git remote update in parallel (3 comments) [tools/train-dev] - https://gerrit.wikimedia.org/r/735612 (owner: Hashar)
[16:25:10] (CR) Hashar: [C: -1] "Will rebase once the other series of changes has been merged." [tools/train-dev] - https://gerrit.wikimedia.org/r/736442 (owner: Hashar)
[16:39:51] PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:56:14] MediaWiki-Releasing, MW-1.37-notes, MW-1.37-release: Release 1.37.0-rc.1 - https://phabricator.wikimedia.org/T294951 (Reedy)
[16:56:23] MediaWiki-Releasing, MW-1.37-notes, MW-1.37-release: Release 1.37.0-rc.1 - https://phabricator.wikimedia.org/T294951 (Reedy)
[16:56:29] MediaWiki-Releasing, MW-1.37-notes, MW-1.37-release: Release 1.37.0-rc.0 - https://phabricator.wikimedia.org/T289591 (Reedy)
[16:56:43] MediaWiki-Releasing, MW-1.37-notes, MW-1.37-release: Write release announcement for 1.37.0-rc.1 - https://phabricator.wikimedia.org/T294952 (Reedy)
[17:02:02] MediaWiki-Releasing, MW-1.37-notes, MW-1.37-release: Release 1.37.0-rc.1 - https://phabricator.wikimedia.org/T294951 (Reedy)
[17:40:23] RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:43:41] Release-Engineering-Team (Priority Backlog 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.7 deployment blockers - https://phabricator.wikimedia.org/T293948 (dduvall) a:thcipriani→dduvall
[18:02:58] dancy: thx for the train-dev reviews :]
[18:03:14] I will look at writing a blog post about editor config and how to set it up in a repo
[18:03:59] (CR) Ahmon Dancy: [C: +2] Support cloning from local repositories [tools/train-dev] - https://gerrit.wikimedia.org/r/735611 (owner: Hashar)
[18:04:24] and that bash JOB CONTROL stuff is really like walking on a mine field :-\
[18:04:28] (Merged) jenkins-bot: Support cloning from local repositories [tools/train-dev] - https://gerrit.wikimedia.org/r/735611 (owner: Hashar)
[18:05:10] (CR) Ahmon Dancy: "I have an outstanding question about the '6' in patchset 2." [tools/train-dev] - https://gerrit.wikimedia.org/r/735612 (owner: Hashar)
[18:05:15] Thanks for the mods!
[18:05:27] I'm looking forward to the mirror-repos.sh parallelism
[18:05:32] I've been wanting to do that for a long time
[18:06:54] then if you have the build dir set on your machine already, there is little incentive to do it
[18:07:34] Project-Admins: Create project tag for Airflow - https://phabricator.wikimedia.org/T294781 (mforns) @Aklapper Thanks a lot for creating the project! > I assume this will supersede the "Airflow" column on https://phabricator.wikimedia.org/tag/analytics/ ? Should tickets in that column be tagged with Airflow?...
[18:26:14] Release-Engineering-Team (Radar), SRE Observability: Alert RelEng when mw-client-error editing dashboard shows errors at a rate of over 1000 errors in a 12 hr period - https://phabricator.wikimedia.org/T293694 (Jdlrobson) > The log transformation pipeline could tag logs with the property of being known,...
[18:42:42] (PS1) Ahmon Dancy: Further updates to release process [tools/scap] - https://gerrit.wikimedia.org/r/736550
[18:44:04] (PS1) Ahmon Dancy: Release 4.0.3-1 [tools/scap] - https://gerrit.wikimedia.org/r/736551
[18:44:32] (CR) Ahmon Dancy: [C: +2] Release 4.0.3-1 [tools/scap] - https://gerrit.wikimedia.org/r/736551 (owner: Ahmon Dancy)
[18:48:34] (Merged) jenkins-bot: Release 4.0.3-1 [tools/scap] - https://gerrit.wikimedia.org/r/736551 (owner: Ahmon Dancy)
[18:54:42] Release-Engineering-Team, serviceops: Deploy Scap version 4.0.3 - https://phabricator.wikimedia.org/T294966 (dancy)
[18:55:41] Release-Engineering-Team (Doing), Scap, serviceops: RESTBase deployment fails with scap internal error - https://phabricator.wikimedia.org/T294936 (dancy)
[18:55:47] Release-Engineering-Team, serviceops: Deploy Scap version 4.0.3 - https://phabricator.wikimedia.org/T294966 (dancy)
[18:57:22] Release-Engineering-Team, serviceops: Deploy Scap version 4.0.3 - https://phabricator.wikimedia.org/T294966 (Legoktm) a:Legoktm
[18:59:59] Release-Engineering-Team, Scap, serviceops: Deploy Scap version 4.0.3 - https://phabricator.wikimedia.org/T294966 (dancy)
[19:25:55] hi releng folks. I'm wondering if someone could take a look at a one line CI config change we've pushed up over in fr-tech to remove the checkout of a repo we no longer use during the build of civicrm. The patch is here https://gerrit.wikimedia.org/r/c/integration/config/+/732064/2 thanks in advance!!!
[19:26:14] Phabricator: Update Herald (H260) to include upcoming CommTech sprint milestones (5573, 5574, 5575, 5576) - https://phabricator.wikimedia.org/T292112 (ldelench_wmf) Hi @Aklapper , can you let me know when this might be triaged? I can also ask Max for help when he returns.
[19:42:10] jgleeson: I can merge and deploy it
[19:43:26] thanks much jeena !!!!
[19:50:33] (CR) Jeena Huneidi: [C: +2] "LGTM" [integration/config] - https://gerrit.wikimedia.org/r/732064 (https://phabricator.wikimedia.org/T277500) (owner: Jgleeson)
[19:52:19] (Merged) jenkins-bot: Remove zuul clone of civicrm buildkit [integration/config] - https://gerrit.wikimedia.org/r/732064 (https://phabricator.wikimedia.org/T277500) (owner: Jgleeson)
[19:52:53] huge thanks jeena :)
[19:55:22] np jgleeson :) lmk if there are any issues
[20:03:13] Phabricator: Update Herald (H260) to include upcoming CommTech sprint milestones (5573, 5574, 5575, 5576) - https://phabricator.wikimedia.org/T292112 (Aklapper) Open→Resolved a:Aklapper Ah, thanks for the ping! Updated in https://phabricator.wikimedia.org/H260#1797
[20:26:59] Beta-Cluster-Infrastructure, Release-Engineering-Team (Radar): beta logstash servers run out of disk space - https://phabricator.wikimedia.org/T288989 (dancy)
[20:34:13] Beta-Cluster-Infrastructure, SRE Observability, Wikimedia-Logstash, observability: logstash-beta.wmflabs.org does not receive any mediawiki events - https://phabricator.wikimedia.org/T233134 (dancy) >>! In T233134#7463528, @colewhite wrote: > As part T288618 work, we've set up a separate cluster...
[20:36:23] Beta-Cluster-Infrastructure, Release-Engineering-Team (Radar): beta logstash servers run out of disk space - https://phabricator.wikimedia.org/T288989 (dancy) Hello. I need help with the issue of deployment-logstash04's root filesystem filling up on a daily basis. I asked a question on {T233134} as we...
[20:37:12] bd808: what would be an easy way to redirect one .wmflabs/wmcloud.org subdomain to another? Preferably without it requiring a vps instance to stay around to serve it :)
[20:37:57] thinking of logstash-beta.wmflabs.org > beta-logs.wmcloud.org specifically, this is likely permanent as new name given logstash->OpenSearch
[20:38:02] Krinkle: I built a thing for that! https://wikitech.wikimedia.org/wiki/Nova_Resource:Redirects
[20:38:28] neat
[20:39:09] related, anything with a *.wmcloud.org name automagically has the same *.wmflabs.org name redirected to it
[20:48:42] bd808: ack, filed a subtask for you. I could try it as well, but not currently a member of that project.
[20:49:14] Cool. I'll set it up today as a check to see how much the documentation has rotted :)
[20:49:21] thx :)
[20:49:40] I also only realized just now that the service also moved vps projects, it's now under vps 'logging' rather than depprep
[20:50:20] fancy! I'm really happy that it has an owning team now
[20:52:12] indeed :D setting up the current elk7 cluster alone wasn't exactly a fun project
[21:11:35] has deployment-prep logging / ELK stack been overhauled entirely?
[21:13:10] hashar: yup. Cole sent an email about it. Look for "[deployment-prep users] Announcing new logstash-beta cluster and upcoming changes." in your mail
[21:20:32] bd808: excellent thank you
[21:20:43] quite happy to see cwhite & all stepping in to set that up
[21:21:11] Beta-Cluster-Infrastructure, SRE Observability, Wikimedia-Logstash, observability: logstash-beta.wmflabs.org does not receive any mediawiki events - https://phabricator.wikimedia.org/T233134 (bd808)
[21:26:20] Beta-Cluster-Infrastructure, SRE Observability, Wikimedia-Logstash, observability: logstash-beta.wmflabs.org does not receive any mediawiki events - https://phabricator.wikimedia.org/T233134 (bd808)
[21:26:43] Krinkle: redirects are in place and working at least from my laptop
[21:27:13] logstash-beta.wmflabs.org -> logstash-beta.wmcloud.org -> beta-logs.wmcloud.org
[21:28:47] Works here too
[21:30:26] Beta-Cluster-Infrastructure, SRE Observability, Wikimedia-Logstash, observability: logstash-beta.wmflabs.org does not receive any mediawiki events - https://phabricator.wikimedia.org/T233134 (bd808) With {T294978} now done, https://logstash-beta.wmflabs.org/ redirects to https://logstash-beta.wmc...
[21:30:33] Phabricator: Add a Herald rule for User-MediaJS - https://phabricator.wikimedia.org/T286077 (MediaJS) Open→Declined I feel like there isn’t much of a use case for me at this moment, and I am still learning the basics of MW, so I withdraw this request. Apologies for any time wasted.
[21:48:32] hi releng, can anyone identify what part of this output is causing the V-1 ? https://integration.wikimedia.org/ci/job/quibble-donationinterface-REL1_35-php73-docker/1054/console
[21:48:51] The npm security tests seem new to me - would they be leading to the -1 vote?
[21:50:40] hmm, no, that's just standard output from newer npm
[21:50:58] comparing with previous test run
[21:52:15] ok, it's doing the same split test run (no-db, then database), and in both cases all the tests pass
[21:52:34] but in the latest run, the 'Recording test results' step seems to fail with
[21:52:40] Continuous-Integration-Config, MediaWiki-extensions-Examples, Patch-For-Review: Add examples extension to the CI gate - https://phabricator.wikimedia.org/T292288 (Inductiveload) @Hashar sounds interesting. Could you outline how one would go about that and I'll give it a go.
[21:52:40] ERROR: Step ‘Publish JUnit test result report’ failed: No test report files were found. Configuration error?
[22:01:33] hmm, still failing on recheck, and getting the same error on unrelated changes in the same repo: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DonationInterface/+/736556
[22:14:59] (PS1) Dduvall: backport: Approve selected/given changes [tools/scap] - https://gerrit.wikimedia.org/r/736589 (https://phabricator.wikimedia.org/T294454)
[22:15:32] (CR) jerkins-bot: [V: -1] backport: Approve selected/given changes [tools/scap] - https://gerrit.wikimedia.org/r/736589 (https://phabricator.wikimedia.org/T294454) (owner: Dduvall)
[22:20:20] (PS2) Dduvall: backport: Approve selected/given changes [tools/scap] - https://gerrit.wikimedia.org/r/736589 (https://phabricator.wikimedia.org/T294454)
[22:22:30] Phabricator, Release-Engineering-Team (Done by Thu 04 Nov🔥): "Project report" Age Distribution query links to individual weeks lack project tag and task status parameters - https://phabricator.wikimedia.org/T291710 (mmodell) Open→Resolved
[22:22:56] Phabricator, Release-Engineering-Team (Done by Thu 04 Nov🔥): "Project report" Age Distribution query links to individual weeks lack project tag and task status parameters - https://phabricator.wikimedia.org/T291710 (mmodell)
[22:32:28] Beta-Cluster-Infrastructure, Wikimedia-Logstash, observability, SRE Observability (FY2021/2022-Q2): Logstash in beta fails periodically - https://phabricator.wikimedia.org/T211984 (colewhite)
[22:32:39] Beta-Cluster-Infrastructure, SRE Observability, Wikimedia-Logstash, observability: logstash-beta.wmflabs.org does not receive any mediawiki events - https://phabricator.wikimedia.org/T233134 (colewhite) Open→Resolved a:colewhite This task is fairly old and the landscape has changed si...
[22:47:53] Release-Engineering-Team (Doing), Scap, serviceops: RESTBase deployment fails with scap internal error - https://phabricator.wikimedia.org/T294936 (Legoktm) @Pchelolo scap is now upgraded on all the restbase hosts.
[22:49:02] dancy: I upgraded all the canaries and restbase nodes, and then I think I could do the full rollout after the backport window in a few minutes?
[22:51:40] Release-Engineering-Team (Doing), Scap, serviceops: RESTBase deployment fails with scap internal error - https://phabricator.wikimedia.org/T294936 (Pchelolo) @Legoktm just tried to deploy again, same result.
[22:52:17] legoktm: ok
[22:52:32] though it seems it didn't fix the problem on restbase hosts?
[22:53:03] Release-Engineering-Team (Doing), Scap, serviceops: RESTBase deployment fails with scap internal error - https://phabricator.wikimedia.org/T294936 (Pchelolo) Actually, different result, now it's UndefinedError instead of AttributeError.. Feel free to try yourself by running scap deploy from /srv/dep...
[22:53:30] Hmm.. I'll give it a try
[22:53:50] I did not upgrade the deployment servers themselves, just the restbase hosts
[22:54:08] gotcha
[22:55:35] I *think* you should be able to see https://debmonitor.wikimedia.org/packages/scap which shows which version is installed on which hosts fyi
[22:56:21] thx. I can get in there.
[23:00:14] legoktm: Can you run the offending scap deploy command? I get a permission denied error from ssh (logged in as deploy-service user).
[23:00:22] sure
[23:01:37] Unhandled error:
[23:01:38] deploy-local failed: {}
[23:01:48] Cripes!
[23:02:01] Any backtrace?
[23:02:22] https://phabricator.wikimedia.org/P17672
[23:02:25] that's the full output
[23:02:32] thx
[23:04:49] nothing interesting in syslog
[23:05:11] Release-Engineering-Team (Doing), Scap, serviceops: RESTBase deployment fails with scap internal error - https://phabricator.wikimedia.org/T294936 (dancy) {P17672}
[23:05:40] if I add more -vvv to the command would that help?
[23:05:44] I can also run it directly on the host
[23:07:36] is the repo supposed to be dirty? https://phabricator.wikimedia.org/P17673
[23:08:34] all the other hosts are like that, so I guess it is
[23:13:08] dancy: do you think this is something that could be figured out quickly? or should I roll it back?
[23:13:37] I don't think I'll be able to figure it out today so rolling back is reasonable.
[23:13:49] ok
[23:17:15] Release-Engineering-Team (Doing), Scap, serviceops: RESTBase deployment fails with scap internal error - https://phabricator.wikimedia.org/T294936 (dancy) I think the UndefinedError is a jinja2 template error.
[23:23:35] legoktm: Can you run `scap deploy-log` in that same directory and attach the output to the ticket?
[23:23:47] Release-Engineering-Team, Scap, serviceops: Deploy Scap version 4.0.3 - https://phabricator.wikimedia.org/T294966 (Legoktm) I rolled back the canaries to 4.0.2 for now.
[23:23:49] yes
[23:24:13] ah, there's the traceback
[23:24:21] yesssss
[23:24:57] Release-Engineering-Team (Doing), Scap, serviceops: RESTBase deployment fails with scap internal error - https://phabricator.wikimedia.org/T294936 (Legoktm) Indeed: {P17675}
[23:25:54] also I do think we should add you to the deploy-service group
[23:26:08] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/admin/data/data.yaml#553 already has some releng members
[23:30:47] Alright. I'm done for the day. I'll see what I can figure out tomorrow.
[23:32:10] o/