[06:16:05] 10Release-Engineering-Team (Done by Thu 04 Nov), 10Patch-For-Review: docker-gc: A tool for partially pruning docker resources - https://phabricator.wikimedia.org/T294034 (10Joe) I have a general doubt about packaging for this software: the patch above creates docker images to run the software, but it needs to... [09:07:17] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10SRE, 10serviceops: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (10hashar) In my experience it is better done during low CI traffic, start of morning in Dallas will work just fine. We would then send a... [09:34:52] 10Release-Engineering-Team (Radar), 10SRE, 10serviceops, 10Developer Productivity, and 2 others: Debug hosts sometimes Fatal error: "The UdpSocket to 127.0.0.1:10514 has been closed" - https://phabricator.wikimedia.org/T214734 (10hashar) Spotted this on labweb1002 / lab1001 today, all messages referred to... [09:37:32] 10Release-Engineering-Team (Radar), 10SRE, 10serviceops, 10Developer Productivity, and 2 others: Debug hosts sometimes Fatal error: "The UdpSocket to 127.0.0.1:10514 has been closed" - https://phabricator.wikimedia.org/T214734 (10hashar) I have forgot, a reqid example: https://logstash.wikimedia.org/app/d... [10:12:13] 10Release-Engineering-Team (Doing), 10Security-Team, 10ContentSecurityPolicy, 10GitLab (Administration, Settings & Policy), and 3 others: Define a Content Security Policy for GitLab - https://phabricator.wikimedia.org/T285363 (10hashar) We have a few more reports coming in. Some examples: ---- blocked-ur... 
[10:32:14] 10Gerrit, 10Wikibugs: Submitted changes to wikibugs cause Gerrit to inserts metadata to commit message - https://phabricator.wikimedia.org/T294423 (10hashar) [10:35:43] 10Gerrit, 10Wikibugs: Submitted changes to wikibugs cause Gerrit to inserts metadata to commit message - https://phabricator.wikimedia.org/T294423 (10hashar) 05Open→03Resolved a:03hashar The reason is the submit type configuration for the repository has been set to {nav Rebase Always} ( https://gerrit.wi... [11:04:26] I’ve had two gate-and-submit mwgate-node12-docker builds fail with weird errors during npm install (first a ton of ENOENT in _cacache, then a bunch of corrupted tarballs at the end) [11:04:36] has anyone else seen this? latest one is here: https://integration.wikimedia.org/ci/job/mwgate-node12-docker/51505/consoleFull [11:19:21] 10Continuous-Integration-Infrastructure, 10Jenkins: Jenkins search results missing /ci/ URL component - https://phabricator.wikimedia.org/T294424 (10Lucas_Werkmeister_WMDE) [11:19:30] 10Continuous-Integration-Infrastructure, 10Jenkins: Jenkins search results missing /ci/ URL component - https://phabricator.wikimedia.org/T294424 (10Lucas_Werkmeister_WMDE) p:05Triage→03Lowest [11:20:05] looking at the https://integration.wikimedia.org/ci/job/mwgate-node12-docker/ backlog it seems like several other extensions have had those npm errors, I’ll file a task [11:24:39] filed https://phabricator.wikimedia.org/T294426, apparently I didn’t add the right tags yet if wikibugs didn’t mention it [11:34:55] 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure): mwgate-node12-docker gate-and-submit builds failing (ENOENT _cacache errors resulting in corrupted tarballs) - https://phabricator.wikimedia.org/T294426 (10Majavah) [11:36:42] 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure): mwgate-node12-docker gate-and-submit builds failing (ENOENT _cacache errors resulting in corrupted tarballs) - 
https://phabricator.wikimedia.org/T294426 (10Lucas_Werkmeister_WMDE) Seems to happen on several integration-agent... [11:40:30] 10Continuous-Integration-Infrastructure, 10ci-test-error (WMF-deployed Build Failure): mwgate-node12-docker gate-and-submit builds failing (ENOENT _cacache errors resulting in corrupted tarballs) - https://phabricator.wikimedia.org/T294426 (10Lucas_Werkmeister_WMDE) I can’t reproduce the error locally with fre... [11:47:28] 10Continuous-Integration-Infrastructure, 10Patch-For-Review, 10ci-test-error (WMF-deployed Build Failure): mwgate-node12-docker gate-and-submit builds failing (ENOENT _cacache errors resulting in corrupted tarballs) - https://phabricator.wikimedia.org/T294426 (10Lucas_Werkmeister_WMDE) >>! In T294426#7461156... [11:53:39] 10Continuous-Integration-Infrastructure, 10Patch-For-Review, 10ci-test-error (WMF-deployed Build Failure): mwgate-node12-docker gate-and-submit builds failing (ENOENT _cacache errors resulting in corrupted tarballs) - https://phabricator.wikimedia.org/T294426 (10Lucas_Werkmeister_WMDE) Nope, Wikibase still f... [12:04:30] Lucas_WMDE: that got reported previously indeed [12:05:12] looks like updating the Wikibase package locks to v2 fixes the issue for Wikibase at least [12:05:24] (I’m waiting for the build to complete before commenting on Phab) [12:06:28] ah yeah https://phabricator.wikimedia.org/T278982 [12:06:34] which I declined blaming cosmic rays [12:06:35] no wait it still failed [12:06:51] hmm [12:07:28] hmm no [12:07:31] not that one bah [12:08:24] maybe it got reported in this channel rather than a task [12:09:02] 10Continuous-Integration-Infrastructure, 10Patch-For-Review, 10ci-test-error (WMF-deployed Build Failure): mwgate-node12-docker gate-and-submit builds failing (ENOENT _cacache errors resulting in corrupted tarballs) - https://phabricator.wikimedia.org/T294426 (10Lucas_Werkmeister_WMDE) >>! In T294426#7461187... 
[12:09:48] AHHH https://phabricator.wikimedia.org/T293937#7450744 [12:10:26] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10ci-test-error: CI mwext-node12-rundoc-docker job failing on repos using Storybook - https://phabricator.wikimedia.org/T293937 (10hashar) The same has been reported at T294426 and we will follow up there. [12:10:34] 10Continuous-Integration-Infrastructure, 10Patch-For-Review, 10ci-test-error (WMF-deployed Build Failure): mwgate-node12-docker gate-and-submit builds failing (ENOENT _cacache errors resulting in corrupted tarballs) - https://phabricator.wikimedia.org/T294426 (10Lucas_Werkmeister_WMDE) Ah, found the error in... [12:19:21] Lucas_WMDE: the build is a bit fragile :-\ [12:19:26] 10Continuous-Integration-Infrastructure, 10Patch-For-Review, 10ci-test-error (WMF-deployed Build Failure): mwgate-node12-docker gate-and-submit builds failing (ENOENT _cacache errors resulting in corrupted tarballs) - https://phabricator.wikimedia.org/T294426 (10hashar) The same got reported a few days ago T... 
[12:19:51] maybe npm is no longer able to catch that specific issue and no longer falls back to retrieving from npmjs.org [12:20:10] I thought the npm cache was supposed to be very stable these days [12:20:25] and the mwgate-node12-docker build history also looked like it was failing very consistently [12:20:48] yeah because all builds share the same cache [12:20:58] so if somehow a corrupted cache got saved, it is restored for every following build [12:21:09] hm [12:22:39] !log integration-castor03: sudo rm -fR /srv/jenkins-workspace/caches/castor-mw-ext-and-skins/master/mwgate-node12-docker # T294426 T293937 [12:22:43] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [12:22:43] T293937: CI mwext-node12-rundoc-docker job failing on repos using Storybook - https://phabricator.wikimedia.org/T293937 [12:22:43] T294426: mwgate-node12-docker gate-and-submit builds failing (ENOENT _cacache errors resulting in corrupted tarballs) - https://phabricator.wikimedia.org/T294426 [12:22:54] 10Continuous-Integration-Infrastructure, 10Patch-For-Review, 10castor, 10ci-test-error (WMF-deployed Build Failure): mwgate-node12-docker gate-and-submit builds failing (ENOENT _cacache errors resulting in corrupted tarballs) - https://phabricator.wikimedia.org/T294426 (10hashar) [12:23:01] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10castor, 10ci-test-error: CI mwext-node12-rundoc-docker job failing on repos using Storybook - https://phabricator.wikimedia.org/T293937 (10hashar) [12:24:23] maybe the cache is written differently between v1 and v2 package-lock.json formats [12:27:29] maybe I can sneak in a `npm cache verify` [12:29:11] well, the build succeeded now [12:29:33] I nuked the cache which could have helped [12:29:42] or some build saved a fixed-up version of the cache [12:29:59] but in reality, I should really rethink that system entirely. 
It has served its purpose [12:29:59] I think updating the lockfile fixed it already [12:30:52] (03PS1) 10Kosta Harlan: Zuul: [GrowthExperiments] Add CirrusSearch to phan dependencies [integration/config] - 10https://gerrit.wikimedia.org/r/734954 (https://phabricator.wikimedia.org/T292141) [12:30:52] and I have seen your other task about Jenkins search results having the wrong URL. It is definitely an upstream issue; I will check whether they have released a fix already [12:30:58] or if there is an issue filed [12:31:28] kostajh: can you deploy that CI config change or shall I? [12:32:59] (03CR) 10Hashar: [C: 03+2] "Note that dependencies are not recursively process and the CirrusSearch phan dependencies ('Elastica' and 'SiteMatrix') would be missing." [integration/config] - 10https://gerrit.wikimedia.org/r/734954 (https://phabricator.wikimedia.org/T292141) (owner: 10Kosta Harlan) [12:33:12] kostajh: doing it ;) [12:33:52] (03CR) 10Kosta Harlan: Zuul: [GrowthExperiments] Add CirrusSearch to phan dependencies (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/734954 (https://phabricator.wikimedia.org/T292141) (owner: 10Kosta Harlan) [12:34:56] (03Merged) 10jenkins-bot: Zuul: [GrowthExperiments] Add CirrusSearch to phan dependencies [integration/config] - 10https://gerrit.wikimedia.org/r/734954 (https://phabricator.wikimedia.org/T292141) (owner: 10Kosta Harlan) [12:39:14] !log reloaded Zuul for [GrowthExperiments] Add CirrusSearch to phan dependencies - https://gerrit.wikimedia.org/r/734954 [12:39:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [12:39:58] hashar: cheers [13:28:59] (03CR) 10Kosta Harlan: [C: 03+2] Release Quibble 1.2.0 [integration/quibble] - 10https://gerrit.wikimedia.org/r/734211 (https://phabricator.wikimedia.org/T259456) (owner: 10Hashar) [13:29:07] (03CR) 10Kosta Harlan: [C: 03+2] changelog: begin new 1.2.1 version cycle [integration/quibble] - 10https://gerrit.wikimedia.org/r/734212 
(owner: 10Hashar) [13:33:50] !log removed role::beta::puppetmaster in https://gerrit.wikimedia.org/r/c/operations/puppet/+/734962/ [13:33:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [13:36:27] 10Continuous-Integration-Infrastructure, 10Patch-For-Review, 10castor, 10ci-test-error (WMF-deployed Build Failure): mwgate-node12-docker gate-and-submit builds failing (ENOENT _cacache errors resulting in corrupted tarballs) - https://phabricator.wikimedia.org/T294426 (10Lucas_Werkmeister_WMDE) >>! In T29... [13:46:40] (03Merged) 10jenkins-bot: Release Quibble 1.2.0 [integration/quibble] - 10https://gerrit.wikimedia.org/r/734211 (https://phabricator.wikimedia.org/T259456) (owner: 10Hashar) [13:48:44] hashar: looks like clearing the cache fixed the issue as well, the mwgate-node12-docker backlog is all green again already [13:51:22] (03Merged) 10jenkins-bot: changelog: begin new 1.2.1 version cycle [integration/quibble] - 10https://gerrit.wikimedia.org/r/734212 (owner: 10Hashar) [13:57:43] Lucas_WMDE: great, thank you for confirming. 
That does not explain though how the issue appeared in the first place but I am willing to ignore that [13:57:58] finding the actual root cause is probably going to be a fairly long time sink unfortunately [13:58:04] yeah [13:58:50] I was wondering if rsyncing the cache between agents makes the problem worse or not… if only one agent had a broken cache, I think there’s a real risk we’d just retry builds and get lucky, and take much longer to identify the issue [13:59:35] but at least it seems like lockfileVersion 2 made the package more robust against the broken cache (one Wikibase build already successfully `npm ci`’d before you cleared the cache), so I’m happy with that [13:59:45] maybe libup should do that automatically or something [14:00:18] (s/do that/update the lockfile/ that wasn’t very clearly phrased) [14:02:35] 10Continuous-Integration-Infrastructure: Upgrade integration/npm to 7.x - https://phabricator.wikimedia.org/T273811 (10Lucas_Werkmeister_WMDE) CI seems to run npm v7 now: ` + node --version v12.22.5 + npm --version 7.21.0 ` [14:02:42] we have moved from npm 6 to 7 fairly recently [14:03:11] and yeah I would expect everything to eventually get moved to lockfileVersion 2 though I am unaware of any such ongoing effort. legoktm James_F would surely know [14:03:18] ok [14:12:35] 10Continuous-Integration-Infrastructure, 10Patch-For-Review, 10castor, 10ci-test-error (WMF-deployed Build Failure): mwgate-node12-docker gate-and-submit builds failing (ENOENT _cacache errors resulting in corrupted tarballs) - https://phabricator.wikimedia.org/T294426 (10Lucas_Werkmeister_WMDE) 05Open→... [14:31:48] hashar, Lucas_WMDE: oh, I didn't realize CI was already on npm v7. 
I guess I'll upgrade libup to start using it [14:35:07] legoktm: we had 7.5.2 shipped a few weeks ago, further bumped to 7.21.0 [14:35:22] and some effort has been made to use npm 7 across the fleet of node 10, 12 and 14 [14:35:33] I wonder if they fixed the github: bug [14:35:49] or maybe node10 still uses 6.14.5 [14:36:32] cool then [14:36:53] npm v7 broke the format of `npm audit` so it'll be a bit of work, but hopefully I'll have time this weekend [14:37:00] maybe libup can do a pass to bump everything to lockfileVersion 2, though I have no idea what it entails [14:38:47] I'm sure some npm "security" advisory will come around in a few days forcing that anyways [14:41:36] 10Release-Engineering-Team (Done by Thu 04 Nov), 10Patch-For-Review: docker-gc: A tool for partially pruning docker resources - https://phabricator.wikimedia.org/T294034 (10dancy) >>! In T294034#7460507, @Joe wrote: > I have a general doubt about packaging for this software: the patch above creates docker imag... [15:07:12] legoktm: on a different topic, thx for the wikibugs change for #wikimedia-quibble though apparently the bot hasn't loaded the new config. I gave a few details at https://gerrit.wikimedia.org/r/c/labs/tools/wikibugs2/+/734561/ [15:07:27] the change commit does not show up as https://sal.toolforge.org/tools.wikibugs [15:08:28] 10Release-Engineering-Team (Doing), 10Quibble: Establish communication channel for Quibble development (plot twist: Slack channel) - https://phabricator.wikimedia.org/T286770 (10hashar) [15:23:07] 10Release-Engineering-Team (Doing), 10Quibble: Establish communication channel for Quibble development (plot twist: Slack channel) - https://phabricator.wikimedia.org/T286770 (10hashar) I have enabled [[ https://meta.wikimedia.org/wiki/Wm-bot | wm-bot ]] to have some channel logs. They are available at https:/... 
[15:29:46] 10Continuous-Integration-Infrastructure: Upgrade integration/npm to 7.x - https://phabricator.wikimedia.org/T273811 (10Jdforrester-WMF) 05Open→03Resolved a:03Krinkle Yup, done in {a5a0d8497db5e9063141df19b10534c2a8b2f5ff}. [15:29:48] 10Continuous-Integration-Infrastructure: Upgrade CI containers/jobs to provide npm 7 - https://phabricator.wikimedia.org/T273812 (10Jdforrester-WMF) [15:31:05] 10Continuous-Integration-Infrastructure: Upgrade CI containers/jobs to provide npm 7 - https://phabricator.wikimedia.org/T273812 (10Jdforrester-WMF) 05Open→03Resolved a:03Krinkle Done in {849a44a686f6c8419f260844b64aff16d00f8f64}. [15:31:07] 10Continuous-Integration-Infrastructure: Deal with release of npm 7 - https://phabricator.wikimedia.org/T273785 (10Jdforrester-WMF) [15:31:44] 10Continuous-Integration-Infrastructure: Deal with release of npm 7 - https://phabricator.wikimedia.org/T273785 (10Jdforrester-WMF) [15:33:44] (03PS3) 10Hashar: Use upstream gear instead of our fork [integration/zuul] (patch-queue/debian/jessie-wikimedia) - 10https://gerrit.wikimedia.org/r/731900 (https://phabricator.wikimedia.org/T289512) [15:34:59] (03CR) 10Ahmon Dancy: [C: 03+2] Use upstream gear instead of our fork [integration/zuul] (patch-queue/debian/jessie-wikimedia) - 10https://gerrit.wikimedia.org/r/731900 (https://phabricator.wikimedia.org/T289512) (owner: 10Hashar) [15:43:00] 10Continuous-Integration-Infrastructure, 10DC-Ops, 10netops, 10ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (10Papaul) [15:43:34] 10Continuous-Integration-Infrastructure, 10DC-Ops, 10netops, 10ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (10Papaul) @Dzahn mw2255 is done [15:55:01] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team 
(Doing), 10Zuul, and 2 others: Upgrade zuul gearman when upstream releases it - https://phabricator.wikimedia.org/T289512 (10hashar) 05Open→03Resolved a:03dancy From integration/config: ` tox -r -e zu... [15:55:16] 10Beta-Cluster-Infrastructure, 10SRE Observability, 10Wikimedia-Logstash, 10observability: logstash-beta.wmflabs.org does not receive any mediawiki events - https://phabricator.wikimedia.org/T233134 (10dancy) Here's a recent entry from `deployment-logstash04.deployment-prep.eqiad1.wikimedia.cloud`. There... [15:56:06] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Doing), 10Zuul, 10Patch-For-Review, and 2 others: Improve scheduling of CI jobs invoked by zuul - https://phabricator.wikimedia.org/T258630 (10hashar) The patch has made it to upstream `gear` release `0.16.0` [16:05:12] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Radar): Migrate deployment-prep away from Debian Stretch to Buster/Bullseye - https://phabricator.wikimedia.org/T278641 (10dancy) [16:32:06] 10Continuous-Integration-Config, 10Release-Engineering-Team (Radar), 10MediaWiki-Vendor, 10Parsoid (Tracking), and 2 others: `mediawiki-core-php72-phan-docker` job runs `composer install` instead of using packages from mediawiki/vendor - https://phabricator.wikimedia.org/T287419 (10cscott) Here's another... 
[16:44:00] (03PS1) 10Ahmon Dancy: Start of release process improvements [tools/scap] - 10https://gerrit.wikimedia.org/r/735030 [16:51:54] !log Tag Quibble 1.2.0 @ bdabd84 # T259456 T292772 T256402 [16:51:59] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:51:59] T256402: Remove JUnit artefacts from Quibble jobs - https://phabricator.wikimedia.org/T256402 [16:51:59] T292772: ERROR: setuptools==41.0.0 is used in combination with setuptools_scm>=6.x - https://phabricator.wikimedia.org/T292772 [16:52:00] T259456: Quibble should configure php 7.4+ built in web server to use multiple workers - https://phabricator.wikimedia.org/T259456 [17:12:23] 10Release-Engineering-Team (Done by Thu 04 Nov), 10MW-on-K8s, 10Release Pipeline, 10User-brennen: Scap backport change_url: Validate that the specified changes are suitable - https://phabricator.wikimedia.org/T294453 (10jeena) [17:12:29] 10Release-Engineering-Team (Done by Thu 04 Nov), 10MW-on-K8s, 10Release Pipeline, 10User-brennen: Scap backport change_url command - https://phabricator.wikimedia.org/T287042 (10dduvall) p:05Triage→03Medium a:03dduvall [17:15:52] 10Release-Engineering-Team (Done by Thu 04 Nov), 10MW-on-K8s, 10Release Pipeline, 10User-brennen: Scap backport change_url: approve changes - https://phabricator.wikimedia.org/T294454 (10jeena) [17:20:25] 10Release-Engineering-Team (Done by Thu 04 Nov), 10MW-on-K8s, 10Release Pipeline, 10User-brennen: Scap backport change_url: update values.yaml - https://phabricator.wikimedia.org/T294455 (10jeena) [17:21:10] 10Release-Engineering-Team (Done by Thu 04 Nov), 10MW-on-K8s, 10Release Pipeline, 10User-brennen: Scap backport change_url: update values.yaml - https://phabricator.wikimedia.org/T294455 (10jeena) [17:24:52] 10Release-Engineering-Team (Done by Thu 04 Nov), 10MW-on-K8s, 10Release Pipeline, 10User-brennen: Scap backport change_url: copy files for legacy deployment - https://phabricator.wikimedia.org/T294457 (10jeena) [17:27:10] 
10Release-Engineering-Team (Done by Thu 04 Nov), 10MW-on-K8s, 10Release Pipeline, 10User-brennen: Scap backport change_url: approve changes - https://phabricator.wikimedia.org/T294454 (10dduvall) a:05dduvall→03None [17:27:35] 10Release-Engineering-Team (Done by Thu 04 Nov), 10MW-on-K8s, 10Release Pipeline, 10User-brennen: Scap backport change_url: update values.yaml - https://phabricator.wikimedia.org/T294455 (10dduvall) a:05dduvall→03None [17:29:20] 10Release-Engineering-Team (Done by Thu 04 Nov), 10MW-on-K8s, 10Release Pipeline, 10User-brennen: Scap (backport) should support configuration of Gerrit URL - https://phabricator.wikimedia.org/T294459 (10dduvall) [17:29:44] 10Release-Engineering-Team (Done by Thu 04 Nov), 10MW-on-K8s, 10Release Pipeline, 10User-brennen: Scap backport change_url command - https://phabricator.wikimedia.org/T287042 (10dduvall) a:05dduvall→03None [17:30:24] 10Release-Engineering-Team (Done by Thu 04 Nov), 10MW-on-K8s, 10Release Pipeline, 10User-brennen: Scap (backport) should support configuration of Gerrit URL - https://phabricator.wikimedia.org/T294459 (10dduvall) p:05Triage→03Medium [17:43:59] 10Release-Engineering-Team (Done by Thu 04 Nov), 10MW-on-K8s, 10Release Pipeline, 10User-brennen: Scap backport change_url: deploy to k8s - https://phabricator.wikimedia.org/T294462 (10jeena) [18:03:17] 10Release-Engineering-Team (Done by Thu 04 Nov), 10MW-on-K8s, 10Release Pipeline, 10User-brennen: Scap backport change_url: legacy deployment - https://phabricator.wikimedia.org/T294466 (10jeena) [18:12:52] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10SRE, 10serviceops: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (10Dzahn) >>! In T294271#7460829, @hashar wrote: >start of morning in Dallas will work just fine. Cool, thanks! So, @Papaul maybe you wa... 
[18:13:51] 10Phabricator, 10Release-Engineering-Team (Yak Shaving 🐃🪒), 10User-brennen: Dockerize our Phabricator development environment - https://phabricator.wikimedia.org/T245575 (10Sj) Do we have anyone working with the Phorge.it crew? What other similar-scale Phab instances are there that are planning to move to P... [18:28:46] lemme see... [18:35:30] (03PS1) 10Ahmon Dancy: Revive tox.ini [tools/scap] - 10https://gerrit.wikimedia.org/r/735045 [18:39:19] (03PS1) 10Ahmon Dancy: Remove obsolete scap-in-train-dev script [tools/scap] - 10https://gerrit.wikimedia.org/r/735046 [18:40:51] (03PS2) 10Ahmon Dancy: Revive tox.ini [tools/scap] - 10https://gerrit.wikimedia.org/r/735045 [18:46:15] (03CR) 10Ahmon Dancy: [C: 03+2] Start of release process improvements [tools/scap] - 10https://gerrit.wikimedia.org/r/735030 (owner: 10Ahmon Dancy) [18:46:56] (03Merged) 10jenkins-bot: Start of release process improvements [tools/scap] - 10https://gerrit.wikimedia.org/r/735030 (owner: 10Ahmon Dancy) [19:22:28] (03CR) 10Ahmon Dancy: [C: 04-1] Revive tox.ini [tools/scap] - 10https://gerrit.wikimedia.org/r/735045 (owner: 10Ahmon Dancy) [19:23:11] 10Release-Engineering-Team (Priority Backlog 🔥), 10GitLab (Auth & Access), 10User-brennen: Gitlab 2fa password validation seems bugged - https://phabricator.wikimedia.org/T292431 (10brennen) [19:23:13] 10Release-Engineering-Team (Done by Thu 04 Nov), 10GitLab (Auth & Access), 10User-brennen: Reproduce GitLab 2fa failures - https://phabricator.wikimedia.org/T293528 (10brennen) 05Open→03Resolved a:03brennen Turns out this is easy enough to reproduce: You just can't configure 2fa at the moment, since n... [19:26:06] 10Release-Engineering-Team (Priority Backlog 🔥), 10GitLab (Auth & Access), 10Upstream, 10User-brennen: Gitlab 2fa password validation seems bugged - https://phabricator.wikimedia.org/T292431 (10brennen) See upstream: [[https://gitlab.com/gitlab-org/gitlab/-/issues/342152|Broken 2FA registration for omniaut... 
[19:32:32] (03CR) 10Dduvall: [C: 03+2] Remove obsolete scap-in-train-dev script [tools/scap] - 10https://gerrit.wikimedia.org/r/735046 (owner: 10Ahmon Dancy) [19:33:13] (03Merged) 10jenkins-bot: Remove obsolete scap-in-train-dev script [tools/scap] - 10https://gerrit.wikimedia.org/r/735046 (owner: 10Ahmon Dancy) [20:07:44] (03Abandoned) 10Ahmon Dancy: Revive tox.ini [tools/scap] - 10https://gerrit.wikimedia.org/r/735045 (owner: 10Ahmon Dancy) [20:09:38] (03PS1) 10Ahmon Dancy: Add shellcheck to scripts/check [tools/scap] - 10https://gerrit.wikimedia.org/r/735056 [20:11:00] (03PS2) 10Ahmon Dancy: Add shellcheck to scripts/check [tools/scap] - 10https://gerrit.wikimedia.org/r/735056 [20:17:43] (03PS3) 10Ahmon Dancy: Add shellcheck to scripts/check [tools/scap] - 10https://gerrit.wikimedia.org/r/735056 [20:17:45] (03PS1) 10Ahmon Dancy: Simplify blubber.yaml [tools/scap] - 10https://gerrit.wikimedia.org/r/735058 [20:36:26] 10Quibble, 10Release-Engineering-Team (Doing): Establish communication channel for Quibble development (plot twist: Slack channel) - https://phabricator.wikimedia.org/T286770 (10hashar) 05In progress→03Resolved Wikibugs got reloaded I think marking this resolved will make it join the channel and indeed com... [20:55:21] 10Release-Engineering-Team (Seen), 10MW-on-K8s, 10SRE, 10Traffic, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (10jijiki) [20:58:00] 10Release-Engineering-Team (Doing), 10Security-Team, 10ContentSecurityPolicy, 10GitLab (Administration, Settings & Policy), and 3 others: Define a Content Security Policy for GitLab - https://phabricator.wikimedia.org/T285363 (10sbassett) Outside of the bug where csp report-only headers seem to actively bl... 
[21:03:47] 10Continuous-Integration-Infrastructure, 10DC-Ops, 10netops, 10ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (10Dzahn) Thanks @Papaul ! it's back in service now I am not sure what is next exac... [21:10:54] 10Deployments, 10Release-Engineering-Team (Doing): L10n cache files building up on backup deploy hosts - https://phabricator.wikimedia.org/T275826 (10Dzahn) fwiw, this isn't just "on deploy hosts", this is also on individual appservers. for example just did a `scap pull` on mw2255 after it had hardware mainten... [21:24:18] 10Project-Admins: Create project tag for Data-Engineering - https://phabricator.wikimedia.org/T287531 (10Milimetric) @Aklapper: yes, this migration is going to take a while. We have a ton of tasks and new ways of thinking about our work. So we'll have to intersect those and see what happens. Too many unknowns... [21:39:53] (03CR) 10Jforrester: jjb: Update castor users from 0.2.1 (2018) to 0.2.4 (2019) (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/734412 (https://phabricator.wikimedia.org/T188375) (owner: 10Jforrester) [21:43:27] 10Continuous-Integration-Infrastructure, 10DC-Ops, 10netops, 10ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (10Papaul) @Dzahn thank you. I think it is best to just close this task and go "on d... 
[21:47:29] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10SRE, 10serviceops: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (10Papaul) @Dzahn Next week Monday 1st at 9:30 am CT [21:59:06] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10SRE, 10serviceops: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (10hashar) It is an holiday here in France (All-saints) , then I am not critical to the DRAC upgrade ;) I will make arrangement, it will b... [22:00:40] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10SRE, 10serviceops: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (10Dzahn) @hashar I am wondering if you need me around (for mgmt access / root / +2 ). I have a request to be off that day but it's not sur... [22:11:49] (03PS1) 10Dduvall: Refactor scap.plugins.gerrit for use with backport command [tools/scap] - 10https://gerrit.wikimedia.org/r/735064 (https://phabricator.wikimedia.org/T294459) [22:12:22] (03CR) 10jerkins-bot: [V: 04-1] Refactor scap.plugins.gerrit for use with backport command [tools/scap] - 10https://gerrit.wikimedia.org/r/735064 (https://phabricator.wikimedia.org/T294459) (owner: 10Dduvall) [22:13:01] (03PS2) 10Dduvall: Refactor scap.plugins.gerrit for use with backport command [tools/scap] - 10https://gerrit.wikimedia.org/r/735064 (https://phabricator.wikimedia.org/T294459) [22:15:56] (03CR) 10Dduvall: [C: 04-1] "Small nit. I'll run the tests and report back." 
[tools/scap] - 10https://gerrit.wikimedia.org/r/735056 (owner: 10Ahmon Dancy) [22:17:40] (03PS4) 10Ahmon Dancy: Add shellcheck to scripts/check [tools/scap] - 10https://gerrit.wikimedia.org/r/735056 [22:17:42] (03PS2) 10Ahmon Dancy: Simplify blubber.yaml [tools/scap] - 10https://gerrit.wikimedia.org/r/735058 [22:18:01] (03CR) 10jerkins-bot: [V: 04-1] Add shellcheck to scripts/check [tools/scap] - 10https://gerrit.wikimedia.org/r/735056 (owner: 10Ahmon Dancy) [22:18:09] (03CR) 10Ahmon Dancy: Add shellcheck to scripts/check (032 comments) [tools/scap] - 10https://gerrit.wikimedia.org/r/735056 (owner: 10Ahmon Dancy) [22:19:23] (03CR) 10Dduvall: [C: 03+1] "Tests pass for me locally. I think CI just had some network issue or something." [tools/scap] - 10https://gerrit.wikimedia.org/r/735056 (owner: 10Ahmon Dancy) [22:19:37] (03CR) 10Dduvall: [C: 03+1] "recheck" [tools/scap] - 10https://gerrit.wikimedia.org/r/735056 (owner: 10Ahmon Dancy) [22:21:22] (03CR) 10Dduvall: [C: 03+2] Add shellcheck to scripts/check [tools/scap] - 10https://gerrit.wikimedia.org/r/735056 (owner: 10Ahmon Dancy) [22:22:02] (03Merged) 10jenkins-bot: Add shellcheck to scripts/check [tools/scap] - 10https://gerrit.wikimedia.org/r/735056 (owner: 10Ahmon Dancy) [22:22:20] (03CR) 10Dduvall: [C: 03+2] Simplify blubber.yaml [tools/scap] - 10https://gerrit.wikimedia.org/r/735058 (owner: 10Ahmon Dancy) [22:22:58] (03Merged) 10jenkins-bot: Simplify blubber.yaml [tools/scap] - 10https://gerrit.wikimedia.org/r/735058 (owner: 10Ahmon Dancy) [22:26:35] (03PS3) 10Dduvall: Refactor scap.plugins.gerrit for use with backport command [tools/scap] - 10https://gerrit.wikimedia.org/r/735064 (https://phabricator.wikimedia.org/T294459) [22:27:26] 10Continuous-Integration-Infrastructure, 10DC-Ops, 10netops, 10ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (10Dzahn) 05Open→03Resolved a:03Dzahn I agree and boldly 
resolve it, expecting... [22:29:29] (03PS1) 10Ahmon Dancy: make-container-image/webserver: Allow alternate GIT_BASE and BRANCH [tools/release] - 10https://gerrit.wikimedia.org/r/735068 [22:29:59] 10Release-Engineering-Team (Done by Thu 04 Nov), 10MW-on-K8s, 10Release Pipeline, 10User-brennen: Scap backport change_url command - https://phabricator.wikimedia.org/T287042 (10dduvall) [22:30:00] 10Release-Engineering-Team (Done by Thu 04 Nov), 10MW-on-K8s, 10Release Pipeline, 10Patch-For-Review, 10User-brennen: Scap (backport) should support configuration of Gerrit URL - https://phabricator.wikimedia.org/T294459 (10dduvall) 05Open→03In progress [22:31:21] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10SRE, 10serviceops: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (10Dzahn) After re-thinking this and chatting some more on IRC I now think we should not do this and close my own request as invalid. It's... [22:33:42] 10Release-Engineering-Team, 10serviceops: contint hardware refresh? - https://phabricator.wikimedia.org/T294276 (10Dzahn) [22:36:05] 10Release-Engineering-Team, 10serviceops: contint hardware refresh? - https://phabricator.wikimedia.org/T294276 (10Dzahn) If we order new hardware here this can also be combined with switching the main server back to eqiad (T256422) (or not). Not directly related though except it might be useful for bringing u... 
[22:36:09] (CR) Ahmon Dancy: [C: +2] make-container-image/webserver: Allow alternate GIT_BASE and BRANCH [tools/release] - https://gerrit.wikimedia.org/r/735068 (owner: Ahmon Dancy)
[22:38:46] Continuous-Integration-Infrastructure, DC-Ops, netops, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL )) - https://phabricator.wikimedia.org/T283582 (Dzahn)
[22:39:11] Continuous-Integration-Infrastructure, Release-Engineering-Team, SRE, serviceops: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (Dzahn) Open→Declined Suggesting to do this once T256422 is resolved or T294276 or CI does not run on contint* servers anymore, w...
[22:39:52] Continuous-Integration-Infrastructure, Release-Engineering-Team, SRE, serviceops: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (Dzahn) Be bold and reopen if you really think otherwise.
[22:44:41] Continuous-Integration-Infrastructure, Release-Engineering-Team, SRE, serviceops: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (Dzahn) P.S. The actual "contint2001.mgmt" alert in Icinga is actually quite some time ago.. not worth it. but there are other alerts (IP...
[22:48:11] Beta-Cluster-Infrastructure, Wikimedia-Logstash, observability, SRE Observability (FY2021/2022-Q2): Logstash in beta fails periodically - https://phabricator.wikimedia.org/T211984 (colewhite) a:colewhite
[22:52:38] Beta-Cluster-Infrastructure, SRE Observability, Wikimedia-Logstash, observability: logstash-beta.wmflabs.org does not receive any mediawiki events - https://phabricator.wikimedia.org/T233134 (colewhite) As part T288618 work, we've set up a separate cluster that ingests deployment-prep's logs here...
[22:58:28] Beta-Cluster-Infrastructure, SRE Observability, Wikimedia-Logstash, observability: [_field_stats] endpoint is deprecated! Use [_field_caps] instead or run a min/max aggregations on the desired fields. - https://phabricator.wikimedia.org/T241485 (colewhite) I've not seen this error before, nor am...
[23:01:27] Beta-Cluster-Infrastructure, Release-Engineering-Team (Radar), SRE, Wikimedia-Logstash, observability: logstash-beta.wmflab throws multiple "Error: Could not locate that visualization" - https://phabricator.wikimedia.org/T204845 (colewhite) Open→Invalid There has been no DBQuery dashb...
[23:14:50] (PS1) Ahmon Dancy: Remove abandoned save/restore subcommands [tools/train-dev] - https://gerrit.wikimedia.org/r/735071
[23:16:46] (CR) Ahmon Dancy: [C: +2] Remove abandoned save/restore subcommands [tools/train-dev] - https://gerrit.wikimedia.org/r/735071 (owner: Ahmon Dancy)
[23:20:52] (PS1) Ahmon Dancy: Strip trailing slash from build dir [tools/train-dev] - https://gerrit.wikimedia.org/r/735072
[23:20:58] Beta-Cluster-Infrastructure, Wikimedia-Logstash, observability: logstash-beta.wmflabs.org default dashboard missing - https://phabricator.wikimedia.org/T184602 (colewhite) Open→Invalid It seems logstash-beta dashboards were cleared some time ago. On the other hand, this is resolved on https:...
[23:21:02] (PS2) Ahmon Dancy: Strip trailing slashes from build dir [tools/train-dev] - https://gerrit.wikimedia.org/r/735072
[23:21:40] (CR) jerkins-bot: [V: -1] Strip trailing slashes from build dir [tools/train-dev] - https://gerrit.wikimedia.org/r/735072 (owner: Ahmon Dancy)
[23:23:34] (PS3) Ahmon Dancy: Strip trailing slashes from build dir [tools/train-dev] - https://gerrit.wikimedia.org/r/735072
[23:28:58] Beta-Cluster-Infrastructure, Wikimedia-Logstash, observability, WorkType-NewFunctionality: Create a logstash input filter to preprocess mysqld syslog messages - https://phabricator.wikimedia.org/T140751 (colewhite) I can no longer find these messages in production nor in beta logs. It seems `pro...
[23:31:04] (PS1) Ahmon Dancy: make-container-image/webserver/Makefile: Write last-build file [tools/release] - https://gerrit.wikimedia.org/r/735074
[23:32:59] (CR) Ahmon Dancy: [C: +2] make-container-image/webserver/Makefile: Write last-build file [tools/release] - https://gerrit.wikimedia.org/r/735074 (owner: Ahmon Dancy)
[23:34:25] (Merged) jenkins-bot: make-container-image/webserver/Makefile: Write last-build file [tools/release] - https://gerrit.wikimedia.org/r/735074 (owner: Ahmon Dancy)
[23:58:49] (CR) Ahmon Dancy: [C: +2] Strip trailing slashes from build dir [tools/train-dev] - https://gerrit.wikimedia.org/r/735072 (owner: Ahmon Dancy)
[23:59:13] (Merged) jenkins-bot: Strip trailing slashes from build dir [tools/train-dev] - https://gerrit.wikimedia.org/r/735072 (owner: Ahmon Dancy)
[23:59:52] (PS1) Ahmon Dancy: train-dev: Add build-image subcommand [tools/train-dev] - https://gerrit.wikimedia.org/r/735076