[00:26:59] (03CR) 10C. Scott Ananian: Load Parsoid from vendor as fallback, and configure (031 comment) [integration/quibble] - 10https://gerrit.wikimedia.org/r/703182 (https://phabricator.wikimedia.org/T218534) (owner: 10Kosta Harlan) [00:32:30] 10Release-Engineering-Team, 10Growth-Team, 10StructuredDiscussions, 10User-DannyS712, 10ci-test-error (WMF-deployed Build Failure): Core tests failing due to Flow HTTP requests and ServiceContainer access - https://phabricator.wikimedia.org/T287001 (10cscott) I marked the line in https://gerrit.wikimedia... [00:43:28] (03CR) 10C. Scott Ananian: "I think this patch should probably be reverted entirely. See discussion in https://phabricator.wikimedia.org/T287001#7226184" [integration/quibble] - 10https://gerrit.wikimedia.org/r/703182 (https://phabricator.wikimedia.org/T218534) (owner: 10Kosta Harlan) [01:07:03] (03PS1) 10TrainBranchBot: Update state/train-versions.json [tools/release] - 10https://gerrit.wikimedia.org/r/705781 [01:07:05] (03CR) 10TrainBranchBot: [C: 03+2] Update state/train-versions.json [tools/release] - 10https://gerrit.wikimedia.org/r/705781 (owner: 10TrainBranchBot) [01:08:00] (03Merged) 10jenkins-bot: Update state/train-versions.json [tools/release] - 10https://gerrit.wikimedia.org/r/705781 (owner: 10TrainBranchBot) [01:21:12] 10Release-Engineering-Team (Doing), 10Release, 10Train Deployments: 1.37.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T281158 (10Jdlrobson) [01:21:35] 10Release-Engineering-Team (Doing), 10Release, 10Train Deployments: 1.37.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T281158 (10Jdlrobson) [06:24:17] 10Release-Engineering-Team (Next), 10MW-on-K8s, 10serviceops: Perform l10n cache rebuild using initContainers instead of including it in the image - https://phabricator.wikimedia.org/T286952 (10Joe) I have one doubt about the idea of using persistent local volumes... that would mean tying pods to specific no... 
[07:40:01] 10Release-Engineering-Team (Doing), 10Release, 10Train Deployments: 1.37.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T281156 (10hashar) I have messed up yesterday and either forgot about the train or thought it was Monday. As a result I am running it this morning instead. [09:00:56] 10Release-Engineering-Team (Next), 10MW-on-K8s, 10serviceops: Perform l10n cache rebuild using initContainers instead of including it in the image - https://phabricator.wikimedia.org/T286952 (10JMeybohm) I think the basic idea would be, similar to hostPath, to basically have one PV per node. So this would no... [09:16:16] 10Release-Engineering-Team (Doing), 10Release, 10Train Deployments: 1.37.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T281157 (10daniel) [09:16:18] 10Release-Engineering-Team (Doing), 10Patch-For-Review, 10Release, 10Train Deployments: 1.37.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T281156 (10daniel) [09:59:39] 10Phabricator: Allow others than admins to edit forms - https://phabricator.wikimedia.org/T181031 (10Aklapper) p:05Triage→03Low [10:11:03] (03CR) 10Jforrester: "> Patch Set 2:" [integration/config] - 10https://gerrit.wikimedia.org/r/701523 (https://phabricator.wikimedia.org/T220763) (owner: 10Hashar) [11:03:33] James_F: I tried to cascade those images for LC_ALL and it went a bit crazy ;D [11:13:10] (03CR) 10Hashar: [C: 03+2] dockerfiles: drop en_US.UTF-8 [integration/config] - 10https://gerrit.wikimedia.org/r/701523 (https://phabricator.wikimedia.org/T220763) (owner: 10Hashar) [11:14:19] (03Merged) 10jenkins-bot: dockerfiles: drop en_US.UTF-8 [integration/config] - 10https://gerrit.wikimedia.org/r/701523 (https://phabricator.wikimedia.org/T220763) (owner: 10Hashar) [11:20:38] Lucas_WMDE: I forgot about that doc.wm.o patch https://gerrit.wikimedia.org/r/c/integration/docroot/+/702098 :D [11:20:49] going to deploy it [11:21:07] (03CR) 10Hashar: [C: 03+2] "Sorry I missed the +1 
notification from July 2nd :(" [integration/docroot] - 10https://gerrit.wikimedia.org/r/702098 (owner: 10Lucas Werkmeister (WMDE)) [11:21:30] \o/ [11:21:42] thanks! that’ll make Dan very happy :) [11:22:37] what I would love for that page is to have some nice icon for each project [11:22:42] (03Merged) 10jenkins-bot: Support linking to individual doc.wikimedia.org tiles [integration/docroot] - 10https://gerrit.wikimedia.org/r/702098 (owner: 10Lucas Werkmeister (WMDE)) [11:27:26] (03CR) 10Hashar: "I have deployed the change using scap. It can take up to an hour for the change to be reflected due to our frontend caching." [integration/docroot] - 10https://gerrit.wikimedia.org/r/702098 (owner: 10Lucas Werkmeister (WMDE)) [11:39:16] (03CR) 10Hashar: "Successfully published image docker-registry.discovery.wmnet/releng/ci-bullseye:0.2.0" [integration/config] - 10https://gerrit.wikimedia.org/r/701523 (https://phabricator.wikimedia.org/T220763) (owner: 10Hashar) [11:40:14] 10Release-Engineering-Team (Backlog), 10Machine-Learning-Team, 10Release Pipeline, 10Wikidata, and 2 others: Stretch in docker registry forces ascii encoding - https://phabricator.wikimedia.org/T210260 (10hashar) [11:40:17] one less 2+years old task ;D [11:40:18] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Doing), 10Patch-For-Review, 10Technical-Debt: Rebuild CI Docker images to drop ENV LANG='en_US.UTF-8' LANGUAGE='en_US:en' LC_ALL='en_US.UTF-8' - https://phabricator.wikimedia.org/T220763 (10hashar) 05Open→03Resolved Rest of images wil... [13:58:31] (03PS1) 10C. Scott Ananian: Revert "Load Parsoid from vendor as fallback, and configure" [integration/quibble] - 10https://gerrit.wikimedia.org/r/705907 [14:01:31] (03PS2) 10C. 
Scott Ananian: Revert "Load Parsoid from vendor as fallback, and configure" [integration/quibble] - 10https://gerrit.wikimedia.org/r/705907 (https://phabricator.wikimedia.org/T218534) [14:05:20] (03CR) 10Zabe: "See https://gerrit.wikimedia.org/r/c/integration/quibble/+/705746" [integration/quibble] - 10https://gerrit.wikimedia.org/r/705907 (https://phabricator.wikimedia.org/T218534) (owner: 10C. Scott Ananian) [14:16:48] (03PS3) 10Hashar: Update its-phabricator: Urlencode POST to conduit [software/gerrit] (deploy/wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/705650 (https://phabricator.wikimedia.org/T280197) [14:17:07] (03CR) 10Hashar: [V: 03+2 C: 03+2] Update its-phabricator: Urlencode POST to conduit [software/gerrit] (deploy/wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/705650 (https://phabricator.wikimedia.org/T280197) (owner: 10Hashar) [14:17:46] 10Release-Engineering-Team (Next), 10MW-on-K8s, 10serviceops: Perform l10n cache rebuild using initContainers instead of including it in the image - https://phabricator.wikimedia.org/T286952 (10Joe) The problem is that we'd be forced to mount hostPath as 'read-write' in all pods and allow the first one that... [14:21:34] 10Release-Engineering-Team (Doing), 10GerritBot, 10Developer Productivity, 10Patch-For-Review, 10Regression: Gerritbot turns "+" into space, thus breaking most Gerrit URLs - https://phabricator.wikimedia.org/T280197 (10hashar) Gerrit reload the plugins automatically ` [2021-07-21T14:20:06.975+0000] [Plug... [14:33:39] 10Release-Engineering-Team (Next), 10MW-on-K8s, 10serviceops: Perform l10n cache rebuild using initContainers instead of including it in the image - https://phabricator.wikimedia.org/T286952 (10JMeybohm) I would agree that adding PV(C) stuff potentially makes thinks way more complicated then they would be us... 
[14:36:19] 10Release-Engineering-Team (Doing), 10GerritBot, 10Developer Productivity, 10Patch-For-Review, 10Regression: Gerritbot turns "+" into space, thus breaking most Gerrit URLs - https://phabricator.wikimedia.org/T280197 (10hashar) @Dzahn assisted with the deployment of the new Soy template. Changes are tak... [14:44:11] (03PS1) 10TrainBranchBot: Update state/train-versions.json [tools/release] - 10https://gerrit.wikimedia.org/r/705904 [14:44:13] (03CR) 10TrainBranchBot: [C: 03+2] Update state/train-versions.json [tools/release] - 10https://gerrit.wikimedia.org/r/705904 (owner: 10TrainBranchBot) [14:45:22] (03Merged) 10jenkins-bot: Update state/train-versions.json [tools/release] - 10https://gerrit.wikimedia.org/r/705904 (owner: 10TrainBranchBot) [14:47:55] 10Release-Engineering-Team (Doing), 10Gerrit (Gerrit 3.3): Upgrade Gerrit to 3.3 - https://phabricator.wikimedia.org/T262241 (10hashar) Next is to upgrade Gerrit to 3.3 :-] I would like to do it next week if we are good enough. [15:00:16] !log deployment-prep: Change password for `Martin Urbanec` at votewiki [15:00:17] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:04:24] (03PS1) 10Hashar: Merge 'wmf/stable-3.2' into wmf/stable-3.3 [software/gerrit/plugins/gitiles] (wmf/stable-3.3) - 10https://gerrit.wikimedia.org/r/705929 (https://phabricator.wikimedia.org/T262241) [15:24:18] (03PS1) 10C. Scott Ananian: Allow Flow to use Parsoid for API tests [integration/config] - 10https://gerrit.wikimedia.org/r/705932 (https://phabricator.wikimedia.org/T218534) [15:26:10] (03PS1) 10C. Scott Ananian: Run parsoid integration tests with the Translate extension installed [integration/config] - 10https://gerrit.wikimedia.org/r/705933 [15:26:35] (03PS1) 10Hashar: Merge branch 'wmf/stable-3.2' into wmf/stable-3.3 [software/gerrit] (wmf/stable-3.3) - 10https://gerrit.wikimedia.org/r/705934 (https://phabricator.wikimedia.org/T262241) [15:27:39] (03CR) 10Hashar: "This "might" be sufficient. 
Then I have used wmf-plugins-update.sh which update our submodules from the remote. We might bring unrelated" [software/gerrit] (wmf/stable-3.3) - 10https://gerrit.wikimedia.org/r/705934 (https://phabricator.wikimedia.org/T262241) (owner: 10Hashar) [15:39:28] 10Release-Engineering-Team (Radar), 10Abstract Wikipedia team (Phase ζ), 10Release Pipeline (Blubber): More flexible treatment of requirements in `python`; control over `PYTHONPATH` - https://phabricator.wikimedia.org/T282795 (10Jdforrester-WMF) a:03cmassaro [15:43:42] (03CR) 10C. Scott Ananian: "> Patch Set 2:" [integration/quibble] - 10https://gerrit.wikimedia.org/r/705746 (https://phabricator.wikimedia.org/T287001) (owner: 10Zabe) [15:45:18] (03CR) 10C. Scott Ananian: "> See https://gerrit.wikimedia.org/r/c/integration/quibble/+/705746" [integration/quibble] - 10https://gerrit.wikimedia.org/r/705907 (https://phabricator.wikimedia.org/T218534) (owner: 10C. Scott Ananian) [15:46:07] (03CR) 10C. Scott Ananian: "See also Ia9ade86376c80d4d9016471924683bb5eae52d3f which should solve the Flow team's actual needs." [integration/quibble] - 10https://gerrit.wikimedia.org/r/705907 (https://phabricator.wikimedia.org/T218534) (owner: 10C. Scott Ananian) [15:56:07] (03CR) 10C. Scott Ananian: "Note that this patch *may* cause some of the same effects as T287001 unless/until the Flow (and other?) tests are fixed, since it will in " [integration/config] - 10https://gerrit.wikimedia.org/r/705932 (https://phabricator.wikimedia.org/T218534) (owner: 10C. Scott Ananian) [16:01:32] 10Release-Engineering-Team, 10Growth-Team, 10StructuredDiscussions, 10Patch-For-Review, and 2 others: Core tests failing due to Flow HTTP requests and ServiceContainer access - https://phabricator.wikimedia.org/T287001 (10cscott) Can we save the logs for one of the failed builds in GrowthExperiments as wel... 
[16:07:27] !log Updating plugins on releases-jenkins [16:07:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:25:01] 10Release-Engineering-Team, 10Growth-Team, 10StructuredDiscussions, 10Patch-For-Review, and 2 others: Core tests failing due to Flow HTTP requests and ServiceContainer access - https://phabricator.wikimedia.org/T287001 (10Zabe) >>! In T287001#7227749, @cscott wrote: > Can we save the logs for one of the fa... [16:25:09] (03PS1) 10TrainBranchBot: Update state/train-versions.json [tools/release] - 10https://gerrit.wikimedia.org/r/705945 [16:25:13] (03CR) 10TrainBranchBot: [C: 03+2] Update state/train-versions.json [tools/release] - 10https://gerrit.wikimedia.org/r/705945 (owner: 10TrainBranchBot) [16:25:43] (03CR) 10C. Scott Ananian: [C: 03+1] "> Patch Set 10:" [integration/config] - 10https://gerrit.wikimedia.org/r/655695 (owner: 10C. Scott Ananian) [16:26:08] (03Merged) 10jenkins-bot: Update state/train-versions.json [tools/release] - 10https://gerrit.wikimedia.org/r/705945 (owner: 10TrainBranchBot) [16:29:20] 10Gerrit, 10Release-Engineering-Team, 10User-brennen: Gerrit download dialog is missing branch info after upgrade to 3.2.11 - https://phabricator.wikimedia.org/T287099 (10brennen) [16:31:22] 10Gerrit, 10Release-Engineering-Team, 10User-brennen: Gerrit download dialog is missing branch info after upgrade to 3.2.11 - https://phabricator.wikimedia.org/T287099 (10Paladox) https://gerrit.wikimedia.org/r/admin/plugins doesn't look like the download plugin is installed? Neither is codemirror-editor. CC... [16:32:23] 10Gerrit, 10Release-Engineering-Team, 10User-brennen: Gerrit download dialog is missing branch info after upgrade to 3.2.11 - https://phabricator.wikimedia.org/T287099 (10Paladox) Missing several of the default plugins too https://github.com/GerritCodeReview/gerrit/tree/master/plugins [16:32:59] (03CR) 10C. 
Scott Ananian: [C: 03+1] "> Patch Set 6:" [integration/config] - 10https://gerrit.wikimedia.org/r/655695 (owner: 10C. Scott Ananian) [16:35:06] brennen: dancy: paladox : so indeed we have lost the downloadable plugin [16:35:11] which is 100% my fault [16:35:34] and also other default plugins like delete-project and codemirror-editor [16:35:39] earlier today I have scap deploy its-phabricator [16:35:50] and scap replaced the plugins directory entirely [16:35:58] ah [16:36:02] doh! [16:36:09] cause scap fetch from the deploy server [16:36:13] so there are only our plugins listed [16:36:25] then it change the symlink from the old dir to the new one [16:36:32] and we are missing all the plugins that are built in gerrit [16:36:46] * hashar <-- newbie admin [16:36:57] that's scap for you [16:37:10] so I think we just have to ask gerrit to extract its plugin [16:37:14] are we using the upstream war now? [16:37:17] and restore the gitiles.jar one that we have forked [16:37:21] yeah upstream war [16:37:22] or do we still maintain patches on top? [16:37:23] nice! [16:37:44] and I have to update the doc again [16:38:04] dancy found the issue :] [16:38:13] so in https://wikitech.wikimedia.org/wiki/Gerrit/Upgrade#Deploying [16:38:25] it says "if you are only deploying plugins, you are done" [16:38:29] but clearly not [16:38:39] we should java -jar bin/gerrit.war init --batch --install-all-plugins [16:38:44] to restore the bundled plugins [16:38:49] (03PS11) 10C. 
Scott Ananian: Zuul: Add Parsoid to the gatedextensions list [integration/config] - 10https://gerrit.wikimedia.org/r/655695 [16:38:55] and checkout gitiles.jar [16:39:02] interesting [16:39:18] something I found earlier today is that Gerrit automatically reload plugins [16:39:20] I guess we used to unzip the war and pull out the plugins to the plugin directory [16:39:22] it seems to track changes to that dir [16:39:38] (which is probably an anti-pattern) [16:40:14] (aside: you can have scap run that command automagically in future if you'd like) [16:41:04] 10Gerrit, 10Release-Engineering-Team, 10User-brennen: Gerrit download dialog is missing branch info after upgrade to 3.2.11 - https://phabricator.wikimedia.org/T287099 (10hashar) The root cause is that I have upgraded the its-phabricator plugin. scap replaces the plugins directory with the one to deploy and... [16:41:11] so yeah definitely [16:41:20] dancy: brennen: wanna ddo the plugins restoration? [16:41:30] Sure [16:41:48] from https://wikitech.wikimedia.org/wiki/Gerrit/Upgrade#Deploying that is the commands we talked about yesterday [16:41:54] --install-all-plugins and git checkout '*.jar' [16:42:02] which 100% should be done by scap to save troubles [16:42:13] but I think we should just do them manually for now to restore the service [16:42:18] and find out how scap can do it after [16:42:45] I'll give it a shot. [16:42:53] I am amending the deploy doc [16:42:57] dancy: cool, thx. [16:43:57] 10Continuous-Integration-Infrastructure, 10DC-Ops, 10Infrastructure-Foundations, 10netops, 10serviceops: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ) - https://phabricator.wikimedia.org/T283582 (10Dzahn) ACKed some more today, gerrit2001.mgmt, wdqs2002.mgmt [16:44:26] Done on gerrit-replica [16:45:23] hmm. 
I do not appear to be authorized to declare downtimes in Icinga [16:46:03] I have updated the doc [16:46:04] https://wikitech.wikimedia.org/wiki/Gerrit/Upgrade#Deploying [16:46:12] for Icinga yeah I will look that up [16:46:12] 10Release-Engineering-Team, 10Growth-Team, 10StructuredDiscussions, 10Patch-For-Review, and 2 others: Core tests failing due to Flow HTTP requests and ServiceContainer access - https://phabricator.wikimedia.org/T287001 (10Urbanecm) >>! In T287001#7227861, @Zabe wrote: >>>! In T287001#7227749, @cscott wrote... [16:46:26] !log restarting Gerrit to fix plugins [16:46:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:46:56] define contactgroup { [16:46:56] contactgroup_name gerrit [16:46:56] members irc-releng,irc,thcipriani,hashar,twentyafterfour,qchris [16:47:02] hashar: does the change from earlier need to be reverted? [16:47:30] mutante: nop it is fine. I just screwed up the deployment by not following the doc (which was wrong anyway) [16:47:35] alright [16:47:54] there are different ways to allow people to do Icinga downtimes [16:47:56] mutante: my deploy of its-phabricator earlier today had the side effect of removing plugins that are shipped with gerrit core [16:48:24] a) add to global privs in icinga config (for all services) b) make sure is in contact group belonging to service c) grant shell on cumin host to run cookbook ...etc [16:48:31] well maybe the whole team should be made a contact of gerrit [16:48:44] d) make sure user is capitalized correct way [16:49:05] yea, it should be a contact, but I dont think Icinga sends email anymore [16:49:14] it will still fix the privileges issue though [16:49:33] hmm [16:49:46] or wait until we have cookbooks that dont need root [16:49:46] bind address already in use again dancy :\ [16:49:48] ongoing ticket [16:49:58] What the hell! 
[16:50:09] interesting, so the issue as not related to DNS / IPv6 [16:50:09] [2021-07-21T16:49:17.461+0000] [main] ERROR org.apache.sshd.common.io.nio2.Nio2Acceptor : bind(gerrit.wikimedia.org/2620:0:861:2:208:80:154:137:29418) - failed (BindException) to bind: Address already in use [16:50:29] what does netstat say is using i? [16:50:31] yeah it has two ipv6 :/ [16:50:41] listenAddress = [2620:0:861:2:208:80:154:137]:29418 [16:50:41] listenAddress = [2620:0:861:2:208:80:154:137]:29418 [16:50:46] I thought that got fixed yesterday? [16:50:51] i thought the same. [16:51:03] PROBLEM - SSH access on gerrit1001 is CRITICAL: connect to address 208.80.154.137 and port 29418: Connection refused https://wikitech.wikimedia.org/wiki/Gerrit [16:51:12] Fixing manually in the meantime [16:51:25] how does this not happen on codfw first? [16:52:03] RECOVERY - SSH access on gerrit1001 is OK: SSH OK - GerritCodeReview_3.2.11 (APACHE-SSHD-2.4.0) (protocol 2.0) https://wikitech.wikimedia.org/wiki/Gerrit [16:52:11] maybe puppet fails to regenerate the config file [16:52:19] Plugins list is much bigger now. [16:52:31] we fixed the file manually yesterday too, so puppet re-broke it [16:52:52] 10Release-Engineering-Team, 10Growth-Team, 10StructuredDiscussions, 10Patch-For-Review, and 2 others: Core tests failing due to Flow HTTP requests and ServiceContainer access - https://phabricator.wikimedia.org/T287001 (10cscott) I *believe* that the failure mode here is (a) the final integrated test on ma... [16:53:18] even though the puppet compiler was all happy about it bah [16:53:25] can we try this on gerrit-replica? edit config, run puppet to see if it adds it again ? [16:53:45] I'll try that. [16:53:58] +1 [16:54:35] hieradata/hosts/gerrit1001.yaml:profile::gerrit::ipv4: '208.80.154.137' [16:54:35] hieradata/hosts/gerrit2001.yaml:profile::gerrit::ipv4: '208.80.153.107' [16:54:42] (03PS12) 10C. 
Scott Ananian: Zuul: Add Parsoid to the gatedextensions list [integration/config] - 10https://gerrit.wikimedia.org/r/655695 [16:54:44] so gerrit-replica's config is bad too.. I think it just hasn't been restarted since it managed to start up right the other day [16:54:44] (03PS1) 10C. Scott Ananian: Run extension-gate jobs experimentally for Parsoid [integration/config] - 10https://gerrit.wikimedia.org/r/705966 (https://phabricator.wikimedia.org/T287001) [16:54:46] that is what we should have [16:54:46] (03PS1) 10C. Scott Ananian: Parsoid: Turn on extension-gate jobs [integration/config] - 10https://gerrit.wikimedia.org/r/705967 [16:55:12] that's right, puppet will not restart gerrit [16:55:26] 10Release-Engineering-Team, 10Growth-Team, 10StructuredDiscussions, 10Patch-For-Review, and 2 others: Core tests failing due to Flow HTTP requests and ServiceContainer access - https://phabricator.wikimedia.org/T287001 (10cscott) @Zabe @Urbanecm Thanks. Examination shows that's still the same set of Flow... [16:56:07] (03CR) 10C. Scott Ananian: "This should be safe, and will help us look into the Flow issues in T287001." [integration/config] - 10https://gerrit.wikimedia.org/r/705966 (https://phabricator.wikimedia.org/T287001) (owner: 10C. Scott Ananian) [16:56:08] dancy: but if you manually remove one of the 2 lines, does puppet add it back (still, after the change from yesterday) or not? [16:56:28] (03CR) 10jerkins-bot: [V: 04-1] Parsoid: Turn on extension-gate jobs [integration/config] - 10https://gerrit.wikimedia.org/r/705967 (owner: 10C. Scott Ananian) [16:56:31] 10Gerrit, 10Release-Engineering-Team, 10User-brennen: Gerrit download dialog is missing branch info after upgrade to 3.2.11 - https://phabricator.wikimedia.org/T287099 (10hashar) 05Open→03Resolved a:03dancy We were missing the Gerrit bundled plugins after I have updated its-phabricator earlier today.... [16:56:32] I'll manually fix gerrit.config and see if it reverts. 
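The doubled `listenAddress` line pasted at 16:50:41 is the kind of thing a small check could catch before a restart. Below is a minimal sketch, not something used in the channel: it counts identical `key = value` settings per section in git-config style text. The `[sshd]` section name and the `threads = 8` line in the sample are assumptions added for illustration; only the `listenAddress` value comes from the log.

```python
from collections import Counter


def duplicate_settings(config_text: str) -> dict:
    """Return settings that appear more than once within the same section."""
    counts = Counter()
    section = ""
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blanks and comments
        if line.startswith("["):
            section = line  # remember the current section header
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line
        counts[f"{section} {key.strip()} = {value.strip()}"] += 1
    return {setting: n for setting, n in counts.items() if n > 1}


# Sample text: section name and the threads line are assumed, not from the log.
sample = """\
[sshd]
	listenAddress = [2620:0:861:2:208:80:154:137]:29418
	listenAddress = [2620:0:861:2:208:80:154:137]:29418
	threads = 8
"""
```

Calling `duplicate_settings(sample)` reports the repeated address with a count of 2, which matches the "Address already in use" symptom of binding the same socket twice.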
[16:57:37] re-running puppet [16:58:14] maybe the hiera setting is not properly taken into account, but the puppet compiler definitely showed the update was working [16:58:32] or puppet does not regenerate the file on a hiera change, but that would sound crazy [16:58:35] while re-running puppet I see a bunch of the unexpected changes that we saw on Monday. [16:58:54] that sounds like config was manually edited in the past? [16:59:13] that doesn't seem right. why would those changes recur? [16:59:30] nothing _should_ have changed it in the intervening time [16:59:38] Indeed. [16:59:44] hieradata/role/common/gerrit.yaml:profile::gerrit::config: 'gerrit.config.erb' [16:59:48] ^ same config for all [17:00:00] gerrit.config is still good after the puppet run on gerrit2002. [17:00:08] this is /var/lib/gerrit2/review_site/etc/gerrit.config , right? [17:00:33] dancy: nice, that's something at least [17:00:45] yeah, that's the file [17:00:47] hmm [17:01:01] I saved the old one to gerrit.config.bad [17:01:02] 10Release-Engineering-Team (Doing), 10User-brennen, 10User-greg: Underrun and CI capacity - https://phabricator.wikimedia.org/T244515 (10greg) 05Open→03Declined Stuff (gitlab) happened. [17:01:13] does gerrit itself edit the config when you run some action to update plugins? [17:01:21] Not that I'm aware of. [17:01:23] so maybe gerrit init --batch --install-all-plugins does mangle the gerrit.config :\ [17:01:31] shall we try it? [17:01:35] that ^ what I was thinking.. hmm [17:01:41] the init part [17:01:47] ok, I'll try on the replica [17:01:49] yea, try it on 2001 [17:02:24] (03CR) 10C. Scott Ananian: "> Patch Set 10:" [integration/config] - 10https://gerrit.wikimedia.org/r/655695 (owner: 10C. Scott Ananian) [17:03:31] confirmed.. the init process is changing gerrit.config [17:03:37] :-\ [17:03:59] ok, so.. the process needs to be: downtime, shut down service, run init, run puppet, re-enable [17:04:02] i guess [17:04:02] diff incoming shortly. 
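Confirming that `init` rewrites `gerrit.config` comes down to comparing a saved copy (the `gerrit.config.bad` style snapshot mentioned above) with the file after the run. A minimal sketch of that comparison with Python's `difflib` follows. The function name and the before/after strings are made up for illustration; the real diff is in the pastebin linked in the log and is not reproduced here.

```python
import difflib


def config_diff(before: str, after: str, name: str = "gerrit.config") -> str:
    """Return a unified diff between two versions of a config file."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"{name} (before init)",
        tofile=f"{name} (after init)",
    )
    return "".join(lines)


# Hypothetical contents: a second listenAddress line appears after init.
before = "[sshd]\n\tlistenAddress = [2620:0:861:2:208:80:154:137]:29418\n"
after = before + "\tlistenAddress = [2620:0:861:2:208:80:154:137]:29418\n"
print(config_diff(before, after))
```

An empty string means the run left the file alone, which is the condition the puppet run is expected to restore.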
[17:04:16] (03Abandoned) 10C. Scott Ananian: Parsoid: Turn on extension-gate jobs [integration/config] - 10https://gerrit.wikimedia.org/r/705967 (owner: 10C. Scott Ananian) [17:04:32] (03PS2) 10C. Scott Ananian: Run extension-gate jobs experimentally for Parsoid [integration/config] - 10https://gerrit.wikimedia.org/r/705966 (https://phabricator.wikimedia.org/T287001) [17:04:34] (03PS13) 10C. Scott Ananian: Zuul: Add Parsoid to the gatedextensions list [integration/config] - 10https://gerrit.wikimedia.org/r/655695 [17:04:35] https://www.irccloud.com/pastebin/Wn8E5Gph/gerrit.config.diff [17:05:03] and gerrit init "helpfully" fix the listenAddress great [17:05:20] which would also explain the oddity yesterday [17:05:36] maybe that's even why the DNS name was used in the past [17:05:46] and changing from a fqdn to the ipv4 did not fix anything [17:05:56] but there have also been changes where we tried to sync the generated gerrit config with puppet [17:06:00] (03CR) 10jerkins-bot: [V: 04-1] Run extension-gate jobs experimentally for Parsoid [integration/config] - 10https://gerrit.wikimedia.org/r/705966 (https://phabricator.wikimedia.org/T287001) (owner: 10C. Scott Ananian) [17:06:16] yea, nothing to do with DNS resolution [17:06:58] paladox: ^ did you know that part ? can plugins be installed without that init part? [17:07:27] mutante: we are fleeing to a team meeting. will catch up after that [17:08:10] ok, I can't wait until after your team meeting though [17:08:21] yeh plugins can be installed without the init part. Though if you want to install the default plugins via the war then you have to do it via init i think. [17:08:24] let's keep it disabled, i can add a downtime for the puppet run [17:08:42] paladox: thanks! 
[17:08:45] dancy: ^ [17:09:00] sp yeah my idea was to simply extract the bundled plugins [17:09:58] and `gerrit init --install-all-plugins` sounded appealing [17:10:11] I have totally missed init is used to update the gerrit.config [17:10:49] puppet run alert is in sccheduled downtime now for both gerrit servers until tomorrow [17:10:51] (03PS3) 10C. Scott Ananian: Run extension-gate jobs experimentally for Parsoid [integration/config] - 10https://gerrit.wikimedia.org/r/705966 (https://phabricator.wikimedia.org/T287001) [17:10:53] (03PS14) 10C. Scott Ananian: Zuul: Add Parsoid to the gatedextensions list [integration/config] - 10https://gerrit.wikimedia.org/r/655695 [17:10:57] mutante: thank you! [17:11:36] if it's more urgent you can text me at the number on office wiki [17:11:42] afk for now [17:12:15] then given the plugins are in gerrit.war which is a zip file [17:12:20] we can probablyuse unzip instead :] [17:21:18] unzip -o -j gerrit.war 'WEB-INF/plugins/*' -d plugins [17:21:34] 10Release-Engineering-Team, 10Growth-Team, 10StructuredDiscussions, 10Patch-For-Review, and 2 others: Core tests failing due to Flow HTTP requests and ServiceContainer access - https://phabricator.wikimedia.org/T287001 (10cscott) Brief to-do, to summarize: 1. Revert the original parsoid-related patch in qu... [17:24:45] (03CR) 10C. Scott Ananian: "To clarify: it will probably break Flow's own CI in the same way T287001 does, but it will *not* break the world in the way T287001 did, s" [integration/config] - 10https://gerrit.wikimedia.org/r/705932 (https://phabricator.wikimedia.org/T218534) (owner: 10C. Scott Ananian) [17:31:00] James_F: it should be every 24h, but it's possible something is broken [17:31:17] legoktm: :-( [17:31:41] It's now been well over a week and it's still showing that. [17:32:21] We've just landed an hour ago a patch that bumps mwcs from 36 to 37; if that shows up tomorrow, then it's running but there's a bug with removal or something. Will keep an eye. 
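The `unzip -o -j gerrit.war 'WEB-INF/plugins/*' -d plugins` idea from 17:21:18 avoids `init` entirely, since the war is just a zip archive. The same extraction can be sketched with Python's `zipfile` module. This is an illustration under assumptions, not the deployed procedure: the function name is hypothetical, and it mirrors `-j` (drop directory paths) and `-o` (overwrite) rather than reusing any existing tooling.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_bundled_plugins(war_path, dest_dir, prefix="WEB-INF/plugins/"):
    """Copy every file under `prefix` in the war into dest_dir, flattened."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(war_path) as war:
        for info in war.infolist():
            if info.is_dir() or not info.filename.startswith(prefix):
                continue  # only plugin files, not directory entries
            name = PurePosixPath(info.filename).name  # like unzip -j
            (dest / name).write_bytes(war.read(info))  # like unzip -o
            written.append(name)
    return sorted(written)
```

Because existing files are overwritten, a locally forked jar such as the gitiles one mentioned earlier would still need to be restored afterwards, as with the `git checkout '*.jar'` step discussed above.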
[17:34:41] Active: failed (Result: exit-code) since Fri 2021-07-09 01:03:16 UTC; 1 weeks 5 days ago [17:36:46] restarted and requeued everything [17:39:50] oh ffs [17:39:58] the push worker has been down for a month [17:42:21] spaaaaaaam [17:43:55] https://github.com/postcss/postcss/releases/tag/7.0.36 we'll be getting updates for this [18:14:25] hashar: i thought we removed singleusergroup a while ago? [18:14:32] also plugin-manager seems to be installed [18:15:00] paladox: I guess cause they come from gerritupstream [18:15:35] ok [18:15:58] (03PS1) 10TrainBranchBot: Update state/train-versions.json [tools/release] - 10https://gerrit.wikimedia.org/r/705975 [18:16:00] (03CR) 10TrainBranchBot: [C: 03+2] Update state/train-versions.json [tools/release] - 10https://gerrit.wikimedia.org/r/705975 (owner: 10TrainBranchBot) [18:16:07] we might have to revisit [18:17:08] (03Merged) 10jenkins-bot: Update state/train-versions.json [tools/release] - 10https://gerrit.wikimedia.org/r/705975 (owner: 10TrainBranchBot) [18:17:11] we will see [18:17:13] I am away [18:49:30] (03PS1) 10TrainBranchBot: Update state/train-versions.json [tools/release] - 10https://gerrit.wikimedia.org/r/705977 [18:49:32] (03CR) 10TrainBranchBot: [C: 03+2] Update state/train-versions.json [tools/release] - 10https://gerrit.wikimedia.org/r/705977 (owner: 10TrainBranchBot) [18:50:26] (03Merged) 10jenkins-bot: Update state/train-versions.json [tools/release] - 10https://gerrit.wikimedia.org/r/705977 (owner: 10TrainBranchBot) [18:55:50] (03PS1) 10TrainBranchBot: Update state/train-versions.json [tools/release] - 10https://gerrit.wikimedia.org/r/705981 [18:55:52] (03CR) 10TrainBranchBot: [C: 03+2] Update state/train-versions.json [tools/release] - 10https://gerrit.wikimedia.org/r/705981 (owner: 10TrainBranchBot) [18:56:51] (03Merged) 10jenkins-bot: Update state/train-versions.json [tools/release] - 10https://gerrit.wikimedia.org/r/705981 (owner: 10TrainBranchBot) [18:58:04] thcipriani: $ ./check.sh 
'0a9ecdc0-b6dc-11e8-9d8f-dbc23b470465' [18:58:05] gives me [18:58:18] % Total % Received % Xferd Average Speed Time Time Time Current [18:58:18] 100 2228k 0 2228k 0 0 1267k 0 --:--:-- 0:00:01 --:--:-- 1267k [18:58:18] jq: error (at :5): Cannot iterate over null (null) [18:58:24] I'm on jq 1.6 from homebrew [18:58:38] (it asked for LDAP creds first, which were fine afaik) [18:59:11] I think that hash is meant to be a local json file [18:59:25] if you run "check.sh" without that argument does it give you the same thing? [18:59:27] yeah, the file was created and looks fine (non-empty) [18:59:32] huh [18:59:41] f 2.2M Jul 21 19:57 0a9ecdc0-b6dc-11e8-9d8f-dbc23b470465.json [19:00:16] thcipriani: without argument same issue [19:01:30] If I run `jq -r . 0a9ecdc0-b6dc-11e8-9d8f-dbc23b470465.json` then it prints it back to stdout without errors [19:01:30] hrm, I...had that problem when I did: ./check.sh 0a9ecdc0-b6dc-11e8-9d8f-dbc23b470465.json but not ./check.sh 0a9ecdc0-b6dc-11e8-9d8f-dbc23b470465 [19:01:48] ack, I didn't pass check.sh a file extension [19:03:38] jq -r '.objects[0].attributes.kibanaSavedObjectMeta.searchSourceJSON | fromjson | .filter | .[].meta.alias' < 0a9ecdc0-b6dc-11e8-9d8f-dbc23b470465.json work for you? [19:05:51] i can also take a look at this in a few minutes [19:06:41] !log gitlab1001: running ansible to deploy nginx logging and status changes (T274462, T275170) [19:06:47] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [19:06:47] T275170: Define monitoring for gitlab - https://phabricator.wikimedia.org/T275170 [19:06:47] T274462: Logging for GitLab - https://phabricator.wikimedia.org/T274462 [19:06:51] $ jq -r '.objects[0].attributes.kibanaSavedObjectMeta.searchSourceJSON | fromjson | .filter | .[].meta.alias' 0a9ecdc0-b6dc-11e8-9d8f-dbc23b470465.json [19:06:55] That returns a list of tasks for me [19:09:07] thcipriani: the sed command might be the problem? 
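The jq filter pasted at 19:03:38 can be expressed without any shell pipeline. A minimal Python sketch of that one filter follows; it is not a port of `check.sh`, whose contents are not in the log, and the function name is hypothetical. Missing `meta` or `alias` fields yield `None`, mirroring jq's `null`.

```python
import json


def filter_aliases(export: dict) -> list:
    """Equivalent of the jq path:
    .objects[0].attributes.kibanaSavedObjectMeta.searchSourceJSON
      | fromjson | .filter | .[].meta.alias
    """
    source = export["objects"][0]["attributes"]["kibanaSavedObjectMeta"][
        "searchSourceJSON"
    ]
    search = json.loads(source)  # searchSourceJSON is JSON stored as a string
    return [(f.get("meta") or {}).get("alias") for f in search["filter"]]
```

Given the saved export, something like `filter_aliases(json.load(open("0a9ecdc0-b6dc-11e8-9d8f-dbc23b470465.json")))` should return the same list the jq command prints, without depending on which `sed` is installed.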
[19:09:20] https://paste.toolforge.org/view/a1acbaf7 [19:09:28] it's leaving behind some nulls it seems [19:10:06] the grep after that matches nothing naturally [19:10:19] perhaps some non-portable sed issue [19:10:35] not improbable if you're not using a gnu sed [19:10:39] BSD/Darwin 'sed' [19:10:42] yeah [19:11:47] posix seems to cover sed, so there's probably a portable way to do it [19:12:18] it might also be reasonable to rewrite this script in python rather than expend too much debugging effort on the shell pipeline. [19:12:28] it's kind of on the edge of "make this a real program" already. [19:12:59] as long as there isn't a docker file, I'll run anything [19:13:03] haha [19:13:21] I do realize how backwards that sounds in terms of security, but reality disagrees [19:13:58] you won't get any argument from me. [19:13:59] * thcipriani embeds bitcoin miner [19:14:05] (in check.sh) [19:14:40] little snitch and macos' "system integrity protection" together actually make it somewhat more comfortable to be running CLI stuff knowing it can't (easily) connect to things on the network and can't (easily) read or write to directories outside CWD. [19:15:05] e.g. if I run a shell script reading ~/Documents or ~/Photos, the process is halted with a system native dialog asking for permission [19:15:30] handy [19:16:22] ^ [19:22:48] Anyone know about that TrainBranchBot? I am not sure I get its purpose or why it adds ephemeral commits to mediawiki/tools/release [19:23:12] That's part of the mediawiki container image build process. [19:23:19] You can ignore it. 
And I plan to remove it soon
[19:23:41] Ah excellent 😁
[19:24:50] Thx dancy
[19:24:57] (PS21) Ahmon Dancy: Prototype of incremental image build process [tools/release] - https://gerrit.wikimedia.org/r/705003 (https://phabricator.wikimedia.org/T286505)
[19:25:35] (PS22) Ahmon Dancy: Incremental mediawiki container image build process [tools/release] - https://gerrit.wikimedia.org/r/705003 (https://phabricator.wikimedia.org/T286505)
[19:26:17] (CR) Ahmon Dancy: [C: +1] "Ready for review. You can see this in action at https://releases-jenkins.wikimedia.org/job/new-build-mw-container-image/" [tools/release] - https://gerrit.wikimedia.org/r/705003 (https://phabricator.wikimedia.org/T286505) (owner: Ahmon Dancy)
[20:08:05] Release-Engineering-Team (Doing), Release, Train Deployments: 1.37.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T281158 (thcipriani) a:mmodell→dduvall
[20:08:20] Release-Engineering-Team (Doing), Release, Train Deployments: 1.37.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T281159 (thcipriani) a:dduvall→jeena
[20:09:13] Release-Engineering-Team (Doing), Release, Train Deployments: 1.37.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T281160 (thcipriani) a:jeena→brennen
[20:09:55] Release-Engineering-Team (Doing), Release, Train Deployments: 1.37.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T281161 (thcipriani) p:Triage→Medium a:dancy
[20:16:52] jeena & dduvall: I'd like to have a sit down with y'all to clear up in my head what my next steps are to get Toolhub building images and sticking them in a docker registry. It looks like all 3 of us have some open time tomorrow afternoon. Is it ok with both of you if I send a meet invite?
[20:17:45] bd808: 👍
[20:21:34] sweet.
invite sent
[20:22:18] Release-Engineering-Team, GitLab (Initialization): Establish a routine GitLab deployment / update window - https://phabricator.wikimedia.org/T287117 (brennen)
[20:24:17] (CR) DannyS712: Incremental mediawiki container image build process (9 comments) [tools/release] - https://gerrit.wikimedia.org/r/705003 (https://phabricator.wikimedia.org/T286505) (owner: Ahmon Dancy)
[20:24:19] (CR) Thcipriani: [C: +2] "<3" [tools/release] - https://gerrit.wikimedia.org/r/705772 (https://phabricator.wikimedia.org/T287054) (owner: Reedy)
[20:25:36] (Merged) jenkins-bot: Stop doing deprecated API token calls [tools/release] - https://gerrit.wikimedia.org/r/705772 (https://phabricator.wikimedia.org/T287054) (owner: Reedy)
[20:35:48] Release-Engineering-Team (Doing), GitLab (Initialization), User-brennen: Investigate whether issues, operations, wikis, etc. can be disabled globally on GitLab - https://phabricator.wikimedia.org/T264231 (greg) p:Triage→Medium
[20:36:03] !sal Newest scap deployed to beta cluster
[20:36:03] https://tools.wmflabs.org/sal/releng
[20:36:14] !log Newest scap deployed to beta cluster
[20:36:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:42:41] (CR) Ahmon Dancy: [C: +1] Incremental mediawiki container image build process (1 comment) [tools/release] - https://gerrit.wikimedia.org/r/705003 (https://phabricator.wikimedia.org/T286505) (owner: Ahmon Dancy)
[20:45:06] (PS1) Ahmon Dancy: Update beta cluster deployment notes [tools/scap] - https://gerrit.wikimedia.org/r/706037
[20:47:28] (PS1) Hashar: scap: automatize plugins handling [software/gerrit] (deploy/wmf/stable-3.2) - https://gerrit.wikimedia.org/r/706038
[21:02:39] Gerrit, Release-Engineering-Team: resolve gerrit.config dispredancy between managed config and gerrit init - https://phabricator.wikimedia.org/T287122 (hashar)
[21:05:25] bd808: sounds good.
accepted :)
[21:06:16] !log gitlab1001: running ansible for logging typo fix (T274462)
[21:06:19] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:06:20] T274462: Logging for GitLab - https://phabricator.wikimedia.org/T274462
[21:07:26] (PS23) Ahmon Dancy: Incremental mediawiki container image build process [tools/release] - https://gerrit.wikimedia.org/r/705003 (https://phabricator.wikimedia.org/T286505)
[21:08:21] (CR) jerkins-bot: [V: -1] Incremental mediawiki container image build process [tools/release] - https://gerrit.wikimedia.org/r/705003 (https://phabricator.wikimedia.org/T286505) (owner: Ahmon Dancy)
[21:09:28] (CR) Ahmon Dancy: Incremental mediawiki container image build process (8 comments) [tools/release] - https://gerrit.wikimedia.org/r/705003 (https://phabricator.wikimedia.org/T286505) (owner: Ahmon Dancy)
[21:11:01] Gerrit, Release-Engineering-Team, Patch-For-Review: resolve gerrit.config disprepancy between managed config and gerrit init - https://phabricator.wikimedia.org/T287122 (brennen)
[21:12:10] (PS24) Ahmon Dancy: Incremental mediawiki container image build process [tools/release] - https://gerrit.wikimedia.org/r/705003 (https://phabricator.wikimedia.org/T286505)
[21:13:14] Gerrit, Release-Engineering-Team, Patch-For-Review: resolve gerrit.config disprepancy between managed config and gerrit init - https://phabricator.wikimedia.org/T287122 (hashar) For `sshd.listenAddress`, I am afraid we have to dig into Gerrit core to find out what kind of logic it uses. @Dzahn sugges...
[21:15:56] brennen: dancy: so I did some patches to normalize the gerrit.config to whatever "gerrit init" generates
[21:16:26] which is easy after dancy had the good idea to capture the diff between the puppet managed config and the state after "gerrit init" ran
[21:16:44] Cool. I'll take a look tomorrow. Feel free to nag.
[21:16:55] sshd.listenAddress well ..
I am convinced mutante is right. Setting it to gerrit.wikimedia.org was a hack to work around init resolving it to a v6
[21:17:03] yeah don't worry
[21:17:10] will add you as reviewers tomorrow
[21:17:21] and look at the code for listenAddress and get that fixed
[21:17:31] or well most probably just listen on all IPs
[21:21:04] (CR) DannyS712: Incremental mediawiki container image build process (2 comments) [tools/release] - https://gerrit.wikimedia.org/r/705003 (https://phabricator.wikimedia.org/T286505) (owner: Ahmon Dancy)
[21:22:08] good work
[21:25:24] you know like opensource software when you end up looking up InetSocketAddress.getAddress() implementation
[21:25:44] trying to find how java resolves a fqdn D
[21:30:30] Phabricator, Release-Engineering-Team, Security-Team: Grant access to OIT-LDAP Diffusion repo to contractor - https://phabricator.wikimedia.org/T287124 (sbassett)
[21:30:52] Shawn Pearce 2009-12-17 12:16:45
[21:31:00] so yeah the feature has been around since the early da
[21:31:03] y
[21:31:19] Phabricator, Release-Engineering-Team, Security-Team: Grant access to OIT-LDAP Diffusion repo to contractor - https://phabricator.wikimedia.org/T287124 (sbassett)
[21:41:02] (CR) Legoktm: [C: -1] "I don't think limiting ourselves to 72 characters because of Gerrit makes sense. I don't think https://www.mediawiki.org/w/index.php?diff=" [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/705438 (owner: TK-999)
[22:07:58] Release-Engineering-Team (Next), MW-on-K8s, serviceops: Perform l10n cache rebuild using initContainers instead of including it in the image - https://phabricator.wikimedia.org/T286952 (dduvall) >>! In T286952#7226319, @Joe wrote: > I have one doubt about the idea of using persistent local volumes......
[22:26:52] Release-Engineering-Team, Community-Relations: Expand the list of group 1 wikis to contain at least one (preferably 2) smaller "top ten size" wikis - https://phabricator.wikimedia.org/T286664 (Jdlrobson) Having Italian Wikipedia in group 1 would be wonderful. There are lots of really helpful volunteers t...
[22:37:15] (PS1) RLazarus: zuul: Add a new project for operations/docker-images/imagecatalog [integration/config] - https://gerrit.wikimedia.org/r/706048 (https://phabricator.wikimedia.org/T287130)
[22:39:34] Gerrit, Release-Engineering-Team (Doing), Patch-For-Review: Upgrade Gerrit to 3.2.11 - https://phabricator.wikimedia.org/T278990 (hashar) Follow up for sshd.listenAddress is done via T287122
[23:42:53] Release-Engineering-Team (Doing), Release, Train Deployments: 1.37.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T281158 (Jdlrobson)