[07:41:27] !log experimenting with PKI and kafka logging on deployment-prep, logstash dashboard/traffic may be down (please ping me in case it is a problem)
[07:41:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[08:10:40] (CR) Hashar: [C: +2] "I am updating the jobs to Quibble 1.4.4 :]" [integration/config] - https://gerrit.wikimedia.org/r/771640 (https://phabricator.wikimedia.org/T300340) (owner: Hashar)
[08:12:38] (Merged) jenkins-bot: jjb: switch jobs to Quibble 1.4.4 [integration/config] - https://gerrit.wikimedia.org/r/771640 (https://phabricator.wikimedia.org/T300340) (owner: Hashar)
[08:34:18] (PS1) Kosta Harlan: jjb: Remove parallel-npm-install flag [integration/config] - https://gerrit.wikimedia.org/r/771820 (https://phabricator.wikimedia.org/T303270)
[08:38:58] (CR) Hashar: [C: +2] jjb: Remove parallel-npm-install flag [integration/config] - https://gerrit.wikimedia.org/r/771820 (https://phabricator.wikimedia.org/T303270) (owner: Kosta Harlan)
[08:40:54] (Merged) jenkins-bot: jjb: Remove parallel-npm-install flag [integration/config] - https://gerrit.wikimedia.org/r/771820 (https://phabricator.wikimedia.org/T303270) (owner: Kosta Harlan)
[08:41:07] (Abandoned) Kosta Harlan: jjb: Bump Quibble jobs to use memcached [integration/config] - https://gerrit.wikimedia.org/r/770468 (https://phabricator.wikimedia.org/T300340) (owner: Kosta Harlan)
[08:41:31] kostajh: oops sorry for the duplicate change :D
[08:41:37] I have deployed the parallel npm install
[08:42:19] hashar: cool, thank you
[08:43:09] I did spend a few hours trying to reproduce the Parallel deadlock but could not :-\
[08:49:18] sounds painful :(
[09:03:05] Release-Engineering-Team, Scap, serviceops: Deploy Scap version 4.5.0 - https://phabricator.wikimedia.org/T304134 (jnuche)
[09:08:13] (PS1) Kosta Harlan: parameter_functions: Remove parsoid as a dependency for GrowthExperiments [integration/config] - https://gerrit.wikimedia.org/r/771827
[09:18:42] (CR) Jaime Nuche: [C: +2] Increase robustness of DeployPromote._get_train_task() [tools/scap] - https://gerrit.wikimedia.org/r/771748 (https://phabricator.wikimedia.org/T302488) (owner: Ahmon Dancy)
[09:19:55] hashar: please let me know if you deploy https://gerrit.wikimedia.org/r/771827 so I can do a recheck on a GrowthExperiments patch, it's possible it will need to be rolled back
[09:20:38] (CR) Hashar: [C: +2] parameter_functions: Remove parsoid as a dependency for GrowthExperiments [integration/config] - https://gerrit.wikimedia.org/r/771827 (owner: Kosta Harlan)
[09:20:49] kostajh: doing :]
[09:20:54] (Merged) jenkins-bot: Increase robustness of DeployPromote._get_train_task() [tools/scap] - https://gerrit.wikimedia.org/r/771748 (https://phabricator.wikimedia.org/T302488) (owner: Ahmon Dancy)
[09:20:59] Cscott mentioned the issue on Slack as well
[09:23:06] (Merged) jenkins-bot: parameter_functions: Remove parsoid as a dependency for GrowthExperiments [integration/config] - https://gerrit.wikimedia.org/r/771827 (owner: Kosta Harlan)
[09:30:39] hashar: btw there are ParserTests failing again (https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-noselenium-docker/141634/console) which is what prompted me to make this patch :)
[09:30:50] hashar: is it deployed now? should I do a recheck?
[09:32:57] kostajh: deployed, sorry I am in a call :D
[09:33:06] hmm
[09:33:10] deployed now
[09:35:52] cscott talked about that parsoid / growthexperiment issue on a slack thread
[09:36:25] my understanding is the team is willing to fix the situation by having parsoid enabled from mediawiki/core and using the version from composer/vendor.git
[09:36:48] instead of relying on the hack of injecting parsoid.git in CI in order to have Quibble enable it
[09:37:09] then of course, back compatibility is complicated :-\
[09:37:40] (Queue (Jenkins jobs + Zuul functions) alert) firing: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org
[09:38:07] https://app.slack.com/client/T024KLHS4/C024Z8K9CAU/thread/C024Z8K9CAU-1647378000.085489 is the thread
[09:38:43] hmm maybe that was a different issue bah
[09:51:55] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [400.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[09:52:31] running a recheck now
[09:52:53] that Zuul alarm is due to a ContentTranslation chain of patches
[10:04:49] yeah, geez
[10:06:59] Project-Admins: Create a project tag for "Inguma Wikibase" project - https://phabricator.wikimedia.org/T304143 (DL2204)
[10:17:05] Project-Admins: Create a project tag for "Inguma Wikibase" project - https://phabricator.wikimedia.org/T304143 (DL2204)
[10:23:25] hashar: looks like something is still pulling in parsoid https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-selenium-docker/108299/consoleFull
[10:27:19] I have no clue what is doing that, though
[10:27:40] (Queue (Jenkins jobs + Zuul functions) alert) firing: (2) Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org
[10:31:50] Project-Admins: Create a project tag for "Inguma Wikibase" project - https://phabricator.wikimedia.org/T304143 (DL2204) Open→Resolved a:DL2204 We will use existing project tag #wikimedia-user-group-basque
[10:33:01] we have some failing Wikibase builds that are probably due to the new Quibble version: https://phabricator.wikimedia.org/T304147
[10:33:14] (also, shouldn’t wikibugs have mentioned the task in here? did we forget a relevant tag?)
[10:36:03] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [200.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[10:37:27] Lucas_WMDE: I don't think that goes to here
[10:37:35] Probably best to ask quibble or releng team
[10:37:39] I thought it used to, but I might be wrong
[10:37:59] (this is the releng channel, right? my IRC client isn’t lying to me? ^^)
[10:38:47] Lucas_WMDE: ye it is
[10:38:53] * RhinosF1 is glad it's Friday too
[10:42:40] (Queue (Jenkins jobs + Zuul functions) alert) resolved: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org
[10:44:57] Lucas_WMDE: it probably needs one of the actual CI or releng projects for it to be reported in here, not just the "tag" project
[10:45:34] according to the wikibugs config there’s a separate quibble channel that would get notified if the quibble tag was added
[10:45:35] Continuous-Integration-Config, Wikidata, wdwb-tech, ci-test-error (WMF-deployed Build Failure): jenkis CI wikibase-repo-docker failing with new quibble version - https://phabricator.wikimedia.org/T304147 (Peachey88)
[10:45:43] if we’re sufficiently confident it’s quibble then maybe I should add that tag
[10:45:45] ah ok ^^
[10:49:37] ^ #wikimedia-quibble
[10:50:39] Continuous-Integration-Config, Quibble, Wikidata, wdwb-tech, ci-test-error (WMF-deployed Build Failure): jenkis CI wikibase-repo-docker failing with new quibble version - https://phabricator.wikimedia.org/T304147 (kostajh) Seems related to {T300340}. No idea why it's happening for just these...
[10:51:35] Lucas_WMDE: does Wikibase have some cache overrides in a LocalSettings file or something like that?
[10:52:35] no idea
[10:52:35] hashar: the postmerge job for trigger-research-mwaddlink-pipeline-publish (https://gerrit.wikimedia.org/r/c/research/mwaddlink/+/771828) has been queued for 55 minutes, I guess it got stuck when all the ContentTranslation patches went through. Is there a way to kick it to start again?
[10:56:16] kostajh: some of the LocalSettings of those builds is generated by https://gerrit.wikimedia.org/g/mediawiki/extensions/Wikibase/+/bfe65fa25aab140e839ed22b3797be8bfd199578/build/ci-scripts/mw-apply-wb-settings.sh if that helps
[10:56:45] thanks, I don't see anything problematic there
[10:57:29] (PS1) Kosta Harlan: Revert "jjb: switch jobs to Quibble 1.4.4" [integration/config] - https://gerrit.wikimedia.org/r/771724 (https://phabricator.wikimedia.org/T304147)
[11:04:31] Release-Engineering-Team (🚂🧪 Trainsperiment Week), Performance-Team (Radar): Talk to Performance team and ServiceOps about caching for train experiment week - https://phabricator.wikimedia.org/T303758 (akosiaris) Talked a bit with the team, we don't think opcache wise we see any big risk with going from...
[12:10:35] Project-Admins: Create a project tag for "Inguma Wikibase" project - https://phabricator.wikimedia.org/T304143 (Aklapper) Resolved→Declined a:DL2204→None
[12:59:47] kostajh: Lucas_WMDE; I was busy lunching with kids
[13:00:36] kostajh: so `parsoid` is added as a dependency of multiple repositories. If GrowthExperiments depends on one of those it would end up having parsoid as well :-\
[13:01:35] hashar: yeah, I just couldn't find which one :|
[13:02:17] https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-selenium-docker/108299/consoleFull shows how much of a madness our CI tests are :-\
[13:02:33] it depends on soooo maany extensions
[13:02:58] 'Disambiguator': ['VisualEditor', 'parsoid'],
[13:05:26] I guess we can use a util script which, given a repo, yields the dependency tree
[13:06:08] so
[13:06:45] GrowthExperiments > PageViewInfo > Graph > JsonConfig > Kartographer > parsoid
[13:06:48] that is one of them
[13:08:26] Lucas_WMDE: I have pushed an update of Quibble 1.4.4 this morning indeed
[13:08:29] AND
[13:08:31] I did test it
[13:08:46] but went into a meeting and did not check the result
[13:08:50] the dummy change was https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/767776
[13:09:07] ah, I see
[13:09:15] someone on our end commented that had been the first failed change
[13:09:15] LogicException from line 408 of /workspace/src/includes/cache/MessageCache.php: Process cache for 'en' should be set by now.
[13:09:16] fun
[13:09:31] and I was wondering why someone had even submitted a Jenkins job validation change without a prior failure ^^
[13:10:09] I am pretty sure we generate the l10n cache in quibble
[13:10:14] but maybe that happens AFTER update.php
[13:12:04] grblblbl
[13:12:40] it looks like the Language was only used at the end of update.php too
[13:12:49] to localize the “in X seconds” message, I think?
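The util script suggested at 13:05:26 could look roughly like the sketch below: a breadth-first search over a repo-to-dependencies map that returns the shortest chain pulling in a given repo. Only the 'Disambiguator' entry and the GrowthExperiments chain come from the log; the map name `DEPENDENCIES`, the assumption that each `>` in the chain is a direct dependency, and the function name `find_dependency_path` are illustrative, not the actual contents of parameter_functions.

```python
from collections import deque

# Hypothetical excerpt of a dependency map. The 'Disambiguator' entry is quoted
# in the log; the other edges are assumed from the chain posted at 13:06:45.
DEPENDENCIES = {
    'Disambiguator': ['VisualEditor', 'parsoid'],
    'GrowthExperiments': ['PageViewInfo'],
    'PageViewInfo': ['Graph'],
    'Graph': ['JsonConfig'],
    'JsonConfig': ['Kartographer'],
    'Kartographer': ['parsoid'],
}


def find_dependency_path(deps, start, target):
    """Return the shortest chain of repos from start to target, or None."""
    queue = deque([[start]])
    seen = {start}  # guards against cycles in the dependency map
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for dep in deps.get(path[-1], []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None


if __name__ == '__main__':
    path = find_dependency_path(DEPENDENCIES, 'GrowthExperiments', 'parsoid')
    print(' > '.join(path) if path else 'no path found')
```

Breadth-first search is used here so the first chain found is also the shortest one, which is usually the easiest edge to cut.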
[13:14:01] then we run update.php in all/most jobs
[13:14:09] so I don't get why some jobs pass
[13:15:09] I think they fail cause they invoke a specific command using `quibble --command`
[13:20:58] Release-Engineering-Team (Next), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.26 deployment blockers - https://phabricator.wikimedia.org/T300202 (Krinkle)
[13:21:19] I guess I will rollback
[13:21:52] (CR) Hashar: [C: +2] Revert "jjb: switch jobs to Quibble 1.4.4" [integration/config] - https://gerrit.wikimedia.org/r/771724 (https://phabricator.wikimedia.org/T304147) (owner: Kosta Harlan)
[13:22:34] !log Rolling back Quibble jobs from 1.4.4 T304147
[13:22:36] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:22:36] T304147: jenkis CI wikibase-repo-docker failing with new quibble version - https://phabricator.wikimedia.org/T304147
[13:22:37] kostajh: thx for the patch
[13:22:42] thanks!
[13:22:42] and of course
[13:22:47] I can't reproduce locally :-\\\
[13:22:56] at least not with just mediawiki/core
[13:23:12] hashar: oh, is it because we are not starting supervisord for those jobs?
[13:23:16] what I don't get is that iirc update.php explicitly sets the localisation cache to be empty
[13:23:45] kostajh: maybe, then nothing should be hitting the web service until AFTER update.php has run
[13:24:12] (Merged) jenkins-bot: Revert "jjb: switch jobs to Quibble 1.4.4" [integration/config] - https://gerrit.wikimedia.org/r/771724 (https://phabricator.wikimedia.org/T304147) (owner: Kosta Harlan)
[13:24:30] yeah but we have LocalSettings.php saying to use memcache because the extension is loaded, but the service isn't started, and so the update.php process fails
[13:24:51] OH
[13:26:06] yeah that would be it
[13:26:10] * Reedy grins
[13:33:31] kostajh: Lucas_WMDE: it is rolling back still
[13:34:39] Release-Engineering-Team, PageCuration, Growth-Team (Current Sprint), Patch-For-Review, and 2 others: quibble-vendor-mysql-php72-noselenium-docker fails for a noop PageTriage patch - https://phabricator.wikimedia.org/T303092 (kostajh)
[13:35:27] Release-Engineering-Team, PageCuration, Growth-Team (Current Sprint), Patch-For-Review, and 2 others: quibble-vendor-mysql-php72-noselenium-docker fails for a noop PageTriage patch - https://phabricator.wikimedia.org/T303092 (kostajh) a:kostajh
[13:42:42] (Queue (Jenkins jobs + Zuul functions) alert) firing: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org
[13:48:34] Release-Engineering-Team (Next), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.26 deployment blockers - https://phabricator.wikimedia.org/T300202 (hashar) Open→Resolved Looks like it is a success!
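The failure mode described at 13:24:30 (LocalSettings.php configured for memcached while the service is not running) could in principle be caught by a preflight check before update.php runs. A minimal sketch, assuming memcached would listen on localhost at its default port 11211; `service_is_listening` is a hypothetical helper and not part of Quibble:

```python
import socket


def service_is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == '__main__':
    # 11211 is memcached's default port; the host is an assumption.
    if not service_is_listening('127.0.0.1', 11211):
        print('memcached is not reachable; update.php may fail')
```

This only checks that something accepts TCP connections on the port; it does not verify that the listener actually speaks the memcached protocol.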
[13:54:57] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [400.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[13:56:12] Continuous-Integration-Config, Quibble, Wikidata, wdwb-tech, and 2 others: jenkis CI wikibase-repo-docker failing with new quibble version - https://phabricator.wikimedia.org/T304147 (kostajh) >>! In T304147#7788130, @Silvan_WMDE wrote: > Failing jobs: > https://integration.wikimedia.org/ci/job/l...
[14:11:03] T304156 - worthy of a train rollback?
[14:11:03] T304156: InvalidArgumentException when trying to move a page in the german language version - https://phabricator.wikimedia.org/T304156
[14:12:42] (Queue (Jenkins jobs + Zuul functions) alert) firing: (2) Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org
[14:16:00] I would say so
[14:17:13] Well, that bug has been duped.. :P
[14:18:46] Release-Engineering-Team (Next), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.26 deployment blockers - https://phabricator.wikimedia.org/T300202 (Zabe)
[14:18:51] !log restart testing of kafka logging TLS certificates (may affect logstash in beta, ping me in case it is a problem)
[14:18:53] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:19:10] and has been reported by users on multiple wikis
[14:20:23] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [200.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[14:26:22] kostajh: thank you for finding out we lacked memcached due to not running supervisord!
[14:26:25] huge time saver :]
[14:27:00] will check later :)
[14:32:42] (Queue (Jenkins jobs + Zuul functions) alert) resolved: Queue (Jenkins jobs + Zuul functions) alert - https://alerts.wikimedia.org
[14:35:44] Release-Engineering-Team (Next), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.26 deployment blockers - https://phabricator.wikimedia.org/T300202 (Ladsgroup)
[15:26:09] Continuous-Integration-Config, Quibble, Wikidata, wdwb-tech, ci-test-error (WMF-deployed Build Failure): jenkis CI wikibase-repo-docker failing with new quibble version - https://phabricator.wikimedia.org/T304147 (Lucas_Werkmeister_WMDE) Looks like CI is working again with the Quibble revert.
[15:28:09] Release-Engineering-Team (🚂🧪 Trainsperiment Week): Delete wmf branches from Gerrit repositories - https://phabricator.wikimedia.org/T303828 (hashar)
[18:09:55] at least I have managed to do a patch for make-release/branch.py
[18:10:18] add test for delete_branch https://gerrit.wikimedia.org/r/771955
[18:10:29] abort branch deletion if tag failed https://gerrit.wikimedia.org/r/771956
[18:12:27] end of the day