[01:01:59] PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:04:11] RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:33:38] Release-Engineering-Team (Next), Release, Train Deployments: 1.38.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T293959 (Ladsgroup) There might be some new logs due to https://gerrit.wikimedia.org/r/751858 being merged, ignore those. I will clean them up.
[10:32:59] (CR) Hashar: [C: +2] Add EpicPupper to CI allowlist [integration/config] - https://gerrit.wikimedia.org/r/752327 (owner: AntiCompositeNumber)
[10:33:04] (CR) Hashar: [C: +2] Add Operator873 to the zuul allowlist [integration/config] - https://gerrit.wikimedia.org/r/752354 (owner: Zabe)
[10:33:09] (CR) Hashar: [C: +2] Add Thomas-topway-it to the CI allowlist [integration/config] - https://gerrit.wikimedia.org/r/752997 (owner: Zoranzoki21)
[10:34:56] (Merged) jenkins-bot: Add EpicPupper to CI allowlist [integration/config] - https://gerrit.wikimedia.org/r/752327 (owner: AntiCompositeNumber)
[10:34:58] (Merged) jenkins-bot: Add Operator873 to the zuul allowlist [integration/config] - https://gerrit.wikimedia.org/r/752354 (owner: Zabe)
[10:35:00] (Merged) jenkins-bot: Add Thomas-topway-it to the CI allowlist [integration/config] - https://gerrit.wikimedia.org/r/752997 (owner: Zoranzoki21)
[10:35:55] (CR) Hashar: [C: +2] inference: add publishing pipelines for postmerge [integration/config] - https://gerrit.wikimedia.org/r/752714 (https://phabricator.wikimedia.org/T297823) (owner: Accraze)
[10:36:35] (CR) Hashar: "Deployed!" [integration/config] - https://gerrit.wikimedia.org/r/752997 (owner: Zoranzoki21)
[10:36:38] (CR) Hashar: "Deployed!"
[integration/config] - https://gerrit.wikimedia.org/r/752354 (owner: Zabe)
[10:36:42] (CR) Hashar: "Deployed!" [integration/config] - https://gerrit.wikimedia.org/r/752327 (owner: AntiCompositeNumber)
[10:38:23] (Merged) jenkins-bot: inference: add publishing pipelines for postmerge [integration/config] - https://gerrit.wikimedia.org/r/752714 (https://phabricator.wikimedia.org/T297823) (owner: Accraze)
[10:40:31] (CR) Hashar: "I have deployed the 14 new Jenkins jobs and manually triggered a build against the last changed that touched .pipeline/config.yaml https:/" [integration/config] - https://gerrit.wikimedia.org/r/752714 (https://phabricator.wikimedia.org/T297823) (owner: Accraze)
[10:53:10] Release-Engineering-Team, Scap, serviceops: Deploy Scap version 4.1.1 - https://phabricator.wikimedia.org/T298986 (Joe) p:Triage→Medium
[11:12:44] Continuous-Integration-Infrastructure, Browser-Tests, User-zeljkofilipin: specFileRetries does not seem to apply in extension wdio.conf.js - https://phabricator.wikimedia.org/T296826 (zeljkofilipin) >>! In T296826#7597944, @kostajh wrote: > @zeljkofilipin this will probably be useful for T285649, do...
[11:13:12] Continuous-Integration-Infrastructure, Browser-Tests, User-zeljkofilipin: specFileRetries does not seem to apply in extension wdio.conf.js - https://phabricator.wikimedia.org/T296826 (zeljkofilipin) a:zeljkofilipin→None
[11:15:45] Continuous-Integration-Infrastructure, Browser-Tests, User-zeljkofilipin: specFileRetries does not seem to apply in extension wdio.conf.js - https://phabricator.wikimedia.org/T296826 (zeljkofilipin) >>! In T296826#7540986, @kostajh wrote: > @zeljkofilipin no rush. I can also publish a release if you...
[12:22:23] Gerrit: fatal: fetch-pack: invalid index-pack output on git fetch for mediawiki/core - https://phabricator.wikimedia.org/T298967 (Kizule) I did `git fetch && git checkout -B master origin/master && git rebase` again, and I had this issue again. After three tries, it worked.
[12:41:59] Phabricator: Delete Selenium tests in phabricator/deployment - https://phabricator.wikimedia.org/T299047 (zeljkofilipin)
[12:47:00] Phabricator: Delete Selenium tests in phabricator/deployment - https://phabricator.wikimedia.org/T299047 (zeljkofilipin)
[12:47:36] Phabricator, User-zeljkofilipin: Delete Selenium tests in phabricator/deployment - https://phabricator.wikimedia.org/T299047 (zeljkofilipin) p:Triage→Medium
[12:49:55] Phabricator, serviceops, Patch-For-Review: move phabricator to new hardware generation - https://phabricator.wikimedia.org/T280597 (LSobanski) Should be preceded by https://phabricator.wikimedia.org/T296022.
[12:58:45] Phabricator, Patch-For-Review, User-zeljkofilipin: Delete Selenium tests in phabricator/deployment - https://phabricator.wikimedia.org/T299047 (zeljkofilipin) a:zeljkofilipin→None
[13:01:16] Phabricator, Patch-For-Review, User-zeljkofilipin: Delete Selenium tests in phabricator/deployment - https://phabricator.wikimedia.org/T299047 (zeljkofilipin) Open→Resolved a:zeljkofilipin
[13:36:41] (PS1) Hashar: jjb: document docker-run environment parameter [integration/config] - https://gerrit.wikimedia.org/r/753454
[14:05:10] (PS1) Hashar: jjb: normalize usage of docker run --workdir [integration/config] - https://gerrit.wikimedia.org/r/753457
[14:05:12] (PS1) Hashar: jjb: add 'working' to Jinja2 docker macro [integration/config] - https://gerrit.wikimedia.org/r/753458
[14:10:53] Beta-Cluster-Infrastructure, SRE, Traffic: Make varnish-frontend-restart work on Beta Cluster - https://phabricator.wikimedia.org/T299054 (Majavah)
[15:03:53] PROBLEM - Check systemd
state on contint2001 is CRITICAL: CRITICAL - degraded: The following units failed: helm-repo-update.timer https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:08:38] ^ I'll take a look
[15:10:23] RECOVERY - Check systemd state on contint2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:17:22] is jenkins in some weird state? cherry picked a commit for wmf18 and there's no jenkins activity https://gerrit.wikimedia.org/r/c/mediawiki/core/+/753085
[15:27:12] nm it just showed up
[15:46:24] ftr: I removed helm2 from contint1001 and contint2001, see T251305 for more detail
[15:46:25] T251305: Migrate to helm v3 - https://phabricator.wikimedia.org/T251305
[16:12:53] I have to pause shared GitLab Runners for a short amount of time (around 15m)
[16:20:44] Release-Engineering-Team (Next), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T293958 (daniel) >>! In T293958#7614436, @dduvall wrote: > I did see a small number of replication lag related errors today following group0 deploym...
[16:57:32] (PS1) 20after4: New scap saying: someone copying and pasting [tools/scap] - https://gerrit.wikimedia.org/r/753517
[16:59:36] (CR) Ahmon Dancy: [C: +2] New scap saying: someone copying and pasting [tools/scap] - https://gerrit.wikimedia.org/r/753517 (owner: 20after4)
[17:00:31] (Merged) jenkins-bot: New scap saying: someone copying and pasting [tools/scap] - https://gerrit.wikimedia.org/r/753517 (owner: 20after4)
[17:01:34] bd808: somehow Randall Munroe is channeling the flying pig: https://xkcd.com/2565/
[17:02:12] twentyafterfour: heh.
when I saw that one I also thought of WMCS :)
[17:02:24] (CR) Hashar: [C: +2] Add 'parsoid' to the dependencies of 'Translate' [integration/config] - https://gerrit.wikimedia.org/r/753097 (https://phabricator.wikimedia.org/T295170) (owner: Isabelle Hurbain-Palatin)
[17:04:20] (Merged) jenkins-bot: Add 'parsoid' to the dependencies of 'Translate' [integration/config] - https://gerrit.wikimedia.org/r/753097 (https://phabricator.wikimedia.org/T295170) (owner: Isabelle Hurbain-Palatin)
[17:32:23] (PS4) Ahmon Dancy: scap/git.py: Replace get_disclosable_head with git merge-base HEAD origin [tools/scap] - https://gerrit.wikimedia.org/r/753177
[17:41:35] Continuous-Integration-Infrastructure, Release-Engineering-Team (Radar), SRE, ops-codfw, serviceops-radar: contint2001.mgmt disappeared from Icinga - https://phabricator.wikimedia.org/T298861 (herron)
[17:43:01] (CR) Hashar: [C: +2] jjb: document docker-run environment parameter [integration/config] - https://gerrit.wikimedia.org/r/753454 (owner: Hashar)
[17:43:59] (CR) Hashar: "It is a noop in job configuration after parent change https://gerrit.wikimedia.org/r/c/integration/config/+/753457/ \o/" [integration/config] - https://gerrit.wikimedia.org/r/753458 (owner: Hashar)
[17:46:01] (Merged) jenkins-bot: jjb: document docker-run environment parameter [integration/config] - https://gerrit.wikimedia.org/r/753454 (owner: Hashar)
[18:14:03] Phabricator: Editing a maniphest form has wrong preamble - https://phabricator.wikimedia.org/T295934 (mmodell) Open→Resolved a:mmodell
[18:58:19] !log Applied plugins update to https://releases-jenkins.wikimedia.org/
[18:58:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:07:03] dancy: :] thanks for all the work on scap auto!
[19:08:44] I aim to please
[19:08:48] Continuous-Integration-Infrastructure, Jenkins, SecTeam-Processed, Security: 2022-01-12 Jenkins security advisory pre-announcement - https://phabricator.wikimedia.org/T298691 (hashar)
[19:09:22] Continuous-Integration-Infrastructure, Jenkins, SecTeam-Processed, Security: 2022-01-12 Jenkins security advisory pre-announcement - https://phabricator.wikimedia.org/T298691 (hashar)
[19:16:39] (CR) 20after4: [C: +1] "I'd like to see a quick rollback feature and we will need to lock down the staging directory on deployment servers... but with those cavea" [tools/scap] - https://gerrit.wikimedia.org/r/753103 (owner: Ahmon Dancy)
[19:32:40] I am going to upgrade the CI Jenkins for T298691
[19:32:40] T298691: 2022-01-12 Jenkins security advisory pre-announcement - https://phabricator.wikimedia.org/T298691
[19:55:35] Continuous-Integration-Infrastructure, Release-Engineering-Team (Radar), SRE, ops-codfw, serviceops-radar: contint2001.mgmt disappeared from Icinga - https://phabricator.wikimedia.org/T298861 (Papaul) The IDRAC on this server needs reset. Please coordinate a day and time that is best for this...
[20:05:44] Continuous-Integration-Infrastructure, Jenkins, SecTeam-Processed, Security: 2022-01-12 Jenkins security advisory pre-announcement - https://phabricator.wikimedia.org/T298691 (hashar) Open→Resolved a:hashar The master node got renamed to `(built-in)` https://integration.wikimedia.org/...
[20:13:19] Release-Engineering-Team (Next), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T293958 (dduvall) Thanks for looking into that, @daniel ! I always err greatly on the side of paranoia during train.
:)
[20:34:15] (PS5) Ahmon Dancy: scap/git.py: Replace get_disclosable_head with git merge-base HEAD origin [tools/scap] - https://gerrit.wikimedia.org/r/753177
[20:59:48] Release-Engineering-Team (Next), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T293958 (dduvall)
[21:04:35] (CR) 20after4: [C: +1] "Seems like an improvement" [tools/scap] - https://gerrit.wikimedia.org/r/753177 (owner: Ahmon Dancy)
[21:04:54] Release-Engineering-Team (Next), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T293958 (dduvall) >>! In T293958#7612230, @daniel wrote: > ##### Risky Patch! 🚂🔥 > > * **Change**: https://gerrit.wikimedia.org/r/c/mediawiki/cor...
[21:05:55] (PS6) Ahmon Dancy: scap/git.py: Replace get_disclosable_head with git merge-base HEAD origin [tools/scap] - https://gerrit.wikimedia.org/r/753177