[01:14:54] PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:15:58] RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:44:14] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Release, Train Deployments: 1.38.0-wmf.9 seems to have introduced a memory leak - https://phabricator.wikimedia.org/T296098 (Ladsgroup) >>! In T296098#7518943, @daniel wrote: > > Hold on, now I'm confused. My take it that wmf.9 was rolled back because...
[07:58:24] Beta-Cluster-Infrastructure, Release-Engineering-Team, Scap, Infrastructure-Foundations, Puppet: Fatal error: Uncaught ConfigException: Failed to load configuration from etcd - https://phabricator.wikimedia.org/T296125 (Majavah) >>! In T296125#7518549, @AlexisJazz wrote: >>>! In T296125#75183...
[08:53:19] hey releng, do we know what's the fate of wmf.9 train? Is it likely it will be re-deployed today, or should i not count with that? 🙂 Thanks!
[08:53:58] hashar: I just enabled CSP on both GitLab instances. Sorry for the delay, I was a bit busy last week. Feel free to ping/tag me in case anything comes up. For me GitLab instances look fine
[08:54:26] (cc hashar who offered his hands at https://phabricator.wikimedia.org/T293950#7517015)
[09:00:25] bonjour
[09:00:50] jelto: awesome thank you! I will check the reports in Kibana and follow up from there
[09:01:18] jelto: and no worries about the delay :-]
[09:01:25] urbanecm: I am there and warming up ;)
[09:01:53] that's great hashar :).
[09:01:56] hashar: thank you too :)
[09:17:18] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (hashar)
[10:10:52] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (hashar) For CentralNotice not showing up on Mobile web view (T296077) it is scheduled for backport at 19:00 UTC, I have ping...
[10:39:22] PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[11:39:57] RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:11:55] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (hashar)
[12:19:51] Beta-Cluster-Infrastructure, Commons: Commons BETA: SSL peer certificate or SSH remote key was not OK - https://phabricator.wikimedia.org/T296185 (Jeff_G)
[12:22:35] Beta-Cluster-Infrastructure, Commons: Commons BETA: SSL peer certificate or SSH remote key was not OK - https://phabricator.wikimedia.org/T296185 (Lucas_Werkmeister_WMDE)
[12:23:06] Beta-Cluster-Infrastructure, Release-Engineering-Team, Scap, Infrastructure-Foundations, Puppet: Fatal error: Uncaught ConfigException: Failed to load configuration from etcd - https://phabricator.wikimedia.org/T296125 (Lucas_Werkmeister_WMDE)
[12:25:18] Project-Admins, Commons: Create project "Commons" to supersede T39883 - https://phabricator.wikimedia.org/T86444 (FriedrickMILBarbarossa)
[14:00:35] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Release, Train Deployments: 1.38.0-wmf.9 seems to have introduced a memory leak - https://phabricator.wikimedia.org/T296098 (Ladsgroup) I don't see a massive increase in memory after rolling out of wmf.9 but it might take some time to show itself
[14:09:49] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Release, Train Deployments: 1.38.0-wmf.9 seems to have introduced a memory leak - https://phabricator.wikimedia.org/T296098 (akosiaris) >>! In T296098#7520236, @Ladsgroup wrote: > I don't see a massive increase in memory after rolling out of wmf.9 but...
[14:35:37] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Release, Train Deployments: 1.38.0-wmf.9 seems to have introduced a memory leak - https://phabricator.wikimedia.org/T296098 (Ladsgroup) The memory usage of Mysql parts of a request has been reduced from 19MB to 6MB (https://performance.wikimedia.org/xh...
[14:48:31] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Release, Train Deployments: 1.38.0-wmf.9 seems to have introduced a memory leak - https://phabricator.wikimedia.org/T296098 (hashar) On the [[ https://grafana-rw.wikimedia.org/d/000000607/cluster-overview | Grafana cluster-overview dashboard ]] I have...
[14:55:18] Beta-Cluster-Infrastructure: *.beta.wmflabs.org Certificate has expired (November 2021 edition) - https://phabricator.wikimedia.org/T296000 (Jdforrester-WMF) >>! In T296000#7514703, @Urbanecm wrote: > After workarounding that issue (by copying old cert versions in the nearly-empty new directory), it complain...
[15:12:48] Beta-Cluster-Infrastructure, Release-Engineering-Team, Scap, Infrastructure-Foundations, Puppet: Fatal error: Uncaught ConfigException: Failed to load configuration from etcd - https://phabricator.wikimedia.org/T296125 (Jdforrester-WMF) p:Triage→Unbreak! Within the context of the Beta...
[15:16:35] Beta-Cluster-Infrastructure, Release-Engineering-Team, Scap, Infrastructure-Foundations, Puppet: Fatal error: Uncaught ConfigException: Failed to load configuration from etcd - https://phabricator.wikimedia.org/T296125 (Urbanecm) >>! In T296125#7520450, @Jdforrester-WMF wrote: > Within the co...
[15:18:44] Beta-Cluster-Infrastructure, Release-Engineering-Team, Scap, Infrastructure-Foundations, Puppet: Fatal error: Uncaught ConfigException: Failed to load configuration from etcd - https://phabricator.wikimedia.org/T296125 (Jdforrester-WMF)
[15:38:49] !log update wmf-certificates on deployment-mediawiki11 T296125
[15:38:52] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:38:52] T296125: Fatal error: Uncaught ConfigException: Failed to load configuration from etcd - https://phabricator.wikimedia.org/T296125
[15:40:06] could someone re-enable the beta-scap-sync-world job? https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-sync-world/
[15:40:58] Beta-Cluster-Infrastructure, Release-Engineering-Team, Scap, Infrastructure-Foundations, Puppet: Fatal error: Uncaught ConfigException: Failed to load configuration from etcd - https://phabricator.wikimedia.org/T296125 (Majavah) >>! In T296125#7520534, @Stashbot wrote: > {nav icon=file, name=...
[15:41:10] 08:36 disabled beta-scap-sync-world
[15:41:20] that was on saturday, I am wondering why
[15:41:40] 08:25:54 Fatal error: Uncaught ConfigException: Failed to load configuration from etcd: (curl error: 60) SSL peer certificate or SSH remote key was not OK in /srv/mediawiki-staging/php-master/includes/config/EtcdConfig.php:205
[15:41:41] ouch
[15:42:01] trying to reach etcd apparently
[15:42:05] hashar: T296125
[15:43:52] !log Enabling beta-scap-sync-world # T296125
[15:43:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:43:55] T296125: Fatal error: Uncaught ConfigException: Failed to load configuration from etcd - https://phabricator.wikimedia.org/T296125
[15:44:53] majavah: it is running at https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-sync-world/28004/console
[15:58:26] majavah: looks like it is still broken
[16:00:23] I'll fix that in a moment
[16:00:37] the package just needs upgrading everywhere, I only did deploy01 and mediawiki11 yet
[16:00:55] Project beta-scap-sync-world build #28004: STILL FAILING in 16 min: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/28004/
[16:05:01] hashar: fixed probably
[16:06:15] kostajh: I finally start doing those Unit > unit reviews and at the time I look at Lingo it get abandoned :]]
[16:12:10] Yippee, build fixed!
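The failing job above reported "(curl error: 60)" while trying to reach etcd. As a rough sketch only, the snippet below pulls that numeric code out of such a log line and looks up a hint for it. The function name `curl_error_code` and the table `CURL_ERROR_HINTS` are hypothetical names made up for illustration; the only code taken from the log is 60, which libcurl documents as CURLE_PEER_FAILED_VERIFICATION. This is not how scap or MediaWiki handle the error.

```python
import re

# Illustrative lookup table (hypothetical name). Only code 60 appears in the
# log; libcurl documents it as CURLE_PEER_FAILED_VERIFICATION.
CURL_ERROR_HINTS = {
    60: "TLS peer certificate failed verification; check the CA certificates on the client",
}


def curl_error_code(message):
    """Return the integer curl error code embedded in a log message, or None."""
    match = re.search(r"\(curl error: (\d+)\)", message)
    return int(match.group(1)) if match else None


# Message text copied from the 15:41:40 log entry (file path omitted).
line = (
    "Fatal error: Uncaught ConfigException: Failed to load configuration "
    "from etcd: (curl error: 60) SSL peer certificate or SSH remote key was not OK"
)
code = curl_error_code(line)
print(code, CURL_ERROR_HINTS.get(code, "no hint recorded"))
```

In this incident the fix logged in the channel was upgrading the wmf-certificates package on the affected hosts, which is consistent with a verification failure on the client side.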
[16:12:11] Project beta-scap-sync-world build #28005: FIXED in 9 min 36 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/28005/
[16:13:51] Beta-Cluster-Infrastructure, Release-Engineering-Team, Scap, Infrastructure-Foundations, Puppet: Fatal error: Uncaught ConfigException: Failed to load configuration from etcd - https://phabricator.wikimedia.org/T296125 (Majavah)
[16:14:35] Beta-Cluster-Infrastructure, Release-Engineering-Team, Scap, Infrastructure-Foundations, Puppet: Fatal error: Uncaught ConfigException: Failed to load configuration from etcd - https://phabricator.wikimedia.org/T296125 (Majavah) Open→Resolved a:Majavah
[16:15:13] majavah: \o/
[16:15:17] congratulations
[16:15:37] \o/ thank you!
[16:46:31] Release-Engineering-Team, Security-Team: Determine a better way to track Wikimedia production security patches - https://phabricator.wikimedia.org/T295925 (sbassett) p:Triage→Low
[17:14:52] extremely minor complaint: can we delete the `main` branch from Gerrit again?
[17:15:12] ever since it was created (the commit it points to is from november 11), `git switch ma[TAB]` no longer autocompletes to master :S
[17:15:34] I’m all for renaming it at some point (probably during the GitLab migration?), but it doesn’t look like it’s happening yet
[17:15:39] and in the meantime it’s ever so slightly annoying ^^
[17:15:54] (this is in MediaWiki core, I should’ve said)
[17:20:15] Lucas_WMDE: hehe, just found this: "Because this entails quite a bit of work and potential breakage in many overlapping systems, we are going to make this change as we migrate projects to GitLab, and not before."
[17:20:25] leave a comment on https://phabricator.wikimedia.org/T281593 I'd say
[17:21:03] I think I’ll create a subtask to avoid cluttering that one
[17:21:05] thanks!
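The autocomplete complaint above comes down to prefix ambiguity: once both `main` and `master` exist, the prefix `ma` no longer identifies a single branch. A minimal sketch of that behaviour follows, assuming a simplified completer that only knows local branch names. The helper `complete` is hypothetical; real git completion also considers remote branches and other refs.

```python
import os


def complete(prefix, branches):
    """Expand `prefix` the way a simple shell completer would.

    A unique match expands fully; several matches expand only as far as
    their longest common prefix; no match leaves the input unchanged.
    """
    matches = sorted(b for b in branches if b.startswith(prefix))
    if not matches:
        return prefix
    return os.path.commonprefix(matches)


print(complete("ma", ["master"]))          # master
print(complete("ma", ["main", "master"]))  # ma
```

With only `master` present, `ma` expands all the way; with the extra `main` branch the completer stops at the shared prefix and waits for another keystroke.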
[17:29:28] GitLab (Project Migration), Release-Engineering-Team (Next), User-brennen, Voice & Tone: Delete temporary main branch in mediawiki/core.git until rename from master to main is ready - https://phabricator.wikimedia.org/T296205 (Lucas_Werkmeister_WMDE)
[17:29:31] created ^
[17:31:41] cool! yep
[17:44:58] PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[18:16:00] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Release, Train Deployments: 1.38.0-wmf.9 seems to have introduced a memory leak - https://phabricator.wikimedia.org/T296098 (akosiaris) For what is worth, I don't see that pattern. Memory usage increases indeed, but by the usual rates and patterns it h...
[18:46:02] RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:29:59] GitLab (CI & Job Runners), Security Team AppSec, Security-Team, SecTeam-Processed, Security: Research and design basic ci processing scripts (to exit 1 for tools that report errors and generate report artifacts) - https://phabricator.wikimedia.org/T294307 (sbassett) Open→Invalid I'm g...
[19:30:01] GitLab (CI & Job Runners), Security Team AppSec, Security-Team, Security: Create initial proof of concept application security pipeline repository - https://phabricator.wikimedia.org/T289293 (sbassett)
[19:30:13] GitLab (CI & Job Runners), Security Team AppSec, Security-Team, SecTeam-Processed, Security: Research and design basic ci processing scripts (to exit 1 for tools that report errors and generate report artifacts) - https://phabricator.wikimedia.org/T294307 (sbassett) a:Mstyles→None
[19:30:33] GitLab (CI & Job Runners), Security Team AppSec, Security-Team, Security: Finish node/npm initial tool ci templates for npm audit (Node 10, 12, 14) - https://phabricator.wikimedia.org/T294309 (sbassett) a:Mstyles→sbassett
[19:30:47] GitLab (CI & Job Runners), Security Team AppSec, Security-Team, Security, user-sbassett: Finish node/npm initial tool ci templates for npm audit (Node 10, 12, 14) - https://phabricator.wikimedia.org/T294309 (sbassett)
[19:31:49] GitLab (CI & Job Runners), Security Team AppSec, Security-Team, Security, user-sbassett: Finish for node/npm initial tool ci templates for npm outdated (Node 10, 12, 14) - https://phabricator.wikimedia.org/T294310 (sbassett) a:Mstyles→sbassett
[19:42:46] RelEng-Archive-FY201718-Q1, SRE, Patch-For-Review: Decommission svn.wikimedia.org server (import SVN into Phabricator) - https://phabricator.wikimedia.org/T86655 (Xqt)
[20:04:18] jeena: dduvall: hi! so I pushed 1.38.0-wmf.9 earlier today and so far memory usage looks fine ;)
[20:04:37] :)
[20:04:39] I haven't triage any of the log messages though, will look at doing a few tomorrow
[20:34:28] Release-Engineering-Team (Done by Wed 24 Nov 🔥), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T293950 (Umherirrender)
[21:51:16] GitLab (CI & Job Runners), Security Team AppSec, Security-Team, Security: Investigate SAST template options now included with Gitlab CE and formulate use-cases and documentation - https://phabricator.wikimedia.org/T294312 (sbassett) a:Reedy→mmartorana
[21:57:38] GitLab (CI & Job Runners), Release-Engineering-Team (Done by Thu 04 Nov 🧟), Security-Team, User-brennen: Enable runners for projects under gitlab.wikimedia.org security group - https://phabricator.wikimedia.org/T294050 (sbassett) Resolved→Open Hey @brennen et al- Did this get reverted?...
[22:12:03] Phabricator, Project-Admins: Unarchive WMUA-tech project and create a custom security policy for its members - https://phabricator.wikimedia.org/T286866 (mmodell) Here is a custom form for WMUA-Tech: https://phabricator.wikimedia.org/maniphest/task/edit/form/107/
[22:18:38] Phabricator, Developer Productivity: Implement "Labels" for workboard cards - https://phabricator.wikimedia.org/T261498 (mmodell) Open→Declined Probably not going to have time to work on this.
[22:32:53] GitLab (CI & Job Runners), Release-Engineering-Team (Done by Thu 04 Nov 🧟), Security-Team, User-brennen: Enable runners for projects under gitlab.wikimedia.org security group - https://phabricator.wikimedia.org/T294050 (brennen) What project is this for? Everything under `/repos/security` should...
[23:51:02] PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook