[00:32:08] (PS1) Krinkle: wmf-config-wg-vars: Switch API query to codesearch-backend [tools/code-utils] - https://gerrit.wikimedia.org/r/897393 (https://phabricator.wikimedia.org/T263354)
[00:44:06] (CR) Krinkle: [C: +2] wmf-config-wg-vars: Switch API query to codesearch-backend [tools/code-utils] - https://gerrit.wikimedia.org/r/897393 (https://phabricator.wikimedia.org/T263354) (owner: Krinkle)
[00:44:38] (Merged) jenkins-bot: wmf-config-wg-vars: Switch API query to codesearch-backend [tools/code-utils] - https://gerrit.wikimedia.org/r/897393 (https://phabricator.wikimedia.org/T263354) (owner: Krinkle)
[09:32:04] (CR) Lucas Werkmeister (WMDE): Allow more BuildKit frontend image names (v2) (1 comment) [integration/pipelinelib] - https://gerrit.wikimedia.org/r/896056 (https://phabricator.wikimedia.org/T329553) (owner: Lucas Werkmeister (WMDE))
[10:40:05] GitLab (CI & Job Runners), Release-Engineering-Team (GitLab V: Event Horizon 🌄), serviceops-collab, serviceops-radar: Set up mirror of the docker hub registry for gitlab-runners - https://phabricator.wikimedia.org/T329679 (Jelto) A docker registry container is running on `runner-1029` in WMCS now...
[11:01:20] Monitoring for releases jenkins hosts is firing since 10 minutes: https://logstash.wikimedia.org/goto/b99320e7fc9f3231ee6ffc52a24024bc https://alerts.wikimedia.org/?q=%40state%3Dactive&q=%40cluster%3Dwikimedia.org&q=releases
[11:01:20] Jenkins returns 503 on codfw and 403 in eqiad for prometheus probes.
[11:16:34] jelto: Jenkins in codfw is the spare one and the service there is not running
[11:16:54] the active Jenkins in eqiad seems healthy and the web UI is reachable at 443
[11:17:20] hashar: you wouldn't happen to have been doing changes to any of the instances?
[12:22:40] Project-Admins: Creation of a new project "All-and-every-Wikibooks" - https://phabricator.wikimedia.org/T330600 (Lionel_Scheepmans) Hello, I am not informed about the habits of computer scientists and I trust JackPotte on this point. My interest, as a new administrator of fr.wikibook is to help the community...
[12:31:53] jnuche: jelto: I have upgraded the Jenkins last week iirc
[12:32:09] and on Friday March 3rd we rolled back the switchover
[12:32:16] for release jenkins
[12:32:26] so its primary is on eqiad
[12:32:39] and the spare one on codfw has jenkins shut down iirc
[12:35:33] and I don't think there are any Prometheus probe being active to monitor Jenkins
[12:44:37] The probes were added in T327975. It seems to be an issue with the probes and prometheus and they were just enabled and installed at 10:45 utc. I'll check with observability and re-open T327975 if needed. Sorry for the false alert :) I just thought the probes worked fine until 10:45
[12:44:37] T327975: create blackbox::http monitoring for releases.wikimedia.org - https://phabricator.wikimedia.org/T327975
[14:18:11] jnuche: dancy: trying to update scap on deployment-prep, am I doing something wrong? https://phabricator.wikimedia.org/P45775
[14:28:09] Continuous-Integration-Infrastructure, Release-Engineering-Team, serviceops-collab, Datacenter-Switchover: switch releases.wikimedia.org from eqiad to codfw - https://phabricator.wikimedia.org/T330960 (hashar) >>! In T330960#8679878, @LSobanski wrote: > @hashar A follow up question, is there a pl...
[14:44:43] taavi: Hmm.. We'll look into it.
[14:54:18] (CR) Hashar: Allow more BuildKit frontend image names (v2) (1 comment) [integration/pipelinelib] - https://gerrit.wikimedia.org/r/896056 (https://phabricator.wikimedia.org/T329553) (owner: Lucas Werkmeister (WMDE))
[15:17:40] (PS1) Zoranzoki21: Zuul: Archive the GlobalCheckUser extension [integration/config] - https://gerrit.wikimedia.org/r/897908 (https://phabricator.wikimedia.org/T299287)
[15:18:14] Project beta-code-update-eqiad build #434779: ABORTED in 3 min 23 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/434779/
[15:19:44] taavi: I fixed the problem in beta and upated scap to 4.45
[15:19:53] one thing though, scap now needs docker to self-update
[15:20:13] I manually installed docker on the beta deployment server, but that should be added to the Puppet config
[15:34:08] Gerrit, Abstract Wikipedia team, Tool-ducttape, WikiLambda: [wm-checks-api] support kindrobot - https://phabricator.wikimedia.org/T331651 (Jdforrester-WMF)
[15:39:45] Release-Engineering-Team (Yak Shaving 🐃🪒), Security-Team, serviceops-collab, SecTeam-Processed, Security: Address Gerrit WMCS instance authenticating against LDAP (breaching WMCS policy) - https://phabricator.wikimedia.org/T330312 (sbassett) p:Triage→Medium
[15:50:14] (CR) Hashar: [C: +2] Zuul: Archive the GlobalCheckUser extension [integration/config] - https://gerrit.wikimedia.org/r/897908 (https://phabricator.wikimedia.org/T299287) (owner: Zoranzoki21)
[15:51:27] (Merged) jenkins-bot: Zuul: Archive the GlobalCheckUser extension [integration/config] - https://gerrit.wikimedia.org/r/897908 (https://phabricator.wikimedia.org/T299287) (owner: Zoranzoki21)
[17:10:06] Scap: helm env vars not set when running sudo -u mwpresync scap stage-train - https://phabricator.wikimedia.org/T331479 (dancy) Open→Resolved scap 4.46.0 has been deployed which includes the fix.
[17:41:01] !log gitlab: removing dzahn (mutante) as member of /repos to test notification behavior
[17:41:03] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:43:31] thanks
[17:44:39] !log Manually changed cloudmetrics1002 to cloudmetrics1003 on deployment-docker-cpjobqueue01 whilst debugging T326192
[17:44:42] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:44:42] T326192: changeprop-jobqueue@deployment-prep fails with: getaddrinfo ENOTFOUND cloudmetrics1002.eqiad.wmnet - https://phabricator.wikimedia.org/T326192
[17:44:47] Beta-Cluster-Infrastructure, ChangeProp: changeprop-jobqueue@deployment-prep fails with: getaddrinfo ENOTFOUND cloudmetrics1002.eqiad.wmnet - https://phabricator.wikimedia.org/T326192 (Jdforrester-WMF) This is specified in `/etc/cpjobqueue/config.yaml` on disc on deployment-docker-cpjobqueue01; FWICT tha...
[17:50:15] Phabricator (Upstream), Upstream: Phabricator dashboard tab panel loads two panels' content - https://phabricator.wikimedia.org/T328200 (Dzahn) >>! In T328200#8685619, @Aklapper wrote: >>>! In T328200#8683902, @Dzahn wrote: >> No, I don't see "Good first newcomer task" link at all > > In that case you h...
[17:52:10] Beta-Cluster-Infrastructure, ChangeProp: changeprop-jobqueue@deployment-prep fails with: getaddrinfo ENOTFOUND cloudmetrics1002.eqiad.wmnet - https://phabricator.wikimedia.org/T326192 (Jdforrester-WMF) I'd also note that Beta Cluster is running `docker-registry.wikimedia.org/wikimedia/mediawiki-services-...
[17:57:00] !log Moved deployment-docker-cpjobqueue01 from v0.9.5 to v0.10.5 of change-prop whilst debugging T326192
[17:57:03] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:57:03] T326192: changeprop-jobqueue@deployment-prep fails with: getaddrinfo ENOTFOUND cloudmetrics1002.eqiad.wmnet - https://phabricator.wikimedia.org/T326192
[18:22:27] Deployments, Phabricator, Release-Engineering-Team, User-brennen: Phabricator deployment 2023-03-14 - https://phabricator.wikimedia.org/T331915 (brennen)
[18:23:44] Deployments, Phabricator, Release-Engineering-Team, User-brennen: Phabricator deployment 2023-03-14 - https://phabricator.wikimedia.org/T331915 (brennen) Open→In progress p:Triage→High
[18:32:39] Continuous-Integration-Infrastructure, Release-Engineering-Team, serviceops-collab, Datacenter-Switchover: switch releases.wikimedia.org from eqiad to codfw - https://phabricator.wikimedia.org/T330960 (Dzahn) for the record, it would not be hard (on the ATS side) to seperate releases.wikimedia.or...
[18:45:59] (PS1) Urbanecm: Zuul: Add Msz2001 to the CI allowlist [integration/config] - https://gerrit.wikimedia.org/r/897963
[18:47:53] (PS2) Urbanecm: Zuul: Add Msz2001 to the CI allowlist [integration/config] - https://gerrit.wikimedia.org/r/897963
[20:13:17] Phabricator, PM, Patch-For-Review: Merge the Phabricator Priority values "Low" and "Lowest" - https://phabricator.wikimedia.org/T228759 (Xaosflux) Frankly I don't see much difference between "LOWEST" and "untriaged, opened 10 years ago"..... there is a much bigger expectation management problem for p...
[20:17:30] Gerrit, Upstream: Gerrit commit-msg hook scp: subsystem request failed on channel 0 - https://phabricator.wikimedia.org/T331923 (thcipriani)
[20:19:22] Gerrit, Upstream: Gerrit commit-msg hook scp: subsystem request failed on channel 0 - https://phabricator.wikimedia.org/T331923 (thcipriani)
[21:06:27] Phabricator, Security-Team: Phabricator Admin Access Request for Scott Bassett - https://phabricator.wikimedia.org/T331928 (sbassett)
[21:11:19] Phabricator, Security-Team: Phabricator Admin Access Request for Scott Bassett - https://phabricator.wikimedia.org/T331928 (Dzahn) @Aklapper Looks like this could solve T306708
[21:12:27] Phabricator, Security-Team: Phabricator Admin Access Request for Scott Bassett - https://phabricator.wikimedia.org/T331928 (sbassett) >>! In T331928#8689597, @Dzahn wrote: > @Aklapper Looks like this could solve T306708 Help but probably not solve.
[21:13:40] Phabricator, Security-Team: Phabricator Admin Access Request for Scott Bassett - https://phabricator.wikimedia.org/T331928 (Dzahn) On T306708#8661705 you said your team can't handle 2fa reset requests while on this ticket you request admin access to handle 2fa requests. This is confusing me a bit.
[21:18:35] Phabricator, Security-Team: Phabricator Admin Access Request for Scott Bassett - https://phabricator.wikimedia.org/T331928 (sbassett) >>! In T331928#8689603, @Dzahn wrote: > On T306708#8661705 you said your team can't handle 2fa reset requests while on this ticket you request admin access to handle 2fa r...
[21:25:03] Continuous-Integration-Infrastructure, Release-Engineering-Team, serviceops-collab, Datacenter-Switchover, Patch-For-Review: switch releases.wikimedia.org from eqiad to codfw - https://phabricator.wikimedia.org/T330960 (Dzahn) ^ We need to limit monitoring to the active server for now to avoi...
[21:32:51] Phabricator, Security-Team: Phabricator Admin Access Request for Scott Bassett - https://phabricator.wikimedia.org/T331928 (Aklapper) Handling Phab 2FA reset requests (T306708) is about a better process to verify request, and the resetting itself requires shell access. This ticket is about allow more Sec...
[21:34:30] Phabricator, Release-Engineering-Team, Security-Team: Phabricator Admin Access Request for Scott Bassett - https://phabricator.wikimedia.org/T331928 (Aklapper) AFAIK there is no defined process how to handle admin requests, thus adding RelEng for input/approval
[21:37:12] Phabricator, Release-Engineering-Team, Security-Team: Phabricator Admin Access Request for Scott Bassett - https://phabricator.wikimedia.org/T331928 (Dzahn) Gotcha, thanks @sbassett and @Aklapper for the clarification.
[21:38:37] (Queue (Jenkins jobs + Zuul functions) alert) firing: - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[21:43:37] (Queue (Jenkins jobs + Zuul functions) alert) resolved: - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[22:28:39] (Queue (Jenkins jobs + Zuul functions) alert) firing: - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[22:31:11] Gerrit, Upstream: OpenSSH 9 scp does not work with Gerrit due to lack of sftp subsystem - https://phabricator.wikimedia.org/T330740 (hashar) https://gerrit-review.googlesource.com/c/plugins/download-commands/+/359823 got merged. The plugin is bundled in Gerrit, so I guess that will be included in the nex...
[22:38:05] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [400.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[22:45:19] Phabricator, serviceops-collab: create aphlict2001 (Phabricator realtime notifications codfw) - https://phabricator.wikimedia.org/T322369 (eoghan) It looks like the dummy config variables for the phabricator config at least got us as far as a successful puppet deployment, including a scap deploy of phabr...
[22:48:39] (Queue (Jenkins jobs + Zuul functions) alert) resolved: - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[22:51:09] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [200.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/d/000000322/zuul-gearman?orgId=1&viewPanel=10
[23:31:01] Gerrit, Upstream: Gerrit commit-msg hook scp: subsystem request failed on channel 0 - https://phabricator.wikimedia.org/T331923 (thcipriani)
[23:31:12] Gerrit, Upstream: OpenSSH 9 scp does not work with Gerrit due to lack of sftp subsystem - https://phabricator.wikimedia.org/T330740 (thcipriani)
[23:38:35] MediaWiki-Releasing, MW-1.40-release: Branch REL1_40 for MediaWiki and all extensions and skins - https://phabricator.wikimedia.org/T329079 (Jdforrester-WMF) a:Jdforrester-WMF
[23:46:07] GitLab (Integrations): https://gitlab.wikimedia.org/repos/releng/gitlab-phabricator needs a LICENSE file - https://phabricator.wikimedia.org/T331943 (bd808)