[00:25:29] (CR) Arlolra: Tweak scoring to not flag pages with > 99.99% identical rendering (1 comment) [integration/visualdiff] - https://gerrit.wikimedia.org/r/724158 (owner: Subramanya Sastry)
[00:38:12] (CR) Subramanya Sastry: Tweak scoring to not flag pages with > 99.99% identical rendering (1 comment) [integration/visualdiff] - https://gerrit.wikimedia.org/r/724158 (owner: Subramanya Sastry)
[07:07:02] As of yesterday my GitLab CIs are unable to claim workers for pipelines. Is this somehow related to https://phabricator.wikimedia.org/T292094? I'm currently doing some PoC work that I was hoping to base upon GitLab's CI instead of GitHub. Would it be possible to include me as a trusted contributor (I'm employed by WMF)?
[08:06:11] Release-Engineering-Team (Doing), Security-Team, GitLab (CI & Job Runners), User-brennen: Limit GitLab shared runners to trusted contributors - https://phabricator.wikimedia.org/T292094 (brennen) > That default sure seems backwards though. Yeah, agreed. I'd much prefer a fail closed kind of situ...
[08:39:47] gmodena: I don't think there's a mechanism to do that yet, registration is now just open for everyone so they needed to get something done before someone starts mining crypto there
[08:48:46] Release-Engineering-Team (Yak Shaving 🐃🪒), GitLab, dev-images: Add pipeline to build and publish images on merge request - https://phabricator.wikimedia.org/T292151 (kostajh)
[09:14:25] Release-Engineering-Team (Doing), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T281166 (Majavah)
[09:14:42] majavah: have you got a trace for https://phabricator.wikimedia.org/T292153
[09:15:24] I don't get a fatal but I get different results on en v it
[09:15:28] it is now group1
[09:16:42] Lucas_WMDE: I'm confused
[09:17:01] I see what that task would indicate is expected on itwiki but it works fine on en
[09:17:05] I get a fatal nowhere
[09:17:19] majavah: don't think it's a blocker
[09:17:33] majavah: the logstash entry for that reqId indicates it was on wmf.1, not .2
[09:18:28] Lucas_WMDE: can you check the url too
[09:18:36] I can't reproduce with the one on the task
[09:18:40] you mean, any requests for that URL?
[09:18:44] That doesn't fatal for me
[09:18:53] Lucas_WMDE: the url in the task loads fine anyway
[09:19:06] * RhinosF1 does not like IN filing tasks
[09:19:08] still fatals for me
[09:19:26] but Special:Version also claims wmf.1
[09:19:38] Lucas_WMDE: yeah enwiki is
[09:20:07] https://en.m.wikipedia.org/wiki/Special:Contributions/101.99.60.0/19?uselang=en loads perfectly fine
[09:20:21] ?
[09:20:23] Ah fails logged out
[09:20:24] not for me
[09:20:29] Anyway it's not a blocker
[09:20:43] I'm going to merge with the task you linked
[09:20:47] yeah, I agree
[09:22:02] Release-Engineering-Team (Doing), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T281166 (RhinosF1)
[09:22:10] Removed as blocker and merged
[09:22:40] Not sure why logged in I see different to logged out but the error gets nicer on .2 as expected
[09:22:59] though I wonder if there’s a reason why the fix for that issue wasn’t backported to wmf.1
[09:23:15] it seems straightforward enough
[09:23:21] Lucas_WMDE: no clue
[09:23:24] Have to drop though
[09:23:27] ok
[09:23:40] College lesson, will have a break in an hour
[10:10:17] I’ve just gotten practically-instant “unable to be automatically merged” errors in CI, could there be an issue with Zuul?
[10:10:25] *in unrelated repositories, I forgot to add
[10:10:46] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MathSearch/+/724951 and https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MobileFrontend/+/724947
[10:11:50] hm, but the MobileFrontend change is actually running in the Zuul dashboard
[10:22:17] same at https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/724016 despite no actual git conflict as far as I can tell on the command line
[10:24:12] and at https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/724939, just after the change was successfully merged o_O
[10:31:55] https://gerrit.wikimedia.org/r/q/commentby:%2522jenkins-bot%2522 seems to show more examples, at least the first two I found also have the same comment
[10:32:10] I’ll open a Phabricator task, it seems clear enough to me that I’m not imagining the issue
[10:38:43] Continuous-Integration-Infrastructure, Jenkins, Zuul: jenkins-bot reports many merge failures for no apparent reason - https://phabricator.wikimedia.org/T292167 (Lucas_Werkmeister_WMDE)
[10:39:01] ^, I’ll add examples in a second
[10:39:52] Continuous-Integration-Infrastructure, Jenkins, Zuul: jenkins-bot reports many merge failures for no apparent reason - https://phabricator.wikimedia.org/T292167 (Lucas_Werkmeister_WMDE) Examples: - [MathSearch](https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MathSearch/+/724951) - [MobileFront...
[10:41:33] Continuous-Integration-Infrastructure, Jenkins, Zuul: jenkins-bot reports many merge failures for no apparent reason - https://phabricator.wikimedia.org/T292167 (jcrespo) It happened to me on puppet repo temporarily for https://gerrit.wikimedia.org/r/c/operations/puppet/+/724950 which is weird becaus...
[10:42:21] Continuous-Integration-Infrastructure, Jenkins, Zuul: jenkins-bot reports many merge failures for no apparent reason - https://phabricator.wikimedia.org/T292167 (Lucas_Werkmeister_WMDE) More examples from [commentby:jenkins-bot](https://gerrit.wikimedia.org/r/q/commentby:%2522jenkins-bot%2522): - [M...
[10:43:20] I’m not seeing any changes that look related in SAL
[11:49:08] Scap, serviceops, Patch-For-Review: Scap error when deploying kartotherian - https://phabricator.wikimedia.org/T291990 (Jgiannelos) I just run a deployment in one of the maps nodes and looks like it works with 3.17.1. Thanks @jijiki.
[11:49:18] Release-Engineering-Team, Scap, serviceops, Patch-For-Review: scap's canary check gives confusing logstash link - https://phabricator.wikimedia.org/T291870 (hashar) The [[ https://logstash.wikimedia.org/app/dashboards#/view/1c3a4d80-35c2-11e7-b186-d1bc9cbdde4c | scap canary dashboard ]] is essent...
[11:50:25] Release-Engineering-Team (Doing), Scap, serviceops: Deploy Scap version 4.0.1 - https://phabricator.wikimedia.org/T291095 (jijiki) @dancy it would be lovely if we can speed this up, right now we have `deploy1002` and `maps*` on version 3.17.1, and the rest on version 4.0.0.
[11:53:36] (CR) Majavah: [C: +2] fix: template.py Python 3 fallout [tools/scap] - https://gerrit.wikimedia.org/r/724527 (https://phabricator.wikimedia.org/T291990) (owner: Ahmon Dancy)
[11:54:17] (Merged) jenkins-bot: fix: template.py Python 3 fallout [tools/scap] - https://gerrit.wikimedia.org/r/724527 (https://phabricator.wikimedia.org/T291990) (owner: Ahmon Dancy)
[11:54:41] Release-Engineering-Team (Doing), Scap, serviceops: Deploy Scap version 4.0.1 - https://phabricator.wikimedia.org/T291095 (hashar) There are some more Python 3 related issues that need to be addressed such as {T291990} - https://gerrit.wikimedia.org/r/c/mediawiki/tools/scap/+/724527
[11:56:06] Lucas_WMDE: will check those Jenkins-bot merge failure in a few
[11:56:50] ok thanks
[11:58:10] Continuous-Integration-Infrastructure, Release-Engineering-Team, Zuul: jenkins-bot reports many merge failures for no apparent reason - https://phabricator.wikimedia.org/T292167 (hashar) a:hashar If there is an issue that is in the git repositories used by the `zuul-merger` service. It runs on co...
[12:50:26] Continuous-Integration-Infrastructure, Release-Engineering-Team, Zuul: jenkins-bot reports many merge failures for no apparent reason - https://phabricator.wikimedia.org/T292167 (hashar) From `/var/log/zuul/merger.log`, the zuul-merger were unable to fetch the repositories from Gerrit yielding: ` Git...
[12:50:44] Lucas_WMDE: looks like it was transient from 9:50 to 10:30 UTC (11:50 / 12:30 CEST)
[13:04:57] ok
[13:05:55] Continuous-Integration-Infrastructure, Release-Engineering-Team, Zuul: jenkins-bot reports many merge failures for no apparent reason - https://phabricator.wikimedia.org/T292167 (hashar) Open→Resolved That is from the Gerrit side. zuul-merger connects to Gerrit over ssh as the user `jenkins-b...
[13:06:40] Lucas_WMDE: essentially CI tries to do more than 4 ssh connections at the same time.
Maybe some session did not get properly terminated on the Gerrit side due to whatever reason and eventually timed out, freeing the slot
[13:06:52] I don't have a good explanation, I am blaming that on cosmic rays
[13:07:20] ah, okay
[13:28:52] Hello, will someone be doing the Morning Backport Window today?
[13:29:02] https://wikitech.wikimedia.org/wiki/Deployments#Thursday,_September_30
[13:29:19] Yesterday there wasn't a deployer (until dan cy volunteered), and my patches weren't deployed
[13:36:41] That window was nearly 2 hours ago...
[13:36:56] oh, titles
[13:36:58] silly thing
[13:40:01] (CR) 20after4: [C: +1] DNM: deploy-promote: Use scap wikiversions-inuse --staging [tools/release] - https://gerrit.wikimedia.org/r/685024 (owner: Ahmon Dancy)
[13:41:32] yeah 'morning' in sf time i guess (it won't be my morning either)
[13:42:47] thcipriani: maybe you know? I'm not sure if the deployment calendar is accurate these days
[13:44:56] Reedy, https://phabricator.wikimedia.org/T290859
[14:03:33] Release-Engineering-Team (Done by Wed 06 Oct), Security-Team, ContentSecurityPolicy, GitLab (Administration, Settings & Policy), and 3 others: Define a Content Security Policy for GitLab - https://phabricator.wikimedia.org/T285363 (hashar) Thank you, that is a great summary. My digging so far is...
[14:14:57] ottomata: the Deployments calendar is definitely maintained. Releng owns the process and each team can open windows, adjust the list of deployers etc
[14:16:12] for the Backports window, I don't think releng explicitly run them. I haven't run the European friendly one in ages, it is run by non releng people.
For the "Morning" one, I have no idea who is regularly present to process them
[14:16:51] I have poked our channel about it
[14:25:35] Project beta-scap-sync-world build #22300: FAILURE in 7 min 12 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/22300/
[14:27:35] hashar: thank you
[14:28:16] I guess i meant the backport windows on the deployment calendar might not be accurate (really just wanted to make sure i had a deployer today!)
[14:31:05] Project beta-scap-sync-world build #22301: STILL FAILING in 3 min 53 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/22301/
[14:37:18] Yippee, build fixed!
[14:37:19] Project beta-scap-sync-world build #22302: FIXED in 2 min 49 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/22302/
[14:51:20] hey ottomata sorry I missed your ping yesterday (meeting marathon) the calendar is up-to-date (although I haven't explicitly verified with deployers in a while). Backports are always best-effort by volunteers as it's a significant time commitment. We have been trying to get more volunteers/train folks to do backports 2x per week: https://wikitech.wikimedia.org/wiki/Deployments/Training
[14:51:58] ^ useful if you don't want to get stuck trying to find a deployer :)
[14:52:49] also: re:window names: it's a work in progress: https://gerrit.wikimedia.org/r/c/mediawiki/tools/release/+/720826
[14:53:39] ah ok, didn't realize that it was volunteer/best effort
[14:53:40] thanks
[14:53:57] <3
[15:01:21] Release-Engineering-Team (Radar), Infrastructure-Foundations, GitLab (Infrastructure), Patch-For-Review, and 3 others: Puppetise gitlab-ansible playbook - https://phabricator.wikimedia.org/T283076 (Jelto) The preparation of GitLab puppet code is mostly done. I would like to deploy https://gerrit....
[15:14:15] Release-Engineering-Team (Done by Wed 06 Oct), Security-Team, ContentSecurityPolicy, GitLab (Administration, Settings & Policy), and 4 others: Define a Content Security Policy for GitLab - https://phabricator.wikimedia.org/T285363 (sbassett) @hashar - Sounds good, I left a few comments on the cha...
[15:19:36] (CR) Thcipriani: "Now I'm seeing:" [tools/scap] - https://gerrit.wikimedia.org/r/724527 (https://phabricator.wikimedia.org/T291990) (owner: Ahmon Dancy)
[15:59:08] (PS2) Thcipriani: fix: accuracy of deployment window names [tools/release] - https://gerrit.wikimedia.org/r/720826 (https://phabricator.wikimedia.org/T290859)
[15:59:34] (CR) Thcipriani: fix: accuracy of deployment window names (4 comments) [tools/release] - https://gerrit.wikimedia.org/r/720826 (https://phabricator.wikimedia.org/T290859) (owner: Thcipriani)
[16:38:01] Release-Engineering-Team (Next), Release Pipeline, GitLab (CI & Job Runners), cloud-services-team (dcaro): Proof of concept for Buildpacks as WMF image build tool in GitLab - https://phabricator.wikimedia.org/T291017 (dcaro)
[17:08:13] brennen: hi! Do you know where I can find the gitlab version number we are using? I have no idea how we deploy it ;)
[17:09:36] hashar: currently @ 14.2.3. it's listed in admin area. i will make your user an admin.
[17:10:54] hashar: while i'm thinking about it, if you need to change any settings that are stored in the db for the CSP work, let me know - there is a script at https://gitlab.wikimedia.org/releng/gitlab-settings for doing that, rather than manually in the GUI.
[17:11:07] (CR) 20after4: [C: +1] fix: accuracy of deployment window names [tools/release] - https://gerrit.wikimedia.org/r/720826 (https://phabricator.wikimedia.org/T290859) (owner: Thcipriani)
[17:12:13] (CR) 20after4: [C: +1] Fix build-mw-image-loop.py startup race [tools/train-dev] - https://gerrit.wikimedia.org/r/724168 (owner: Ahmon Dancy)
[17:12:58] probably also need to coordinate https://gerrit.wikimedia.org/r/c/operations/puppet/+/725012/ with jelto's https://gerrit.wikimedia.org/r/c/operations/puppet/+/724430
[17:14:23] (CR) 20after4: [C: +1] Zuul: Drop CI support for REL1_31 branch [integration/config] - https://gerrit.wikimedia.org/r/683031 (https://phabricator.wikimedia.org/T281294) (owner: Jforrester)
[17:26:26] brennen: was doing something else sorry. I proposed a change in puppet to update some gitlab_settings[] hash
[17:26:32] but those are stored in the db?
[17:27:50] hashar: there is configuration both in /etc/gitlab/gitlab.rb (which gets converted to yaml) and some settings stored in database.
[17:29:15] we have been managing gitlab.rb with https://gerrit.wikimedia.org/r/plugins/gitiles/operations/gitlab-ansible/ but jelto is in the process of moving that entirely to puppet so we can get rid of the ansible repo
[17:29:38] I hacked some file in Puppet, guess he will indicate me where to write the change ;)
[17:30:02] for Content-Security-Policy it is a bit of a mess since the set of rules is still being polished by upstream
[17:44:53] Reedy: +1 to the replace text uploads
[17:48:56] MediaWiki-Releasing, Security: Write and send release announcements for MediaWiki 1.31.16/1.35.4/1.36.2 - https://phabricator.wikimedia.org/T285408 (Reedy) In progress→Resolved p:Triage→Medium
[17:49:12] MediaWiki-Releasing, Security: Write and send release announcements for MediaWiki 1.31.16/1.35.4/1.36.2 - https://phabricator.wikimedia.org/T285408 (Reedy)
[17:50:32] MediaWiki-Releasing, Security: Update Wikidata Q83 - https://phabricator.wikimedia.org/T285412 (Reedy) Open→Resolved p:Triage→Medium a:Reedy
[17:50:35] MediaWiki-Releasing, Security: Update Wikidata Q83 - https://phabricator.wikimedia.org/T285412 (Reedy)
[17:55:44] MediaWiki-Releasing, Security: Update onwiki news and Module:Version - https://phabricator.wikimedia.org/T285410 (Reedy) Open→Resolved p:Triage→Medium a:Reedy
[17:55:50] MediaWiki-Releasing, Security: Update onwiki news and Module:Version - https://phabricator.wikimedia.org/T285410 (Reedy)
[17:58:10] (CR) Rajeshjntu: "testing review" [All-Projects] (refs/meta/config) - https://gerrit.wikimedia.org/r/724978 (owner: Rajeshjntu)
[17:59:11] (Abandoned) Rajeshjntu: de [All-Projects] (refs/meta/config) - https://gerrit.wikimedia.org/r/724978 (owner: Rajeshjntu)
[18:02:22] Release-Engineering-Team (Yak Shaving 🐃🪒), dev-images, GitLab (CI & Job Runners): Add pipeline to build and publish images on merge request - https://phabricator.wikimedia.org/T292151 (brennen)
[18:05:06] Release-Engineering-Team (Done by Wed 06 Oct), Security-Team, ContentSecurityPolicy, GitLab (Administration, Settings & Policy), and 4 others: Define a Content Security Policy for GitLab - https://phabricator.wikimedia.org/T285363 (brennen) cc: @Jelto
[18:05:47] (CR) Rajeshjntu: "test" [integration/gearman-java] - https://gerrit.wikimedia.org/r/724980 (owner: Rajeshjntu)
[18:06:30] (Abandoned) Rajeshjntu: test [integration/gearman-java] - https://gerrit.wikimedia.org/r/724980 (owner: Rajeshjntu)
[18:06:34] (Restored) Rajeshjntu: test [integration/gearman-java] - https://gerrit.wikimedia.org/r/724980 (owner: Rajeshjntu)
[18:07:04] (Abandoned) Rajeshjntu: test [integration/gearman-java] - https://gerrit.wikimedia.org/r/724980 (owner: Rajeshjntu)
[18:10:08] (Restored) Rajeshjntu: test [integration/gearman-java] - https://gerrit.wikimedia.org/r/724980 (owner: Rajeshjntu)
[18:13:45] (PS2) Rajeshjntu: test [integration/gearman-java] - https://gerrit.wikimedia.org/r/724980
[18:17:03] (CR) Rajeshjntu: "This change is ready for review." [integration/gearman-java] - https://gerrit.wikimedia.org/r/724981 (owner: Rajeshjntu)
[18:17:52] (Abandoned) Rajeshjntu: first time testing [integration/gearman-java] - https://gerrit.wikimedia.org/r/724981 (owner: Rajeshjntu)
[18:19:24] Release-Engineering-Team (Done by Wed 06 Oct), Security-Team, GitLab (CI & Job Runners), Patch-For-Review, and 2 others: Limit GitLab shared runners to images from Wikimedia Docker registry - https://phabricator.wikimedia.org/T291978 (Krinkle) As I understand it, the status quo with regards to th...
[18:22:21] Release-Engineering-Team (Radar), GitLab (Administration, Settings & Policy), Upstream, User-brennen: Look into whether GitLab time tracking can be disabled - https://phabricator.wikimedia.org/T264230 (brennen) p:Triage→Lowest A look at the issues and docs linked above doesn't suggest tha...
[18:23:28] Continuous-Integration-Infrastructure, Patch-For-Review: Drop CI for REL1_31 branch once it's EOL - https://phabricator.wikimedia.org/T281294 (Reedy)
[18:23:30] Project-Admins: Archive MSSQL project - https://phabricator.wikimedia.org/T230583 (Reedy)
[18:23:32] Project-Admins: Archive Oracle Database project - https://phabricator.wikimedia.org/T230582 (Reedy)
[18:23:46] MediaWiki-Releasing, Documentation: Formally EOL REL1_31 - https://phabricator.wikimedia.org/T279858 (Reedy) In progress→Resolved Done... Probably some further #documentation updates needed on mediawiki.org (see {T2001}), but otherwise done.
[18:29:32] Release-Engineering-Team (Done by Wed 06 Oct), Security-Team, GitLab (CI & Job Runners), Patch-For-Review, and 2 others: Limit GitLab shared runners to images from Wikimedia Docker registry - https://phabricator.wikimedia.org/T291978 (Legoktm) >>! In T291978#7392346, @Krinkle wrote: > But, I woul...
[18:31:35] Release-Engineering-Team (Done by Wed 06 Oct), GitLab (Administration, Settings & Policy), User-brennen: Investigate whether issues, operations, wikis, etc. can be disabled globally on GitLab - https://phabricator.wikimedia.org/T264231 (brennen) > The sidebar distraction might be something to forward...
[18:35:09] MediaWiki-Releasing, Security: Tag 1.31.16/1.35.4/1.36.2 - https://phabricator.wikimedia.org/T285409 (Reedy) Open→Resolved ` $ git push --tags Enumerating objects: 3, done. Counting objects: 100% (3/3), done. Delta compression using up to 2 threads Compressing objects: 100% (3/3), done. Writing o...
[18:35:14] MediaWiki-Releasing, Security: Tag 1.31.16/1.35.4/1.36.2 - https://phabricator.wikimedia.org/T285409 (Reedy)
[18:36:38] MediaWiki-Releasing, Security: Tracking bug for MediaWiki 1.31.16/1.35.4/1.36.2 - https://phabricator.wikimedia.org/T285405 (Reedy)
[18:36:57] MediaWiki-Releasing, Security: Tracking bug for MediaWiki 1.31.16/1.35.4/1.36.2 - https://phabricator.wikimedia.org/T285405 (Reedy)
[18:39:55] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[18:43:01] legoktm: are you talking about Jenkins CI or GitLab CI when you say "apt-get" is allowed?
[18:43:12] GitLab CI
[18:43:14] I was referring to status quo, which does not include GitLab
[18:43:47] whatever we happened to configure right now is Math.random() as far as I'm concerned, even if we may end up somewhere close to it
[18:44:04] The ticket is to decide the future I suppose, right?
[18:44:30] MediaWiki-Releasing, Security: Update onwiki release notes for 1.31.16/1.35.4/1.36.2 - https://phabricator.wikimedia.org/T285411 (Reedy)
[18:45:38] the question on that bug is whether GitLab CI should be restricted to our docker-registry or any public docker image (or maybe just dockerhub). AIUI root access in containers is allowed either way (and intended to be that way)
[18:46:29] OK. I guess I don't have enough information to know whether that is intended.
[18:46:51] MediaWiki-Releasing, Security: Update HISTORY in master after 1.31.16/1.35.4/1.36.2 - https://phabricator.wikimedia.org/T285413 (Reedy)
[18:46:52] That suggests to me that either 1) a major decision was made recently or 2) we were okay with it all these years but never bothered to enable it?
[18:47:00] I'm basing that on https://phabricator.wikimedia.org/T291978#7389819
[18:47:19] a decision about what?
[18:47:24] MediaWiki-Releasing, Patch-For-Review, Security: Update HISTORY in master after 1.31.16/1.35.4/1.36.2 - https://phabricator.wikimedia.org/T285413 (Reedy) p:Triage→Medium a:Reedy
[18:47:37] allowing root access in containers?
[18:47:54] I think that's just how GitLab CI works...at least that's how I've always used it on gitlab.com
[18:50:01] Right, but presumably GitLab.com doesn't run their user-provided root-enabled containers directly on hardware they care about. They presumably just use some kind of disposable VM pool.
[18:50:20] as Travis and GitHub do (despite Travis's short-lived "raw non-root container experiment" which didn't last long)
[18:53:00] I assume so, yeah
[18:54:21] gitlab.com uses google cloud engine. They spawn VM using the cloud provider API which run a single job and are dropped once the job has completed
[18:54:49] ah https://about.gitlab.com/handbook/engineering/infrastructure/production/architecture/ci-architecture.html#ci-general-arch
[18:54:58] from their "internal" doc
[18:55:46] they are the autoscaled-machines
[18:55:47] Release-Engineering-Team (Done by Wed 06 Oct), Security-Team, GitLab (CI & Job Runners), Patch-For-Review, and 2 others: Limit GitLab shared runners to images from Wikimedia Docker registry - https://phabricator.wikimedia.org/T291978 (Krinkle) >>! In T291978#7392436, @Legoktm wrote: >>>! In T2919...
[18:55:56] OK, so the current security model we have for our gitlab instance is not supported by upstream GitLab.
[18:56:35] might be a false alarm but every other run in sqlite is red https://integration.wikimedia.org/ci/job/mediawiki-quibble-vendor-sqlite-php72-docker/
[18:56:36] I'm happy to move my comment to a different task if there's more general conversation in progress about this.
[18:56:44] the gitlab runner should be able to spawn VM ?
[18:57:04] I haven't looked at its doc in a while though
[18:58:35] right now it's running root-enabled containers directly on our "persistent" VMs, which I'm guessing is not what we want.
[18:59:00] I wrongly thought that they were in the `integration` project where we have the docpublishing stuff, but I see that's not the case. Only addshore's instance is there. The others are in a separate WMCS project.
[18:59:37] so that isolates the risk at least for now (until we start to do doc publishing and other kinds of export/deployment related things)
[18:59:43] ah the doc I looked at once https://docs.gitlab.com/runner/executors/docker_machine.html , their system is based on a fork of Docker Machine (which itself has been phased out by Docker)
[19:00:02] but they might have another kind of executors
[19:04:04] a possibility is to use Kubernetes and having it spawn containers inside VM (using Kata containers)
[19:04:42] anyway, it is not straightforward
[19:07:58] https://docs.gitlab.com/runner/security/
[19:11:20] TLDR that I get from this page: 1) Don't allow root in docker, 2) If you have a simple setup, using the "parallels/VirtualBox" driver is the most secure recommendation and 3) If you're big you're on your own to figure out some kind of VM-based cloud solution or other pooling system.
[19:11:42] There doesn't appear to be a mention of any opensource driver they support for creating or maintaining such pool, e.g. no openstack/AWS/GCE integration?
[19:13:02] maybe a few hardware machines network isolated with their parallels driver could work for us. It wouldn't be very elastic though, and no opportunity to sprinkle k8s on top just for fun.
[19:13:34] anyway, in my sarcastic voice, I'll say, glad to see GitLab is such a well fitted solution that has all this complexity figured out out of the box for us :P
[19:16:04] I actually thought it would just plug in with openstack or k8s and figure out a way that somehow solves all this incl image caching and isolated root. k8s does look good as well, but presumably means we either don't allow third-party containers and no root, or do something novel that isn't well-tested or recommended by upstream (like Kata Containers).
[19:17:42] Given GitLab CI was in consideration for us before we considered it for code review, I do recall seeing a document floating around a while ago that had a plan for this, so maybe that's still sitting somewhere but we haven't gotten to that stage yet.
[19:38:53] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[19:44:06] Release-Engineering-Team (Doing), Patch-For-Review, Release, Train Deployments: 1.38.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T281166 (Zabe) There is {T292243}, but I don't know if it is new in wmf.2.
[19:46:25] dduvall, jeena ^
[19:46:43] does that warrant a rollback?
[19:47:19] jeena: hoping I can get a tad more info
[19:49:37] https://www.mediawiki.org/wiki/MediaWiki_1.38/wmf.2/Changelog has no Kartographer changes
[19:49:48] .1 had https://gerrit.wikimedia.org/r/q/52473c4c
[19:50:07] RhinosF1: that map is generated by a Toolforge tool -- //wikivoyage.toolforge.org/w/poimap2.php?lat=44.1764&lon=8.3272&zoom=16&layer=W&lang=it&name=Finalborgo -- so I kind of doubt it's a train problem
[19:50:50] (folks on wiki tend not to know the difference between "official" software and hacks made by the local community)
[19:50:54] lols
[19:51:12] bd808: oh fun
[19:51:14] wow
[19:51:20] Do we know who maintains it
[19:51:42] https://toolsadmin.wikimedia.org/tools/id/wikivoyage
[19:51:45] I thought we didn't like toolforge embeds too
[19:52:00] That depends on the "we" :)
[19:52:03] bd808: that's the task filer
[19:52:28] "we" WMF SRE's and WMCS SRE's really don't. "we" the community just want nice things
[19:53:03] we as in SREs
[19:53:28] Actually the task states 'In the maps (not kartographer based)'
[19:53:48] untag!
[19:53:50] It does
[19:54:25] source for the misbehaving tool is https://phabricator.wikimedia.org/source/tool-wikivoyage/browse/master/public_html/w/poimap2.php
[19:54:25] Moved to #Tools
[19:54:27] Release-Engineering-Team, Infrastructure-Foundations, Puppet, User-brennen: logspam-watch: UTF-8 errors for some input - https://phabricator.wikimedia.org/T292246 (brennen)
[19:54:33] * bd808 will climb out of this rabbit hole
[19:54:35] Release-Engineering-Team, Infrastructure-Foundations, Puppet, User-brennen: logspam-watch: UTF-8 errors for some input - https://phabricator.wikimedia.org/T292246 (brennen)
[19:56:36] That repo ain't been updated
[19:56:48] File has no change since initial commit
[19:58:11] the error.log for the tool is full of "Undefined variable: " stuff.
[19:59:28] ugh. I think this is broken due to Let's Encrypt signing cert expiration actually.
[19:59:58] still not a train blocker, at least
[20:00:45] bd808: there's another in -cloud too
[20:01:01] At least we know the issue
[20:20:51] PROBLEM - SSH on contint2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:25:44] jeena: issue fixed fwiw
[20:26:04] thanks for the update :)
[20:37:50] SSH on contint2001.mgmt alarm is known, has been going on since april/may T283582
[20:37:51] T283582: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ) - https://phabricator.wikimedia.org/T283582
[20:37:59] I should escalate it I guess
[21:00:11] Release-Engineering-Team, serviceops, GitLab (Infrastructure), User-brennen: GitLab minor release: 14.3.1 - https://phabricator.wikimedia.org/T292256 (brennen)
[21:00:37] Release-Engineering-Team, serviceops, GitLab (Infrastructure), User-brennen: GitLab minor release: 14.3.1 - https://phabricator.wikimedia.org/T292256 (brennen) p:Triage→Medium
[21:21:51] RECOVERY - SSH on contint2001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:35:30] (PS5) Reedy: Zuul: Drop CI support for REL1_31 branch [integration/config] - https://gerrit.wikimedia.org/r/683031 (https://phabricator.wikimedia.org/T281294) (owner: Jforrester)
[21:35:39] (CR) Reedy: [C: +2] "RIP" [integration/config] - https://gerrit.wikimedia.org/r/683031 (https://phabricator.wikimedia.org/T281294) (owner: Jforrester)
[21:37:27] MediaWiki-Releasing, Patch-For-Review, Security: Update HISTORY in master after 1.31.16/1.35.4/1.36.2 - https://phabricator.wikimedia.org/T285413 (Reedy) Open→In progress
[21:37:28] (Merged) jenkins-bot: Zuul: Drop CI support for REL1_31 branch [integration/config] - https://gerrit.wikimedia.org/r/683031 (https://phabricator.wikimedia.org/T281294) (owner: Jforrester)
[21:38:24] !log
Reloading Zuul to deploy https://gerrit.wikimedia.org/r/683031
[21:38:25] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:42:06] (PS1) Reedy: assert-no-mediawiki-errors.bash: Remove REL1_31 guard [integration/config] - https://gerrit.wikimedia.org/r/725118 (https://phabricator.wikimedia.org/T281294)
[21:44:08] Continuous-Integration-Infrastructure, Patch-For-Review: Drop CI for REL1_31 branch once it's EOL - https://phabricator.wikimedia.org/T281294 (Reedy) ^ After that one... There's just this one `lang=yaml - job: name: 'quibble-integration' project-type: matrix concurrent: false execution-s...
[21:46:44] (CR) Reedy: [C: -1] "Needs rebasing... And moving to at least 2.0.14 - https://github.com/composer/composer/releases/tag/2.0.14" [integration/config] - https://gerrit.wikimedia.org/r/683030 (https://phabricator.wikimedia.org/T279857) (owner: Jforrester)
[21:47:00] Continuous-Integration-Infrastructure, Composer, Patch-For-Review: Re-build CI containers with Composer 2.x - https://phabricator.wikimedia.org/T279857 (Reedy)
[21:49:16] Beta-Cluster-Infrastructure, Wikimedia-Logstash, observability, SRE Observability (FY2021/2022-Q2): Logstash in beta fails periodically - https://phabricator.wikimedia.org/T211984 (lmata) p:Triage→Medium
[21:50:21] Beta-Cluster-Infrastructure, Release-Engineering-Team (Radar): beta logstash servers run out of disk space - https://phabricator.wikimedia.org/T288989 (lmata)
[21:50:29] Beta-Cluster-Infrastructure, Release-Engineering-Team (Radar): beta logstash servers run out of disk space - https://phabricator.wikimedia.org/T288989 (lmata) our bit here should be good to go. please re-add observability if you require additional help
[21:52:04] (CR) Daimona Eaytoy: "(Just noting that this duplicates I95bf0eeb3aaedcfa26e912ece60c08bd2febcb06.
There are also some more patches in that chain.)" [integration/config] - https://gerrit.wikimedia.org/r/725118 (https://phabricator.wikimedia.org/T281294) (owner: Reedy)
[21:52:29] (Abandoned) Reedy: assert-no-mediawiki-errors.bash: Remove REL1_31 guard [integration/config] - https://gerrit.wikimedia.org/r/725118 (https://phabricator.wikimedia.org/T281294) (owner: Reedy)
[21:52:48] (PS3) Reedy: jjb: [mediawiki-quibble*] Drop REL1_31 support [integration/config] - https://gerrit.wikimedia.org/r/683751 (owner: Jforrester)
[21:52:52] (PS3) Reedy: jjb: [quibble] Update assert-no-errors to drop REL1_31 skip [integration/config] - https://gerrit.wikimedia.org/r/683752 (owner: Jforrester)
[21:53:03] (PS4) Reedy: [DNM] dockerfiles: [composer-scratch] Upgrade to 2.0.13 and cascade [integration/config] - https://gerrit.wikimedia.org/r/683030 (https://phabricator.wikimedia.org/T279857) (owner: Jforrester)
[21:54:10] Continuous-Integration-Infrastructure, Composer, Patch-For-Review: Re-build CI containers with Composer 2.x - https://phabricator.wikimedia.org/T279857 (Reedy)
[21:54:17] Continuous-Integration-Infrastructure, Patch-For-Review: Drop CI for REL1_31 branch once it's EOL - https://phabricator.wikimedia.org/T281294 (Reedy) In progress→Resolved a:Jdforrester-WMF
[23:36:28] MediaWiki-Releasing, Patch-For-Review, Security: Update HISTORY in master after 1.31.16/1.35.4/1.36.2 - https://phabricator.wikimedia.org/T285413 (Reedy) In progress→Resolved