[02:20:01] Project beta-update-databases-eqiad build #77387: FAILURE in 0.46 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/77387/
[03:20:01] Project beta-update-databases-eqiad build #77388: STILL FAILING in 0.59 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/77388/
[04:20:01] Project beta-update-databases-eqiad build #77389: STILL FAILING in 0.54 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/77389/
[05:20:01] Project beta-update-databases-eqiad build #77390: STILL FAILING in 0.51 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/77390/
[06:20:01] Project beta-update-databases-eqiad build #77391: STILL FAILING in 0.49 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/77391/
[07:20:01] Project beta-update-databases-eqiad build #77392: STILL FAILING in 0.53 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/77392/
[08:08:35] (CR) Arthur taylor: "I think this change needs to depend on I73f465d86eafc377ad69c5054404673108c3f182 - `$wgEntitySchemaIsRepo = false;` needs to be set in the" [integration/config] - https://gerrit.wikimedia.org/r/1052669 (https://phabricator.wikimedia.org/T367156) (owner: Arthur taylor)
[08:11:32] Release-Engineering-Team: Fix/remove deployment-charts update_version.py - https://phabricator.wikimedia.org/T369884 (JMeybohm) NEW
[08:20:01] Project beta-update-databases-eqiad build #77393: STILL FAILING in 0.55 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/77393/
[08:47:06] (CR) Hashar: [C:+2] zuul: [mediawiki/extensions/FanBoxes] Add SpamRegex for phan [integration/config] - https://gerrit.wikimedia.org/r/1053744 (owner: Jack Phoenix)
[08:48:14] (Merged) jenkins-bot: zuul: [mediawiki/extensions/FanBoxes] Add SpamRegex for phan [integration/config] - https://gerrit.wikimedia.org/r/1053744 (owner: Jack Phoenix)
[09:20:01] Project beta-update-databases-eqiad build #77394: STILL FAILING in 0.51 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/77394/
[10:15:13] GitLab, Release-Engineering-Team, collaboration-services: Update ldap-sync-bot token - https://phabricator.wikimedia.org/T369532#9976112 (Jelto)
[10:20:01] Project beta-update-databases-eqiad build #77395: STILL FAILING in 0.73 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/77395/
[10:31:12] GitLab, Release-Engineering-Team, collaboration-services: Update ldap-sync-bot token - https://phabricator.wikimedia.org/T369532#9976141 (Jelto) I created a new token `for-automated-group-management-2` which expires July 11, 2025 at 2:00:00 AM GMT+2. The job succeeded with the new token: ` Jul 12 1...
[10:32:11] GitLab, Release-Engineering-Team, collaboration-services: Update ldap-sync-bot token - https://phabricator.wikimedia.org/T369532#9976142 (Jelto)
[10:34:13] GitLab, Release-Engineering-Team, collaboration-services: Update ldap-sync-bot token - https://phabricator.wikimedia.org/T369532#9976143 (Jelto) Open→Resolved I deleted the old token `for-automated-group-management`. So this task is done. Thanks @dancy for the detailed checklist, I appre...
[11:20:01] Project beta-update-databases-eqiad build #77396: STILL FAILING in 0.66 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/77396/
[11:31:45] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for CanonNi - https://phabricator.wikimedia.org/T369892 (CanonNi) NEW
[12:06:14] GitLab (Pipeline Services Migration🐤), collaboration-services, Data-Platform-SRE, Wikidata, and 3 others: move commons-query.wikimedia.org and query.wikidata.org to kubernetes - https://phabricator.wikimedia.org/T350793#9976315 (Jelto) Let me know the preferred location for the repository. It wou...
[12:18:28] Release-Engineering-Team, Java-Scala-Standardization, Data-Platform-SRE (2024.07.08 - 2024.07.28): Setup a test project to validate upload to the Gitlab package registry - https://phabricator.wikimedia.org/T367391#9976320 (Gehel) Thanks @brennen ! Looks like this is working on my end as well. * pack...
[12:20:01] Project beta-update-databases-eqiad build #77397: STILL FAILING in 0.53 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/77397/
[12:32:29] Release-Engineering-Team, Java-Scala-Standardization, Data-Platform-SRE (2024.07.08 - 2024.07.28): Setup a test project to validate upload to the Gitlab package registry - https://phabricator.wikimedia.org/T367391#9976382 (Gehel) Full release working: https://gitlab.wikimedia.org/repos/maven/maven-te...
[12:32:59] Release-Engineering-Team, Java-Scala-Standardization, Data-Platform-SRE (2024.07.08 - 2024.07.28): Setup a test project to validate upload to the Gitlab package registry - https://phabricator.wikimedia.org/T367391#9976383 (Gehel)
[12:45:00] 00:00:00.479 oojs/oojs-ui: 0.50.4 installed, 0.50.3 required.
[12:45:00] :(
[12:49:59] cause https://gerrit.wikimedia.org/r/c/mediawiki/vendor/+/1053811
[12:50:08] and core change failed on gate https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1053816
[12:50:12] so I have +2ed it again
[12:54:41] Diffusion, Phabricator, Release-Engineering-Team, collaboration-services, Patch-For-Review: Make https://git.wikimedia.org not redirect to Phabricator Diffusion - https://phabricator.wikimedia.org/T323073#9976429 (Bugreporter) I think we can redirect it to https://www.mediawiki.org/wiki/Git
[13:03:19] Diffusion, Phabricator, Release-Engineering-Team, collaboration-services, Patch-For-Review: Make https://git.wikimedia.org not redirect to Phabricator Diffusion - https://phabricator.wikimedia.org/T323073#9976483 (hashar) Stalled→Open >>! In T323073#9976429, @Bugreporter wrote: > I th...
[13:14:48] !log deployment-prep: fixed https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/ which was blocked due to oojs-ui being updated ( https://gerrit.wikimedia.org/r/c/mediawiki/vendor/+/1053811 & https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1053816 )
[13:14:48] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:14:52] * hashar flexes
[13:20:23] Gerrit (Gerrit 3.10): Use Gerrit 3.10 built-in log rotation - https://phabricator.wikimedia.org/T367505#9976539 (hashar) Open→Resolved
[13:22:40] GitLab (Infrastructure), collaboration-services: Increase disk size for GitLab test instance - https://phabricator.wikimedia.org/T369837#9976542 (Jelto) p:Triage→Medium a:Jelto
[13:24:36] Yippee, build fixed!
[13:24:36] Project beta-update-databases-eqiad build #77398: FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/77398/
[13:26:24] GitLab (Infrastructure), collaboration-services, Patch-For-Review: Increase disk size for GitLab test instance - https://phabricator.wikimedia.org/T369837#9976557 (Jelto) >>! In T369837#9974059, @bd808 wrote: > https://wikitech.wikimedia.org/wiki/Help:Adding_disk_space_to_Cloud_VPS_instances Thanks...
[13:27:06] Just curious, has anyone used git-sync? https://github.com/kubernetes/git-sync/tree/master . I'm about to start on an image and if y'all have any caveats/advice LMK
[13:34:52] inflatador: I have never heard of it. That sounds like the equivalent of doing `git::clone { ensure => latest }` and deploying arbitrary code as long as it just got merged in the git repo :d
[13:35:06] (or running `npm install` without a lockfile but I digress)
[13:36:02] hashar as scary as that sounds, that is the idea. But we aren't really deploying code per se, it's airflow DAGs which are basically batch jobs https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags
[13:37:15] I don't see how different that is from code
[13:38:05] I guess you're right. Maybe I should've said, the stakes are low here. If we screw up a DAG, that just means we have to fix it and re-run. Which happens all the time already ;)
[13:38:19] yeah maybe
[13:38:25] I am missing too much context
[13:38:38] in the old days we would avoid having code magically landing in production without a human intervention / +1
[13:39:10] I imagine the process to deploy the DAG involves some manual verification step
[13:39:33] Generally I agree with that ;). Although yes, deploying the DAG itself needs permission. And the git-sync workflow is supported natively by airflow itself
[13:39:35] (an attack vector is someone sneaking some code in the git repo, for example by hijacking a user account)
[13:39:46] which has happened at least twice in the past
[13:40:25] then I imagine it is not that different than having a Dockerfile doing a `git clone` of whatever the HEAD branch is pointing at
[13:40:35] without checking the expected checksum or a signed tag
[13:40:36] ;)
[13:40:55] yeah, CI/CD is a huge attack vector as always ;(
[13:41:11] that depends how you do it ;)
[13:41:28] anyway, I am missing too many bits of context
[13:41:37] You're still bringing up important points
[13:41:45] and I don't know much about how we build and deploy other things for k8s
[13:42:14] I also probably haven't adjusted my mental model to CD / k8s things
[13:42:30] Me neither. I'm trying ;)
[13:42:34] !!!!
[13:44:46] oh
[13:44:47] well
[13:45:08] it sounds super scary to automatically fetch from the git repo and trigger a hook / autodeploy
[13:45:20] I am not entirely sure I would hide it there
[13:46:02] for a dev environment probably, but for a production one it feels too magic to me :]
[13:46:13] then I guess I like having things under tight control when possible
[13:46:22] anyway those were my 2 cents ;]
[14:07:43] eh, now that I think of it, an authenticated human still has to run the DAG manually. So the exposure shouldn't be worse than what we have already
[15:04:37] GitLab (Administration, Settings & Policy), Release-Engineering-Team, collaboration-services, Java-Scala-Standardization, Data-Platform-SRE (2024.07.08 - 2024.07.28): Create a global Maven package registry in Gitlab - https://phabricator.wikimedia.org/T367322#9976856 (Gehel)
[15:07:05] GitLab (Administration, Settings & Policy), Release-Engineering-Team, collaboration-services, Java-Scala-Standardization, Data-Platform-SRE (2024.07.08 - 2024.07.28): Create a global Maven package registry in Gitlab - https://phabricator.wikimedia.org/T367322#9976880 (Gehel) This task is most...
[15:44:57] Diffusion, Phabricator, Release-Engineering-Team, collaboration-services, Patch-For-Review: Make https://git.wikimedia.org not redirect to Phabricator Diffusion - https://phabricator.wikimedia.org/T323073#9977034 (Pppery) I agree with bugreporter here - there's no reason to deliberately break...
[16:09:29] Beta-Cluster-Infrastructure, cloud-services-team, Cloud-VPS (Debian Buster Deprecation): Replace deployment-cumin with Bullseye or Bookworm host - https://phabricator.wikimedia.org/T361380#9977146 (Andrew) Open→Resolved I don't know of useful thing to do with metatdata here since the OS i...
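Note on the git-sync discussion above (13:27 to 14:07): the concern raised is fetching whatever HEAD points at "without checking the expected checksum or a signed tag". A minimal sketch of the checksum side of that idea follows, in Python. It is not how git-sync or Airflow work, and the names `tree_digest` and `is_expected_tree` are hypothetical; it only shows comparing a synced directory against a digest recorded at review time.

```python
import hashlib
import hmac
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Return a SHA-256 hex digest covering every regular file under root.

    Files are visited in sorted order so the result is stable across runs.
    Each file contributes its relative path plus the digest of its content.
    """
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix().encode("utf-8")
        digest.update(rel + b"\0")
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()


def is_expected_tree(root: Path, expected_hex: str) -> bool:
    """True only if the tree matches the digest recorded at review time."""
    return hmac.compare_digest(tree_digest(root), expected_hex)
```

A deploy step could refuse to load the synced files when `is_expected_tree` returns False. Verifying a signed tag with git itself would be the stronger check; this sketch only covers the "expected checksum" half.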
[16:09:38] Beta-Cluster-Infrastructure, cloud-services-team, Cloud-VPS (Debian Buster Deprecation): Replace or remove deployment-echostore02.deployment-prep.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T361383#9977154 (Andrew) Hello folks! Is anyone planning to replace this?
[16:10:10] Beta-Cluster-Infrastructure, cloud-services-team, Cloud-VPS (Debian Buster Deprecation): Replace deployment-prep kafka hosts with Bullseye or Bookworm - https://phabricator.wikimedia.org/T361382#9977158 (Andrew) Open→Resolved
[16:10:26] (PS2) Zoranzoki21: Zuul: [mediawiki/extensions/UserVerification] Add new extension [integration/config] - https://gerrit.wikimedia.org/r/1053949
[16:19:09] Beta-Cluster-Infrastructure, cloud-services-team, Cloud-VPS (Debian Buster Deprecation): Rebuild or delete deployment-docker-changeprop01 - https://phabricator.wikimedia.org/T369913 (Andrew) NEW
[16:19:12] Beta-Cluster-Infrastructure, cloud-services-team, Cloud-VPS (Debian Buster Deprecation): Rebuild or delete deployment-docker-cpjobqueue01 - https://phabricator.wikimedia.org/T369914 (Andrew) NEW
[16:19:14] Beta-Cluster-Infrastructure, cloud-services-team, Cloud-VPS (Debian Buster Deprecation): Rebuild or delete deployment-docker-mobileapps01 - https://phabricator.wikimedia.org/T369915 (Andrew) NEW
[16:19:19] Beta-Cluster-Infrastructure, cloud-services-team, Cloud-VPS (Debian Buster Deprecation): Rebuild or delete deployment-docker-proton01 - https://phabricator.wikimedia.org/T369916 (Andrew) NEW
[16:21:51] Beta-Cluster-Infrastructure, Cloud-VPS (Debian Buster Deprecation): Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm - https://phabricator.wikimedia.org/T327742#9977248 (Andrew)
[16:23:20] Beta-Cluster-Infrastructure, cloud-services-team, Cloud-VPS (Debian Buster Deprecation): Replace deployment-eventlog08 with Bullseye or Bookworm host - https://phabricator.wikimedia.org/T369918 (Andrew) NEW
[16:26:59] Beta-Cluster-Infrastructure, cloud-services-team, Cloud-VPS (Debian Buster Deprecation): Replace deployment-ircd02 with a Bullseye or Bookworm host - https://phabricator.wikimedia.org/T369919 (Andrew) NEW
[16:27:32] Beta-Cluster-Infrastructure, Cloud-VPS (Debian Buster Deprecation): Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm - https://phabricator.wikimedia.org/T327742#9977286 (Andrew)
[16:28:36] Beta-Cluster-Infrastructure, Cloud-VPS (Debian Buster Deprecation): Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm - https://phabricator.wikimedia.org/T327742#9977303 (Andrew)
[16:28:49] Beta-Cluster-Infrastructure, Cloud-VPS (Debian Buster Deprecation): Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm - https://phabricator.wikimedia.org/T327742#9977306 (Andrew)
[16:29:36] (PS1) Pppery: Zuul: [mediawiki/extensions/PageCreationNotif] Mark as archived [integration/config] - https://gerrit.wikimedia.org/r/1053952 (https://phabricator.wikimedia.org/T367673)
[16:45:51] GitLab (Account Approval), Release-Engineering-Team: Requesting GitLab account activation for CanonNi - https://phabricator.wikimedia.org/T369892#9977392 (Aklapper) Open→Resolved It looks like that account is already approved in GitLab (though I don't know how that happened), thus resolving
[16:49:27] Diffusion, Phabricator, Release-Engineering-Team, collaboration-services, Patch-For-Review: Make https://git.wikimedia.org not redirect to Phabricator Diffusion - https://phabricator.wikimedia.org/T323073#9977406 (Aklapper) I believe `git clone https://www.mediawiki.org/wiki/Git` will also br...
[16:50:46] Diffusion, Phabricator, Release-Engineering-Team, collaboration-services, Patch-For-Review: Make https://git.wikimedia.org not redirect to Phabricator Diffusion - https://phabricator.wikimedia.org/T323073#9977407 (Pppery) ` $ git clone https://git.wikimedia.org Cloning into 'git.wikimedia.org...
[16:57:53] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.43.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T366959#9977414 (thcipriani) p:Triage→Medium a:dancy
[16:58:39] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.43.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T366960#9977419 (thcipriani) p:Triage→Medium a:dduvall
[16:59:40] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.43.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T366961#9977425 (thcipriani) p:Triage→Medium a:brennen
[17:03:01] Project beta-code-update-eqiad build #504177: FAILURE in 0.34 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/504177/
[17:13:01] Project beta-code-update-eqiad build #504178: STILL FAILING in 0.34 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/504178/
[17:13:02] (PS1) Pppery: Zuul: [mediawiki/extensions/StickToThatLanguage] Mark as archived [integration/config] - https://gerrit.wikimedia.org/r/1053958 (https://phabricator.wikimedia.org/T367670)
[17:13:03] (PS1) Pppery: Zuul: [mediawiki/extensions/DeleteOwn] Mark as archived [integration/config] - https://gerrit.wikimedia.org/r/1053959 (https://phabricator.wikimedia.org/T366663)
[17:13:05] (PS1) Pppery: Zuul: [mediawiki/extensions/CollaborationKit] Mark as archived [integration/config] - https://gerrit.wikimedia.org/r/1053960
[17:13:53] (PS2) Pppery: Zuul: [mediawiki/extensions/DeleteOwn] Mark as archived [integration/config] - https://gerrit.wikimedia.org/r/1053959 (https://phabricator.wikimedia.org/T366663)
[17:13:57] (PS2) Pppery: Zuul: [mediawiki/extensions/CollaborationKit] Mark as archived [integration/config] - https://gerrit.wikimedia.org/r/1053960
[17:15:17] (PS3) Pppery: Zuul: [mediawiki/extensions/CollaborationKit] Mark as archived [integration/config] - https://gerrit.wikimedia.org/r/1053960 (https://phabricator.wikimedia.org/T368092)
[17:23:01] Project beta-code-update-eqiad build #504179: STILL FAILING in 0.46 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/504179/
[17:26:34] well that's bizarre, I wonder why that just started failing
[17:27:49] thcipriani: "Aborting: This scap command is disabled on this host." ?
[17:28:06] yeah
[17:28:30] hmmmm... "If you really need to run it, you can override by passing "-Dblock_deployments:False" to the call"
[17:28:45] I remember seeing that code review go by for scap, but it's been a bit. And I don't think anyone deployed a new version today.
[17:28:52] yeah, I'm just about to add that to the job
[17:30:06] right.. nothing in SAL it seems
[17:31:03] hrm, maybe there was some config in puppet...not that I see anything that just merged.
[17:33:01] Project beta-code-update-eqiad build #504180: STILL FAILING in 0.39 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/504180/
[17:36:29] oh...I think I see what happened. abogott has this cherry-picked https://gerrit.wikimedia.org/r/c/operations/puppet/+/1053956 which makes deploy03 think it's no longer the deployment host.
[17:36:51] I'll make a patch to the job and a revert that depends on ^
[17:37:34] aha!
[17:40:26] (PS1) Thcipriani: Beta: update deploy while moving servers [integration/config] - https://gerrit.wikimedia.org/r/1053963
[17:42:29] (PS1) Thcipriani: Revert "Beta: update deploy while moving servers" [integration/config] - https://gerrit.wikimedia.org/r/1053965
[17:43:01] Project beta-code-update-eqiad build #504181: STILL FAILING in 0.4 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/504181/
[17:43:03] (CR) Thcipriani: [C:+2] Beta: update deploy while moving servers [integration/config] - https://gerrit.wikimedia.org/r/1053963 (owner: Thcipriani)
[17:43:58] (CR) CI reject: [V:-1] Revert "Beta: update deploy while moving servers" [integration/config] - https://gerrit.wikimedia.org/r/1053965 (owner: Thcipriani)
[17:44:21] (Merged) jenkins-bot: Beta: update deploy while moving servers [integration/config] - https://gerrit.wikimedia.org/r/1053963 (owner: Thcipriani)
[17:49:39] !log reconfigure beta-code-update-eqiad beta-scap-sync-world beta-update-databases-eqiad pending merge of https://gerrit.wikimedia.org/r/1053956
[17:49:40] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:53:01] Project beta-code-update-eqiad build #504182: STILL FAILING in 0.35 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/504182/
[17:55:41] Beta-Cluster-Infrastructure, Cloud-VPS (Debian Buster Deprecation), Patch-For-Review: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm - https://phabricator.wikimedia.org/T327742#9977598 (Andrew)
[18:53:46] Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments: 1.43.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T366961#9977801 (brennen)
[19:55:03] Beta-Cluster-Infrastructure, Cloud-VPS (Debian Buster Deprecation), Patch-For-Review: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm - https://phabricator.wikimedia.org/T327742#9978016 (Andrew)
[20:01:16] Project beta-code-update-eqiad build #504183: STILL FAILING in 1.3 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/504183/
[20:02:01] !log Restarted Jenkins agent on deployment-deploy03
[20:02:02] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:02:53] Project beta-code-update-eqiad build #504184: STILL FAILING in 0.42 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/504184/
[20:03:01] Project beta-code-update-eqiad build #504185: STILL FAILING in 0.32 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/504185/
[20:03:49] The beta-code-update-eqiad error is `scap: error: extra arguments found: -Dblock_deployments:False`
[20:04:31] The first failure was `Aborting: This scap command is disabled on this host. If you really need to run it, you can override by passing "-Dblock_deployments:False" to the call`
[20:07:19] Project beta-code-update-eqiad build #504186: STILL FAILING in 0.36 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/504186/
[20:12:50] !log disable beta-code-update-eqiad/beta-scap-sync-world until server tinkering concludes
[20:12:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:35:44] boy howdy, setting up a deployment server shows a lot of puppet failures.
[20:47:34] take two
[21:07:14] lookin' good
[21:08:40] ...two puppet runs and a reboot fixed us right up.
[21:20:15] Release-Engineering-Team, Epic, Quality-and-Test-Engineering-Team (Test Infrastructure): Group -1 pre-train QTE validation environment - https://phabricator.wikimedia.org/T369112#9978316 (Jrbranaa)
[21:27:58] Release-Engineering-Team (Seen), Code-Health, Epic, Quality-and-Test-Engineering-Team (SonarCloud Admin): [EPIC] Encourage developers to increase code coverage - https://phabricator.wikimedia.org/T100294#9978368 (Jrbranaa)
[21:28:01] Continuous-Integration-Config, MediaWiki-Core-Tests, SonarQube Bot, Code-Health, Quality-and-Test-Engineering-Team (SonarCloud Admin): Improve speed of codehealth checks - https://phabricator.wikimedia.org/T351561#9978364 (Jrbranaa)
[21:28:11] Scap: scap prep auto fails on new deployment host - https://phabricator.wikimedia.org/T369954 (thcipriani) NEW
[21:28:47] thcipriani: should we reenable the update jobs now?
[21:29:14] (CR) Thcipriani: "recheck" [integration/config] - https://gerrit.wikimedia.org/r/1053965 (owner: Thcipriani)
[21:29:35] bd808: I think we're all ready, if I can merge ^
[21:30:28] bah, failing on my commit message
[21:31:01] (PS2) Thcipriani: Revert "Beta: update deploy while moving servers" [integration/config] - https://gerrit.wikimedia.org/r/1053965
[21:32:36] (CR) Thcipriani: [C:+2] Revert "Beta: update deploy while moving servers" [integration/config] - https://gerrit.wikimedia.org/r/1053965 (owner: Thcipriani)
[21:33:31] * bd808 is lost in a maze of twisty toolforge-build passages
[21:33:40] (Merged) jenkins-bot: Revert "Beta: update deploy while moving servers" [integration/config] - https://gerrit.wikimedia.org/r/1053965 (owner: Thcipriani)
[21:38:30] !log update beta-* CI jobs, pool deployment-deploy04 in jenkins, offline deployment-deploy03
[21:38:31] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:38:32] * thcipriani waits
[21:44:28] Yippee, build fixed!
[21:44:28] Project beta-code-update-eqiad build #504187: FIXED in 1 min 27 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/504187/
[21:44:35] half way there.
[21:45:02] Project beta-scap-sync-world build #163311: FAILURE in 33 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163311/
[21:45:06] bah
[21:47:08] * thcipriani disables + fixes
[21:55:22] ok, private dir in place, keyholder armed
[21:55:26] * thcipriani reenables
[22:05:17] (open) bking: data-engineering: Allow git-sync image to build on trusted runners [repos/releng/gitlab-trusted-runner] - https://gitlab.wikimedia.org/repos/releng/gitlab-trusted-runner/-/merge_requests/91 (https://phabricator.wikimedia.org/T364387 https://phabricator.wikimedia.org/T368033)
[22:23:41] wonder if rsync is going to cause the beta-scap-sync-world job to timeout
[22:29:30] Project beta-scap-sync-world build #163312: STILL FAILING in 15 min: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163312/
[22:30:14] oh good. can't reach logstash
[22:33:01] assuming this is probably a firewall thing
[22:33:06] Project beta-scap-sync-world build #163313: STILL FAILING in 1 min 3 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163313/
[22:33:12] but don't have access to that project
[22:35:15] Project beta-scap-sync-world build #163314: STILL FAILING in 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163314/
[22:45:06] Project beta-scap-sync-world build #163315: STILL FAILING in 41 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163315/
[22:55:07] Project beta-scap-sync-world build #163316: STILL FAILING in 42 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163316/
[22:59:28] !log skipping logstash checks in beta
[22:59:29] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[23:05:02] Project beta-scap-sync-world build #163317: STILL FAILING in 37 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163317/
[23:05:52] too many open files. The gift that keeps on giving
[23:09:10] Release-Engineering-Team (Radar), SRE Observability: New beta deployment server unable to connect to logging logstash server in WMCS - https://phabricator.wikimedia.org/T369962 (thcipriani) NEW
[23:09:39] Beta-Cluster-Infrastructure, Release-Engineering-Team (Radar), SRE Observability: New beta deployment server unable to connect to logging logstash server in WMCS - https://phabricator.wikimedia.org/T369962#9978751 (thcipriani)
[23:11:28] we do set a low ulimit here. I wonder why --force would cause it to explode.
[23:12:31] thcipriani: My guess is that it is udp logging sockets that never get closed.
[23:13:15] I figured out the problem for T369962. I just need the IPv4 of the new deploy server instance to fix it for you.
[23:13:16] T369962: New beta deployment server unable to connect to logging logstash server in WMCS - https://phabricator.wikimedia.org/T369962
[23:13:46] 172.16.1.63
[23:14:46] also: well done. I was spinning my grep wheels
[23:15:09] Project beta-scap-sync-world build #163318: STILL FAILING in 43 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163318/
[23:15:19] Beta-Cluster-Infrastructure, Release-Engineering-Team (Radar), SRE Observability: New beta deployment server unable to connect to logging logstash server in WMCS - https://phabricator.wikimedia.org/T369962#9978758 (bd808) Somebody got really paranoid and only opened the OpenStack level firewall to th...
[23:15:32] ^ grep would have never found it
[23:15:57] ah ha!
[23:16:05] well, now I feel less bad about grep failure
[23:19:58] thcipriani: I think the port should be open for you now
[23:20:07] confirmed!
[23:20:19] thanks bd808 <3
[23:20:48] !log un-skipping logstash checks in beta
[23:20:49] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[23:21:07] now I guess we'll see if I need to raise that ulimit for some reason...
[23:21:29] curious it started happening when I used the scap --force flag
[23:22:27] Beta-Cluster-Infrastructure, Release-Engineering-Team (Radar), SRE Observability: New beta deployment server unable to connect to logging logstash server in WMCS - https://phabricator.wikimedia.org/T369962#9978769 (bd808) Open→Resolved a:bd808 `lang=irc [23:19] < bd808> thcipriani...
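Note on the logstash firewall fix above (T369962): once a port has been opened, a quick reachability probe from the new host is a cheap way to get to "confirmed!". A minimal Python sketch follows, assuming the check is made over TCP; the log does not say which protocol or port the logstash check uses, and the helper name `can_connect` is hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A firewall that silently drops packets usually shows up here as a timeout rather than an immediate refusal, so a short `timeout` keeps the probe quick. This says nothing about UDP traffic, which cannot be probed this way.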
[23:24:58] Project beta-scap-sync-world build #163319: STILL FAILING in 41 sec: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163319/
[23:25:27] hrm, watching lsof -i UDP showed like 6 udp connections when it died
[23:25:35] * thcipriani raises ulimit
[23:28:06] climbed up to 7 during the rebuild, but doubling udp seems to have gotten it through
[23:28:11] er ulimit
[23:29:01] now...it'll probably time out
[23:41:27] Yippee, build fixed!
[23:41:28] Project beta-scap-sync-world build #163320: FIXED in 14 min: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/163320/
[23:41:36] wahoo
[23:42:12] !log beta deployments now running from deployment-deploy04 (new bullseye host)
[23:42:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
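Note on the "too many open files" fix above: the log only says the ulimit was doubled, not how. One way to do the same from inside a Python process is sketched below, assuming a Unix host; the function name `double_nofile_soft_limit` is hypothetical, and the Jenkins job may well have used the shell `ulimit -n` instead.

```python
import resource
from typing import Tuple


def double_nofile_soft_limit() -> Tuple[int, int]:
    """Double the soft open-files limit, capped at the hard limit.

    Returns (old_soft, new_soft). Raising the soft limit up to the hard
    limit needs no extra privileges.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, soft  # already unlimited, nothing to raise
    wanted = soft * 2
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return soft, wanted
```

The new limit applies to the calling process and to children it starts afterwards. If sockets really are leaking, as guessed at 23:12:31, a higher limit only delays the failure.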