[03:12:49] !log gitlab-webhooks restart after failed webhook requests (presumably earlier outage)
[03:12:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[03:22:28] 10GitLab (Integrations), 10Phabricator, 10Release-Engineering-Team (GitLab III: GitLab in LA 🪃), 10User-brennen: Sandbox task for gitlab-phabricator comment integration - https://phabricator.wikimedia.org/T324164 (10CodeReviewBot) brennen closed https://gitlab.wikimedia.org/brennen/test/-/merge_requests/12...
[04:56:02] 10GitLab (Auth & Access), 10Release-Engineering-Team (They Live 🕶️🧟), 10User-brennen: Requesting GitLab non-external access/account unlock for Mvolz - https://phabricator.wikimedia.org/T336578 (10Mvolz) 05Open→03Resolved
[06:19:46] 10GitLab (Project Migration), 10Release-Engineering-Team: Create new GitLab project group: wikimedia-france - https://phabricator.wikimedia.org/T336830 (10Poslovitch)
[07:02:16] 10GitLab (Auth & Access), 10Release-Engineering-Team (They Live 🕶️🧟), 10User-brennen: Requesting GitLab non-external access/account unlock for Mvolz - https://phabricator.wikimedia.org/T336578 (10Mvolz) Thank you so much! There's now a gitlab repo in prod! 🎉
[07:59:57] 10Continuous-Integration-Config, 10Gerrit, 10Patch-For-Review, 10Upstream: mwext-phpunit-coverage-patch-docker votes, gets overridden - https://phabricator.wikimedia.org/T336660 (10hashar) > Non-voting CI jobs can appear voting if another job using the same user account already voted on the patchset. (Mayb...
[09:27:36] (03PS3) 10Hashar: Do not carry Verified score on no change [All-Projects] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/919831 (https://phabricator.wikimedia.org/T336660)
[09:29:08] (03CR) 10Hashar: "I have mixed up the two very similar settings. For the Verified label we need both copyAllScoresIfNoCodeChange (default false) and copyAll" [All-Projects] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/919831 (https://phabricator.wikimedia.org/T336660) (owner: 10Hashar)
[09:30:24] 10Gerrit, 10VPS-project-Codesearch, 10VPS-project-Extdist, 10serviceops-collab, 10Patch-For-Review: Move clients off of gerrit-replica.wikimedia.org back to gerrit.wikimedia.org - https://phabricator.wikimedia.org/T336710 (10Ladsgroup) >>! In T336710#8857046, @hashar wrote: > I am replying here to the [[...
[09:35:42] I forget who does what, but CI failures are probably releng, right... T336840
[09:35:43] T336840: wmf-quibble-vendor-mysql-php74-docker failing due to insufficient space - https://phabricator.wikimedia.org/T336840
[09:50:59] 10Release-Engineering-Team (They Live 🕶️🧟), 10Patch-For-Review, 10Release, 10Train Deployments: 1.41.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T330215 (10hashar) I have done a bit of log triage and there is nothing new showing up in the server-side error logs. That is encouraging I gu...
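For readers following the Verified-label discussion above ([09:27:36], [09:29:08]): a toy model of the copy condition may help. This is a minimal sketch of the decision Gerrit makes when a new patchset arrives, assuming only the one setting named in full in the comment (copyAllScoresIfNoCodeChange); it is not Gerrit's actual code, and the change-kind names below are simplified stand-ins.

```python
# Minimal sketch of the score-copy decision discussed above. This loosely
# models Gerrit's behaviour; the real logic lives in Gerrit's label
# configuration. The flag below is the one named in the review comment;
# everything else is illustrative.

NO_CODE_CHANGE = "NO_CODE_CHANGE"   # e.g. a commit-message-only edit
TRIVIAL_REBASE = "TRIVIAL_REBASE"
REWORK = "REWORK"                   # an actual code change

def carries_verified_score(change_kind: str,
                           copy_all_scores_if_no_code_change: bool) -> bool:
    """Return True if an existing Verified vote would be copied to the
    new patchset under this (simplified) copy condition."""
    if change_kind == NO_CODE_CHANGE:
        return copy_all_scores_if_no_code_change
    # Any other kind of new patchset drops the Verified vote in this model.
    return False

if __name__ == "__main__":
    # With the flag at its default (false), even a commit-message-only
    # update drops the Verified vote, which is what the change above tweaks.
    print(carries_verified_score(NO_CODE_CHANGE, False))  # False
    print(carries_verified_score(NO_CODE_CHANGE, True))   # True
```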
[09:53:35] !log integration-agent-docker-1029: erased a few files under `/srv`. I suspect the 36 GB partition is not large enough nowadays for multiple concurrent builds of MediaWiki # T336840
[09:53:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:53:39] T336840: wmf-quibble-vendor-mysql-php74-docker failing due to insufficient space - https://phabricator.wikimedia.org/T336840
[09:53:53] TheresNoTime: I think we would need larger disk space on /srv and thus rebuild all instances
[09:54:01] that has to be done to upgrade them to the new Debian OS
[09:54:08] anyway, food for later, I am off for lunch
[09:55:48] o7
[12:04:33] (Queue (Jenkins jobs + Zuul functions) alert) firing: - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[12:14:33] (Queue (Jenkins jobs + Zuul functions) alert) resolved: - https://alerts.wikimedia.org/?q=alertname%3DQueue+%28Jenkins+jobs+%2B+Zuul+functions%29+alert
[12:59:16] (03CR) 10Tchanders: Rename security-api to ipoid (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/919873 (https://phabricator.wikimedia.org/T336218) (owner: 10Tchanders)
[13:40:56] maintenance-disconnect-full-disks build 491932 integration-agent-docker-1031 (/: 29%, /srv: 99%, /var/lib/docker: 46%): OFFLINE due to disk space
[13:45:52] maintenance-disconnect-full-disks build 491933 integration-agent-docker-1031 (/: 29%, /srv: 70%, /var/lib/docker: 46%): RECOVERY disk space OK
[14:20:26] 10GitLab (Infrastructure), 10SRE, 10ops-eqiad, 10serviceops-collab: gitlab-runner1003 is not coming back online - https://phabricator.wikimedia.org/T336737 (10Jclark-ctr) opened Service request with Dell. Confirmed: Service Request 168325397 was successfully submitted. While waiting for response i have p...
[14:20:51] 10GitLab (Infrastructure), 10SRE, 10ops-eqiad, 10serviceops-collab: gitlab-runner1003 is not coming back online - https://phabricator.wikimedia.org/T336737 (10Jclark-ctr) a:03Jclark-ctr
[14:25:39] 10GitLab (Infrastructure), 10SRE, 10ops-eqiad, 10serviceops-collab: gitlab-runner1003 is not coming back online - https://phabricator.wikimedia.org/T336737 (10Jclark-ctr) @Jelto Server has booted properly with no errors. Can you put the server back in service, to see if the error returns?
[14:30:17] 10Phabricator, 10Release-Engineering-Team (They Live 🕶️🧟), 10serviceops-collab, 10Patch-For-Review, 10User-brennen: Migrate phabricator.wikimedia.org to Phorge as upstream - https://phabricator.wikimedia.org/T333885 (10CodeReviewBot) brennen opened https://gitlab.wikimedia.org/repos/phabricator/deploymen...
[14:30:38] 10Phabricator, 10Release-Engineering-Team (They Live 🕶️🧟), 10serviceops-collab, 10Patch-For-Review, 10User-brennen: Migrate phabricator.wikimedia.org to Phorge as upstream - https://phabricator.wikimedia.org/T333885 (10CodeReviewBot)
[14:39:47] 10Continuous-Integration-Infrastructure, 10Gerrit, 10Release-Engineering-Team (Seen), 10Zuul, 10Patch-For-Review: Display Zuul status of jobs for a change on Gerrit UI - https://phabricator.wikimedia.org/T214068 (10hashar) I had some trouble with triggering the `reload` cause it is not attached to `docum...
[15:29:07] (03CR) 10Thcipriani: "Some questions/thoughts inline" [All-Projects] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/919831 (https://phabricator.wikimedia.org/T336660) (owner: 10Hashar)
[16:44:49] (03CR) 10Hashar: "Maybe I can test it on test/gerrit-ping, I would have to find a repro case." [All-Projects] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/919831 (https://phabricator.wikimedia.org/T336660) (owner: 10Hashar)
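An aside on the maintenance-disconnect-full-disks messages above: judging from the log, the job samples partition usage on each agent and flips it offline/online around thresholds. Here is a minimal sketch of that kind of check; the 95%/80% thresholds and the overall structure are assumptions for illustration, not the job's real implementation, which would also call the Jenkins API to mark the node offline.

```python
# Sketch of a disk-usage watchdog like the one whose output appears above.
# Thresholds and behaviour are assumed, not taken from the real job.
import shutil

OFFLINE_THRESHOLD = 95.0   # percent used; assumed value
RECOVERY_THRESHOLD = 80.0  # percent used; assumed value

def percent_used(path: str) -> float:
    """Return how full the filesystem containing `path` is, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def next_offline_state(paths, currently_offline: bool) -> bool:
    """Decide the agent's new offline state from its watched partitions."""
    worst = max(percent_used(p) for p in paths)
    if not currently_offline and worst >= OFFLINE_THRESHOLD:
        return True   # here the real job would take the agent offline
    if currently_offline and worst <= RECOVERY_THRESHOLD:
        return False  # recovered, bring the agent back
    return currently_offline

if __name__ == "__main__":
    # On a CI agent the watched paths would be ("/", "/srv", "/var/lib/docker").
    print(next_offline_state(["/"], currently_offline=False))
```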
[17:13:06] 10Phabricator: How to handle outing content - https://phabricator.wikimedia.org/T336888 (10Bugreporter)
[17:13:58] 10Phabricator: How to handle outing content - https://phabricator.wikimedia.org/T336888 (10matmarex)
[17:56:53] mutante: extension distributor actually kicks in every 30 minutes via modules/extdist/manifests/init.pp :/
[17:56:59] and hits the gerrit-replica
[17:57:15] hashar: oh, so they DO use the replica still and were right
[17:57:15] I am not sure what it does really, hopefully it is just fetching
[17:57:25] yeah, for generating the dist file
[17:57:34] but the extension and mediawiki-config point to gerrit.wikimedia.org
[17:57:38] I forgot to check Puppet earlier
[17:57:39] :D
[17:57:44] mystery solved
[17:57:45] I see. hmm.. so again the option to either ask them to switch
[17:57:48] or not
[17:57:51] I think it is fine if they are stale
[17:58:03] then I don't know much about that system
[17:58:10] I am off for real, it is dinner time
[17:58:18] ok, enjoy dinner
[17:59:08] well, since it's puppet I can make a patch just to discuss it
[18:20:13] 10Gerrit: Gerrit LFS objects lack an automatic sync to gerrit replicas - https://phabricator.wikimedia.org/T257741 (10Dzahn) Hey, new subscribers. In our meeting today we talked about this and said that LFS data is not needed on replicas. But we also said there is this ticket about not having replication for LF...
[18:21:01] 10Gerrit, 10serviceops-collab: Gerrit LFS objects lack an automatic sync to gerrit replicas - https://phabricator.wikimedia.org/T257741 (10Dzahn)
[18:21:03] mutante: I don't know what will happen to ext dist if the source is down :( Hopefully it is just tarballs being stale for a few hours
[18:21:32] worst case, there are no tarballs so end users would have trouble fetching extensions/skins
[18:22:52] I guess you can switch extdist to the primary via a puppet patch. Things to watch on Gerrit would be latency/cache hit ratio, but it might be tricky
[18:22:59] hashar: how do you feel about just switching it to gerrit.wm.org before?
[18:23:14] I see
[18:23:35] Yeah we might want to try. Then tomorrow is a holiday :)
[18:23:55] ok, yes, not this week. I am thinking more like Monday
[18:24:04] not now, but not Thursday morning either
[18:24:09] It might be fine. Deployment prep already fetches all those repos every few minutes
[18:24:29] So they would be in Gerrit cache. It might not change anything
[18:25:15] nod
[18:25:35] hashar: I have one more question. I found the ticket about LFS data https://phabricator.wikimedia.org/T257741#8860393
[18:25:45] so we said today we don't need this data on the replica
[18:25:56] but that ticket says it should end up there
[18:26:05] and both are true statements.. is that right
[18:26:24] Phabricator and Code Search are a different beast though, cause they crawl way more repos and that might disrupt the cache. Then the new machine is larger, so we could just double the JVM heap to be sure
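To make the extdist discussion concrete: retargeting a periodic fetcher from the replica to the primary amounts to changing one host setting. This is a hypothetical sketch assuming a plain shallow git clone over HTTPS against Gerrit's standard repo URL layout; the real extdist logic lives in Puppet (modules/extdist/manifests/init.pp) and is not reproduced here.

```python
# Hypothetical sketch of what switching extdist's source host looks like.
# The function name and clone strategy are illustrative assumptions.
import subprocess

# Assumed setting; the replica -> primary switch is this one line.
GERRIT_HOST = "gerrit.wikimedia.org"  # was: gerrit-replica.wikimedia.org

def fetch_extension(name: str, dest: str) -> None:
    """Shallow-clone one extension repo from the configured Gerrit host."""
    url = f"https://{GERRIT_HOST}/r/mediawiki/extensions/{name}"
    subprocess.run(["git", "clone", "--depth", "1", url, dest], check=True)

# e.g. fetch_extension("VisualEditor", "/tmp/VisualEditor")
```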
[18:26:25] as in "right now it's irrelevant but in the future it should be copied"
[18:26:39] but it doesn't have to be copied after reimage
[18:27:03] LFS data is only fetched for scap-deployed repos and all use cases point to the primary
[18:27:08] ok
[18:27:39] 10Gerrit, 10serviceops-collab: Gerrit LFS objects lack an automatic sync to gerrit replicas - https://phabricator.wikimedia.org/T257741 (10Dzahn) <+hashar> LFS data is only fetched for scap-deployed repos and all use cases point to the primary confirmed!
[18:27:40] And I found out they are not replicated. They should be moved to Swift or use a Ceph fs, but that is a bit more work.
[18:28:01] hashar: I think soon you can close https://phabricator.wikimedia.org/T333143 too
[18:28:21] after I make the new path default.. patch in gerrit for review
[18:28:44] ACK @ object storage.. gotcha
[18:29:27] Yeah, I'd like to have everything under /srv/gerrit ultimately, rely on Gerrit defaults and drop most config from puppet/hiera
[18:30:08] this is the cleanup change from my side: https://gerrit.wikimedia.org/r/c/operations/puppet/+/920765
[18:30:16] no need to review now.. just saying
[18:30:33] but after that I think it might be done
[18:30:59] :) will look at it on Friday
[18:31:07] Dinner served!
[18:31:56] see you later
[18:54:12] 10Gerrit, 10Release-Engineering-Team, 10serviceops-collab, 10Patch-For-Review, 10Sustainability (Incident Followup): Move Gerrit data out of root partition - https://phabricator.wikimedia.org/T333143 (10Dzahn) I think after the patch above is merged we might be able to close this.
[19:03:41] 10Release-Engineering-Team (They Live 🕶️🧟), 10Patch-For-Review: Kokkuri should allow dockerfile.v0 frontend - https://phabricator.wikimedia.org/T326569 (10xcollazo) Awesome! Thanks @dancy, will test this in the near future!
[19:18:50] hi releng, for one repo gerrit constantly gives the error 'This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.'
[19:19:11] and this is for patches which are either right on top of master or right on top of a patch which does pass
[19:19:21] for example, this one: https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/tools/+/919249
[19:19:55] had actually been passing tests until I rebased it on top of master prior to giving it a C+2
[19:20:13] then the rebased patch got a V-1 Merge Failed
[19:21:24] mutante: any of your gerrit messing related?
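On the LFS point above: a quick way to see which checkouts would even notice missing LFS objects on a replica is to look for LFS filters in .gitattributes. A minimal sketch under that assumption; Gerrit's own per-project LFS configuration, not this heuristic, is the authoritative source.

```python
# Heuristic check for whether a working copy uses Git LFS, as a way of
# spotting repos affected by the missing replica-side LFS replication
# discussed in T257741. Illustrative only.
from pathlib import Path

def repo_uses_lfs(repo_path: str) -> bool:
    """Return True if .gitattributes declares any LFS-tracked patterns."""
    attrs = Path(repo_path) / ".gitattributes"
    if not attrs.is_file():
        return False
    return any("filter=lfs" in line for line in attrs.read_text().splitlines())

# e.g. repo_uses_lfs("/srv/gerrit/git/operations/puppet.git")  # path is hypothetical
```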
[19:22:15] I've noticed this only on the wikimedia/fundraising/tools repo, since last week some time
[19:22:26] Though we hadn't touched that repo in a while before last week
[19:24:01] Not new this evening then
[19:24:18] nope
[19:24:24] That's useful
[19:24:37] Any indication when it did start?
[19:26:10] The first I noticed was May 11th on this patch: https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/tools/+/919256/3
[19:26:25] it was the second in a chain where the first was passing tests
[19:27:21] and it directly followed the first in my git history - no more rebasing on top of the first was possible in gerrit as 'this change is already up to date with its parent'
[19:27:32] yet it got the V-1 Merge Failed
[19:28:24] but again, we hadn't made any commits to that repo since last November, so no idea if the breakage happened well before last week
[19:29:47] hash.ar will likely know best but it's 9:30pm for him
[19:30:02] Might be worth filing a task if no one responds soonish
[19:30:03] ok, I can make a phab
[19:30:28] Tag it with the test-failure tag
[19:30:41] There's one for CI and one for WMF deployed repos
[19:31:08] Link it in here though once done :)
[19:31:13] ok, will do!
[19:31:36] RhinosF1: not with anything from today at least
[19:31:50] ejegg: I do see a jenkins +2 on that patch though now?
[19:31:57] mutante: ruled that out by it starting sometime between November and last week
[19:32:01] like it worked after recheck
[19:32:32] was this a temp issue or is it ongoing now?
[19:32:38] mutante: it worked after the parent patch was merged
[19:33:07] now the patch after it is getting the same spurious V-1: https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/tools/+/920360/2
[19:33:08] ok. so far it sounds kind of like what people refer to as "rebase hell"
[19:33:20] hmm, ok
[19:33:30] and even this one which is directly on top of master is getting the same: https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/tools/+/919249
[19:33:34] yea, a ticket would be great
[19:33:37] then
[19:33:40] writing it now
[19:33:58] thanks
[19:35:59] RhinosF1: what happened today were just more things to make it even safer. like gerrit1001 removed from firewall/ssh config etc.. so it can definitely not start replicating from the old host
[19:36:13] and the service there has been stopped for 6 days
[19:36:30] so I am confident it's not that thing again where 2 gerrits were fighting over replication or anything
[19:38:05] mutante: ah cool
[19:38:09] 10Release-Engineering-Team, 10ci-test-error: Gerrit gives spurious V-1 Merge Failed in wikimedia/fundraising/tools repo - https://phabricator.wikimedia.org/T336902 (10Ejegg)
[19:38:12] I'm on a train with no wifi
[19:38:17] So running off data
[19:38:25] OK, there's the ticket ^^^
[19:38:31] thanks for taking a look!
[19:45:52] at the risk of muddying the evidence, I tried a recheck and got it merged this time
[19:48:30] 10Release-Engineering-Team (They Live 🕶️🧟), 10serviceops-collab, 10Patch-For-Review: upgrade gerrit servers to bullseye - https://phabricator.wikimedia.org/T334521 (10Dzahn) the plan for gerrit2002: - on Thursday, May 25th: -- schedule downtime, monitoring downtime -- mask gerrit service on gerrit2002 - ht...
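The "unable to be automatically merged" V-1 reported above is, in effect, the gate failing a test merge of the change against the current branch tip. One way to reproduce that check locally is sketched below using plain git commands; the refs/changes path follows Gerrit's standard change-ref layout, and the example patchset number in the usage comment is hypothetical.

```python
# Sketch: test whether a Gerrit change still merges cleanly onto a branch,
# approximating the automatic-merge check behind the V-1 messages above.
import subprocess

def merges_cleanly(repo: str, change_ref: str, branch: str = "master") -> bool:
    """Fetch a Gerrit change ref and attempt a throwaway merge onto branch."""
    def git(*args):
        return subprocess.run(["git", "-C", repo, *args],
                              capture_output=True, text=True)
    git("fetch", "origin", change_ref)
    git("checkout", f"origin/{branch}")          # detached HEAD at branch tip
    result = git("merge", "--no-commit", "--no-ff", "FETCH_HEAD")
    git("merge", "--abort")                      # clean up regardless of outcome
    return result.returncode == 0

# e.g. merges_cleanly("/tmp/fundraising-tools", "refs/changes/49/919249/1")
# (patchset number here is illustrative, not taken from the log)
```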
[20:35:14] 10Continuous-Integration-Config, 10Release-Engineering-Team (Kanban), 10Wikimedia-Portals, 10Discovery-Portal-Sprint: Create a Jenkins Job that builds the portal deployment artifacts in CI - https://phabricator.wikimedia.org/T179694 (10MarcoAurelio)
[20:36:08] ejegg: glad to hear that you got it merged, expect more on the ticket when Europe is back, holiday tomorrow
[20:36:29] glad it doesn't block
[20:50:03] have a good holiday!
[20:51:19] * mutante waves (wait, is it a Euro holiday or a global holiday)
[21:21:13] is there a holiday tomorrow?
[21:27:53] https://en.wikipedia.org/wiki/Feast_of_the_Ascension
[21:35:16] 10Phabricator: Uninstall Differential (Phabricator application) - https://phabricator.wikimedia.org/T330797 (10Aklapper)
[21:37:48] 10Phabricator: Uninstall Packages (Phabricator application) - https://phabricator.wikimedia.org/T336906 (10Aklapper)
[21:37:55] 10Phabricator: Uninstall Owners (Phabricator application) - https://phabricator.wikimedia.org/T336907 (10Aklapper)
[21:41:10] 10Release-Engineering-Team: Give @aklapper access to Drydock (Phabricator application) - https://phabricator.wikimedia.org/T336908 (10Aklapper)
[22:43:54] zabe: ah, thanks; not here though :)