[00:00:39] (DatasourceError) firing: Queue (Jenkins jobs + Zuul functions) alert - https://grafana.wikimedia.org/alerting/grafana/iS0FSjJ4z/view - https://wikitech.wikimedia.org/wiki/Monitoring/DatasourceError - https://alerts.wikimedia.org/?q=alertname%3DDatasourceError
[00:05:39] (DatasourceError) resolved: Queue (Jenkins jobs + Zuul functions) alert - https://grafana.wikimedia.org/alerting/grafana/iS0FSjJ4z/view - https://wikitech.wikimedia.org/wiki/Monitoring/DatasourceError - https://alerts.wikimedia.org/?q=alertname%3DDatasourceError
[00:05:54] (DatasourceError) firing: Queue (Jenkins jobs + Zuul functions) alert - https://grafana.wikimedia.org/alerting/grafana/iS0FSjJ4z/view - https://wikitech.wikimedia.org/wiki/Monitoring/DatasourceError - https://alerts.wikimedia.org/?q=alertname%3DDatasourceError
[00:10:54] (DatasourceError) resolved: Queue (Jenkins jobs + Zuul functions) alert - https://grafana.wikimedia.org/alerting/grafana/iS0FSjJ4z/view - https://wikitech.wikimedia.org/wiki/Monitoring/DatasourceError - https://alerts.wikimedia.org/?q=alertname%3DDatasourceError
[01:58:29] GitLab (CI & Job Runners), mwbot-rs: GitLab CI keeps SIGKILLing Rust jobs - https://phabricator.wikimedia.org/T349786 (Legoktm)
[01:58:35] GitLab (CI & Job Runners), mwbot-rs: GitLab CI keeps SIGKILLing Rust jobs - https://phabricator.wikimedia.org/T349786 (Legoktm) p:Triage→High
[04:15:45] PROBLEM - Check systemd state on doc2002 is CRITICAL: CRITICAL - degraded: The following units failed: rsync-doc-host-data-sync.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:12:17] RECOVERY - Check systemd state on doc2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:21:35] (PS1) Zoranzoki21: Zuul: Archive the WikiLexicalData extension [integration/config] - https://gerrit.wikimedia.org/r/968949 (https://phabricator.wikimedia.org/T271379)
[07:21:46] Release-Engineering-Team, Wikibase Product Platform Team WPP (Sprint 5), ci-test-error (WMF-deployed Build Failure): Wikibase CI is broken - https://phabricator.wikimedia.org/T348243 (Silvan_WMDE) Resolved→Open Re-opening, since this is happening again. Example builds: https://integration.wi...
[07:38:39] !log integrastion-castor05: `sudo rm -fR /srv/castor/castor-mw-ext-and-skins/master/mwext-node16-rundoc-docker` # T348243
[07:38:42] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[07:38:43] T348243: Wikibase CI is broken - https://phabricator.wikimedia.org/T348243
[07:49:03] Release-Engineering-Team (Seen), MW-on-K8s, SRE, Traffic, serviceops: Move MediaWiki jobs to mw-on-k8s - https://phabricator.wikimedia.org/T349796 (Joe)
[07:53:38] Release-Engineering-Team (Seen), MW-on-K8s, SRE, Traffic, serviceops: Move MediaWiki jobs to mw-on-k8s - https://phabricator.wikimedia.org/T349796 (Joe) p:Triage→High
[08:01:16] Release-Engineering-Team, Patch-For-Review, Wikibase Product Platform Team WPP (Sprint 5), ci-test-error (WMF-deployed Build Failure): Wikibase CI is broken - https://phabricator.wikimedia.org/T348243 (hashar) Open→Resolved @Silvan_WMDE I have cleared the cache for this job/branch from th...
[08:09:40] Release-Engineering-Team, Patch-For-Review, Wikibase Product Platform Team WPP (Sprint 5), ci-test-error (WMF-deployed Build Failure): Wikibase CI is broken - https://phabricator.wikimedia.org/T348243 (Silvan_WMDE) Thank you, that fixed it. 👍️
[08:40:45] Release-Engineering-Team, Tech-Docs-Team, Documentation: Deployment pipeline (GitLab/Kokkuri/Blubber) documentation improvements - https://phabricator.wikimedia.org/T342317 (KBach)
[08:41:04] (PS1) Zoranzoki21: Zuul: Archive the RevisionCommentSupplement extension [integration/config] - https://gerrit.wikimedia.org/r/968991 (https://phabricator.wikimedia.org/T343616)
[08:51:22] Release-Engineering-Team (Radar), Tech-Docs-Team, Documentation: Create a high-level deployment pipeline overview - https://phabricator.wikimedia.org/T349799 (KBach)
[08:51:30] Release-Engineering-Team (Radar), Tech-Docs-Team, Documentation: Create a high-level deployment pipeline overview - https://phabricator.wikimedia.org/T349799 (KBach) Open→In progress p:Triage→Medium This work is currently in progress. First sections will be available for review shortly.
[08:51:32] Release-Engineering-Team, Tech-Docs-Team, Documentation: Deployment pipeline (GitLab/Kokkuri/Blubber) documentation improvements - https://phabricator.wikimedia.org/T342317 (KBach)
[08:56:48] Release-Engineering-Team (Radar), Tech-Docs-Team, Documentation: User research - https://phabricator.wikimedia.org/T348854 (KBach) Three interviews conducted so far. Another one scheduled for next week.
[09:46:09] Phabricator maintenance bot, API Platform, Content-Transform-Team, MediaWiki-Engineering: Stop adding Platform Engineering to new wiki creation tasks - https://phabricator.wikimedia.org/T346811 (MSantos) We had a discussion at the RESTBase Sunset meeting, @hnowlan said that he could do the techni...
[10:50:49] Beta-Cluster-Infrastructure, serviceops: Unable to upload files on Beta Commons - https://phabricator.wikimedia.org/T340908 (jijiki) `redis::multidc` was a very complicated and not well maintained part of the infrastructure, so as soon as we moved `MainStash` out of redis T212129, we wanted it out of pup...
[11:49:02] Continuous-Integration-Infrastructure, collaboration-services, doc.wikimedia.org, Patch-For-Review: doc.wikimedia.org sytemd timer rsync-doc-host-data-sync.service fails due to vanishing files - https://phabricator.wikimedia.org/T349166 (eoghan) a:eoghan
[11:57:42] Diffusion, Phabricator (Upstream), Upstream: Reduce the number of six default URIs in Diffusion - https://phabricator.wikimedia.org/T244907 (Aklapper)
[11:58:32] Continuous-Integration-Infrastructure, collaboration-services, doc.wikimedia.org, Patch-For-Review: doc.wikimedia.org sytemd timer rsync-doc-host-data-sync.service fails due to vanishing files - https://phabricator.wikimedia.org/T349166 (eoghan) Open→Resolved @hashar Nice idea! We had pre...
[12:40:25] Diffusion, Phabricator, Release-Engineering-Team (Social Piranhas 🐟), User-brennen: Understand which repositories we mirror, observe, host in Diffusion (and fix some findings) - https://phabricator.wikimedia.org/T347577 (Aklapper) Sharing notes while I'm trying to understand our state of things (...
[13:04:55] (PS1) Hslater: Add VisualEditor as dependency to AtMentions [integration/config] - https://gerrit.wikimedia.org/r/969114
[13:27:49] hello! Could I get a quick validation on this change for the restbase scap please? Doesn't even need to be a review, just that this command makes sense/will work https://gerrit.wikimedia.org/r/c/mediawiki/services/restbase/deploy/+/969077
[13:28:13] by that I mean how the command is defined in scap config, not the command itself
[14:01:46] <_joe_> hnowlan: ahhh snap
[14:01:53] <_joe_> hnowlan: yeah looks all right to me
[14:02:52] <_joe_> jnuche: I will need to make commits and merge them on gerrit from a script, like trainbranchbot does. Can I ask you how did you create the gerrit account for it?
[14:04:38] <_joe_> I can look at the code in scap backport to figure out the rest ofc, but if you have pointers to how that all works it would be great
[14:08:00] _joe_: not sure who created that account originally
[14:08:06] pinging dancy, he probably knows more about that
[14:09:00] <_joe_> thanks :)
[14:09:36] <_joe_> btw, maybe it would be best to move that repo to gitlab and use any mechanism we have there for automating commits?
[14:12:27] GitLab (Project Migration), collaboration-services, SRE Observability (FY2023/2024-Q2): Migrate SRE repositories to GitLab - operations/alerts - https://phabricator.wikimedia.org/T349626 (LSobanski)
[14:13:12] hmm, the only repo using trainbranchbot I'm aware of rn is the release tools, which is already in gitlab: https://gitlab.wikimedia.org/repos/releng/release
[14:13:20] what repo were you thinking of?
[14:14:57] <_joe_> jnuche: yeah but it's acting on a gerrit repo, right?
[14:15:11] <_joe_> jnuche: operations/docker-images/production-images
[14:15:50] <_joe_> so context is I want to make a commit once per week to that repo, automated, to rebuild a refreshed version of images
[14:18:01] ack, ok, yeah, on multiple gerrit repos in fact
[14:18:07] so yeah, I imagine your use case should be similar
[14:18:20] _joe_: thanks!
[14:18:38] https://ldap.toolforge.org/user/trainbranchbot probably shouldn't have an email address of a former employee :/
[14:18:39] <_joe_> hnowlan: we removed the nrpe checks, did we
[14:19:03] <_joe_> taavi: at least it's a former employee we like :P
[14:19:40] <_joe_> but yeah, that was my first doubt - should I just add an ldap user? Or can we have local users just for this function
[14:21:16] _joe_: yep, they were too noisy
[14:31:57] _joe_: regarding possibly moving the repo to gitlab, I can't find any examples of us doing automated commits atm, but I believe it should be possible to create a user analogous to trainbranchbot and then use an impersonation token to push commits: https://docs.gitlab.com/ee/api/rest/index.html#impersonation-tokens
[14:39:16] o/
[14:40:42] <_joe_> ack
[14:42:55] <_joe_> my plan long term would be: move these repos to gitlab, and upon merge run docker-pkg with a buildkit backend so we can build and publish images after merge
[14:43:21] <_joe_> but that needs for me to take the time to learn how we use buildctl in kokkuri a bit more in detail
[14:48:03] _joe_: Sounds like in the meantime you want a new local Gerrit account to be created. Do you have a name in mind?
[14:48:22] <_joe_> dancy: $something-bot
[14:48:24] <_joe_> :P
[14:48:40] <_joe_> jokes aside... imageupdatebot?
[14:48:52] Ok.
[14:57:28] !log Created imageupdatebot Gerrit account
[14:57:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:58:45] <_joe_> <3
[15:07:12] GitLab (CI & Job Runners), mwbot-rs: GitLab CI keeps SIGKILLing Rust jobs - https://phabricator.wikimedia.org/T349786 (dancy) Being killed by SIGKILL is a good indicator that the job has exceeded memory limits. The memory limit is 1GB. For jobs that need more memory you can add the `memory-optimized` ta...
[15:11:37] hi folks! I need to test eBPF related code what would be the best way in Gitlab CI? could I spawn additional containers that could potentially load the eBPF code to run the tests?
[15:26:47] <_joe_> vgutierrez: why would you need /additional/ containers?
[15:27:22] <_joe_> you can just run a container directly from gitlab's CI
[15:27:44] <_joe_> but I'm not sure you have the ability to load ebpf code in the kernel from one of those containers
[15:27:46] I don't wanna potentially mess with the networking of the CI infra
[15:27:50] <_joe_> actually I strongly hope not
[15:28:06] <_joe_> err doesn't ebpf get loaded in the kernel of the host?
[15:28:17] it gets loaded in the kernel
[15:28:23] if we are talking about containers, just one kernel around
[15:28:38] <_joe_> so I fail to see how launching an additional container would save you from messing with the CI networking
[15:28:58] on a different cluster/VM/whatever :)
[15:29:21] I'd hate to end up with tests that need to run on the developer computer TBH
[15:29:34] <_joe_> no I 100% understand it
[15:29:42] <_joe_> OTOH you see why the general CI can't support that
[15:29:51] sure
[15:29:52] <_joe_> but we can set up our own runners for this repo
[15:30:00] <_joe_> this seems a very specific need
[15:30:17] I'm just wondering if we have a runner that lets spawn a Vagrant environment
[15:30:21] <_joe_> so what you need is to learn how to run your own gitlab CI runner in say digitalocean (which we support)
[15:30:41] errr do I need to maintain that runner? :)
[15:31:06] <_joe_> yes I think
[15:31:13] <_joe_> but on that I'm not sure
[15:31:47] Diffusion, Phabricator, Release-Engineering-Team (Social Piranhas 🐟), User-brennen: Understand which repositories we mirror, observe, host in Diffusion (and fix some findings) - https://phabricator.wikimedia.org/T347577 (Aklapper) Getting back to the task title: Understand which Diffusion reposit...
[15:31:54] <_joe_> I'd suggest to talk to the collab team, it's a pretty peculiar need you have
[15:33:33] so.. nobody else testing kernel related stuff? :)
[15:33:44] <_joe_> I doubt it :D
[15:35:49] GitLab (Project Migration), Phabricator, Release-Engineering-Team (Priority Backlog 📥), collaboration-services, and 3 others: Migrate active repositories in Phabricator Differential to GitLab - https://phabricator.wikimedia.org/T191182 (Aklapper) Per T347577#9284700 last bullet point we have 10 m...
[15:38:36] Release-Engineering-Team, RESTBase, serviceops-radar: RESTBase scap deployment failed - https://phabricator.wikimedia.org/T349318 (Jgiannelos) We just run a successful scap deploy on restbase.
[15:38:47] Release-Engineering-Team, RESTBase, serviceops-radar: RESTBase scap deployment failed - https://phabricator.wikimedia.org/T349318 (Jgiannelos) Open→Resolved a:Jgiannelos
[15:40:05] _joe_: so I can use something like testify and mock cilium/ebpf but it would be nice to have some integration tests with the whole thing being tested
[15:46:33] Hi, I am trying to setup CI for a small node library we are building on gitlab. I am currently facing some issues with the service definition. Is there any way I can get the logs a service emits?
[15:47:52] https://docs.gitlab.com/ee/ci/services/#capturing-service-container-logs
[15:48:40] nemo-yiannis ^^
[15:50:49] I tried this but didn't show up anything, let me try it again.
[15:58:15] Beta-Cluster-Infrastructure, serviceops: Unable to upload files on Beta Commons - https://phabricator.wikimedia.org/T340908 (Etonkovidova)
[16:27:13] Beta-Cluster-Infrastructure, serviceops: Unable to upload files on Beta Commons - https://phabricator.wikimedia.org/T340908 (Tgr) In production, rdb1 / rdb2 / rdb3 (which point to rdb1009 and rdb1011) use the [[https://gerrit.wikimedia.org/g/operations/puppet/+/ba56daa92828a4803cdee60b8fbe198106597976/hi...
[16:47:17] dancy: still no luck. Should i file a ticket for that ?
[17:09:41] Yes please. Be sure to include links to your .gitlab-ci.yml file and the failing jobs.
[19:35:47] Phabricator: Make sure anti-vandalism features are up to snuff - https://phabricator.wikimedia.org/T84 (Aklapper)
[21:13:48] Phabricator, Legalpad, WMF-Legal, WMF-NDA-Requests, User-AKlapper: Clarify if NDAs (to access #WMF-NDA protected Phab tasks) are on paper or in Legalpad's L2 or both - https://phabricator.wikimedia.org/T349595 (KFrancis) Hi all, my apologies for the delay on this. Would it be possible to eith...
[21:56:23] Phabricator, Legalpad, WMF-Legal, WMF-NDA-Requests, User-AKlapper: Clarify if NDAs (to access #WMF-NDA protected Phab tasks) are on paper or in Legalpad's L2 or both - https://phabricator.wikimedia.org/T349595 (Aklapper) @KFrancis: No problem! :) I think you should be able to access L2 now.
[22:11:07] GitLab (Misc), Release-Engineering-Team (Priority Backlog 📥): Investigate and document "Depends-On" in GitLab - https://phabricator.wikimedia.org/T349872 (thcipriani)
[22:13:09] GitLab (Misc), Release-Engineering-Team (Priority Backlog 📥): Investigate and document stacked merge requests - https://phabricator.wikimedia.org/T300819 (thcipriani) >>! In T300819#9212672, @thcipriani wrote: >>>! In T300819#9046388, @cscott wrote: >> This seems related to the general 'Depends-On' featu...
[23:26:20] Phabricator, Legalpad, WMF-Legal, WMF-NDA-Requests, User-AKlapper: Clarify if NDAs (to access #WMF-NDA protected Phab tasks) are on paper or in Legalpad's L2 or both - https://phabricator.wikimedia.org/T349595 (KFrancis) Thanks so much! Would you please add access for James Buatti jbuatti@wik...