[03:33:25] (03PS6) 10Subramanya Sastry: Config for running diffs with core-integrated Parsoid [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/933461 [03:33:27] (03PS6) 10Subramanya Sastry: Reorg configs and helper files [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/933466 [03:33:29] (03PS5) 10Subramanya Sastry: Remove stale version code [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/933601 [03:34:05] (03CR) 10CI reject: [V: 04-1] Remove stale version code [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/933601 (owner: 10Subramanya Sastry) [03:34:21] (03CR) 10CI reject: [V: 04-1] Reorg configs and helper files [integration/visualdiff] - 10https://gerrit.wikimedia.org/r/933466 (owner: 10Subramanya Sastry) [04:28:15] 10Gerrit, 10VPS-project-Codesearch, 10VPS-project-Extdist, 10serviceops-collab: Move clients off of gerrit-replica.wikimedia.org back to gerrit.wikimedia.org - https://phabricator.wikimedia.org/T336710 (10hashar) > codesearch was switched to main gerrit in https://gerrit.wikimedia.org/r/c/labs/codesearch/+... [05:49:05] (03PS4) 10Hashar: Fix wm-custom-links to show links in footer again [software/gerrit] (deploy/wmf/stable-3.5) - 10https://gerrit.wikimedia.org/r/932641 (https://phabricator.wikimedia.org/T340372) (owner: 10Paladox) [05:51:31] (03CR) 10Hashar: [C: 03+1] "I am pretty sure I have tested it when splitting the feature to a standalone javascript file which would imply that got broken while doing" [software/gerrit] (deploy/wmf/stable-3.5) - 10https://gerrit.wikimedia.org/r/932641 (https://phabricator.wikimedia.org/T340372) (owner: 10Paladox) [06:06:24] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Machine-Learning-Team: Python torch fills disk of CI Jenkins instances - https://phabricator.wikimedia.org/T338317 (10isarantopoulos) The intended use of this image is to be used with GPUs, so torch/lib/rocblas is def needed for operation... 
[06:16:42] (03Abandoned) 10Hashar: [WIP] dockerfiles: Provide tox-bullseye, dropping Python 2.x etc. [integration/config] - 10https://gerrit.wikimedia.org/r/853301 (owner: 10Jforrester) [09:05:07] (03PS1) 10David Caro: operations-puppet: install curl for easy functional tests [integration/config] - 10https://gerrit.wikimedia.org/r/933863 [09:05:33] (03CR) 10David Caro: operations-puppet: install curl for easy functional tests (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/933863 (owner: 10David Caro) [09:06:38] (03PS2) 10David Caro: operations-puppet: install curl for easy functional tests [integration/config] - 10https://gerrit.wikimedia.org/r/933863 [09:06:53] (03CR) 10David Caro: operations-puppet: install curl for easy functional tests (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/933863 (owner: 10David Caro) [09:07:45] (03CR) 10CI reject: [V: 04-1] operations-puppet: install curl for easy functional tests [integration/config] - 10https://gerrit.wikimedia.org/r/933863 (owner: 10David Caro) [09:10:04] (03PS3) 10David Caro: operations-puppet: install curl for easy functional tests [integration/config] - 10https://gerrit.wikimedia.org/r/933863 [09:16:27] (03PS4) 10David Caro: operations-puppet: install curl for easy functional tests [integration/config] - 10https://gerrit.wikimedia.org/r/933863 [09:17:53] (03CR) 10Jbond: [C: 03+1] "lgtm" [integration/config] - 10https://gerrit.wikimedia.org/r/933863 (owner: 10David Caro) [09:52:53] I need some hand-holding to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/933863, anyone around to guide me? [09:54:03] dcaro: heh [09:54:29] https://www.mediawiki.org/wiki/Continuous_integration/Docker#Deploy_images [09:54:36] In theory, it should be "simple" [09:55:10] +2 it :D [09:55:40] thanks! 
[09:56:32] oh, I should +2 it, not you +2´d it xd [09:56:35] (03CR) 10David Caro: [C: 03+2] operations-puppet: install curl for easy functional tests [integration/config] - 10https://gerrit.wikimedia.org/r/933863 (owner: 10David Caro) [09:57:46] heh [09:58:24] (03Merged) 10jenkins-bot: operations-puppet: install curl for easy functional tests [integration/config] - 10https://gerrit.wikimedia.org/r/933863 (owner: 10David Caro) [10:01:13] !log Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/c/integration/config/+/933863 [10:01:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [10:05:18] oops, got `Timeout, server contint.wikimedia.org not responding.` and the fab command failed [10:05:28] should I rerun it? [10:08:23] Do you know if you can successfullly SSH to it? [10:10:04] I do yes: [10:10:06] https://www.irccloud.com/pastebin/4a4mKC1W/ [10:11:13] it started doing things, it was building the docker image for operations-puppet I think [10:11:16] 2023-06-28 10:02:17,791 [docker-pkg-build] INFO - Generated dockerfile for docker-registry.discovery.wmnet/releng/operations-puppet:0.8.11: [10:11:25] https://www.irccloud.com/pastebin/f9KCRiQY/ [10:14:50] My local connection is quite stable (I've been connected through ssh to some VMs for several days already without the connection breaking) [10:16:28] * Reedy tries [10:16:53] !log Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/c/integration/config/+/933863 [10:16:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [10:17:39] WFM [10:18:13] dcaro: https://phabricator.wikimedia.org/P49485 [10:18:44] did it build the new image? [10:19:16] it's not listed on https://docker-registry.wikimedia.org/releng/operations-puppet/tags/ [10:19:19] but also, Last updated at: 2023-06-28 09:32. 
[10:19:35] there's no 0.8.10 either [10:19:44] yep, just noticed that too :/ [10:19:51] does `docker pull docker-registry.wikimedia.org/releng/operations-puppet:0.8.11` work? (I don't really have a docker install locally) [10:20:10] == Step 0: scanning /etc/zuul/wikimedia/dockerfiles == [10:20:10] Will build the following images: [10:20:14] I'm guessing it didn't [10:20:18] pulling latest brings the new changes though [10:24:01] They're on disk [10:24:51] (the changes you merged) [10:25:59] hashar: Is there a way to get contint to force build a specific image? [10:27:13] I'd be half tempted just doing a changelog bump and merge/deploy [10:27:42] dcaro: Want to try a -s1 type "rebuild", and see if it'll let me build the image? [10:28:26] I don't know what that means, but I'm happy to try :) [10:28:58] like, a noop changelog update [10:32:37] * Reedy tries to remember how to make dch play ball [10:32:58] oh, sure, than changelog I made manually, I never remember the command [10:33:08] heh [10:37:05] (03PS1) 10Reedy: operations-puppet: rebuild [integration/config] - 10https://gerrit.wikimedia.org/r/933878 [10:37:29] (03CR) 10David Caro: [C: 03+1] "LGTM" [integration/config] - 10https://gerrit.wikimedia.org/r/933878 (owner: 10Reedy) [10:39:08] (03CR) 10Reedy: [C: 03+2] operations-puppet: rebuild [integration/config] - 10https://gerrit.wikimedia.org/r/933878 (owner: 10Reedy) [10:40:20] (03Merged) 10jenkins-bot: operations-puppet: rebuild [integration/config] - 10https://gerrit.wikimedia.org/r/933878 (owner: 10Reedy) [10:41:19] !log Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/933878 [10:41:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [10:42:21] dcaro: I think it just doesn't like you :P [10:42:22] 2023-06-28 10:42:07,079 [docker-pkg-build] INFO - Generated dockerfile for docker-registry.discovery.wmnet/releng/operations-puppet:0.8.11-s1: [10:42:22] FROM 
docker-registry.discovery.wmnet/releng/ci-buster:0.4.0 [10:42:26] (tis building now) [10:42:42] that's where I got the timeout I think [10:44:10] dcaro, Reedy: `docker pull docker-registry.wikimedia.org/releng/operations-puppet:0.8.11` worked for me [10:44:15] jnuche@erebor:/tmp $ docker images [10:44:15] REPOSITORY TAG IMAGE ID CREATED SIZE [10:44:15] docker-registry.wikimedia.org/releng/operations-puppet 0.8.11 0e23b5282471 27 minutes ago 1.56GB [10:44:23] it's not finished [10:44:28] oh [10:44:39] nice \o/ [10:45:05] maybe it took a bit to finish building+pushing to the registry? dunno [10:46:00] mine still running... [10:46:24] I think my run got half there (released latest, but not the tag), and this run might have just finish creating the tagged version? [10:50:24] whelp, according to docker the image was created ~35 minutes ago, that would match your run dcaro [10:51:27] ah, but that timestamp is for the layer, not the tag [10:51:28] nm [10:54:08] it's working for me locally :) (the new image), thanks a lot! [10:55:03] my rebuild is nearly done [10:55:08] So we can update the jobs to use it [10:57:03] done [11:00:13] https://github.com/wikimedia/operations-puppet/blob/3d3469a01e55ac21e46441ba63232b5f75e635a2/modules/docker_registry_ha/manifests/web.pp#L185 [11:00:19] looks like the registry should update in an hour or so [11:00:38] dcaro: Do you need/want to change the images being used in CI? Or is this mostly for local dev? [11:01:10] I'm still in the local dev phase, but the CI images will have to update too yes [11:01:25] that part is easier(TM) [11:03:16] awesome, gtg, but I'll be back in a bit [11:38:29] * dcaro back [11:39:18] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Cloud-VPS (Quota-requests): Rebuild WMCS integration instances to larger flavor - https://phabricator.wikimedia.org/T340070 (10rook) @hashar The integration project is currently has gigabytes quota at 400G, I'm not quite understanding how... 
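The registry check discussed above (whether a tag such as 0.8.11 is listed for releng/operations-puppet) could be scripted roughly as follows. This is a minimal sketch, assuming docker-registry.wikimedia.org serves the standard Registry HTTP API v2 `tags/list` endpoint, which the log does not confirm; `tags_url`, `tag_listed` and `fetch_tags_payload` are hypothetical helper names.

```python
import json
import urllib.request


def tags_url(registry: str, repository: str) -> str:
    """Build the Registry HTTP API v2 tag-list URL for a repository."""
    return f"https://{registry}/v2/{repository}/tags/list"


def tag_listed(payload: str, tag: str) -> bool:
    """Return True if `tag` appears in a tags/list JSON response body."""
    data = json.loads(payload)
    # Treat a missing or null "tags" field as an empty list.
    return tag in (data.get("tags") or [])


def fetch_tags_payload(registry: str, repository: str, timeout: float = 10.0) -> str:
    """Fetch the raw tags/list response body (needs network access)."""
    url = tags_url(registry, repository)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `tag_listed(fetch_tags_payload("docker-registry.wikimedia.org", "releng/operations-puppet"), "0.8.11")` needs network access; the parsing step is kept separate so it can be exercised offline. As noted in the chat, a tag being pullable and a tag being shown on the registry's web listing are not the same thing, so this only answers the first question.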
[11:40:54] (03PS1) 10David Caro: operations-puppet: use image 0.8.11-s1 [integration/config] - 10https://gerrit.wikimedia.org/r/933889 [11:41:52] I'm guessing something like https://gerrit.wikimedia.org/r/c/integration/config/+/933889 is what's needed? (and deploying it) [11:43:28] (03CR) 10Reedy: [C: 03+1] operations-puppet: use image 0.8.11-s1 [integration/config] - 10https://gerrit.wikimedia.org/r/933889 (owner: 10David Caro) [11:43:30] ja [11:46:31] according to the readme, I should run `./jjb-update 'updated-jobs*'` to update the jobs **before** merging it right? [11:46:52] o_0 [11:47:06] oh, yeah [11:47:09] well, meh [11:47:30] it confirms it will build by running it locally before merging [11:48:43] (03CR) 10David Caro: [C: 03+2] operations-puppet: use image 0.8.11-s1 [integration/config] - 10https://gerrit.wikimedia.org/r/933889 (owner: 10David Caro) [11:49:38] jjb-update fails though, it tries to connect to localhost:8080/crumbIssuer/api/json [11:49:51] (03Merged) 10jenkins-bot: operations-puppet: use image 0.8.11-s1 [integration/config] - 10https://gerrit.wikimedia.org/r/933889 (owner: 10David Caro) [11:50:39] how should I deploy it then? [11:51:07] have you done https://www.mediawiki.org/wiki/Continuous_integration/Jenkins_job_builder#Configure_JJB ? [11:51:34] 👀 [11:55:44] 10Phabricator Antivandalism Extension: AVA init-script.php code checks for non-existent libphutil - https://phabricator.wikimedia.org/T340633 (10Aklapper) p:05Triage→03Low [11:57:50] that worked \o/ nice [11:57:51] thanks! 
[11:58:23] yay [12:05:53] maintenance-disconnect-full-disks build 504002 integration-agent-docker-1033 (/: 29%, /srv: 8%, /var/lib/docker: 100%): OFFLINE due to disk space [12:06:29] 10GitLab (Auth & Access), 10Release-Engineering-Team: Requesting access to GitLab for AndrewTavis_WMDE - https://phabricator.wikimedia.org/T338785 (10AndrewTavis_WMDE) @thcipriani, quick question related to GitLab that I wasn't able to see in [[ https://wikitech.wikimedia.org/wiki/GitLab#SSH_fingerprints | the... [12:21:06] maintenance-disconnect-full-disks build 504005 integration-agent-docker-1033 (/: 29%, /srv: 8%, /var/lib/docker: 95%): still OFFLINE due to disk space [12:35:37] maintenance-disconnect-full-disks build 504008 integration-agent-docker-1028 (/: 29%, /srv: 14%, /var/lib/docker: 98%): OFFLINE due to disk space [12:41:43] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Machine-Learning-Team: Python torch fills disk of CI Jenkins instances - https://phabricator.wikimedia.org/T338317 (10isarantopoulos) There isn't any cache in the image since only the specific files are copied in the production variant. B... 
[12:45:52] maintenance-disconnect-full-disks build 504010 integration-agent-docker-1028 (/: 29%, /srv: 14%, /var/lib/docker: 98%): still OFFLINE due to disk space [12:45:52] maintenance-disconnect-full-disks build 504010 integration-agent-docker-1033 (/: 29%, /srv: 8%, /var/lib/docker: 95%): still OFFLINE due to disk space [13:10:32] maintenance-disconnect-full-disks build 504015 integration-agent-docker-1028 (/: 29%, /srv: 14%, /var/lib/docker: 98%): still OFFLINE due to disk space [13:10:32] maintenance-disconnect-full-disks build 504015 integration-agent-docker-1033 (/: 29%, /srv: 8%, /var/lib/docker: 95%): still OFFLINE due to disk space [13:12:47] 10Beta-Cluster-Infrastructure, 10serviceops, 10wikidiff2, 10Better-Diffs-2023, 10Community-Tech (CommTech-Kanban): Install wikidiff2 1.14.0 deb on deployment-prep & test - https://phabricator.wikimedia.org/T340542 (10dom_walden) I tested API:Compare on most of the beta wikis[1], just checking that it cou... [13:16:51] 10GitLab (Pipeline Services Migration🐤), 10Release-Engineering-Team (They Live 🕶️🧟), 10serviceops-collab: Provide mechanism to publish to doc.wikimedia.org from GitLab CI - https://phabricator.wikimedia.org/T336168 (10jnuche) 05Open→03Resolved Doc publishing has now been moved to the releases Jenkins ins... [13:17:23] 10GitLab (Project Migration), 10Release-Engineering-Team (Priority Backlog 📥), 10API Platform, 10Anti-Harassment, and 19 others: Migrate PipelineLib repos to GitLab - https://phabricator.wikimedia.org/T332953 (10jnuche) [13:35:39] (03CR) 10Paladox: [C: 03+1] "@hashar works for me. Tested locally. I'm happy with this." 
[software/gerrit] (deploy/wmf/stable-3.5) - 10https://gerrit.wikimedia.org/r/932641 (https://phabricator.wikimedia.org/T340372) (owner: 10Paladox) [13:36:19] maintenance-disconnect-full-disks build 504020 integration-agent-docker-1028 (/: 29%, /srv: 14%, /var/lib/docker: 98%): still OFFLINE due to disk space [13:36:19] maintenance-disconnect-full-disks build 504020 integration-agent-docker-1033 (/: 29%, /srv: 8%, /var/lib/docker: 95%): still OFFLINE due to disk space [13:38:42] 10Phabricator Antivandalism Extension, 10Release-Engineering-Team (They Live 🕶️🧟): AVA init-script.php code checks for non-existent libphutil - https://phabricator.wikimedia.org/T340633 (10Aklapper) a:03Aklapper Indeed running `php ./libext/ava/scripts/score.php` says triggers [that error](https://gerrit.wik... [13:40:58] (03CR) 10Hashar: [C: 03+2] "Cool!" [software/gerrit] (deploy/wmf/stable-3.5) - 10https://gerrit.wikimedia.org/r/932641 (https://phabricator.wikimedia.org/T340372) (owner: 10Paladox) [13:41:40] (03Merged) 10jenkins-bot: Fix wm-custom-links to show links in footer again [software/gerrit] (deploy/wmf/stable-3.5) - 10https://gerrit.wikimedia.org/r/932641 (https://phabricator.wikimedia.org/T340372) (owner: 10Paladox) [13:44:39] 10Phabricator Antivandalism Extension, 10Release-Engineering-Team (They Live 🕶️🧟), 10Patch-For-Review: AVA init-script.php code checks for non-existent libphutil - https://phabricator.wikimedia.org/T340633 (10SLyngshede-WMF) @Aklapper did you run the php ./libext/ava/scripts/score.php round 15:30, so 8 minu... 
[14:00:47] maintenance-disconnect-full-disks build 504025 integration-agent-docker-1028 (/: 29%, /srv: 14%, /var/lib/docker: 98%): still OFFLINE due to disk space [14:00:47] maintenance-disconnect-full-disks build 504025 integration-agent-docker-1033 (/: 29%, /srv: 8%, /var/lib/docker: 95%): still OFFLINE due to disk space [14:01:22] 10Phabricator Antivandalism Extension, 10Release-Engineering-Team (They Live 🕶️🧟), 10Patch-For-Review: AVA init-script.php code checks for non-existent libphutil - https://phabricator.wikimedia.org/T340633 (10Aklapper) @SLyngshede-WMF I did not run this on any production server (and I think I cannot anyway d... [14:03:01] 10Gerrit, 10Patch-For-Review, 10Regression: Custom Gerrit footer not displayed - https://phabricator.wikimedia.org/T340372 (10Paladox) 05Open→03Resolved a:03Paladox [14:03:38] 10Phabricator Antivandalism Extension, 10Release-Engineering-Team (They Live 🕶️🧟), 10Patch-For-Review: AVA init-script.php code checks for non-existent libphutil - https://phabricator.wikimedia.org/T340633 (10SLyngshede-WMF) @Aklapper Thank you for looking, I didn't really see anything suspicious in the logs... [14:03:45] 10Gerrit, 10serviceops-collab: Gerrit LFS objects lack an automatic sync to gerrit replicas - https://phabricator.wikimedia.org/T257741 (10LSobanski) 05Open→03Resolved [14:12:18] 10Gerrit, 10Regression: Custom Gerrit footer not displayed - https://phabricator.wikimedia.org/T340372 (10Aklapper) Thank you! 
[14:16:01] maintenance-disconnect-full-disks build 504028 integration-agent-docker-1024 (/: 27%, /srv: 16%, /var/lib/docker: 97%): OFFLINE due to disk space [14:25:43] maintenance-disconnect-full-disks build 504030 integration-agent-docker-1024 (/: 27%, /srv: 16%, /var/lib/docker: 98%): still OFFLINE due to disk space [14:25:43] maintenance-disconnect-full-disks build 504030 integration-agent-docker-1028 (/: 29%, /srv: 14%, /var/lib/docker: 98%): still OFFLINE due to disk space [14:25:43] maintenance-disconnect-full-disks build 504030 integration-agent-docker-1033 (/: 29%, /srv: 8%, /var/lib/docker: 95%): still OFFLINE due to disk space [14:30:52] maintenance-disconnect-full-disks build 504031 integration-agent-docker-1024 (/: 27%, /srv: 8%, /var/lib/docker: 98%): RECOVERY disk space OK [14:34:20] 10Release-Engineering-Team, 10Scap, 10Patch-For-Review: "Not running from a virtual environment." warning in generated scap docs - https://phabricator.wikimedia.org/T337493 (10CodeReviewBot) jnuche opened https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/157 startup: stop checking for a Python... [14:34:35] 10Release-Engineering-Team, 10Scap, 10Patch-For-Review: "Not running from a virtual environment." warning in generated scap docs - https://phabricator.wikimedia.org/T337493 (10CodeReviewBot) [14:35:40] 10Release-Engineering-Team, 10Scap, 10Patch-For-Review: "Not running from a virtual environment." warning in generated scap docs - https://phabricator.wikimedia.org/T337493 (10CodeReviewBot) jnuche opened https://gitlab.wikimedia.org/repos/releng/train-dev/-/merge_requests/23 remove signal file to skip venv... [14:39:18] 10Phabricator, 10Release-Engineering-Team, 10VPS-project-Phabricator, 10User-brennen: After a deployment, Phabricator errors out with `Unable to load the "Arcanist" library. Put "arcanist/" next to "phabricator/" on disk.` - https://phabricator.wikimedia.org/T314460 (10Aklapper) https://gerrit.wikimedia.or... 
[14:45:58] maintenance-disconnect-full-disks build 504034 integration-agent-docker-1024 (/: 27%, /srv: 18%, /var/lib/docker: 100%): OFFLINE due to disk space [14:50:56] maintenance-disconnect-full-disks build 504035 integration-agent-docker-1024 (/: 27%, /srv: 8%, /var/lib/docker: 99%): RECOVERY disk space OK [14:50:56] maintenance-disconnect-full-disks build 504035 integration-agent-docker-1028 (/: 29%, /srv: 14%, /var/lib/docker: 98%): still OFFLINE due to disk space [14:50:56] maintenance-disconnect-full-disks build 504035 integration-agent-docker-1033 (/: 29%, /srv: 8%, /var/lib/docker: 95%): still OFFLINE due to disk space [14:53:39] 10Gerrit, 10Regression: Custom Gerrit footer not displayed - https://phabricator.wikimedia.org/T340372 (10hashar) I am not sure why it broke, it either comes from: * the split of the old javascript file to multiple files (`plugins/wm-custom-links`). Although I am pretty sure I tested it, I might forgot to ensu... [14:58:40] 10Release-Engineering-Team, 10Scap, 10Patch-For-Review: "Not running from a virtual environment." warning in generated scap docs - https://phabricator.wikimedia.org/T337493 (10CodeReviewBot) dancy merged https://gitlab.wikimedia.org/repos/releng/train-dev/-/merge_requests/23 remove signal file to skip venv... [15:02:20] 10Release-Engineering-Team, 10Scap, 10Patch-For-Review: "Not running from a virtual environment." warning in generated scap docs - https://phabricator.wikimedia.org/T337493 (10CodeReviewBot) jnuche merged https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/157 startup: stop checking for a Python... [15:03:26] 10GitLab (CI & Job Runners), 10Release-Engineering-Team: GitLab CI: "ENOSPC: no space left on device, mkdir" - https://phabricator.wikimedia.org/T340586 (10Jdforrester-WMF) Do we need to add the auto-cleaner script to the GitLab runners like we do for the Jenkins agents? 
[15:11:19] maintenance-disconnect-full-disks build 504039 integration-agent-docker-1025 (/: 29%, /srv: 14%, /var/lib/docker: 98%): OFFLINE due to disk space [15:16:13] maintenance-disconnect-full-disks build 504040 integration-agent-docker-1024 (/: 27%, /srv: 17%, /var/lib/docker: 98%): OFFLINE due to disk space [15:16:13] maintenance-disconnect-full-disks build 504040 integration-agent-docker-1025 (/: 29%, /srv: 14%, /var/lib/docker: 95%): still OFFLINE due to disk space [15:16:13] maintenance-disconnect-full-disks build 504040 integration-agent-docker-1028 (/: 29%, /srv: 14%, /var/lib/docker: 98%): still OFFLINE due to disk space [15:16:13] maintenance-disconnect-full-disks build 504040 integration-agent-docker-1033 (/: 29%, /srv: 8%, /var/lib/docker: 95%): still OFFLINE due to disk space [15:21:16] maintenance-disconnect-full-disks build 504041 integration-agent-docker-1024 (/: 27%, /srv: 8%, /var/lib/docker: 98%): RECOVERY disk space OK [15:21:16] maintenance-disconnect-full-disks build 504041 integration-agent-docker-1029 (/: 29%, /srv: 16%, /var/lib/docker: 99%): OFFLINE due to disk space [15:21:17] maintenance-disconnect-full-disks build 504041 integration-agent-docker-1038 (/: 29%, /srv: 20%, /var/lib/docker: 99%): OFFLINE due to disk space [15:25:59] maintenance-disconnect-full-disks build 504042 integration-agent-docker-1024 (/: 27%, /srv: 12%, /var/lib/docker: 98%): OFFLINE due to disk space [15:25:59] maintenance-disconnect-full-disks build 504042 integration-agent-docker-1029 (/: 29%, /srv: 17%, /var/lib/docker: 84%): RECOVERY disk space OK [15:27:40] 10Release-Engineering-Team, 10Scap: "Not running from a virtual environment." 
warning in generated scap docs - https://phabricator.wikimedia.org/T337493 (10jnuche) 05Open→03Resolved a:03jnuche [15:31:08] maintenance-disconnect-full-disks build 504043 integration-agent-docker-1024 (/: 27%, /srv: 8%, /var/lib/docker: 98%): RECOVERY disk space OK [15:35:14] I'm getting pretty badly affected by out-of-space errors on this CR: https://gerrit.wikimedia.org/r/c/analytics/datahub/+/933890 Is there anything I can do to help? [15:36:45] maintenance-disconnect-full-disks build 504044 integration-agent-docker-1038 (/: 29%, /srv: 11%, /var/lib/docker: 98%): RECOVERY disk space OK [15:40:55] maintenance-disconnect-full-disks build 504045 integration-agent-docker-1025 (/: 29%, /srv: 14%, /var/lib/docker: 95%): still OFFLINE due to disk space [15:40:55] maintenance-disconnect-full-disks build 504045 integration-agent-docker-1028 (/: 29%, /srv: 14%, /var/lib/docker: 98%): still OFFLINE due to disk space [15:40:55] maintenance-disconnect-full-disks build 504045 integration-agent-docker-1033 (/: 29%, /srv: 8%, /var/lib/docker: 95%): still OFFLINE due to disk space [15:46:08] maintenance-disconnect-full-disks build 504046 integration-agent-docker-1024 (/: 27%, /srv: 17%, /var/lib/docker: 100%): OFFLINE due to disk space [15:46:08] maintenance-disconnect-full-disks build 504046 integration-agent-docker-1037 (/: 29%, /srv: 30%, /var/lib/docker: 96%): OFFLINE due to disk space [15:46:08] maintenance-disconnect-full-disks build 504046 integration-agent-docker-1038 (/: 29%, /srv: 18%, /var/lib/docker: 98%): OFFLINE due to disk space [15:50:26] 10Phabricator, 10Release-Engineering-Team, 10VPS-project-Phabricator, 10User-brennen: After a deployment, Phabricator errors out with `Unable to load the "Arcanist" library. Put "arcanist/" next to "phabricator/" on disk.` - https://phabricator.wikimedia.org/T314460 (10brennen) I had that thought, but thin... 
[15:51:02] maintenance-disconnect-full-disks build 504047 integration-agent-docker-1024 (/: 27%, /srv: 23%, /var/lib/docker: 100%): RECOVERY disk space OK [15:51:02] maintenance-disconnect-full-disks build 504047 integration-agent-docker-1037 (/: 29%, /srv: 13%, /var/lib/docker: 95%): RECOVERY disk space OK [15:56:01] maintenance-disconnect-full-disks build 504048 integration-agent-docker-1024 (/: 27%, /srv: 8%, /var/lib/docker: 100%): OFFLINE due to disk space [16:01:17] maintenance-disconnect-full-disks build 504049 integration-agent-docker-1023 (/: 29%, /srv: 37%, /var/lib/docker: 98%): OFFLINE due to disk space [16:01:17] maintenance-disconnect-full-disks build 504049 integration-agent-docker-1024 (/: 27%, /srv: 13%, /var/lib/docker: 94%): RECOVERY disk space OK [16:01:18] maintenance-disconnect-full-disks build 504049 integration-agent-docker-1038 (/: 29%, /srv: 18%, /var/lib/docker: 98%): RECOVERY disk space OK [16:06:09] maintenance-disconnect-full-disks build 504050 integration-agent-docker-1023 (/: 29%, /srv: 18%, /var/lib/docker: 96%): RECOVERY disk space OK [16:06:09] maintenance-disconnect-full-disks build 504050 integration-agent-docker-1025 (/: 29%, /srv: 14%, /var/lib/docker: 95%): still OFFLINE due to disk space [16:06:09] maintenance-disconnect-full-disks build 504050 integration-agent-docker-1028 (/: 29%, /srv: 14%, /var/lib/docker: 98%): still OFFLINE due to disk space [16:06:10] maintenance-disconnect-full-disks build 504050 integration-agent-docker-1033 (/: 29%, /srv: 8%, /var/lib/docker: 95%): still OFFLINE due to disk space [16:17:00] maintenance-disconnect-full-disks build 504052 integration-agent-docker-1038 (/: 29%, /srv: 23%, /var/lib/docker: 100%): OFFLINE due to disk space [16:21:14] maintenance-disconnect-full-disks build 504053 integration-agent-docker-1038 (/: 29%, /srv: 11%, /var/lib/docker: 98%): RECOVERY disk space OK [16:25:55] maintenance-disconnect-full-disks build 504054 integration-agent-docker-1024 (/: 27%, /srv: 19%, 
/var/lib/docker: 100%): OFFLINE due to disk space [16:31:18] maintenance-disconnect-full-disks build 504055 integration-agent-docker-1024 (/: 27%, /srv: 8%, /var/lib/docker: 96%): RECOVERY disk space OK [16:31:18] maintenance-disconnect-full-disks build 504055 integration-agent-docker-1025 (/: 29%, /srv: 14%, /var/lib/docker: 95%): still OFFLINE due to disk space [16:31:18] maintenance-disconnect-full-disks build 504055 integration-agent-docker-1028 (/: 29%, /srv: 14%, /var/lib/docker: 98%): still OFFLINE due to disk space [16:31:18] maintenance-disconnect-full-disks build 504055 integration-agent-docker-1033 (/: 29%, /srv: 8%, /var/lib/docker: 95%): still OFFLINE due to disk space [16:36:00] maintenance-disconnect-full-disks build 504056 integration-agent-docker-1024 (/: 27%, /srv: 9%, /var/lib/docker: 96%): OFFLINE due to disk space [16:36:01] maintenance-disconnect-full-disks build 504056 integration-agent-docker-1038 (/: 29%, /srv: 16%, /var/lib/docker: 98%): OFFLINE due to disk space [16:39:03] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Cloud-VPS (Quota-requests): Rebuild WMCS integration instances to larger flavor - https://phabricator.wikimedia.org/T340070 (10hashar) I filed it as a place holder to request the quota while I was investigating the root cause of the insta... 
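The maintenance-disconnect-full-disks notices above follow a regular shape, so the set of agents currently offline can be derived from them. A minimal sketch, assuming one bot message per input line (the log here is flattened) and that the three status strings seen in this log are the only ones; `parse_report` and `offline_agents` are hypothetical names, not part of any Wikimedia tooling.

```python
import re
from typing import Dict, Iterable, List, NamedTuple, Optional


class DiskReport(NamedTuple):
    build: int
    agent: str
    usage: Dict[str, int]  # mount point -> percent used
    offline: bool


# Matches e.g. "... build 504055 integration-agent-docker-1024
# (/: 27%, /srv: 8%, /var/lib/docker: 96%): RECOVERY disk space OK"
LINE_RE = re.compile(
    r"maintenance-disconnect-full-disks build (?P<build>\d+) "
    r"(?P<agent>\S+) \((?P<mounts>[^)]*)\): (?P<status>.+)$"
)


def parse_report(line: str) -> Optional[DiskReport]:
    """Parse one bot notice; return None for any other kind of line."""
    match = LINE_RE.search(line.strip())
    if match is None:
        return None
    usage = {}
    for part in match["mounts"].split(", "):
        mount, _, percent = part.rpartition(": ")
        usage[mount] = int(percent.rstrip("%"))
    # "OFFLINE ..." and "still OFFLINE ..." both mean the agent is out.
    offline = "OFFLINE" in match["status"]
    return DiskReport(int(match["build"]), match["agent"], usage, offline)


def offline_agents(lines: Iterable[str]) -> List[str]:
    """Replay notices in order and return agents whose last state is offline."""
    state: Dict[str, bool] = {}
    for line in lines:
        report = parse_report(line)
        if report is not None:
            state[report.agent] = report.offline
    return sorted(agent for agent, is_off in state.items() if is_off)
```

Only the most recent notice per agent counts, which matches how agents such as integration-agent-docker-1024 flip between OFFLINE and RECOVERY several times in this log.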
[16:40:55] !log integration: sudo cumin --force 'name:docker' 'docker buildx prune --force' [16:40:56] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:41:00] maintenance-disconnect-full-disks build 504057 integration-agent-docker-1038 (/: 29%, /srv: 11%, /var/lib/docker: 64%): RECOVERY disk space OK [16:42:23] hashar: many thanks [16:42:54] they are going to fill up again though :-( [16:43:49] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Machine-Learning-Team: Python torch fills disk of CI Jenkins instances - https://phabricator.wikimedia.org/T338317 (10hashar) 05Resolved→03Open [16:43:51] OH I KNOW [16:44:02] I can move that specific job to a standalone instance :) [16:45:36] maintenance-disconnect-full-disks build 504058 integration-agent-docker-1024 (/: 27%, /srv: 16%, /var/lib/docker: 9%): RECOVERY disk space OK [16:45:36] maintenance-disconnect-full-disks build 504058 integration-agent-docker-1025 (/: 29%, /srv: 14%, /var/lib/docker: 3%): RECOVERY disk space OK [16:45:36] maintenance-disconnect-full-disks build 504058 integration-agent-docker-1028 (/: 29%, /srv: 14%, /var/lib/docker: 3%): RECOVERY disk space OK [16:45:36] maintenance-disconnect-full-disks build 504058 integration-agent-docker-1033 (/: 29%, /srv: 8%, /var/lib/docker: 3%): RECOVERY disk space OK [16:47:57] yeah well no, cause that is in pipelinelib :/ [16:57:47] 10Release-Engineering-Team (Priority Backlog 📥), 10Patch-For-Review, 10Release, 10Train Deployments, 10User-brennen: 1.41.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T340243 (10brennen) Pre-emptively noting that I have a hard stop today at 16:45 MDT / 22:45 UTC, and RelEng is a bit s... [17:11:44] 10GitLab (CI & Job Runners), 10Release-Engineering-Team: GitLab CI: "ENOSPC: no space left on device, mkdir" - https://phabricator.wikimedia.org/T340586 (10thcipriani) >>! 
In T340586#8973755, @Jdforrester-WMF wrote: > Do we need to add the auto-cleaner script to the GitLab runners like we do for the Jenkins ag... [17:24:48] 10GitLab (CI & Job Runners), 10Release-Engineering-Team: GitLab CI: "ENOSPC: no space left on device, mkdir" - https://phabricator.wikimedia.org/T340586 (10hashar) `/var/lib/docker/overlay2` also holds BuildKit build cache which is not exposed by the `docker` command (afaik). So one has to use `sudo docker bui... [18:26:41] brennen: jnuche: btw, regarding "1 second phabricator outage" that only monitoring noticed. [18:26:44] Jun 28 12:33:56 phab1004 envoyproxy-hot-restarter[870]: [2023-06-28 12:33:56.745][2990][info][main] [source/server/hot_restarting_parent.cc:71] shutting down due to child request [18:27:17] envoyproxy was restarted, and it terminates TLS for apache in front of phab [18:50:35] 10Release-Engineering-Team (Priority Backlog 📥), 10Patch-For-Review, 10Release, 10Train Deployments, 10User-brennen: 1.41.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T340243 (10brennen) [18:51:07] mutante: ah, gotcha. [18:54:06] so because our monitoring is so good now, it caught it, or we were lucky/unlucky in timing. could happen with almost any service that is a webserver nowadays, doc, releases, etc [18:55:23] and there are restart scripts for it when for example infra-security deployes newer deb packages that have dependencies [18:56:11] so yea, it's kind of real but we also want to ignore it, let's just see if it happens again within .. 6 ..months [18:56:39] (because monitoring would also create tickets now) [19:06:41] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Cloud-VPS (Quota-requests): Rebuild WMCS integration instances to larger flavor - https://phabricator.wikimedia.org/T340070 (10rook) Oh I see, sorry I did not understand that you were seeking a new flavor. My newfound clarity brings a new... 
[19:10:53] 10Release-Engineering-Team (Priority Backlog 📥), 10Patch-For-Review, 10Release, 10Train Deployments, 10User-brennen: 1.41.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T340243 (10brennen) T340679 and T340682 are at enough volume that I'd like to see them fixed before going to all wikis. [19:12:31] hey @brennen you here? [19:14:07] https://phabricator.wikimedia.org/T340682 is very strange - I'm not 100% sure what's going on there. it's either something that will fix itself in 5 mins or something I've missed (looking into it) [19:15:23] https://phabricator.wikimedia.org/T340679 I can provide a patch for, but I [19:15:36] Jdlrobson: o/ [19:16:08] it's been... lemme see... [19:16:32] not quite an hour since the deploy [19:16:35] noticed on contint2001: Error: '/usr/bin/git remote set-url origin https://gitlab.wikimedia.org/releng/dev-images.git' returned 128 instead of one of [0] [19:16:48] something is up with releng/dev-images cloning [19:17:42] that's not even a clone, it's about trying to update the configured url [19:17:47] how about we just use rm -rf on repos and run puppet ones.. and dont try to switch them from gerrit to gitlab.. seems a lot easier [19:18:44] it's certainly about the puppet git::clone class [19:18:57] i wonder if it just hit some permission issue. [19:21:13] brennen: gadget ones should be easy to fix. [19:21:56] cool [19:22:00] the wikisource one.. if we can't find a reviewer, I can revert the debug log patch if necessary. Note: since there are no wikisource projects in group 2 this won't correspond with a spike when we roll out further [19:22:51] if it's just the one deprecation notice we can probably live with the current rate of it 'til next week [19:23:29] not *ideal* to have ~3k an hour... [19:24:04] but it could be filtered out of the dashboard at any rate, and mostly ignorable in logspam-watch if there's just one. [19:32:41] agreed. 
[19:32:51] I'm leaning towards a revert for the current train
[19:33:57] unless there's a way to suppress certain warnings? Which I doubt?
[19:35:44] brennen: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/933642
[19:36:25] brennen: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/933642 is a blocker for next train but it would suppress the warnings for now)
[19:39:13] Release-Engineering-Team, Patch-For-Review, Puppet: Puppet git::clone probably does not need `umask` parameter - https://phabricator.wikimedia.org/T338277 (Dzahn) deployed change to /etc/zuul/wikimedia on contint2002, contint1002, finally contint2001. zuul class does not use umask parameter anymore f...
[19:40:21] Jdlrobson: i think reverting for now would probably make for less work this week, at least.
[19:40:52] since i assume there'll be more with a new set of wikis?
[19:42:23] defer to your judgment but that's where i lean as well.
[19:42:47] brennen: I don't think it will increase (i'll work through the gadgets today). it's more about buying me more time :-)
[19:43:00] so yeh let's revert and i'll spend rest of week preparing next week's train
[19:43:16] Jdlrobson: ok, sounds good. i can go ahead and deploy that.
[19:43:17] So yeh backporting https://gerrit.wikimedia.org/r/c/mediawiki/core/+/933642 sounds good.
[19:56:07] brennen: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikisource/+/933643 is ready too!
[20:03:38] Jdlrobson: ah, k. i can go ahead with that one next, although is the backport necessary with the revert?
[20:14:35] (logspam has definitely shut off.)
[20:37:05] I guess not :)
[20:40:40] kk
[21:02:11] Beta-Cluster-Infrastructure, CampaignEvents, Campaign-Registration, Campaign-Tools (Campaign-Tools-Current-Sprint): Create new participant questions columns in beta DB - https://phabricator.wikimedia.org/T340694 (Daimona)
[21:05:26] Beta-Cluster-Infrastructure, CampaignEvents, Campaign-Registration, Campaign-Tools (Campaign-Tools-Current-Sprint): Create new participant questions columns in beta DB - https://phabricator.wikimedia.org/T340694 (Daimona)
[21:09:43] Beta-Cluster-Infrastructure, CampaignEvents, Campaign-Registration, Campaign-Tools (Campaign-Tools-Current-Sprint): Create new participant questions columns in beta DB - https://phabricator.wikimedia.org/T340694 (Daimona)
[21:15:23] Beta-Cluster-Infrastructure, CampaignEvents, Campaign-Registration, Campaign-Tools (Campaign-Tools-Current-Sprint): Create new participant questions columns in beta DB - https://phabricator.wikimedia.org/T340694 (Daimona)