[08:00:26] Cteam: welcome to today 🦄! Don't forget to post your update in thread.
[08:00:26] Feel free to include:
[08:00:26] 1. 🕫 Anything you'd like to share about your work
[08:00:26] 2. ☏ Anything you'd like to get help with
[08:00:26] 3. ⚠ Anything you're currently blocked on
[08:00:26] (this message is from a toolforge job under the admin project)
[09:06:25] Done:
[09:06:26] * Made several improvements to our fork of the locales-buildpack
[09:06:26] * Got a document we can start brainstorming on for the toolforge high-level view/next steps: https://docs.google.com/document/d/1sqo6YGRn9u-S7V0y9m07cYKA84vQlKa-7_F8p7eg80Y/edit
[09:06:26] Doing:
[09:06:26] * Continue gathering data on ceph disk failures for Dell (T348643) <- will try to focus on this
[09:06:26] * Continue working on removing the expired certs from puppet if I have time (T354295)
[09:06:26] Blockers:
[09:06:27] T348643: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643
[09:06:27] T354295: [puppet] Remove expired and unused certs from modules/profile/files/ssl/ and modules/base/files/ca - https://phabricator.wikimedia.org/T354295
[09:06:27] * Nothing
[13:07:03] done:
[13:07:03] * last cleanup of the old wiki replica proxies
[13:07:03] * fixed instance live migrations and thus hypervisor draining (https://phabricator.wikimedia.org/T355067)
[13:07:03] * discovered a timezone bug in spicerack (https://phabricator.wikimedia.org/T347490#9461831)
[13:07:04] doing:
[13:07:04] * clinic duty things (including T355138 T344108 T355061)
[13:07:04] blockers:
[13:07:04] T355138: Rescue DBapp trove instance in glamwikidashboard project - https://phabricator.wikimedia.org/T355138
[13:07:05] * cloudrabbit1003/dc-ops
[13:07:05] T344108: Add global_edit_count to wikireplicas - https://phabricator.wikimedia.org/T344108
[13:07:05] T355061: MaxConnTrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1060:9100 - https://phabricator.wikimedia.org/T355061
[13:45:00] Done:
[13:45:00] * nothing
[13:45:00] * (well, scheduled the CKA exam and gathered some resources to prepare for it)
[13:45:00] Doing:
[13:45:00] * tinkering with lima-kilo on lima & vagrant
[13:45:01] ** upgrading vagrant to bookworm
[13:45:01] ** looking into making the same install script work for both lima & vagrant
[13:45:02] * trying to figure out why build pipelines are so slow on my lima-vm, with the logs timing out every time
[13:45:02] ** it seems that [step-copy-builder-to-tmp] is the culprit, taking ~1.5 minutes to run
[14:48:39] blancadesal: about the slowness, on my machine the first run seems to pull docker-registry.tools.wmflabs.org/toolforge-tektoncd-pipeline-cmd-git-init:v0.33., heroku-builder:22 and docker-registry.tools.wmflabs.org/toolforge-library-bash:5.1.4 at the same time (those are the containers in PodInitializing status), and they all get unblocked at the same time too, so I'm suspecting some type of contention on the docker side pulling images. I would have expected the git-init and bash ones to be downloaded way faster than the builder one
[14:53:28] hmm, second run shouldn't have these issues though?
[14:53:44] yep, and it does not for me xd
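
A quick way to test that contention theory is to read the image-pull events the kubelet records for the stuck pod, since each pull gets a "Pulling" / "Successfully pulled ... in <duration>" event pair. A minimal sketch, assuming kubectl access to the lima-kilo kind cluster; the pod, namespace, and image names in angle brackets are placeholders, not the real Toolforge names:

    # Spot the build pod stuck in PodInitializing (namespace varies per setup)
    kubectl get pods -A | grep -i init

    # The Events section lists one pull-duration line per image, which shows
    # exactly where the ~1.5 minutes go
    kubectl describe pod <build-pod> -n <namespace>
    kubectl get events -n <namespace> --sort-by=.lastTimestamp

    # If the big builder image dominates, pre-loading it into the kind node
    # lets even a fresh first run skip the slow pull
    kind load docker-image <builder-image>

If the three pulls really do finish at the same moment despite very different image sizes, that points at serialized or bandwidth-shared pulling on the node rather than at the registry.
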
[14:58:31] in other "news", I thought upgrading vagrant-lima-kilo to bookworm would be trivial, but it turned out it's not xd
[14:58:50] oh, what was the issue? (/me curious)
[15:02:27] when simply swapping to the bookworm box with no other changes, vagrant up seemingly runs ok, including ansible, but then I can't ssh into it: vagrant@192.168.121.65: Permission denied (publickey).
[15:02:58] weird
[15:03:49] I created a 'vanilla' box with our usual config but without provisioning it, then ran the install script from inside. that went fine and ssh didn't get messed up
[15:04:14] now running ansible from inside, the kind cluster creation times out
[15:15:29] hmm, it's after running the playbook that ssh stops working
[15:16:37] interesting, is sshd running?
[15:16:49] (afair the playbook does not really touch it though)
[15:20:10] on the vagrant box you mean?
[15:22:57] yep
[15:23:23] don't know, lost access xd
[15:23:25] just trying to figure out if it's on the network side, or the service not running, or auth
[15:24:02] there's no way to get a console? (libvirt would be `virsh console 1` or whichever is the id for the vm/aka domain in libvirt lingo)
[15:24:10] wrong channel?
[15:24:28] yeah, we're spamming. sorry :)
[15:25:12] 👍 let's move to -admin
[15:42:29] Done:
[15:42:31] * set up the new lima-kilo vm on my M1 macbook
[15:42:33] * some code reviews
[15:42:35] Doing:
[15:42:37] * getting my brain back into toolsdb things (T344717)
[15:42:38] T344717: [toolsdb] test creating a new replica host - https://phabricator.wikimedia.org/T344717
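
Picking up the console suggestion from the thread above (the debugging itself moved to -admin): on a libvirt-backed vagrant box the triage would look roughly like this. A sketch only; these are generic sshd checks rather than anything lima-kilo-specific, and the domain name comes from whatever `virsh list` reports:

    # Attach to the serial console libvirt exposes for the VM
    virsh list --all
    virsh console <domain>

    # From the console, separate "service down" from "auth broken"
    systemctl status ssh                  # Debian's sshd unit is named "ssh"
    ss -tlnp | grep ':22'                 # still listening?
    cat ~vagrant/.ssh/authorized_keys     # did the playbook touch the vagrant key?

    # From the host, a verbose client run shows which key gets offered and
    # at which step the server rejects it
    ssh -vvv vagrant@192.168.121.65

An authorized_keys file clobbered by provisioning is one plausible cause of the Permission denied (publickey) seen here, since vagrant replaces the box's insecure key with a generated one on first boot.
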