[08:31:53] * arturo online
[09:00:29] I see maintain-kubeusers is not running in toolsbeta, apparently ImagePullBackOff
[09:00:36] however, I see the image in toolsbeta-harbor
[09:03:58] T384809
[09:03:59] T384809: toolsbeta: maintain-kubeusers not running because ImagePullBackOff - https://phabricator.wikimedia.org/T384809
[10:18:11] Yeaa it's this weird thing (maybe not so weird) where you can view the images through the web UI but can't pull them (that's why I didn't notice it).
[10:21:27] But yeaa this is not a problem, I was working on the toolsbeta harbor upgrade last Friday. Just re-running prepare and restarting with docker-compose -f fixed it (I'll keep an eye out, but just a heads up that this might happen again as I keep tinkering with the toolsbeta harbor)
[10:25:05] Also welcome back arturo:
[10:33:33] thanks Raymond_Ndibe ! I'm happy to be back
[10:34:08] Raymond_Ndibe: so if you are working on the harbor upgrade, do you have a ticket, so I can link T384809 to it?
[10:34:09] T384809: toolsbeta: maintain-kubeusers not running because ImagePullBackOff - https://phabricator.wikimedia.org/T384809
[10:34:52] I guess T384327
[10:34:52] T384327: [infra,harbor] upgrade harbor v2.10.1 ---> v2.12.2 - https://phabricator.wikimedia.org/T384327
[10:36:38] Yes that is the one. It needed a docker version upgrade. I already submitted two patches on Gerrit, one for this and one for the docker version upgrade, but I haven't tested them enough to flag them for review
[10:37:05] ok, let me know later if you need some help with them
[10:38:10] Yes, will try to complete the testing once I push what I'm currently working on
[12:36:36] toolforge question: I am clearing out NFS space and there is > 1 TB of files in /srv/tools/project/.shared/cache
[12:36:49] do we offer any guarantees about persistence then? Can I just wipe out everything more than a week old there?
[12:36:56] s/then/there/
[12:38:34] Not sure I can answer. Any idea what the cache is used for? Do users actively use it?
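[Editor's note] The cleanup being proposed above ("wipe out everything more than a week old") can be sketched with standard find(1). This is only an illustration under the assumptions from the log (the cache path and a seven-day cutoff); the `prune_cache` helper name is made up, and it is not any actual Toolforge tooling:

```shell
# Hypothetical helper: list (and optionally delete) files older than N days
# under a cache directory. Defaults to a dry run so nothing is removed
# until the candidate list has been eyeballed.
prune_cache() {
    local cache_dir=$1 days=${2:-7} mode=${3:-dry-run}
    if [ "$mode" = delete ]; then
        # Actually delete files whose mtime is strictly older than $days days
        find "$cache_dir" -type f -mtime +"$days" -delete
    else
        # Dry run: only print what would be removed
        find "$cache_dir" -type f -mtime +"$days"
    fi
}

# Example (paths from the discussion above):
# prune_cache /srv/tools/project/.shared/cache 7          # preview
# prune_cache /srv/tools/project/.shared/cache 7 delete   # actually delete
```

`-mtime +7` matches files last modified more than seven full days ago, which matches the "more than a week old" criterion in the question.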
[12:40:02] they definitely actively use it, but I don't know what the context is.
[12:40:36] looks like it's mostly cewbot who uses it, let's see if I can find the admin...
[12:40:47] Checking wikitech. I doubt there are any guarantees
[12:40:51] ok
[12:43:32] first time I heard about `/srv/tools/project/.shared/cache` :-P
[13:00:39] * arturo reading about CubeFS compared to ceph https://dl.acm.org/doi/10.1145/3299869.3314046
[13:08:00] here's a fun mystery: why are we getting alerts for the CA cert on pontoon-puppetdb-01.monitoring.eqiad.wmflabs, a VM that was deleted in May of last year?
[13:46:39] oh, because the monitoring is in cloudinfra, not on the hosts themselves
[13:49:55] periodic reminder to keep https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Skill_matrix up to date
[14:32:12] topranks: I'm going to pool another cephosd node if you want to nervously watch the graphs
[14:32:47] andrewbogott: cool, I'll get the popcorn in the microwave :P
[14:32:54] also wb arturo :)
[14:46:00] topranks: ok, here we go. it should be the same as last time, but I'm going to let the cookbook do its default thing rather than nursing it along bit by bit.
[14:46:17] cool
[14:46:30] ...or maybe not, cookbook is being weird
[14:46:58] lol ok, out of interest which is the host you are pooling?
[14:47:21] 1013
[14:47:30] trying
[14:47:48] ok
[15:02:44] heads up, I'm planning to add v6 addresses to the cloud-private interfaces of a few nodes in codfw1dev
[15:25:55] ok!
[15:26:28] I need to vanish for a while. Cloudcephosd is in an inconsistent state but it should be harmless to leave it be for now. topranks you can stand down :)
[15:29:50] taavi: ack
[15:30:20] topranks: I'm happy to be back! I'll be reaching out about T380728 and other IPv6 network stuff soon
[15:30:21] T380728: openstack: network problems when introducing new networks - https://phabricator.wikimedia.org/T380728
[15:39:29] hmmm seems like someone has removed my direct access to the network hardware?
[15:39:47] like switches?
[15:39:59] yep
[15:40:16] I have no idea :-S if you think that's wrong, maybe open a phab ticket
[16:07:55] taavi: that was me, I think I was just following up on a task when you moved from being a wmf employee
[16:07:56] https://gerrit.wikimedia.org/r/c/operations/homer/public/+/1060379
[16:08:09] I can find out our policy there, it's a simple patch to re-add you anyway
[16:36:09] see you tomorrow
[16:36:21] * arturo offline
[19:07:22] Puppet question, anyone?
[19:08:08] I can clearly see that on the toolsbeta puppet server the following commit is checked out:
[19:08:43] https://www.irccloud.com/pastebin/fzj6auF9
[19:10:06] But running `puppet agent --test` I get the previous commit (which I already reset away on the puppet server) applied
[19:10:17] root@toolsbeta-harbor-1:/srv/ops/harbor# puppet agent --test
[19:10:17] Info: Using environment 'production'
[19:10:17] Info: Retrieving pluginfacts
[19:10:17] Info: Retrieving plugin
[19:10:17] Info: Loading facts
[19:10:18] Info: Caching catalog for toolsbeta-harbor-1.toolsbeta.eqiad1.wikimedia.cloud
[19:10:18] Info: Applying configuration version '(376b7990d6) gitpuppet - [toolforge::harbor] use latest thirdparty/docker'
[19:10:18] Notice: Applied catalog in 6.11 seconds
[19:10:19] root@toolsbeta-harbor-1:/srv/ops/harbor#
[19:10:57] Either something is being cached, or I'm doing something wrong
[19:59:30] Raymond_Ndibe: there's a deployment step on puppetservers, so what you see in the repo and what's deployed might not be the same.
[19:59:49] Typically it's only the local 'master' branch that gets deployed
[19:59:55] Is that enough to explain what you're seeing?
[21:05:42] Re: puppetserver steps after changing the git clone, I think that profile::puppetserver::git tries to set up git hooks to do the needful. I haven't followed things all the way down to figure out if role::puppetserver::cloud_vps_project uses that profile or does some other thing to set up the git clones.
[21:08:41] I think that the hooks work right, but the thing about it only deploying from 'master' regardless of your currently-checked-out branch is maybe invisible
[21:08:53] I tried to have the hook yell about it, but haven't tested recently
[21:11:00] ah, that is interesting. I guess I would have expected the service to use whatever is the current HEAD of the checkout without caring about the branch. I'll try to hang on to that info.
[21:39:38] it's not great, but that set of hooks is somewhat spaghettified and I got bogged down trying to make it act more intuitively
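[Editor's note] The gotcha discussed above, that the deploy step picks up the local 'master' branch regardless of what is currently checked out, can be illustrated with plain git. The `deployed_matches_head` helper and the repo path are assumptions for illustration only, not the actual puppetserver hook logic:

```shell
# Hypothetical check: does the checked-out HEAD match the local master
# branch? Per the discussion above, the puppetserver typically deploys
# from master, so a mismatch means agents will not see your HEAD commit.
deployed_matches_head() {
    local repo=$1
    [ "$(git -C "$repo" rev-parse HEAD)" = "$(git -C "$repo" rev-parse master)" ]
}

# Example usage (the path is made up for the sketch):
# if ! deployed_matches_head /srv/git/puppet; then
#     echo "warning: HEAD differs from master; agents will get master" >&2
# fi
```

This would explain the session above: after a `git reset` on a non-master checkout, `puppet agent --test` still applies whatever commit the local master branch points at.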