[03:03:22] anyone know where I can find the files from which this image was built? docker-registry.tools.wmflabs.org/toolforge-bullseye0-builder:latest not on gitlab (or at least not easy to find)
[08:26:30] Raymond_Ndibe: it's on gerrit, but we are not using that anymore https://gerrit.wikimedia.org/g/cloud/toolforge/buildpacks
[09:44:42] I suspect that connection error thing is either some rate limit on the wikimedia CDN, or some network issue with the new k8s workers.
[09:47:42] let me check which workers they are running on
[10:00:51] taavi: listeria is running on the new nfs workers yes, chie-bot seems to be running crons, so they would spawn in different places I guess, but it might have been the new workers too
[10:08:11] I think chiebot was running also on the new NFS nodes
[10:10:17] this is going to be impossible to debug without a way to reproduce or at least exact times of when it has been happening
[10:12:10] given that it seems to happen to several tools, we should be able to reproduce with some code snippet
[10:24:38] hmm, just restarted harbor to clean up caches, and now I seem to be unable to log in :/
[10:25:59] I'm looking at the container_network_transmit_packets_dropped_total prometheus metric and it's showing a few workers that have dropped some transmitted packets somehow. one of them is worker-nfs-5, another is worker-82 and then ingress-4 and -5
[10:34:26] is tools-harbor seems down? there's an alert and I cannot log in
[10:34:42] s/seems //
[10:34:54] dcaro: ^
[10:35:17] sorry missed the message from dcaro just above :)
[10:36:13] still looking for a review of https://gerrit.wikimedia.org/r/c/operations/puppet/+/993693 btw
[10:38:38] hmm, it's strange that it does not let me log in :/
[10:48:49] interesting, took all harbor down (docker-compose down), re-ran the prepare script, and brought it up and it seemed to do the trick
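The Harbor recovery described above roughly amounts to the sequence below; the install directory is a guess and may differ on tools-harbor, so treat this as a sketch of the steps mentioned rather than the exact commands that were run:

    cd /srv/harbor          # hypothetical install directory, adjust to wherever the Harbor compose files live
    docker-compose down     # stop all Harbor containers
    ./prepare               # regenerate the runtime config from harbor.yml
    docker-compose up -d    # bring Harbor back up in the background

The prepare step is the interesting part: it rebuilds Harbor's generated configuration, which may be what cleared the login problem after the cache-cleanup restart.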
[11:01:06] dhinus: did you have any exim specialists to ask for reviews in mind?
[11:01:39] not really :)
[11:01:58] hmm
[11:02:17] I'm not sure who's worked with exim in the past
[11:02:54] I was hoping we could identify someone either in the team or outside... but if we can't I'm fine with merging :)
[11:04:50] I added Keith and Jesse since they seem to have touched the prod exim config, I'll just merge this evening if we don't get a response
[11:06:05] sgtm!
[12:43:00] quick review https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/21
[12:43:12] just copy-pasted the scripts from the other clis
[12:51:12] dcaro: left a comment
[15:14:17] I'm restarting harbor, some alerts might trigger
[15:42:19] * bd808 yawns and waves
[15:42:52] my brain has no idea what timezone it is in. UTC+12? UTC-7? :shrug:
[16:12:50] \o welcome back!
[16:17:58] 🌴 🏖️
[17:01:23] taavi: for the wmcs-cookbooks repo the Gerrit setting "submit type" was "fast-forward only". I've changed it to "rebase if necessary", which is what we have in operations/puppet.
[17:01:42] oh that should do it. thanks.
[17:03:18] dcaro: is there a reason you didn't open an MR for https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/tree/allow_kind_pull_from_harbor?ref_type=heads? (in my mind this had been merged so I was very, very confused for a bit, but I think we just merged one branch into another but never to main?)
[17:05:45] it's here https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/93
[17:06:26] it was merged
[17:07:10] and it's in main: https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/commit/954b7e72c214d7d7fbd4d48ffa7622ca4f3a973f
[17:07:23] not sure why it did not delete the branch :/
[17:08:06] I think there's an extra commit on that branch, weird
[17:08:15] might have re-pushed to it somehow
[17:09:02] just deleted it
[17:12:26] hmm, I was just trying to test dhinu.s changes and the changes from that branch don't seem to be there
[17:15:44] oh, probably needs a rebase (dhinu.s branch)
[17:17:33] nope, I just checked and my branch is up to date
[17:19:05] ok, might be my local environment then
[17:26:23] hmmm....
[18:05:32] * dcaro off
[18:31:18] andrewbogott: if you're planning to remove many more k8s workers, we will need to provision matching capacity in new nodes
[18:31:59] Yep. I think I'm done for now, I removed six workers.
[18:32:11] do you want to add the new ones? You probably have the command already in your recent bash history ")
[18:32:19] um... not sure what ") means
[18:33:26] ok, these are smaller nodes so I'll add 3 new larger ones
[18:33:36] the command is `cookbook wmcs.toolforge.add_k8s_node --cluster-name tools --role worker_nfs` ftr
[18:37:44] thanks!
[18:53:37] has anyone seen this certificate error before? https://phabricator.wikimedia.org/P55900 that did not go away after trying to remove and re-create that instance
[19:02:07] probably you need 'cert clean' on the puppetmaster
[19:02:12] Not sure how it got into that state though
[19:04:21] * bd808 lunch
[19:17:05] i think this is a race condition with how the image gets accessed. https://gerrit.wikimedia.org/r/c/operations/puppet/+/992677 and an image rebuild should fix it.
[19:33:20] that patch looks right but I don't yet understand how it affects puppet certs
[19:35:33] basically the cookbook thinks that the first puppet run is complete when it is not
[19:36:15] oh, the cookbook checks the cloud-init flag?
[19:36:22] then that makes sense
[19:37:01] yep, since the root key it uses is embedded in the base images now
[23:12:07] Could I get a +1 on https://phabricator.wikimedia.org/T356195
[23:30:05] Rook: I don't think taavi and I have much confidence that just moving from toolforge to a dedicated project will change the error they are having at https://github.com/dpriskorn/WikidataTopicCurator/issues/5. This seems like a wild guess by the developer.
[23:34:43] * bd808 comments on the task
[23:36:36] That's fine. Thanks for commenting on it
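On the race condition discussed around 19:36: a minimal sketch of the kind of readiness check involved, assuming the standard cloud-init completion marker; this is illustrative only and not the actual wmcs-cookbooks code:

    # cloud-init writes this marker once first boot finishes, which can happen
    # before the first full puppet run has completed, hence the race
    timeout 600 bash -c 'until [ -f /var/lib/cloud/instance/boot-finished ]; do sleep 10; done'
    # a stricter check would also wait for evidence of a completed puppet run,
    # e.g. a recent last_run_summary.yaml in puppet's state directory (exact path depends on packaging)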