[06:48:05] good morning
[06:58:53] greetings
[09:51:46] I noticed there were 19 emails "notification about job updatetools" over the weekend
[09:52:34] they usually mean its pod was terminated, which is ok, but I'm concerned by the fact it was terminated 19 times
[09:53:22] is it a symptom of some wider issue?
[09:54:56] stupid question: what exactly is that job doing?
[09:58:17] no idea, this might be the time I actually look into it ;)
[09:58:58] the only thing I know is that I remember seeing those emails in the past when Toolforge was down, and also when ToolsDB was down I think (but I might be wrong)
[10:09:28] dhinus: reviewed your packaging MR
[10:09:49] taavi: thanks, I'll fix the missing things :)
[10:18:28] dcaro: I found T390138 re jobs-api on fastapi, do you think all of that needs to happen at once, or could I just do the flask->fastapi swap for now to unblock the loki stuff and leave the remaining stuff for later?
[10:18:29] T390138: [jobs-api] Generate the openapi definition from the code - https://phabricator.wikimedia.org/T390138
[10:22:35] dhinus: you might also want to look at https://wikitech.wikimedia.org/wiki/Debian_packaging/Package_your_software_as_deb for automatically uploading your packages to apt.wikimedia.org
[10:25:34] taavi: thanks, that was my next step. is that guide up to date?
[10:25:49] looks like it was edited recently, so I assume it is
[10:26:00] no idea :D
[10:27:11] I will soon find out :)
[10:27:20] I left a question in the MR about licenses
[11:28:34] taavi: it does not need to happen at once, no. the synchronization of models to generate the same api spec (or a backwards-compatible one) might also be tricky, so it's probably best to do it in two parts
[12:15:20] I need a review on https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/78, it's blocking any new releases of webservice
[12:16:35] ~60 lines review ;)
[12:45:55] dcaro: LGTM, approved
[12:46:03] thanks!
[13:05:20] got very focused and lost track of time, at least I seem to have something for jobs-on-fastapi that works minus a few tests
[13:05:24] * taavi afk for late lunch
[13:11:59] nice!
[13:39:37] * andrewbogott waves
[13:39:41] welcome godog !
[13:40:12] cheers andrewbogott ! thank you
[13:59:12] taavi: when you're back I'd appreciate your thoughts on https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/2
[13:59:41] (and anyone else who is interested in licensing/relicensing discussions)
[14:00:08] can anyone suggest a graceful way to take paws down while I'm messing with NFS? I'm tempted to just stop all the PAWS k8s nodes so that it displays a consistent 'service unavailable'
[14:00:41] andrewbogott: maybe take down the proxy service in k8s?
[14:01:31] depends on your definition of "take paws down"
[14:01:52] removing the proxy will stop users from accessing it, but the jupyter pods will continue doing their thing
[14:01:53] I don't have a definition in mind, I'd just like it to not behave surprisingly while the NFS volume isn't there.
[14:02:26] Is there reason to think it won't come back up if I just stop everything?
[14:03:26] in theory it will
[14:11:56] now I'm running the e2fsck which will likely take ages
[14:25:45] ok, that should get us one less chronic alert
[14:25:46] for a while
[14:29:24] nice!
[14:30:06] * dhinus paws: now fits 2x the amount of android source trees!
[14:30:13] yeah :(
[15:02:45] dhinus: a quick look at the updatetools job seems to suggest that all it does is populate the table toollabs_p with the list of tools and users from toolforge. did a quick search in codesearch and found no usages of toollabs_p, but it might be used by other things
[15:03:45] in the logs most of the errors are `mysql server has gone away`
[15:07:05] hmm that is consistent with my memories of the tool failing when toolsdb was unavailable
[15:07:14] let's see if there's any suspicious log from toolsdb over the weekend
[15:09:13] dcaro: Raymond_Ndibe: do you happen to know if the client changes this comment is referring to have already happened? https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/blob/main/tjf/api/jobs.py?ref_type=heads#L63
[15:09:37] taavi: not yet
[15:09:45] ok!
[15:09:48] https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/112
[15:11:31] dcaro: I don't see anything in the toolsdb logs that might explain it, the other explanation could be network issues between the worker and toolsdb
[15:11:56] it does update the tools and users tables in one single query, so it might take some time
[15:12:36] (it might also be that it's not able to log the actual errors it was having, and the mysql error there is just a red herring)
[15:55:02] A TIL from Friday: Magnum sets everything up so that PVCs create and mount Cinder volumes, using https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/cinder-csi-plugin/using-cinder-csi-plugin.md
[16:14:17] Very fun to see godog joining the team after all of the attempts that Chase and I made to trick^Wconvince him to join us in the past. :)
[16:14:45] hahah! indeed
[16:52:51] I'm about to log off, but I noticed there's a puppet alert on clouddumps1001 T401130
[16:52:53] T401130: PuppetFailure Puppet has failed on clouddumps1001:9100 - https://phabricator.wikimedia.org/T401130
[16:54:19] that file is not there https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/c3ad9dd1dff81dfec21d3b0ce3fd486d982ce6ad/modules/dumps/files/web/
[16:54:43] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/c02168c2a4196c5d3f45e20684b088c2afd39f48 removed it
[16:54:55] probably btullis ^
[16:55:06] good catch
[16:56:16] I think the `content` should be an empty file there instead, until the `absent` applies everywhere
[16:56:28] (chicken-and-egg issue xd)
[16:58:48] * dcaro off
[16:58:50] cya tomorrow!
[17:00:24] * dhinus off
[23:35:15] @dcaro: about the comment you mentioned, I did not add that. It's probably the side effect of some rebase operation. But I think you have a patch up to completely remove unset fields, yeah?
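
For context on the flask->fastapi swap discussed above (T390138): FastAPI derives the OpenAPI document from route signatures and Pydantic models, which is the property the task is after. A minimal sketch with a made-up endpoint, not the actual jobs-api code:

```python
# Illustrative only: a hypothetical /jobs/{name} endpoint, not the real
# jobs-api code. With FastAPI, /openapi.json is generated from the type
# hints and response_model, so the spec can no longer drift from the code.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Job(BaseModel):
    name: str
    status: str

@app.get("/jobs/{name}", response_model=Job)
def get_job(name: str) -> Job:
    # The equivalent Flask route would need a hand-maintained spec entry.
    return Job(name=name, status="running")
```

Running this under `uvicorn module:app` and fetching /openapi.json shows the generated definition, which is what makes the "keep the api spec backwards compatible" part of the migration the tricky bit.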
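On the "take down the proxy service in k8s" suggestion for the PAWS/NFS maintenance: one way to do it is scaling the proxy deployment to zero, which gives users a consistent error while leaving the rest of the cluster alone. A sketch with the official kubernetes Python client; the deployment and namespace names ("proxy", "prod") are guesses, check the actual PAWS manifests before running anything like this:

```python
# Sketch: scale the PAWS front proxy to zero so users get a clean error
# instead of hitting pods whose NFS volume is missing. Names are assumed.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster
apps = client.AppsV1Api()

def scale_deployment(name: str, namespace: str, replicas: int) -> None:
    """Patch only the replica count, leaving the rest of the spec alone."""
    apps.patch_namespaced_deployment_scale(
        name=name,
        namespace=namespace,
        body={"spec": {"replicas": replicas}},
    )

scale_deployment("proxy", "prod", 0)   # take the proxy down
# ... do the NFS maintenance ...
scale_deployment("proxy", "prod", 1)   # bring it back
```

As noted in the log, this only blocks access; running jupyter pods keep doing their thing unless the nodes are stopped too.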
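On the `mysql server has gone away` errors from updatetools: that error generally means the connection was dropped out from under the client (server restart, network blip, an idle connection hitting wait_timeout, or an oversized packet), which fits both the ToolsDB-outage and the network-issues theories above. A long-running job can defend itself by pinging before each query; a sketch with pymysql, where all connection details are placeholders:

```python
# Sketch of a "gone away"-resilient query loop for a long-running job
# like updatetools. Host, credentials and database are placeholders.
import pymysql

conn = pymysql.connect(
    host="tools-db.example",  # placeholder, not the real ToolsDB host
    user="updatetools",
    password="...",
    database="...",
)

def run_batch(sql: str, params=()):
    # ping(reconnect=True) transparently reopens the connection if the
    # server dropped it while the job was idle between batches.
    conn.ping(reconnect=True)
    with conn.cursor() as cur:
        cur.execute(sql, params)
    conn.commit()
```

If the job really does update the tools and users tables in one single large query, splitting it into smaller batches would also shrink the window in which a dropped connection can kill the whole run.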
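On the Magnum/Cinder CSI TIL: in practice it means a plain PersistentVolumeClaim against the Cinder-backed StorageClass gets a new Cinder volume provisioned and attached automatically by the cinder-csi-plugin linked above. A sketch via the kubernetes Python client; the StorageClass name "cinder" and the namespace are assumptions:

```python
# Sketch: a PVC that the cinder-csi-plugin would back with a new Cinder
# volume. The StorageClass name "cinder" is an assumption; check
# `kubectl get storageclass` on the actual Magnum cluster.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "demo-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "cinder",
        "resources": {"requests": {"storage": "1Gi"}},
    },
}
core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```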