[07:12:20] !log admin start maint on neutron - T421054
[07:12:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:12:42] T421054: Move all openstack rabbitmq queues to quorum - https://phabricator.wikimedia.org/T421054
[07:26:55] !log admin neutron maint done - T421054
[07:27:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[07:27:02] T421054: Move all openstack rabbitmq queues to quorum - https://phabricator.wikimedia.org/T421054
[13:20:56] !log toolsbeta reboot harbordb1 instance
[13:21:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:17:43] I have a tool called wikimonitor running on Toolforge inside a JDK 21 container with 3 GiB of memory and 3 CPU cores. For some reason, the tool stops functioning when its memory usage reaches around 1 GiB.
[14:17:43] There are no error messages in the logs indicating that the container or the application has crashed. Interestingly, when the memory usage reaches approximately 1 GiB, the last log entry shows that the SSE connection to Wikimedia was interrupted. Normally, this is not an issue, as the tool automatically reconnects.
[14:17:43] However, once it reaches this threshold, the tool does not attempt to reconnect, and there are no log messages indicating any reconnection attempts. : https://tools-static.wmflabs.org/bridgebot/c25c3825/file_79054.jpg
[14:23:31] WMCloud TOS: does it require a public issue tracker?
[14:23:55] I mean, from the POV of tools. Do they need a public issue tracker?
[14:27:46] If this is not required, are you aware of any place to discuss the introduction of such requirement? ❤️
[15:41:47] I've read the TOS. Answer: I think nope. Discussion moved here :3
[15:41:49] https://wikitech.wikimedia.org/wiki/Wikitech_talk:Cloud_Services_Terms_of_use#Tools:_requiring_a_Public_Issue_Tracker
[21:05:13] <Михаил> Hi!
Is there any supported way to restart a classic Toolforge webservice once per day?
[21:05:14] <Михаил> I tried this
[21:05:16] <Михаил> ```
[21:05:17] <Михаил> toolforge jobs run restart-webservice-daily \
[21:05:19] <Михаил> --image bookworm \
[21:05:20] <Михаил> --command "toolforge webservice buildservice restart" \
[21:05:22] <Михаил> --schedule "@daily" \
[21:05:23] <Михаил> --no-filelog```
[21:05:25] <Михаил> But it looks like this doesn't work, because scheduled jobs run inside containers and don't have the toolforge/webservice CLI
[22:14:23] I think there are a few more or less supported ways you could hack it together, but why do you want to restart your webservice every day?
[22:41:33] <Михаил> It fetches some data (word frequencies) from GitHub on startup. That's just how the app was written originally. Could you please describe a way to do it (re @lucaswerkmeister: I think there are a few more or less supported ways you could hack it together, but why do you want to restart your webservice e...)
[22:44:35] I think the best solution would be to change the app so it can reload its data whenever needed, and the second best would be to change it so it exits after a day (and then gets restarted automatically by toolforge)
[22:44:54] my third suggestion was going to be a health check script that starts to return “false” after a day so toolforge will restart the webservice
[22:45:05] but now that I check, I’m not sure webservices can have health check *scripts*
[22:45:07] jobs can: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Running_jobs#Configuring_health_checks_for_jobs
[22:45:17] but webservices only document health check endpoints https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#Health_checks
[22:47:21] alternatively, I thought you could patch the kubernetes deployment object to add a time limit to the pod, but I’m not sure that exists – jobs can have an activeDeadlineSeconds (https://kubernetes.io/docs/concepts/workloads/controllers/job/#job-termination-and-cleanup) but that might not be the case for deployments
[22:47:32] you _could_ have a health check endpoint that starts failing after a day
[22:47:58] yeah, that would be a variation of making it exit after a day
[22:48:22] could be easier to implement, true
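
For reference, the "health check endpoint that starts failing after a day" idea could look roughly like the sketch below. It is not taken from the tool being discussed (its language and framework were not mentioned), so everything here is an assumption made for illustration: the `/healthz` path, port 8000, the 24-hour limit, and the names `is_healthy`, `HealthHandler` and `serve` are all hypothetical. It only uses the Python standard library, and it relies on the webservice being configured with that path as its health check endpoint, as described on the Help:Toolforge/Web page linked above.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

MAX_UPTIME_SECONDS = 24 * 60 * 60  # assumed limit: one day
HEALTH_PATH = "/healthz"           # hypothetical path, pick your own
STARTED_AT = time.monotonic()      # recorded once, when the process starts


def is_healthy(started_at, now, max_uptime=MAX_UPTIME_SECONDS):
    """Return True while the process has been up for less than max_uptime seconds."""
    return (now - started_at) < max_uptime


class HealthHandler(BaseHTTPRequestHandler):
    """Answers 200 on the health path until the uptime limit is hit, then 503."""

    def do_GET(self):
        if self.path != HEALTH_PATH:
            self.send_error(404)
            return
        healthy = is_healthy(STARTED_AT, time.monotonic())
        body = b"ok\n" if healthy else b"uptime limit reached\n"
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the sketch as a standalone server (port 8000 is an assumption)."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

In a real tool the same uptime check would be added as one more route in the existing web app rather than run as a separate server. Once the endpoint starts returning 503, the failing health check should lead to the container being restarted, at which point the app fetches its data again on startup.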