[00:42:12] !log tools.wikibugs Updated gerrit-channels.yaml to: 87e88d59ab285c1f990579f37a1ceb7def84c5b8 gerrit: use wikibugs2.channelfilter.ChannelFilter
[00:42:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[00:47:42] !log tools.wikibugs Updated channels.yaml to: 87e88d59ab285c1f990579f37a1ceb7def84c5b8 gerrit: use wikibugs2.channelfilter.ChannelFilter
[00:47:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[00:55:39] !log tools.wikibugs Updated channels.yaml to: 7874c409dcef0ecc4bebc22914c540b37fd06574 gerrit: use wikibugs2.channelfilter.ChannelFilter
[00:55:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[08:55:54] !log taavi@tools-sgebastion-11 tools.wikibugs toolforge jobs restart irc
[08:55:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[09:00:22] !log lucaswerkmeister@tools-sgebastion-10 tools.bridgebot Double IRC messages to other bridges
[09:00:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[09:25:06] I am suddenly getting 400 errors for all REST API calls in my spamcheck tool, e.g. for https://en.wikipedia.org/api/rest_v1/page/html/Veljko_Ra%C5%BEnatovi%C4%87/1212334420?redirect=false
[09:25:26] Is there an outage?
[09:32:49] https://phabricator.wikimedia.org/T359509
[10:16:08] nothing major: https://www.wikimediastatus.net/
[10:16:27] you can try asking in -ops/-sre
[10:27:45] hello, I have an issue with a tool that I maintain on toolforge. It's called etytree
[10:30:08] can anyone help me out please? I think I might only need to restart the etytree-b instance
[10:30:28] but I'm afraid I might mess it up if I don't do it correctly
[10:31:23] Hi Epantaleo, can you elaborate a bit more? Do you have a ticket?
[10:31:28] are you trying to migrate the tool?
[10:31:32] (from gridengine to k8s)
[10:33:04] I do not have a ticket, but I have an event where I will show the tool and apparently it is down now
[10:34:16] should I create a ticket? I'm not sure what is going on, but when I search for a word in the etytree tool I don't see anything, and the sparql endpoint is off as well
[10:34:31] I will hopefully migrate to a new operating system next week
[10:34:58] I don't know any details on how the tool works, only the platform it runs on, but I can try helping debug some stuff
[10:35:12] but I'll need some information on how the tool is supposed to work :)
[10:35:14] http://etytree-virtuoso.wmflabs.org/sparql this is the endpoint
[10:35:18] sure
[10:35:20] I'll try
[10:35:21] thanks
[10:35:36] that is a cloudvps project?
[10:35:42] yes
[10:35:43] (a VM of its own)
[10:35:47] okok, so not toolforge?
[10:36:11] (trying to understand the layout of the tool)
[10:36:13] I think it's both
[10:36:15] https://etytree.toolforge.org/
[10:36:24] because there is a graphical interface on toolforge
[10:36:35] and a sparql endpoint + virtuoso on the VM
[10:36:45] okok, nice, that helps
[10:37:16] and what does not work, is it searching on the backend?
[10:37:33] for sure the sparql endpoint is not working
[10:37:52] that's essential, so it might be that restarting the VM will fix it
[10:38:03] maybe someone sent a very big query
[10:38:16] which messed up the endpoint
[10:38:39] as far as I can see, the VM that's responding to the request on etytree-virtuoso.wmflabs.org is etytree-a
[10:39:03] etytree-virtuoso.wmflabs.org http://172.16.0.251:8890
[10:39:06] oh ok
[10:40:49] there's a virtuoso instance running on that port, but it has no logs in journalctl
[10:40:57] the VM seems mostly idle
[10:41:44] it's been down for a while, because of OS issues. Recently I asked to reinstate it
[10:41:52] and it worked
[10:42:12] two months ago
[10:42:21] and I just found out it is not working again
[10:42:53] I am supposed to migrate it to a different OS but I haven't managed to yet. Maybe that's the issue?
[10:43:11] *more recent OS
[10:44:48] I doubt that's the cause if it was working before and stopped suddenly
[10:44:52] we can try restarting the service
[10:45:06] (could be though)
[10:46:06] Epantaleo: should I restart the service?
[10:46:28] (systemctl restart virtuoso-opensource-7.service)
[10:47:16] yes please
[10:48:23] do you have an idea of what the issue might be? When it was restarted it worked, I checked it. And now, before the meeting, I checked again and it is not...
[10:48:45] done, it seems to be working again, can you try?
[10:49:22] how did you restart it last time?
[10:49:34] 3 months ago?
[10:50:24] cool tool btw :)
[10:52:03] oh great, thank you so much
[10:52:06] so
[10:52:19] I used to be much more familiar with it a few years ago
[10:52:27] I was able to restart it by myself
[10:52:43] someone from this chat helped me last time, and francesco negre
[10:52:51] negri, whom I met at a conference
[10:53:02] so do I restart it from horizon?
[10:53:11] just in case...
[10:53:15] for future reference, I just issued that command `systemctl restart virtuoso-opensource-7.service`
[10:53:20] from where?
[10:53:23] from the VM itself
[10:53:40] you mean after ssh?
[10:53:57] !log etytree ran `systemctl restart virtuoso-opensource-7.service` from the VM ssh
[10:54:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Etytree/SAL
[10:54:07] yes, added the log just in case too :)
[10:54:35] thanks
[10:54:39] a lot!
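The fix above was a manual `systemctl restart` after ssh-ing into the VM. As a hedged sketch only (not part of etytree and not an official Cloud VPS mechanism), a small Python watchdog could probe the SPARQL endpoint with a trivial `ASK {}` query and issue the same restart only when the probe fails. The helper names, the 10-second timeout, the use of `sudo` and the choice of endpoint URL are assumptions; when run on the VM itself, a local URL on port 8890 (the port seen in the log) might be more appropriate than the public hostname.

```python
import http.client
import subprocess
import urllib.parse
import urllib.request
from typing import Callable

# Values taken from the conversation above; treat them as assumptions for your own setup.
ENDPOINT = "http://etytree-virtuoso.wmflabs.org/sparql"
SERVICE = "virtuoso-opensource-7.service"


def build_probe_url(endpoint: str) -> str:
    """GET URL for a trivial SPARQL ASK query that a healthy endpoint should answer."""
    return endpoint + "?" + urllib.parse.urlencode({"query": "ASK {}"})


def http_probe(endpoint: str = ENDPOINT, timeout: float = 10.0) -> bool:
    """Return True if the endpoint answers the probe with HTTP 200 before the timeout."""
    try:
        with urllib.request.urlopen(build_probe_url(endpoint), timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, refused connections and socket timeouts are OSError subclasses.
        return False


def systemctl_restart(service: str = SERVICE) -> None:
    """Restart the unit as was done by hand on the VM; needs root (here via sudo)."""
    subprocess.run(["sudo", "systemctl", "restart", service], check=True)


def check_and_restart(probe: Callable[[], bool], restart: Callable[[], None]) -> bool:
    """Call restart() only if probe() fails; return True when a restart was issued."""
    if probe():
        return False
    restart()
    return True
```

On the VM this would be invoked periodically as `check_and_restart(http_probe, systemctl_restart)`. The probe and the restart are passed in as callables so the decision logic can be tested without touching the real service. A failed probe only says the endpoint is unresponsive, not why (for example the "very big query" theory), so it does not replace looking at the VM.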
[10:55:35] np, you should move to the new OS though ;) When you have a minute, feel free to ask for help if you run into issues (a task might help more, so we can keep track and reply at different hours/timezones)
[10:57:22] ok thanks
[11:26:45] In general it would be ideal if internal errors like this could be translated to 500 errors in the user-facing APIs. It is the same for other cases, e.g. https://phabricator.wikimedia.org/T350672
[11:27:02] oops, wrong channel
[12:48:59] One of my jobs is no longer being executed on Kubernetes. The last schedule time is displayed as 2024-03-05T00:00:00Z, but this job is set to run daily. What might be the issue, and how can I troubleshoot it?
[12:56:58] @Yetkin, could you share the tool name and job name?
[12:58:10] @arturo Tool name: superyetkin
[12:58:11] Job name: job-popular-pages
[13:03:32] let me take a look
[13:06:11] @Yetkin: the first thing I would try is to schedule the job at a different time. The schedule `0 0 * * *` is usually very busy. Try `--schedule @daily`, or some other hour, like `9 23 * * *`
[13:06:32] if the system fails to schedule the job because it is too busy, it won't be scheduled outside that timeframe
[13:09:00] @arturo Can I reschedule it without deleting the job first?
[13:10:28] @Yetkin try `toolforge jobs restart job-popular-pages` to run the job now without waiting for the schedule time
[13:15:48] @arturo The job completed successfully. Is there any chance I can obtain the command I used to create the job? I do not want to delete the job and create it again to schedule it at a different time
[13:16:25] @Yetkin: it should have been something like this:
[13:17:19] `toolforge jobs run job-popular-pages --command "php ybot/wikiproject_popular_pages.php" --image "php7.4" --schedule "@daily"`
[13:17:34] usually, for users with a long list of jobs like you have, we suggest storing them in a YAML file
[13:17:48] see here: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework#Loading_jobs_from_a_YAML_file
[13:18:44] I'm also planning to work on a way to export the currently defined list of jobs to a YAML file (T320575), which would help in your case
[13:18:45] T320575: Allow exporting jobs list in YAML format - https://phabricator.wikimedia.org/T320575
[13:35:08] @arturo Is there any way to find out if my job has not been scheduled to run due to the busy hours you mentioned above? I have observed this by chance 😊
[13:35:57] we currently don't have a way to notify about this. It would be a good thing to have though
[13:36:02] let me create a phab ticket
[13:43:43] arturo: https://phabricator.wikimedia.org/T306790
[13:46:36] dcaro: thanks
[15:19:45] !log urbanecm@tools-sgebastion-10 tools.sal webservice restart # 503 Service Unavailable
[15:19:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.sal/SAL
[18:44:42] !log anticomposite@tools-sgebastion-10 tools.stewardbots SULWatcher/manage.sh restart # SULWatchers disconnected
[18:44:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[20:51:44] !log bd808@tools-sgebastion-11 tools.wikibugs Temporarily disable git webhook handler while landing and deploying changes to channel file handling.
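Relating to arturo's 13:06 advice to move job-popular-pages off the crowded `0 0 * * *` slot: the following is a hypothetical Python sketch, not a Toolforge feature, that derives a stable off-midnight daily time from the tool and job names and builds a `toolforge jobs run` line with the same flags as the one quoted at 13:17:19. The helper names and the choice to avoid only hour 0 are assumptions. It makes no claim about how the jobs framework itself expands `@daily`, and it only builds the command string; it does not run it.

```python
import hashlib
import shlex


def spread_daily_schedule(tool: str, job: str, avoid_hours: frozenset = frozenset({0})) -> str:
    """Derive a stable 'M H * * *' cron expression from the tool and job names.

    Hashing the names (instead of picking at random) yields the same slot on
    every call, while different jobs tend to land on different minutes and hours.
    """
    digest = hashlib.sha256(f"{tool}/{job}".encode("utf-8")).digest()
    minute = digest[0] % 60
    allowed_hours = [h for h in range(24) if h not in avoid_hours]
    hour = allowed_hours[digest[1] % len(allowed_hours)]
    return f"{minute} {hour} * * *"


def jobs_run_command(name: str, command: str, image: str, schedule: str) -> str:
    """Build (but do not execute) a shell-quoted `toolforge jobs run` line."""
    args = ["toolforge", "jobs", "run", name,
            "--command", command, "--image", image, "--schedule", schedule]
    return shlex.join(args)
```

For example, `jobs_run_command("job-popular-pages", "php ybot/wikiproject_popular_pages.php", "php7.4", spread_daily_schedule("superyetkin", "job-popular-pages"))` produces a line in the same shape as arturo's, with a fixed non-midnight time. Using a hash rather than `random` means re-running the script does not keep moving the job around.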
[20:51:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[21:01:14] !log paws increase worker count to manage outreachy load T359591
[21:01:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[21:01:19] T359591: add worker to paws - https://phabricator.wikimedia.org/T359591
[21:22:04] !log bd808@tools-sgebastion-11 tools.wikibugs Updated to c42300f (T359230, T359202, T359228)
[21:22:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[21:32:00] !log bd808@tools-sgebastion-11 tools.wikibugs Restored git webhook handler
[21:32:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
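At 13:35 arturo noted that there is currently no notification when a scheduled run is skipped (dcaro linked T306790). Until something like that exists, a maintainer could compare a job's last schedule time, such as the 2024-03-05T00:00:00Z reported at 12:48:59, against the clock. This is a hypothetical Python sketch: the function names, the one-hour grace period and the assumption of a fixed daily period are illustrative and not part of any Toolforge API.

```python
from datetime import datetime, timedelta, timezone


def parse_k8s_time(value: str) -> datetime:
    """Parse a UTC timestamp like '2024-03-05T00:00:00Z' into an aware datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def missed_runs(last_schedule: datetime, now: datetime,
                period: timedelta = timedelta(days=1),
                grace: timedelta = timedelta(hours=1)) -> int:
    """Count runs that were due after last_schedule but have apparently not started.

    A run due at last_schedule + k * period only counts as missed once `grace`
    has passed since it was due, so a slightly late start is not reported.
    """
    elapsed = now - last_schedule - grace
    if elapsed < period:
        return 0
    return elapsed // period  # timedelta // timedelta yields an int
```

With the reported last schedule time, a check at 00:30 UTC on 2024-03-06 reports 0 (the next run may simply be late), while a check later in the day on 2024-03-08 UTC reports 3 missed daily runs.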