[00:34:44] fyi I started https://en.wikipedia.org/wiki/Wikipedia:Bots/Noticeboard#Missing_bots_from_Toolforge's_Grid_Engine_shutdown
[00:50:34] thanks for doing that legoktm
[02:56:29] !log wikisource rebooted wsexport-prod02
[02:56:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikisource/SAL
[07:20:46] !log wikisource on wsexport-prod02 reduced MaxRequestWorkers to 20 T335553
[07:20:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikisource/SAL
[07:20:52] T335553: Investigate recent (2023) downtime of WS Export - https://phabricator.wikimedia.org/T335553
[10:12:25] toolforge asked me to report this issue here:
[10:12:26] > toolforge jobs logs rustbot | tail -10
[10:12:26] Job name:     Job type:             Status:
[10:12:27] ------------  --------------------  ----------------------------------------
[10:12:27] update-wikis  schedule: 17 * * * *  Last schedule time: 2024-02-22T09:17:00Z
[10:12:28] rustbot       continuous            Running
[10:12:28]   File "/usr/lib/python3/dist-packages/tjf_cli/cli.py", line 528, in op_logs
[10:12:29]     params=params,
[10:12:29]   File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 167, in get_raw_lines
[10:12:30]     **kwargs,
[10:12:30]   File "/usr/lib/python3/dist-packages/toolforge_weld/api_client.py", line 130, in _make_request
[10:12:31]     raise self.exception_handler(e) from e
[10:12:31]   File "/usr/lib/python3/dist-packages/tjf_cli/api.py", line 59, in handle_http_exception
[10:12:32]     except requests.exceptions.InvalidJSONError:
[10:12:32] AttributeError: module 'requests.exceptions' has no attribute 'InvalidJSONError'
[10:12:33] ERROR: Please report this issue to the Toolforge admins: https://w.wiki/6Zuu
[10:13:22] actually the command was: toolforge jobs list
[10:14:22] seems to happen ~50% of the time for this command. same for `toolforge jobs run`
[10:14:38] what tool is that?
[10:16:23] (it's a double error: the latter is an error raised while handling the first, because the bastion is using a very old Python version but the code expects a newer one; that should be sorted soon when we deprecate the grid engine and rebuild the bastions on the newer Debian)
[10:16:39] (but the original error is something else that we should look into)
[10:17:01] tool: `listeria`
[10:17:47] hm, I got a different error, 500 from the API
[10:18:24] For now I just re-run the commands until they work. Feels like there are multiple servers and one of them is returning 500?
[10:19:39] yep, I'll open a task to follow up
[10:23:01] magnusmanske: v
[10:23:03] T358194
[10:23:04] T358194: [jobs-api] Getting errors when listing jobs - https://phabricator.wikimedia.org/T358194
[10:23:06] that one :)
[10:59:36] please restart Bridgebot: it left the channels a few hours ago. Thanks!
[11:02:44] !log tools.bridgebot restart bridgebot
[11:02:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[11:02:50] magnusmanske: should be fixed already. I'll leave the task open to finish up some non-user-facing work, but it should work now, so please report if you still see errors
[11:02:51] Titore:
[11:04:30] taavi: thank you
[11:32:51] !log taavi@tools-sgebastion-11 tools.wikibugs toolforge jobs restart redis2irc
[11:32:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[12:31:34] !log tools.tool-db-usage migrate to build service
[12:31:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.tool-db-usage/SAL
[13:24:08] !log tools.replag migrate to buildpacks
[13:24:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.replag/SAL
[14:23:05] !log anticomposite@tools-sgebastion-10 tools.stewardbots SULWatcher/manage.sh restart # SULWatchers disconnected
[14:23:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[14:35:08] Hi folks, looking for info on how best to handle logging for an experiment running on Toolforge. Interested in any best practices for extracting log files or other recommended mechanisms for enabling a usage analysis. Any guides?
[14:46:45] erut is it a webservice you want to parse the logs of?
[14:47:03] dcaro yep! python based
[14:47:27] if you only want raw page views, you can try https://toolviews.toolforge.org/
[14:48:06] if you want to generate your own stats, you'll have to log to the disk and then parse it (on NFS) or something similar
[14:49:29] you can get some more info on the requests for your webservice (status codes) here https://grafana.wmcloud.org/d/TJuKfnt4z/kubernetes-namespace?orgId=1
[14:50:37] and get the raw data used for those graphs from here https://tools-prometheus.wmflabs.org/tools/graph?g0.expr=&g0.tab=1&g0.stacked=0&g0.show_exemplars=0&g0.range_input=1h
[14:50:38] interesting! that's a nice tool already. As to the logging to disk, should I be worried about persistence/data loss? As in, how temporary is the file space on the instances?
[14:51:51] if you are using the instance storage, it will be reset on every restart, so it's not recommended for logs you want to archive, but you can use NFS to store those, and the persistence lasts as long as it does not grow too much xd (then we might have to truncate it to free space)
[14:52:37] ack! and we're talking some GB for not too much?
[14:52:40] if you are using buildservice images, you might have to pass `--mount=all` to get the NFS mounted
[14:53:01] yep, a couple GB is ok, but dozens might be too much
[14:53:36] perfect! I think that's really all I needed to know :) thanks a bunch dcaro
[14:53:44] (we are working on providing a different storage offering like s3/swift, which you can already use, but right now it's a bit of trouble)
[14:54:12] see https://wikitech.wikimedia.org/wiki/Help:Object_storage_user_guide (you'll need a cloudvps project)
[14:54:19] np :)
[14:54:29] ah, okay that's interesting. We were considering s3
[14:54:49] but NFS should be fine for now, really
[14:55:32] good to know :)
[14:56:31] (like really, knowing what you considered will help us focus on the things that are useful to people)
[15:00:24] gitlab seems to be down.
[15:02:47] 301 #wikimedia-gitlab
[15:07:09] taavi: ok. Seems to be back now anyway.
[15:45:39] !log anticomposite@tools-sgebastion-10 tools.stewardbots ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC
[15:45:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[16:48:13] !log anticomposite@tools-sgebastion-10 tools.stewardbots Deploy 34c320e
[16:48:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[20:50:13] !log lucaswerkmeister@tools-sgebastion-10 tools.ww-monitor added two entries to keywords_to_report
[20:50:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ww-monitor/SAL
[20:53:20] !log lucaswerkmeister@tools-sgebastion-10 tools.ww-monitor restored old config file because I did a dumb
[20:53:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ww-monitor/SAL
[21:03:36] !log lucaswerkmeister@tools-sgebastion-10 tools.ww-monitor added two entries to keywords_to_report (hopefully correctly this time)
[21:03:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ww-monitor/SAL
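Note on the 14:48 suggestion (log to disk on NFS, then parse it): a minimal Python sketch of what that could look like for a Python webservice. Nothing here comes from the channel; the function names, the one-JSON-object-per-line format, and the size limits are all assumptions made for illustration, and the log path is expected to be somewhere on the tool's NFS storage rather than on instance storage. Rotation is used so the files stay well under the "couple GB" mentioned at 14:53.

```python
import json
import logging
import logging.handlers
from collections import Counter
from datetime import datetime, timezone


def make_usage_logger(log_path, max_bytes=50 * 1024 * 1024, backups=5):
    """Return a logger that appends one JSON object per line to log_path.

    The file is rotated at max_bytes and at most `backups` old files are
    kept, so disk use is bounded by roughly max_bytes * (backups + 1).
    All names and limits here are hypothetical.
    """
    logger = logging.getLogger(f"usage:{log_path}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep usage records out of the root logger
    if not logger.handlers:  # avoid adding a second handler on re-import
        handler = logging.handlers.RotatingFileHandler(
            log_path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8"
        )
        handler.setFormatter(logging.Formatter("%(message)s"))
        logger.addHandler(handler)
    return logger


def log_request(logger, path, status):
    """Write one usage record with a UTC timestamp."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "path": path,
        "status": status,
    }
    logger.info(json.dumps(record))


def count_requests(log_path):
    """Count requests per path in the current log file.

    Rotated backup files are not read; unparseable lines are skipped.
    """
    counts = Counter()
    with open(log_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue
            counts[record["path"]] += 1
    return counts
```

In a webservice, `log_request` would be called once per handled request (for example from a framework's after-request hook), and `count_requests` run later as a separate job over the same file.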