[12:15:18] !log quarry Deploying: 792277: query.py: Make quarry history descending | https://gerrit.wikimedia.org/r/c/analytics/quarry/web/+/792277 538d322ffd18dd7ad1e53644cffc3946d1c42990
[12:15:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL
[12:16:09] Thanks Rook :)
[12:16:18] np :)
[12:23:49] !log quarry Deploying: 791606: Return 404 on query ids that do not exist | https://gerrit.wikimedia.org/r/c/analytics/quarry/web/+/791606 e19b0a5e706f7e853f66b0d376c43cd499d8a0e2
[12:23:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL
[14:12:39] !log clouddb-services enable gtid on toolsdb T301993
[14:12:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Clouddb-services/SAL
[14:12:42] T301993: [toolsdb] Enable gtid to help replication recovery - https://phabricator.wikimedia.org/T301993
[18:01:49] !log toolhub Update demo server to 42072d
[18:01:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolhub/SAL
[19:05:58] !log tools.lexeme-forms deployed 8cdef0cf20 (l10n updates)
[19:06:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[20:25:16] !log tools.wd-image-positions deployed 803f7f1f3a (info when users can’t edit due to noscript or missing login)
[20:25:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wd-image-positions/SAL
[20:57:02] Good evening! Anyone experienced with the grid engine who could help me with a stuck job?
[20:57:42] I stopped it a while ago, but according to qstat, the deletion is still pending, and I cannot restart it due to that.
[21:04:29] According to "qstat -xml", the job is running on tools-sgeexec-0940.tools.eqiad.wmflabs , but according to my SSH client, that's not a valid host name.
[21:05:46] tkarcher: I think a cloud admin can force it
[21:06:37] Someone should show soon
[21:06:48] you can use !_help if you need them quick
[21:07:24] Ok, thanks.
[21:08:18] It's not super urgent, but it would be convenient if I could restart it soon.
[21:10:09] bd808: is that something you can do ^
[21:11:15] tkarcher: what is the tool name and job id please. I can force kill stuck things for you.
[21:11:40] tool name: erinnermich
[21:12:06] job number: 7856719
[21:12:11] "tools-sgeexec-0940.tools.eqiad.wmflabs" is a valid hostname, but only from inside of toolforge (the domain is made up and only in our internal resolvers)
[21:13:02] !log tools.erinnermich Forced the deletion of job 7856719 per IRC request
[21:13:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.erinnermich/SAL
[21:13:25] tkarcher: you should be able to start it again now
[21:13:44] Thanks! :-)
[21:14:55] np. and don't be worried about using the ! + help magic to get attention. If all the helpers want to ignore you they are in control of their irc client notifications. :)
[21:16:29] Successfully started - and I had to stop it again. :-( (This time stopping worked, at least)
[21:17:16] usually if a job requires a force kill it really means the job is not running but the grid scheduler thinks that it actually is
[21:18:04] Yeah, I noticed that, too. There was no response from the script, and no error messages.
[21:18:23] Now I get error messages again. Which is progress. Somehow.
[21:18:56] For whatever reason, my virtual Python environment can't find the MWAPI module anymore. The job was running since January without problems.
[21:27:18] hm, perhaps you’re on the Buster grid now, which means a different Python version? https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#What_are_the_primary_changes_with_moving_to_Buster?
[21:27:32] (make that https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#What_are_the_primary_changes_with_moving_to_Buster%7F so the question mark is part of the URL)
[21:30:22] I was trying to move to Kubernetes earlier today, but failed (due to having no experience with that whatsoever), and was hoping that stopping and restarting the job with "-release buster" might be enough to get it running again. Which wasn't the case.
[21:32:26] Ah! https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Rebuild_virtualenv_for_python_users sounds promising. I'll try that.
[21:32:26] tkarcher: that should mostly work, but if you are using a python venv you will need to rebuild it to work with the version of python on the buster grid. The login.toolforge.org and dev.toolforge.org bastions have the correct version of python installed to build a venv that will work with the buster grid engine nodes.
[21:59:02] Quick update: Yes, I had to rebuild the environment, and now it works, sort of. Lots of syntax errors within the Python script due to the version change, but those I can fix myself, I hope. Thanks again for your support!
[22:33:26] Final update for today: Bot is running smoothly on Buster now. I'll read more about Kubernetes in the next weeks and try to migrate again later.
[22:33:42] \o/
[22:34:35] happy news tkarcher :)
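Note on the venv rebuild discussed above: a virtual environment is tied to the Python version it was built with, so after a grid change its installed modules may no longer be found. The following is a minimal sketch, not part of the original conversation, of how one might check for such a mismatch before restarting a job. It assumes the environment has a `pyvenv.cfg` file with a `version` or `version_info` entry (environments created by older tools may not, in which case the sketch conservatively reports that a rebuild is needed). The function names `venv_python_version` and `needs_rebuild` are hypothetical.

```python
import sys
from pathlib import Path


def venv_python_version(venv_dir):
    """Return the (major, minor) version recorded in pyvenv.cfg, or None.

    Assumption: the venv contains a pyvenv.cfg with a "version" or
    "version_info" key; if not, None is returned.
    """
    cfg = Path(venv_dir) / "pyvenv.cfg"
    if not cfg.is_file():
        return None
    for line in cfg.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() in ("version", "version_info"):
            parts = value.strip().split(".")
            if len(parts) >= 2 and parts[0].isdigit() and parts[1].isdigit():
                return int(parts[0]), int(parts[1])
    return None


def needs_rebuild(venv_dir, running=None):
    """True if the venv's recorded version is unknown or differs from
    the interpreter version given in `running` (default: this one)."""
    running = tuple(running or sys.version_info[:2])
    recorded = venv_python_version(venv_dir)
    return recorded is None or recorded != running
```

Run with the interpreter that the job will use, `needs_rebuild("path/to/venv")` returning True suggests recreating the environment and reinstalling its dependencies, as the linked wikitech page describes.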