[06:28:10] good morning
[06:28:37] i've got a job stuck and I am not able to delete it using qdel
[06:28:53] qstat
[06:28:53] job-ID prior name user state submit/start at queue slots ja-task-ID
[06:28:55] -----------------------------------------------------------------------------------------------------------------
[06:28:56] 1522509 0.25828 robot tools.rebot dr 04/02/2023 23:00:29 task@tools-sgeexec-10-20.tools 1
[06:29:07] neither with qdel -f
[06:29:17] what am I doing wrong?
[06:30:43] thanks in advance
[06:41:13] @Pau: I've deleted your job. Unfortunately the grid engine sometimes loses track of jobs it's running like that and you need an admin to recover from it.
[06:48:56] ok, thanks!
[06:49:43] 👍
[11:41:27] !log tools.lexeme-forms deployed 994cbd48b0 (fix typo in a Hindustani template)
[11:41:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[12:37:46] I think there is something wrong with an nginx proxy in toolforge. Here is the thing: query-chest.toolforge.org is basically url shortener but for wdqs queries. And if you provide it with uuid, it looks it up in redis and issues the redirect. It works just fine unless the URL is really long (e.g. 5K chars). Here is an example: https://query-chest.toolforge.org/redirect/aJWWLi6LAGgyimasMASM6Y8USQmGW6AgyOgYqQYCIWw
[12:37:56] the response is 502 but the logs says this
[12:38:05] [pid: 13|app: 0|req: 5/27] 192.168.247.67 () {62 vars in 1216 bytes} [Sat Apr 8 12:34:32 2023] GET /redirect/aJWWLi6LAGgyimasMASM6Y8USQmGW6AgyOgYqQYCIWw => generated 9435 bytes in 9 msecs (HTTP/1.1 302) 3 headers in 4720 bytes (1 switches on core 0)
[12:39:28] And locally for the same url, it works just fine. I thought there might be a limit in a value size per key in redis in toolforge but I looked up the key and it's fully there
[12:40:20] strangely a similar tool works just fine ls.toolforge.org/p/19420423
[12:41:28] maybe I need to issue a different kind of redirect?
[12:41:42] (and there is nothing in logs of the tool)
[12:44:03] maybe I should do a js redirect :(
[12:46:00] yeah, I just stop issuing 302, I let browser handle it via HTML
[13:01:16] Amir1: try bumping the uwsgi buffer-size, cf. https://gitlab.wikimedia.org/toolforge-repos/speedpatrolling/-/blob/main/uwsgi.ini
[13:02:00] (the ls tool doesn’t use uwsgi, that might be why it’s not affected)
[13:03:18] hmm, but the log says uwsgi has issued the redirect successfully, unless it issues it partially and thinks it's done? sigh
[13:09:52] nope, didn't fix it, reverted
[13:10:54] damn :/
[14:31:10] I have just started a job using the "toolforge-jobs" command but its status is "not running". How can I get it running?
[14:33:58] hello?
[14:36:07] `toolforge-jobs show ` will usually show the reason. if not, I need the tool name to look more into it
[14:38:27] the output is
[14:38:28] +-------------+------------------------------------+
[14:38:29] | Job name:   | stt-job                            |
[14:38:29] +-------------+------------------------------------+
[14:38:30] | Command:    | php ybot/stt_surum.php             |
[14:38:30] +-------------+------------------------------------+
[14:38:31] | Job type:   | continuous                         |
[14:38:31] +-------------+------------------------------------+
[14:38:32] | Image:      | bullseye                           |
[14:38:32] +-------------+------------------------------------+
[14:38:33] | File log:   | yes                                |
[14:38:33] +-------------+------------------------------------+
[14:38:34] | Output log: | stt-job.out                        |
[14:38:34] +-------------+------------------------------------+
[14:38:35] | Error log:  | stt-job.err                        |
[14:38:35] +-------------+------------------------------------+
[14:38:36] | Emails:     | none                               |
[14:38:36] +-------------+------------------------------------+
[14:38:37] | Resources:  | mem: 500m, cpu: default            |
[14:38:37] +-------------+------------------------------------+
[14:40:55] use a pastebin next time please
[14:41:09] that, and see my message fully ('if not, I need the tool name to look more into it')
[14:42:09] tool name is superyetkin
[14:43:06] thanks, give me a sec
[14:45:40] you seem to have specified memory as '500m', which it parses as 0.5 bytes (which is technically correct) and not 500 megabytes. you want an uppercase M instead
[14:45:55] toolforge-jobs should have given a proper error for that, I'll check why it did not do that
[14:50:43] recreated it but this time it says "fails to start"
[14:51:02] +-------------+------------------------------------------------------------------------+
[14:51:03] | Hints:      | Last run at 2023-04-08T14:48:37Z. Pod in 'Running' phase. Pod has been |
[14:51:03] |             | restarted 3 times. State 'waiting'. Reason 'CrashLoopBackOff'.         |
[14:51:04] |             | Additional message:'back-off 40s restarting failed container=job       |
[14:51:04] |             | pod=stt-job-7456689548-kclt7_tool-                                     |
[14:51:05] |             | superyetkin(3a4c0990-0ee5-462d-85c2-4f27de1c6af4)'.                    |
[14:51:05] +-------------+------------------------------------------------------------------------+
[14:51:14] pastebin please
[14:51:37] (e.g. https://paste.toolforge.org/)
[14:52:58] sorry, cannot access pastebin
[14:54:06] https://paste.toolforge.org/view/75da70ad
[14:54:48] 'fails to start' means that the command you specified fails to execute
[14:55:04] in this case you're trying to run a php script, so you probably should be using php7.4 as the image
[14:55:22] this matches what's in stt-job.err
[15:00:02] tried php7.4 but it fails to start again
[15:01:37] Reason 'CrashLoopBackOff' ???
[15:04:29] how can I see the "actual" error causing my script to not start?
[15:08:06] by looking at the log files it created (stt-job.out and stt-job.err)
[15:09:45] are there any images with PHP 5.x installed?
[15:10:25] my jstart command running the same script is fine, by the way
[15:16:43] are there any images with PHP 5.x installed?
[15:17:47] yes, but what do you need them for? the grid is running 7.3
[15:19:51] man, the same script (stt_surum.php) runs on the grid just fine but fails to start on kubernetes. I am wondering why
[15:20:59] I am trying to port my scripts to toolforge-jobs from the grid but the script fails to start. Where should I look at for the details? The log files do not help at all
[15:27:54] is there anyone who can get his/her PHP script up and running on toolforge-jobs here?
[15:28:08] my scripts run fine on the grid
[15:31:18] the log files contain the output your script created, so if you need more information to debug you likely need to edit the script to add it
[17:49:45] jstop will not stop my job on Toolforge. Can anyone do it for me for the job 1523908?
[18:10:42] done
[18:12:00] thanks
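
Note on the 14:45:40 message about '500m' vs '500M': the sketch below illustrates why the lowercase suffix gives 0.5 bytes. It assumes the memory value is interpreted with Kubernetes-style quantity suffixes, which the log suggests but does not confirm. The function name `quantity_to_bytes` is hypothetical and only a subset of suffixes is handled; this is not the toolforge-jobs implementation.

```python
from decimal import Decimal

# Kubernetes-style quantity suffixes (subset only; this table is an
# illustration, not code taken from toolforge-jobs or Kubernetes).
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL_SUFFIXES = {
    "m": Decimal("0.001"),  # milli: one thousandth
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,  # mega: one million
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
}


def quantity_to_bytes(quantity: str) -> Decimal:
    """Convert a quantity string such as '500M' to a number of bytes.

    Hypothetical helper: two-letter binary suffixes are checked first so
    that '500Mi' is not mistaken for a value with a one-letter suffix.
    """
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * factor
    for suffix, factor in DECIMAL_SUFFIXES.items():
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * factor
    # No suffix: the value is already a plain number of bytes.
    return Decimal(quantity)


if __name__ == "__main__":
    for value in ("500m", "500M", "500Mi"):
        print(value, "->", quantity_to_bytes(value), "bytes")
```

Under these assumptions '500m' is half a byte, '500M' is 500,000,000 bytes and '500Mi' is 524,288,000 bytes, so a request meant as "500 megabytes" needs the uppercase suffix.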