[09:55:28] Now I can submit jobs again and they seem to run fine. Does anyone here know what fixed it? Thanks anyway!
[10:16:45] DennisPriskorn were you able to get the logs out of one of the failing jobs?
[10:17:12] no, no logs whatsoever and nothing in get events either, see link above
[10:29:15] How many jobs are you creating in parallel? there is a limit on how many resources you can consume at the same time.
[10:56:31] !log tools.pbbot New serialization method for LonelyPages app
[10:56:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.pbbot/SAL
[11:00:06] !log toolsbeta created node toolsbeta-sgeexec-10-6.toolsbeta.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by arturo@nostromo
[11:00:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:02:30] !log toolsbeta created puppet prefix 'toolsbeta-sgeweblig'
[11:02:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:02:56] !log tools created puppet prefix 'toolsbeta-sgeweblig'
[11:02:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:03:01] !log tools created puppet prefix 'tools-sgeweblig'
[11:03:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:03:28] did we agree on using sgeweblig instead of sgeweblight?
[11:04:04] we didn't, but I figured that using the same string length than 'sgewebgen' would be OK
[11:04:31] any reason why?
[11:04:43] (as in, does grid mind? any other program?)
[11:05:43] the string length issue was raised because a kernel limitation on hostnames, no? that's why we started using webgen as well
[11:06:05] and to allow hardcoding the debian release version in the VM name, etc
[11:06:54] yep, but the kernel length was 64 chars, that was already sorted out with the weblight suffix no?
[11:07:22] ex https://gerrit.wikimedia.org/r/c/operations/puppet/+/731114
[11:07:38] on https://gerrit.wikimedia.org/r/c/operations/puppet/+/731113/1
[11:07:39] ok! if we settled on both `webgen` and `weblight` then I clearly missed something. I thought we only settled on `webgen`
[11:07:46] and https://gerrit.wikimedia.org/r/c/operations/puppet/+/731111/1
[11:08:08] you approved those patches
[11:08:14] right
[11:08:30] I was about to discover that myself, patching that file was my next step
[11:11:17] you might want to take over T292465
[11:11:17] T292465: Automate rebuild and rebuild toolsbeta-sgewebgrid-generic-0901 - https://phabricator.wikimedia.org/T292465
[11:14:48] that sound very related to the kind of things I'm doing lately, everything grid automation in preparation for the buster migration
[11:27:34] !log tools created puppet prefix `tools-sgeweblight`, drop `tools-sgeweblig`
[11:27:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:27:49] !log toolsbeta created puppet prefix `toolsbeta-sgeweblight`, drop `toolsbeta-sgeweblig`
[11:27:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:32:06] only 2 jobs at the moment. creating a single itemsubjector job failed (re @wmtelegram_bot: How many jobs are you creating in parallel? there is a limit on how many resources you can consume at the same time.)
[11:33:08] DennisPriskorn: would you mind creating a detailed bug report on phabricator?
[11:56:47] done https://phabricator.wikimedia.org/T299039 (re @wmtelegram_bot: DennisPriskorn: would you mind creating a detailed bug report on phabricator?)
[12:28:36] !log toolsbeta created node toolsbeta-sgeweblight-10-1.toolsbeta.eqiad1.wikimedia.cloud and added it to the grid - cookbook ran by arturo@nostromo
[12:28:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:36:03] DennisPriskorn: thanks, will follow up there