[12:16:47] !log toolsbeta created VM toolsbeta-sgecron-02 (T284767)
[12:16:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:16:54] T284767: Toolforge: migrate cron servers to Debian Buster - https://phabricator.wikimedia.org/T284767
[12:19:13] !log toolsbeta created puppet prefix `toolsbeta-sgecron` with proper hiera/roles
[12:19:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:04:18] !log toolsbeta livehacking puppetmaster with https://gerrit.wikimedia.org/r/c/operations/puppet/+/760933 (T284767)
[13:04:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:04:22] T284767: Toolforge: migrate cron servers to Debian Buster - https://phabricator.wikimedia.org/T284767
[20:11:34] !log research-collaborations-api set --cores 24 --ram 50000 --instances 12 T301121
[20:11:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Research-collaborations-api/SAL
[20:11:37] T301121: Request increased quota for research-collaborations-api Cloud VPS project - https://phabricator.wikimedia.org/T301121
[20:13:00] andrewbogott, for T300753 I was thinking to just match the resource allocation of analytics and then put a date on the calendar to turn off the old project. Does that match your thinking?
[20:13:01] T300753: Request increased quota for Data Engineering Cloud VPS project - https://phabricator.wikimedia.org/T300753
[20:13:09] balloons: yep!
[20:16:33] !log data-engineering set --cores 56 --ram 103000 --instances 24 T300753
[20:16:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Data-engineering/SAL
[23:28:45] https://sal.toolforge.org/ not loading for anyone else?
[23:30:28] it's dead jim
[23:30:30] https://grafana-labs.wikimedia.org/d/toolforge-k8s-namespace-resources/kubernetes-namespace-resources?orgId=1&var-namespace=tool-sal&refresh=5m looks like it hasn't been working all day
[23:31:01] but has been getting close to CPU limits
[23:31:03] !log tools.sal Restart hung webservice
[23:31:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.sal/SAL
[23:31:21] all better
[23:31:53] thanks
[23:31:59] I haven't figured out if it is a crawler that hits it and locks up all the workers or some other crappy resource leak
[23:33:22] I really don't want to dive into that too deeply. Time would be better spent rewriting the webservice away from the abandonware PHP framework I wrote it with to something nicer (like flask)
[23:37:02] The semi-random lockups go back quite some bit in the tool's SAL -- https://sal.toolforge.org/tools.sal