[07:29:46] !log tools.wikiloves Deploy af63dae abd a488a29 (T347813)
[07:29:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikiloves/SAL
[07:30:04] !log tools.wikiloves Deploy af63dae abd a488a29 (T347813)
[07:30:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikiloves/SAL
[11:05:49] !log project-proxy resize proxy-04 g3.cores2.ram4.disk20 to match proxy-03
[11:05:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[11:41:08] !log project-proxy configure keepalived ip for main project-proxy service T316982
[11:41:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[11:41:12] T316982: High availability for the main cloud vps web proxy - https://phabricator.wikimedia.org/T316982
[11:52:29] !log tools reboot tools-sgeweblight-10-22, 28
[11:52:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:54:44] oh, that's you :), I was going to do it now too, but saw the emails piling up
[12:02:33] !log tools also reboot tools-sgeweblight-10-30
[12:02:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:09:19] taavi: sgeweblight-10-25 seems stuck too
[12:09:45] where do you see that?
[12:13:07] https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview?orgId=1&viewPanel=2&from=now-1h&to=now
[12:13:15] and sshing to it, I see a bunch of processes like
[12:13:23] root 11369 0.0 0.0 5988 2468 ? D Oct11 0:00 /usr/bin/lsof +c 15 -nXd DEL
[12:13:39] that's one of the things that gets stuck with nfs hiccups
[12:17:50] hmm, sgeexec-10-17 has some stuff going on too
[12:19:58] seems to have started ~2 days ago, I think that that lsof is what makes the process count increase, once something gets stuck, any lsof will get stuck too, and as it runs periodically, it gets more and more stuck lsofs
[12:20:45] !log rebooting sgeexec-10-17
[12:20:47] dcaro: Unknown project "rebooting"
[12:21:00] !log tools rebooting sgeexec-10-17
[12:21:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:30:09] !log tools.bridgebot restart bnc pod to get tool to reconnect
[12:30:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[12:34:49] !log tools.bridgebot Restart because of duplicate messages
[12:34:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[14:01:31] !log tools deploy jobs-cli v15 T348250
[14:01:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:01:35] T348250: jobs: Add option to disable NFS mounts - https://phabricator.wikimedia.org/T348250
[15:07:50] !log tools reboot tools-k8s-worker-70
[15:07:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
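Note on the 12:13 to 12:19 exchange: the stuck lsof processes show state D (uninterruptible sleep) in the pasted ps line, and the theory is that they pile up because lsof runs periodically and each new run blocks on the same hung NFS mount. A rough sketch of how one might count such processes from `ps aux` output follows. It assumes the standard 11-column `ps aux` layout seen in the pasted line; the function names (`stuck_processes`, `count_by_executable`, `run_ps_aux`) are made up for illustration and are not part of any Toolforge tooling.

```python
import subprocess
from collections import Counter


def stuck_processes(ps_aux_output):
    """Return (pid, command) pairs for processes whose STAT starts with 'D'.

    Assumes `ps aux` columns:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    stuck = []
    for line in ps_aux_output.splitlines():
        # Split into at most 11 fields so the command keeps its arguments.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or malformed row
        if fields[7].startswith("D"):
            stuck.append((int(fields[1]), fields[10]))
    return stuck


def count_by_executable(stuck):
    """Group stuck processes by the first word of their command line."""
    return Counter(command.split()[0] for _pid, command in stuck)


def run_ps_aux():
    """Capture live `ps aux` output (hypothetical helper, needs procps)."""
    result = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a host, `count_by_executable(stuck_processes(run_ps_aux()))` would give a per-executable count. A count for /usr/bin/lsof that keeps growing between runs would be consistent with the pile-up described above, though it does not by itself prove NFS is the cause.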