[04:40:53] is toolforge down? i'm suddenly getting auth errors `derenrich@login.toolforge.org: Permission denied (publickey,hostbased).`
[04:43:34] and now it's working again? weird
[08:51:23] derenrich: we had some networking issues in the past few hours. currently investigating the root cause.
[12:18:48] !log admin [codfw1dev] restart rabbitmq-server.service on all 3 cloudcontrols, all nova-compute agents are down complaining about rabbitmq being unreachable
[12:18:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:50:24] is something wrong with toolforge? i took my instance down and now it refuses to come back up with basically no changes on my side. i've tried rebuilding the image multiple times and twiddling requested resources. i'm not seeing any obvious issues https://grafana.wmcloud.org/d/TJuKfnt4z/kubernetes-namespace?orgId=1&var-cluster=prometheus-tools&var-namespace=tool-video-answer-tool&from=now-1h&to=now&refresh=10s
[17:58:36] !log bd808@tools-bastion-12 tools.video-answer-tool webservice stop; webservice buildservice start --mount=all -c=2 -m=2Gi
[17:58:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.video-answer-tool/SAL
[17:59:40] derenrich: I think I got it to start. I'm not sure why, but I think `webservice` got confused about the state of the system in your namespace
[18:01:27] bd808: odd. i was trying to increase the resources allocated. guess i'll try again
[18:04:48] I saw no pods running, so I tried the webservice command from your relaunch.sh. That acted like it was starting a pod, but when it finished there was still nothing in the namespace. Then I tried just removing the `toolforge` from the start of the command and got the same "looks like it is starting but did not" behavior. My 3rd attempt was what I !log'ged above with the explicit stop and then a start.
[18:06:04] Almost any time `webservice` is acting weird that hard stop + cold start pattern will work.
[18:12:04] ok so i did a hard stop followed by `webservice buildservice start --mount=all -c=4 -m=4Gi` and it's just hanging without making a pod. i'm guessing this is a quota issue but i would've expected an error and i would've thought this was within quota
[18:13:47] ah apparently the cpu limit is 3
[23:36:21] !log bd808@tools-bastion-12 tools.k8s-status Update to 2d7e972 and restart (T342848)
[23:36:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.k8s-status/SAL
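
Note on the 17:58 to 18:13 exchange: the hard stop + cold start pattern, together with the CPU limit discovered at 18:13, could be wrapped roughly as below. This is only a sketch. `cold_restart`, `CPU_LIMIT`, and `DRY_RUN` are hypothetical names, not part of the `webservice` CLI; the two `webservice` commands are copied from the 17:58:36 !log entry; the limit of 3 is what the log reports for this one tool and may differ elsewhere; and the guard assumes a whole-number CPU request.

```shell
# Hypothetical helper, not part of the `webservice` CLI.
# CPU_LIMIT=3 is the limit reported in the log for this tool (assumption:
# it may differ in other namespaces).
CPU_LIMIT="${CPU_LIMIT:-3}"

cold_restart() {
  cpu="$1"   # whole number of CPUs, e.g. 2
  mem="$2"   # memory request, e.g. 2Gi

  # Refuse an over-limit request up front; in the log, such a request
  # hung without creating a pod and without printing an error.
  if [ "$cpu" -gt "$CPU_LIMIT" ]; then
    echo "requested cpu $cpu exceeds limit $CPU_LIMIT" >&2
    return 1
  fi

  # DRY_RUN defaults to 1, so the commands are only printed, not run.
  if [ "${DRY_RUN:-1}" = "1" ]; then run="echo"; else run=""; fi

  # Explicit stop first, then a fresh start (commands from the 17:58:36 entry).
  $run webservice stop
  $run webservice buildservice start --mount=all -c="$cpu" -m="$mem"
}
```

Calling `cold_restart 2 2Gi` prints the two commands; setting `DRY_RUN=0` on a bastion would actually run them. `cold_restart 4 4Gi` returns non-zero immediately rather than hanging.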