[14:48:10] o/ i'm not able to create web proxies on Horizon right now (they fail with: "Danger: There was an error submitting the form. Please try again.") and when I create instances, they seem fine but I can't reach them via ssh either. anyone else having issues? i've tried on two different projects and with chrome and firefox with no success.
[14:51:08] andrewbogott: ^ mind taking a look at the horizon issue?
[14:51:50] isaacj_: I can look in a few minutes, ping me again if I don't get back to you
[14:52:03] :thumbs up: thanks andrewbogott !
[14:52:12] isaacj_: about the unreachable instances, is there anything in the "log" tab of the instance?
[14:54:00] majavah: not an expert but nothing looked amiss to me after creating it. i tried hard rebooting a few times too just in case. i'll copy the last 20 or so lines of the current log below:
[14:54:13] https://www.irccloud.com/pastebin/xLAObQjW/
[14:54:39] isaacj_: let's start with one issue at a time; I'm guessing that the unreachable issue is the main thing? Could easily be a firewall or a proxyjump thing.
[14:55:40] isaacj_: what project is this?
[14:56:07] works for me. i'll be honest that the unreachable piece happens like 25% of the time with new instances but usually resolves itself within an hour or so (i actually created the instance last night so it's been unreachable for a while now). it's on `recommendation-api` and the instance is `wiki-gender`
[14:56:28] i can reach other instances on that project FYI
[15:00:15] isaacj_: something is messed up with cloud dns record creation. That's most likely causing both problems and is not you doing anything wrong.
[15:00:21] I'm going to restart some things and we'll see what we get
[15:00:41] yay (sorry)!
thanks -- no strong urgency for this but much appreciated
[15:04:38] !log admin restarting designate-sink in eqiad1; it's complaining about rabbit but I don't want to restart rabbit yet
[15:04:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[15:06:35] hi! It seems like new instances are not getting their IP in DNS, at least they aren't for the traffic project?
[15:07:11] ema: known issue
[15:07:18] ema: I'm looking at that, not sure what the problem is yet
[15:07:19] ema: a couple minutes ago Andrew was talking about DNS issues and restarting things
[15:08:03] thanks!
[15:15:19] !log admin restarting all designate services in eqiad1
[15:15:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[15:15:32] isaacj_: if you delete the disagreeable instances and recreate I think things will work now
[15:15:37] ema: same :)
[15:16:09] trying out now
[15:17:21] isaacj_: I'm also interested in whether the web proxy thing got fixed as well... I'm still doing VM tests so haven't gotten there yet
[15:17:38] recreating now -- will test web proxy too!
[15:18:00] andrewbogott: I've deleted cptext.traffic.eqiad1.wikimedia.cloud (old IP: 172.16.3.228) and re-created it (new IP: 172.16.0.145). DNS still returns the old one
[15:18:02] andrewbogott: yep, web proxy successfully created
[15:18:57] ema: wait, I thought the point was that the old one didn't get a dns record?
[15:19:14] andrewbogott: the old one did get a DNS record yesterday
[15:20:20] ok, sorry, I misunderstood the issue. Can you delete any VMs you have using that name, and tell me when they're deleted, and then wait for my say-so before recreating?
[15:20:29] for sure!
[15:20:51] andrewbogott: cptext instance deleted
[15:21:16] ema: ok, I'm doing a scan for leaked records, will see what it says
[15:21:16] andrewbogott: i can also ssh into the new instance (i'm using a different instance name now so not sure if that saved me some headache). thanks!
i do get this message when I ssh in but not sure if that's cause for concern: `Puppet does not seem to have run in this machine. Unable to find '/var/lib/puppet/state/last_run_report.yaml'.`
[15:22:27] isaacj_: if you run puppet now does it work? "sudo run-puppet-agent"
[15:23:22] it asks for my sudo password, which I don't think I've seen before when running other sudo commands on instances
[15:24:09] andrewbogott: ok, thanks. The other instance I'd like to use is cpupload.traffic.eqiad1.wikimedia.cloud
[15:24:37] ema: yep, I see leaks for both of those that showed up while things were broken. Will take a few minutes to clean up
[15:24:54] nice
[15:26:12] it would be nice to monitor this too, as in something should have been red in icinga :)
[15:32:07] andrewbogott: ok now both entries are gone from DNS
[15:32:23] both cptext and cpupload that is
[15:32:29] ema: yeah, I have a note about why our monitoring tools didn't catch this
[15:32:38] It must use IPs for testing VM availability
[15:32:45] Or else it isn't reporting properly
[15:32:55] Anyway, I think you're good to go now
[15:33:01] cool, trying!
[15:34:16] andrewbogott: cpupload got its new IP in DNS as expected, ty!
[15:34:28] great, sorry for the breakgate
[15:34:33] *breakage
[15:34:58] np, thanks for fixing it
[15:51:39] bd808, pong. The wm-bb is out
[15:58:16] !log tools.bridgebot Restarting to reconnect irc client
[15:58:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[15:59:05] Globgor: thanks for the poke
[18:39:28] The ‘webservice shell’ command places you inside a container running on the Toolforge Kubernetes cluster. This container does not include the ‘kubectl’ command which is generally used from a bastion host to talk to the Kubernetes cluster. You should be able to just use commands like ‘python3 -m venv web/python/venv’ to setup the venv for your tool.
(re @peacearth: Hi,
[18:39:28] I am trying to run python scripts with Kubernetes on Toolforge.
[18:39:29] Since I need to install some packages with pip, I read
[18:39:31] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Python#Virtual_environments
[18:39:32] and tried to run
[18:39:34] ‘webservice --backend=kubernetes python3.7 shell’
[18:39:35] on Toolforge, and then I tried the command ‘kubectl’, but it says that
[18:39:37] ‘bash: kubectl: command not found’
[18:39:38] Does anyone know how to solve this problem? Thanks!)
[18:42:37] Wow. Bridgebot made that ugly on the irc side.
[19:16:18] Oh, I see. Thanks for answering!
[19:16:19] I tried kubectl because after I run the webservice shell command, it says that
[19:16:20] Defaulting container name to interactive.
[19:16:22] Use 'kubectl describe pod/interactive -n tool-xxxxx' to see all of the containers in this pod.
[19:16:23] If you don't see a command prompt, try pressing enter.
[19:16:25] So I thought that kubectl should work there, but it seems not. (re @bd808: The ‘webservice shell’ command places you inside a container running on the Toolforge Kubernetes cluster. This container does not include the ‘kubectl’ command which is generally used from a bastion host to talk to the Kubernetes cluster. You should be able to just use commands like ‘python3 -m venv web/python/venv’ to setup the venv for your
[19:18:00] @peacearth: I agree that message is confusing. Unfortunately it is generated by Kubernetes and is not something that we can change or remove at this time.
[21:33:19] I see, thank you.
[22:31:02] bd808, one question. Sometimes the messages on irc/telegram are not encoded with utf-8
[22:31:28] ex: pensé > pensÃ©
[22:31:48] is there any way to view them correctly or not?
[22:31:52] (with wm-bb)
[22:36:19] Globgor: I looked into that a bit once. I think it happens because of messages sent by clients.
The matterbridge software seems to treat all the messages as raw bytes, so the problems happen when someone is not sending utf-8 to start with.
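The `pensé > pensÃ©` garbling quoted above is the pattern you get when UTF-8 bytes are decoded with a single-byte encoding. A minimal Python sketch of that round trip, assuming the wrong decoder is Latin-1 (the log does not say which encoding the clients or matterbridge actually use, and the helper names `garble` and `repair` are made up for illustration):

```python
# Sketch only: Latin-1 as the wrong decoder is an assumption, not something
# confirmed in the log. garble() and repair() are hypothetical helper names.


def garble(text: str) -> str:
    """Encode as UTF-8, then mis-decode the resulting bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo garble(); return the input unchanged if it was not garbled this way."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    broken = garble("pensé")
    print(broken)          # pensÃ©
    print(repair(broken))  # pensé
```

This kind of repair only works while the original bytes survive intact; if a client has already replaced bytes it could not decode, the original text cannot be recovered this way.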