[09:30:26] Hi, one of our cloud vps instances is complaining about failed puppet runs for three days now. Looking into it more, it seems `http://apt.wikimedia.org/wikimedia` is lacking some python3 packages. Any advice on how to go about debugging this further?
[09:36:53] MichaelG_WMDE: which VM, which packages? did you try an `apt-get update` + puppet re-run already?
[09:39:23] @majavah the wikibase registry vm. I didn't try that yet. I'd assume puppet would do that as part of its usual process?
[09:39:35] But I'll try it now, since it should be harmless in itself
[09:41:58] Mh, it complains about an invalid signature of the repo mentioned above. That is probably the issue
[09:42:16] (Though I wonder a bit why puppet didn't fail at that point already)
[09:43:18] I guess the problem is that it still tries to talk to the jessie part of that repo
[09:43:29] * MichaelG_WMDE looks to find where all this is configured
[09:45:29] can you paste apt's output somewhere so I can have a look?
[09:46:21] but yes, jessie sounds.. very wrong
[09:50:20] it seems to be a very old vm - probably it should be migrated to something newer or some place else
[09:50:44] I'll make a phabricator paste
[09:53:16] majavah: There you go: https://phabricator.wikimedia.org/P16891
[10:34:31] rip wikipedia
[10:34:40] "upstream connect error or disconnect/reset before headers. reset reason: overflow"
[10:35:06] looks like a known issue, per -operations
[10:35:37] @maj
[10:35:40] majavah:
[10:36:02] majavah: what's the full name of that channel? I'd normally check on metawiki, but... y'know
[10:36:05] enterprisey: #wikimedia-operations
[10:36:24] enterprisey: mostly full of automated monitoring bots at the moment noticing every individual app server is having issues
[10:36:32] I see, thanks
[12:12:29] hi, is there a way to unlink a wikimedia (SUL) account from my developer account?
[12:13:44] Misza: unlink on where? there isn't a canonical mapping of sul<->developeraccount, but various services (phabricator, toolsadmin) might have their own ones
[12:15:30] majavah: in the toolforge admin console, I accidentally linked it to a wrong dev account
[12:19:26] Misza: I don't think that's possible to self-service, but we can probably do it manually if you open a phabricator task
[12:30:55] okay, my main problem is: I can't recover my account (that I created a long time ago), wikitech wiki says it's not registered on the User: page but account creation page says the same name is in use. I'm not getting a password reset link to either of the 2 emails I might've used. Is this somehow salvageable?
[12:32:07] (I registered another account to try and see if the one I'm recovering has an email set and accidentally linked the SUL, but let's ignore it for now)
[12:32:34] which account? this one https://ldap.toolforge.org/user/misza?
[12:35:09] majavah: ok thanks, got the email now, wasn't aware of this lookup page
[12:35:28] I actually tried wrong username :)
[12:40:41] majavah: where in phab should I create a ticket for unlinking SUL account? as "generic task" or in some specific project?
[12:42:49] Misza: tag it with #wmcs-kanban and #striker please
[12:44:01] I could probably do it myself, but I'm not comfortable with messing with the production system
[12:46:16] T287369, thank you for all the help
[12:46:17] T287369: Unlink SUL account from dev account - https://phabricator.wikimedia.org/T287369
[12:46:25] good bot :)
[13:31:27] !log cloudvirt-canary disabled diamond on the machines (T287351)
[13:31:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Cloudvirt-canary/SAL
[13:31:31] T287351: [Cloud VPS alert][cloudvirt-canary] Puppet failure on canary1018-01.cloudvirt-canary.eqiad1.wikimedia.cloud (172.16.1.236) - https://phabricator.wikimedia.org/T287351
[13:31:35] !log cloudvirt-canary disabled diamond on the machines (T287350)
[13:31:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Cloudvirt-canary/SAL
[13:31:37] T287350: [Cloud VPS alert][cloudvirt-canary] Puppet failure on canary1044-01.cloudvirt-canary.eqiad1.wikimedia.cloud (172.16.3.177) - https://phabricator.wikimedia.org/T287350
[14:29:25] anyone have access to huggle to restart xmlrcs??
[14:29:43] it is hanging and stopping wm-bot from taking the feed
[14:29:59] Huggle is a desktop app?
[14:31:14] https://openstack-browser.toolforge.org/project/huggle
[14:32:17] andre_, ^^ I have pinged addshore, however, presumably off doing something else
[14:32:29] *has a look*
[14:32:49] thanks!
[14:33:22] * addshore reads https://wikitech.wikimedia.org/wiki/XmlRcs?wprov=srpw1_0
[14:38:18] sDrewthedoff: working now?
[14:38:55] !log huggle restarted xmlrcs.huggle.eqiad1.wikimedia.cloud
[14:38:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Huggle/SAL
[14:39:43] !log huggle waited for redis, then started xmlrcsd and es2r as xmlrcs user per https://wikitech.wikimedia.org/wiki/XmlRcs#Maintainer_info
[14:39:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Huggle/SAL
[14:46:33] MacFan4000: ^
[14:46:58] RhinosF1: I know
[14:47:07] I tested after that was done
[16:37:23] What's the maximum RAM I can have on a single Cloud VPS instance?
[16:37:44] !log tools removing tools-k8s-ingress-4 from active ingress nodes at the proxy T280340
[16:37:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:37:48] T280340: Upgrade Toolforge Kubernetes to latest 1.19 - https://phabricator.wikimedia.org/T280340
[16:41:07] ...It's looking like whatever number you have for me, it won't be big enough 😄
[16:41:11] harej, I think there's a default flavor with 36Gb (which is surely a typo and I meant to make it 32, but here we are)
[16:41:22] what are you trying to do?
[16:41:41] If you need a super-special flavor with a different allocation you can open a quota ticket and we'll consider it but RAM is hard to overprovision so not as abundant as e.g. cores
[16:41:45] Very large databases
[16:42:26] According to docker stats this database is currently using over 80 GB of RAM
[16:43:25] I am not sure Cloud VPS is a good home for this database.
[16:43:43] The impact would have to be proportionate to the resource usage and I am not confident that is the case.
[16:45:26] just because it's using 80 GB of ram doesn't necessarily mean it "needs" it to run it
[16:47:22] yeah, it might be greedy and just caching as much as the OS lets it
[16:48:33] That's fair!
[16:48:48] My workstation definitely has the memory to spare.
[17:37:54] !log tools repooled the whole set of ingress workers after upgrades T280340
[17:38:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:38:00] T280340: Upgrade Toolforge Kubernetes to latest 1.19 - https://phabricator.wikimedia.org/T280340
[18:49:43] !log tools.notwikilambda installed MinervaNeue and MobileFrontend a few hours ago, forgot to log it earlier (T287401)
[18:49:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.notwikilambda/SAL
[22:04:11] I'm running the parliamentdiagram script on Toolforge, and it's suddenly broken.
[22:04:37] From my errors.log file: 2021-07-26 22:00:40: (mod_cgi.c.748) stat for cgi-handler /usr/bin/python failed: No such file or directory
[22:05:48] But calling "file /usr/bin/python" yields: /usr/bin/python: symbolic link to python2.7
[22:10:15] And running python from the shell doesn't cause any problem.
[22:19:15] bstorm: ^ fallout from the upgrade?
[22:19:49] doesn't sound like it
[22:19:54] It started about 8 hours ago
[22:19:57] We didn't upgrade any libraries or images
[22:20:01] looking back through the logs
[22:20:09] that's longer ago than the upgrade
[22:20:29] That sounds like some kind of issue with a virtualenv
[22:20:47] How can I debug that?
[22:21:00] I'm not using a virtualenv.
[22:21:09] At least, not directly myself.
[22:21:28] You'd need to be in kubernetes. Are you using Kubernetes or Grid?
[22:21:32] Kubernetes
[22:21:34] Ok
[22:21:40] Which tool?
[22:22:08] parliamentdiagram
[22:22:27] When you say "running python from shell" do you mean inside `webservice shell` or just on the bastion command line?
[22:22:38] Just on the bastion command line
[22:23:04] Ok, services that are running in Kubernetes will be using the python version in their image, not what the bastion has.
[22:23:26] Usually, you run `webservice shell` with the correct image to get the right version. Let me take a look at what you've got
[22:24:05] Your kubernetes image is docker-registry.tools.wmflabs.org/toolforge-php73-sssd-web:latest
[22:24:09] That's not going to have much of python in it
[22:24:27] It may have python2 because the OS has it
[22:24:46] What should I do about that?
[22:24:58] I'm not sure why your code cares about python?
[22:25:07] My code runs a python CGI script
[22:25:13] Ah ok
[22:25:43] Which was running fine until a few hours ago :-/
[22:26:39] At around 1600 UTC it would have been restarted
[22:26:50] That fits exactly with the start of the problem
[22:26:51] Everything was
[22:27:35] Overall, you are running in the same environment you were before, though
[22:27:45] Unless....hmm.
[22:27:48] I have a thought.
[22:27:59] One thing that was changed is that the image would be refreshed
[22:28:22] And there was a new version of webservice rolled out ages ago that you'd just now be seeing
[22:28:34] Since unless you restart, you won't see the changes
[22:29:15] OK, but I still don't understand why it can't find /usr/bin/python
[22:29:41] The image for the container is like the OS this runs inside
[22:29:46] (because I'm completely clueless about webservice and kubernetes, although I use them)
[22:29:55] So if that was updated but nobody tested it yet for your particular use case
[22:29:57] OK.
[22:30:04] then you might have found something nobody else saw :)
[22:30:10] And testing it, I see the issue
[22:30:15] I feel so special :-D
[22:30:18] lol
[22:30:43] Since you are using a php container to run python, it's ended up with a funny problem. Webservice is now a python3 system
[22:30:51] it doesn't seem to have python2 installed at all
[22:31:03] It only has python 3
[22:31:32] And is that somewhere else?
[22:31:36] python 2 is end-of-life, so the best thing you could do is try to run on python3
[22:31:37] yeah
[22:31:45] `/usr/bin/python3`
[22:31:53] On debian anyway
[22:31:55] My script starts with "#!/usr/bin/python"
[22:31:57] Ah, OK!
[22:32:03] So maybe I can just change that line?
[22:32:13] Possibly...you might need to make some code changes
[22:32:20] OK, will start checking now.
[22:32:21] Thank you!
[22:32:40] I didn't expect it to not have python2 installed at all, so you've found something that other people are likely to run int
[22:32:43] *into
[22:32:54] I'll make a ticket for that in case it is causing more trouble than it seems
[22:33:03] It is really a good idea to switch to python3 if you can, though
[22:33:16] Of course!
[22:33:38] I just didn't realise that I was implicitly calling python2 by just calling python.
[22:35:31] I've changed the first line to #!/usr/bin/python3 and it's still giving the same error.
[22:37:55] It may not be using that line to decide what to call
[22:38:10] One moment, I'll finish the ticket and see if I can help a bit more
[22:38:34] Thanks!
[22:38:56] T287421
[22:38:57] T287421: Latest Toolforge docker images don't always have python2 installed at all - https://phabricator.wikimedia.org/T287421
[22:40:13] I found `public_html/westminster.py:#!/usr/bin/python`
[22:41:09] Sure, but that's an old, disused script.
[22:41:24] Eh, not quite, but it's not the one being called here.
[22:41:58] The page in question is parlitest.php
[22:42:02] ok :)
[22:42:19] Which uses js/main.js
[22:42:25] Which calls like this:
[22:43:29] $.ajax({
[22:43:30] type: "POST",
[22:43:30] url: "newarch.py", data: {data: JSON.stringify(requestJSON)}
[22:43:46] I see.
[22:43:58] Mind if I restart it? It may not have reread the file when you changed it
[22:44:02] Sure!
[22:44:44] !log tools.parliamentdiagram deleting pod to restart tool
[22:44:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.parliamentdiagram/SAL
[22:45:10] Mind you, if that works to get it to use python3, we may have a new set of errors...but that would be progress
[22:45:24] can you try it?
[22:46:00] Still giving the same problem, let me check the error log to see whether it's the same error.
[22:46:26] Same error.
[22:46:32] After the message about restart.
[22:47:18] hrm. Why is the cgi-handler doing that...
[22:49:28] * bstorm goes looking for lighttpd configs or something
[22:49:49] :-D
[22:51:47] odd. there's a uwsgi.ini file in here.
[22:52:20] I don't remember ever having used uWSGI
[22:53:47] I have no idea how that got there :-D
[22:56:33] Ah, so it seems lighttpd does some matching based on file extension and from there determines what runtime to use. In this case /usr/bin/python. That's configurable somehow iirc
[22:57:38] Oh, that would fit the problem.
[22:59:07] Looks like we can force lighttpd to do things with a `~/.lighttpd.conf` file. That's not really my usual thing, so I'm looking at https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web/Lighttpd#Example_configurations
[22:59:59] I'm impressed so far.
[23:01:06] Looks like it'd be the mimetype.assign
[23:01:26] mmmmaaaybe
[23:01:51] I'll try anything!
[23:02:44] trying something
[23:02:48] restarting the pod
[23:04:00] !log tools.parliamentdiagram deleting pod to restart tool after adding an attempted cgi config to $HOME/.lighttpd.conf T287421
[23:04:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.parliamentdiagram/SAL
[23:04:06] T287421: Latest Toolforge docker images don't always have python2 installed at all - https://phabricator.wikimedia.org/T287421
[23:04:41] not looking good yet :)
[23:05:22] Now it's thoroughly broken :-D
[23:05:31] At least I won't get users reporting subtle bugs.
[23:06:19] lol
[23:07:20] Huh...
[23:07:23] It sort of worked
[23:07:26] https://www.irccloud.com/pastebin/QE9dPnUM/
[23:07:35] That's the k8s log output
[23:07:38] I've removed my config
[23:07:56] It seems to me that it just failed to run the python
[23:08:32] restarting it without my config
[23:08:42] My main script is still throwing a 503 error, though.
[23:09:00] not now :)
[23:09:04] Took it a second
[23:09:23] True, now the old bug is back :-D
[23:09:54] When I added that config, it did try python3, but it failed pretty catastrophically
[23:09:57] https://www.irccloud.com/pastebin/bPGf3hO6/
[23:10:04] Ah, I see.
[23:10:20] Hrm, maybe not
[23:10:27] It's possible that my config was just garbage :)
[23:10:33] That would also produce that!
[23:10:41] Since python3 launches lighttpd
[23:11:10] Right, and I see that it seems to have died in the attempt of launching lighttpd
[23:11:26] So there's a short path to getting you up and running. I could add python2 to that particular image
[23:11:51] OK, as a temporary fix.
[23:12:02] It doesn't seem like the right solution entirely, but it's also the solution I'm familiar with. Clearly, fixing lighttpd is not my forte
[23:12:09] One moment
[23:14:45] running a test build locally
[23:18:51] OK. It's now 01:15 here and I have to get up at 06:00 to fly to Spain for work. Can you mail me at davidrichfield@gmail.com or post an update at https://github.com/slashme/parliamentdiagram/issues/109 when you figure out what's going pear-shaped?
[23:18:58] 👋🏻
[23:19:09] I can email ya
[23:19:20] My hero, thanks!
[23:19:28] Also, I'll keep that ticket up to date at T287421
[23:19:28] T287421: Latest Toolforge docker images don't always have python2 installed at all - https://phabricator.wikimedia.org/T287421
[23:19:47] Excellent. You are appreciated!
[23:20:15] np, sorry for the surprise breakage
[23:28:51] bleh, this is going to bloat the containers further, if they have to contain python2 and 3 :(
[23:29:25] Agreed. It's not as bad as the fact that they have emacs, though :)
[23:30:26] Previously, only the python3 containers needed both python2 and 3
[23:31:10] yeahhh :|
[23:31:16] not sure if you had a chance to see my comment at https://phabricator.wikimedia.org/T286784#7231357
[23:31:25] legoktm: if you come up with ways to help people transition to python3 instead, I'm more than happy to go in that direction. The biggest one would be finding out if lighttpd can use python3 for cgis. I've never used it that way, myself.
[23:32:00] yeah, also that :)
[23:32:01] oh, I think readding python2 is the correct but unfortunate decision
[23:32:39] and maybe have a clean break whenever we add bullseye-based images (or the glorious buildpack future)
[23:32:49] I'm really hoping to spend more time getting the infrastructure capable of buildpacks.
[23:32:57] <3
[23:33:11] I could honestly set it up fairly quickly if not for the changing requirements around gitlab
[23:33:23] But..we'll see
[23:33:39] I also lost a week to other things :)
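(Note on the python3 CGI question above: the log shows the page POSTing a form field named `data` holding a JSON string to `newarch.py`, and a suggestion to move the script to `/usr/bin/python3`. A minimal sketch of what a Python 3 CGI handler for that kind of request could look like follows. It is not the real `newarch.py`; the function names and the `received_keys` response field are hypothetical, and it assumes the web server actually starts `.py` files with python3, which the shebang alone did not achieve in the log.)

```python
#!/usr/bin/python3
"""Hypothetical Python 3 CGI sketch; not the real newarch.py."""
import json
import os
import sys
from urllib.parse import parse_qs


def parse_request(body: str) -> dict:
    """Decode a form-encoded POST body and return the JSON held in its `data` field."""
    fields = parse_qs(body)
    # An absent `data` field is treated as an empty JSON object (assumption).
    raw = fields.get("data", ["{}"])[0]
    return json.loads(raw)


def build_response(payload: dict) -> str:
    """Build a CGI response: header, blank line, then a JSON body."""
    # `received_keys` is an illustrative field, not part of any real API.
    body = json.dumps({"received_keys": sorted(payload)})
    return "Content-Type: application/json\r\n\r\n" + body


def main() -> None:
    # CGI passes the body size in CONTENT_LENGTH and the body itself on stdin.
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    body = sys.stdin.buffer.read(length).decode("utf-8") if length else ""
    sys.stdout.write(build_response(parse_request(body)))


if __name__ == "__main__":
    main()
```

(The sketch uses `urllib.parse.parse_qs` rather than the older `cgi` module, which is one of the code changes a python2-era CGI script may need when ported.)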