[00:00:51] I'm trying to build a python venv for my new tool account (dyk-tools).
[00:01:01] I'm getting:
[00:01:13] The virtual environment was not created successfully because ensurepip is not
[00:01:13] available. On Debian/Ubuntu systems, you need to install the python3-venv
[00:01:13] package using the following command.
[00:01:13] apt-get install python3-venv
[00:01:13] You may need to use sudo with that command. After installing the python3-venv
[00:01:13] package, recreate your virtual environment.
[00:01:14] Failing command: ['/data/project/dyk-tools/www/python/venv/bin/python3', '-Im', 'ensurepip', '--upgrade', '--default-pip']
[00:05:59] any ideas what's wrong?
[00:09:05] hmm, I might have just had the wrong container type
[00:09:08] nevermind :-)
[00:09:10] roy649: are you creating the venv in a `webservice shell`?
[00:09:17] heh
[00:09:31] Yeah, I did "webservice --backend=kubernetes shell"
[00:09:42] I thought it would default to python3.7, but apparently not
[00:09:54] when I explicitly ask for python3.7, it works :-)
[00:10:30] if you don't have any files telling webservice the default type, I would guess it probably falls back to php7.4 or something like that
[00:11:01] (or one of the other types that serves public_html by default)
[00:11:12] ah
[00:11:15] good that it works with the explicit type 👍
[00:11:19] I see my other tool has a service.manifest.
[00:11:24] I guess that's what I'm missing.
[00:11:31] yeah, sounds like it
[00:13:34] though I think .manifest is the file you're not supposed to edit manually, and service.template is the one intended for users
[00:13:55] yeah, I do see it's got a comment telling you not to edit it.
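Editor's sketch, not part of the conversation: the failure above was traced to the shell running in the wrong container type, one whose Python cannot run ensurepip. A minimal pre-flight check in Python might look like the following. It assumes the problem shows up as an unimportable `ensurepip` module, which may not hold on every image; the function names are hypothetical.

```python
import importlib.util
import sys
import venv
from pathlib import Path


def ensurepip_available() -> bool:
    """Return True if this interpreter can locate the ensurepip module."""
    return importlib.util.find_spec("ensurepip") is not None


def create_venv(target: Path) -> bool:
    """Create a venv with pip at `target`; return False if ensurepip is missing."""
    if not ensurepip_available():
        # Assumption: a missing ensurepip here means the shell was started
        # in a container type that is not meant for Python tools.
        print(
            f"{sys.executable} has no ensurepip; check the container type",
            file=sys.stderr,
        )
        return False
    venv.EnvBuilder(with_pip=True).create(target)
    return True
```

Calling `create_venv(Path("www/python/venv"))` from inside the shell would then fail early with a hint about the container type instead of the generic Debian message.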
[00:14:03] (can be in the home dir, or also in www/python/src/ so you can check it into source control)
[01:22:37] !log admin restarting designate-sink on cloudservices100[45], possible example of T316614
[01:22:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[01:22:42] T316614: designate-sink lock ups (was: cloudcontrol1005/nova instance creation test is CRITICAL) - https://phabricator.wikimedia.org/T316614
[05:58:20] !log paws Upgrade IRkernel fd62fefdcd42ced29f30be56f82f14ed765fe780 T318275
[05:58:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[05:58:23] T318275: upgrade IRkernel - https://phabricator.wikimedia.org/T318275
[11:35:36] !log tools aborrero@tools-k8s-control-1:~$ sudo -i kubectl -n jobs-emailer rollout restart deployment/jobs-emailer (T317998)
[11:35:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:35:39] T317998: toolforge-jobs emails not working - https://phabricator.wikimedia.org/T317998
[12:13:00] !log paws T318279 Tidy R config files c220d165ea779946fa7a3ff98cf806f023e46109
[12:13:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[12:13:03] T318279: cleanup R files - https://phabricator.wikimedia.org/T318279
[14:01:03] !log admin test
[14:01:49] !log admin test2
[14:01:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:02:14] not sure what happened with the bots
[14:02:23] checked the network, but it seems just fine
[14:02:49] it seems they are back online and working 🤷
[14:43:36] https://phabricator.wikimedia.org/T319155
[14:43:36] Hello, I actually cannot agree with this declination. Please let me explain
[14:43:40] Probably I didn't use the correct term here - my tool checks the availability of w.wiki by examining whether the browser can receive a request from the remote domain, no matter what the response is and how it is served.
That is, it counts as a "failure" only if you get network errors when connecting to w.wiki (this happens when connections to w.wiki are censored or interfered with, detecting which is one of my tool's objectives), while
[14:43:40] other states including 404 are considered successful
[16:24:11] !log tools.stashbot restarted
[16:24:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stashbot/SAL
[16:32:28] does someone know why bots are getting down? wmopbot logs don't show any clue about what is happening
[16:34:39] Stewardsbot keeps dipping too
[16:36:44] wm-bot is not getting down like other bots, probably the issue is related to something wm-bot does not use
[16:37:39] Ouch I hope I have not destroyed the Striker internal consistency, when I tried to create a Diffusion repo, but it created a GitLab repo, and then I tried to fix it by deleting the GitLab repo
[16:37:41] https://phabricator.wikimedia.org/T320438
[16:38:26] FWIW, no one else can view your attachment images
[16:38:37] :O
[16:38:45] It's phab weirdness from a little while ago
[16:38:56] If you click through, click edit, and make them viewable to anyone, should fix it
[16:39:11] !log tools.stewardbots restart StewardBot
[16:39:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[16:39:29] Yep yep, try now
[16:39:39] In my private Phabricator it does not happen
[16:40:10] hmm.. wm-bot is on its own vps project I think, so this sounds like a toolforge specific issue?
[16:40:35] looking but no promises yet
[16:40:35] I'm guessing that bug is just an i18n message that wasn't updated properly
[16:40:44] Probably
[16:40:46] could be, stewardsbot is on tools
[16:47:49] !log tools clean up labstore1006/7 mounts from k8s control nodes T320425
[16:48:14] let's see if that helps.
in theory it should stabilize the k8s network components
[16:51:48] !log tools clean up labstore1006/7 mounts from k8s control nodes T320425
[16:51:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[16:51:51] T320425: [cloudvps] Find and cleanup any mounts to labstore1006/1007 - https://phabricator.wikimedia.org/T320425
[16:52:01] ping me if you see any more disconnections later today please?
[16:54:57] danilo: is wmopbot manually down or is it still having troubles?
[17:01:58] !log paws Bump pywikibot to 7.7.1 24d0e2223175024bd11465decbf0a1087c1cc2c3 T320432
[17:02:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[17:02:01] T320432: New upstream stable release for Pywikibot 7.7.1 - https://phabricator.wikimedia.org/T320432
[17:19:24] taavi: wmopbot auto restarted, thank you
[17:57:19] taavi:^ seems like disconnections are still happening
[18:37:39] !log tools.stewardbots ./stewardbots/SULWatcher/manage.sh restart # All bots disconnected
[18:37:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[18:57:18] I am not finding anything in wmopbot logs that explains why it is getting down, I only found in "kubectl describe pod " Container: Last State: Terminated (Reason: Error; Exit code: 1), so it is not being killed, it seems something makes the process hang until it is disconnected by IRC ping timeout
[19:02:16] sigh. looks like NFS being annoying as usual.
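Editor's sketch, not part of the conversation: the symptom described above is a bot that hangs silently until the IRC server drops it for a ping timeout. One common mitigation is for the bot to track when it last heard from the server and exit on its own once that gap grows too large, so that whatever supervises it can restart it. The class below is a hypothetical illustration, not code from wmopbot or any other bot mentioned here. It only helps if the check runs somewhere that is not itself blocked, for example a separate thread or an external probe.

```python
import time


class IdleWatchdog:
    """Track when the server was last heard from and flag a stalled connection."""

    def __init__(self, timeout_seconds: float, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_seen = clock()

    def touch(self) -> None:
        """Call whenever any line (PING, PRIVMSG, ...) arrives from the server."""
        self._last_seen = self._clock()

    def is_stalled(self) -> bool:
        """True once nothing has been received for longer than the timeout."""
        return self._clock() - self._last_seen > self.timeout_seconds
```

A bot would call `touch()` in its receive loop and periodically check `is_stalled()`, logging the reason and exiting with a non-zero status when it returns True.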
[19:30:07] !log tools rebooting all k8s worker nodes to clean up labstore1006/7 remains
[19:30:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:56:21] taavi: another wmopbot quit ^^
[20:01:18] !log tools.stewardbots Restart all bots
[20:01:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[20:03:38] Error from server: error when creating "/data/project/stewardbots/stewardbots/StewardBot/k8s-deployment.yaml": admission webhook "registry-admission.tools.wmflabs.org" denied the request: The following container images did not match any of the allowed registries ([['docker-registry.tools.wmflabs.org']]): [Kind=apps/v1, Kind=Deployment, Namespace=tool-stewardbots Name=stewardbot Image=docker-registry.tools.wmflabs.org/toolforge-python39-sssd-base:latest]
[20:03:43] well that's new
[20:04:27] T320446
[20:04:28] T320446: The following container images did not match any of the allowed registries ([['docker-registry.tools.wmflabs.org']]) - https://phabricator.wikimedia.org/T320446
[20:04:35] AntiComposite: I've *just* seen that too!
[20:05:46] ubn?
[20:06:13] I'm getting toolforge job failures across my tools now too :/
[20:08:13] and wmopbot pods were killed and deleted, different to what happened earlier
[20:08:20] yeah, wasn't a timeout, was (Remote host closed the connection)
[20:25:30] taavi: k8s issues ^
[20:25:53] JJMC89: aware, being looked at
[20:26:48] Seems all toolforge services are down.
[20:27:16] DENelson83: yup, T320446
[20:27:20] yes, looking into it
[20:27:35] What is "T320446"? No link.
[20:27:44] https://phabricator.wikimedia.org/T320446
[20:28:42] Okay. I will monitor that for more info.
[20:29:08] is something up with toolforge?
[20:29:10] $ kubectl rollout restart deployment lexeme-forms
[20:29:12] error: failed to patch: admission webhook "registry-admission.tools.wmflabs.org" denied the request: The following container images did not match any of the allowed registries ([['docker-registry.tools.wmflabs.org']]): [Kind=apps/v1, Kind=Deployment, Namespace=tool-lexeme-forms Name=lexeme-forms Image=docker-registry.tools.wmflabs.org/toolforge-python39-sssd-web:latest]
[20:29:20] (and the tool is currently down, uWSGI killed itself following a SIGINT/SIGQUIT that I have no explanation for)
[20:29:23] Read a few messages up
[20:29:28] (the log doesn’t say which signal it was, helpfully)
[20:29:30] `kubectl get events` shows several copies of that “denied the request” message, k8s has been trying to recreate the container for about half an hour it seems
[20:29:32] s/container/pod/
[20:29:34] I’ll try recreating the deployment from scratch
[20:29:36] ugh, and now there’s a pod that’s terminating, so apparently it managed to create one after all?
[20:29:47] the bridgebot is catching up
[20:29:53] I believe everything is starting to recover now
[20:29:53] Oh
[20:30:01] Oh... Looks like it's working again.
[20:30:14] Only like ten minutes of downtime.
[20:30:33] ah. did I miss something over here
[20:30:35] I suppose I should’ve expected that :|
[20:30:37] ok, sorry for the noise :(
[20:30:50] Your message only just came to IRC @lucaswerkmeister
[20:30:58] But yes there was an issue
[20:31:07] I’m looking at the IRC logs now
[20:31:31] sorry folks :/ will follow up on the phabricator task when it's not so late
[20:36:35] no worries, thanks for putting it under control!
[20:37:38] TheresNoTime or others: can someone (try to) restart wikibugs (https://www.mediawiki.org/wiki/Wikibugs)?
[20:37:55] urbanecm: lookin'
[20:38:38] on the bright side, I hope the issues with irc bots disconnecting have been solved by now
[20:40:50] urbanecm: done :)
[20:41:03] thank you kindly
[20:41:47] TheresNoTime: can i bother you to restart stashbot as well? docs: https://wikitech.wikimedia.org/wiki/Tool:Stashbot#Maintenance
[20:42:01] sure :)
[20:42:17] what's the cli SAL log command again?
[20:42:24] thought it was like `logmsg` or something
[20:42:37] dologmsg
[20:42:52] thanks :)
[20:43:29] !log tools.wikibugs restart wikibugs
[20:43:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[20:43:58] seems it came back
[20:44:11] !log tools.stashbot restart stashbot
[20:44:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stashbot/SAL
[20:58:20] and...bots are down again :/
[21:04:22] hmm
[21:05:05] bridge is doubling IRC on telegram
[21:08:50] !log tools.wikibugs restart wikibugs
[21:08:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[21:09:03] !log tools.stashbot restart stashbot
[21:09:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stashbot/SAL
[21:26:43] !log tools.bridgebot Double IRC messages to other bridges
[21:26:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[23:03:07] !log tools.stewardbots ./stewardbots/StewardBot/manage.sh restart # Ping timeout not noticed by the bot
[23:03:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[23:04:13] !log tools.bridgebot Double IRC messages to other bridges
[23:04:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[23:04:24] !log tools.stewardbots ./SULWatcher/manage.sh restert # Ping timeout, all bots disconnected
[23:04:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[23:05:45] always helps when stashbot doesn't go down