[00:09:04] I did get this email too and I too was confused by it (re @lucaswerkmeister: so… did that just email all the toolforge “viewers” (“everyone”)?)
[00:32:20] I did not get the initial mail, but got the followup email :(
[00:44:18] unfortunately the followup was sent to cloud-announce
[05:25:08] :-( webservice broken
[07:47:26] webservice php8.2 shell gives "Error from server (Forbidden): pods "shell-1719301586" is forbidden: PodSecurityPolicy: unable to admit pod: [spec.containers[0].securityContext.procMount: Invalid value: "DefaultProcMount": ProcMountType is not allowed]
[07:51:12] https://phabricator.wikimedia.org/T362050#9919714
[08:03:16] Wurgl: I'm working on a fix
[08:03:27] I just downgraded toolforge-webservice on login.toolforge.org to the previous version (same that's in login-buster), things seem to be working again
[08:03:39] dcaro: thanks
[08:03:41] arturo: strange that it did not fail in lima-kilo
[08:04:07] versions are the same, and feature flags seem the same too (`kubectl get pods -n kube-system -o yaml | grep feature-gate` reports the same)
[08:05:56] yeah
[08:05:56] there's a difference in the admission plugins enabled (tools has `EventRateLimit` extra)
[08:06:02] but that seems unrelated
[08:06:23] and same in toolsbeta, I believe it did not show the issue in toolsbeta either
[08:07:40] we should add webservice shell to the functional tests too, I can do that
[08:07:48] thanks
[08:08:46] dcaro: mmmmm what if the diff is in the PSP definition?
[08:09:14] I mean, the PSP settings themselves having drifted
[08:09:30] I could investigate that as well
[08:09:49] it might be, yes
[08:10:15] the error log https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/37#note_90452 is not specific about that being enabled or not (though it not being explicitly enabled yet available in lima-kilo makes me wonder)
[08:11:03] mmmm
[08:11:34] wait, the defaultProcMount is not injected by the tools-webservice code I deployed yesterday
[08:11:42] this should be injected by the Kyverno policy maybe
[08:11:54] so this may be a case of kyverno fighting PSP
[08:12:04] it's in the patch
[08:12:09] no, sorry, it is injected by the patch
[08:12:25] https://usercontent.irccloud-cdn.com/file/xx24077x/image.png
[08:12:28] isn't this in the jobs-api as well?
[08:13:07] yep
[08:13:09] https://www.irccloud.com/pastebin/a2Jxwjfu/
[08:13:27] maybe it's in the wrong place? (pod/container/...)
[08:14:23] seems to be in the right place
[08:14:31] in jobs-api it's added in `containers[].securityContext.procMount`
[08:15:11] same
[08:15:13] same as spec.containers[0].securityContext.procMount, no?
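For readers following the PSP error above: the rejected field lives on the container's securityContext. A rough, illustrative sketch of the shape of pod spec the shell tooling ends up submitting (the name, image, and command here are placeholders, not the actual tools-webservice or jobs-api code):

```python
# Illustrative only: where spec.containers[0].securityContext.procMount sits in a pod
# manifest, expressed as the Python dict a client would submit to the Kubernetes API.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "shell-example"},  # placeholder name
    "spec": {
        "containers": [
            {
                "name": "webservice",
                "image": "example-php8.2-image",  # placeholder, not the real image
                "command": ["/bin/bash", "-il"],
                "securityContext": {
                    # An explicit procMount value is what the PodSecurityPolicy
                    # admission rejects here: with the ProcMountType feature gate
                    # disabled on the cluster, only an unset/implicit value passes.
                    "procMount": "Default",
                },
            }
        ]
    },
}
```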
[08:15:19] ok
[08:19:06] so maybe the explanation, as hinted by bryan, is that the feature gate is enabled/disabled inconsistently
[08:21:13] kubectl version returns the same commit hash
[08:21:23] different build date though
[08:22:09] (limactl having a newer build, tools an older one)
[08:22:12] *lima-kilo
[08:24:06] the procmounttype featuregate is disabled in the 3 setups
[08:25:03] that's what sounds weird, yep
[08:25:27] toolsbeta has the exact same build (hash and date) as tools
[08:25:33] Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.17", GitCommit:"22a9682c8fe855c321be75c5faacde343f909b04", GitTreeState:"clean", BuildDate:"2023-08-23T23:37:25Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}
[08:25:50] maybe we should move the chat to -admin xd (stop spamming people)
[08:27:22] ack
[09:42:17] !log toolsbeta deploy toolforge-webservice 0.103.8 (T362050)
[09:42:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[09:42:21] T362050: toolforge: review pod templates for PSP replacement - https://phabricator.wikimedia.org/T362050
[09:44:02] !log tools deploy toolforge-webservice 0.103.8 (T362050)
[09:44:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:46:39] !status OK
[09:47:08] :-( dhinus could you please do `!status OK` here?
[09:48:19] !status Ok
[09:48:36] thanks
[09:49:31] I never remember what the required permission to call !status is
[09:53:40] the wikimedia cloak, which I don't have
[09:53:52] (and refuse to request)
[09:54:11] haha ok :)
[10:58:11] Strange! Some script started as expected at 00:42 GMT and started unexpectedly(!) again at 7:37?
[10:58:47] toolforge jobs list -o long outputs for this job:
[10:58:49] | thuesday | ~/jobs/thuesday.sh | schedule: 42 0 * * 2 | php8.2 | none | yes | /data/project/persondata/logs/thuesday.out | /data/project/persondata/logs/thuesday.err | none | mem: 2G, cpu: default | all | no | none | Running for 10h14m27s |
[10:59:07] It is the persondata tool
[11:00:00] Aha! It was re-started for some reason
[12:40:06] dcaro: o/
[12:40:36] hello \o
[12:41:38] I am working on https://phabricator.wikimedia.org/T367978 as an exercise, rolling out a new version of glibc (no security issues to patch) - the cloudceph osd/mon nodes have the new version, and debdeploy reports that some ceph-related daemons need to be restarted. Would it be possible to restart one/two nodes just to make sure that everything is ok? The rest can be done anytime in the future, maybe
[12:41:44] coupled with reboots etc..
[12:42:07] Andrew told me to ping you for the nodes, in case you're not the SME lemme know who I can ping
[12:43:48] elukey: sure, which ones would you want to restart?
[12:44:51] dcaro: pick one of your choice, they should all have the new glibc version! Thanks! :)
[12:45:33] elukey: cloudcephosd1015.eqiad.wmnet works? some were rebooted not long ago
[12:48:51] dcaro: ah lovely, these ones are buster nodes, gimme a sec
[12:49:25] yep sorry :/, still have to upgrade them, been dealing with some hardware issues and trying to avoid doing the upgrade while they are not resolved
[12:50:08] dcaro: nono, sorry from me, on buster the version is already ok, you can skip restarts! My pebcak
[12:50:32] I am trying to do what Moritz does and I am clearly not up to the task, be patient :D
[12:50:34] okok, so one of the newer ones then, maybe cloudcephosd1030?
[12:51:28] for the moment I am good, I'll ping you in case more buster/bookworm nodes pop up, if you are ok with that!
[12:52:25] okok! thanks!
[13:56:55] Hi, wikibugs seems to be down (or has not posted anything since 13:21 UK time)
[14:00:40] paladox: there were some IRC issues today that also affected other bots (e.g. wm-bot)
[14:01:01] ah
[14:01:22] probably it just needs to rejoin, but I've never touched it before
[14:01:59] I can try restarting it like this https://www.mediawiki.org/wiki/Wikibugs#Deploying_changes
[14:09:55] !log tools.wikibugs webservice restart; jobs restart irc (lost IRC connection)
[14:09:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[14:11:13] I think it's back
[14:26:26] thanks dhinus
[14:26:30] seems to be working
[14:28:38] yw!
[16:34:15] !log wikidata-dev wikibase-product-testing-2022: shut down instance, probably no longer needed and can be removed later
[16:34:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[17:19:40] !log redirects rebooted redirects-nginx02 as it was non-responsive
[17:19:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Redirects/SAL
[19:12:39] do you guys know about this:
[19:12:40] modules/profile/manifests/toolforge/grid/exec_environ.pp: class {'phabricator::arcanist': } # T139738 and T287390
[19:12:40] T139738: Install Arcanist in toollabs::dev_environ - https://phabricator.wikimedia.org/T139738
[19:12:41] T287390: Install Arcanist in Toolforge grid engine - https://phabricator.wikimedia.org/T287390
[19:12:58] lol, I linked myself to the tickets I should read
[19:13:11] is that still in use?
[19:14:16] I assume yes, you can disregard my comments.
[19:17:19] mutante: profile::toolforge::grid::exec_environ is going away "soon"
[19:20:28] taavi: ok, thanks! in this case I actually just had to realize that I don't need to touch that class
[19:20:41] and why it's a separate one
[19:39:20] OK, I'm seeing a lot of "Developer account not linked to Phabricator" messages. But how do I link it to make the bot happy in future runs?
[19:40:27] I'm talking about the nice StrikerBot in phab
[19:47:14] bozzy: when you log out of phabricator and back in.. it should give you 2 buttons to pick from: Login with MediaWiki or Login with Developer (Wikitech) account
[19:50:51] or, if you have an existing phabricator account, you'll want to add it via Settings -> External accounts, since logging out and back in via developer/ldap accounts will give you a duplicate account that you don't want
[20:12:55] Hello, I'm finding an error when I try to enter webservice shell. I read that there is some action going on about this, but cannot find easy-to-follow instructions on what to do. My paste bin is here (https://pastebin.com/hypRfNVY), and it is happening in several, if not all, of my tools (sara-wmb being the most important to me atm). Thank you in advance for any tips
[20:12:55] or help :)
[20:18:13] "webservice shell" on which server?
[20:18:37] argh sorry, ignore my last line. This is #wikimedia-cloud already, sigh
[20:19:25] bozzy: https://phabricator.wikimedia.org/settings/panel/external/ per https://www.mediawiki.org/wiki/Phabricator/Help#Creating_your_account
[20:26:17] Both my accounts are now linked and refreshed, but nothing changed
[21:25:18] @ederporto: On dev.toolforge.org I was able to `sudo become sara-wmb; webservice --backend=kubernetes python3.11 shell` with no errors.
Can you tell me more about how you got that "TypeError: 'type' object is not subscriptable" failure?
[21:27:55] This is my workflow (it worked until Sunday, at least):
[21:27:55] PS C:\Users\Éder Porto> ssh -i id_rsa ederporto@login-buster.toolforge.org
[21:27:57] ederporto@tools-sgebastion-10:~$ become sara-wmb
[21:27:58] tools.sara-wmb@tools-sgebastion-10:~$ webservice --backend=kubernetes python3.11 shell
[21:28:00] Then I get that error (re @wmtelegram_bot: @ederporto: On dev.toolforge.org I was able to `sudo become sara-wmb; webservice --backend=kubernetes python3.11 shell`...)
[21:32:06] @ederporto: thanks! I can recreate the crash on the legacy login-buster.toolforge.org bastion. My first guess is that this is a python version issue on the bastions. I would suggest that you use `webservice shell` from login.toolforge.org or dev.toolforge.org, which both have a newer Python version (3.11) that has been better tested with the latest `webservice` build.
[21:32:31] I will open a bug about the crash under python 3.7 on the legacy bastion
[21:34:34] Thank you! I was able to enter the shell! <3 I've followed that workflow from the "My first Flask OAuth tool" tutorial since 2020 (re @wmtelegram_bot: @ederporto: thanks! I can recreate the crash on the legacy login-buster.toolforge.org bastion. My first guess is that th...)
[21:47:34] T368463 is the bug report for `webservice` crashing on login-buster.
[21:47:35] T368463: `webservice` (build 0.103.8) crashes on login-buster.toolforge.org (python 3.7) - https://phabricator.wikimedia.org/T368463
[21:50:21] !log tools Live hacked /usr/lib/python3/dist-packages/toolsws/backends/kubernetes.py on login-buster.toolforge.org to remove the `-> dict[str, Any]` type annotations causing T368463
[21:50:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:12:47] now, let me see if I can figure out GitLab
[22:28:26] gotta love structural capitalization
[22:28:50] https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/46 should fix it
[22:31:33] AntiComposite: nice! I added a.rturo as a reviewer for you.
[23:17:06] noticing that some requests from our CI runners are having intermittent trouble connecting to github (as well as a few other services), but it's very intermittent. Ran a test with netcat in a loop and it happened for me, too. Have you noticed anything from other VPS projects? (I can provide a few timestamps if that's helpful)
[23:21:39] more context on https://phabricator.wikimedia.org/T362425
[23:22:00] thcipriani: I would broadly say that Cloud VPS is known to have networking "blips" that have never really been tracked down, but if you have some solid data on a failure you should start a Phab task. Unfortunately our main networking gurus (a.rturo and t.aavi) are EU timezone folks.
[23:23:47] gotcha, "solid data" is a stretch. It happening at 6:10:22 UTC across 3 VMs in the integration project is about where I decided to ask :)
[23:24:39] I think the contint boxes all switched to a new networking setup really recently as part of T326373. That might be related.
[23:24:40] T326373: Migrate Cloud VPS to Neutron Open vSwitch agent - https://phabricator.wikimedia.org/T326373
[23:27:59] hrm, seems to have started in earnest today, UTC morning, which might align with the closing of a subtask there, https://phabricator.wikimedia.org/T358761. I'll admit I'm not well-versed enough to know whether that happens anywhere close to the integration project at all?
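As an aside on collecting that kind of data: the netcat-in-a-loop test mentioned above can also be scripted so that failed connection attempts are logged with timestamps, which makes it easier to attach concrete evidence to a Phab task. A rough sketch, assuming nothing about the CI runners themselves (target host, port, and interval are arbitrary example values):

```python
#!/usr/bin/env python3
"""Probe a TCP endpoint in a loop and log timestamped failures.

Rough equivalent of running netcat in a loop; useful for gathering evidence
of intermittent network blips from a Cloud VPS instance.
"""
import socket
import time
from datetime import datetime, timezone

HOST, PORT = "github.com", 443  # example target; use whichever service is flaking
INTERVAL = 5                    # seconds between probes
TIMEOUT = 3                     # seconds before a connect attempt counts as failed

while True:
    started = datetime.now(timezone.utc)
    try:
        with socket.create_connection((HOST, PORT), timeout=TIMEOUT):
            pass  # connection succeeded; nothing to report
    except OSError as exc:
        print(f"{started.isoformat()} connect to {HOST}:{PORT} failed: {exc}")
    time.sleep(INTERVAL)
```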
[23:29:29] /me unhelpfully seconds that the errors seemed to be a lot worse today than before
[23:31:10] * bd808 says a thing on the task about the download failures
[23:31:39] <3
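Closing note on the `webservice` crash tracked in T368463 earlier in the log: built-in generics like `dict[str, Any]` in annotations are only subscriptable at runtime from Python 3.9 onwards, which is why the same build worked on the 3.11 bastions but crashed on login-buster's Python 3.7. A minimal, illustrative sketch of the failure and the usual backwards-compatible workarounds (function name and body are made up, not the actual toolsws code or the fix in the linked merge request):

```python
from typing import Any, Dict

# On Python 3.7 the following definition raises
#   TypeError: 'type' object is not subscriptable
# at import time, because plain `dict[...]` only works at runtime from 3.9 on:
#
#     def get_shell_spec() -> dict[str, Any]:
#         ...
#
# Two common ways to keep 3.7 compatibility:

# 1. Use the typing aliases, which are subscriptable on all supported versions.
def get_shell_spec() -> Dict[str, Any]:
    return {"image": "example-image", "command": ["bash"]}  # placeholder values

# 2. Or add `from __future__ import annotations` at the top of the module so
#    annotations are stored as strings and never evaluated at runtime (PEP 563).
```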