[00:09:04] I did get this email too and I too was confused by it (re @lucaswerkmeister: so… did that just email all the toolforge “viewers” (“everyone”)?)
[00:32:20] I did not get the initial mail, but got the followup email :(
[00:44:18] unfortunately the followup was sent to cloud-announce
[05:25:08] :-( webservice broken
[07:47:26] webservice php8.2 shell gives "Error from server (Forbidden): pods "shell-1719301586" is forbidden: PodSecurityPolicy: unable to admit pod: [spec.containers[0].securityContext.procMount: Invalid value: "DefaultProcMount": ProcMountType is not allowed]
[07:51:12] https://phabricator.wikimedia.org/T362050#9919714
[08:03:16] Wurgl: I'm working on a fix
[08:03:27] I just downgraded toolforge-webservice on login.toolforge.org to the previous version (same that's in login-buster), things seem to be working again
[08:03:39] dcaro: thanks
[08:03:41] arturo: strange that it did not fail in lima-kilo
[08:04:07] versions are the same, and feature flags seem the same too (`kubectl get pods -n kube-system -o yaml | grep feature-gate` reports the same)
[08:05:56] yeah
[08:05:56] there's a difference in the admission plugins enabled (tools has `EventRateLimit` extra)
[08:06:02] but that seems unrelated
[08:06:23] and same in toolsbeta, I believe it did not show the issue in toolsbeta either
[08:07:40] we should add webservice shell to the functional tests too, I can do that
[08:07:48] thanks
[08:08:46] dcaro: mmmmm what if the diff is in the PSP definition?
[08:09:14] I mean, the PSP settings themselves having drifted
[08:09:30] I could investigate that as well
[08:09:49] it might be, yes
[08:10:15] the error log https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/37#note_90452 is not specific about that being enabled or not (though it not being explicitly enabled yet available in lima-kilo makes me wonder)
[08:11:03] mmmm
[08:11:34] wait, the defaultProcMount is not injected by the tools-webservice code I deployed yesterday
[08:11:42] this should be injected by the Kyverno policy maybe
[08:11:54] so this may be a case of kyverno fighting PSP
[08:12:04] it's in the patch
[08:12:09] no, sorry, it is injected by the patch
[08:12:25] https://usercontent.irccloud-cdn.com/file/xx24077x/image.png
[08:12:28] isn't this in the jobs-api as well?
[08:13:07] yep
[08:13:09] https://www.irccloud.com/pastebin/a2Jxwjfu/
[08:13:27] maybe it's in the wrong place? (pod/container/...)
[08:14:23] seems to be in the right place
[08:14:31] in jobs-api it's added in `containers[].securityContext.procMount`
[08:15:11] same
[08:15:13] same as spec.containers[0].securityContext.procMount, no?
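For readers following the PSP error above: the rejected field lives on the container's securityContext. A rough, illustrative sketch of the shape of pod spec the shell tooling ends up submitting (the name, image, and command here are placeholders, not the actual tools-webservice or jobs-api code):

```python
# Illustrative only: where spec.containers[0].securityContext.procMount sits in a pod
# manifest, expressed as the Python dict a client would submit to the Kubernetes API.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "shell-example"},  # placeholder name
    "spec": {
        "containers": [
            {
                "name": "webservice",
                "image": "example-php8.2-image",  # placeholder, not the real image
                "command": ["/bin/bash", "-il"],
                "securityContext": {
                    # An explicit procMount value is what the PodSecurityPolicy
                    # admission rejects here: with the ProcMountType feature gate
                    # disabled on the cluster, only an unset/implicit value passes.
                    "procMount": "Default",
                },
            }
        ]
    },
}
```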
[08:15:19] ok
[08:19:06] so maybe the explanation, as hinted by bryan, is that the feature gate is enabled/disabled inconsistently
[08:21:13] kubectl version returns the same commit hash
[08:21:23] different build date though
[08:22:09] (limactl having a newer build, tools an older one)
[08:22:12] *lima-kilo
[08:24:06] the procmounttype featuregate is disabled in the 3 setups
[08:25:03] that's what sounds weird, yep
[08:25:27] toolsbeta has the exact same build (hash and date) as tools
[08:25:33] Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.17", GitCommit:"22a9682c8fe855c321be75c5faacde343f909b04", GitTreeState:"clean", BuildDate:"2023-08-23T23:37:25Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}
[08:25:50] maybe we should move the chat to -admin xd (stop spamming people)
[08:27:22] ack
[09:42:17] !log toolsbeta deploy toolforge-webservice 0.103.8 (T362050)
[09:42:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[09:42:21] T362050: toolforge: review pod templates for PSP replacement - https://phabricator.wikimedia.org/T362050
[09:44:02] !log tools deploy toolforge-webservice 0.103.8 (T362050)
[09:44:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:46:39] !status OK
[09:47:08] :-( dhinus could you please do `!status OK` here?
[09:48:19] !status Ok
[09:48:36] thanks
[09:49:31] I never remember what the required permission to call !status is
[09:53:40] the wikimedia cloak, which I don't have
[09:53:52] (and refuse to request)
[09:54:11] haha ok :)
[10:58:11] Strange! Some script started as expected at 00:42 GMT and started unexpectedly(!) again at 7:37?
[10:58:47] toolforge jobs list -o long outputs for this job:
[10:58:49] | thuesday | ~/jobs/thuesday.sh | schedule: 42 0 * * 2 | php8.2 | none | yes | /data/project/persondata/logs/thuesday.out | /data/project/persondata/logs/thuesday.err | none | mem: 2G, cpu: default | all | no | none | Running for 10h14m27s |
[10:59:07] It is the persondata tool
[11:00:00] Aha! It was re-started for some reason
[12:40:06] dcaro: o/
[12:40:36] hello \o
[12:41:38] I am working on https://phabricator.wikimedia.org/T367978 as an exercise, rolling out a new version of glibc (no security issues to patch) - the cloudceph osd/mon nodes have the new version, and debdeploy reports that some ceph-related daemons need to be restarted. Would it be possible to restart one/two nodes just to make sure that everything is ok? The rest can be done anytime in the future, maybe
[12:41:44] coupled with reboots etc..
[12:42:07] Andrew told me to ping you for the nodes, in case you're not the SME lemme know who I can ping
[12:43:48] elukey: sure, which ones would you want to restart?
[12:44:51] dcaro: pick one of your choice, they should all have the new glibc version! Thanks! :)
[12:45:33] elukey: cloudcephosd1015.eqiad.wmnet works? some were rebooted not long ago
[12:48:51] dcaro: ah lovely, these ones are buster nodes, gimme a sec
[12:49:25] yep sorry :/, still have to upgrade them, been dealing with some hardware issues and trying to avoid doing the upgrade while they are not resolved
[12:50:08] dcaro: nono, sorry from me, on buster the version is already ok, you can skip restarts! My pebcak
[12:50:32] I am trying to do what Moritz does and I am clearly not up to the task, be patient :D
[12:50:34] okok, so one of the newer ones then, maybe cloudcephosd1030?
[12:51:28] for the moment I am good, I'll ping you in case more buster/bookworm nodes pop up, if you are ok with that!
[12:52:25] okok! thanks!
[13:56:55] Hi, wikibugs seems to be down (or has not posted anything since 13:21 UK time)
[14:00:40] paladox: there were some IRC issues today that also affected other bots (e.g. wm-bot)
[14:01:01] ah
[14:01:22] probably it just needs to rejoin, but I've never touched it before
[14:01:59] I can try restarting it like this https://www.mediawiki.org/wiki/Wikibugs#Deploying_changes
[14:09:55] !log tools.wikibugs webservice restart; jobs restart irc (lost IRC connection)
[14:09:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[14:11:13] I think it's back
[14:26:26] thanks dhinus
[14:26:30] seems to be working
[14:28:38] yw!
[16:34:15] !log wikidata-dev wikibase-product-testing-2022: shut down instance, probably no longer needed and can be removed later
[16:34:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-dev/SAL
[17:19:40] !log redirects rebooted redirects-nginx02 as it was non-responsive
[17:19:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Redirects/SAL
[19:12:39] do you guys know about this:
[19:12:40] modules/profile/manifests/toolforge/grid/exec_environ.pp: class {'phabricator::arcanist': } # T139738 and T287390
[19:12:40] T139738: Install Arcanist in toollabs::dev_environ - https://phabricator.wikimedia.org/T139738
[19:12:41] T287390: Install Arcanist in Toolforge grid engine - https://phabricator.wikimedia.org/T287390
[19:12:58] lol, I linked myself to the tickets I should read
[19:13:11] is that still in use?
[19:14:16] I assume yes, you can disregard my comments.
[19:17:19] mutante: profile::toolforge::grid::exec_environ is going away "soon"
[19:20:28] taavi: ok, thanks! in this case I actually just had to realize that I don't need to touch that class
[19:20:41] and why it's a separate one
[19:39:20] OK, I'm seeing a lot of "Developer account not linked to Phabricator" messages. But how do I link it to make the bot happy in future runs?
[19:40:27] I'm talking about the nice StrikerBot in phab
[19:47:14] bozzy: when you log out of phabricator and back in.. it should give you 2 buttons to pick from: Login with MediaWiki or Login with Developer (Wikitech) account
[19:50:51] or, if you have an existing phabricator account, you'll want to add it via Settings -> External accounts, since logging out and back in via developer/ldap accounts will give you a duplicate account that you don't want
[20:12:55] Hello, I'm finding an error when I try to enter webservice shell. I read that there is some action going on about this, but cannot find easy-to-follow instructions on what to do. My paste bin is here (https://pastebin.com/hypRfNVY), and it is happening in several, if not all, of my tools (sara-wmb being the most important to me atm). Thank you in advance for any tips
[20:12:55] or help :)
[20:18:13] "webservice shell" on which server?
[20:18:37] argh sorry, ignore my last line. This is #wikimedia-cloud already, sigh
[20:19:25] bozzy: https://phabricator.wikimedia.org/settings/panel/external/ per https://www.mediawiki.org/wiki/Phabricator/Help#Creating_your_account
[20:26:17] Both my accounts are now linked and refreshed, but nothing changed
[21:25:18] @ederporto: On dev.toolforge.org I was able to `sudo become sara-wmb; webservice --backend=kubernetes python3.11 shell` with no errors.
Can you tell me more about how you got that "TypeError: 'type' object is not subscriptable" failure?
[21:27:55] This is my workflow (it worked until Sunday, at least):
[21:27:55] PS C:\Users\Éder Porto> ssh -i id_rsa ederporto@login-buster.toolforge.org
[21:27:57] ederporto@tools-sgebastion-10:~$ become sara-wmb
[21:27:58] tools.sara-wmb@tools-sgebastion-10:~$ webservice --backend=kubernetes python3.11 shell
[21:28:00] Then I get that error (re @wmtelegram_bot: @ederporto: On dev.toolforge.org I was able to `sudo become sara-wmb; webservice --backend=kubernetes python3.11 shell`...)
[21:32:06] @ederporto: thanks! I can recreate the crash on the legacy login-buster.toolforge.org bastion. My first guess is that this is a python version issue on the bastions. I would suggest that you use `webservice shell` from login.toolforge.org or dev.toolforge.org, which both have a newer Python version (3.11) that has been better tested with the latest `webservice` build.
[21:32:31] I will open a bug about the crash under python 3.7 on the legacy bastion
[21:34:34] Thank you! I was able to enter the shell! <3 I've followed that workflow from the "My first Flask OAuth tool" tutorial since 2020 (re @wmtelegram_bot: @ederporto: thanks! I can recreate the crash on the legacy login-buster.toolforge.org bastion. My first guess is that th...)
[21:47:34] T368463 is the bug report for `webservice` crashing on login-buster.
[21:47:35] T368463: `webservice` (build 0.103.8) crashes on login-buster.toolforge.org (python 3.7) - https://phabricator.wikimedia.org/T368463
[21:50:21] !log tools Live hacked /usr/lib/python3/dist-packages/toolsws/backends/kubernetes.py on login-buster.toolforge.org to remove the `-> dict[str, Any]` type annotations causing T368463
[21:50:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:12:47] now, let me see if I can figure out GitLab
[22:28:26] gotta love structural capitalization
[22:28:50] https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/46 should fix it
[22:31:33] AntiComposite: nice! I added a.rturo as a reviewer for you.
[23:17:06] noticing that some requests from our CI runners are having intermittent trouble connecting to github (as well as a few other services), but it's very intermittent. Ran a test with netcat in a loop and it happened for me, too. Have you noticed anything from other VPS projects? (I can provide a few timestamps if that's helpful)
[23:21:39] more context on https://phabricator.wikimedia.org/T362425
[23:22:00] thcipriani: I would broadly say that Cloud VPS is known to have networking "blips" that have never really been tracked down, but if you have some solid data on a failure you should start a Phab task. Unfortunately our main networking gurus (a.rturo and t.aavi) are EU timezone folks.
[23:23:47] gotcha, "solid data" is a stretch. It happening at 6:10:22 UTC across 3 VMs in the integration project is about where I decided to ask :)
[23:24:39] I think the contint boxes all switched to a new networking setup really recently as part of T326373. That might be related.
[23:24:40] T326373: Migrate Cloud VPS to Neutron Open vSwitch agent - https://phabricator.wikimedia.org/T326373
[23:27:59] hrm, seems to have started in earnest today, UTC morning, which might align with the closing of a subtask there, https://phabricator.wikimedia.org/T358761. I'll admit I'm not well-versed enough to know whether that happens anywhere close to the integration project at all?
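As an aside on collecting that kind of data: the netcat-in-a-loop test mentioned above can also be scripted so that failed connection attempts are logged with timestamps, which makes it easier to attach concrete evidence to a Phab task. A rough sketch, assuming nothing about the CI runners themselves (target host, port, and interval are arbitrary example values):

```python
#!/usr/bin/env python3
"""Probe a TCP endpoint in a loop and log timestamped failures.

Rough equivalent of running netcat in a loop; useful for gathering evidence
of intermittent network blips from a Cloud VPS instance.
"""
import socket
import time
from datetime import datetime, timezone

HOST, PORT = "github.com", 443  # example target; use whichever service is flaking
INTERVAL = 5                    # seconds between probes
TIMEOUT = 3                     # seconds before a connect attempt counts as failed

while True:
    started = datetime.now(timezone.utc)
    try:
        with socket.create_connection((HOST, PORT), timeout=TIMEOUT):
            pass  # connection succeeded; nothing to report
    except OSError as exc:
        print(f"{started.isoformat()} connect to {HOST}:{PORT} failed: {exc}")
    time.sleep(INTERVAL)
```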
[23:29:29] /me unhelpfully seconds that the errors seemed to be a lot worse today than before
[23:31:10] * bd808 says a thing on the task about the download failures
[23:31:39] <3
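Closing note on the `webservice` crash tracked in T368463 earlier in the log: built-in generics like `dict[str, Any]` in annotations are only subscriptable at runtime from Python 3.9 onwards, which is why the same build worked on the 3.11 bastions but crashed on login-buster's Python 3.7. A minimal, illustrative sketch of the failure and the usual backwards-compatible workarounds (function name and body are made up, not the actual toolsws code or the fix in the linked merge request):

```python
from typing import Any, Dict

# On Python 3.7 the following definition raises
#   TypeError: 'type' object is not subscriptable
# at import time, because plain `dict[...]` only works at runtime from 3.9 on:
#
#     def get_shell_spec() -> dict[str, Any]:
#         ...
#
# Two common ways to keep 3.7 compatibility:

# 1. Use the typing aliases, which are subscriptable on all supported versions.
def get_shell_spec() -> Dict[str, Any]:
    return {"image": "example-image", "command": ["bash"]}  # placeholder values

# 2. Or add `from __future__ import annotations` at the top of the module so
#    annotations are stored as strings and never evaluated at runtime (PEP 563).
```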