[01:32:09] blancadesal: There wasn't an exact task for the "add logs to jobs emails" thing I mentioned, but T306310 was close enough that I added the idea as a comment there.
[01:32:10] T306310: toolforge-jobs job emails should have information on why events happened - https://phabricator.wikimedia.org/T306310
[02:08:01] * bd808 off
[09:09:21] bd808: I considered myself nudged xd
[10:16:54] anybody knows what the grid engine alert "ToolsGridQueueProblem" is about?
[10:17:15] I sshed to the host but I can't figure out what is triggering the alert
[10:18:04] that's usually a job that failed and was flagged as errored
[10:18:07] there should be a runbook
[10:18:52] yes, there is
[10:19:23] I was trying to reverse-engineer the alert but maybe I should just follow the runbook :)
[10:20:32] ok I found the failed job
[10:20:51] "queue webgrid-generic marked QERROR as result of job 3168495's failure at host tools-sgewebgen-10-2.tools.eqiad1.wikimedia.cloud"
[10:21:27] you can follow then https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsGridQueueProblem#Debugging_a_failed_job
[10:21:49] this is the most common issue https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsGridQueueProblem#Epilog_failed_on_webgrid_nodes
[10:22:02] and the solution unless it happens very often, is just to clear the queues (there's a cookbook also)
[10:23:10] the link in the runbook for clearing the queues got broken, fixing
[10:23:53] okok, done :)
[10:24:54] ha I was editing the same link :P
[10:27:23] xd
[10:27:52] I'm getting again issues with pyyaml 5.4.1 when trying to rebuild the cookbooks venv (python 3.12), do you remember how that was sorted out?
[10:28:24] (it tries to build the wheel and fails with AttributeError: cython_sources)
[10:29:38] hmm let me try on a clean venv
[10:30:22] in the meantime I did run the cookbook from cloudcumin and it seems to have run "qmod -c", but according to the runbook I still need to run "qmod -cq" manually
[10:30:54] hmm, I don't remember that, does it say why?
[10:31:49] the cython_sources error is probably T345337?
[10:31:50] T345337: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337
[10:33:15] I thought that was sorted :,(
[10:40:41] I'm installing a custom spicerack with https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/966492/ in it to be able to install wmcs :/
[10:44:26] hmpf... kafka-python now does not work with python 3.12 xd
[10:45:53] https://github.com/dpkp/kafka-python/issues/2412
[10:47:19] the fix will be out with 2.0.3 (last release 2.0.2 was in september)
[10:48:32] september 2020 though :D
[10:49:06] probably worth opening a phab task
[10:49:48] oh, missed the year, wow
[10:52:08] back to the grid issue, the alert is gone
[10:53:06] the timing matches when I ran the cookbook so I think it worked
[10:53:54] dhinus: done :)
[10:54:03] T354410
[10:54:03] T354410: [spicerack] python-kafka does not support python 3.12, there's a fix but there has not been any releases since 2020 - https://phabricator.wikimedia.org/T354410
[10:54:45] thanks!
[10:57:04] quick review https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/25, that enables us upgrading the builder image without having to change anything on the builds-builder code/deploy config
[10:58:53] approved
[11:00:15] thanks :)
[11:46:47] another little step https://phabricator.wikimedia.org/T354415
[11:46:47] xd
[11:47:19] T354415
[11:47:19] T354415: [cumin] urllib >= 2 fails with the new internal certificates - https://phabricator.wikimedia.org/T354415
[11:59:40] * dcaro lunch
[14:07:11] quick review, I'm afraid that the gitlab redirects might expire or similar xd https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/23
[14:17:36] approved
[14:34:52] thanks!
[15:06:59] TIL that the shell hashes paths: "When you run a command in the shell, it remembers the path of the executable in a hash table to avoid searching the PATH every time"
[15:08:30] (trying to get lima-vm to work on linux, which is supported but undocumented)
[15:08:31] yep, you can run `hash` to see it
[15:08:44] and `hash -r` to clear it :)
[15:09:03] :)
[15:11:03] I feel really weird that most of us is using mac instead of linux :/
[15:12:33] is raymond using mac now too?
[15:12:55] I think so, but not sure
[15:18:20] it feels contradictory to me being strict about using open-source on toolforge (and everywhere) but quite lenient on the tools we use instead, but anyhow, experienced inconsistency is part of life xd
[15:20:46] we'd need open-source furniture to put the laptops on too, though xd (not being sarcastic, it's fun to think about)
[15:21:40] I have an open-hardware keyboard :)
[15:21:42] if that help
[15:21:44] *helps
[15:21:51] and open-hardware laptop too
[15:21:55] (personal though)
[15:22:29] dcaro: If I think about the oss rule as about forkability rather than about free software dogma it feel consistent to me.
[15:23:09] not that I have anything against free software dogma :)
[15:24:27] well, I think it comes as a consequence of free-knowledge, doesn't it?
[15:24:34] at least there is dog in dogma :D
[15:25:07] (and now I feel like building a new office desk with my bare hands)
[15:25:30] for forkability, any of the new "hashicorp/elasticsearch" ugly licenses might be enough for us I think
[15:26:18] Oh yes, I'm not saying it's ideal that we use proprietary tools individually. Just that me using a mac doesn't prevent other people from doing the same work that I'm doing /without/ a mac. Which would be different if e.g. we integrated proprietary software into hosting wikis.
[15:26:20] I have built my own in the past! (well, my father and me), and it was a standing/non-standing thingie using weights and such
[15:26:46] used it for a couple years, it was awesome, with two screens setup incorporated and plugs and such
[15:27:26] I quite like the "doesn't prevent other people from doing the same work that I'm doing /without/ a mac" argument, hadn't thought of it this way
[15:27:30] andrewbogott: that's the thing, now by setting up dev environments and such we are starting to see some friction there (lima-vm for example, docker, k8s, ... all those needed extra care because mac/linux differences)
[15:27:40] yeah, you're right about the new licenses. IMO I'm not actually especially against those licenses, I just want to outsource the job of reviewing/approving them to OSI rather than having to spend my days debating the merits of every new home-made license
[15:28:23] dcaro: agreed, that's part of why I wrap everything in linux VMs running on my mac rather than try to use my mac itself as a test/dev platform.
[15:28:39] (although that's as much to avoid edge cases, which Macs have plenty)
[15:29:14] why not use linux directly instead?
[15:31:33] Personally it's because I spent a decade working in the linux on the desktop space and carrying a linux laptop and burned HUGE amounts of time trying to get my damn sound and wifi drivers to work. When I started needing a mac laptop I experienced great relief and have been ever after afraid of going back :D
[15:32:15] Some people like shopping for home PC hardware, optimizing it, upgrading it, etc. etc. I do not!
[15:32:57] ^ I'm aware that that experience is increasingly out of date and I should probably review the new status quo for linux-on-laptops
[15:35:36] I think I have a similar experience to andrewbogott here, I was Linux-only (both personal and work laptop) for years (I did actually enjoy shopping for hardware!) but then I started working in jobs where I needed a Mac and learnt to appreciate the convenience of having a system that requires less tinkering
[15:35:43] it's been a while since I had to fight with any of those yep xd (>10 years for sure, graphic cards less though, nvidia with optimus has only been a nice experience in the last few years)
[15:36:55] I will very strongly defend interoperability though, and I always hated where I was in companies that wrote tools and scripts designed for Macs, basically preventing anyone using Linux from using those
[15:37:28] oh... those 10 years are actually way more... it's been 3 since I'm around here... time goes fast
[15:37:37] for an example of my mentality: I also don't really customize my laptop at all. I want it to work out of the box, and if it breaks I want to be able to buy a new one and have it act exactly the same without having to spend a week configuring and retraining my fingers. For now, Apple does a good job of getting me that.
[15:38:21] can empathize, am also addicted to mac UX xd
[15:38:42] considering our community is likely to include both Linux and Mac users, I think it's actually very good that we have a ~50% share of the two systems in the team
[15:38:47] Hm.
When I started using a Mac their core OS was still nominally open sourced. I have no idea if any of that ecosystem still exists.
[15:38:55] we are not 50% anymore though
[15:39:08] dhinus: that's a slippery slope to making someone carry a windows machine!
[15:39:27] andrewbogott: sorry, you've been randomly picked to be that person :D
[15:39:36] installing a default debian/fedora/ubuntu is quite seamless also
[15:39:46] (on most laptops, ex. not mac)
[15:39:48] dcaro: wait... are you the only one left using linux now?
[15:39:55] and taavi
[15:39:58] ah yes
[15:40:02] Arguably if I'm claiming that my Mac is just a wrapper around VMs and OSS anyway then I should be just as happy with Windows right?
[15:40:12] or linux!
[15:40:18] haha
[15:40:21] yep
[15:40:33] mac is a nice wrapper though
[15:40:53] the fact that our users use mac or linux is not a good reason, or we would use also aws/etc., as our users (the tech community) uses that too
[15:41:02] My knowledge is way out of date, I've only ever used an X-based linux laptop and as I understand it that's not actually what they use these days.
[15:41:12] there's wayland now yes
[15:41:18] replacing X
[15:41:50] I was running Wayland about 10 years ago because I was using Debian unstable :P
[15:42:17] Boy, when wayland was announced it seemed comically ambitious and/or impossible. I'm impressed that they followed through. (That was a Shuttleworth windmill wasn't it?)
[15:42:24] nowadays I only see the difference in the commands to do stuff, though lately I don't even have to use xrandr and such
[15:42:30] (lately meaning >10 years xd)
[15:42:51] oh, huh, I guess it was from redhat
[15:44:18] * andrewbogott is now remembering being a motif developer on hpux
[15:45:53] y'all cannot imagine how much worse (l)unix-based windowing systems were than MS/Apple back in the day. I might be permanently traumatized.
[15:46:20] I remember running startx...
[15:46:22] When I was doing MediaWiki-Vagrant install parties at hackathons I found that the vast majority of new contributors to our technical volunteer community were using Windows laptops.
[15:46:47] in india I think windows is quite popular
[15:47:20] The AMD/ARM issues that are popping up are more about hardware vendor lock-in than they are about OS in my opinion
[15:47:35] yep, though macos comes with both no?
[15:47:48] I think supporting users on MacBooks is different from supporting AWS: some people already own Macs and they should be able to use them to interact with our tools, including local development, etc. Same for Windows ideally.
[15:47:48] it's as much hardware lock-in as os lock-in
[15:48:17] dhinus: well, you could say that people already deploy on aws
[15:48:30] at the same time, I've considered more than once to switch to Debian for my WMF-related activities, and I might do it at some point
[15:50:05] When they start making linux ARM laptops won't we have the same issues?
[15:50:06] dcaro: I'm not sure about that. OSX is mostly a window manager for me. All of the apps I use day to day are things like Firefox, Docker, ssh, vim, tmux, and a metric ton of SaaS things. The fact that Docker is functionally connected to Linux kernel internals is probably the most proprietary thing.
[15:50:13] Or are folks already running linux/arm laptops?
[15:50:45] dcaro: it's easier to migrate from aws to toolforge than buying a new laptop though :)
[15:50:53] (and imo using macos for development does not match the "doesn't prevent other people from doing the same work that I'm doing /without/ a mac", as unless you make things specially interoperable, you will end up needing a mac to do it)
[15:51:15] andrewbogott: there's already linux/arm, and even linux on mac
[15:51:26] "make things specially interoperable" is usually my preferred goal :)
[15:51:48] as in, whatever system you have (mac or linux), write code thinking that others will run it in a different platform
[15:52:20] it could even be a different linux distribution, version, etc. -- of course it's hard to support all possible combinations
[15:52:30] vagrant is basically the andrew system of 'use a headless linux VM no matter what your laptop/window manager is' right? So it's only now with the move off vagrant that we're starting to have OS-native workflows?
[15:52:57] we had already
[15:53:13] we moved to vagrant/vm because of those xd
[15:53:22] (kind, minikube, etc.)
[15:53:27] docker-compose, scripts, ...
[15:53:33] hm, true
[15:53:44] The ARM + Docker issues are about ARM. There is still a linux VM in the mix; it just has a different CPU design by default.
[15:54:26] the network, security, mounts, and other issues we had with docker (besides arm) are also mac-specific
[15:54:41] (a lot of them at least)
[15:55:20] essentially, because mac uses a linux VM for docker, that VM needs interoperability with the macos to be able to be accessed, and access stuff
[15:55:32] Hey, this is a total side-track, but... does anyone know the state of macOS and open source? I'm sure that it originated as a BSD fork and even had some community-maintained bits. Is that entirely/officially no longer true?
[15:55:58] https://opensource.apple.com/projects/
[15:56:00] mac specific or differences between Linux and MacOs hosts? Is Windows transparently compatible?
[15:56:01] the 'open source at apple' portal definitely does not mention macos
[15:56:35] bd808: no, windows had its own issues too with docker + kind/minikube
[15:56:40] that were different than others
[15:56:53] (one of the reasons why raymond started using a VM too)
[15:57:05] yeah, so cross-platform stuff. which is honestly to be expected
[15:57:14] (The root of my question is: I still think of macos as 'just another weird bsd-licensed unix variant, with a shiny proprietary windowing system on top'. But maybe that's totally not true anymore)
[15:57:47] bd808: agree, but the thing is that one of those platforms is free, the other two will force you to pay a license (of some sort)
[15:57:54] Well, to be totally honest, I still sometimes think of macos as a fork of nextstep!
[15:58:00] * andrewbogott withers into dust
[15:58:07] oh, are we on a gratis argument now?
[15:58:42] andrewbogott: https://en.wikipedia.org/wiki/XNU I think this is still mostly right?
[15:58:56] not gratis, free (from andrew's argument before "doesn't prevent other people from doing the same work that I'm doing /without/ a mac")
[15:59:45] thanks bd808! So arguably I am 'running' an osi-approved OS, as long as I don't let any pesky windows intrude on my workflow :p
[16:00:05] "the other two will force you to pay a license (of some sort)" sounded like a switch to a gratis basis argument rather than the libre argument
[16:00:47] s/pay a license/pay a fine/
[16:01:29] essentially, you will not be able to use them, share them, modify them or distribute them without paying
[16:01:43] (or breaking the law)
[16:02:03] andrewbogott: I think that's just the kernel, you might have trouble with the tools around it maybe
[16:03:25] yeah, i'm sure it's a big jumbled mess
[16:03:34] I can see the reasoning where if we're not using Terraform because it's not OSI-compliant, we should not be using macOS either.
at the same time, the most important thing in my mind remains to avoid lock-in, so as long as I'm confident that I can switch to Linux tomorrow and continue to use all of my WMF tools, I think it's ok-ish?
[16:04:08] if we continued to use Terraform, as it diverges from the open source fork, we would not easily be able to switch in 1 year or 23
[16:04:11] *or 2
[16:04:12] it's ok-ish anyhow, life goes on
[16:04:40] I see the Terraform/AWS/etc. situation more risky, but I could be wrong
[16:04:41] by the looks of it, it seems that opentofu might start changing sooner than terraform xd
[16:04:56] true, terraform will probably remain behind :D
[16:05:04] MacOS switched the default shell to zsh to avoid GPLv3
[16:05:24] and I've switched my default shell back to Bash the same day :D
[16:10:52] I feel like I'm working on a vegan clothes store, wearing leather shoes xd
[16:11:01] anyhow, gtg, cya on monday
[16:11:57] thanks for the discussion :)
[16:14:25] have good weekend :)
[16:14:28] a
[16:15:53] When I first started here one of the teams I was on had a strident FOSS believer as well. When we started using Google video calls for team meetings he would only join if someone else started the call in a conference room he was in. That's being vegan about your software licenses.
[16:18:27] (That strident individual is still here, but holds slightly different views about license purity today as I understand it)
[16:20:25] there's a big difference in effort between trying to setup an open-source video-conferencing system, and doing the work we do on a linux (that we do anyhow, as what I understand is that everyone just starts a linux VM anyhow), I don't think I'm being unreasonable or "strident" by raising the question, please don't misinterpret/repaint my argument for an extreme one.
[16:27:09] dcaro: fwiw I don't think your argument is "extreme", I very much support the idea of making sure that people can work and volunteer on our projects using Linux laptops and if some things only work on a Mac we have a problem. I'm less convinced about declaring Linux as the only supported platform, but maybe we should? anyway it was good food for thought!
[16:30:04] * dhinus will have another go at trying to move most work things into a Debian VM
[16:31:02] * dhinus is also tempted to use Arch instead :P
[16:42:34] I think that I /should/ be using a linux laptop, and I also think that I'm exhausted with linux laptops.
[16:42:50] not exactly an ideological position :)
[16:46:42] changing topic, toolsbeta-bastion-6 is again alerting, I restarted a few times recently (in the last 2/3 weeks)
[16:46:55] this time I tried accessing the virsh console, and it seems fine, so maybe network?
[16:47:50] some failed systemd units
[16:52:03] hmm seems to have an ipv6 address but no ipv4 address
[17:03:02] I think you should just build -7 and delete -6 since -6 is clearly cursed.
[17:03:43] after reading some logs, apparently wmcs-wheel-of-misfortune killed systemd-networkd
[17:04:11] Oh, it's /intentionally/ cursed!
[17:05:57] do you know anything about that script?
[17:06:32] found some info in the script itself
[17:07:14] maybe we need to add an exclusion
[17:12:07] T354430
[17:12:08] T354430: toolsbeta-bastion-6 crashes often - https://phabricator.wikimedia.org/T354430
[17:19:43] I don't know much, other than that it exists
[17:21:15] my hunch is that it's too aggressive in Bookworm
[17:21:30] I can reproduce with a dry-run that it wants to kill other system processes
[17:21:31] wmcs-wheel-of-misfortune is a thing Brooke made to try and keep people from running long lived processes on the bastions.
[17:21:50] It has a block list of things to never kill inside it I think
[17:21:52] whereas in the main tools bastion it only kills tools.* processes
[17:22:15] there is an exclusion list but for some reason it is not working on that host (maybe because bookworm)
[17:23:45] If it is causing mystery problems, we could try just turning it off for a while
[17:24:11] well it's only on toolsbeta so far
[17:24:28] so no big impact, I will try to fix it next week
[17:24:34] and restart that bastion in the meantime
[17:24:41] I don't have any sense of how much good it actually does, but maybe taavi does.
[17:25:33] I do feel like we have far fewer IRC complaints of the bastions being unusably slow than say 4 years ago, but many things have changed in that time
[17:25:56] I think it makes sense to have it, and hopefully it won't be too hard to fix
[17:26:10] it even sends a polite email to the user whose job gets killed!
[17:58:51] dhinus: heh. I see you've traced the problem back to poettering. I should have immediately guessed that from obvious systemd involvement.
[17:59:20] hahaha
[17:59:32] We had a bug in Toolforge that turned out to be him arbitrarily redefining the allowed chars for a username
[18:00:23] I hope we don't have any tool users with UID between 500 and 999
[18:00:52] I'm also not entirely sure when the UIDs changed, if it was Bookworm or before
[18:01:04] https://github.com/wikimedia/labs-striker/blob/master/striker/settings.py#L401 -- "LABSAUTH_MIN_UID = env.int("MIN_UID", default=500)"
[18:02:28] :/
[18:03:37] Let me see if this is practical concern. Just have to remember how to do a range search in LDAP.
[18:04:42] the only systemd uids seem to be 997 and 998
[18:04:45] brion's UID is 500, tim's is 501
[18:05:29] I think they will be fine, but if some users have 997 or 998 they will overlap
[18:06:01] added benefit: brion's and tim's processes will no longer be killed by the wheel of misfortune :D
[18:06:34] There are not a full 500 overlapping.
Looks like only 30
[18:08:26] not too bad
[18:08:48] how do new UIDs get assigned?
[18:08:56] is there a risk more will be assigned in that range?
[18:09:00] find largest and add 1
[18:09:05] ok then
[18:09:18] maybe at some point it was decided to skip that range?
[18:10:00] I have to log off, if you find something please leave a note at T354430 :)
[18:10:01] T354430: wmcs-wheel-of-misfortune kills system processes - https://phabricator.wikimedia.org/T354430
[18:10:20] https://phabricator.wikimedia.org/P54529
[18:10:31] thanks for that list!
[18:11:29] mgrabovsky overlaps with the "debian" user in bookwork (uid=1000)
[18:11:35] *bookworm
[18:11:59] not sure how badly that will break things. maybe not too badly ;)
[18:12:23] they are a toolforge member too, so I guess we will find out at some point :/
[18:12:50] No tools owned by them though it seems
[18:14:06] we might consider disabling their account just in case, or finding out if we can change the UID
[18:14:33] 1000 is "debian" even in the old bastion
[18:14:50] (buster)
[18:15:23] they look to have been inactive for a while (at least in Gerrit) -- https://gerrit.wikimedia.org/r/q/owner:mgrabovsky%2540yahoo.com
[18:18:19] maybe we got lucky (speaking of wheels of misfortune :P)
[18:18:34] * dhinus off
[19:04:01] * bd808 lunch
[19:22:45] dhinus: we have the adduser class in Puppet which defines these ranges (both for the adduser tool and since a few years also for systemd-sysuser)
[19:23:45] 100-499 are for system users which are purely local (usually created by Debian packages)
[19:24:22] IOW where it doesn't matter what the UID is
[19:24:54] we have a few legacy human users like Brion who are in that < 1000 space, but otherwise humans start at 1000
[19:25:28] the range between 900-999 is centrally assigned via modules/admin/data/data.yaml for system users which need to be identical across system boundaries
[19:26:40] e.g. if files are synced across systems and the UID needs to be persistent
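
Editor's note: the UID ranges described above, and the kind of overlap check discussed earlier, can be sketched roughly as follows. This is only an illustration under stated assumptions: the range boundaries are copied from the chat, not read from the Puppet adduser class; the function names (`classify_uid`, `find_collisions`, `next_uid`) are hypothetical; and the chat does not say what the 500-899 range is for, so it is labelled "unspecified" here.

```python
# Hypothetical sketch. Range boundaries come from the discussion above,
# not from the actual Puppet adduser class or from wmcs-wheel-of-misfortune.
LOCAL_SYSTEM = range(100, 500)     # purely local system users (100-499)
CENTRAL_SYSTEM = range(900, 1000)  # centrally assigned system users (900-999)
HUMAN_MIN_UID = 1000               # humans normally start at 1000


def classify_uid(uid: int) -> str:
    """Return a coarse label for a UID, based on the ranges quoted in the chat."""
    if uid in LOCAL_SYSTEM:
        return "local-system"
    if uid in CENTRAL_SYSTEM:
        return "central-system"
    if uid >= HUMAN_MIN_UID:
        return "human"
    # 0-99 and 500-899 are not described in the chat.
    return "unspecified"


def find_collisions(ldap_uids: dict, local_uids: dict) -> list:
    """List (ldap_user, local_user, uid) for every LDAP UID also used locally."""
    local_by_uid = {uid: name for name, uid in local_uids.items()}
    return sorted(
        (user, local_by_uid[uid], uid)
        for user, uid in ldap_uids.items()
        if uid in local_by_uid
    )


def next_uid(existing_uids) -> int:
    """The 'find largest and add 1' allocation rule mentioned in the chat."""
    return max(existing_uids) + 1
```

With the one overlap named in the chat, `find_collisions({"mgrabovsky": 1000}, {"debian": 1000})` reports a single collision on UID 1000. Because new UIDs are allocated as largest-plus-one, no further accounts should land in the sub-1000 ranges unless the allocation rule changes.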