[10:21:27] Hi, good morning!
[10:22:05] I received an e-mail telling me my bot is still running on the Stretch Grid Engine
[10:22:44] But I don't know how to migrate it to another place
[10:23:22] it seems it'd better be Kubernetes
[10:23:44] Do you have some instructions on how to do it for newbies?
[10:24:41] Thanks in advance
[10:27:49] @Pau, see https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/EPJFISC52T7OOEFH5YYMZNL57O4VGSPR/ for the announcement and some context
[10:28:00] https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation has pointers on how to move tools over
[10:28:32] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework documents the new Kubernetes-based jobs framework. It still has some sharp edges but I've moved three tools over the weekend without too much hassle
[10:39:14] ok, thanks
[10:39:23] valhallasw
[10:39:29] I'll try
[10:39:46] If I get stuck, I'll ask again
[12:30:39] !log paws update pywikibot 6db74b6c866021686a49aa8a2c4eb2d5da3bdddb
[12:30:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[13:40:05] !log admin shutting down many codfw1dev servers (including network infra!) for T305469
[13:40:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:40:08] T305469: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469
[15:23:36] !log admin reimaging cloudvirt1020, leaving VMs in place
[15:23:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[15:27:56] bd808: not sure if I should open another phabricator ticket for this, but would it be possible to install libexiv2-dev on python39 images, similar to how it was done in https://phabricator.wikimedia.org/T213965? Trying to install py3exiv2 in a k8s job.
[15:31:05] DatGuy: it would take a new phabricator task, but also I really don't think we will approve this request. We have been very consistent in not installing rarely used libraries and utilities in the containers. It is just not scalable at this time.
[15:31:50] There is work in progress towards a build system that will support custom containers. Until that arrives my best advice is to keep using grid engine for this task.
[15:31:55] DatGuy: which pip package are you trying to install? looks like https://pypi.org/project/pyexiv2 ships with manylinux wheels, meaning that if you have the 'wheel' pip package installed in your venv you can use a pre-compiled version
[15:53:23] taavi: that's for python2. Trying to install py3exiv2 for python 3 support
[15:57:53] if I'm reading the pip page correctly, that supports python 3 and is fully separate from a different project with the exact same name that only supports py2...
[15:58:11] at least I can install it inside a python3 container without any issues
[15:58:44] Not for me. When I try to install py3exiv2 inside a tf-python39 image it throws an error since it doesn't have libexiv2-dev
[15:58:58] It does work if I just install it in a venv though, because of the phab ticket
[16:00:10] no, I mean `pyexiv2` (as in the package on pip that supports python 3) installs for me inside a container, `py3exiv2` does not
[16:06:47] ah, well it installs but doesn't import. "OSError: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found"
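A rough illustration of the manylinux-wheel route taavi suggests above; the venv path and the final import check are just examples, not a verified recipe:

    python3 -m venv ~/pyexiv2-test       # throwaway venv, the path is only an example
    . ~/pyexiv2-test/bin/activate
    pip install --upgrade pip wheel      # a recent pip plus 'wheel' so the pre-built manylinux wheel is picked up
    pip install pyexiv2                  # should install the pre-compiled wheel, no libexiv2-dev needed
    python -c 'import pyexiv2'           # quick smoke test that the binding actually imports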
"OSError: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found" [16:50:50] !log tools.tool-db-usage Updated to 3d178a5 (D1201) [16:50:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.tool-db-usage/SAL [17:08:30] yeah, there are two py3 exiv bindings and they both kinda suck [17:29:06] !log paws updating links to phab with prefilled ticket links aef7c671a69a66a9872a48a24169f1f6bf2ffc4f [17:29:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL [17:30:10] !log quarry exposing query history https://phabricator.wikimedia.org/T100982 [17:30:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL [18:34:09] !log quarry update phab links to prefilled ticket https://phabricator.wikimedia.org/T303028 [18:34:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL [19:07:16] !log devtools - gitlab-prod-1001 randomly stopped working. we got the "puppet failed" mails without having made changes and can't ssh to the instance anymore when trying to check out why. trying soft reboot via Horizon T297411 [19:07:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Devtools/SAL [19:07:19] T297411: Migrate gitlab-test instance to puppet - https://phabricator.wikimedia.org/T297411 [19:08:19] !log devtools - gitlab-prod-1001 is indeed back after soft rebooting the instance. uptime 1 min T297411 [19:08:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Devtools/SAL [19:56:26] !log tools.heritage Adding the -release buster flag to cronjobs as part of Stretch deprecation [19:56:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL [20:16:50] !log tools.slumpartikel Restarting webservice with the '--release buster' flag. php version compatibility needs to be investigated before migrating to kubernetes [20:16:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.slumpartikel/SAL [20:30:37] hello! does the new jobs framework support something like the cron's `-l` flag for scheduled jobs? specifically, I want to use `-l h_rt=` to set a job to run for a maximum amount of time [20:33:16] musikanimal: Kubernetes has a thing for that (.spec.activeDeadlineSeconds), but I don't think that a.rturo has exposed that in the current wrapper api. I bet he didn't even think of the potential need for it, so a feature request would probably be the right next step. [20:33:57] so this would be an upstream feature request? [20:34:38] no, a feature request for the new framework (which is a local project) [20:34:49] okay got it. I'll create a Phab ticket :) [20:35:40] also, is there something akin to `-once`? [20:36:19] you're the second person to ask about a max runtime limit [20:40:35] I think `.spec.parallelism = 1` is basically the same as `-once`, but again I don't think that a.rturo thought to expose that to users yet. 
[20:41:06] it defaults to 1 anyway
[20:41:24] https://kubernetes.io/docs/concepts/workloads/controllers/job/ has a pretty good overview of all things a job can do from the Kubernetes POV
[20:41:27] and for CronJobs, "concurrencyPolicy": "Forbid" is set
[20:41:43] I was wondering if perhaps that was default behaviour
[20:41:56] which will be similar to what running jsub -once from cron will do
[20:43:43] it probably would be useful to have an option for concurrencyPolicy: Replace (which kills the existing job and starts a new one instead of continuing the existing job), but that's a bit more advanced
[20:44:14] * bd808 is actually using Replace in a job in the Wikimedia production Kubernetes cluster
[20:47:41] not sure if the tags are right, but here we are: https://phabricator.wikimedia.org/T306391
[20:48:09] thanks everyone for your help! I'm going to test out the jobs framework on my other bots that don't need a max runtime
[21:31:49] btw how do I know if I need NFS?
[21:37:21] AmandaNP: the main reason to need NFS in a Cloud VPS project is for file systems that need to be shared across multiple instances. An example of this is the $HOME for each tool in Toolforge, which needs to be available on the bastions, the grid engine exec instances, and the kubernetes exec instances.
[21:38:02] Most Cloud VPS projects do not need NFS
[21:38:24] Ahh ok. So I don't really need it. It's been converted to a VM, so can I just delete the instance or do I need to do more to not break things?
[21:41:11] AmandaNP: It sounds like you are saying that a.ndrewbogott moved NFS for a project from the old shared NFS server platform to a new instance inside your project. Before deleting the NFS server instance you would likely need to change some config on the other project instances so they no longer mount from the NFS server.
[21:42:41] https://phabricator.wikimedia.org/T301295
[21:43:17] yeah I missed that essentially ^
[21:43:41] *nod* And if T208414 is right that was just being used for $HOME?
[21:43:42] T208414: Check whether utrs project requires NFS or not - https://phabricator.wikimedia.org/T208414
[21:44:28] afaik yes
[21:45:47] That probably takes a little work to unwind. You would probably want to get the $HOME directories stored locally on utrs-production.utrs.eqiad1.wikimedia.cloud before deleting the NFS server instance.
[21:46:07] not too hard though I would hope...
[21:46:26] hmm. yeah I don't even know where to start with changing $HOMEs
[21:47:49] * bd808 pokes around a bit
[21:48:56] It looks like the NFS server data is on a Ceph volume so it might be really easy actually.
[21:52:34] AmandaNP: The data is on the volume "utrs-nfs". You could attach that volume to the utrs-production instance directly, mount it at /srv, and then make /home a symlink to /srv/utrs/home. The only other bits needed would be the NFS mount logic cleanup.
[21:53:09] if that all sounds like a foreign language, you could reopen T301295 and ask for help killing off NFS :)
[21:53:10] T301295: Does the cloud-vps UTRS project need NFS? - https://phabricator.wikimedia.org/T301295
[21:53:54] It sounds logical, but something I don't want to screw up guessing. I'll reopen and quote you.
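For anyone following along, a very rough sketch of the steps bd808 describes, assuming the "utrs-nfs" volume has been attached to utrs-production via Horizon, that it shows up as /dev/sdb, and that the NFS share is currently mounted over /home; all of those are assumptions to verify before touching the instance:

    lsblk                                        # confirm the device name; /dev/sdb below is an assumption
    sudo mkdir -p /srv
    sudo mount /dev/sdb /srv                     # add an /etc/fstab entry as well if the mount should persist
    sudo umount /home                            # stop using the NFS-served home directories
    sudo rmdir /home
    sudo ln -s /srv/utrs/home /home              # point /home at the data on the volume
    # then clean up the leftover NFS client config (fstab entry / puppet NFS mount role)
    # before deleting the old NFS server instance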