Wikimedia IRC logs browser - #wikimedia-cloud

Displaying 107 items:

2022-04-18 10:21:27 <wm-bb> <Pau> Hi, good morning!
2022-04-18 10:22:05 <wm-bb> <Pau> I received an e-mail telling me my bot Is Still Running On Stretch Grid Engine
2022-04-18 10:22:44 <wm-bb> <Pau> But I don't know how to migrate it to another place
2022-04-18 10:23:22 <wm-bb> <Pau> it seems it had better be Kubernetes
2022-04-18 10:23:44 <wm-bb> <Pau> Do you have some instructions on how to do it for newbies?
2022-04-18 10:24:41 <wm-bb> <Pau> Thanks in advance
2022-04-18 10:27:49 <valhallasw> @Pau, see https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/EPJFISC52T7OOEFH5YYMZNL57O4VGSPR/ for the announcement and some context
2022-04-18 10:28:00 <valhallasw> https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation has pointers on how to move tools over
2022-04-18 10:28:32 <valhallasw> https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework documents the new Kubernetes-based jobs framework. It still has some sharp edges but I've moved three tools over the weekend without too much hassle
2022-04-18 10:39:14 <wm-bb> <Pau> ok, thanks
2022-04-18 10:39:23 <wm-bb> <Pau> valhallasw
2022-04-18 10:39:29 <wm-bb> <Pau> I'll try
2022-04-18 10:39:46 <wm-bb> <Pau> If I get stuck, I'll ask again
2022-04-18 12:30:39 <Rook> !log paws update pywikibot 6db74b6c866021686a49aa8a2c4eb2d5da3bdddb
2022-04-18 12:30:41 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
2022-04-18 13:40:05 <andrewbogott> !log admin shutting down many codfw1dev servers (including network infra!) for T305469
2022-04-18 13:40:08 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
2022-04-18 13:40:08 <stashbot> T305469: codfw: Dedicate Rack B1 for cloudX-dev servers - https://phabricator.wikimedia.org/T305469
2022-04-18 15:23:36 <andrewbogott> !log admin reimaging cloudvirt1020, leaving VMs in place
2022-04-18 15:23:38 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
2022-04-18 15:27:56 <DatGuy> bd808: not sure if I should open another phabricator ticket for this, but would it be possible to install libexiv2-dev on python39 images, similar to how it was done in https://phabricator.wikimedia.org/T213965? Trying to install py3exiv2 on a k8s job.
2022-04-18 15:31:05 <bd808> DatGuy: it would take a new phabricator task, but also I really don't think we will approve this request. We have been very consistent in not installing rarely used libraries and utilities in the containers. It is just not scalable at this time.
2022-04-18 15:31:50 <bd808> There is work in progress towards a build system that will support custom containers. Until that arrives my best advice is to keep using grid engine for this task.
2022-04-18 15:31:55 <taavi> DatGuy: which pip package are you trying to install? looks like https://pypi.org/project/pyexiv2 ships with manylinux wheels meaning that if you have the 'wheel' pip package installed in your venv you can use a pre-compiled version
2022-04-18 15:53:23 <DatGuy> taavi: that's for python2. Trying to install py3exiv2 for python 3 support
2022-04-18 15:57:53 <taavi> if I'm reading the pip page correctly, that supports python 3 and is fully separate from a different project with the exact same name that only supports py2..
2022-04-18 15:58:11 <taavi> at least I can install it inside a python3 container without any issues
2022-04-18 15:58:44 <DatGuy> Not for me. When I try and install py3exiv2 inside a tf-python39 image it throws an error since it doesn't have libexiv2-dev
2022-04-18 15:58:58 <DatGuy> Does work if I just install it in a venv though because of the phab ticket
2022-04-18 16:00:10 <taavi> no, I mean `pyexiv2` (as in the package on pip that supports python 3) installs for me inside a container, `py3exiv2` does not since
2022-04-18 16:06:47 <DatGuy> ah, well it installs but doesn't import. "OSError: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found"
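The OSError above suggests that the shared library inside the pre-compiled `pyexiv2` wheel was linked against glibc 2.29 or newer, while the C library in the container is older. A minimal sketch, assuming a glibc-based Linux image, that checks the running glibc version before attempting the import; `version_tuple` and `glibc_at_least` are hypothetical helper names, and `platform.libc_ver()` is the only standard-library call it relies on.

```python
import platform


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as "2.29" into (2, 29)."""
    return tuple(int(part) for part in version.split("."))


def glibc_at_least(required: str) -> bool:
    """Return True if this interpreter runs against glibc >= required.

    platform.libc_ver() returns something like ("glibc", "2.28") on a
    glibc-based Linux system; for any other C library we return False.
    """
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return False
    return version_tuple(version) >= version_tuple(required)


if __name__ == "__main__":
    # The OSError in the log names the symbol version GLIBC_2.29.
    print("glibc >= 2.29 available:", glibc_at_least("2.29"))
```

If the check fails, the wheel cannot be loaded in that image no matter how it is installed, which matches what DatGuy saw: the package installs but does not import.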
2022-04-18 16:50:50 <wm-bot> !log tools.tool-db-usage <bd808> Updated to 3d178a5 (D1201)
2022-04-18 16:50:55 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.tool-db-usage/SAL
2022-04-18 17:08:30 <AntiComposite> yeah, there are two py3 exiv bindings and they both kinda suck
2022-04-18 17:29:06 <Rook> !log paws updating links to phab with prefilled ticket links aef7c671a69a66a9872a48a24169f1f6bf2ffc4f
2022-04-18 17:29:08 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
2022-04-18 17:30:10 <Rook> !log quarry exposing query history https://phabricator.wikimedia.org/T100982
2022-04-18 17:30:12 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL
2022-04-18 18:34:09 <Rook> !log quarry update phab links to prefilled ticket https://phabricator.wikimedia.org/T303028
2022-04-18 18:34:12 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Quarry/SAL
2022-04-18 19:07:16 <mutante> !log devtools - gitlab-prod-1001 randomly stopped working. we got the "puppet failed" mails without having made changes and can't ssh to the instance anymore when trying to check out why. trying soft reboot via Horizon T297411
2022-04-18 19:07:19 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Devtools/SAL
2022-04-18 19:07:19 <stashbot> T297411: Migrate gitlab-test instance to puppet - https://phabricator.wikimedia.org/T297411
2022-04-18 19:08:19 <mutante> !log devtools - gitlab-prod-1001 is indeed back after soft rebooting the instance. uptime 1 min T297411
2022-04-18 19:08:20 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Devtools/SAL
2022-04-18 19:56:26 <wm-bot> !log tools.heritage <lokal-profil> Adding the -release buster flag to cronjobs as part of Stretch deprecation
2022-04-18 19:56:29 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL
2022-04-18 20:16:50 <wm-bot> !log tools.slumpartikel <lokal-profil> Restarting webservice with the '--release buster' flag. php version compatibility needs to be investigated before migrating to kubernetes
2022-04-18 20:16:52 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.slumpartikel/SAL
2022-04-18 20:30:37 <musikanimal> hello! does the new jobs framework support something like the cron's `-l` flag for scheduled jobs? specifically, I want to use `-l h_rt=` to set a job to run for a maximum amount of time
2022-04-18 20:33:16 <bd808> musikanimal: Kubernetes has a thing for that (.spec.activeDeadlineSeconds), but I don't think that a.rturo has exposed that in the current wrapper api. I bet he didn't even think of the potential need for it, so a feature request would probably be the right next step.
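For reference, this is roughly what the field bd808 mentions looks like on a raw Kubernetes Job. A minimal sketch that builds a `batch/v1` Job manifest as a Python dict; it is not the jobs framework's interface (which, per the discussion, does not expose this field), and `build_job_manifest` plus the name, image and command in the usage block are hypothetical placeholders.

```python
import json


def build_job_manifest(name: str, image: str, command: list[str],
                       max_runtime_seconds: int) -> dict:
    """Build a batch/v1 Job manifest as a plain dict.

    spec.activeDeadlineSeconds caps how long the Job may stay active,
    which is the closest Kubernetes analogue to grid engine's -l h_rt=.
    """
    if max_runtime_seconds <= 0:
        raise ValueError("max_runtime_seconds must be positive")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "activeDeadlineSeconds": max_runtime_seconds,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = build_job_manifest(
        name="mybot-update",                    # hypothetical job name
        image="example.org/hypothetical:latest",  # hypothetical image
        command=["python3", "update.py"],       # hypothetical script
        max_runtime_seconds=2 * 60 * 60,        # like -l h_rt=2:00:00
    )
    print(json.dumps(manifest, indent=2))
```

Once the Job has been active for `activeDeadlineSeconds`, Kubernetes terminates its running pods and marks the Job as failed with reason `DeadlineExceeded`.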
2022-04-18 20:33:57 <musikanimal> so this would be an upstream feature request?
2022-04-18 20:34:38 <bd808> no, a feature request for the new framework (which is a local project)
2022-04-18 20:34:49 <musikanimal> okay got it. I'll create a Phab ticket :)
2022-04-18 20:35:40 <musikanimal> also, is there something akin to `-once`?
2022-04-18 20:36:19 <AntiComposite> you're the second person to ask about a max runtime limit
2022-04-18 20:40:35 <bd808> I think `.spec.parallelism = 1` is basically the same as `-once`, but again I don't think that a.rturo thought to expose that to users yet.
2022-04-18 20:41:06 <AntiComposite> it defaults to 1 anyway
2022-04-18 20:41:24 <bd808> https://kubernetes.io/docs/concepts/workloads/controllers/job/ has a pretty good overview of all things a job can do from the Kubernetes POV
2022-04-18 20:41:27 <AntiComposite> and for CronJobs, "concurrencyPolicy": "Forbid" is set
2022-04-18 20:41:43 <musikanimal> I was wondering if perhaps that was default behaviour
2022-04-18 20:41:56 <AntiComposite> which will be similar to what running jsub -once from cron will do
2022-04-18 20:43:43 <AntiComposite> it probably would be useful to have an option for concurrencyPolicy: Replace, (which kills the existing job and starts a new one instead of continuing the existing job), but that's a bit more advanced
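The two settings discussed above, `parallelism` on the Job and `concurrencyPolicy` on the CronJob, can be sketched the same way. A minimal sketch, assuming the `batch/v1` CronJob API (older clusters used `batch/v1beta1`), that builds a CronJob manifest as a Python dict and validates `concurrencyPolicy`; `build_cronjob_manifest` and the example values are hypothetical, and this is plain Kubernetes rather than an option the jobs framework is known to expose.

```python
import json

# Values Kubernetes accepts for CronJob .spec.concurrencyPolicy.
CONCURRENCY_POLICIES = {"Allow", "Forbid", "Replace"}


def build_cronjob_manifest(name: str, schedule: str, image: str,
                           command: list[str],
                           concurrency_policy: str = "Forbid") -> dict:
    """Build a batch/v1 CronJob manifest as a plain dict.

    "Forbid"  skips a scheduled run while the previous one is still
              running (similar to jsub -once launched from cron).
    "Replace" stops the still-running job and starts the new run instead.
    "Allow"   lets runs overlap (the Kubernetes default).
    """
    if concurrency_policy not in CONCURRENCY_POLICIES:
        raise ValueError(f"unknown concurrencyPolicy: {concurrency_policy}")
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            "concurrencyPolicy": concurrency_policy,
            "jobTemplate": {
                "spec": {
                    "parallelism": 1,  # one pod at a time (also the default)
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [
                                {"name": name, "image": image,
                                 "command": command}
                            ],
                        }
                    },
                }
            },
        },
    }


if __name__ == "__main__":
    cron = build_cronjob_manifest("mybot-hourly", "0 * * * *",
                                  "example.org/hypothetical:latest",
                                  ["python3", "update.py"], "Replace")
    print(json.dumps(cron, indent=2))
```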
2022-04-18 20:44:14 <bd808> is actually using Replace in a job in the Wikimedia production Kubernetes cluster
2022-04-18 20:47:41 <musikanimal> not sure if the tags are right, but here we are: https://phabricator.wikimedia.org/T306391
2022-04-18 20:48:09 <musikanimal> thanks everyone for your help! I'm going to test out the jobs framework on my other bots that don't need a max runtime
2022-04-18 21:31:49 <AmandaNP> btw how do I know if I need NFS?
2022-04-18 21:37:21 <bd808> AmandaNP: the main reason to need NFS in a Cloud VPS project is for file systems that need to be shared across multiple instances. An example of this is the $HOME for each tool in Toolforge which needs to be available on the bastions, the grid engine exec instances, and the kubernetes exec instances.
2022-04-18 21:38:02 <bd808> Most Cloud VPS projects do not need NFS
2022-04-18 21:38:24 <AmandaNP> Ahh ok. So I don't really need it. It's been converted to a VM, so can I just delete the instance or do I need to do more to not break things
2022-04-18 21:41:11 <bd808> AmandaNP: It sounds like you are saying that a.ndrewbogott moved NFS for a project from the old shared NFS server platform to a new instance inside your project. Before deleting the NFS server instance you would likely need to change some config in the other project instances to remove their config to mount from the NFS server.
2022-04-18 21:42:41 <AntiComposite> https://phabricator.wikimedia.org/T301295
2022-04-18 21:43:17 <AmandaNP> yeah I missed that essentially ^
2022-04-18 21:43:41 <bd808> *nod* And if T208414 is right that was just being used for $HOME?
2022-04-18 21:43:42 <stashbot> T208414: Check whether utrs project requires NFS or not - https://phabricator.wikimedia.org/T208414
2022-04-18 21:44:28 <AmandaNP> afaik yes
2022-04-18 21:45:47 <bd808> That probably takes a little work to unwind. You would probably want to get the $HOME directories locally on utrs-production.utrs.eqiad1.wikimedia.cloud before deleting the NFS server instance.
2022-04-18 21:46:07 <bd808> not too hard though I would hope...
2022-04-18 21:46:26 <AmandaNP> hmm. yeah I don't even know where to start with changing $HOMEs
2022-04-18 21:47:49 <bd808> pokes around a bit
2022-04-18 21:48:56 <bd808> It looks like the NFS server data is on a CEPH volume so it might be really easy actually.
2022-04-18 21:52:34 <bd808> AmandaNP: The data is on the volume "utrs-nfs". You could attach that volume to the utrs-production instance directly, mount it at /srv, and then make /home a symlink to /srv/utrs/home. The only other bits needed would be the nfs mount logic cleanup.
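The last step bd808 describes, making `/home` a symlink into the attached volume, could look roughly like the sketch below. It assumes the `utrs-nfs` volume has already been attached and mounted at `/srv` and that the NFS mount configuration has been cleaned up; `point_home_at_volume` and the `home.pre-volume` backup name are hypothetical. The `root` parameter exists so the logic can be tried on a throwaway directory first, as the usage block does, before anyone runs it as root against `/`.

```python
import os
import tempfile
from pathlib import Path


def point_home_at_volume(root: Path = Path("/"),
                         target: str = "srv/utrs/home") -> Path:
    """Replace <root>/home with a symlink to <root>/<target>.

    Assumes the volume is already mounted at <root>/srv and /home is no
    longer an NFS mount. The old /home is renamed, not deleted, so it can
    still be inspected or restored afterwards.
    """
    home = root / "home"
    new_home = root / target
    if not new_home.is_dir():
        raise FileNotFoundError(f"{new_home} missing; is the volume mounted?")
    if home.is_symlink():
        # Already converted: accept it only if it points where we expect.
        if Path(os.readlink(home)) == new_home:
            return home
        raise RuntimeError(f"{home} already links to {os.readlink(home)}")
    if home.exists():
        home.rename(root / "home.pre-volume")
    home.symlink_to(new_home)
    return home


if __name__ == "__main__":
    # Dry run against a throwaway directory tree instead of the real /.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "home" / "someuser").mkdir(parents=True)
        (root / "srv" / "utrs" / "home").mkdir(parents=True)
        link = point_home_at_volume(root)
        print(link, "->", os.readlink(link))
```

Renaming the old directory instead of deleting it keeps the change reversible, which matters given AmandaNP's concern about not wanting to break things by guessing.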
2022-04-18 21:53:09 <bd808> if that all sounds like a foreign language, you could reopen the T301295 and ask for help killing off NFS :)
2022-04-18 21:53:10 <stashbot> T301295: Does the cloud-vps UTRS project need NFS? - https://phabricator.wikimedia.org/T301295
2022-04-18 21:53:54 <AmandaNP> It sounds logical, but something I don't want to screw up guessing. I'll reopen and quote you.
2022-04-19 15:04:47 <ragesoss> hold onto your butts
2022-04-19 15:05:13 <ragesoss> good luck andrewbogott
2022-04-19 15:05:59 <andrewbogott> Going to be a longer outage than I planned because we're setting things to r/o and letting replication settle down first, it was running behind.
2022-04-19 15:12:27 <Lucas_WMDE> 🤞
2022-04-19 15:29:10 <andrewbogott> !log stopping all VMs on cloudvirt1019, reimaging host
2022-04-19 15:29:11 <stashbot> andrewbogott: Unknown project "stopping"
2022-04-19 15:29:19 <andrewbogott> !log admin stopping all VMs on cloudvirt1019, reimaging host
2022-04-19 15:29:20 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
2022-04-19 16:19:43 <andrewbogott> OK, I believe toolsdb to be back to normal-ish. Would love it if someone else here (e.g. Lucas_WMDE ) can confirm
2022-04-19 16:20:35 <Lucas_WMDE> https://quickcategories.toolforge.org/ looks fine again
2022-04-19 16:20:39 <Lucas_WMDE> though that’s only testing read access
2022-04-19 16:20:53 <andrewbogott> Can you check r/w? It was read only for a while
2022-04-19 16:21:01 <Lucas_WMDE> https://dicare.toolforge.org/lexemes/challenge.php also looks fine again, and that tool apparently writes on GET requests
2022-04-19 16:21:10 <andrewbogott> nice
2022-04-19 16:21:13 <andrewbogott> thank you!
2022-04-19 16:24:55 <Lucas_WMDE> and I just saw a write from QuickCategories too, so it’s definitely working :)
2022-04-19 16:25:05 <Lucas_WMDE> (dicare isn’t mine, it was just mentioned elsewhere)
2022-04-19 16:25:14 <Lucas_WMDE> (as being already broken during the r/o period)
2022-04-19 16:29:30 <wm-bot> !log tools.quickcategories <lucaswerkmeister> removed EXPECTED_DATABASE_ERROR from config again now that ToolsDB outage is over
2022-04-19 16:29:32 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.quickcategories/SAL
2022-04-19 18:06:32 <MacFan4000> !log wm-bot updated a cloak in the admins config
2022-04-19 18:06:34 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wm-bot/SAL

This page is generated from SQL logs, you can also download static txt files from here