[02:59:02] I want to nominate video2commons but ehhh it is currently down (re @bd808: The third edition of Coolest Tool Award is looking for nominations until October 27! What is your favourite Wikimedia-related tool & what makes it so awesome? https://meta.wikimedia.org/wiki/Coolest_Tool_Award)
[03:04:27] I think @chicocvenancio is working on a fix. The Python version v2c is written for does not work with the new Let’s Encrypt TLS signing chain.
[09:57:21] !log twl bump RAM quota from 16GB to 32GB (T292100)
[09:57:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Twl/SAL
[09:57:28] T292100: Request increased quota for twl Cloud VPS project - https://phabricator.wikimedia.org/T292100
[10:19:16] !log toolsbeta Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
[10:19:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:19:46] !log toolsbeta Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
[10:19:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:46:23] !log admin updating python3-neutron across the fleet (T292936)
[10:46:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:08:15] That's fixed, as far as I know. @jeromi_mikhael might be referring to the YouTube 429s (re @bd808: I think @chicocvenancio is working on a fix. The Python version v2c is written for does not work with the new Let’s Encrypt TLS signing chain.)
[11:18:56] !log toolsbeta Added a new grid webgrid generic node toolsbeta-sgewebgen-09-1.toolsbeta.eqiad1.wikimedia.cloud to the pool (T292465) - cookbook ran by dcaro@vulcanus
[11:18:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:17:27] So
[12:17:35] Is the youtube problem fixed
[12:24:23] no, not permanently, at least.
It works from time to time (re @jeromi_mikhael: Is the youtube problem fixed)
[12:25:25] So it means that v2c can upload YouTube videos only at certain times (re @chicocvenancio: no, not permanently, at least. It works from time to time)
[13:29:16] yeah, whenever google allows it.
[13:29:56] Any certain time for that? Or is it random (re @chicocvenancio: yeah, whenever google allows it.)
[13:30:33] It would be nice if v2c had some kind of notification system to notify users when the youtube uploading is up
[17:49:37] !log tools.lexeme-forms deployed e5c87ff53c (remove type ignore comments) and updated dependencies, including Flask 2.0.2
[17:49:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[18:22:09] !log tools.quickcategories deployed 7a5d6823e2 (remove type ignore comments), updated dependencies including Flask 2.0.2
[18:22:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.quickcategories/SAL
[19:40:02] hm, in two tools now, `webservice restart` has told me that “your job is not running, starting...”
[19:40:14] but when it eventually finished, `kubectl get pods` says the pod is still old (27h)
[19:40:37] in the other tool (quickcategories) I did a rollout restart of the deployment manually
[19:41:03] for this one (ranker) I’d be happy to leave it as it is for now in case anyone wants to investigate
[19:47:55] same thing happened in pagepile-visual-filter now
[19:47:59] (`webservice restart` exited 0 btw)
[19:53:49] majavah: ^ possibly related to the k8s version update?
[20:15:16] !log toolhub Updated demo server to 94b5a03
[20:15:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolhub/SAL
[20:21:20] no, I think it's related to T266844
[20:21:20] T266844: Use common k8s labels in tools-webservice - https://phabricator.wikimedia.org/T266844
[20:21:50] compare the labels on https://k8s-status.toolforge.org/namespaces/tool-ranker/pods/ranker-76cb7dccd-dm9fb/ to the newly-started https://k8s-status.toolforge.org/namespaces/tool-anticompositetools/pods/anticompositetools-684dcc6865-zn4dg/
[20:23:00] the ranker pod still has the tools.wm.o labels, not the app.k8s.io labels. So when webservice looks for pods with those labels, it finds none
[20:23:34] ah, interesting
[20:30:58] *nod* the label change would at least make the "your job is not running" part make sense.
[20:31:22] so presumably if I do a `webservice rollout restart` that won’t change anything about that, it’ll still have the old labels
[20:31:38] if I really wanted my tool to work with webservice right now I should stop and then start again
[20:31:47] (but I think I’m okay working around it for now)
[20:32:45] webservice stop did work for me on anticompositetools, though I'm not sure why. it is supposed to use the same selector
[20:32:47] lucaswerkmeister: you could get really fancy and manually add the expected labels to your pod
[20:32:54] hehe ^^
[20:34:01] looking at signatures, the deployment and the replicaset have both the new and old labels
[20:35:19] ah, and request_stop deletes the deployment first anyway, deleting the pod is a cleanup step expected to fail.
[20:38:23] looks like just deleting the pod (which is all `webservice restart` does anyway) won't work, the new pod doesn't get the new labels.
[20:39:26] hmmm... that sounds buggy. The deployment and replica set have the labels but make pods that do not?
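A minimal sketch of the selector mismatch described at 20:23:00: an equality-based label selector matches a pod only if the pod carries every requested key/value pair, so a pod that still has only the old labels is invisible to a lookup that uses the new ones. The label keys here are hypothetical placeholders (the log never quotes the real keys), and `find_pods` is an illustrative helper, not part of tools-webservice.

```python
from typing import Dict, List

# Hypothetical placeholder keys; the real label keys are not quoted in the log.
OLD_KEY = "old.example/name"
NEW_KEY = "new.example/name"

PODS: Dict[str, Dict[str, str]] = {
    # Started before the label change: old labels only.
    "ranker-76cb7dccd-dm9fb": {OLD_KEY: "ranker"},
    # Started after the label change: carries the new label.
    "anticompositetools-684dcc6865-zn4dg": {NEW_KEY: "anticompositetools"},
}


def find_pods(pods: Dict[str, Dict[str, str]], selector: Dict[str, str]) -> List[str]:
    """Return names of pods whose labels satisfy every key/value in selector."""
    return [
        name
        for name, labels in pods.items()
        if all(labels.get(key) == value for key, value in selector.items())
    ]


# Looking for ranker by the new label finds nothing, which would explain
# a "your job is not running" message even though the old pod is still up.
print(find_pods(PODS, {NEW_KEY: "ranker"}))
# The same pod is still found when selecting on the old label.
print(find_pods(PODS, {OLD_KEY: "ranker"}))
```

Under these assumptions the first lookup returns an empty list and the second returns the old ranker pod.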
[20:40:19] https://k8s-status.toolforge.org/namespaces/tool-signatures/pods/signatures-674787f45c-nn689/
[20:41:42] AntiComposite: the deployment for that tool is 404 days old so ... yeah.
[20:42:11] correction: the deployment does not have the labels, the replicaset does have both
[20:42:38] but not in the pod template
[20:43:21] correction correction: the deployment and replicaset both have both sets of labels. but not in the pod template. and 2-space indentation is not enough for my brain.
[20:43:35] tl;dr `webservice stop && webservice start`
[20:44:48] my guess is that someone ( majavah? ) ran a cleanup script to add the labels to the replica set itself, but missed updating the template. but yeah "have you tried turning it off and on again" applies almost any time k8s things get funky
[20:48:47] probably worth a mailing list note unless someone's planning to patch the objects
[20:49:35] no new pod gets created even when you run `webservice restart`, so this should only result in confusion and annoyance, not broken webservices
[20:51:07] (because there's already a deployment with the right labels)
[21:10:19] * bstorm is someone...and yes she missed the templates when scripting adding all the labels retroactively
[21:10:40] Damn it. I'd caught all the existing pods, but unless you actually stop and then start, the change won't pick up
[21:11:43] That's true AntiComposite, but I do hate creating more confusion.
[21:13:06] I could possibly try adding the labels to all templates retroactively.
[21:16:57] that would automatically restart the pods, right? because they wouldn’t match the spec anymore
[21:17:07] (not saying you shouldn’t do it, just that this would be an effect iiuc)
[21:18:44] it could, yes. It's also a bit of a pain compared to adding labels.
legoktm and I missed that detail when we were looking at rolling this out (that the labels in the template would be goofed)
[21:19:10] As I mess with the scripting for it, it's actually kind of messy in general by comparison
[21:20:38] I'll test a bit before trying it.
[21:20:56] would it make sense to change webservice to at least recommend a stop/start when in STATE_PENDING ?
[21:22:53] Perhaps. It might be good to include in any documentation that stop/start might fix things anyway. It's the full "turn it off and turn it on again" since it recreates more of the stack. I'm hoping this particular odd case will not be something to consider in a bit though.
[21:29:56] Well, there's a good reason not to backfill the value:
[21:29:57] LabelSelectorRequirement(nil)}: field is immutable
[21:30:05] Sigh.
[21:34:05] AntiComposite and bd808: Apparently, you cannot change the label matching for an existing deployment. To fix any confusion caused by the label change, a stop and start for any webservice that was running prior to the label change is going to be a strongly suggested thing.
[21:34:51] bstorm: thanks for digging into the rabbit hole
[21:34:52] I can send an email to that effect.
it's real easy to add labels...it's not so easy to change label expectations of a running deployment
[21:41:37] so you could add the new labels to the pod template, but it would only kick the can down the road as the deployment and replicaset would only care about the old labels until stop/start
[21:42:16] so everything would still have to be stop/started eventually to be able to not set the old labels
[21:44:27] Yeah
[21:44:50] No real point in doing a lot of work on that when people really should just stop/start
[21:57:10] bstorm: okay, thanks
[23:23:37] !log toolhub Updated Demo server to 563f977
[23:23:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolhub/SAL
[23:28:35] !log tools.ranker deployed 3fd9ab9f20 (remove type ignore comments), updated dependencies including Flask 2.0.2, fully restarted webservice (stop/start) to avoid label issues
[23:28:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ranker/SAL
[23:29:47] !log tools.pagepile-visual-filter deployed 66e06c58c5 (remove type ignore comments), updated dependencies including Flask 2.0.2, fully restarted webservice (stop/start) to avoid label issues
[23:29:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.pagepile-visual-filter/SAL
[23:31:25] !log tools.lexeme-forms fully restarted webservice (stop/start) to avoid label issues
[23:31:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
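A toy model of the conclusion reached in the discussion above, assuming only what the log states: labels added to a deployment object do not reach new pods unless they are also in the pod template, the selector of an existing deployment cannot be changed ("field is immutable"), and a stop/start recreates the deployment. `Deployment`, `ImmutableFieldError`, `patch_selector` and `stop_start` are hypothetical names for illustration, not the Kubernetes or tools-webservice API, and the label keys are placeholders.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical placeholder labels; the real keys are not quoted in the log.
OLD = {"old.example/name": "ranker"}
NEW = {"new.example/name": "ranker"}


class ImmutableFieldError(Exception):
    """Stands in for the API server's 'field is immutable' rejection."""


@dataclass
class Deployment:
    metadata_labels: Dict[str, str]  # labels on the deployment object itself
    selector: Dict[str, str]         # which pods the deployment owns
    template_labels: Dict[str, str]  # labels stamped onto every new pod

    def new_pod_labels(self) -> Dict[str, str]:
        # New pods copy the pod template, not the deployment's own labels.
        return dict(self.template_labels)

    def patch_selector(self, selector: Dict[str, str]) -> None:
        # An existing deployment refuses any change to its selector.
        if selector != self.selector:
            raise ImmutableFieldError("field is immutable")


def stop_start() -> Deployment:
    # Assumption: a recreated deployment uses the new labels throughout.
    return Deployment(dict(NEW), dict(NEW), dict(NEW))


# State after the retroactive labelling: object labels updated, template missed.
backfilled = Deployment({**OLD, **NEW}, dict(OLD), dict(OLD))
print(backfilled.new_pod_labels())    # restarted pods still get only old labels
print(stop_start().new_pod_labels())  # recreated deployment yields new labels
```

In this model a restarted pod keeps the old labels, patching the selector raises, and only recreating the deployment produces pods with the new labels, which matches the "just stop/start" advice.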