[05:17:42] serviceops, MW-on-K8s, Patch-For-Review: Allow running one-off scripts manually - https://phabricator.wikimedia.org/T341553#10369931 (tstarling) Can't do HTTP request. ` $ mwscript-k8s -f -- extensions/WikimediaMaintenance/addWiki.php --wiki=idwikivoyage --allow-existing ... Got no data from https:/...
[08:42:20] headsup: Jelto is joining the work on reimaging all the k8s nodes to bookworm. So if anybody is planning on doing some, please sync with us here
[08:50:08] 👋 I'll start with kubernetes1017 soon
[08:51:37] cool, thanks for that!
[09:31:23] akosiaris: do you recall what the idea was around leaving the names wikikube-worker1034 to wikikube-worker1239 free to be used for remaining renames?
[09:32:06] serviceops, MW-on-K8s, Patch-For-Review: Allow running one-off scripts manually - https://phabricator.wikimedia.org/T341553#10370264 (kostajh) >>! In T341553#10369931, @tstarling wrote: > Can't do HTTP request. > > ` > $ mwscript-k8s -f -- extensions/WikimediaMaintenance/addWiki.php --wiki=idwikivoy...
[09:32:28] the add_k8s_nodes script is not able to do the right thing here (as it just picks last node # + 1) - so it would require patching/manual editing of the created patches
[09:33:06] I wonder if there's a reason you wanted the renamed hosts in the lower numbers (I don't see why that could be a requirement)
[09:35:03] hello folks! If you are ok I'd proceed with the tcp proxy's health checks for tegola in prod: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1099649
[09:36:38] elukey: sgtm
[09:37:40] oh no, where's jouncebot
[09:38:39] jayme there is a comment IIRC, let me find it
[09:38:55] akosiaris: I found https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/7eb5b05f4c6a69c6be3ab54c003aed1c296a8170
[09:39:27] but I don't get the reasoning. It's probably just cosmetics?
[09:39:53] no, it was also to avoid stepping on each other's toes back then
[09:40:04] those hosts were being added while doing reimages
[09:45:48] serviceops, DC-Ops, ops-codfw: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T381244 (JMeybohm) NEW
[09:48:05] serviceops, SRE, Patch-For-Review: mw2420-mw2451 do have unnecessary raid controllers (configured) - https://phabricator.wikimedia.org/T358489#10370345 (JMeybohm)
[09:48:37] akosiaris: ah, I see...so we can probably ignore it now? That's what you're saying?
[09:48:55] ignore it?
[09:49:37] keep on renumbering until you reach 1239, and then start again from 1305? That's what I would suggest
[09:49:42] not use the 'reserved range' for the remaining reimages but just keep counting the numbers up
[09:50:14] define "reserved range", just in case our definitions differ?
[09:50:46] my definition is 1034-1239 :)
[09:51:22] from https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/hieradata/common/kubernetes.yaml#259
[09:53:23] oh, definitely use that one
[09:53:39] hmm...okay :)
[09:53:44] jelto: ^^
[09:54:08] was my math so bad?
[09:54:27] oh no, wait, there were some new hosts added, weren't they?
[09:54:45] yeah, I think so
[09:55:17] it's just that using that 'reserved range' makes life more complicated as add_k8s_nodes.py does not support it
[10:04:06] jayme: we could try to patch the original_hostname_to_workers() function with another exception or add a new optional argument to set the new wikikube-worker id explicitly to override the auto-generated number?
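Editor's note: the two numbering strategies being compared above ("last node # + 1" versus filling unused numbers) can be illustrated with a minimal Python sketch. This is not the code of add_k8s_nodes.py or original_hostname_to_workers(); the function names and the inventory below are hypothetical and exist only for illustration.

```python
def next_id_last_plus_one(used_ids):
    """Behaviour as described in the chat: highest existing number + 1."""
    return max(used_ids) + 1


def next_id_fill_gaps(used_ids, first_id):
    """Alternative: the lowest number >= first_id that is not yet in use."""
    used = set(used_ids)
    candidate = first_id
    while candidate in used:
        candidate += 1
    return candidate


# Made-up inventory with a gap at 1005 and 1006 (hypothetical data).
used = [1001, 1002, 1003, 1004, 1007, 1008]
print(next_id_last_plus_one(used))    # 1009
print(next_id_fill_gaps(used, 1001))  # 1005
```

With the first strategy, gaps are never reused unless the generated patches are edited by hand; with the second, starting the search at the first number of a range fills that range before moving past it.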
[10:04:51] the latter does not seem very intuitive when doing multiple servers at once
[10:05:05] ah yes that's true
[10:05:49] I haven't looked at the code, but I think it should be fine to modify it so that it fills the 'reserved range' by default now
[10:06:02] given all refreshes and extensions are done already
[10:06:37] I'll upload a patch
[10:06:41] thanks
[10:11:58] jayme: https://gitlab.wikimedia.org/repos/sre/serviceops-kitchensink/-/merge_requests/18
[10:12:17] ah, jelto
[10:12:18] jelto: ^
[10:13:24] elukey: do you happen to know why we're setting WebServer.1#HostHeaderCheck=Enabled in the provision cookbook?
[10:13:56] serviceops, MW-on-K8s, Patch-For-Review: Allow running one-off scripts manually - https://phabricator.wikimedia.org/T341553#10370453 (Urbanecm_WMF) I'd somehow expect https://www.mediawiki.org/wiki/Manual:$wgLocalHTTPProxy (set to http://localhost:6501) to kick in, which would avoid the need for a pr...
[10:13:58] I think that makes iDRAC web-ui basically unusable via ssh tunnels
[10:14:30] jayme: no idea
[10:14:41] sweet, uploading a patch then :)
[10:14:55] jayme: racadm set IDRAC.WebServer.HostHeaderCheck 0 if you need to revert that for a bit
[10:14:58] but Riccardo may know, my knowledge of Dells is very limited :D
[10:15:08] checking backlog
[10:15:12] claime: yeah, I know...but it seems...counterintuitive
[10:15:26] it should be working once the provision cookbook has run though
[10:15:49] claime: how's that? The cookbook is what resets HostHeaderCheck to 1
[10:16:29] yeah that's something I had in my todo to check (HostHeaderCheck) because in theory it should work but got reports that it doesn't work
[10:17:05] but haven't had the time to check in which scenario it does or does not.
In theory as we set the hostname properly if we connect via hostname it should work fine and be nice to have it on
[10:18:57] claime, jayme: what do you need to do and what's blocking you
[10:18:58] volans: but we almost never connect via the hostname - or at least I don't
[10:19:14] Me nothing
[10:19:28] because I simply create an ssh tunnel - so I connect via IP
[10:20:09] and that just returns an HTTP 400 (without any context)
[10:20:15] when HostHeaderCheck is 1
[10:21:26] right
[10:21:34] so for the immediate thing just run: racadm set IDRAC.WebServer.HostHeaderCheck 0
[10:21:40] claime: thanks for linking the kitchensink MR. I tested this branch with kubernetes1017.eqiad.wmnet and the script generates wikikube-worker1005.eqiad.wmnet as the new hostname (which is a gap/not used, so it should be fine). So I optimistically approved the MR :)
[10:21:57] or do you have to do it multiple times?
[10:22:41] volans: I'm going to run provisioning for a couple of servers (T358489) effectively enabling HostHeaderCheck on all of them then
[10:23:32] it's not blocking me currently because I know about it and I know how to disable it again...it's just that I saw it in the provision diff and it felt wrong to reconfigure the servers into a state that will require the next person to fix it again
[10:23:40] elukey: I guess we have two options, either provide sre laptop deb package with a helper script that does the ssh tunnel in a way that allows the hostname (like setting a temporary line in /etc/hosts) or we have to just decide to set it off fleetwide
[10:23:44] thoughts?
[10:24:33] jelto: thanks, merged to main
[10:26:00] jelto [12]0[01][56] used to be nodes dedicated to sessionstore, but we removed that snowflake in https://phabricator.wikimedia.org/T379599
[10:26:09] yep
[10:26:13] so I think it's fine to backfill those numbers as well
[10:26:22] I removed the exception in the script
[10:26:27] cool
[10:26:31] we should backfill them imo
[10:26:58] serviceops, Prod-Kubernetes, Kubernetes: Rename wikikube worker nodes during OS reimage - https://phabricator.wikimedia.org/T365571#10370544 (JMeybohm)
[10:27:10] great I'll upload the renaming puppet patch shortly, then you can double check ❄️
[10:29:25] serviceops, MW-on-K8s, Patch-For-Review: Allow running one-off scripts manually - https://phabricator.wikimedia.org/T341553#10370547 (Joe) >>! In T341553#10370264, @kostajh wrote: >>>! In T341553#10369931, @tstarling wrote: >> Can't do HTTP request. >> >> ` >> $ mwscript-k8s -f -- extensions/Wikimed...
[10:29:45] volans: I don't see a lot of value in having it enabled by default, and using the helper script may be forgotten etc.. I'd vote to disable it fleetwide, does it only require a racadm set and not reboot? If so we can probably run this through Moritz/I-F just to have everybody on the same page and then apply it to all Dells
[10:29:59] *no reboot
[10:30:11] it does not require a reboot
[10:30:25] if there is a strong reason to keep it (security wise), then the helper is a good option
[10:32:21] IIRC jo.hn wanted to add it, but I don't remember the details, I remember we chatted about it
[10:33:38] I'm ok either way, the helper might be good anyway (even without the need to touch /etc/hosts) just to set up the tunnel (maybe with an increasing local IP or port to run multiple) and open a browser tab with it :)
[10:34:06] https://github.com/jayme-github/pyflop ;p
[10:35:58] Hi, I guess this is more releng but scap deploy to testservers is failing because the legalteam.wikimedia.org returns 503.
I'm sure my patch can't cause this. The error is this:
[10:36:01] > Uncaught MediaWiki\Config\ConfigException: Translate: Message group subscriptions (TranslateEnableMessageGroupSubscription) are enabled but Echo extension is not installed in /srv/mediawiki/php-1.44.0-wmf.5/extensions/Translate/src/HookHandler.php:438
[10:36:27] (the error is consistent and reliable, check mw-debug)
[10:37:23] haha, it's causing 500 without my patch too.
[10:37:47] serviceops, collaboration-services, Prod-Kubernetes, Kubernetes, Patch-For-Review: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876#10370588 (Jelto)
[10:38:37] Amir1: it's broken in prod as well, wth
[10:39:12] I'm moving forward with my sync as this is not related
[10:39:31] I can create a ticket
[10:39:41] yes, please
[10:41:15] Amir1: https://phabricator.wikimedia.org/T381250
[10:41:46] T381252
[10:41:49] serviceops, Release-Engineering-Team: legalteam wiki reliably returns 500s - https://phabricator.wikimedia.org/T381252 (Ladsgroup) NEW
[10:42:39] I go ping LPL
[10:42:44] serviceops, Release-Engineering-Team: legalteam wiki reliably returns 500s - https://phabricator.wikimedia.org/T381252#10370615 (Ladsgroup) →Duplicate dup:T381250
[10:42:53] serviceops, Release-Engineering-Team: legalteam wiki reliably returns 500s - https://phabricator.wikimedia.org/T381252#10370618 (Clement_Goubert) →Duplicate dup:T381250
[10:45:04] pinged LPL
[10:45:22] serviceops, collaboration-services, Prod-Kubernetes, Kubernetes, Patch-For-Review: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876#10370624 (ops-monitoring-bot) depool host kubernetes1017.eqiad.wmnet by jelto@cumin1002 with reason: Renaming nodes
[10:45:23] Amir1: I'd be curious to know how it got merged when it breaks httpbb
[10:45:34] scap should have at least warned
[10:45:35] serviceops, MW-on-K8s, Patch-For-Review: Allow running one-off scripts manually - 
https://phabricator.wikimedia.org/T341553#10370598 (Joe) So @tstarling please use MWHttpRequest or MultiHttpClient for the time being, I've opened T381251 to fix the underlying issue.
[10:45:53] claime: probably the warning was overridden
[10:45:53] serviceops, collaboration-services, Prod-Kubernetes, Kubernetes, Patch-For-Review: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876#10370633 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by jelto@cumin1002 depool for host kubernetes10...
[10:46:01] (like I did)
[10:46:17] Amir1: we might as well remove the smoke tests then
[10:46:45] because if merging a feature for a wiki makes the tests start failing and people just skip without caring, the tests are useless
[10:47:18] I'd say let's talk to the person to make sure they don't ignore it again
[10:52:15] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Migrate wikikube-codfw to containerd - https://phabricator.wikimedia.org/T377877#10370671 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by jayme@cumin2002 from mw2436 to wikikube-worker2005 completed: - mw2436 (**PASS**...
[10:55:58] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Migrate wikikube-codfw to containerd - https://phabricator.wikimedia.org/T377877#10370698 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by jayme@cumin2002 from mw2437 to wikikube-worker2006 completed: - mw2437 (**PASS**...
[10:56:24] serviceops, DC-Ops, ops-codfw, SRE: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T381244#10370706 (JMeybohm)
[11:02:19] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Migrate wikikube-codfw to containerd - https://phabricator.wikimedia.org/T377877#10370733 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin2002 for host wikikube-worker2006.codfw.wmnet with OS bookworm
[11:02:56] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Migrate wikikube-codfw to containerd - https://phabricator.wikimedia.org/T377877#10370735 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin2002 for host wikikube-worker2005.codfw.wmnet with OS bookworm
[11:04:12] serviceops, Content-Transform-Team, WMDE-TechWish-Maintenance, Epic, Maps (Kartotherian): Allow Kartotherian to use a local HTTP proxy - https://phabricator.wikimedia.org/T381257 (elukey) NEW
[11:06:49] serviceops, SRE, Patch-For-Review: mw2420-mw2451 do have unnecessary raid controllers (configured) - https://phabricator.wikimedia.org/T358489#10370772 (JMeybohm)
[11:07:18] serviceops, collaboration-services, Prod-Kubernetes, Kubernetes, Patch-For-Review: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876#10370773 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by jelto@cumin1002 from kubernetes1017 to wikikube-work...
[11:15:06] serviceops, MW-on-K8s, Patch-For-Review: Allow running one-off scripts manually - https://phabricator.wikimedia.org/T341553#10370811 (Tgr) The actual problem is MWHttpRequest [[https://gerrit.wikimedia.org/g/mediawiki/core/+/f9659bc86e80851eb7b575733dca13a76585523b/includes/http/MWHttpRequest.php#296...
[11:22:49] serviceops, collaboration-services, Prod-Kubernetes, Kubernetes, Patch-For-Review: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876#10370860 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host wikikube-worker1005.eq...
[11:26:56] serviceops, MW-on-K8s, Patch-For-Review: Allow running one-off scripts manually - https://phabricator.wikimedia.org/T341553#10370875 (Joe) >>! In T341553#10370811, @Tgr wrote: > The actual problem is MWHttpRequest [[https://gerrit.wikimedia.org/g/mediawiki/core/+/f9659bc86e80851eb7b575733dca13a765855...
[11:47:33] serviceops, MW-on-K8s, Patch-For-Review: Allow running one-off scripts manually - https://phabricator.wikimedia.org/T341553#10370948 (akosiaris) >>! In T341553#10370264, @kostajh wrote: > I guess you'd need to use https://wikitech.wikimedia.org/wiki/Url-downloader as a proxy config with HttpRequestFa...
[11:58:37] serviceops, Prod-Kubernetes, Kubernetes: Migrate wikikube-codfw to containerd - https://phabricator.wikimedia.org/T377877#10371009 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin2002 for host wikikube-worker2006.codfw.wmnet with OS bookworm completed: - wikikube-worke...
[12:02:07] serviceops, collaboration-services, Prod-Kubernetes, Kubernetes: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876#10371021 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker1005.eqiad.wmnet with OS bookworm...
[12:06:31] serviceops, Prod-Kubernetes, Kubernetes: Migrate wikikube-codfw to containerd - https://phabricator.wikimedia.org/T377877#10371030 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin2002 for host wikikube-worker2005.codfw.wmnet with OS bookworm completed: - wikikube-worke...
[12:14:06] serviceops, DC-Ops, decommission-hardware, ops-codfw, and 2 others: Decommission mc-gp200[1-3].codfw.wmnet - https://phabricator.wikimedia.org/T381174#10371068 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jiji@cumin1002 for hosts: `mc-gp2001.codfw.wmnet` - mc-gp2001.codfw.wm...
[12:18:26] serviceops, Prod-Kubernetes, Kubernetes: Migrate wikikube-codfw to containerd - https://phabricator.wikimedia.org/T377877#10371080 (ops-monitoring-bot) pool host wikikube-worker[2005-2006].codfw.wmnet by jayme@cumin2002 with reason: None
[12:18:27] serviceops, Prod-Kubernetes, Kubernetes: Migrate wikikube-codfw to containerd - https://phabricator.wikimedia.org/T377877#10371081 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by jayme@cumin2002 pool for host wikikube-worker[2005-2006].codfw.wmnet completed: - wikikube-wor...
[12:30:31] serviceops, SRE, Wikimedia-Site-requests, Patch-For-Review: Change $wgMaxArticleSize limit from byte-based to character-based - https://phabricator.wikimedia.org/T275319#10371119 (POMI-OLIYN) We have been waiting for this update for a really long time. It doesn't concern only existing pages (like...
[12:31:56] serviceops, Deployments, Release-Engineering-Team, Wikimedia-production-error: httpb fails upon deployment of 1.44.0-wmf.5 - https://phabricator.wikimedia.org/T380958#10371128 (Clement_Goubert) >>! In T381250#10370809, @Ladsgroup wrote: > Now erroring with: > ` > 11:13:16 Check 'check_testservers...
[13:19:02] serviceops, collaboration-services, Prod-Kubernetes, Kubernetes: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876#10371237 (ops-monitoring-bot) pool host wikikube-worker1005.eqiad.wmnet by jelto@cumin1002 with reason: None
[13:19:03] serviceops, collaboration-services, Prod-Kubernetes, Kubernetes: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876#10371238 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by jelto@cumin1002 pool for host wikikube-worker1005.eqiad.wmnet comp...
[13:22:46] serviceops, collaboration-services, DC-Ops, ops-eqiad, and 2 others: Relabel eqiad kubernetes nodes - https://phabricator.wikimedia.org/T381268 (Jelto) NEW
[13:24:27] serviceops, DC-Ops, decommission-hardware, ops-codfw, and 2 others: Decommission mc-gp200[1-3].codfw.wmnet - https://phabricator.wikimedia.org/T381174#10371258 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jiji@cumin1002 for hosts: `mc-gp[2002-2003].codfw.wmnet` - mc-gp2002.c...
[13:26:28] serviceops, MW-on-K8s, Patch-For-Review: Allow running one-off scripts manually - https://phabricator.wikimedia.org/T341553#10371260 (Tgr) > As for why the local proxy is disabled from cli, I find it surprising to be honest. It was [[http://mediawiki.org/wiki/Special:Code/MediaWiki/11617|added in 20...
[13:30:08] serviceops, collaboration-services, Prod-Kubernetes, Kubernetes: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876#10371268 (ops-monitoring-bot) depool host kubernetes1018.eqiad.wmnet by jelto@cumin1002 with reason: Renaming nodes
[13:30:42] serviceops, collaboration-services, Prod-Kubernetes, Kubernetes: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876#10371274 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by jelto@cumin1002 depool for host kubernetes1018.eqiad.wmnet complet...
[13:37:56] serviceops: kafka-main100[6789] and kafka-main1010 implementation tracking - https://phabricator.wikimedia.org/T363214#10371294 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=1b8e1077-6d61-4aa8-9dd3-51831260ac7d) set by jiji@cumin1002 for 1 day, 0:00:00 on 2 host(s) and their services with...
[13:51:59] serviceops: kafka-main100[6789] and kafka-main1010 implementation tracking - https://phabricator.wikimedia.org/T363214#10371347 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=e69f875e-bd0c-45ff-b223-fbcb65b96846) set by jiji@cumin1002 for 1 day, 0:00:00 on 2 host(s) and their services with...
[14:30:20] serviceops, DC-Ops, decommission-hardware, ops-eqiad, and 2 others: Decommission mc-gp100[1-3].eqiad.wmnet - https://phabricator.wikimedia.org/T381173#10371514 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jiji@cumin1002 for hosts: `mc-gp[1001-1003].eqiad.wmnet` - mc-gp1001.e...
[15:01:50] serviceops, collaboration-services, Prod-Kubernetes, Kubernetes: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876#10371658 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.rename started by jelto@cumin1002 from kubernetes1018 to wikikube-worker1006 completed: - ku...
[15:05:36] serviceops, MW-Interfaces-Team, RESTBase Sunsetting, MW-1.44-notes (1.44.0-wmf.4; 2024-11-19), and 2 others: Switchover plan from RESTbase to REST Gateway for rest_v1/page/html and rest_v1/page/title endpoints - https://phabricator.wikimedia.org/T374683#10371676 (HCoplin-WMF) Hey there! Please co...
[15:08:40] serviceops, collaboration-services, Prod-Kubernetes, Kubernetes: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876#10371689 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host wikikube-worker1006.eqiad.wmnet with OS book...
[16:02:37] serviceops, Content-Transform-Team, MediaWiki-Engineering, MW-Interfaces-Team, and 3 others: Testing and verification of MediaWiki on PHP 8.1 in mwdebug-next - https://phabricator.wikimedia.org/T379986#10372023 (JTweed-WMF)
[16:26:05] serviceops, collaboration-services, Prod-Kubernetes, Kubernetes: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876#10372161 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker1006.eqiad.wmnet with OS bookworm...
[16:43:42] serviceops, DC-Ops, decommission-hardware, ops-codfw, SRE: Decommission mc-gp200[1-3].codfw.wmnet - https://phabricator.wikimedia.org/T381174#10372242 (jijiki)
[16:44:52] serviceops, DC-Ops, decommission-hardware, ops-codfw, SRE: Decommission mc-gp200[1-3].codfw.wmnet - https://phabricator.wikimedia.org/T381174#10372252 (Reedy)
[16:49:31] serviceops, collaboration-services, Prod-Kubernetes, Kubernetes: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876#10372266 (Jelto) Reimage of `wikikube-worker1006` failed because the node is not in icinga/alertmanager. I'll try to find out why tomorrow See: https:...
[17:02:05] serviceops, DC-Ops, decommission-hardware, ops-eqiad, SRE: Decommission mc-gp100[1-3].eqiad.wmnet - https://phabricator.wikimedia.org/T381173#10372328 (jijiki)
[17:04:10] serviceops, DC-Ops, decommission-hardware, ops-eqiad, SRE: Decommission mc-gp100[1-3].eqiad.wmnet - https://phabricator.wikimedia.org/T381173#10372332 (jijiki) >>! In T381173#10371514, @ops-monitoring-bot wrote: > cookbooks.sre.hosts.decommission executed by jiji@cumin1002 for hosts: `mc-gp[...
[17:08:49] serviceops, DC-Ops, ops-codfw, SRE: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T381244#10372349 (Jhancock.wm) Open→Resolved
[17:13:40] serviceops, collaboration-services, Prod-Kubernetes, Kubernetes: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876#10372391 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jelto@cumin1002 for host wikikube-worker1006.eqiad.wmnet with OS book...
[17:23:18] serviceops, DC-Ops, decommission-hardware, ops-eqiad, SRE: Decommission mc-gp100[1-3].eqiad.wmnet - https://phabricator.wikimedia.org/T381173#10372451 (jijiki) a:VRiley-WMF
[17:28:56] serviceops, Prod-Kubernetes, Kubernetes: Improve calico-typha firewall rules - https://phabricator.wikimedia.org/T365687#10372514 (JMeybohm)
[17:28:58] serviceops, collaboration-services, Data-Platform-SRE, Prod-Kubernetes, Kubernetes: Update Kubernetes clusters to >1.25 - https://phabricator.wikimedia.org/T341984#10372515 (JMeybohm)
[17:52:30] serviceops, collaboration-services, Prod-Kubernetes, Kubernetes: Migrate wikikube-eqiad to containerd - https://phabricator.wikimedia.org/T377876#10372616 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jelto@cumin1002 for host wikikube-worker1006.eqiad.wmnet with OS bookworm...
[18:13:26] serviceops, Structured-Data-Backlog, Thumbor: Thumbor workers hang indefinitely when conducting some tiff operations, leading to user-facing error - https://phabricator.wikimedia.org/T374350#10372711 (Aklapper) @hnowlan: Should this remain at "Unbreak now" priority (["Something is broken and needs to...
[18:14:06] serviceops, Structured-Data-Backlog, Thumbor: Thumbor workers hang indefinitely when conducting some tiff operations, leading to user-facing error - https://phabricator.wikimedia.org/T374350#10372724 (hnowlan) p:Unbreak!→High
[21:21:47] serviceops, SRE, Wikimedia-Site-requests, Patch-For-Review: Change $wgMaxArticleSize limit from byte-based to character-based - https://phabricator.wikimedia.org/T275319#10373647 (Aklapper) @POMI-OLIYN: See the "Details" box above linking to the last comments in the proposed patches