[09:35:36] serviceops, Content-Transform-Team-WIP, CX-cxserver, RESTBase Sunsetting, and 2 others: Decommission cxserver endpoints from RESTBase - https://phabricator.wikimedia.org/T372753#10339207 (akosiaris)
[09:38:14] serviceops, Kubernetes: Create tool to monitor and automatically delete misbehaving pods - https://phabricator.wikimedia.org/T379901#10339210 (Joe) To put to bed the ideas of implementing this as a plugin for alertmanager, I don't think we should limit this to only work with prometheus. We might want to...
[09:46:53] serviceops, Kubernetes: Create tool to monitor and automatically delete misbehaving pods - https://phabricator.wikimedia.org/T379901#10339238 (JMeybohm) >>! In T379901#10339210, @Joe wrote: > To put to bed the ideas of implementing this as a plugin for alertmanager, I don't think we should limit this to...
[09:50:43] serviceops, MediaWiki-extensions-PropertySuggester, MW-on-K8s, Wikidata, and 3 others: [PS] Update PropertySuggester update process for mwscript-k8s - https://phabricator.wikimedia.org/T376604#10339243 (karapayneWMDE)
[10:33:31] serviceops: kafka-main100[6789] and kafka-main1010 implementation tracking - https://phabricator.wikimedia.org/T363214#10339342 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=1cb13fea-2d29-4686-a824-95c9972431a0) set by jiji@cumin1002 for 1 day, 0:00:00 on 2 host(s) and their services with...
[10:34:22] serviceops, API Platform, MediaWiki-extensions-ReadingLists, MW-Interfaces-Team, and 2 others: Reading List REST Interface: reroute calls - https://phabricator.wikimedia.org/T348493#10339361 (MSantos)
[10:41:08] with mwscript-k8s I need to inspect the exit code of the script, found that it's in the pod resources at status.containerStatuses[@name=$CONTAINER_NAME].state.terminated.exitCode, I can get this info doing 'kubectl get pod -l job-name=$MW_CONTAINER', is this a correct way or are there simpler ways to get that info based on the ids returned by mwscript-k8s -o json?
[10:41:41] s/-l job-name=$MW_CONTAINER/-l job-name=$J
[10:41:46] meh...
[10:41:56] sorry: s/-l job-name=$MW_CONTAINER/-l job-name=$JOB_NAME/
[10:53:10] dcausse: well kubectl get pod mwscript.mediawikicontainer
[11:00:54] serviceops: wikikube-worker13[13-28] implementation tracking - https://phabricator.wikimedia.org/T380350 (Clement_Goubert) NEW
[11:01:48] serviceops: wikikube-worker13[13-28] implementation tracking - https://phabricator.wikimedia.org/T380350#10339457 (Clement_Goubert) p:Triage→Medium
[11:02:41] claime: not sure I understand, for instance I just ran a failed script and got "job": "mw-script.codfw.kftdqx7r", "mediawiki_container": "mediawiki-kftdqx7r-app"
[11:03:08] yeah, then you can either use the label selector like you were doing
[11:03:36] ooooh yeah got it
[11:03:37] claime: sounds good, thanks!
[11:03:42] container not pod
[11:03:50] * claime isn't totally awake
[11:03:53] :)
[11:29:57] serviceops, Prod-Kubernetes, Kubernetes: Reimaging a kubernetes control-plane invalidates service-account tokens issued by it - https://phabricator.wikimedia.org/T380142#10339541 (JMeybohm) Open→Resolved a:JMeybohm The change has been rolled out to all control-planes. I'll let the old...
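The exit-code lookup discussed between 10:41 and 11:03 can be scripted. What follows is a minimal Python sketch, not part of mwscript-k8s. It assumes the pods are listed with `kubectl get pod -l job-name=<job> -o json` (the Kubernetes Job controller sets the `job-name` label on the pods it creates) and that the container to inspect is the `mediawiki_container` value printed by `mwscript-k8s -o json`. Whether the `job` value from that output can be used verbatim as the label value, and which namespace to query, are assumptions to verify. `list_job_pods` and `container_exit_code` are hypothetical helper names.

```python
"""Sketch: read a mwscript-k8s container's exit code from pod status.

Assumptions (hypothetical helpers, not part of mwscript-k8s): pods are
selected by the Job controller's `job-name` label, and the container name
is the `mediawiki_container` value printed by `mwscript-k8s -o json`.
"""
import json
import subprocess
from typing import Any, Optional


def list_job_pods(job_name: str, namespace: Optional[str] = None) -> dict[str, Any]:
    """Run `kubectl get pod -l job-name=<job_name> -o json` and parse the output."""
    cmd = ["kubectl", "get", "pod", "-l", f"job-name={job_name}", "-o", "json"]
    if namespace:  # namespace is an assumption; pass whatever mwscript-k8s uses
        cmd += ["-n", namespace]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def container_exit_code(pod_list: dict[str, Any], container: str) -> Optional[int]:
    """Return state.terminated.exitCode for `container`, or None if not terminated.

    Walks items[].status.containerStatuses[] for entries whose `name`
    matches. If a Job retried and several pods match, the container with
    the latest `finishedAt` (RFC 3339 UTC strings sort lexicographically) wins.
    """
    best: Optional[tuple[str, int]] = None  # (finishedAt, exitCode)
    for pod in pod_list.get("items", []):
        for status in pod.get("status", {}).get("containerStatuses", []):
            if status.get("name") != container:
                continue
            terminated = status.get("state", {}).get("terminated")
            if terminated is None:  # still waiting or running
                continue
            candidate = (terminated.get("finishedAt", ""), terminated["exitCode"])
            if best is None or candidate[0] > best[0]:
                best = candidate
    return None if best is None else best[1]
```

With the IDs from the log this would be called as `container_exit_code(list_job_pods("mw-script.codfw.kftdqx7r"), "mediawiki-kftdqx7r-app")`. Parsing the JSON in Python keeps the retry handling explicit. A kubectl JSONPath filter such as `{.items[*].status.containerStatuses[?(@.name=="mediawiki-kftdqx7r-app")].state.terminated.exitCode}` should return the same field without a script, but it prints every match rather than picking one.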
[11:36:49] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339563 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2143.codfw.wmnet with OS bookworm
[11:37:19] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339564 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2144.codfw.wmnet with OS bookworm
[11:38:22] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339568 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2145.codfw.wmnet with OS bookworm
[11:38:52] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339569 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2146.codfw.wmnet with OS bookworm
[11:39:22] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2147.codfw.wmnet with OS bookworm
[11:39:54] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339573 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2148.codfw.wmnet with OS bookworm
[11:40:24] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339576 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2149.codfw.wmnet with OS bookworm
[12:20:34] serviceops, RESTBase, RESTBase Sunsetting, Traffic: Block traffic to RESTBase /page/related endpoint before it's deprecated - https://phabricator.wikimedia.org/T376297#10339727 (MSantos) p:Triage→Medium
[12:21:25] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339742 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2144.codfw.wmnet with OS bookworm completed: - wikikube-worker2144 (**PASS**) - D...
[12:24:06] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339776 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2149.codfw.wmnet with OS bookworm completed: - wikikube-worker2149 (**PASS**) - D...
[12:27:05] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339790 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2148.codfw.wmnet with OS bookworm completed: - wikikube-worker2148 (**PASS**) - D...
[12:31:13] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339805 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2147.codfw.wmnet with OS bookworm completed: - wikikube-worker2147 (**PASS**) - D...
[12:34:44] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339825 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2145.codfw.wmnet with OS bookworm completed: - wikikube-worker2145 (**PASS**) - D...
[12:39:25] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339853 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2146.codfw.wmnet with OS bookworm completed: - wikikube-worker2146 (**WARN**) - D...
[12:40:31] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339858 (Clement_Goubert)
[12:41:09] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339859 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2150.codfw.wmnet with OS bookworm
[12:41:39] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339861 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2143.codfw.wmnet with OS bookworm completed: - wikikube-worker2143 (**PASS**) - D...
[12:41:40] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339862 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2151.codfw.wmnet with OS bookworm
[12:42:17] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339864 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2152.codfw.wmnet with OS bookworm
[12:42:54] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339866 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2153.codfw.wmnet with OS bookworm
[12:43:37] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339868 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2154.codfw.wmnet with OS bookworm
[12:44:10] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339872 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2155.codfw.wmnet with OS bookworm
[12:44:19] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10339874 (Clement_Goubert) Open→In progress
[13:23:29] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10340032 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2150.codfw.wmnet with OS bookworm completed: - wikikube-worker2150 (**PASS**) - D...
[13:26:17] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10340042 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2153.codfw.wmnet with OS bookworm completed: - wikikube-worker2153 (**PASS**) - D...
[13:33:41] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10340061 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2155.codfw.wmnet with OS bookworm completed: - wikikube-worker2155 (**PASS**) - D...
[13:33:50] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10340082 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2154.codfw.wmnet with OS bookworm completed: - wikikube-worker2154 (**PASS**) - D...
[13:38:51] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10340100 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2152.codfw.wmnet with OS bookworm completed: - wikikube-worker2152 (**PASS**) - D...
[13:41:25] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10340118 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2151.codfw.wmnet with OS bookworm completed: - wikikube-worker2151 (**PASS**) - D...
[13:42:07] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10340124 (Clement_Goubert)
[13:57:11] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10340197 (ops-monitoring-bot) pool host wikikube-worker[2136-2139,2141-2155].codfw.wmnet by cgoubert@cumin1002 with reason: None
[13:57:14] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10340211 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by cgoubert@cumin1002 pool for host wikikube-worker[2136-2139,2141-2155].codfw.wmnet completed: - wikikube-worker[2136-213...
[13:57:26] serviceops: wikikube-worker21[36-55] implementation tracking - https://phabricator.wikimedia.org/T377028#10340217 (Clement_Goubert) In progress→Stalled All done and pooled except 2140 waiting on {T380265}
[14:34:58] serviceops, MediaWiki-Engineering, MediaWiki-Platform-Team, Web-Team, OKR-Work: Testing and verification of MediaWiki on PHP 8.1 in mwdebug-next - https://phabricator.wikimedia.org/T379986#10340329 (MSantos)
[14:37:08] serviceops, Content-Transform-Team, MediaWiki-Engineering, MediaWiki-Platform-Team, and 3 others: Testing and verification of MediaWiki on PHP 8.1 in mwdebug-next - https://phabricator.wikimedia.org/T379986#10340331 (MSantos)
[15:28:42] serviceops, DC-Ops, ops-codfw, SRE: hw troubleshooting: Link down for wikikube-worker2140.codfw.wmnet - https://phabricator.wikimedia.org/T380265#10340624 (Papaul) @Jhancock.wm @Clement_Goubert the interface on the switch side is up ` xe-0/0/26 up up wikikube-worker2140
[15:32:59] serviceops, MW-on-K8s: Functional replacement for importImages.php on Kubernetes - https://phabricator.wikimedia.org/T377497#10340632 (RoyZuo) >>! In T377497#10252955, @Urbanecm_WMF wrote: > > I see @RoyZuo [requested](https://commons.wikimedia.org/w/index.php?oldid=855709324#Allowlist_request_-_toolfor...
[15:39:19] serviceops, DC-Ops, ops-codfw, SRE: hw troubleshooting: Link down for wikikube-worker2140.codfw.wmnet - https://phabricator.wikimedia.org/T380265#10340666 (Clement_Goubert) I just managed to mount the IP addresses on the other interface `eno12399np0` and the link is up. Looks like the wrong one go...
[15:46:30] serviceops, DC-Ops, ops-codfw, SRE: hw troubleshooting: Link down for wikikube-worker2140.codfw.wmnet - https://phabricator.wikimedia.org/T380265#10340713 (Papaul) @Clement_Goubert on your output below you were looking at the second interface (eno12409np1) ` root@wikikube-worker2140:~# ethtool en...
[15:49:36] serviceops, DC-Ops, ops-codfw, SRE: hw troubleshooting: Link down for wikikube-worker2140.codfw.wmnet - https://phabricator.wikimedia.org/T380265#10340717 (Clement_Goubert) Yes, `eno12409np1` was the one where the IPs were originally mounted when I encountered the issue. In order to troubleshoot,...
[15:53:20] serviceops, DC-Ops, ops-codfw, SRE: hw troubleshooting: Link down for wikikube-worker2140.codfw.wmnet - https://phabricator.wikimedia.org/T380265#10340723 (Papaul) Open→Resolved glad all is working. I am resolving this task. Thank you
[15:55:31] serviceops, DC-Ops, ops-codfw, SRE: hw troubleshooting: Link down for wikikube-worker2140.codfw.wmnet - https://phabricator.wikimedia.org/T380265#10340731 (Clement_Goubert) Resolved→Open @Papaul sorry for the misunderstanding, but it's not resolved. The interface that is supposed to have...
[15:58:05] serviceops, DC-Ops, ops-codfw, SRE: hw troubleshooting: Link down for wikikube-worker2140.codfw.wmnet - https://phabricator.wikimedia.org/T380265#10340746 (Papaul) @Clement_Goubert got you now, I will fix it in netbox. Sorry, I misunderstood you.
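The 13:57 pool step targets `wikikube-worker[2136-2139,2141-2155].codfw.wmnet`, a bracketed host range in the ClusterShell-style syntax that cumin host selectors accept. The range deliberately skips 2140, the host still blocked on T380265. The minimal Python sketch below makes that gap explicit. It is not cumin's or ClusterShell's API: `expand_host_range` is a hypothetical helper that handles only a single bracket group of comma-separated numbers and `a-b` ranges, and it does not preserve zero-padding.

```python
import re

# One bracket group of digits, commas and dashes between a prefix and a suffix.
_RANGE = re.compile(r"^(?P<prefix>[^\[\]]*)\[(?P<body>[0-9,\-]+)\](?P<suffix>[^\[\]]*)$")


def expand_host_range(pattern: str) -> list[str]:
    """Expand e.g. 'host[1-3,5].example' into host1..host3 and host5 (.example)."""
    m = _RANGE.match(pattern)
    if m is None:
        return [pattern]  # no (supported) bracket group: treat as a literal host name
    hosts = []
    for part in m.group("body").split(","):
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            numbers = range(start, end + 1)  # ranges are inclusive on both ends
        else:
            numbers = [int(part)]
        hosts.extend(f"{m['prefix']}{n}{m['suffix']}" for n in numbers)
    return hosts


pooled = expand_host_range("wikikube-worker[2136-2139,2141-2155].codfw.wmnet")
print(len(pooled))  # 19
print("wikikube-worker2140.codfw.wmnet" in pooled)  # False
```

Twenty hosts in 2136 to 2155, minus 2140, gives the 19 that were pooled.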
[16:19:38] serviceops, Patch-For-Review: kafka-main100[6789] and kafka-main1010 implementation tracking - https://phabricator.wikimedia.org/T363214#10340791 (jijiki)
[16:50:25] serviceops, MW-on-K8s: mw-videoscaler helm chart fails to render in staging - https://phabricator.wikimedia.org/T380390 (Jelto) NEW
[16:58:28] serviceops, envoy, SRE, Traffic: Upgrade Envoy to >= 1.24 - https://phabricator.wikimedia.org/T380211#10340940 (jijiki) p:Triage→Medium
[19:31:06] serviceops, MW-on-K8s, Release-Engineering-Team (Priority Backlog 📥): Provide an mwdebug functionality on kubernetes - https://phabricator.wikimedia.org/T276994#10341657 (Krinkle) I'm not sure where to put this, but perhaps here is a good place: When using WikimediaDebug, I almost always use mwdebug...
[21:12:09] serviceops, MW-on-K8s, Release-Engineering-Team (Priority Backlog 📥): Provide an mwdebug functionality on kubernetes - https://phabricator.wikimedia.org/T276994#10342007 (Scott_French) Thanks, @Krinkle - this is a good point. Agreed that, as it exists today, we don't have the ability to target a spe...
[21:25:16] hi serviceops! assuming it's going otherwise unused, I'd like to commandeer tomorrow's 18:00Z mediawiki infra (utc late) window for https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1091328
[21:25:23] O
[21:25:40] https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20241121T1800 this one
[21:25:54] I'm going to be bold and edit on wikitech but if there are objections or concerns please poke me asap
[22:08:55] cdanis: I don't know of any other work that would be happening then, I think swfrench-wmf is the only one who might be using it? but I think it's all yours
[22:10:11] rzl: cdanis: nothing planned for the UTC-late infra window on my end :)
[22:12:03] thanks!