[01:57:45] serviceops: Consider lifting AssembleUploadChunks and PublishStashedFile out of the low-traffic consumer - https://phabricator.wikimedia.org/T379035#10320938 (Scott_French) In progress→Resolved No issues encountered throughout the rest of the day today - i.e., job execution and backlog time on th...
[03:29:45] serviceops, Dumps-Generation, MediaWiki-Platform-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432#10321111 (Reedy)
[08:01:53] I did add containerd steps to https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Exec_into_a_pod_and_run_commands
[08:38:10] serviceops, MediaWiki-extensions-PropertySuggester, MW-on-K8s, Wikidata, and 2 others: [PS] Update PropertySuggester update process for mwscript-k8s - https://phabricator.wikimedia.org/T376604#10321498 (ArthurTaylor) a:ArthurTaylor→None
[10:37:34] serviceops, Kubernetes: Create tool to monitor and automatically delete misbehaving pods - https://phabricator.wikimedia.org/T379901 (Joe) NEW
[10:45:33] serviceops, Citoid, VisualEditor, VisualEditor-MediaWiki-References, and 2 others: Register Citoid as a "friendly bot" (or alternatively verified bot) with Cloudflare - https://phabricator.wikimedia.org/T370118#10321896 (akosiaris) >>! In T370118#10313191, @Mvolz wrote: > Any news? None whatsoe...
[10:53:33] serviceops, Kubernetes: Create tool to monitor and automatically delete misbehaving pods - https://phabricator.wikimedia.org/T379901#10321904 (Joe) I should clarify: I've done some research before opening the task, and I didn't find any tool that does something like this. If anyone is aware of a similar...
[10:53:46] serviceops, Kubernetes: Create tool to monitor and automatically delete misbehaving pods - https://phabricator.wikimedia.org/T379901#10321905 (hnowlan)
[10:53:47] serviceops, Structured-Data-Backlog, Thumbor: Thumbor workers hang indefinitely when conducting some tiff operations, leading to user-facing error - https://phabricator.wikimedia.org/T374350#10321906 (hnowlan)
[10:57:22] serviceops, Kubernetes: Create tool to monitor and automatically delete misbehaving pods - https://phabricator.wikimedia.org/T379901#10321909 (Joe)
[11:08:25] serviceops, Prod-Kubernetes, Kubernetes: Migrate wikikube-codfw to containerd - https://phabricator.wikimedia.org/T377877#10321923 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.reimage-stacked-control-plane started by jayme@cumin2002 Reimaging k8s control planes of cluster wikikube-codfw: container...
[12:15:51] serviceops, Kubernetes: Create tool to monitor and automatically delete misbehaving pods - https://phabricator.wikimedia.org/T379901#10322163 (akosiaris)
[12:18:59] serviceops, Kubernetes: Create tool to monitor and automatically delete misbehaving pods - https://phabricator.wikimedia.org/T379901#10322167 (akosiaris) I 've done a number of minor improvements in wording and syntax, but otherwise this would work. We can also explore other alternatives, e.g. we can iss...
[12:24:21] serviceops, Kubernetes: Create tool to monitor and automatically delete misbehaving pods - https://phabricator.wikimedia.org/T379901#10322177 (JMeybohm)
[12:31:44] serviceops, Data Products, Dumps-Generation, MW-on-K8s, and 2 others: Migrate current-generation dumps to run from our containerized images - https://phabricator.wikimedia.org/T352650#10322183 (BTullis) >>! In T352650#10276308, @MatthewVernon wrote: > I suspect I've missed something, so forgive a...
[12:31:52] serviceops: wikikube-worker13[05-12] implementation tracking - https://phabricator.wikimedia.org/T377022#10322188 (Clement_Goubert) Open→In progress p:Triage→Medium
[12:33:43] serviceops, Patch-For-Review: wikikube-worker13[05-12] implementation tracking - https://phabricator.wikimedia.org/T377022#10322199 (Clement_Goubert)
[12:54:01] serviceops, Data Products, Data-Platform-SRE, Dumps-Generation, and 3 others: Migrate current-generation dumps to run from our containerized images - https://phabricator.wikimedia.org/T352650#10322255 (BTullis) I'll add the #epic tag and move this back to the main #data-platform-sre workboard, so...
[13:04:21] serviceops, Kubernetes: Create tool to monitor and automatically delete misbehaving pods - https://phabricator.wikimedia.org/T379901#10322298 (JMeybohm) I would like to see some more details on how this compares to "proper" readiness/liveness probes and in which cases we would use this instead of the bui...
[13:05:06] serviceops, Prod-Kubernetes, Kubernetes: Migrate wikikube-codfw to containerd - https://phabricator.wikimedia.org/T377877#10322300 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.reimage-stacked-control-plane started by jayme@cumin2002 Reimaging k8s control planes of cluster wikikube-codfw: container...
[13:59:29] serviceops, Prod-Kubernetes, Kubernetes: Update kubeconform schema and CI checks to new target Kubernetes version - https://phabricator.wikimedia.org/T379919 (Jelto) NEW
[14:00:15] serviceops, Data-Platform-SRE, Prod-Kubernetes, Kubernetes: Update Kubernetes clusters to >1.25 - https://phabricator.wikimedia.org/T341984#10322530 (Jelto)
[14:51:18] serviceops, Infrastructure-Foundations, netops, Kubernetes: Reimage one of the wikikube-worker1240 to wikikube-worker1304 node in eqiad as a replacement for wikikube-ctrl1001 - https://phabricator.wikimedia.org/T379790#10322697 (akosiaris) Cool, thanks. In that case, I randomly picked `wikikube-w...
[14:52:14] serviceops, Infrastructure-Foundations, netops, Kubernetes: Reimage one of the wikikube-worker1240 to wikikube-worker1304 node in eqiad as a replacement for wikikube-ctrl1001 - https://phabricator.wikimedia.org/T379790#10322699 (akosiaris)
[15:00:02] serviceops, Kubernetes: Create tool to monitor and automatically delete misbehaving pods - https://phabricator.wikimedia.org/T379901#10322742 (Joe)
[15:18:44] serviceops, Data-Platform-SRE, Prod-Kubernetes, Kubernetes: Update Kubernetes clusters to >1.25 - https://phabricator.wikimedia.org/T341984#10322798 (JMeybohm)
[15:23:34] serviceops, Data-Platform-SRE, Prod-Kubernetes, Kubernetes: Update Kubernetes clusters to >1.25 - https://phabricator.wikimedia.org/T341984#10322818 (JMeybohm) @klausman could you please take a look at the kserve/knative-serving version numbers/requirements/dependencies? I always struggle to unde...
[15:24:12] serviceops, DC-Ops, ops-codfw, Prod-Kubernetes, and 2 others: wikikube-ctrl2002: Switch network cable from port 2 to port 1 on the 10G NIC - https://phabricator.wikimedia.org/T379719#10322813 (Jhancock.wm) I could do this today. or we can wait until next week. assuming no one wants to do a mainte...
[15:28:49] serviceops, DC-Ops, ops-codfw, Prod-Kubernetes, and 2 others: wikikube-ctrl2002: Switch network cable from port 2 to port 1 on the 10G NIC - https://phabricator.wikimedia.org/T379719#10322881 (ops-monitoring-bot) depool host wikikube-ctrl2002.codfw.wmnet by jayme@cumin2002 with reason: None
[15:28:52] serviceops, DC-Ops, ops-codfw, Prod-Kubernetes, and 2 others: wikikube-ctrl2002: Switch network cable from port 2 to port 1 on the 10G NIC - https://phabricator.wikimedia.org/T379719#10322882 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by jayme@cumin2002 depool for ho...
[15:45:24] serviceops, DC-Ops, ops-codfw, Prod-Kubernetes, and 2 others: wikikube-ctrl2002: Switch network cable from port 2 to port 1 on the 10G NIC - https://phabricator.wikimedia.org/T379719#10322923 (ops-monitoring-bot) pool host wikikube-ctrl2002.codfw.wmnet by jayme@cumin2002 with reason: None
[15:45:28] serviceops, DC-Ops, ops-codfw, Prod-Kubernetes, and 2 others: wikikube-ctrl2002: Switch network cable from port 2 to port 1 on the 10G NIC - https://phabricator.wikimedia.org/T379719#10322926 (ops-monitoring-bot) Cookbook cookbooks.sre.k8s.pool-depool-node started by jayme@cumin2002 pool for host...
[15:47:28] serviceops, Patch-For-Review: wikikube-worker13[05-12] implementation tracking - https://phabricator.wikimedia.org/T377022#10322928 (Clement_Goubert)
[15:49:21] serviceops, DC-Ops, ops-codfw, Prod-Kubernetes, and 2 others: wikikube-ctrl2002: Switch network cable from port 2 to port 1 on the 10G NIC - https://phabricator.wikimedia.org/T379719#10322929 (JMeybohm) Open→Resolved a:JMeybohm @Jhancock.wm swapped the cable into port 1, I've chan...
[16:18:18] serviceops, Patch-For-Review: wikikube-worker13[05-12] implementation tracking - https://phabricator.wikimedia.org/T377022#10323052 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker1305.eqiad.wmnet with OS bullseye
[16:42:28] serviceops, Kubernetes: Create tool to monitor and automatically delete misbehaving pods - https://phabricator.wikimedia.org/T379901#10323224 (RLazarus)
[16:59:28] serviceops: wikikube-worker13[05-12] implementation tracking - https://phabricator.wikimedia.org/T377022#10323310 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker1305.eqiad.wmnet with OS bullseye completed: - wikikube-worker1305 (**PASS**) - D...
[17:11:04] serviceops: wikikube-worker13[05-12] implementation tracking - https://phabricator.wikimedia.org/T377022#10323376 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker1306.eqiad.wmnet with OS bullseye
[17:18:35] serviceops: wikikube-worker13[05-12] implementation tracking - https://phabricator.wikimedia.org/T377022#10323424 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker1307.eqiad.wmnet with OS bullseye
[17:19:06] serviceops: wikikube-worker13[05-12] implementation tracking - https://phabricator.wikimedia.org/T377022#10323427 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker1308.eqiad.wmnet with OS bullseye
[17:21:21] serviceops: wikikube-worker13[05-12] implementation tracking - https://phabricator.wikimedia.org/T377022#10323443 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker1309.eqiad.wmnet with OS bullseye
[17:25:08] serviceops: wikikube-worker13[05-12] implementation tracking - https://phabricator.wikimedia.org/T377022#10323455 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker1310.eqiad.wmnet with OS bullseye
[17:25:51] serviceops: wikikube-worker13[05-12] implementation tracking - https://phabricator.wikimedia.org/T377022#10323457 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker1311.eqiad.wmnet with OS bullseye
[17:26:28] serviceops: wikikube-worker13[05-12] implementation tracking - https://phabricator.wikimedia.org/T377022#10323459 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker1312.eqiad.wmnet with OS bullseye
[17:27:34] serviceops, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10323460 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2139.codfw.wmnet with O...
[17:52:27] serviceops: wikikube-worker13[05-12] implementation tracking - https://phabricator.wikimedia.org/T377022#10323557 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker1306.eqiad.wmnet with OS bullseye completed: - wikikube-worker1306 (**PASS**) - D...
[17:59:25] serviceops: wikikube-worker13[05-12] implementation tracking - https://phabricator.wikimedia.org/T377022#10323561 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker1307.eqiad.wmnet with OS bullseye completed: - wikikube-worker1307 (**PASS**) - D...
[18:02:24] serviceops: wikikube-worker13[05-12] implementation tracking - https://phabricator.wikimedia.org/T377022#10323563 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker1309.eqiad.wmnet with OS bullseye completed: - wikikube-worker1309 (**PASS**) - D...
[18:05:59] serviceops: wikikube-worker13[05-12] implementation tracking - https://phabricator.wikimedia.org/T377022#10323573 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker1308.eqiad.wmnet with OS bullseye completed: - wikikube-worker1308 (**PASS**) - D...
[18:09:34] serviceops: wikikube-worker13[05-12] implementation tracking - https://phabricator.wikimedia.org/T377022#10323582 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker1311.eqiad.wmnet with OS bullseye completed: - wikikube-worker1311 (**PASS**) - D...
[18:16:22] serviceops: wikikube-worker13[05-12] implementation tracking - https://phabricator.wikimedia.org/T377022#10323624 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker1310.eqiad.wmnet with OS bullseye completed: - wikikube-worker1310 (**PASS**) - D...
[18:20:41] serviceops: wikikube-worker13[05-12] implementation tracking - https://phabricator.wikimedia.org/T377022#10323646 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker1312.eqiad.wmnet with OS bullseye completed: - wikikube-worker1312 (**PASS**) - D...
[20:02:16] serviceops, Dumps 2.0 (Kanban Board): noc.wikimedia.org is slow and it times out sporadically - https://phabricator.wikimedia.org/T379968 (xcollazo) NEW
[20:05:57] serviceops, Dumps 2.0 (Kanban Board): noc.wikimedia.org is slow and it times out sporadically - https://phabricator.wikimedia.org/T379968#10324204 (xcollazo) Example behavior, with 3 requests taking <= 1 sec, and the 4th one taking 30s: ` mediawiki-config % curl 'https://noc.wikimedia.org/db.php?dc=eqiad...
[20:06:03] serviceops, Dumps 2.0 (Kanban Board): noc.wikimedia.org is slow and it times out sporadically - https://phabricator.wikimedia.org/T379968#10324205 (xcollazo) a:xcollazo→None
[20:47:49] serviceops, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10324342 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2139.codfw.wmnet with OS bo...
[20:51:31] serviceops, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10324347 (Jhancock.wm)
[20:55:39] serviceops, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker21[36-55] - https://phabricator.wikimedia.org/T377027#10324348 (Jhancock.wm) Open→Resolved @Clement_Goubert This one's complete. took me a minute to get that last one to behave.
[21:19:21] serviceops, Wikimedia-Site-requests, WMF-General-or-Unknown, Patch-For-Review: Setup missing.php layer redirects for wikipedia hosting the other projects too - https://phabricator.wikimedia.org/T376923#10324451 (Pppery) Open→Resolved With the caveat above.
[21:25:49] serviceops, Wikimedia-Site-requests, WMF-General-or-Unknown: Setup missing.php layer redirects for wikipedia hosting the other projects too - https://phabricator.wikimedia.org/T376923#10324475 (Pppery)
[22:36:09] serviceops, Dumps 2.0 (Kanban Board): noc.wikimedia.org is slow and it times out sporadically - https://phabricator.wikimedia.org/T379968#10324659 (Scott_French) A couple of points of note: Comparing istio-reported error rates for mw-misc in eqiad (https://grafana.wikimedia.org/goto/a4cH6UGHR?orgId=1) v...
[22:46:29] serviceops, Kubernetes: Create tool to monitor and automatically delete misbehaving pods - https://phabricator.wikimedia.org/T379901#10324674 (Scott_French)
[23:27:59] serviceops, Patch-For-Review: Extend x-wikimedia-debug-routing.lua to support PHP 8.1 mw-debug deployment - https://phabricator.wikimedia.org/T372605#10324767 (Scott_French)
[23:28:06] serviceops, Patch-For-Review: Turn up PHP 8.1-flavored mw-debug k8s deployment - https://phabricator.wikimedia.org/T372604#10324768 (Scott_French)
[23:59:22] serviceops, Patch-For-Review: Turn up PHP 8.1-flavored mw-debug k8s deployment - https://phabricator.wikimedia.org/T372604#10324917 (Scott_French)
[23:59:29] serviceops, Dumps-Generation, MediaWiki-Platform-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432#10324918 (Scott_French)