[01:51:11] serviceops, FlaggedRevs, WMF-JobQueue: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) - https://phabricator.wikimedia.org/T378385#10308033 (Scott_French) Thanks for flagging, all. Yes, this looks like another isolation failure on the low-traffic consumer, and appears to have largely self-r...
[01:52:14] serviceops, Commons, MediaWiki-Uploading, UploadWizard: Repeated UploadWizard failures: "Server did not respond in time" - https://phabricator.wikimedia.org/T379462#10308039 (Scott_French)
[10:23:54] serviceops, Thumbor: Majority of thumbor containers on pods occasionally getting into a stuck state - https://phabricator.wikimedia.org/T374350#10308933 (hnowlan) If we see a recurrence of this in future, please [[ https://wikitech.wikimedia.org/wiki/Kubernetes/Administration#Isolate_a_pod_from_traffic_a...
[11:21:17] serviceops, Prod-Kubernetes, Kubernetes: Migrate wikikube-codfw to containerd - https://phabricator.wikimedia.org/T377877#10309120 (JMeybohm)
[13:06:54] serviceops, Commons, MediaWiki-Uploading, UploadWizard: Repeated UploadWizard failures: "Server did not respond in time" - https://phabricator.wikimedia.org/T379462#10309502 (jijiki) p:Triage→High
[13:07:56] serviceops, FlaggedRevs, WMF-JobQueue: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) - https://phabricator.wikimedia.org/T378385#10309503 (jijiki) p:Triage→High
[13:13:32] serviceops, Thumbor: Majority of thumbor containers on pods occasionally getting into a stuck state - https://phabricator.wikimedia.org/T374350#10309516 (jijiki)
[13:26:58] serviceops, Thumbor: Majority of thumbor containers on pods occasionally getting into a stuck state - https://phabricator.wikimedia.org/T374350#10309543 (hnowlan) tldr: we have an issue with tiff conversion that is causing workers to block indefinitely, revealing a multitude of issues. Comparing the 24...
[13:29:16] serviceops, Structured-Data-Backlog, Thumbor: Thumbor workers hang indefinitely when conducting some tiff operations, leading to user-facing error - https://phabricator.wikimedia.org/T374350#10309548 (hnowlan) p:Medium→High
[13:31:46] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309557 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1305.eqiad.wmnet with OS...
[13:32:20] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309560 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1306.eqiad.wmnet with OS...
[13:32:36] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309561 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1307.eqiad.wmnet with OS...
[13:32:43] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309562 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1308.eqiad.wmnet with OS...
[13:32:51] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309563 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1309.eqiad.wmnet with OS...
[13:33:28] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309564 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1310.eqiad.wmnet with OS...
[13:35:09] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309570 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1311.eqiad.wmnet with OS...
[13:35:10] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309571 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1312.eqiad.wmnet with OS...
[13:49:20] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309607 (Jclark-ctr)
[13:54:06] serviceops, MediaWiki-extensions-PropertySuggester, MW-on-K8s, Wikidata, and 2 others: [PS] Update PropertySuggester update process for mwscript-k8s - https://phabricator.wikimedia.org/T376604#10309621 (Lucas_Werkmeister_WMDE) Thanks! IMHO 1 seems like the best option, but we’ll look into how the...
[14:22:06] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309692 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1305.eqiad.wmnet with OS book...
[14:26:17] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309707 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1307.eqiad.wmnet with OS book...
[14:31:17] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309721 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1308.eqiad.wmnet with OS book...
[14:32:02] serviceops, Structured-Data-Backlog, Thumbor: Thumbor workers hang indefinitely when conducting some tiff operations, leading to user-facing error - https://phabricator.wikimedia.org/T374350#10309733 (hnowlan) p:High→Unbreak!
[14:33:04] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309736 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1306.eqiad.wmnet with OS book...
[14:33:12] serviceops, Thumbor: Alert on high per-pod error rate - https://phabricator.wikimedia.org/T379559 (hnowlan) NEW
[14:33:16] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309737 (Jclark-ctr)
[14:33:20] serviceops, Thumbor: Alert on high Thumbor per-pod error rate - https://phabricator.wikimedia.org/T379559#10309751 (hnowlan)
[14:33:36] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309752 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1312.eqiad.wmnet with OS book...
[14:37:42] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309760 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1310.eqiad.wmnet with OS book...
[14:45:13] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309808 (Jclark-ctr)
[14:45:43] serviceops, Thumbor: Thumbor haproxy readiness check isn't failing on unhealthy pods - https://phabricator.wikimedia.org/T379561 (hnowlan) NEW
[15:03:58] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309883 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1309.eqiad.wmnet with OS book...
[15:04:14] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309884 (Jclark-ctr)
[15:04:56] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309887 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1311.eqiad.wmnet with OS book...
[15:05:36] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309889 (Jclark-ctr)
[15:07:28] serviceops, MediaWiki-Platform-Team (Radar), Patch-For-Review: Regenerate UcfirstOverrides.php for PHP 7.4 -> 8.1 transition - https://phabricator.wikimedia.org/T372603#10309901 (Krinkle)
[15:07:29] serviceops, Dumps-Generation, MediaWiki-Platform-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432#10309902 (Krinkle)
[15:09:28] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q2:rack/setup/install wikikube-worker13[05-12] - https://phabricator.wikimedia.org/T377021#10309892 (Jclark-ctr) Open→Resolved a:Clement_Goubert→Jclark-ctr
[15:13:36] serviceops, Dumps-Generation, MediaWiki-Platform-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432#10309903 (Krinkle)
[16:02:54] serviceops, Structured-Data-Backlog, Thumbor: Reconsider use of `timeout` in Thumbor - https://phabricator.wikimedia.org/T379569 (hnowlan) NEW
[16:04:54] serviceops, Thumbor: Thumbor haproxy readiness check isn't failing on unhealthy pods - https://phabricator.wikimedia.org/T379561#10310105 (hnowlan) p:Triage→Unbreak! a:hnowlan
[16:39:29] serviceops, Structured-Data-Backlog, Thumbor: Reconsider use of `timeout` in Thumbor - https://phabricator.wikimedia.org/T379569#10310178 (hnowlan) p:Triage→High
[17:07:18] serviceops: Establish a proper process for replacing kafka nodes - https://phabricator.wikimedia.org/T373189#10310216 (jijiki)