[08:20:59] serviceops, Data-Platform-SRE, SRE, Discovery-Search (Current work), Patch-For-Review: SUP: Partition update_pipeline kafka topic - https://phabricator.wikimedia.org/T354064 (CodeReviewBot) pfischer merged https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_req...
[08:40:14] serviceops, MW-on-K8s, SRE, WMF-JobQueue: Moving jobs to MW-on-k8s decreased their timeout from 1200s to 200s - https://phabricator.wikimedia.org/T354229 (Joe) Yes, your understanding is correct; I had a patch fixing this that never got merged, I should just make a new version of that.
[10:06:39] serviceops, Content-Transform-Team-WIP, Page Content Service, RESTBase Sunsetting: Update mobileapps k8s deployment chart for Cassandra credentials - https://phabricator.wikimedia.org/T350507 (Jgiannelos) How can we setup things to be able to use cassandra (on staging for now)? I can send a patch...
[10:18:40] serviceops, Content-Transform-Team-WIP, Page Content Service, RESTBase Sunsetting: Update mobileapps k8s deployment chart for Cassandra credentials - https://phabricator.wikimedia.org/T350507 (Joe) I suggest we standardize on the configuration that we've used for the golang applications using cas...
[10:20:05] serviceops, Prod-Kubernetes, Kubernetes: Migrate wikikube control planes to hardware nodes - https://phabricator.wikimedia.org/T353464 (Clement_Goubert) >>! In T353464#9415585, @akosiaris wrote: > Do we track the IOPS bottlenecks we witnessed in some task? Track no, but what triggered creating {T348...
[10:34:00] serviceops, Prod-Kubernetes, Kubernetes: Migrate wikikube control planes to hardware nodes - https://phabricator.wikimedia.org/T353464 (akosiaris) >>! In T353464#9431952, @Clement_Goubert wrote: >>>! In T353464#9415585, @akosiaris wrote: >> Do we track the IOPS bottlenecks we witnessed in some task?...
[10:40:49] serviceops, Content-Transform-Team-WIP, Page Content Service, RESTBase Sunsetting: Update mobileapps k8s deployment chart for Cassandra credentials - https://phabricator.wikimedia.org/T350507 (Jgiannelos) Sounds good, i will adapt the config to something thats compatible with the snippet.
[12:32:56] serviceops, MW-on-K8s, SRE, WMF-JobQueue: Moving jobs to MW-on-k8s decreased their timeout from 1200s to 200s - https://phabricator.wikimedia.org/T354229 (Urbanecm_WMF) Open→Resolved a:Joe
[13:26:05] serviceops, Data-Platform-SRE, SRE, Discovery-Search (Current work): SUP: Partition update_pipeline kafka topic - https://phabricator.wikimedia.org/T354064 (brouberol) ` brouberol@kafka-test1010:~$ kafka topics --topic codfw.cirrussearch.update_pipeline.update.rc0 --alter --partitions 5 kafka-top...
[13:27:04] serviceops, Data-Platform-SRE, SRE, Discovery-Search (Current work): SUP: Partition update_pipeline kafka topic - https://phabricator.wikimedia.org/T354064 (brouberol)
[15:44:01] serviceops, MW-on-K8s, Scap: Error: failed to download "wmf-stable/mediawiki" when deploying to MW-on-K8s - https://phabricator.wikimedia.org/T333382 (Clement_Goubert) Open→Resolved a:Clement_Goubert It was probably a transient failure of chartmuseum, we should have investigated when it h...
[15:55:47] serviceops, MW-on-K8s, MediaWiki-Platform-Team, Patch-For-Review: mcrouter daemonset on mw-on-k8s - https://phabricator.wikimedia.org/T346690 (Clement_Goubert)
[16:28:00] Hi, I'm working on a standalone legacy eventlogging proxy endpoint in mediawiki.org docroot. This will end up POSTing to eventgate-analytics-external.
[16:28:01] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/985023/8/docroot/mediawiki.org/beacon/event/index.php#32
[16:28:01] Does this URL need to vary based on how MW is deployed? In k8s it would be better to use the local envoy mesh proxy for eventgate-analytics-external. It would be easier to just use the discovery url so we don't need extra configs for this. Will the discovery url be accessible from k8s?
[16:31:23] _joe_: ^ maybe you know?
[17:23:49] serviceops, SRE, ops-codfw: Broken CPU on mw2394 - https://phabricator.wikimedia.org/T354193 (Papaul) Multi-bit memory errors detected on a memory device at location(s) DIMM_B1. Sun 31 Dec 2023 19:43:14 Multi-bit memory errors detected on a memory device at location(s) DIMM_B1. Sun 31 Dec 2023 19:...
[18:07:32] <_joe_> ottomata: it definitely should go via envoy
[18:07:51] <_joe_> and that's true of both metal and k8s
[18:08:17] <_joe_> it won't work on k8s as it's written right now
[18:08:28] <_joe_> and the additional config is actually already in place
[18:09:04] <_joe_> 'eventgate-analytics-external' => 'http://localhost:6013',
[18:09:14] <_joe_> in wmf-config/ProductionServices.php
[18:12:41] <_joe_> added a comment on the patch
[18:12:42] serviceops, SRE, ops-codfw: Broken CPU on mw2394 - https://phabricator.wikimedia.org/T354193 (Papaul) After swapping the CPU and DIMM now i am getting ` CPU 2 MEM012 VPP PG voltage is outside of range. Wed 03 Jan 2024 17:43:07 CPU 1 MEM012 VPP PG voltage is outside of range. ` and the server is n...
[18:21:03] serviceops, SRE, ops-codfw: Broken CPU on mw2394 - https://phabricator.wikimedia.org/T354193 (Papaul) ` Create Dispatch: Success You have successfully submitted request SR182660280. `
[18:44:58] serviceops, SRE: Rebuild PHP 7.4 packages for Bullseye - https://phabricator.wikimedia.org/T350767 (Dzahn) @MoritzMuehlenhoff When this is done, should I expect that there will be a `component/icu67` in distro `wikimedia-bullseye` just like there is now in distro `wikimedia-buster`? I am just wondering...
[18:53:23] serviceops, SRE: Rebuild PHP 7.4 packages for Bullseye - https://phabricator.wikimedia.org/T350767 (MoritzMuehlenhoff) Bullseye has ICU 67 as the default ICU version, as such on Bullseye there will only be component/php74 and nothing else.
[18:56:34] serviceops, SRE: Rebuild PHP 7.4 packages for Bullseye - https://phabricator.wikimedia.org/T350767 (Dzahn) Gotcha! thank you. I will amend my patch accordingly.
[19:09:02] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw2436.codfw.wmnet with OS bullseye
[19:10:18] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw2437.codfw.wmnet with OS bullseye
[19:11:19] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw1377.eqiad.wmnet with OS bullseye
[19:11:49] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw1378.eqiad.wmnet with OS bullseye
[19:18:30] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw2440.codfw.wmnet with OS bullseye
[19:19:13] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw2442.codfw.wmnet with OS bullseye
[19:21:31] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw1379.eqiad.wmnet with OS bullseye
[19:22:29] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw1380.eqiad.wmnet with OS bullseye
[19:32:29] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw2443.codfw.wmnet with OS bullseye
[19:33:28] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw2450.codfw.wmnet with OS bullseye
[19:33:59] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw2451.codfw.wmnet with OS bullseye
[19:34:42] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw1381.eqiad.wmnet with OS bullseye
[19:35:17] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw1382.eqiad.wmnet with OS bullseye
[19:35:48] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw1383.eqiad.wmnet with OS bullseye
[19:53:40] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw2437.codfw.wmnet with OS bullseye completed: - mw2437 (**PASS**) - Downtimed on I...
[19:55:44] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw2440.codfw.wmnet with OS bullseye completed: - mw2440 (**WARN**) - Downtimed on I...
[19:57:37] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw2436.codfw.wmnet with OS bullseye completed: - mw2436 (**PASS**) - Downtimed on I...
[20:04:29] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw2442.codfw.wmnet with OS bullseye completed: - mw2442 (**PASS**) - Downtimed on I...
[20:11:52] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw2451.codfw.wmnet with OS bullseye completed: - mw2451 (**WARN**) - Downtimed on I...
[20:15:51] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw2443.codfw.wmnet with OS bullseye completed: - mw2443 (**PASS**) - Downtimed on I...
[20:17:24] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw2450.codfw.wmnet with OS bullseye completed: - mw2450 (**WARN**) - Downtimed on I...
[20:34:12] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw1377.eqiad.wmnet with OS bullseye executed with errors: - mw1377 (**FAIL**) - Dow...
[20:37:12] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw1378.eqiad.wmnet with OS bullseye executed with errors: - mw1378 (**FAIL**) - Dow...
[20:45:24] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw1379.eqiad.wmnet with OS bullseye executed with errors: - mw1379 (**FAIL**) - Dow...
[20:47:49] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw1380.eqiad.wmnet with OS bullseye executed with errors: - mw1380 (**FAIL**) - Dow...
[20:59:54] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw1381.eqiad.wmnet with OS bullseye executed with errors: - mw1381 (**FAIL**) - Dow...
[21:04:18] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw1382.eqiad.wmnet with OS bullseye executed with errors: - mw1382 (**FAIL**) - Dow...
[21:06:28] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw1383.eqiad.wmnet with OS bullseye executed with errors: - mw1383 (**FAIL**) - Dow...
[21:18:42] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw1377.eqiad.wmnet with OS bullseye
[21:48:11] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=da30f496-adce-4c05-928a-c3187dd1dfd7) set by kamila@cumin1002 for 12:00:00 on 6 host(s) and...
[21:52:39] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw1377.eqiad.wmnet with OS bullseye completed: - mw1377 (**PASS...
[22:24:07] serviceops, Data-Platform-SRE, SRE, Discovery-Search (Current work): SUP: Partition update_pipeline kafka topic - https://phabricator.wikimedia.org/T354064 (pfischer)
[22:26:02] serviceops, Data-Platform-SRE, SRE, Discovery-Search (Current work): SUP: Partition update_pipeline kafka topic - https://phabricator.wikimedia.org/T354064 (pfischer) a:pfischer
[22:36:52] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw1378.eqiad.wmnet with OS bullseye
[22:37:10] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw1379.eqiad.wmnet with OS bullseye
[22:38:06] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw1380.eqiad.wmnet with OS bullseye
[22:38:10] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw1381.eqiad.wmnet with OS bullseye
[22:38:14] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw1382.eqiad.wmnet with OS bullseye
[22:38:18] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host mw1383.eqiad.wmnet with OS bullseye
[23:07:30] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw1379.eqiad.wmnet with OS bullseye executed with errors: - mw1...
[23:11:05] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw1381.eqiad.wmnet with OS bullseye completed: - mw1381 (**WARN...
[23:12:31] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw1378.eqiad.wmnet with OS bullseye completed: - mw1378 (**WARN...
[23:14:18] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw1382.eqiad.wmnet with OS bullseye completed: - mw1382 (**WARN...
[23:15:35] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw1380.eqiad.wmnet with OS bullseye completed: - mw1380 (**WARN...
[23:19:06] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by kamila@cumin1002 for host mw1383.eqiad.wmnet with OS bullseye completed: - mw1383 (**WARN...
[23:50:28] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=f89737b4-3df5-4047-9a72-3c18cfcd7fb8) set by kamila@cumin1002 for 4:00:00 on 1 host(s) and t...
[23:50:40] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=cf052ff1-f621-4320-91c0-2b5c12c4f457) set by kamila@cumin1002 for 12:00:00 on 1 host(s) and...