[00:38:00] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047161 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1283.eqiad.wmnet with OS bullseye...
[00:39:29] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047162 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1279.eqiad.wmnet with OS bullseye...
[00:41:32] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047163 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1282.eqiad.wmnet with OS bullseye...
[00:44:57] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047164 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1281.eqiad.wmnet with OS bullseye...
[00:46:30] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047165 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1280.eqiad.wmnet with OS bullseye...
[00:47:53] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047178 (Jclark-ctr)
[00:50:11] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047179 (Jclark-ctr)
[00:50:28] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047180 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1284.eqiad.wmnet with OS bullseye...
[00:50:31] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047181 (Jclark-ctr)
[00:55:04] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047186 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1295.eqiad.wmnet with OS bull...
[00:55:10] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047187 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1288.eqiad.wmnet with OS bull...
[00:55:19] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047188 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1289.eqiad.wmnet with OS bull...
[00:56:12] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047189 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1290.eqiad.wmnet with OS bull...
[00:56:15] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047190 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1291.eqiad.wmnet with OS bull...
[00:57:01] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047191 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1292.eqiad.wmnet with OS bull...
[00:57:48] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047192 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1293.eqiad.wmnet with OS bull...
[00:58:26] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047193 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1294.eqiad.wmnet with OS bull...
[00:59:03] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047194 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1287.eqiad.wmnet with OS bull...
[01:19:50] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047215 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1285.eqiad.wmnet with OS bullseye...
[01:19:54] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047216 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1286.eqiad.wmnet with OS bullseye...
[01:33:52] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047238 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1295.eqiad.wmnet with OS bullseye...
[01:36:57] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047241 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1291.eqiad.wmnet with OS bullseye...
[01:40:11] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047242 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1293.eqiad.wmnet with OS bullseye...
[01:42:20] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047247 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1292.eqiad.wmnet with OS bullseye...
[01:42:54] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047250 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1296.eqiad.wmnet with OS bull...
[01:44:13] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047251 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1287.eqiad.wmnet with OS bullseye...
[01:52:20] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047253 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1290.eqiad.wmnet with OS bullseye...
[01:55:43] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047254 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1289.eqiad.wmnet with OS bullseye...
[01:57:54] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047255 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1294.eqiad.wmnet with OS bullseye...
[02:02:47] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047258 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1288.eqiad.wmnet with OS bullseye...
[02:02:57] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047259 (Jclark-ctr)
[03:03:10] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10047274 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1296.eqiad.wmnet with OS bullseye...
[09:02:58] serviceops, MW-on-K8s, Observability-Metrics, Grafana: Gaps in Grafana graphs using Thanos - https://phabricator.wikimedia.org/T371885#10047595 (fgiunchedi) I'm working on bumping the statsd-exporter limits, and in the meantime I got curious on general cpu throttling stats in k8s: https://w.wiki/...
[09:40:02] hello folks, going ahead with my update of some wikikube workers
[10:15:51] all done! The last remaining one to run provision on (requires a reboot) is wikikube-ctrl2003.mgmt.codfw.wmnet
[10:21:12] elukey: I did not see this yesterday, thanks!
[10:21:59] jayme: I caused the issue in the first place, cleaning up :D
[10:22:12] not sure what I'd need to do for wikikube-ctrl2003 though
[10:22:15] should be fine to do the ctrl node as well. But maybe do that one in an infra maintenance window to avoid deploys
[10:22:33] okok, anything depool-wise that I'd need to know?
[12:38:43] elukey: you can just depool it via confctl to be sure. Nothing specific
[12:39:54] ooook
[12:42:12] <_joe_> .
[15:06:17] serviceops, MW-on-K8s, Observability-Metrics, Grafana: Gaps in Grafana graphs using Thanos - https://phabricator.wikimedia.org/T371885#10048571 (fgiunchedi) Even though there's no throttling now, e.g. `mw-jobrunner` statsd-exporter still shows as down from e.g. prometheus eqiad k8s: https://prome...
[15:07:53] serviceops, MW-on-K8s, Observability-Metrics, Patch-For-Review, SRE Observability (FY2024/2025-Q1): Create a per-release deployment of statsd-exporter for mw-on-k8s - https://phabricator.wikimedia.org/T365265#10048573 (fgiunchedi) p:High→Medium Update from T371885, throttling didn't i...
[15:22:08] serviceops, MW-on-K8s, Observability-Metrics, Grafana: Gaps in Grafana graphs using Thanos - https://phabricator.wikimedia.org/T371885#10048619 (fgiunchedi) Just a quick update: @hnowlan rightfully pointed out that I was looking at port 9125 as failed, which it does, though statsd-exporter actual...
[15:59:36] serviceops, MW-on-K8s, Observability-Metrics, Grafana: Gaps in Grafana graphs using Thanos - https://phabricator.wikimedia.org/T371885#10048735 (JMeybohm) Regarding the throttling: Maybe it would help to set GOMAXPROCS to something sensible/related to the CPU limit (see https://wikitech.wikimedia...
[15:59:53] serviceops, DC-Ops, ops-eqiad: Q1:rack/setup/install mc-misc100[12] - https://phabricator.wikimedia.org/T371987 (RobH) NEW
[16:00:42] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10048757 (JMeybohm)
[16:00:44] serviceops: kafka-main replacement nodes don't fit kafka-main (storage wise) - https://phabricator.wikimedia.org/T368714#10048758 (JMeybohm)
[16:01:49] serviceops, DC-Ops, ops-eqiad, SRE: Install (2) 960GB SSDs each in kafka-main10[06-10] - https://phabricator.wikimedia.org/T371422#10048760 (JMeybohm)
[16:01:51] serviceops: kafka-main replacement nodes don't fit kafka-main (storage wise) - https://phabricator.wikimedia.org/T368714#10048761 (JMeybohm)
[16:03:14] serviceops, DC-Ops, ops-eqiad: Q1:rack/setup/install mc-misc100[12] - https://phabricator.wikimedia.org/T371987#10048763 (RobH) a:jijiki Effie, The workflow for racking tasks has changed this quarter, once I create the racking task I assign it to the SRE sub-teams point of contact (for this task...
[16:03:20] serviceops: mc-misc100[12] implementation tracking - https://phabricator.wikimedia.org/T371988 (RobH) NEW
[16:03:44] serviceops, DC-Ops, ops-eqiad: Q1:rack/setup/install mc-misc100[12] - https://phabricator.wikimedia.org/T371987#10048782 (RobH)
[16:03:51] serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10048783 (JMeybohm) The nodes are not in service, so no need to schedule a maint-window from our side. Feel free to choose a time that suits you best.
[16:03:54] serviceops, DC-Ops, ops-eqiad, SRE: Install (2) 960GB SSDs each in kafka-main10[06-10] - https://phabricator.wikimedia.org/T371422#10048784 (JMeybohm) The nodes are not in service, so no need to schedule a maint-window from our side. Feel free to choose a time that suits you best.
[17:07:41] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10049048 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1296.eqiad.wmnet with OS bull...
[18:29:09] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10049233 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1296.eqiad.wmnet with OS bullseye...
[18:30:12] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10049237 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host wikikube-worker1296.eqiad.wmnet with OS bull...
[18:45:37] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10049259 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host wikikube-worker1296.eqiad.wmnet with OS bullseye...
[19:18:05] serviceops, MW-on-K8s, Observability-Metrics, Grafana: Gaps in Grafana graphs using Thanos - https://phabricator.wikimedia.org/T371885#10049340 (Scott_French) For the endpoints marked down: it looks as if prometheus is scraping both container ports - i.e., 9102 (correct) and 9125 (statsd listen p...
[19:38:58] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10049405 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host gerrit1004.wikimedia.org with OS bookworm
[20:16:05] serviceops, DC-Ops, ops-eqiad, SRE: Q1:rack/setup/install wikikube-worker1240 to wikikube-worker1304 - https://phabricator.wikimedia.org/T369743#10049555 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host gerrit1004.wikimedia.org with OS bookworm execut...