[00:40:25] serviceops, Infrastructure-Foundations, SRE, conftool, and 2 others: Scap deploy failed to depool codfw servers - https://phabricator.wikimedia.org/T327041 (Papaul) @Joe will do
[10:07:15] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Update Kubernetes clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (elukey)
[10:47:07] serviceops, MW-on-K8s, SRE, observability, Patch-For-Review: New mediawiki.httpd.accesslog topic on kafka-logging + logstash and dashboard - https://phabricator.wikimedia.org/T324439 (Clement_Goubert) There was a typo made when creating the topics (`mediawiki.http.accesslog` instead of `media...
[10:55:03] for your consideration https://gerrit.wikimedia.org/r/880898 should possibly be two CRs rather than one but the logic stands
[10:55:24] that said thumbor is in a known state at the moment, and this introduces an unknown state even if it's technically a desired one
[11:04:40] serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1001 for host mc1048.eqiad.wmnet with OS bullseye
[11:28:36] <_joe_> hnowlan: I was looking at https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#monitor%20fail
[11:29:08] <_joe_> not sure if we also want to add such a condition, maybe on the request queue getting too large, if that makes sense
[11:30:00] <_joe_> something like
[11:31:38] <_joe_> monitor fail if queue() > 100
[11:32:05] serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1001 for host mc1048.eqiad.wmnet with OS bullseye completed: - mc1048 (**PASS**) - Downtimed on Icinga/Alertmanager - Disa...
[11:41:27] _joe_: oh, nice
[11:41:42] <_joe_> hnowlan: that's completely untested btw
[11:42:03] <_joe_> but on the positive side, you have a nice haproxy container where you can run experiments
[11:42:06] there's also avg_queue() which would be handy if we were keeping metal instances around. This is new to me though
[12:27:28] serviceops, Kubernetes: Show less diff context by default on helm apply - https://phabricator.wikimedia.org/T326205 (Clement_Goubert) I've pushed my investigation a bit further. I can't make it default **but** you can use `--context` with apply. To get a shorter diff: `helmfile -e eqiad -i apply --conte...
[14:28:15] I see a lot of alerts on operations channel. Should we hold on deployments to kubernetes?
[14:28:38] We're all hands on deck on an incident right now
[14:29:17] nemo-yiannis: so if it can wait until we've got that under control, it'd be awesome :p
[14:29:23] yeah, sounds good
[16:05:36] serviceops, serviceops-collab, GitLab (CI & Job Runners): Standardize Debian package builds on GitLab CI - https://phabricator.wikimedia.org/T304491 (LSobanski) p:Triage→Low
[16:45:43] serviceops, GitLab, serviceops-collab, Kubernetes: Trusted gitlab runner containers need access to staging k8s cluster - https://phabricator.wikimedia.org/T325385 (LSobanski) Open→Resolved a:Dzahn Resolving as it seems like the original request was addressed. If there's follow up disc...
[19:04:31] serviceops: Upgrade mc* and mc-gp* hosts to Debian Bullseye - https://phabricator.wikimedia.org/T293216 (jijiki)
[19:33:52] serviceops, Parsoid, SRE, Scap: scap groups on bastions still needed? - https://phabricator.wikimedia.org/T327066 (Dzahn) +1, I also think those are not relevant anymore. If something like this is needed it should be done from deployment servers or maybe mwmaint but not bastions. I would say rem...
[19:48:07] serviceops, Data-Engineering-Planning, Discovery-Search (Current work), Event-Platform Value Stream (Sprint 07), Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (Ottomata) Rats, neither the [[ https://gerrit.wikimedia.org/r/879618 | NetworkPolicy...