[09:46:21] o/ seeing Error: execution error at (validate-envoy-config/templates/deployment.yaml:15:12): Listener mw-api-async-transition not found in the proxies (https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1051077)
[09:54:54] might be related to https://gerrit.wikimedia.org/r/c/operations/puppet/+/1047447, I suspect that .fixtures/validate_envoy_config.yaml might need to be updated to stop checking mw-api-async-transition?
[09:55:04] serviceops, Infrastructure-Foundations, netops, Traffic: weighted maglev viability for low-traffic services - https://phabricator.wikimedia.org/T368545#9938878 (fgiunchedi) >>! In T368545#9929623, @Vgutierrez wrote: >>>! In T368545#9929335, @ayounsi wrote: >> I think I miss some context, what's t...
[09:57:00] cc claime ^
[09:57:39] dcausse: yep, fixing
[09:57:54] thanks!
[10:01:01] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1051087
[10:01:08] Thanks for catching it
[10:02:16] claime: <3
[10:02:37] dcausse: you'll be able to rebase and rerun CI once it's merged
[10:02:49] sure
[10:46:01] 👋 Can somebody help us with https://phabricator.wikimedia.org/T366819#9930713 ? It looks like trying to connect from PCS to staging eventgate times out
[10:57:41] nemo-yiannis: I can curl https://staging.svc.eqiad.wmnet:4492/v1/stream-configs from the pod's namespace so I don't think it's networkpolicy related
[10:58:03] hm ok
[11:18:35] serviceops, SRE, Data Products (Data Products Sprint 15), Patch-For-Review, Service-deployment-requests: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production - https://phabricator.wikimedia.org/T361835#9939276 (Sfaci) Hi @Scott_French! Thanks for your suggestion!. Just for cu...
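The 09:46:21 error reads like a lookup that fails when a listener named in the fixture has no matching proxy definition. A minimal Python sketch of that kind of check, assuming a plain list of defined listener names and a list of requested ones. The function names and the `example-*` listener names are hypothetical and not taken from the chart; only `mw-api-async-transition` and the error wording come from the log.

```python
def find_missing_listeners(requested, defined):
    """Return the requested listener names that have no definition, in order."""
    defined_names = set(defined)
    return [name for name in requested if name not in defined_names]


def validate_listeners(requested, defined):
    """Raise ValueError for the first requested listener that is not defined."""
    missing = find_missing_listeners(requested, defined)
    if missing:
        raise ValueError(f"Listener {missing[0]} not found in the proxies")


# Hypothetical data: the fixture still asks for a listener that was removed.
defined = ["example-a", "example-b"]
requested = ["example-a", "mw-api-async-transition"]
print(find_missing_listeners(requested, defined))
```

Under these assumptions, dropping the stale name from the requested list (the fixture update suggested at 09:54:54) makes the check pass again.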
[11:20:25] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9939285 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2026.codfw.wmnet with OS bullseye
[11:22:10] serviceops, SRE, Data Products (Data Products Sprint 15), Patch-For-Review, Service-deployment-requests: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production - https://phabricator.wikimedia.org/T361835#9939303 (SGupta-WMF) @Scott_French I have updated the repo , and tagged the...
[12:01:11] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9939413 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2026.codfw.wmnet with OS bullseye executed with errors: - wi...
[12:01:43] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9939415 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2026.codfw.wmnet with OS bullseye
[12:12:34] serviceops, Content-Transform-Team-WIP, RESTBase, RESTBase Sunsetting, and 2 others: Enable PCS to send resource change events to handle URL purges - https://phabricator.wikimedia.org/T366819#9939478 (akosiaris) For posterity's sake ` nemo-yiannis: I can curl https://staging.svc.eqiad.w...
[12:15:59] hm, yeah of course we are wrapping everything to use service mesh by default so it tried to go through service mesh and failed
[12:16:12] thanks claime for helping with debugging
[12:54:43] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9939573 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2026.codfw.wmnet with OS bullseye executed with errors: - wi...
[12:55:27] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9939576 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2026.codfw.wmnet with OS bullseye
[13:13:25] hnowlan: o/
[13:13:41] elukey: he's ooo today and tomorrow
[13:13:47] ah snap okok
[13:14:04] anybody up for a brainbounce about thumbor-plugins (docker image)?
[13:15:07] <_joe_> elukey: what's up?
[13:17:43] _joe_ I'd need to trigger a rebuild to get a package upgrade (libvpx), and I am wondering what's best. IIUC thumbor-plugins has its own repo/blubber config, so one way could be to change the base image to include a tag (at the moment it is docker-registry.wikimedia.org/python3-build-bookworm)
[13:18:08] otherwise maybe triggering a job in ci, not sure if there is a quicker/smarter option
[13:18:45] <_joe_> elukey: sorry, which image layer adds the package you want to add?
[13:20:51] _joe_ nono I just need a newer version of the package in the docker image, it is already installed
[13:21:08] serviceops, Prod-Kubernetes, Patch-For-Review: Update all helm modules and charts to be compatible with the restricted PSS - https://phabricator.wikimedia.org/T362978#9939646 (dcausse) Hi I'm having issues with a flink job running in staging and failing to deploy with an error: `>>> Status | Erro...
[13:21:08] sort of what we do with production images and the weekly rebuild
[13:21:17] <_joe_> elukey: ok, there is gonna be a layer where it gets installed
[13:21:26] <_joe_> the thumbor final image or what?
[13:22:52] it should be a dependency installed in the thumbor final image yes
[13:22:57] this is why I am asking
[13:23:03] <_joe_> elukey: looking at https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/thumbor-plugins/+/refs/heads/master/.pipeline/blubber.yaml I'd say it's installed in the final image
[13:23:15] <_joe_> so yeah, you just need a null commit to trigger a rebuilt
[13:23:29] <_joe_> *rebuild
[13:24:58] serviceops, Prod-Kubernetes, Patch-For-Review: Update all helm modules and charts to be compatible with the restricted PSS - https://phabricator.wikimedia.org/T362978#9939657 (JMeybohm) Resolved→Open Yes, kind of. What deploy command did you run (for me to reproduce)?
[13:25:15] _joe_ okok I hoped there was a quicker way like triggering a build somehow via CI
[13:25:30] <_joe_> ofc you can
[13:25:33] <_joe_> but it's not faster :D
[13:28:01] always a joy :)
[13:28:09] thanks!
[13:37:24] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9939708 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2026.codfw.wmnet with OS bullseye completed: - wikikube-work...
[13:45:06] serviceops, Prod-Kubernetes, Patch-For-Review: Update all helm modules and charts to be compatible with the restricted PSS - https://phabricator.wikimedia.org/T362978#9939730 (JMeybohm) FTR: It was the cirrus-streaming-updater deployment in staging that failed. Looks like we missed adding the security...
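The "null commit" suggested at 13:23:15 is a commit with no file changes, made only so that the image gets rebuilt and picks up the newer package. A minimal Python sketch, assuming a local checkout and that the repository's pipeline rebuilds the image on each new commit. `null_commit_command`, `create_null_commit` and the commit message are hypothetical; `--allow-empty` is a standard `git commit` option.

```python
import subprocess


def null_commit_command(message):
    """Build the git command for a commit that changes no files."""
    return ["git", "commit", "--allow-empty", "-m", message]


def create_null_commit(repo_dir, message):
    """Run the empty commit inside repo_dir; raises CalledProcessError on failure."""
    subprocess.run(null_commit_command(message), cwd=repo_dir, check=True)


# Hypothetical message; the commit still has to be pushed and merged as usual.
print(null_commit_command("Trigger image rebuild for libvpx upgrade"))
```

The command is built separately from the call that runs it so it can be inspected without touching a repository.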
[14:13:14] serviceops, DC-Ops, ops-codfw, Prod-Kubernetes, and 2 others: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T368743#9939862 (Jhancock.wm) Open→Resolved a:Jhancock.wm
[14:29:43] serviceops, Infrastructure-Foundations, Security: Upgrade K8s docker images to running in production on Buster with either Bullseye or Bookworm - https://phabricator.wikimedia.org/T368366#9939926 (elukey) p:Triage→Medium a:elukey
[14:31:35] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9939935 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host wikikube-worker2026.codfw.wmnet with OS bullseye
[14:33:27] serviceops: deploy1003 implementation tracking - https://phabricator.wikimedia.org/T364417#9939958 (akosiaris) I've applied the role and now working through packaging `python3-imagecatalog` for bullseye
[14:43:18] serviceops, Prod-Kubernetes, Patch-For-Review: Update all helm modules and charts to be compatible with the restricted PSS - https://phabricator.wikimedia.org/T362978#9940043 (JMeybohm) Open→Resolved
[14:59:37] serviceops, MW-on-K8s, Observability-Metrics, Patch-For-Review, SRE Observability (FY2023/2024-Q4): Create a per-release deployment of statsd-exporter for mw-on-k8s - https://phabricator.wikimedia.org/T365265#9940113 (Clement_Goubert) All main deployments of mw-on-k8s now send data to `statsd...
[15:10:42] serviceops, MW-on-K8s: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9940138 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host wikikube-worker2026.codfw.wmnet with OS bullseye completed: - wikikube-work...
[15:42:48] serviceops, Infrastructure-Foundations, Security: Upgrade K8s docker images to running in production on Buster with either Bullseye or Bookworm - https://phabricator.wikimedia.org/T368366#9940448 (elukey) Built and rolled out the images listed in the description to staging envs. The next step is to r...
[16:48:17] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install deploy1003 - https://phabricator.wikimedia.org/T364416#9940819 (akosiaris)
[16:48:49] serviceops: deploy1003 implementation tracking - https://phabricator.wikimedia.org/T364417#9940828 (akosiaris) * python3-imagecatalog published and gerrit repo updated * php72 component made conditional
[16:50:23] serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Q4:rack/setup/install deploy1003 - https://phabricator.wikimedia.org/T364416#9940823 (akosiaris) Open→Resolved Host is imaged, rest of the work is ongoing in T364417
[17:27:07] serviceops, SRE, Data Products (Data Products Sprint 15), Patch-For-Review, Service-deployment-requests: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production - https://phabricator.wikimedia.org/T361835#9941105 (Scott_French)
[17:41:31] serviceops, MW-on-K8s: Handle sidecar containers in one-off Kubernetes jobs - https://phabricator.wikimedia.org/T348284#9941224 (RLazarus) Open→Resolved
[17:57:36] serviceops, SRE, Data Products (Data Products Sprint 15), Patch-For-Review, Service-deployment-requests: Commons Impact Metrics AQS 2.0 Deployment to Staging and Production - https://phabricator.wikimedia.org/T361835#9941344 (Scott_French) Thanks, @SGupta-WMF ! The service is up and running...
[18:27:38] serviceops, MW-on-K8s: Pipe stdin into one-off maintenance scripts on Kubernetes - https://phabricator.wikimedia.org/T368966 (RLazarus) NEW
[18:44:30] serviceops, MW-on-K8s: Pipe stdin into one-off maintenance scripts on Kubernetes - https://phabricator.wikimedia.org/T368966#9941673 (RLazarus)
[19:06:17] serviceops, Wikidata, wmde-wikidata-tech, Discovery-Search (Current work), Patch-For-Review: Ensure that WDQS query throttling does not interfere with federation - https://phabricator.wikimedia.org/T361950#9941769 (Gehel) >>! In T361950#9934236, @dcausse wrote: > tagging #serviceops for help...
[19:33:40] serviceops, 3D, Commons, Regression: STL 3D models broken: "Sorry, the file Undefined cannot be displayed since it is not present on the current page." - https://phabricator.wikimedia.org/T368301#9941900 (TheDJ) @simon04 what do you think ?
[20:31:17] serviceops: deploy1003 implementation tracking - https://phabricator.wikimedia.org/T364417#9942184 (dancy) Noting that having `deploy1003.eqiad.wmnet` in deploy1002:/etc/dsh/group/scap-masters before it is fully set up is causing problems for scap deployments. For example, I got the following when trying to...
[20:36:00] serviceops, MW-on-K8s, Observability-Metrics, Patch-For-Review, SRE Observability (FY2024/2025-Q1): Create a per-release deployment of statsd-exporter for mw-on-k8s - https://phabricator.wikimedia.org/T365265#9942209 (lmata)