[01:35:47] serviceops, MW-on-K8s, Patch-For-Review: Allow running one-off scripts manually - https://phabricator.wikimedia.org/T341553 (RLazarus) Surfacing @JMeybohm's reasonable concern from https://gerrit.wikimedia.org/r/c/988851/comments/3827b6cd_15427748: > Running jobs via helmfile will result in one hel...
[11:20:17] serviceops, MW-on-K8s, TimedMediaHandler, Video: Move video transcoding to use Shellbox - https://phabricator.wikimedia.org/T356241 (Joe)
[11:20:19] serviceops, MW-on-K8s, TimedMediaHandler, MW-1.42-notes (1.42.0-wmf.17; 2024-02-06), and 2 others: Convert TimedMediaHandler to use BoxedCommand/Shellbox - https://phabricator.wikimedia.org/T356242 (Joe) Open→Resolved
[11:24:48] serviceops, MW-on-K8s, Video: Create new flavour of shellbox for video transcoding - https://phabricator.wikimedia.org/T357296 (Joe)
[11:41:07] serviceops, Data-Engineering, Wikidata, Wikidata-Termbox, and 3 others: Migrate Termbox SSR from Node 16 to 18 - https://phabricator.wikimedia.org/T355685 (Lucas_Werkmeister_WMDE) > Patches are up for review! Looks alright to me – I think if another SRE can review the general changes, we can try...
[11:56:06] serviceops, MW-on-K8s, TimedMediaHandler, Video: Create a deployment for `shellbox-timedmedia` - https://phabricator.wikimedia.org/T357309 (Joe)
[12:56:35] serviceops, CX-cxserver, Citoid, Content-Transform-Team-WIP, and 11 others: Migrate node-based services in production to node18 - https://phabricator.wikimedia.org/T349118 (MSantos)
[13:57:08] serviceops, EventStreams, Prod-Kubernetes, Data-Engineering (Sprint 8), and 2 others: eventstreams regularly uses more than 95% of its memory limit - https://phabricator.wikimedia.org/T357005 (gmodena) >>! In T357005#9531775, @tchin wrote: > Looking at the logs, this seems to coincide with the re...
[15:20:49] serviceops, Infrastructure-Foundations, SRE-tools, Wikimedia-Mailing-lists: Support services VIPs with not marked as VIP in Netbox - https://phabricator.wikimedia.org/T295793 (joanna_borun) p:Triage→Medium
[15:48:52] serviceops, SRE, observability, Patch-For-Review: Cert renewal for {appserver,api}.svc.{eqiad,codfw}.wmnet - https://phabricator.wikimedia.org/T304237 (joanna_borun)
[15:49:39] serviceops, SRE, observability, Patch-For-Review: Cert renewal for {appserver,api}.svc.{eqiad,codfw}.wmnet - https://phabricator.wikimedia.org/T304237 (joanna_borun) @lmata is it still valid?
[16:00:08] serviceops, Thumbor, Kubernetes: Consider moving to haproxy ingress for Thumbor workers - https://phabricator.wikimedia.org/T357145 (akosiaris) Thanks for the writeup. I agree on almost everything but I think some clarification would help me figure out some things. > If a worker is busy, kube-probe...
[16:11:50] serviceops, CX-cxserver, Citoid, Content-Transform-Team-WIP, and 11 others: Migrate node-based services in production to node18 - https://phabricator.wikimedia.org/T349118 (Sbailey) a:Sbailey
[16:34:34] serviceops, iPoid-Service (iPoid 1.0): Determine cause of HTTP 503 errors for ~8% of MediaWiki requests to ipoid service - https://phabricator.wikimedia.org/T356766 (jijiki) so far: * dumped traffic at the pod level (ipoid has only one pod, service is low traffic), and I never saw a packet from an app...
[16:42:21] serviceops, iPoid-Service (iPoid 1.0): Determine cause of HTTP 503 errors for ~8% of MediaWiki requests to ipoid service - https://phabricator.wikimedia.org/T356766 (kostajh) >>! In T356766#9534518, @jijiki wrote: > so far: > > * dumped traffic at the pod level (ipoid has only one pod, service is low t...
[16:48:31] serviceops: Cross fleet runc upgrades - https://phabricator.wikimedia.org/T356661 (akosiaris) Adding @rzl @Scott_French and @Volans per recent discussion on spicerack/cumin training/onboarding. The 15 line candidate patch is at T356661#9516327. T277677 might also have some useful information
[16:48:59] serviceops, iPoid-Service (iPoid 1.0): Determine cause of HTTP 503 errors for ~8% of MediaWiki requests to ipoid service - https://phabricator.wikimedia.org/T356766 (jijiki)
[16:49:10] serviceops, iPoid-Service (iPoid 1.0): Determine cause of HTTP 503 errors for ~8% of MediaWiki requests to ipoid service - https://phabricator.wikimedia.org/T356766 (jijiki)
[18:15:35] serviceops, MW-on-K8s, TimedMediaHandler, Video: Create a deployment for `shellbox-timedmedia` - https://phabricator.wikimedia.org/T357309 (kamila) a:kamila
[18:16:06] serviceops, iPoid-Service (iPoid 1.0): Determine cause of HTTP 503 errors for ~8% of MediaWiki requests to ipoid service - https://phabricator.wikimedia.org/T356766 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=4d30b62d-0b87-403c-94a6-0a6b14becab4) set by cgoubert@cumin2002 for 1 day,...