[04:55:46] serviceops, CX-cxserver, RESTBase Sunsetting, Language-Team (Language-2023-October-December), Patch-For-Review: Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase - https://phabricator.wikimedia.org/T344982 (santhosh) https://test.wikipedia.org/w/rest.php/cor...
[08:46:45] serviceops, CX-cxserver, RESTBase Sunsetting, Language-Team (Language-2023-October-December), Patch-For-Review: Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase - https://phabricator.wikimedia.org/T344982 (daniel) @santhosh ok, I filed {T350661} for enablin...
[09:37:37] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Use cert-manager for service-proxy certificate creation - https://phabricator.wikimedia.org/T300033 (JMeybohm)
[10:56:33] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Use cert-manager for service-proxy certificate creation - https://phabricator.wikimedia.org/T300033 (JMeybohm)
[11:12:06] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Use cert-manager for service-proxy certificate creation - https://phabricator.wikimedia.org/T300033 (JMeybohm) wikifeeds deployment is blocked by a config change (https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/961696) tha...
[11:28:32] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Use cert-manager for service-proxy certificate creation - https://phabricator.wikimedia.org/T300033 (JMeybohm)
[11:42:34] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Use cert-manager for service-proxy certificate creation - https://phabricator.wikimedia.org/T300033 (JMeybohm)
[11:43:53] jayme: I think we're set to switch Wikifunctions to General Availability in a couple of hours' time.
[11:44:24] nice!
[11:52:23] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Use cert-manager for service-proxy certificate creation - https://phabricator.wikimedia.org/T300033 (JMeybohm)
[12:01:23] serviceops, Growth-Team, Growth-Team-Filtering, MW-on-K8s, Notifications: Broken (empty) cross-wiki notification when using $wgLocalHTTPProxy (e.g. on Kubernetes) - https://phabricator.wikimedia.org/T223413 (Joe) >>! In T223413#9309890, @Tgr wrote: > That's not the case here - the Echo reque...
[12:17:29] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Use cert-manager for service-proxy certificate creation - https://phabricator.wikimedia.org/T300033 (JMeybohm)
[13:52:26] serviceops, Abstract Wikipedia team: helmfile -e staging -i apply --context 5 times out for version bump of Python function-evaluator (but not JS) - https://phabricator.wikimedia.org/T350685 (Jdforrester-WMF)
[14:04:32] serviceops, Abstract Wikipedia team: helmfile -e staging -i apply --context 5 times out for version bump of Python function-evaluator (but not JS) - https://phabricator.wikimedia.org/T350685 (JMeybohm) helmfile will rollback the deployment in case it does not get "ready" within 10 minutes (timeout parame...
[14:10:25] serviceops, Abstract Wikipedia team: helmfile -e staging -i apply --context 5 times out for version bump of Python function-evaluator (but not JS) - https://phabricator.wikimedia.org/T350685 (JMeybohm) See https://logstash.wikimedia.org/goto/250702d316f57befaa341112c63fa99e for the k8s events emitted dur...
[14:13:48] serviceops, Abstract Wikipedia team: helmfile -e staging -i apply --context 5 times out for version bump of Python function-evaluator (but not JS) - https://phabricator.wikimedia.org/T350685 (JMeybohm) The automatic rollback has completed successfully, so no issue there. You might try again any time. For...
[14:34:21] serviceops, collaboration-services, GitLab (CI & Job Runners): Standardize Debian package builds on GitLab CI - https://phabricator.wikimedia.org/T304491 (MatthewVernon) Tutorial now moved to main namespace [[ https://wikitech.wikimedia.org/wiki/Debian_packaging/Tutorial | Debian_packaging/Tutorial ]...
[15:37:00] serviceops, iPoid-Service, Patch-For-Review, Trust and Safety Product Sprint (Sprint Bodhrán): [M] Write CronJob configuration - https://phabricator.wikimedia.org/T346861 (jijiki) Open→In progress p:Triage→High
[16:09:41] serviceops, Machine-Learning-Team: Multiple images fail to build from sources - https://phabricator.wikimedia.org/T350366 (Joe)
[16:36:20] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Use cert-manager for service-proxy certificate creation - https://phabricator.wikimedia.org/T300033 (JMeybohm)
[17:06:33] serviceops, Data-Engineering, Event-Platform: Increase k8s namespace limits for eventgate-analytics - https://phabricator.wikimedia.org/T350707 (Ottomata)
[17:15:30] serviceops, Data-Engineering, Data Engineering and Event Platform Team (Sprint 4), Event-Platform: [Event Platform] Gracefully handle pod termination in eventgate Helm chart - https://phabricator.wikimedia.org/T349823 (Ottomata) Did another eventgate-main deployment just now. I don't see any flo...
[17:16:30] <_joe_> ottomata: \o/
[17:18:28] serviceops, WMF-JobQueue, Patch-For-Review, Unstewarded-production-error, and 2 others: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable" - https://phabricator.wikimedia.org/T249745 (Ottomata) In {T349823}, we added prestop sleep settings for the envoy tls proxy in...
[17:56:13] serviceops, Growth-Team, Growth-Team-Filtering, MW-on-K8s, Notifications: Broken (empty) cross-wiki notification when using $wgLocalHTTPProxy (e.g. on Kubernetes) - https://phabricator.wikimedia.org/T223413 (Novem_Linguae) For me, I am in USA, and the bug for MediaWiki.org notifications tryin...
[17:59:05] serviceops, Growth-Team, Growth-Team-Filtering, MW-on-K8s, Notifications: Broken (empty) cross-wiki notification when using $wgLocalHTTPProxy (e.g. on Kubernetes) - https://phabricator.wikimedia.org/T223413 (Joe) >>! In T223413#9313665, @Novem_Linguae wrote: > For me, I am in USA, and the bug...
[18:37:22] serviceops, Data-Engineering, Data Engineering and Event Platform Team (Sprint 4), Event-Platform: [Event Platform] eventgate-wikimedia occasionally fails to produce events due schema fetch errors - https://phabricator.wikimedia.org/T350713 (Ottomata)
[18:37:34] serviceops, Data-Engineering, Data Engineering and Event Platform Team (Sprint 4), Event-Platform: [Event Platform] eventgate-wikimedia occasionally fails to produce events due schema fetch errors - https://phabricator.wikimedia.org/T350713 (Ottomata)
[18:37:43] serviceops, Data-Engineering, Data Engineering and Event Platform Team (Sprint 4), Event-Platform: [Event Platform] eventgate-wikimedia occasionally fails to produce events due schema fetch errors - https://phabricator.wikimedia.org/T350713 (Ottomata)
[18:45:00] _joe_: hopefully that will help! not sure though!
[18:45:16] _joe_: if you are still around, do you have any pointers for https://phabricator.wikimedia.org/T350713 ?
[18:45:26] I'd like to know why these service proxy requests are 503ing
[19:25:28] hello! I am trying to make unrelated DNS changes but get this diff:
[19:25:32] +mw-jobrunner 1H IN A 10.2.2.90
[19:25:42] +mw-jobrunner 1H IN A 10.2.1.90
[19:25:56] since these are service addresses being added I hesitate to proceed
[19:26:12] but my only other option is to abort my entire makevm cookbook
[19:26:43] which will mean being maybe in an undefined state and having to cleanup remnants or unsure
[19:28:05] hnowlan: it looks like this change is https://gerrit.wikimedia.org/r/c/operations/dns/+/972394 but that is not merged. ? do you know?
[20:12:39] <+icinga-wm> PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL
[20:15:23] mutante: go ahead, the svc record are currently both in the manual dns repo and netbox and the one that goes in prod i the dns repo one
[20:15:34] so the netbox one is a noop
[20:15:51] volans: thank you! ok, I am doing the "go"
[20:15:52] *only* for svc.{eqiad,codfw}.wmnet record
[20:16:37] yea, that's what I saw, one for each DC
[20:16:55] plus my own change
[20:17:19] volans: also I can confirm the reimage worked earlier
[20:17:46] and now it's adding that host into netbox host status data
[20:17:52] * mutante types go again
[20:17:53] glad it worked, sorry for the trouble
[20:18:01] no worries
[20:18:46] I really wanted that special VM to exist in both DCs.. no more one-offs
[20:18:59] so same thing now in eqiad
[20:19:04] totally agree on that
[20:19:12] :)
[20:37:45] serviceops, Data-Persistence, Infrastructure-Foundations, SRE-tools, and 2 others: Switch conftool to use the version 3 etcd datastore - https://phabricator.wikimedia.org/T350565 (KOfori)
[20:51:04] mutante: apologies, that was me reserving those IPs.
[22:25:46] serviceops, Beta-Cluster-Infrastructure, Thumbor, Beta-Cluster-reproducible: deployment-prep needs a Thumbor instance - https://phabricator.wikimedia.org/T344605 (Tgr) Well that didn't work. The error actually comes from [[https://gerrit.wikimedia.org/g/operations/puppet/+/b86fe11b85d3b27b1fcedcc...
[22:30:17] hnowlan: no worries, it's merged
[23:03:01] serviceops, CirrusSearch, MediaWiki-Configuration, MediaWiki-Engineering, Discovery-Search (Current work): Provide a method for internal services to run api requests for private wikis - https://phabricator.wikimedia.org/T345185 (aaron) >>! In T345185#9141119, @Tgr wrote: > I would maybe creat...