[04:29:03] serviceops, CX-cxserver, RESTBase Sunsetting, Language-Team (Language-2023-October-December), Patch-For-Review: Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase - https://phabricator.wikimedia.org/T344982 (santhosh) It seems we need to continue with restbas...
[07:02:15] serviceops, CX-cxserver, RESTBase Sunsetting, Language-Team (Language-2023-October-December), Patch-For-Review: Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase - https://phabricator.wikimedia.org/T344982 (daniel) >>! In T344982#9307497, @santhosh wrote: >...
[08:28:35] serviceops, Data-Persistence, SRE-tools, Spicerack, Traffic: Switch conftool to use the version 3 etcd datastore - https://phabricator.wikimedia.org/T350565 (Joe)
[09:35:19] _joe_: With regard to the spark3 images, yes I'm afraid we do need three at the moment. I can't decommission 3.1 until T338057 has been completed, which affects all production pipelines.
[09:35:38] <_joe_> btullis: I'm a bit surprised we need 3.1, 3.3 AND 3.4
[09:35:43] <_joe_> I could understand 2 versions
[09:35:55] <_joe_> but 3 seems a bit much personally
[09:36:04] That's not really my call, that's the data engineers using it.
[09:36:24] The best I can offer is to try to move the image build pipeline from production-images to GitLab-CI.
[09:37:20] I made some headway on this, but I reverted to production-images because I didn't have an easy way to say: "only build this image if the same version is not already available on docker-images.wikimedia.org"
[09:38:03] <_joe_> btullis: well I object; it is your call
[09:38:20] <_joe_> else if I let developers choose we'd be running 6 php versions in production
[09:38:33] <_joe_> but ok
[09:40:38] I will take another look at this branch, which is where I started working on building spark images under GitLab-CI. https://gitlab.wikimedia.org/repos/data-engineering/spark/-/blob/add_initial_spark_pipeline/.gitlab-ci.yml
[09:42:06] If I can find a simple way to do that conditional build from gitlab, then I can move this image out of production-images. Would that work for you?
[09:45:25] <_joe_> btullis: I don't think it's such a big issue, it just makes the weekly rebuild a tad slow; if it becomes an issue I'll just exclude spark from it
[09:46:08] <_joe_> and I'd prefer to keep images that are not "final" in production-images as much as possible
[09:47:54] _joe_: I'm not sure that I understand that second statement. Could you elaborate please?
[09:48:25] <_joe_> btullis: are you using the spark images to build other images or they're just "final" images you use directly?
[09:50:44] Both. Each spark version has three images. Two are final (spark-operator) and (spark) - One is a build artifact (spark-build) which contains the whole build directory. This spark-build image contains jar files that are extracted and then distributed to the servers via puppet.
[09:51:48] Well, actually, a single jar file at the moment.
[09:56:27] I'd like to work out a good time this week to deploy this patch, if possible: Deploy multiple spark shufflers for yarn to production | https://gerrit.wikimedia.org/r/c/operations/puppet/+/964008
[09:58:18] Sorry, wrong channel.
[10:10:27] serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 67 - https://phabricator.wikimedia.org/T345561 (JMeybohm)
[11:09:11] serviceops, Growth-Team, Growth-Team-Filtering, MW-on-K8s, Notifications: Broken (empty) cross-wiki notification when using $wgLocalHTTPProxy (e.g. on Kubernetes) - https://phabricator.wikimedia.org/T223413 (Joe) I can't imagine why calling the primary datacenter would be a problem in this ca...
[13:01:16] serviceops, Growth-Team, MW-on-K8s, MediaWiki-Platform-Team, and 6 others: MediaWiki\Extension\Notifications\Api\ApiEchoUnreadNotificationPages::getUnreadNotificationPagesFromForeign: Unexpected API response from {wiki} - https://phabricator.wikimedia.org/T342201 (Joe) When `mcrouter-primary-dc`...
[14:14:42] serviceops, Data Engineering and Event Platform Team, Data-Engineering, Event-Platform: [Event Platform] Gracefully handle pod termination in eventgate Helm chart - https://phabricator.wikimedia.org/T349823 (Ottomata) Ah, got it. It is an envoy setting. https://www.envoyproxy.io/docs/envoy/lates...
[14:32:19] serviceops: Upgrade the MediaWiki servers to ICU 67 - https://phabricator.wikimedia.org/T345561 (JMeybohm)
[14:35:18] serviceops, Growth-Team, MW-on-K8s, MediaWiki-Platform-Team, and 5 others: MediaWiki\Extension\Notifications\Api\ApiEchoUnreadNotificationPages::getUnreadNotificationPagesFromForeign: Unexpected API response from {wiki} - https://phabricator.wikimedia.org/T342201 (Joe) Since my deployment of this...
[14:37:02] serviceops, Growth-Team, Growth-Team-Filtering, MW-on-K8s, Notifications: Broken (empty) cross-wiki notification when using $wgLocalHTTPProxy (e.g. on Kubernetes) - https://phabricator.wikimedia.org/T223413 (Joe) I suspect the fix I made for T342201 actually might have solved this issue as we...
[15:10:48] serviceops, Data-Engineering, Data Engineering and Event Platform Team (Sprint 4), Event-Platform, Patch-For-Review: [Event Platform] Gracefully handle pod termination in eventgate Helm chart - https://phabricator.wikimedia.org/T349823 (Ottomata)
[15:13:11] serviceops, Growth-Team, MW-on-K8s, MediaWiki-Platform-Team, and 5 others: MediaWiki\Extension\Notifications\Api\ApiEchoUnreadNotificationPages::getUnreadNotificationPagesFromForeign: Unexpected API response from {wiki} - https://phabricator.wikimedia.org/T342201 (Joe) Open→Resolved
[15:14:25] serviceops, Growth-Team, Growth-Team-Filtering, MW-on-K8s, Notifications: Broken (empty) cross-wiki notification when using $wgLocalHTTPProxy (e.g. on Kubernetes) - https://phabricator.wikimedia.org/T223413 (Joe) @matmarex do you have a way to verify if the bug still presents itself? It's sli...
[15:16:57] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Move 25% of mediawiki external requests to mw on k8s - https://phabricator.wikimedia.org/T348122 (Joe) >>! In T348122#9240973, @matmarex wrote: > The Kubernetes work so far has caused problems with cross-wiki Echo notifications (see T223413, T342...
[15:25:06] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Use cert-manager for service-proxy certificate creation - https://phabricator.wikimedia.org/T300033 (JMeybohm)
[15:52:00] jayme: just a heads-up, we are updating rdf-streaming-updater to use the flink-operator in staging-eqiad ATM, so you might see some weirdness there
[15:52:18] uh, nice!
[15:59:11] Y, T349095 is phab ticket FWIW
[18:36:23] serviceops, Growth-Team, Growth-Team-Filtering, MW-on-K8s, Notifications: Broken (empty) cross-wiki notification when using $wgLocalHTTPProxy (e.g. on Kubernetes) - https://phabricator.wikimedia.org/T223413 (matmarex) Open→Resolved a:Joe Cross-wiki notifications reliably show up f...
[18:36:37] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Move 25% of mediawiki external requests to mw on k8s - https://phabricator.wikimedia.org/T348122 (matmarex) It seems it was the same cause, as both issues look fixed to me. Thanks!
[18:47:14] serviceops, Growth-Team, Growth-Team-Filtering, MW-on-K8s, Notifications: Broken (empty) cross-wiki notification when using $wgLocalHTTPProxy (e.g. on Kubernetes) - https://phabricator.wikimedia.org/T223413 (Tgr) >>! In T223413#9308043, @Joe wrote: > I can't imagine why calling the primary da...
[19:29:17] serviceops, Dumps-Generation, MediaWiki-Platform-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432 (Jdforrester-WMF) >>! In T319432#9306384, @tstarling wrote: > We can start work on 8.3 CI whenever. Yup, T339350 is tracking that; right now we're blocke...
[20:56:38] serviceops, Abstract Wikipedia team, SRE, Service-deployment-requests: New Service Request: function-orchestrator and function-evaluator (for Wikifunctions launch) - https://phabricator.wikimedia.org/T297314 (Jdforrester-WMF) In progress→Resolved
[20:58:13] serviceops, Abstract Wikipedia team: Sandboxing Strategy for Wikifunctions - https://phabricator.wikimedia.org/T343829 (Jdforrester-WMF)
[20:58:23] serviceops, Abstract Wikipedia team: Sandboxing Strategy for Wikifunctions - https://phabricator.wikimedia.org/T343829 (Jdforrester-WMF) In progress→Resolved
[21:50:05] serviceops, Data-Engineering, Data Engineering and Event Platform Team (Sprint 4), Event-Platform: [Event Platform] Gracefully handle pod termination in eventgate Helm chart - https://phabricator.wikimedia.org/T349823 (Ottomata) Okay, I just applied the prestop_sleep settings to all eventgates....
[21:54:30] serviceops, Data-Engineering, Data Engineering and Event Platform Team (Sprint 4), Event-Platform: [Event Platform] Gracefully handle pod termination in eventgate Helm chart - https://phabricator.wikimedia.org/T349823 (Ottomata) a:Ottomata