[08:41:22] serviceops, Maps: Upgrade maps servers to bullseye - https://phabricator.wikimedia.org/T327513 (awight) Please also consider the libmapnik upgrade from [[ https://packages.debian.org/stretch/libmapnik3.0 | 3.0 ]] to [[ https://packages.debian.org/bullseye/libmapnik3.1 | 3.1 ]] . This was the goal of sig...
[08:44:57] serviceops, Maps: Upgrade maps servers to bullseye - https://phabricator.wikimedia.org/T327513 (awight) I think the task is already saying this, but just to be clear nodejs 12 is end-of-life, and node 14 will be deprecated in April 2023, which makes the k8s migration more appealing than chasing these old...
[09:43:44] serviceops, Maps: Upgrade maps servers to bullseye - https://phabricator.wikimedia.org/T327513 (MoritzMuehlenhoff) >>! In T327513#8547924, @awight wrote: > I think the task is already saying this, but just to be clear nodejs 12 is end-of-life, Note that while that is true for the upstream releases, Node...
[10:01:18] hello folks
[10:01:42] if nobody opposes I'd merge this changeprop's chart change
[10:01:43] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/881594
[10:01:51] and its next in line, for staging
[10:02:22] (basically enable a new workflow to call liftwing for certain mediawiki.revision-create events)
[10:14:40] <_joe_> elukey: this is currently a noop in production, minus the chart bump, right?
[10:15:09] elukey: +1, but changeprop configuration is enough of a black art to not be utterly sure
[10:15:21] _joe_ yes exactly!
[10:16:20] I love how there is an ignore stanza for 503
[10:16:29] but the post happens 2 levels below
[10:16:38] <_joe_> elukey: oh so you're generating the events *from liftwing*
[10:16:53] posting to liftwing and liftwing posting to eventgate
[10:16:57] that's my understanding
[10:17:37] <_joe_> yeah I thought we wanted to go another way and just have changeprop post to eventgate the value extracted from liftwing
[10:17:57] <_joe_> I don't have a preference here tbh
[10:18:31] this can be more async in nature, no need for changeprop to wait for a result from liftwing to post it to eventgate
[10:18:36] <_joe_> apart from the fact that we remove centralization of logic, which is overall a positive; and that we have more things that need to talk to eventgate
[10:18:36] I kinda prefer it
[10:18:59] <_joe_> it depends on how liftwing is implemented
[10:19:08] which is why I said "can"
[10:19:14] yes correct lift wing should generate the event, I wanted to move away from this horror https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/change-propagation/+/refs/heads/master/sys/ores_updates.js
[10:19:33] <_joe_> elukey: fully agreed that horror isn't necessarily the way to go
[10:19:46] <_joe_> I kinda liked the benthos idea though
[10:20:06] <_joe_> but that's for later I guess
[10:20:06] elukey: do I understand correctly that you rely on already written changeprop code? No need to implement something similar to ores_updates.js ?
[10:20:10] (to be clear - I am 100% convinced that at the time, with ORES, it was the best and most efficient way to go, but not now)
[10:20:20] <_joe_> akosiaris: that's correct
[10:20:33] exactly yes, only yaml config for changeprop
[10:20:35] 🥳
[10:20:44] and in the future we may use benthos as well of course
[11:13:22] merged the changeprop chart change, after lunch I'll deploy to staging https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/881664
[11:13:25] :)
[11:32:00] Welcome Eoghan :)
[11:32:10] <_joe_> oh hey welcome!
[11:33:37] Hello, thank you!
[11:35:38] hey eoghan :D
[11:40:15] welcome!
[11:52:06] Welcome eoghan o/
[11:58:01] Welcome Eoghan!
[12:00:26] Hey, thanks! Looking forward to working with you all
[12:22:45] serviceops, VisualEditor, MW-1.39-notes (1.39.0-wmf.21; 2022-07-18), Parsoid (Tracking), and 4 others: Preemptively warm caches for Parsoid output - https://phabricator.wikimedia.org/T301371 (daniel)
[13:07:18] serviceops, Content-Transform-Team-WIP, Maps, Patch-For-Review: Re-import full planet data into eqiad and codfw - https://phabricator.wikimedia.org/T314472 (jijiki) Import to codfw has been completed, and we have bootstrapped its tile storage using https://gerrit.wikimedia.org/r/c/operations/pu...
[13:28:26] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Update staging-codfw to k8s 1.23 - https://phabricator.wikimedia.org/T326340 (JMeybohm)
[13:38:24] o/ jayme no hurry but when you find a moment i think i'm doing something dumb or have messed something when trying to deploy flink-operator https://phabricator.wikimedia.org/T324576#8544808
[13:38:57] i'm not sure if i need an entry in chartVersions? the doc seems to say I don't.
[13:39:40] <_joe_> ottomata: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/882612 btw
[13:39:59] ohho
[13:40:00] ty
[13:40:17] <_joe_> sextant was complaining vigorously :P
[13:40:26] as it should
[13:54:35] serviceops, Data-Engineering, Discovery-Search (Current work), Event-Platform Value Stream (Sprint 07), Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (JMeybohm) >>! In T324576#8544808, @Ottomata wrote: > What am I doing wrong? You're just runn...
[13:56:53] ty jayme i knew there was something silly
[13:57:28] yw :)
[13:58:02] riiight because it applies all releases for admin_ng all at once?
[13:58:14] (without the -l)
[13:59:51] Yes. And it needs the values includes from the master helmfile (like the chart versions on which it was failing in your case)
[13:59:57] back in ~40min
[14:00:12] makes much sense, thank you
[14:14:32] going to deploy changeprop in staging :)
[14:31:27] welp jayme we are beyond the previous error, but on to the next one. k8s api netpol seems to be fine now.
[14:31:45] now it is taskmanager pods trying to talk to jobmanager : Failed to connect to [flink-app-main.stream-enrichment-poc/10.67.25.13:6123
[14:32:38] 10.67.25.13 is the jobmanager pod ip. i guess i need a netpol for that port?
[14:44:54] do you allow ingress traffic on port 6123?
[14:45:22] in general pod-to-pod egress is allowed, but ingress needs to be allowed explicitly
[14:45:30] ottomata: ^^
[14:45:53] oh, okay thank you. i don't think i have anything special there, will do that then.
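A minimal sketch of the kind of ingress rule discussed above for the jobmanager RPC port, written as a plain Kubernetes NetworkPolicy. The namespace is inferred from the error message; the policy name and the `component` pod labels are assumptions for illustration, and the actual deployment-charts setup may generate its network policies through chart values instead.

```yaml
# Hypothetical policy: allow taskmanager pods to reach the jobmanager on 6123.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: flink-jobmanager-rpc-ingress   # assumed name, not from the log
  namespace: stream-enrichment-poc     # inferred from the connection error
spec:
  # Pods the policy applies to (the ingress target).
  podSelector:
    matchLabels:
      component: jobmanager            # assumed label
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Only pods in the same namespace carrying this label may connect.
        - podSelector:
            matchLabels:
              component: taskmanager   # assumed label
      ports:
        - protocol: TCP
          port: 6123                   # port named in the error message
```

Restricting `from` to the taskmanager pods keeps the rule as narrow as the error requires; dropping the `from` block would open the port to any source the cluster otherwise permits.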
[15:19:52] If I can do anything on the DSE/flink stuff LMK
[15:27:08] ah, I see otto-mata has left me a note on 881458 , will start there
[15:32:55] serviceops, Content-Transform-Team-WIP, Maps: Re-import full planet data into eqiad and codfw - https://phabricator.wikimedia.org/T314472 (jijiki)
[15:42:39] serviceops, observability: Create a visual representation of where each service is active from, any given time - https://phabricator.wikimedia.org/T327663 (jijiki)
[15:51:59] serviceops, Prod-Kubernetes, Kubernetes: Update staging-eqiad to k8s 1.23 - https://phabricator.wikimedia.org/T327664 (JMeybohm)
[15:52:59] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Update Kubernetes clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (JMeybohm)
[15:53:15] serviceops, Prod-Kubernetes, Kubernetes: Update staging-eqiad to k8s 1.23 - https://phabricator.wikimedia.org/T327664 (JMeybohm)
[15:58:29] serviceops, Infrastructure-Foundations: Create a cookbook to help us depool *all* services from a datacentre - https://phabricator.wikimedia.org/T327665 (jijiki)
[16:21:37] jayme: https://integration.wikimedia.org/ci/job/helm-lint/9113/console look okay?
[16:22:06] also: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/882680
[16:26:00] do people know if staging's changeprop is supposed to process rules or not?
[16:27:14] elukey: a philosopher friend of mine once taught me how to lie without lying. e.g. "Not many people know that staging's changeprop is supposed to process rules."
[16:28:24] ottomata: was it something supposed to help?? :D :D :D
[16:28:40] nope, probably the opposite :p
[16:28:45] jayme: i'm going to merge those and try, just to keep moving. if you have comments lemme know and i'll fix.
[16:28:47] ahhh thanks <3
[16:28:56] haha
[16:29:17] anyway, in theory I would expect changeprop in staging not to process rules since it may interfere with production's traffic
[16:29:26] but I don't find a clear setting/comment about it
[16:32:18] it uses eventgate staging though
[16:32:19] mmmmm
[16:38:35] serviceops, Maps: Upgrade maps servers to bullseye - https://phabricator.wikimedia.org/T327513 (jijiki) p:Triage→Medium
[16:54:22] ottomata: I'm in meetings rest of the day, sorry
[16:54:35] k
[17:30:10] serviceops, Service-deployment-requests: New Service Request 'security-api' - https://phabricator.wikimedia.org/T325147 (Joe) >>! In T325147#8516086, @STran wrote: > @Joe Suman said you were the person to talk to regarding next steps? Hi @STran, first of all sorry if it took this long to get back to you...
[18:27:36] hmm, looks like the helm linter doesn't like https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/881458/7/charts/flink-kubernetes-operator/Chart.yaml#19 , not sure why yet
[18:29:17] oh inflatador Chart.yaml is not a template, so you don't need the template comments there.
[18:29:46] the reason to have them in the templates/ files is that without them, helm diff etc. will always print out those licenses, since they are part of the rendered output.
[18:31:50] ottomata thanks, easy fix then ;)
[18:57:37] inflatador: did you see my comment about needing the image updated too?
[19:20:12] serviceops, CampaignEvents, Wikimedia-Site-requests, Campaign-Registration, Campaign-Tools (Campaign-Tools-Sprint-28): Run the timezone update script periodically in prod and in beta - https://phabricator.wikimedia.org/T320403 (vaughnwalters)
[19:21:03] ottomata indeed, LMK if this covers it https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/881907
[19:28:39] serviceops, MediaWiki-libs-Rdbms, Performance-Team, Platform Engineering, and 3 others: Determine and implement multi-dc strategy for ChronologyProtector - https://phabricator.wikimedia.org/T254634 (Krinkle)
[20:07:45] serviceops, Data-Engineering, Discovery-Search (Current work), Event-Platform Value Stream (Sprint 07), Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (bking)
[20:43:47] inflatador: that will probably cover it!
[20:44:02] you can try building it and using it. it is possible to build with docker-pkg locally.
[20:44:15] have you tried flink-operator locally with minikube yet?
[20:44:20] if not we can try together if you like!
[20:45:21] ottomata nothing beyond what's in the quickstart. And yeah, I'd love to work on it together if/when you have time
[20:45:51] k, i might have some time in 45 mins, maybe sooner! trying frantically to finish 2 things!
[20:46:04] ottomata no worries, I'll be around for the next 2h or so
[20:46:12] (or the rest of the wk for that matter)
[20:54:43] remind me what path docker-pkg goes into within the deployment-charts repo?
[20:55:05] hm, it doesn't do deployment-charts, you mean production-images?
[20:55:26] to build locally
[20:55:26] i do
[20:55:32] cd production-images
[20:55:33] docker-pkg --info -c ./config.yaml build --use-cache images
[20:55:39] I think i had to pip install docker-pkg
[20:55:42] maybe from source?
[20:55:50] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/docker-images/docker-pkg/
[20:57:25] YESSSSSSSS I AM RUNNING flink-example-app in dse-k8s-eqiad
[20:57:26] FINALLY!
[20:57:54] serviceops, Data-Engineering, Discovery-Search (Current work), Event-Platform Value Stream (Sprint 07), Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (Ottomata) FINALLY GOT flink-example-app running. YESSSS!
[20:58:10] inflatador: okay that was 1 of the two things
[20:58:36] once you get minikube ready and running, and can build images locally, let's jump in a huddle and get you going!
[20:58:41] ottomata yeah, that's the ticket! It was rolling around in my head