[08:37:12] serviceops, CX-deployments, MinT, Language-Team (Language-2023-April-June): Remove Flores key from production - https://phabricator.wikimedia.org/T337284 (jbond)
[08:46:35] https://github.com/wikimedia/operations-deployment-charts/blob/master/_scaffold/service/_skel/templates/secret.yaml is really nice folks
[08:46:38] <3
[10:36:19] serviceops, Commons, UploadWizard: Uploading via UploadWizard gets stuck for a 11 MB JPG - https://phabricator.wikimedia.org/T274150 (jbond)
[10:37:05] serviceops, Traffic: Either include X-Varnish in MediaWiki logs and include the X-Varnish in Varnish 5xx logs; or, include the beresp X-Request-Id in Varnish 5xx logs - https://phabricator.wikimedia.org/T274595 (jbond)
[11:20:12] serviceops, Wikimedia-Site-requests, Performance Issue: Choose a sensible set of thumbnail sizes for Special:Preferences - https://phabricator.wikimedia.org/T106640 (jbond)
[12:07:09] serviceops, Commons, MediaWiki-File-management, Thumbor: Thumbnail rendering of complex SVG file leads to Error 500 or Error 429 instead of Error 408 - https://phabricator.wikimedia.org/T226318 (jbond)
[12:09:53] serviceops, Wiki-Setup (Delete / Redirect): Merge or delete grantswiki - https://phabricator.wikimedia.org/T229950 (jbond)
[12:43:28] ottomata: https://gerrit.wikimedia.org/r/c/operations/puppet/+/922497 you're missing staging
[12:45:53] serviceops, Commons, Traffic, Wikimedia-Site-requests, Patch-For-Review: Enforce upload rate limits for bots on commons - https://phabricator.wikimedia.org/T248177 (jbond)
[12:48:33] jayme: in staging we are going to use zookeeper test cluster, which has ferm rule that allows prod domain access
[12:48:47] ottomata: ah, okay
[12:48:52] jayme: could you take a quick look at https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/922505 though?
[12:49:14] esp around using networkpolicy_1.0.0.tpl like I did
[12:49:52] i kept that vendor template the same, no edits, but just added an egress rule in our networkpolicy.tpl to use it. so hopefully the interface in values helmfiles will be the exact same
[12:52:21] I can take a look later today but on first glance you're not using modules but manual copy of stuff which needs to be avoided. Please see https://gitlab.wikimedia.org/repos/sre/sextant
[12:52:48] right hm okay...
[12:53:14] right
[12:56:58] okay jayme think i fixed that
[13:02:01] ottomata: hmm...I'm confused. Why would the operator talk to zookeeper?
[13:03:18] I thought this is what rdf-streaming-updater does by using k8s config maps (keeping track of the last tombstone and leader election). Shouldn't that be done by the flink cluster itself?
[13:05:03] that's what I thought too, but it looks like the operator does too? https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/concepts/overview/#jobmanager-high-availability
[13:06:00] I don't see where that is written
[13:06:03] trying to resolve this https://logstash.wikimedia.org/app/discover#/doc/0fade920-6712-11eb-8327-370b46f9e7a5/ecs-k8s-1-1.11.0-6-2023.21?id=o2aQSIgBJcVXP4KOJQYz
[13:06:52] "the operator supports ... Zookeeper HA Services"
[13:07:02] i'm not 100% sure why it cares though
[13:07:59] well. I did read that as "the operator supports running flink clusters with zookeeper" - but the log suggests indeed it tries to clean up stuff
[13:08:14] maybe that's part of his housekeeping stuff to remove the mess the flink cluster made in zookeeper
[13:08:28] ya perhaps
[13:08:44] do you allow the flink cluster (flink-app) to talk to zookeeper?
[13:09:27] https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java#L175-L183
[13:09:29] jayme: yes
[13:09:45] does it need to talk to zookeeper? :D
[13:10:20] a q I am asking myself too, I had assumed so...
[13:12:01] I thought the same. FWIW I don't see anything apart from that cleanup code in the operator
[13:12:26] so creating that data is probably still part of the jobmanager itself
[13:12:27] yes. those zk confs are for flink. and, we did get farther along after we enabled zk egress for flink
[13:12:32] yeah
[13:13:39] i do see code in flink about managing ha election with zk
[14:27:00] thanks jayme fixed your comment, also added egress.dst_nets in dse and staging eqiad values to see if diff looked correct. i think it does
[14:40:05] jayme: i'm going to merge this to get some stuff done before later meetings. please if there's anything else let me know and we can change it
[14:43:17] ottomata: you should add fixtures to the chart to have that new template rendered
[14:43:40] hm, k with networkpolicy dst nets okay
[14:43:56] yeah. there is probably something in _skel
[14:44:25] _scaffold, sorry
[14:44:34] jayme: can I just add them to my .fixtures or do I need to copy the whole file?
[14:44:53] you can just add them to an existing fixture if you like
[14:44:58] k thanks
[14:46:20] also the README is no longer correct because of the changed egress.enabled key
[14:46:26] you are da best
[14:48:48] serviceops, SRE-swift-storage, Wikimedia-Site-requests: Cleanup cirrus keys in $wmfSwiftEqiadConfig - https://phabricator.wikimedia.org/T199220 (jbond)
[14:51:19] okay jayme done
[14:53:20] ty
[14:53:22] ottomata: +1ed - I missed out on double checking the IPs though
[14:53:27] s'okay
[14:53:34] we'll see :-p
[14:53:36] :)
[14:53:37] we will
[14:53:38] haha
[14:58:07] jayme: also when you get a chance (less hurry on this one) https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/922138
[15:26:45] serviceops, SRE-swift-storage, Wikimedia-Site-requests: Cleanup cirrus keys in $wmfSwiftEqiadConfig - https://phabricator.wikimedia.org/T199220 (MatthewVernon) @dcausse just to be clear, do you still need the associated ms-swift account (which I think is `search_backup`?)?
[15:35:35] ottomata: 🤦 @ zookeeper version dependency
[15:40:16] serviceops, SRE-swift-storage, Wikimedia-Site-requests: Cleanup cirrus keys in $wmfSwiftEqiadConfig - https://phabricator.wikimedia.org/T199220 (dcausse) @MatthewVernon yes we'd like to keep it, we don't use it on a regular basis but it might happen that we need to dump the elasticsearch content to s...
[15:40:51] jayme: mwarf indeed
[15:41:08] we did plenty of local minikube tests, but of course the dev zk images we pull in are later versions
[15:41:33] serviceops, SRE-swift-storage, Wikimedia-Site-requests: Cleanup cirrus keys in $wmfSwiftEqiadConfig - https://phabricator.wikimedia.org/T199220 (MatthewVernon) OK, thanks for confirming; I think that means there's no swift-related action needed on this ticket.
[15:58:38] serviceops, SRE-swift-storage, Wikimedia-Site-requests: Cleanup cirrus keys in $wmfSwiftEqiadConfig - https://phabricator.wikimedia.org/T199220 (dcausse) @MatthewVernon just double checked with the team and it seems that the account we requested for these recovery procedures is `search_platform` (c.f...
[17:24:32] serviceops, Data-Engineering, Event-Platform Value Stream (Sprint 14 A), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (CodeReviewBot) otto opened https://gitlab.wikimedia.org/repos/data-...
[17:28:17] serviceops, Data-Engineering, Event-Platform Value Stream (Sprint 14 A), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (CodeReviewBot) otto merged https://gitlab.wikimedia.org/repos/data-...
[21:34:11] serviceops, Release-Engineering-Team, SRE, Continuous-Integration-Config, Test-Coverage: Add pcov PHP extension to wikimedia apt so it can be used in Wikimedia CI - https://phabricator.wikimedia.org/T243847 (thcipriani)