[05:35:21] serviceops, Beta-Cluster-Infrastructure, wikidiff2, Better-Diffs-2023, Community-Tech (CommTech-Kanban): Install wikidiff2 1.14.0 deb on deployment-prep & test - https://phabricator.wikimedia.org/T340542 (tstarling) >>! In T340542#8973199, @dom_walden wrote: > Or compare: > * https://en.wikti...
[08:27:51] serviceops, iPoid-Service, Patch-For-Review, Service-deployment-requests: New Service Request 'iPoid' - https://phabricator.wikimedia.org/T325147 (kostajh)
[08:35:35] serviceops, Machine-Learning-Team, Platform Team Initiatives (API Gateway): Review LiftWing's usage of the API Gateway - https://phabricator.wikimedia.org/T340982 (elukey)
[08:38:27] serviceops, Machine-Learning-Team, Platform Team Initiatives (API Gateway): Review LiftWing's usage of the API Gateway - https://phabricator.wikimedia.org/T340982 (elukey) I had a chat with @akosiaris and @Joe on this subject, and we agreed to the following compromise: * raise anonymous traffic to s...
[09:32:23] serviceops, Kubernetes, Patch-For-Review: Add a second control-plane to wikikube staging clusters - https://phabricator.wikimedia.org/T329827 (jijiki)
[09:42:51] serviceops, Observability-Alerting, Traffic: Timeouts when talking to phabricator API - https://phabricator.wikimedia.org/T341039 (fgiunchedi)
[10:50:18] serviceops, Beta-Cluster-Infrastructure, wikidiff2, Better-Diffs-2023, Community-Tech (CommTech-Kanban): Install wikidiff2 1.14.1 deb on deployment-prep & test - https://phabricator.wikimedia.org/T340542 (TheresNoTime)
[10:51:39] serviceops, Beta-Cluster-Infrastructure, wikidiff2, Better-Diffs-2023, Community-Tech (CommTech-Kanban): Install wikidiff2 1.14.1 deb on deployment-prep & test - https://phabricator.wikimedia.org/T340542 (TheresNoTime) @MoritzMuehlenhoff (see above for context) — we've released `1.14.1` to fi...
[10:55:18] duesen: fyi, we merged and deployed a change to switch some changeprop metrics from summaries to histograms. As a result the backlog graphs are temporarily broken, we'll get them fixed. But the old graph was wrong to begin with (it was doing statistically false aggregations)
[11:11:58] serviceops, Kubernetes: Add a second control-plane to wikikube staging clusters - https://phabricator.wikimedia.org/T329827 (jijiki) Open→Resolved
[11:24:07] serviceops, Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B): Flink k8s operator in staging sometimes will not sync changes to FlinkDeployments - https://phabricator.wikimedia.org/T340059 (gmodena) @JMeybohm did something change in the staging operator deployment? R...
[11:31:25] serviceops, Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B): Flink k8s operator in staging sometimes will not sync changes to FlinkDeployments - https://phabricator.wikimedia.org/T340059 (JMeybohm) I did not check git/deployments but I don't think anybody apart from...
[11:57:35] serviceops, SRE, Traffic, envoy, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (JMeybohm)
[11:57:46] serviceops, SRE, Traffic, envoy, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (JMeybohm)
[12:27:43] hi folks!
[12:27:49] going to deploy the api-gateway
[12:27:57] cc: hnowlan, kamila_ --^
[12:34:06] serviceops, Kubernetes: Add a second control-plane to wikikube staging clusters - https://phabricator.wikimedia.org/T329827 (jijiki)
[12:37:45] hnowlan: qq just to avoid issues - afaics https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/933427 is going out with my deployment, because I see some quoted values, is it ok?
[12:39:13] and the other thing is - I don't see any diff for, does it need a chart's version bump?
[12:39:18] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/933084
[12:39:29] Cc: kamila_
[13:16:23] elukey: my bad, looking
[13:19:16] not at all, I asked to avoid doing messes myself :)
[13:19:21] elukey: quoted values are fine
[13:19:46] there's a bump for the mem limit thing here https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/935428 but if you'd prefer to get your change out solo no big deal
[13:19:54] ah nice, nm :D
[13:19:59] nono all good I am happy to deploy all :)
[13:21:52] merging, ty
[13:21:58] serviceops, Observability-Alerting, SRE, Traffic: Timeouts when talking to phabricator API - https://phabricator.wikimedia.org/T341039 (fgiunchedi)
[13:22:54] all right deploying then :)
[13:23:18] or do you prefer to do it to check etc.. ?
[13:26:03] hnowlan, elukey, btw, we're planning on deploying this https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/905947 with _joe_ this week
[13:26:35] I'll bump the chart version
[13:27:47] claime: ack! On Lift Wing we call api-ro.discovery.wmnet, do you prefer us to call mw-api-int.discovery.wmnet ?
[13:28:10] elukey: what's the call volume ?
[13:28:37] serviceops, Observability-Alerting, SRE, Traffic: Timeouts when talking to phabricator API - https://phabricator.wikimedia.org/T341039 (fgiunchedi)
[13:28:39] I don't think we'd identified liftwing, are you calling it through envoy or direct?
[13:28:46] claime: rad
[13:28:59] elukey: go for it, looks safe on staging
[13:29:06] hnowlan: If you have a quick way to check it still works correctly after merge, I'd be grateful :P
[13:29:13] claime: it depends, so far not a lot but we are ramping up (ORES will be moved to Lift Wing etc..)
[13:29:38] claime: we use istio sidecars/mesh, not your mesh sadly
[13:29:47] hnowlan: ack proceeding with codfw then
[13:30:32] claime: yeah for sure, staging is pretty representative of prod in that regard
[13:30:34] elukey: ack, I'll discuss it with j.oe, we probably will finish up those that got left to rot (api-gateway and termbox) first, but in the end everything will be moved to it
[13:30:59] sure!
[13:31:31] j.oe is always happy when I bring up new stuff from Lift Wing :D
[13:33:01] :D
[13:42:12] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Allow to address Kubernets API servers from NetworkPolicy - https://phabricator.wikimedia.org/T287491 (jijiki) a:jijiki
[14:06:21] sorry, back, was busy badly burning my hand while attempting to make lunvh
[14:06:46] (tyoing with one hand is hard, as you can see)
[14:07:29] serviceops, SRE, Traffic, envoy, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (JMeybohm) All nodes and most k8s deployments have been updated to run 1.23.10, only exceptions are api-gateway and rest-gateway which still run 1.18 as well as da...
[14:08:14] kamila_: 😬 ouch
[14:08:35] kamila_: :| eek
[14:09:09] serviceops, SRE, envoy: Refactor envoy max_requests_per_connection from Cluster to HttpProtocolOptions - https://phabricator.wikimedia.org/T304124 (JMeybohm) Stalled→Open
[14:09:14] serviceops, SRE, Traffic, envoy, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (JMeybohm)
[14:09:30] I'll live, it's just... why does something happen every time I should not be afk? '^^
[14:21:19] "should not be afk" is a concept lol
[14:37:09] serviceops, SRE-swift-storage, Wikimedia-Site-requests: Cleanup cirrus keys in $wmfSwiftEqiadConfig - https://phabricator.wikimedia.org/T199220 (MatthewVernon) @dcausse Are you able to confirm I can dispose of the `search:backup` ms-swift account, please?
Or if not do you know who can give the OK?
[14:50:32] serviceops, SRE-swift-storage, Wikimedia-Site-requests: Cleanup cirrus keys in $wmfSwiftEqiadConfig - https://phabricator.wikimedia.org/T199220 (dcausse) @MatthewVernon yes we can delete this account and containers (cc @EBernhardson)
[15:03:39] serviceops, SRE, Patch-For-Review: Use encrypted rsync for deployment::rsync - https://phabricator.wikimedia.org/T289857 (Clement_Goubert) In progress→Resolved Deployed, data transfer works between deploy2002 and deploy1002. Resolving.
[15:15:38] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Direct 0.5% of all traffic to mw-on-k8s - https://phabricator.wikimedia.org/T341078 (Clement_Goubert)
[15:15:56] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Clement_Goubert)
[15:16:08] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Direct 0.5% of all traffic to mw-on-k8s - https://phabricator.wikimedia.org/T341078 (Clement_Goubert) Open→In progress p:Triage→High
[15:17:15] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Clement_Goubert)
[15:41:01] serviceops, SRE-swift-storage: Remove search:backup swift account and storage - https://phabricator.wikimedia.org/T341081 (MatthewVernon)
[15:41:51] serviceops, SRE-swift-storage, Wikimedia-Site-requests: Cleanup cirrus keys in $wmfSwiftEqiadConfig - https://phabricator.wikimedia.org/T199220 (MatthewVernon) Thanks for confirming; I'll track that work on the new subtask (and remove the swift storage tag from this one).
[15:42:06] serviceops, Wikimedia-Site-requests: Cleanup cirrus keys in $wmfSwiftEqiadConfig - https://phabricator.wikimedia.org/T199220 (MatthewVernon)
[17:31:25] serviceops, Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B): Flink k8s operator in staging sometimes will not sync changes to FlinkDeployments - https://phabricator.wikimedia.org/T340059 (gmodena) Right now staging deployments are working, but `mw-page-content-chang...
[20:21:12] serviceops, Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B): Flink k8s operator in staging sometimes will not sync changes to FlinkDeployments - https://phabricator.wikimedia.org/T340059 (gmodena) > Keep you posted. I'll open a dedicated phab task if needed. This HA...
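The 10:55:18 message says the old changeprop backlog graph did "statistically false aggregations" before the metrics moved from summaries to histograms. The log does not say exactly which aggregation the old graph used; the sketch below assumes it averaged per-instance quantiles, a common mistake. It shows why that is invalid and why histograms fix it: quantiles precomputed per instance cannot be combined across instances, but cumulative bucket counts can be summed first and a quantile estimated afterwards, similar in spirit to PromQL's `histogram_quantile`. The bucket bounds, data, and helper names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical bucket upper bounds in seconds; the real changeprop buckets
# are not given in the log.
BOUNDS = [1, 2, 5, 10, 30, 60, math.inf]


def quantile(samples, q):
    """Exact nearest-rank quantile, standing in for what one instance's
    summary would report (real summaries use a streaming approximation)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def to_histogram(samples):
    """Cumulative bucket counts, like Prometheus `le` buckets."""
    return [sum(1 for s in samples if s <= bound) for bound in BOUNDS]


def merge(histograms):
    """Histograms from different instances combine by summing each bucket."""
    return [sum(counts) for counts in zip(*histograms)]


def histogram_quantile(q, cumulative):
    """Estimate a quantile from cumulative buckets by linear interpolation
    inside the bucket that contains the target rank."""
    target = q * cumulative[-1]
    lower_bound, lower_count = 0.0, 0
    for bound, count in zip(BOUNDS, cumulative):
        if count >= target:
            if math.isinf(bound):
                # Cannot interpolate into +Inf; fall back to the last finite bound.
                return lower_bound
            in_bucket = count - lower_count
            fraction = (target - lower_count) / in_bucket if in_bucket else 0.0
            return lower_bound + (bound - lower_bound) * fraction
        lower_bound, lower_count = bound, count
    return lower_bound


# Made-up data: a busy, fast instance and a quiet, slow one.
instance_a = [0.5] * 1000
instance_b = [40.0] * 10

true_p99 = quantile(instance_a + instance_b, 0.99)
averaged_p99 = (quantile(instance_a, 0.99) + quantile(instance_b, 0.99)) / 2
merged = merge([to_histogram(instance_a), to_histogram(instance_b)])
estimated_p99 = histogram_quantile(0.99, merged)

print(true_p99, averaged_p99, estimated_p99)
```

Averaging the two per-instance p99 values gives 20.25 s even though 99% of all requests took 0.5 s. Summing the buckets first keeps the estimate inside the correct bucket (0 to 1 s), at the cost of bucket-width resolution.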