[07:08:24] serviceops, Data-Engineering, Epic, Event-Platform Value Stream (Sprint 10), Patch-For-Review: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (JMeybohm)
[07:09:18] serviceops, Data-Engineering, Epic, Event-Platform Value Stream (Sprint 10), Patch-For-Review: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (JMeybohm) Could you please share resource requirements for the operator from your experiments on DSE here...
[08:33:14] folks I'd like to roll out https://gerrit.wikimedia.org/r/c/operations/puppet/+/901551
[08:33:49] this has already been applied to the other kafka clusters, it switches the brokers' truststores to one containing both puppet and PKI ca certs
[08:34:00] it requires a roll restart of the brokers via cookbook, nothing more
[08:34:17] but it will allow us to upgrade to PKI one broker at a time etc..
[08:34:31] there is no real hurry but I thought to progress the work anyway
[08:34:33] lemme know
[08:36:44] +1
[08:38:03] thanks :)
[08:39:45] serviceops, RESTbase Sunsetting, Parsoid (Tracking): Increase memory_limit for jobrunners to $wmgMemoryLimitParsoid - https://phabricator.wikimedia.org/T333528 (Clement_Goubert)
[08:40:08] serviceops, RESTbase Sunsetting, Parsoid (Tracking): Increase memory_limit for jobrunners to $wmgMemoryLimitParsoid - https://phabricator.wikimedia.org/T333528 (Clement_Goubert) p:Triage→High
[08:58:55] serviceops, SRE, Thumbor, Thumbor Migration, and 2 others: Thumbor-k8s performance improvements - https://phabricator.wikimedia.org/T333445 (akosiaris) I've uploaded a couple of changes to switch summaries to histograms in both environments. That way we will be able to have aggregatable data acros...
[09:26:48] o/ uploaded https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/904464/ to illustrate the problem I'm facing, using the mesh sidecar container without exposing a public_port (only use listener)
[09:31:33] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Post Kubernetes v1.23 cleanup - https://phabricator.wikimedia.org/T328291 (JMeybohm)
[09:45:59] serviceops, RESTbase Sunsetting, Epic, Platform Engineering Roadmap: Replace usage of RESTbase parsoid endpoints - https://phabricator.wikimedia.org/T328559 (DAlangi_WMF) @daniel and I spoke about this today in our sync call. I'll be investigating which services are using parsoid via RESTBase. On...
[09:47:49] serviceops, RESTbase Sunsetting, Epic, Platform Engineering Roadmap: Replace usage of RESTbase parsoid endpoints - https://phabricator.wikimedia.org/T328559 (DAlangi_WMF) Open→In progress
[09:59:08] serviceops, RESTbase Sunsetting, Epic, Platform Engineering Roadmap: Replace usage of RESTbase parsoid endpoints - https://phabricator.wikimedia.org/T328559 (DAlangi_WMF)
[10:02:03] serviceops, RESTbase Sunsetting, Epic, Platform Engineering Roadmap: Survey RESTBase services and find which ones accesses Parsoid via RESTBase - https://phabricator.wikimedia.org/T333536 (DAlangi_WMF)
[10:05:21] serviceops, RESTbase Sunsetting, Epic, Platform Engineering Roadmap: Survey RESTBase services and find which ones accesses Parsoid via RESTBase - https://phabricator.wikimedia.org/T333536 (Joe) A starting point for this investigation can be which services currently call restbase: https://grafana...
[10:41:21] I'd like to start moving the rest-gateway forward soon if anyone has time spare - dns https://gerrit.wikimedia.org/r/904493 basic puppet stuff https://gerrit.wikimedia.org/r/c/operations/puppet/+/891510
[10:58:38] looks good to me. I only wonder why CI did not catch the missing TTL - might be worth a phab task?
[11:02:41] serviceops, SRE-Sprint-Week-Sustainability-March2023, Sustainability (Incident Followup): Expand upon Kask/Sessionstore documentation - https://phabricator.wikimedia.org/T320398 (hnowlan) >>! In T320398#8718719, @akosiaris wrote: > * Some links to important graphs to look at and correlate when in an...
[11:02:53] jayme: thank you!
[11:02:58] From bind's perspective the missing TTL is still valid, it just applies the default so I dunno - is it something we want to enforce? I guess if it's worth fixing then it's worth enforcing
[11:04:02] hm..no idea tbh.
[11:05:45] dcausse: oh, I did not see your message here. As said in q I would lean towards fixing this differently (by installing wmf-certificates in the envoy image) maybe. You mind creating a phab task so we can discuss?
[11:12:45] <_joe_> hnowlan: we don't use bind :)
[11:13:13] <_joe_> but the rest is still valid :)
[11:14:58] oops, ofc
[11:29:54] serviceops, RESTbase Sunsetting, Epic, Platform Engineering Roadmap: Survey RESTBase services and find which ones accesses Parsoid via RESTBase - https://phabricator.wikimedia.org/T333536 (DAlangi_WMF) Thank you for this list Joe. It's very useful and helpful. It'll help me narrow down the search...
[11:30:27] serviceops, RESTbase Sunsetting, Epic, Platform Engineering Roadmap: Survey RESTBase services and find which ones accesses Parsoid via RESTBase - https://phabricator.wikimedia.org/T333536 (DAlangi_WMF)
[12:23:53] serviceops: Install wmf-certificates on the envoy docker image - https://phabricator.wikimedia.org/T333551 (dcausse)
[12:35:50] proceeding with kafka-main codfw's restart now :)
[13:28:28] serviceops, Patch-For-Review: Install wmf-certificates on the envoy docker image - https://phabricator.wikimedia.org/T333551 (JMeybohm) Updated image: `docker-registry.discovery.wmnet/envoy:1.18.3-2` A patch to envoy-future was not required as that image already includes wmf-certificates. To actually ma...
[13:30:55] helm parts of rest-gateway https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/895327/ (last ask for today on this hopefully :))
[13:50:54] hnowlan: I've added myself as reviewer but might be tomorrow
[13:53:33] thanks!
[13:59:45] _joe_: if I do a "sextant vendor -f" after this https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/904538/1 it still pulls me configuration_1.1.1 - is there anything else to it?
[14:00:18] <_joe_> jayme: you changed to 1.2.0
[14:00:24] yep
[14:00:30] <_joe_> so you need to explicitly bump dependency in package.json
[14:00:34] ah
[14:00:43] <_joe_> you indicated you're breaking compatibility
[14:00:53] lol
[14:00:53] <_joe_> by bumping a minor :)
[14:00:56] "Update all dependencies to the latest patch version"
[14:01:05] <_joe_> *patch*
[14:01:07] <_joe_> :)
[14:01:10] I stopped reading after the first 3 words :D
[14:01:21] 💪
[14:01:32] <_joe_> a new level of TLDR
[14:01:54] <_joe_> I mean, I fully expect in 5 years we'll need to make man pages tiktok reel stories
[14:01:57] I just read what I wanted to read :D
[14:10:00] _joe_ │ by bumping a minor :) < That's illegal.
[14:10:05] serviceops, Platform Team Workboards (Platform Engineering Reliability): Add kamila to ops group - https://phabricator.wikimedia.org/T333565 (hnowlan)
[14:10:15] <_joe_> ^_^
[14:10:57] * _joe_ adds "tell me your favourite pun" to the list of interview questions
[14:11:13] <_joe_> the right answer is "puns suck, unless you're mel brooks"
[14:12:18] You're just grumpy
[14:12:25] Embrace the pun
[14:13:35] serviceops, Patch-For-Review, Platform Team Workboards (Platform Engineering Reliability): Add kamila to ops group - https://phabricator.wikimedia.org/T333565 (Kappakayala) Approved!
[14:26:52] * kamila_ comes back to IRC after a coffee break, sees "bump a minor", takes another break
[15:10:37] serviceops, RESTbase Sunsetting, Parsoid (Tracking), Patch-For-Review: Increase memory_limit for jobrunners to $wmgMemoryLimitParsoid - https://phabricator.wikimedia.org/T333528 (Clement_Goubert) Open→In progress
[15:31:44] serviceops, Platform Team Workboards (Platform Engineering Reliability): Add kamila to ops group - https://phabricator.wikimedia.org/T333565 (hnowlan) Open→Resolved
[16:47:05] serviceops, Observability-Metrics, Prod-Kubernetes, Kubernetes, Patch-For-Review: Migrate use of infrastructure_users tokens to client certificates - https://phabricator.wikimedia.org/T325268 (JMeybohm) a:JMeybohm
[17:03:41] The helm-lint job feels like it is slow these days. Has anyone thought about ways we could cut down on the number of charts tested for each patch?
[17:04:21] https://integration.wikimedia.org/ci/job/helm-lint/9911/console was a bit over 5 minutes of runtime.