[07:39:14] jelto: Hey! I updated the patch here: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/726891 Wanna take a look?
[07:40:59] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Add label kubernetes.io/metadata.name to all namespaces - https://phabricator.wikimedia.org/T290476 (JMeybohm) The change has been deployed to main clusters.
[07:46:11] serviceops, MW-on-K8s, SRE-swift-storage, Shellbox: Support large files in Shellbox - https://phabricator.wikimedia.org/T292322 (fgiunchedi) >>! In T292322#7406310, @Legoktm wrote: >>>>! In T292322#7403338, @Legoktm wrote: >>> @fgiunchedi I'd appreciate your input on how this would potentially in...
[08:48:54] serviceops, Maps: Create a dedicated tegola postgres user - https://phabricator.wikimedia.org/T292694 (jijiki)
[08:49:44] serviceops, Maps, Product-Infrastructure-Team-Backlog, SRE, Service-deployment-requests: New Service Request tegola-vector-tiles - https://phabricator.wikimedia.org/T274390 (jijiki)
[08:49:48] serviceops, Maps, Patch-For-Review, User-jijiki: Deploy tegola-vector-tiles to kubernetes - https://phabricator.wikimedia.org/T283159 (jijiki) Open→Resolved
[08:51:14] serviceops, Maps, Patch-For-Review, User-jijiki: Deploy tegola-vector-tiles to kubernetes - https://phabricator.wikimedia.org/T283159 (jijiki) Tegola is running on kubernetes, #maps mirrored 100% of production traffic where we had no #sre-swift-storage issues. 🎉
[09:28:30] serviceops, MW-on-K8s, SRE-swift-storage, Shellbox: Support large files in Shellbox - https://phabricator.wikimedia.org/T292322 (tstarling) I did consider having swift access in Shellbox, but we didn't have a use case for it, and allowing network access and giving it a swift secret means there is...
[09:40:45] serviceops, SRE-swift-storage: Allow maps2009/maps1009 (master nodes) access thanos-swift - https://phabricator.wikimedia.org/T292700 (Jgiannelos)
[09:46:00] serviceops, MW-on-K8s, SRE, Patch-For-Review, Performance-Team (Radar): Benchmark performance of MediaWiki on k8s - https://phabricator.wikimedia.org/T280497 (jijiki) Running some tests (c=60, ~1.9m URLs) against mwdebug services, we found 2 issues: 1) Our client was returning the following e...
[10:01:27] jelto effie I updated the cronjob patch for tegola with some context here: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/726891/7#message-18a96b0d521c7bd153855ce94bf18297e2bacc6d, that said I am not sure if I am missing something else for envoy
[10:14:48] nemo-yiannis: I will let jelto have a look, I am not very familiar with cronjobs
[10:20:56] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (jijiki)
[10:21:02] serviceops, MW-on-K8s, SRE, MW-1.37-notes (1.37.0-wmf.20; 2021-08-23): Make HTTP calls work within mediawiki on kubernetes - https://phabricator.wikimedia.org/T288848 (jijiki)
[10:21:06] serviceops, MW-on-K8s, SRE, Patch-For-Review, Performance-Team (Radar): Benchmark performance of MediaWiki on k8s - https://phabricator.wikimedia.org/T280497 (jijiki)
[10:59:50] serviceops, SRE: Migrate node-based services in production to node12 - https://phabricator.wikimedia.org/T290750 (Pginer-WMF)
[11:05:43] <_joe_> I have updated the information about adding a new service to kubernetes: now you only need to define the user tokens in one place https://wikitech.wikimedia.org/wiki/Kubernetes#Add_a_new_service
[11:05:57] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (jijiki)
[11:10:37] serviceops, MW-on-K8s: Migrate Wikitech to Kubernetes - https://phabricator.wikimedia.org/T292707 (jijiki) p:Triage→Medium
[11:10:57] serviceops, MW-on-K8s: Migrate Wikitech to Kubernetes - https://phabricator.wikimedia.org/T292707 (jijiki)
[11:11:08] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (jijiki)
[11:13:05] serviceops, MW-on-K8s, wikitech.wikimedia.org: Migrate Wikitech to Kubernetes - https://phabricator.wikimedia.org/T292707 (Majavah)
[12:06:02] serviceops, SRE-swift-storage: Allow maps2009/maps1009 (master nodes) access thanos-swift - https://phabricator.wikimedia.org/T292700 (fgiunchedi) My two cents: I think what's needed here is get puppet to write the credentials on the filesystem (with adequate ownership/permissions) in a format suitable f...
[12:21:15] serviceops, Prod-Kubernetes, Toolhub, Kubernetes: Maintenance environment needed for running one-off commands - https://phabricator.wikimedia.org/T290357 (akosiaris) >>! In T290357#7406082, @bd808 wrote: >>>! In T290357#7405708, @akosiaris wrote: >> * We are trying to avoid having to allocate a P...
[13:04:38] serviceops, Anti-Harassment, IP Info, SRE, Patch-For-Review: Update MaxMind GeoIP2 license key and product IDs for application servers - https://phabricator.wikimedia.org/T288844 (phuedx)
[13:20:41] serviceops, DC-Ops, SRE, ops-eqiad: Q1:(Need By: TBD) rack/setup/install kubernetes10[18-21] - https://phabricator.wikimedia.org/T290202 (Cmjohnson)
[13:21:14] serviceops, DC-Ops, SRE, ops-eqiad: Q2: (Need By: TBD) rack/setup/install kubestage100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T290894 (Cmjohnson)
[15:35:28] serviceops, SRE: Migrate node-based services in production to node12 - https://phabricator.wikimedia.org/T290750 (Jdforrester-WMF)
[17:50:27] <_joe_> bd808: I had an idea re: envoy not exiting in cronjobs
[17:50:52] <_joe_> we could have the cronjob run a wrapper script that executes what you want, then tells envoy to shutdown using the admin interface
[17:51:15] <_joe_> which inside the pod is available on localhost:1666
[17:52:50] <_joe_> a POST to localhost:1666/quitquitquit should do it https://www.envoyproxy.io/docs/envoy/latest/operations/admin
[18:15:44] serviceops, DC-Ops, SRE, ops-eqiad: Q1:(Need By: TBD) rack/setup/install kubernetes10[18-21] - https://phabricator.wikimedia.org/T290202 (Cmjohnson)
[18:16:57] serviceops, DC-Ops, SRE, ops-eqiad: Q1:(Need By: TBD) rack/setup/install kubernetes10[18-21] - https://phabricator.wikimedia.org/T290202 (Cmjohnson) The BIOS and Idracs are set up, kubernetes1020 would not power on. @Jclark-ctr can you call Dell about 1020, I am on holiday next week. I will insta...
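Note: the wrapper idea _joe_ describes at 17:50-17:52 could look roughly like the sketch below. It is only an illustration, not the script that was deployed. The admin address localhost:1666 and the /quitquitquit endpoint come from the chat; the function names `quit_envoy` and `run_then_quit_envoy`, the timeout, and the choice to shut Envoy down even when the job fails are assumptions made for this example.

```python
import subprocess
import sys
import urllib.request

# Admin address as mentioned in the chat ("inside the pod ... localhost:1666").
ENVOY_ADMIN_URL = "http://localhost:1666"


def quit_envoy(admin_url=ENVOY_ADMIN_URL, timeout=5.0):
    """POST to Envoy's /quitquitquit admin endpoint; return True on HTTP 200."""
    request = urllib.request.Request(
        admin_url.rstrip("/") + "/quitquitquit", data=b"", method="POST"
    )
    # Bypass any configured HTTP proxy: the admin interface is local to the pod.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers URLError/HTTPError, refused connections and timeouts.
        return False


def run_then_quit_envoy(command, admin_url=ENVOY_ADMIN_URL):
    """Run the job, then always ask Envoy to exit; return the job's exit code."""
    try:
        exit_code = subprocess.run(command).returncode
    finally:
        # Assumption: the sidecar should stop even if the job itself failed.
        quit_envoy(admin_url)
    return exit_code


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python wrapper.py <job command> [args...]
    sys.exit(run_then_quit_envoy(sys.argv[1:]))
```

The wrapper returns the job's own exit code, so Kubernetes would still see the job as failed or succeeded based on the real command rather than on whether the shutdown request reached Envoy.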
[18:17:16] serviceops, DC-Ops, SRE, ops-eqiad: Q2: (Need By: TBD) rack/setup/install kubestage100[34].eqiad.wmnet - https://phabricator.wikimedia.org/T290894 (Cmjohnson)
[19:19:37] serviceops, DC-Ops, SRE, ops-eqiad: Q1:(Need By: TBD) rack/setup/install kubernetes10[18-21] - https://phabricator.wikimedia.org/T290202 (Jclark-ctr) kubernetes1020 is powered on, might have been delayed
[20:12:27] _joe_: that sounds possible! I wonder if a PreStop hook could be used to send that signal?
[20:13:41] Toolhub's job also has an mcrouter sidecar, but right now I could remove it as there really should not be valuable memcached interactions made by the crawler job.
[20:13:42] <_joe_> possibly, but I'm not sure when that would fire
[20:15:10] *nod* I haven't tested yet either. I'll see if I can do a POC today/tomorrow
[20:15:45] serviceops, DC-Ops, SRE, ops-eqiad: Q1:(Need By: TBD) rack/setup/install kubernetes10[18-21] - https://phabricator.wikimedia.org/T290202 (Cmjohnson) I’m sorry, I meant 1021 in D3, U33
[20:17:42] All the docs I'm seeing so far talk about the PreStop hook from the point of view of the kubelet terminating a pod. For this we would need an event related to a terminating container inside the pod I think... I'll keep looking
[20:41:52] So yeah, this (jobs with sidecars) is a mess. Upstream knows it's a mess, but it's so messy that they closed the bug asking for it in favor of an enhancement proposal and then closed the enhancement proposal in favor of "they've gone back to the drawing board and will be coming up with some new KEPs"
[20:42:33] https://github.com/kubernetes/kubernetes/issues/25908 && https://github.com/kubernetes/enhancements/issues/753
[20:44:06] ah, a new KEP that sounds less twisted is up at https://github.com/kubernetes/enhancements/issues/2872
[20:49:15] an annotation driven 3rd party controller -- https://github.com/nrmitchi/k8s-controller-sidecars
[20:50:11] a `shareProcessNamespace: true` hack as well -- https://suraj.io/post/how-to-gracefully-kill-kubernetes-jobs-with-a-sidecar/
[21:02:41] <_joe_> yeah in our case we should have ways to do it in a nicer way
[21:45:53] serviceops, SRE, wikidiff2, Community-Tech (CommTech-Sprint-10), Platform Team Workboards (Platform Engineering Reliability): Deploy wikidiff2 1.13.0 - https://phabricator.wikimedia.org/T285857 (Legoktm) Has this been rolled out further since? {T292762} is reporting generating diffs is slower.
[22:02:27] serviceops, SRE, wikidiff2, Community-Tech (CommTech-Sprint-10), Platform Team Workboards (Platform Engineering Reliability): Deploy wikidiff2 1.13.0 - https://phabricator.wikimedia.org/T285857 (Daimona) >>! In T285857#7410752, @Legoktm wrote: > Has this been rolled out further since? {T29276...