[08:35:15] elukey | akosiaris: From scrollback I get the DSE + prometheus issue is not resolved, right? I can look into it then
[08:35:59] jayme: I am looking into it
[08:36:02] aren't you on PTO?
[08:36:14] I am about to upload my patch btw
[08:44:39] elukey: I hijacked your gerrit change: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/905171
[08:44:47] I am testing it in wikikube now
[08:46:47] effie: You around? I have some questions regarding maps :)
[08:47:38] claime: for 35 euros you can buy this beauty https://vendora.gr/items/63y8n6/politikos-chartis-tis-ellados-geofisikos-paragogikos-amfiplevrosdekaetias50-ekdosis-agkiras-diastasis-100ch70-ekatosta.html
[08:47:47] and have all your questions answered
[08:48:25] * claime tries to use a paper map to fix the tile generation cronjob
[08:48:28] IT'S NOT WORKING
[08:49:02] this map has not changed in years, needs no cron job
[08:49:17] lol
[08:49:22] 1950s
[08:49:35] akosiaris: ack sorry I was afk
[08:49:42] let's hope it doesn't change anytime soon
[08:49:47] lol akosiaris
[08:49:54] claime: sorry, go ahead please
[08:50:04] elukey: ok, it worked on wikikube
[08:50:20] elukey: it ended up being more involved than just list nodes after all
[08:50:29] which is why I hijacked your change
[08:50:35] akosiaris: ahahah yes I was about to say: "basically what I wrote!" :D
[08:50:48] thanks a lot
[08:51:11] Basically the planet_sync_tile_generation-gis cron fails because it uses the wrong user. It isn't updated because in puppet tile_generation_command is undef for imposm3, which means we never update or remove the timer
[08:51:18] I've only test deployed on wikikube@eqiad, but I guess we wanna deploy it everywhere asap
[08:51:25] My question is, should it actually run, or should it be removed?
[08:51:31] I can take over that one elukey, is that ok?
[08:52:10] sure sure, just +1ed
[08:52:44] and IIUC we don't need any puppet private change since the cluster role binding has the User: prometheus bit
[08:53:01] yup
[08:53:42] ack then I think we can probably proceed
[08:54:13] I was on PTO yesterday. ;)
[08:54:32] ahhh okok then let's wait for Janis' review
[08:54:39] claime: I think this is the cron we have been meaning to remove
[08:54:43] but never got to it
[08:55:02] wasn't that alert silenced
[08:55:19] It probably was, until yesterday
[08:55:25] elukey: Am I understanding right that this is not related to DSE but all clusters?
[08:56:28] It was silenced until July but I think the alert host failover yesterday may have reset these downtimes or something
[08:56:53] I'll resilence it
[08:57:15] claime: if it is not done by the hackathon, nemo-yiannis and I promise to fix it then
[08:57:23] a'ight
[08:57:30] cheers and sorry for the noice
[08:57:31] noise
[08:57:53] no worries
[09:00:20] jayme: correct yes
[09:00:35] ah, okay. All makes way more sense now
[09:00:46] *All of it
[09:03:47] effie, claime: I believe that was caused by me removing a bunch of legacy stuff - afaict we can just remove that timer but I will confirm and do that today
[09:04:06] hnowlan: no no it has been going on for months and we have been lazy
[09:04:16] myself included
[09:07:24] it has but this is a new version of the same problem :D
[09:07:46] my change removed the pregeneration_command param
[09:08:00] lol
[09:08:20] I have to admit it took me a minute to track down why the timer was there, but not being updated
[09:08:36] jayme: elukey: ottomata: issue fixed. https://grafana.wikimedia.org/goto/PAb1YVYVz?orgId=1 now has pods from all namespaces
[09:08:51] nice work!
[09:09:21] I'm not sold :) see #k8s-sig
[09:14:10] --
[09:14:28] I've checked quite a bit (via tshark) traces of TLS handshake failures on kafka-main1001 but so far all good
[09:15:01] when you want I think that we can proceed with https://gerrit.wikimedia.org/r/c/operations/puppet/+/905251 (slowly, one node at a time etc..)
[10:08:11] pooling codfw thumbor-k8s, and then depooling codfw non-k8s for a bit
[10:11:42] actually not doing that ^, no need
[10:17:03] serviceops, Foundational Technology Requests, Prod-Kubernetes, Shared-Data-Infrastructure, and 2 others: Post Kubernetes v1.23 cleanup - https://phabricator.wikimedia.org/T328291 (JMeybohm)
[10:17:11] doing 50/50 in eqiad instead
[10:28:19] ack
[10:50:52] Are any of you cool kubernetes people going to be in Athens around the Hackathon? At least two of us from WMDE's Wikibase.cloud team are hoping to be there; I know we'd talked a bit about maybe spending some time together back in 2022; even if you're not going to be at the event proper I was wondering if you might like to meet up
[10:52:14] not sure about the cool people but I won't be there unfortunately
[11:01:04] hehe, I know there's usually a push to not swamp the hackathon with staff which was why I was thinking if it was necessary we could also do something alongside (or maybe just after) it
[11:05:01] serviceops, RESTbase Sunsetting, Parsoid (Tracking): Enable WarmParsoidParserCache on all wikis - https://phabricator.wikimedia.org/T329366 (jijiki)
[11:06:05] serviceops, RESTbase Sunsetting, Parsoid (Tracking): Enable WarmParsoidParserCache on all wikis - https://phabricator.wikimedia.org/T329366 (jijiki)
[11:11:30] tarrow: I'll be there. Definitely not cool though
[11:12:45] hnowlan: I owe you to update the thumbor dashboard panels to properly work now that we changed histograms to summaries, I haven't forgotten
[11:12:56] sigh summaries to histograms
[11:13:24] hah, I have you both in the cool category (surely because you are way cooler than I am)
[11:16:35] akosiaris: would you be interested in doing something on the Monday (22nd) like we vaguely talked about in Berlin?
[11:20:56] akosiaris: I've been doing a little bit of messing with them since pooling, might have some of the work done (although getting meaningful maxes broken)
[11:36:02] depooled eqiad thumbor-k8s fyi
[11:42:05] serviceops, RESTbase Sunsetting, Parsoid (Tracking): Enable WarmParsoidParserCache on all wikis - https://phabricator.wikimedia.org/T329366 (Clement_Goubert) We've gone over the maths again with @akosiaris and the current provisioning for the `jobrunner` cluster should be able to handle the load tran...
[11:42:19] serviceops, RESTbase Sunsetting, Parsoid (Tracking): Enable WarmParsoidParserCache on all wikis - https://phabricator.wikimedia.org/T329366 (Clement_Goubert)
[12:19:55] serviceops, PoolCounter, Performance-Team (Radar): poolcounter-exporter upgrade - https://phabricator.wikimedia.org/T333947 (fgiunchedi)
[12:43:02] tarrow: you mean after the hackathon? It's a school day, I'll only have a few hours before I have to pick up the kid from school, but we can try it out.
[12:51:19] serviceops, API Platform, RESTbase Sunsetting, Epic, Platform Engineering Roadmap: Survey RESTBase services and find which ones accesses Parsoid via RESTBase - https://phabricator.wikimedia.org/T333536 (VirginiaPoundstone)
[13:05:45] ah lovely, we need to refresh all kafka main nodes
[13:15:12] serviceops, API Platform, RESTbase Sunsetting, Epic, Platform Engineering Roadmap: Survey RESTBase services and find which ones accesses Parsoid via RESTBase - https://phabricator.wikimedia.org/T333536 (DAlangi_WMF)
[13:15:24] serviceops, API Platform, RESTbase Sunsetting, Epic, Platform Engineering Roadmap: Survey RESTBase services and find which ones accesses Parsoid via RESTBase - https://phabricator.wikimedia.org/T333536 (DAlangi_WMF)
[13:21:00] serviceops, PoolCounter, Performance-Team (Radar): poolcounter-exporter upgrade - https://phabricator.wikimedia.org/T333947 (fgiunchedi)
[13:43:51] serviceops, API Platform, RESTbase Sunsetting, Epic, Platform Engineering Roadmap: Survey RESTBase services and find which ones accesses Parsoid via RESTBase - https://phabricator.wikimedia.org/T333536 (DAlangi_WMF)
[13:44:45] serviceops, API Platform, RESTbase Sunsetting, Epic, Platform Engineering Roadmap: Survey RESTBase services and find which ones accesses Parsoid via RESTBase - https://phabricator.wikimedia.org/T333536 (DAlangi_WMF)
[14:03:17] serviceops, RESTbase Sunsetting, Parsoid (Tracking): Enable WarmParsoidParserCache on all wikis - https://phabricator.wikimedia.org/T329366 (Kappakayala) Hi @daniel , as the Svc Ops team is figuring out what needs to be done, I would like to understand the priority of this task. The reason I am askin...