[04:16:25] <wikibugs>	 serviceops, LPL Essential, MinT, Community Wishlist (Translations), Community-Tech (Ezo Red Fox (July 29 - Aug 9, 2024)): Caching service request for MinT - https://phabricator.wikimedia.org/T370755#10056502 (santhosh) @jijiki That should be ok. Our team capacity is also thin in this month....
[07:18:12] <wikibugs>	 serviceops, Kubernetes: Better visibility for throttled pods - https://phabricator.wikimedia.org/T372241 (fgiunchedi) NEW
[07:19:55] <wikibugs>	 serviceops, observability, Kubernetes: Alert on unscrapable pods - https://phabricator.wikimedia.org/T372242 (fgiunchedi) NEW
[07:43:40] <wikibugs>	 serviceops, observability, Kubernetes: Alert on unscrapable pods - https://phabricator.wikimedia.org/T372242#10056802 (JMeybohm) With how the prometheus service discovery currently works (e.g. scraping every container port by default) we do have a large number of "okay to be down" targets, so an alert...
[07:51:03] <wikibugs>	 serviceops, Kubernetes: Better visibility for throttled pods - https://phabricator.wikimedia.org/T372241#10056826 (JMeybohm) Generally speaking throttling is not an issue (as long as availability/latency targets are still met) but more a measure against processes going rogue (so it's very common and kind...
[09:19:52] <wikibugs>	 serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: cfssl-issuer: Generate Kubernetes Events - https://phabricator.wikimedia.org/T337928#10056995 (JMeybohm) a:JMeybohm
[12:02:43] <wikibugs>	 serviceops, Kubernetes: Better visibility for throttled pods - https://phabricator.wikimedia.org/T372241#10057336 (fgiunchedi) That's fair, thank you for the rationale @JMeybohm ! Feel free to resolve/decline the task as you see fit
[12:11:08] <wikibugs>	 serviceops, observability, Kubernetes: Alert on unscrapable pods - https://phabricator.wikimedia.org/T372242#10057363 (fgiunchedi) Indeed on the pod granularity the alert would be noisy, I checked the data in terms of "percentage of reported `up`" by namespace + app and maybe this has more signal? ht...
[12:13:16] <wikibugs>	 serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10057365 (JMeybohm) One thing I've noticed is that kafka-main2010 seems to have a different disk than all the others (all others are 1.7T models):  ` sde...
[13:06:43] <wikibugs>	 serviceops, Kubernetes: Remove deprecated cloudnative-pg charts from chart-museum - https://phabricator.wikimedia.org/T371667#10057600 (brouberol) Due to a review error, we also had a chart misnaming with a chart called `cluster` instead of `cloudnative-pg-cluster`. Could you remove the `cluster` chart a...
[13:16:59] <wikibugs>	 serviceops, DC-Ops, ops-codfw, SRE: Install (2) 960GB SSDs each in kafka-main20[06-10] - https://phabricator.wikimedia.org/T371423#10057632 (Jhancock.wm) yes, I have a surplus of 1.7G disks and almost no 1G. so you get a bonus.
[14:09:01] <wikibugs>	 serviceops, Kubernetes: Remove deprecated cloudnative-pg charts from chart-museum - https://phabricator.wikimedia.org/T371667#10057894 (JMeybohm) Just to be extra sure, you want the following to be removed:   - stable/cloudnative-pg-operator-0.2.0.tgz   - stable/cloudnative-pg-operator-crds-0.1.0.tgz   -...
[15:02:51] <wikibugs>	 serviceops, Kubernetes: Remove deprecated cloudnative-pg charts from chart-museum - https://phabricator.wikimedia.org/T371667#10058085 (brouberol) Yes, that's perfect.
[15:41:50] <nemo-yiannis>	 Hi! We were reviewing some grafana boards for parsoid and the cluster overview shows up like it has no data: https://grafana.wikimedia.org/goto/xTAvcVjIR?orgId=1
[15:41:58] <nemo-yiannis>	 Is this related to the k8s migration?
[15:42:30] <nemo-yiannis>	 Correction: it does show metrics for the parsoid cluster, but utilization is very low
[15:43:03] <cdanis>	 nemo-yiannis: yes, all parsoid traffic has been moved to k8s, and I think the `parsoid` cluster of baremetal hosts is vestigial
[15:43:12] <nemo-yiannis>	 ok, thanks!
[15:43:12] <hnowlan>	 something more useful here: https://grafana.wikimedia.org/d/U7JT--knk/mediawiki-on-k8s?orgId=1&var-dc=eqiad%20prometheus%2Fk8s&var-service=mediawiki&var-namespace=mw-parsoid&var-container_name=All&refresh=1m&from=now-12h&to=now
[15:43:29] <cdanis>	 or also https://grafana.wikimedia.org/d/35WSHOjVk/application-servers-red-k8s?orgId=1&refresh=1m&var-site=All&var-deployment=mw-parsoid&var-method=GET&var-code=200&var-handler=php&var-service=mediawiki I think?
[15:44:41] <nemo-yiannis>	 sounds good, thanks!
[17:44:46] <bd808>	 brouberol: you probably saw by now, but the bug was caused by the puppet change where quoting of the integer map key was missed. Fixed up in <https://gerrit.wikimedia.org/r/c/operations/puppet/+/1060915>, but I didn't submit a revert of the revert to put things back.
[17:49:23] <wikibugs>	 serviceops, Data Products, Data-Platform-SRE, Dumps-Generation, and 2 others: Migrate current-generation dumps to run from our containerized images - https://phabricator.wikimedia.org/T352650#10058764 (xcollazo) >>! In T352650#10051791, @dr0ptp4kt wrote: > - If I'm understanding correctly, people...
[17:49:58] <brouberol>	 I saw, that's on me! I did revert the revert, and everything seems to be working fine now
[17:54:54] <bd808>	 excellent. glad you are back towards being on track.
[19:10:57] <wikibugs>	 serviceops, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Install (2) 960GB SSDs each in kafka-main10[06-10] - https://phabricator.wikimedia.org/T371422#10059256 (VRiley-WMF) Thanks @JMeybohm Currently, at eqiad we don't have many 960 gig SSDs. However, we do have larger sizes. As I understand, t...
[19:12:38] <wikibugs>	 serviceops, Shellbox, SRE, Charts (Sprint 3): Figure out how a shellbox instance for the Chart extension would work - https://phabricator.wikimedia.org/T370739#10059265 (Catrope) Open→Resolved Thank you for weighing in everyone! I think we've gotten enough useful advice here that we c...
[21:05:59] <wikibugs>	 serviceops, LPL Essential, MinT, Community Wishlist (Translations), Community-Tech (Fennec Fox (Aug 12-23, 2024)): Caching service request for MinT - https://phabricator.wikimedia.org/T370755#10059861 (MusikAnimal)