[09:06:41] serviceops, MW-on-K8s, Scap, Patch-For-Review: Evaluate the performance improvements brought in by prefetching MW images on WikiKube hosts - https://phabricator.wikimedia.org/T366778#9866679 (akosiaris)
[09:45:45] serviceops, Wikimedia-production-error: registry2004 sometimes reporting: too many open files problems - https://phabricator.wikimedia.org/T366481#9866810 (JMeybohm) I see that we run nginx with the default debian nginx.conf which has `worker_connections 768;` and no `worker_rlimit_nofile` set. The gener...
[10:00:32] serviceops, Wikimedia-production-error: registry2004 sometimes reporting: too many open files problems - https://phabricator.wikimedia.org/T366481#9866876 (Clement_Goubert) Yes, that's what I figured out yesterday as well, and didn't get the time to lay down on task. Let's set it to 768*2 and see what ha...
[11:15:25] serviceops, Cassandra, Data Products, SRE, and 2 others: Commons Impact Metrics: Data Gateway endpoints - https://phabricator.wikimedia.org/T364921#9867031 (hnowlan) >>! In T364921#9866023, @Scott_French wrote: > The last log line is: > > ` > {"@timestamp":"2024-06-05T22:49:28Z","message":"Conne...
[13:48:00] serviceops, DC-Ops, ops-eqiad, SRE-OnFire, Sustainability (Incident Followup): eqiad:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366204#9867423 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by kamila@cumin1002 for host wikikube-ctrl1001.eq...
[13:59:30] serviceops, MW-on-K8s, Scap, Patch-For-Review: Evaluate the performance improvements brought in by prefetching MW images on WikiKube hosts - https://phabricator.wikimedia.org/T366778#9867458 (akosiaris)
[14:03:55] serviceops, Thumbor, Patch-For-Review, Structured-Data-Backlog (Current Work): [XL] Upgrade Thumbor to bullseye - https://phabricator.wikimedia.org/T336881#9867474 (Ladsgroup) >>! In T336881#9859198, @TheDJ wrote: > Can anyone update the ticket with the current state ? > I believe thumbor hasn't...
[14:35:44] hi folks!
[14:36:07] for some reason there is a sessionstore pod running on mw1390, that has no dedicated="kask" taint
[14:36:35] what is the procedure? Simply kill it?
[14:51:32] {{done}}
[14:54:13] serviceops, Cassandra, Data Products, SRE, and 2 others: Commons Impact Metrics: Data Gateway endpoints - https://phabricator.wikimedia.org/T364921#9867690 (Scott_French) Thanks for taking a look, @hnowlan! Agreed, yeah - I was initially suspicious of a networking issue, but after verifying that...
[14:58:39] elukey: yeah that happens because of the reboots, since the dedicated kask nodes are in the same taint groups, they get cordoned at the same time, pod gets evicted and scheduled on a non kask node
[14:59:02] elukey: Usually since we're rebooting the whole cluster, it just gets re-evicted later and scheduled to a rebooted kask node
[14:59:35] Thanks for kicking it back <3
[15:00:25] ack got it thanks!
[15:21:16] serviceops, Cassandra, Data Products, SRE, and 2 others: Commons Impact Metrics: Data Gateway endpoints - https://phabricator.wikimedia.org/T364921#9867836 (Scott_French) I just had a very interesting conversation with @Sfaci about the initialDelaySeconds recently added to AQS 2.0 services. In s...
[15:37:36] serviceops, Thumbor, Structured-Data-Backlog (Current Work): [XL] Upgrade Thumbor to bullseye - https://phabricator.wikimedia.org/T336881#9867938 (Jdforrester-WMF) Open→Resolved a:hnowlan Resolved enough?
[18:49:49] serviceops, DC-Ops, ops-eqiad, SRE-OnFire, Sustainability (Incident Followup): eqiad:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366204#9868742 (kamila) @VRiley-WMF the reimage of wikikube-ctrl1001 was finally successful, I want to run a few more tests due to hav...
[18:59:56] serviceops, MW-Interfaces-Team, Traffic: map the /api/ prefix to /w/rest.php - https://phabricator.wikimedia.org/T364400#9868777 (daniel)
[21:08:15] serviceops, Cassandra, Data Products, SRE, and 2 others: Commons Impact Metrics: Data Gateway endpoints - https://phabricator.wikimedia.org/T364921#9869298 (Scott_French) Many thanks to @Eevans for humoring my experiments. The results are in, and it seems that upgrading from gocql v1.2.0 to v1....
[21:08:30] serviceops, Cassandra, Data Products, SRE, and 2 others: Commons Impact Metrics: Data Gateway endpoints - https://phabricator.wikimedia.org/T364921#9869304 (Scott_French)